Process Data Tasks taking too long to finish - stuck BQ status
Description
PRDE - Bug default text according to the team DoR (Definition of Ready)
01 - PERSON OF CONTACT (PERSON THAT CAN ANSWER QUESTIONS ABOUT THE PROBLEM): Robson Poffo / Furtado / DE
02 - PROBLEM (WHAT'S THE ISSUE?):
We identified some tasks that take a long time to move on after the BQ job completes.
The BQ job is done (we checked the execution plan), but Carol keeps waiting on it for a few hours before registering the stale status.
BomFuturo cases:
TaskID: 79c8194595084bb7b9b34b2edfb6ce51
Execution Plan: https://global.carol.ai/bomfuturo/carol-ui/explore/execution-plan/051ed4e0-cf2a-4c99-9737-dea224a9ece8
TaskID: 1c7a54df304143099b66eb677739a9d8
Execution Plan: https://global.carol.ai/bomfuturo/carol-ui/explore/execution-plan/cc60be37-9b12-4a46-a06c-0fe0c8a3d207
We received some reports from clockin complaining that these jobs are taking longer than usual.
Another scenario from 2024-01-05:
Techfin Case:
Execution plan: https://totvstechfindev.carol.ai/antecipaprotheusunif/carol-ui/explore/execution-plan/7f13b99d-690a-4a07-a389-23bad95e6444
05 - EXPECTED BEHAVIOR (LIST THE EXPECTED BEHAVIORS TO CONSIDER THIS BUG AS DONE):
- Identify why the task gets stuck.
- Review the stale period for SQL Tasks. A task is expected to be marked stale after 60 minutes without updates.
- We identified cases where it took 2 hours: https://totvstechfindev.carol.ai/antecipaprotheusunif/carol-ui/tasks/activity/2c9c5f85137c47f29a1c56594d5264ed?p=1&ps=100&sort=dateUpdated&order=DESC&filters=%5B%7B%22hideInternal%22:%22false%22%7D,%7B%22dateCreated%22:%5B%22range%22,%222023-12-13T06:00:00%2B00:00,2023-12-13T21:00:00%2B00:00%22%5D%7D,%7B%22taskType%22:%5B%22BIGQUERY_PROCESS_DATA%22%5D%7D,%7B%22dataModelName%22:%5B%22arinvoiceinstallment%22,%22arinvoicepayments%22%5D%7D%5D
TaskId: 2c9c5f85137c47f29a1c56594d5264ed
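To make the gap between the expected 60-minute stale period and the observed 2 hours concrete, here is a minimal Python sketch. It is not Carol's code: the names `STALE_AFTER`, `is_stale` and `overshoot` are hypothetical, and the timestamps in the usage lines are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the 60-minute window is taken from the expected behavior above.
# The constant and function names are hypothetical, not Carol identifiers.
STALE_AFTER = timedelta(minutes=60)


def is_stale(last_update: datetime, now: datetime,
             stale_after: timedelta = STALE_AFTER) -> bool:
    """Return True when a task has gone longer than the stale window without an update."""
    return now - last_update > stale_after


def overshoot(last_update: datetime, marked_stale_at: datetime,
              stale_after: timedelta = STALE_AFTER) -> timedelta:
    """Return how much later than expected the task was marked stale (zero if on time)."""
    expected = last_update + stale_after
    return max(marked_stale_at - expected, timedelta(0))


# Illustrative timestamps only: a task whose last update was at 06:00 UTC
# and that was marked stale two hours later.
last_update = datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)
marked_stale_at = last_update + timedelta(hours=2)
delay = overshoot(last_update, marked_stale_at)
```

With these illustrative values, `delay` is one hour, which is the extra wait a 2-hour case adds on top of the expected 60 minutes.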
06 - ACCEPTANCE CRITERIA:
- Optimize the StaleTaskCheckJob job so that it prioritizes the deallocation of tasks that are RUNNING and whose owning worker is no longer available.
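The prioritization asked for in this criterion could look roughly like the Python sketch below. It is a sketch under stated assumptions, not the actual StaleTaskCheckJob: the `Task` fields, the `live_workers` set and the `deallocation_order` function are all hypothetical, and the 60-minute window is the one mentioned in the expected behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Set


@dataclass
class Task:
    # Hypothetical shape of a task record; the real schema is not shown in this issue.
    task_id: str
    status: str
    owner_worker: Optional[str]
    last_update: datetime


def deallocation_order(tasks: Iterable[Task], live_workers: Set[str], now: datetime,
                       stale_after: timedelta = timedelta(minutes=60)) -> List[Task]:
    """Return the tasks to deallocate, orphaned RUNNING tasks first.

    1. RUNNING tasks whose owning worker is gone are handled immediately,
       without waiting for the stale window to elapse.
    2. Remaining RUNNING tasks are included only once they exceed the
       stale window, oldest update first.
    """
    running = [t for t in tasks if t.status == "RUNNING"]
    orphaned = [t for t in running if t.owner_worker not in live_workers]
    orphaned_ids = {t.task_id for t in orphaned}
    timed_out = sorted(
        (t for t in running
         if t.task_id not in orphaned_ids and now - t.last_update > stale_after),
        key=lambda t: t.last_update,
    )
    return orphaned + timed_out
```

The design choice illustrated here is that a task owned by a terminated worker does not need to wait out the stale period, because no live worker can still update it.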
This issue was automatically transitioned to WAITING DEPLOY, as its PR was just merged into master branch in Github.
@Geny Isam Hamud Herrera @Jonathan Willian Moraes @Gabriel DAmore Marciano Regression and manual tests are OK for this branch cc @Renan Schroeder
Github user douglascoimbra has just approved a PR (added as Shard Assignee in this Jira issue).
fix: https://totvslabs.atlassian.net/browse/CAPL-5194#icft=CAPL-5194 Deallocation of running tasks taking too long to finish
SQL PROCESSING IN PARALLEL
POD TERMINATION
TASK UPDATED ON THE DATABASE TO BE ORPHAN AFTER THE WORKER RESTARTED
Github user rfschroeder has just committed and the issue was sent back to the REVIEW column.
This issue was automatically transitioned to QA REVIEW, as its PR was just approved in Github.
This issue was automatically transitioned to REVIEW, as its PR (not DRAFT and not WIP) was just created in Github.
fix: https://totvslabs.atlassian.net/browse/CAPL-5194#icft=CAPL-5194 Deallocation of running tasks taking too long to finish
The same is happening here: https://totvslabs.atlassian.net/browse/DAEN-4228
This issue was automatically transitioned to REVIEW, as its PR (not DRAFT and not WIP) was just created in Github.
fix: https://totvslabs.atlassian.net/browse/CAPL-5194#icft=CAPL-5194 Deallocation of running tasks taking too long to finish
The same problem is happening on Painel Protheus: https://totvslabs.atlassian.net/browse/DAEN-4205
@Robson Thanael Poffo ,
@Jonathan Willian Moraes , @Renan Schroeder , @Reinaldo Oliveira Machado Junior , @Renan Schroeder
This issue was planned to be delivered by 2024-02-12. You can check that by consulting the issue in the Due Date field.
Dates already planned for this issue: 2024-01-23, 2024-02-12
If External Issue Link field is filled, customer was also informed on JIRA TOTVS.
@Renan Schroeder The same issue was mentioned by @Bruno Furtado on Slack today: https://totvscarol.slack.com/archives/C03LA7B048G/p1705423327216149 Only one of the 5 tasks listed is still running; it has been running for 4 hours so far. The query job ran in 3 seconds.
https://meuposto.carol.ai/lideranca/carol-ui/tasks/activity/29dcbbb568f44cb5a5c15dad11488391
@Robson Thanael Poffo ,
@Gabriel DAmore Marciano , @Renan Schroeder
This issue was planned to be delivered by 2024-01-23. You can check that by consulting the issue in the Due Date field.
Dates already planned for this issue: 2024-01-23
If External Issue Link field is filled, customer was also informed on JIRA TOTVS.
The job finished in 9 seconds; however, Carol did not validate this and the task went to the stale flow.
JobId: carol-c3488160d07511e881ad:US.051ed4e0-cf2a-4c99-9737-dea224a9ece8