Process Data Tasks taking too long to finish - stuck BQ status

Description

PRDE - Bug default text according to the team DoR (Definition of Ready)

01 - PERSON OF CONTACT (PERSON THAT CAN ANSWER QUESTIONS ABOUT THE PROBLEM): Robson Poffo / Furtado / DE

02 - PROBLEM (WHAT'S THE ISSUE?):

We identified some tasks that take a long time to move on after the BQ job has finished.

The BQ job is done (we checked the execution plan), but Carol spends a few hours waiting for it before registering the stale status.

BomFuturo cases:

Task: https://global.carol.ai/bomfuturo/carol-ui/tasks/activity/79c8194595084bb7b9b34b2edfb6ce51?p=1&ps=25&sort=dateCreated&order=DESC&filters=%5B%7B%22term%22:%2279c819%22%7D,%7B%22hideInternal%22:%22false%22%7D,%7B%22taskType%22:%5B%22BIGQUERY_PROCESS_DATA%22%5D%7D,%7B%22dataModelName%22:%5B%22clockinrecords%22%5D%7D%5D

TaskID: 79c8194595084bb7b9b34b2edfb6ce51

Execution Plan: https://global.carol.ai/bomfuturo/carol-ui/explore/execution-plan/051ed4e0-cf2a-4c99-9737-dea224a9ece8

Task: https://global.carol.ai/bomfuturo/carol-ui/tasks/activity/1c7a54df304143099b66eb677739a9d8?p=1&ps=25&sort=dateCreated&order=DESC&filters=%5B%7B%22term%22:%2279c819%22%7D,%7B%22hideInternal%22:%22false%22%7D,%7B%22taskType%22:%5B%22BIGQUERY_PROCESS_DATA%22%5D%7D,%7B%22dataModelName%22:%5B%22clockinrecords%22%5D%7D%5D

TaskID: 1c7a54df304143099b66eb677739a9d8

Execution Plan: https://global.carol.ai/bomfuturo/carol-ui/explore/execution-plan/cc60be37-9b12-4a46-a06c-0fe0c8a3d207

We received some reports from clockin complaining that these jobs are taking longer than usual.


Another scenario from 2024-01-05:

Task: https://global.carol.ai/bomfuturo/carol-ui/tasks/activity/06076b09bdb245cf930a94c2a44ab3de?p=1&ps=25&sort=dateCreated&order=DESC&filters=%5B%7B%22hideInternal%22:%22false%22%7D,%7B%22taskType%22:%5B%22BIGQUERY_PROCESS_DATA%22%5D%7D,%7B%22dataModelName%22:%5B%22clockinrecords%22%5D%7D%5D

BQ Job: https://global.carol.ai/bomfuturo/carol-ui/explore/execution-plan/40e9ff8b-f7a1-4ea7-a934-f844fd3105bd


Techfin case:

Task: https://totvstechfindev.carol.ai/antecipaprotheusunif/carol-ui/tasks/activity/2c9c5f85137c47f29a1c56594d5264ed?p=1&ps=100&sort=dateUpdated&order=DESC&filters=%5B%7B%22hideInternal%22:%22false%22%7D,%7B%22dateCreated%22:%5B%22range%22,%222023-12-13T06:00:00%2B00:00,2023-12-13T21:00:00%2B00:00%22%5D%7D,%7B%22taskType%22:%5B%22BIGQUERY_PROCESS_DATA%22%5D%7D,%7B%22dataModelName%22:%5B%22arinvoiceinstallment%22,%22arinvoicepayments%22%5D%7D%5D

Execution plan: https://totvstechfindev.carol.ai/antecipaprotheusunif/carol-ui/explore/execution-plan/7f13b99d-690a-4a07-a389-23bad95e6444

05 - EXPECTED BEHAVIOR (LIST THE EXPECTED BEHAVIORS TO CONSIDER THIS BUG AS DONE):

06 - ACCEPTANCE CRITERIA:

  • Optimize StaleTaskCheckJob so that it prioritizes deallocating tasks that are RUNNING and whose owning worker is no longer available.
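
The prioritization described in this criterion can be sketched as follows. This is only an illustration under stated assumptions, not the actual StaleTaskCheckJob code: the Task fields, the live_workers set, and the function name order_for_stale_check are hypothetical and do not come from the Carol codebase.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Task:
    # Hypothetical task record; field names are assumptions for this sketch.
    task_id: str
    status: str         # e.g. "RUNNING" or "READY"
    worker_id: str      # worker that currently owns the task
    last_update: float  # epoch seconds of the last recorded update


def order_for_stale_check(tasks: Iterable[Task], live_workers: Set[str]) -> List[Task]:
    """Order tasks so that RUNNING tasks owned by a missing worker come first."""

    def priority(task: Task):
        orphaned = task.status == "RUNNING" and task.worker_id not in live_workers
        # False sorts before True, so "not orphaned" puts orphaned tasks first.
        # Within each group, the task with the oldest update is visited first.
        return (not orphaned, task.last_update)

    return sorted(tasks, key=priority)


if __name__ == "__main__":
    tasks = [
        Task("a", "RUNNING", "w1", 100.0),
        Task("b", "RUNNING", "w2", 200.0),
        Task("c", "READY", "w3", 50.0),
        Task("d", "RUNNING", "w3", 150.0),
    ]
    # Only worker w1 is still alive in this example.
    for task in order_for_stale_check(tasks, live_workers={"w1"}):
        print(task.task_id, task.status, task.worker_id)
```

With this ordering, a task whose worker pod was terminated is deallocated before the job spends time on tasks that still have a live owner.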

Activity

Automation for Jira 2 February 2024, 14:58 Jira Internal Users

This issue was automatically transitioned to WAITING DEPLOY, as its PR was just merged into the master branch in GitHub.

Douglas Coimbra Lopes 1 February 2024, 22:19 Jira Internal Users

@Geny Isam Hamud Herrera @Jonathan Willian Moraes @Gabriel DAmore Marciano Regression and manual tests are OK for this branch. cc @Renan Schroeder

Automation for Jira 1 February 2024, 22:19 Jira Internal Users

Github user douglascoimbra has just approved a PR (added as Shard Assignee in this Jira issue).

fix: https://totvslabs.atlassian.net/browse/CAPL-5194#icft=CAPL-5194 Deallocation of running tasks taking too long to finish

Douglas Coimbra Lopes 1 February 2024, 17:27 Jira Internal Users

SQL PROCESSING IN PARALLEL

POD TERMINATION

TASK UPDATED IN THE DATABASE AS ORPHANED AFTER THE WORKER RESTARTED

Automation for Jira 1 February 2024, 14:24 Jira Internal Users

GitHub user rfschroeder has just committed and the issue was sent back to the REVIEW column.

Automation for Jira 31 January 2024, 22:05 Jira Internal Users

This issue was automatically transitioned to QA REVIEW, as its PR was just approved in Github.

Automation for Jira 29 January 2024, 21:32 Jira Internal Users

This issue was automatically transitioned to REVIEW, as its PR (not DRAFT and not WIP) was just created in Github.

fix: https://totvslabs.atlassian.net/browse/CAPL-5194#icft=CAPL-5194 Deallocation of running tasks taking too long to finish

Bruno Furtado 26 January 2024, 10:19 Jira Internal Users

Automation for Jira 26 January 2024, 02:11 Jira Internal Users

This issue was automatically transitioned to REVIEW, as its PR (not DRAFT and not WIP) was just created in Github.

fix: https://totvslabs.atlassian.net/browse/CAPL-5194#icft=CAPL-5194 Deallocation of running tasks taking too long to finish

Bruno Furtado 23 January 2024, 15:29 Jira Internal Users

The same problem is happening on Painel Protheus: https://totvslabs.atlassian.net/browse/DAEN-4205

Automation for Jira 22 January 2024, 20:01 Jira Internal Users

@Robson Thanael Poffo, @Jonathan Willian Moraes, @Renan Schroeder, @Reinaldo Oliveira Machado Junior

This issue is planned to be delivered by 2024-02-12. You can check this in the issue's Due Date field.

Dates already planned for this issue: 2024-01-23, 2024-02-12

If the External Issue Link field is filled in, the customer was also informed on JIRA TOTVS.

Cindy de Araujo Soares Moore 16 January 2024, 18:52 Jira Internal Users

@Renan Schroeder The same issue was mentioned by @Bruno Furtado on Slack today: https://totvscarol.slack.com/archives/C03LA7B048G/p1705423327216149 Only one of the 5 tasks listed is still running, and it has been running for 4 hours so far. The query job itself ran in 3 seconds.

https://meuposto.carol.ai/lideranca/carol-ui/tasks/activity/29dcbbb568f44cb5a5c15dad11488391

Automation for Jira 15 January 2024, 15:34 Jira Internal Users

@Robson Thanael Poffo, @Gabriel DAmore Marciano, @Renan Schroeder

This issue is planned to be delivered by 2024-01-23. You can check this in the issue's Due Date field.

Dates already planned for this issue: 2024-01-23

If the External Issue Link field is filled in, the customer was also informed on JIRA TOTVS.

Geny Isam Hamud Herrera 18 December 2023, 19:59 Jira Internal Users

The job finished in 9 seconds; however, Carol did not detect this and the task went to the stale flow.


JobId: carol-c3488160d07511e881ad:US.051ed4e0-cf2a-4c99-9737-dea224a9ece8
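
The JobId above appears to follow the project:location.job form that BigQuery uses for fully qualified job IDs, and its last part matches the ID in the first BomFuturo execution plan URL. A minimal sketch for splitting such an ID when cross-checking a task against its job is shown below; BigQueryJobRef and parse_job_ref are hypothetical names introduced here, not part of Carol or of any Google library.

```python
from typing import NamedTuple


class BigQueryJobRef(NamedTuple):
    # Hypothetical container for the three parts of a fully qualified job ID.
    project: str
    location: str
    job_id: str


def parse_job_ref(full_id: str) -> BigQueryJobRef:
    """Split 'project:location.job_id' into its three components."""
    # Split on the last colon so a project ID that itself contains a colon
    # (domain-scoped projects) is kept whole.
    project, colon, rest = full_id.rpartition(":")
    location, dot, job_id = rest.partition(".")
    if not (colon and dot and project and location and job_id):
        raise ValueError(f"not a fully qualified job ID: {full_id!r}")
    return BigQueryJobRef(project, location, job_id)


if __name__ == "__main__":
    ref = parse_job_ref(
        "carol-c3488160d07511e881ad:US.051ed4e0-cf2a-4c99-9737-dea224a9ece8"
    )
    print(ref.project, ref.location, ref.job_id)
```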