Task BIGQUERY_PROCESS_DATA missing property temporaryTableName and records processed
Description
PRDE - Bug default text according to the team DoR (Definition of Ready)
01 - PERSON OF CONTACT (PERSON THAT CAN ANSWER QUESTIONS ABOUT THE PROBLEM):
- Andre / Robson Poffo
02 - PROBLEM (WHAT'S THE ISSUE?):
- Tasks processing data through SQL Processing have two internal fields empty, when they are expected to have values:
temporaryTableName
- processedRecords ( I saw called sometimes as createdRecords):
- The internal concurrency isn’t allowing multiple updates for the same table record.
More examples:
For SQL processing created through orquestrator, it is missing:
- Unif SQL tasks also missing the link:
Another tenant without temporary table name: https://totvstechfin.carol.ai/tenant88889a28b7a211eab6fd0a58646001d7/carol-ui/tasks/activity/9cc46b6b4bc7431b9609376855f07da2?p=1&ps=25&sort=dateUpdated&order=DESC&filters=%5B%7B%22hideInternal%22:%22false%22%7D%5D
Current scenario:
select tbNameStatus, count(*)
from (
select
(case
when tbName is null then 'empty'
when tbName = '' then 'empty'
else 'defined'
end) as tbNameStatus
from (
select trim(data ->> 'temporaryTableName') as tbName, id
from task ta
where ta.task_type in ('BIGQUERY_PROCESS_DATA')
and ta.created BETWEEN (now() at time zone 'utc') - INTERVAL '3 DAY' AND (now() at time zone 'utc')
) ta
) t
group by 1
2023-11-10: Updated cases:
select task.id, JSON_VALUE(task.data, '$.processedRecords') processedRecords, task.* except (id), taskLog.logmessage
from `labs-app-mdm-production.mdm_tasks.mdm_tasks` task
inner join (
select *
FROM (SELECT jsonPayload.tasklog.* FROM `labs-app-mdm-production.mdm_tasks.stdout` where timestamp > timestamp_add(current_timestamp(), interval -2 DAY))
where logmessage like '%Number of records processed:%'
) taskLog on taskLog.taskid = task.id
where capturedatetime = (select max(capturedatetime) from `labs-app-mdm-production.mdm_tasks.mdm_tasks`)
and task_type = 'BIGQUERY_PROCESS_DATA'
and (JSON_VALUE(task.data, '$.processedRecords') is null or cast((JSON_VALUE(task.data, '$.processedRecords')) as int64) < 0)
order by created
limit 1000
03 - STEPS TO REPRODUCE (STEP (1...N), VIDEO, SCREENSHOTS, LOGS FOLDER, HEARTBEAT, ETC. – IF IS NOT POSSIBLE TO REPRODUCE EXPLAIN THE REASON):
04 - LINKS (ADD A LINK TO THE BUG OR TO THE TENANT):
05 - EXPECTED BEHAVIOR (LIST THE EXPECTED BEHAVIORS TO CONSIDER THIS BUG AS DONE):
- we expect that all BIGQUERY_PROCESS_DATA task with Golden Records created, also have a
temporaryTableName
and data->>createdRecords