Data ingestion is accepted while BQ is in the provisioning flow, causing data loss
Description
Issue Component(s) on Oct 25, 2023, 12:13:10 AM: CONNECTOR | DATA_INGESTION
PRDE - Bug default text according to the team DoR (Definition of Ready)
01 - PERSON OF CONTACT (PERSON THAT CAN ANSWER QUESTIONS ABOUT THE PROBLEM):
02 - PROBLEM (WHAT'S THE ISSUE?):
We detected data loss when data was sent to a tenant while that tenant was running the BQ provisioning flow.
Query to get the missing records:
SELECT
  createdReportDateTime,
  clockinsStruct.*
FROM (
  SELECT *
  FROM `labs-app-mdm-production.a_poffo.divergences_clockinrecords`
  WHERE createdReportDateTime = "2023-10-24 23:34:57.112457 UTC"
) c
LEFT JOIN `labs-app-mdm-production.clockin.tenants` t ON t.tenantId = c.tenantId
WHERE timestamp(clockinsStruct.cloockinDateTime) < "2023-10-24"
  AND timestamp(clockinsStruct.cloockinDateTime) > "2023-10-23"
Report: https://docs.google.com/spreadsheets/d/16V5cH3MxLxh2uEoXGnT9tt1CXqRuCwCZ-IX8bghZwEA/edit?usp=sharing
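As a hedged sketch only (assuming the google-cloud-bigquery Python client and credentials for the labs-app-mdm-production project; none of this is part of the bug itself), the divergence query above can be run and exported to CSV to feed the report spreadsheet:

import csv
from google.cloud import bigquery

DIVERGENCE_QUERY = """
SELECT
  createdReportDateTime,
  clockinsStruct.*
FROM (
  SELECT *
  FROM `labs-app-mdm-production.a_poffo.divergences_clockinrecords`
  WHERE createdReportDateTime = "2023-10-24 23:34:57.112457 UTC"
) c
LEFT JOIN `labs-app-mdm-production.clockin.tenants` t ON t.tenantId = c.tenantId
WHERE timestamp(clockinsStruct.cloockinDateTime) < "2023-10-24"
  AND timestamp(clockinsStruct.cloockinDateTime) > "2023-10-23"
"""

def export_divergences(path="divergences.csv"):
    # Run the divergence query and dump the rows to a CSV file.
    client = bigquery.Client(project="labs-app-mdm-production")
    rows = client.query(DIVERGENCE_QUERY).result()  # waits for the job to finish
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow([field.name for field in rows.schema])
        for row in rows:
            writer.writerow(row.values())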
Task provisioning BQ: https://ksi.carol.ai/ksi/carol-ui/tasks/activity/d5d65a2da88b4216a6c58acf684d7634?p=3&ps=100&sort=dateUpdated&order=DESC&filters=%5B%7B%22hideInternal%22:%22true%22%7D%5D
The table ended up as INACTIVE, yet it is still accepting ingestion of new data:
After the investigation, the missing data was sent to Carol again (bqInsertFlow):
03 - STEPS TO REPRODUCE (STEP (1...N), VIDEO, SCREENSHOTS, LOGS FOLDER, HEARTBEAT, ETC. – IF IT IS NOT POSSIBLE TO REPRODUCE, EXPLAIN THE REASON):
04 - LINKS (ADD A LINK TO THE BUG OR TO THE TENANT):
05 - EXPECTED BEHAVIOR (LIST THE EXPECTED BEHAVIORS TO CONSIDER THIS BUG AS DONE):
- Do not accept data ingestion while the tenant is in the BQ provisioning flow.
- Return an error message (HTTP 400) indicating that the tenant is in the BQ provisioning flow (a minimal guard sketch follows below).
- @Geny Isam Hamud Herrera
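A minimal sketch of how such an ingestion guard could look, assuming a hypothetical get_tenant() lookup and an IngestionRejected exception that the API layer maps to HTTP 400; the names are illustrative and are not the actual Carol/Data API code. Only the mdmBigQueryStatusType field and the PROVISIONING value come from this card.

# Illustrative sketch only: get_tenant and IngestionRejected are hypothetical.
PROVISIONING = "PROVISIONING"

class IngestionRejected(Exception):
    """Translated into an HTTP 400 response at the API layer."""
    def __init__(self, status_code, message):
        super().__init__(message)
        self.status_code = status_code

def ingest_records(tenant_id, records, get_tenant):
    tenant = get_tenant(tenant_id)
    # Expected behavior: while the tenant is in the BQ provisioning flow,
    # reject the payload explicitly instead of silently dropping it.
    if tenant.get("mdmBigQueryStatusType") == PROVISIONING:
        raise IngestionRejected(
            400, f"Tenant {tenant_id} is in the BQ provisioning flow; retry after provisioning finishes."
        )
    # ...the normal bqInsertFlow ingestion would continue here...
    return {"accepted": len(records)}

A caller would catch IngestionRejected, return the 400 to the producer, and the producer would retry once provisioning completes, so no records are lost.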
Related to the AC above: I believe it came from Product as the AC. However, these are the hot topics we created for this card after the investigation:
- The provisioning Task, after it has called the Data API, is only a skeleton. In other words, it must not be tied to a worker; this way, if a worker dies, it should not affect the task already running.
- When calling the DATA API provisioning endpoint, if the response comes back with status 409, we must not fail the task, but only log on the task that there is already a provisioning running, and no collateral effect should happen (such as changing the staging/DM status or the tenant status). A minimal sketch of this handling follows the list.
- Log on the task when the task_status of the Task changes.
- Log on the task when the Tenant field mdmBigQueryStatusType changes.
- The Tenant field mdmBigQueryStatusType should be controlled by the DATA Team. The exception is the value PROVISIONING, which the Platform sets when we call the DATA API to start the process.
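A minimal sketch of the desired 409 handling, assuming the real requests library but a hypothetical endpoint path and a hypothetical Task object with a log() method; none of these names are confirmed by this card, only the 409 semantics are.

import requests

def start_bq_provisioning(task, tenant_id, base_url, headers):
    # Hypothetical endpoint path; adjust to the real DATA API provisioning route.
    resp = requests.post(f"{base_url}/provisioning/{tenant_id}", headers=headers, timeout=30)
    if resp.status_code == 409:
        # Another provisioning is already running: do NOT fail the task and do
        # NOT touch the staging/DM status or the tenant status; just log it.
        task.log(f"Provisioning already running for tenant {tenant_id}; nothing to do.")
        return
    resp.raise_for_status()  # any other error still fails the task as before
    task.log(f"Provisioning started for tenant {tenant_id}.")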