Ingestion accepts data while BQ is in the provisioning flow, causing data loss

Description

Issue Component(s) on Oct 25, 2023, 12:13:10 AM: CONNECTOR | DATA_INGESTION

PRDE - Bug default text according to the team DoR (Definition of Ready)

01 - PERSON OF CONTACT (PERSON THAT CAN ANSWER QUESTIONS ABOUT THE PROBLEM):
02 - PROBLEM (WHAT'S THE ISSUE?):

We detected data loss when data was sent to a tenant while that tenant was running the BQ provisioning flow.

Query to get the missing records:

select createdReportDateTime, clockinsStruct.*
  from (select * from `labs-app-mdm-production.a_poffo.divergences_clockinrecords` where createdReportDateTime = "2023-10-24 23:34:57.112457 UTC") c
  left join `labs-app-mdm-production.clockin.tenants` t on t.tenantId = c.tenantId
  where timestamp(clockinsStruct.cloockinDateTime) < "2023-10-24" and timestamp(clockinsStruct.cloockinDateTime) > "2023-10-23"

Report: https://docs.google.com/spreadsheets/d/16V5cH3MxLxh2uEoXGnT9tt1CXqRuCwCZ-IX8bghZwEA/edit?usp=sharing

Task provisioning BQ: https://ksi.carol.ai/ksi/carol-ui/tasks/activity/d5d65a2da88b4216a6c58acf684d7634?p=3&ps=100&sort=dateUpdated&order=DESC&filters=%5B%7B%22hideInternal%22:%22true%22%7D%5D

The table ended up with status INACTIVE, yet it is still accepting ingestion of new data:

After the investigation, the data was sent to Carol again (bqInsertFlow):

03 - STEPS TO REPRODUCE (STEP (1...N), VIDEO, SCREENSHOTS, LOGS FOLDER, HEARTBEAT, ETC. – IF IS NOT POSSIBLE TO REPRODUCE EXPLAIN THE REASON):
04 - LINKS (ADD A LINK TO THE BUG OR TO THE TENANT):
05 - EXPECTED BEHAVIOR (LIST THE EXPECTED BEHAVIORS TO CONSIDER THIS BUG AS DONE):

  • Do not accept data ingestion during the BQ provisioning flow
    • Return an error message indicating that the tenant is in the BQ provisioning flow (HTTP 400); see the first sketch after this list.
  • @Geny Isam Hamud Herrera Related to the AC above, I believe it came from Product as the AC. However, these are the hot topics we created for this card after the investigation:
  • The provisioning Task, after it has called the Data API, is only a skeleton. In other words, it must not be bound to a worker; this way, if a worker dies, it should not affect the task already running.
  • When calling the DATA API endpoint for provisioning, if the return has status 409, we must not fail the task, but only log on the task that a provisioning run is already in progress; no collateral effect should happen (such as changing the staging/DM status or the tenant status). See the second sketch after this list.
  • Log on the task whenever the Task's task_status changes.
  • Log on the task whenever the Tenant field mdmBigQueryStatusType changes.
  • The Tenant field mdmBigQueryStatusType should be controlled by the DATA team. The exception is the value PROVISIONING, which the Platform sets when it calls the DATA API to start the process.
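
First sketch: a minimal, hypothetical illustration of the expected ingestion guard, written in Python for brevity. Tenant, mdm_big_query_status_type, ProvisioningInProgress and ingest() are illustrative names only, not the actual Carol platform API.

from dataclasses import dataclass

PROVISIONING = "PROVISIONING"

@dataclass
class Tenant:
    tenant_id: str
    mdm_big_query_status_type: str  # e.g. ACTIVE, INACTIVE, PROVISIONING

class ProvisioningInProgress(Exception):
    """Illustrative error; would map to an HTTP 400 response at the API layer."""

def ingest(tenant: Tenant, records: list) -> None:
    # Reject ingestion while the tenant is still being provisioned in BQ,
    # instead of silently accepting (and then losing) the records.
    if tenant.mdm_big_query_status_type == PROVISIONING:
        raise ProvisioningInProgress(
            f"Tenant {tenant.tenant_id} is in the BQ provisioning flow; retry after it finishes."
        )
    # ...normal bqInsertFlow would continue here...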
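
Second sketch: an equally illustrative outline of how the provisioning task could treat a 409 from the DATA API as "already running" instead of failing. Task, data_api_call and the log messages are assumptions, not the real implementation.

from dataclasses import dataclass, field

@dataclass
class Task:
    logs: list = field(default_factory=list)
    failed: bool = False

    def log(self, message: str) -> None:
        self.logs.append(message)

def run_bq_provisioning(task: Task, data_api_call) -> None:
    # data_api_call() is assumed to return the HTTP status of the DATA API
    # provisioning endpoint.
    status = data_api_call()
    if status == 409:
        # Provisioning is already running: record it on the task and stop,
        # without failing the task and without touching staging/DM or tenant status.
        task.log("Provisioning already running for this tenant; no side effects applied.")
        return
    if status >= 400:
        # Any other error still fails the task.
        task.failed = True
        task.log(f"Provisioning call failed with status {status}.")
        return
    task.log("Provisioning started via DATA API; waiting for the DATA team to update mdmBigQueryStatusType.")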