[SQL Processing] Identify different trigger sources of a SQL Processing task
Description
CAPL - Story default text according to the team DoR (Definition of Ready)
01 - PERSON OF CONTACT (PERSON THAT CAN ANSWER QUESTIONS ABOUT THE PROBLEM):
@Renan Schroeder @Geny Isam Hamud Herrera
02 - PROBLEM (WHAT'S THE CURRENT PROBLEM SCENARIO OR PAIN TO BE RESOLVED?):
Today the task entity has a boolean flag (mdmTaskFromSchedule) indicating whether it was triggered by a pipeline scheduler. No default value is assigned to this field, so every task triggered by Carol Platform has the flag set to false, except for tasks triggered by a scheduled pipeline (ScheduledTask.getTaskToSubmit).
The orchestrator also triggers SQL processing tasks independently through HTTP requests to the following endpoints:
- /bigQuery/processQuery
- carolApps/pipelines/process
- carolApps/{carolAppId}/pipelines/process/{pipelineName}
- tenantApps/pipelines/process
- tenantApps/{carolAppId}/pipelines/process/{pipelineName}
03 - GOAL (DESCRIBE THE PROPOSED SOLUTION):
We need to identify the different sources that trigger a SQL Processing task on Carol Platform, so that we can control which sources are used to get the datetime of the last successful task execution.
One option is to create an enum on the Task class that identifies the trigger source, with the following possible values:
- TaskTriggerSource (Enum):
  - SCHEDULER:
    - All SQL Processing tasks scheduled by pipelines in Carol Apps.
  - USER:
    - All SQL Processing tasks triggered directly by a user in the UI.
  - ORCHESTRATOR:
    - All SQL Processing tasks triggered from the /processQuery endpoint by the orchestrator application.
  - PYCAROL:
    - All SQL Processing tasks triggered from the /processQuery endpoint by the pyCarol Python library.
  - PLATFORM:
    - Any other task that isn't SQL Processing; or
    - Any HTTP request to the endpoints mentioned above that does not contain a known User-Agent.
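A minimal Java sketch of the proposed enum, assuming it lives next to the Task class. The `DEFAULT` constant and the `sqlProcessingSources()` helper are hypothetical additions that encode the acceptance criteria below (PLATFORM is the default, and SQL Processing tasks must use one of the other four values); they are not existing Carol Platform code.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of the proposed TaskTriggerSource enum.
enum TaskTriggerSource {
    /** SQL Processing tasks scheduled by pipelines in Carol Apps. */
    SCHEDULER,
    /** SQL Processing tasks triggered directly by a user in the UI. */
    USER,
    /** SQL Processing tasks triggered from /processQuery by the orchestrator application. */
    ORCHESTRATOR,
    /** SQL Processing tasks triggered from /processQuery by the pyCarol Python library. */
    PYCAROL,
    /** Any non-SQL-Processing task, or a request without a known User-Agent. */
    PLATFORM;

    /** Hypothetical: default source assigned to any task. */
    static final TaskTriggerSource DEFAULT = PLATFORM;

    /** Hypothetical: sources allowed for SQL Processing tasks (everything except the default). */
    static Set<TaskTriggerSource> sqlProcessingSources() {
        return EnumSet.of(SCHEDULER, USER, ORCHESTRATOR, PYCAROL);
    }
}
```

Using `EnumSet` keeps the allowed-source check cheap and makes it explicit that PLATFORM is excluded for SQL Processing tasks.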
Example:
If the last execution of a SQL Processing task was triggered by a scheduler pipeline, the next execution of the same task may come from an orchestrator request (because of an architectural deficiency or any other reason), and a later one may be triggered by the scheduler again.
The only exception to this rule is when the user triggers the SQL Processing task directly in the UI: in that case, the task execution efficiency control should ignore the last successful task execution and proceed with the execution.
Datetime | Trigger Source | Last Trigger Source | checkExistsDataToProcess | Rule |
---|---|---|---|---|
2023-01-11 00:05:00 | SCHEDULER | SCHEDULER | True | Get the last successful datetime from task WHERE task_trigger_source IN ('SCHEDULER', 'ORCHESTRATOR') to check whether there is data to be processed |
2023-01-11 00:17:00 | ORCHESTRATOR | SCHEDULER | True | Get the last successful datetime from task WHERE task_trigger_source IN ('SCHEDULER', 'ORCHESTRATOR') to check whether there is data to be processed |
2023-01-11 00:18:00 | PLATFORM | ORCHESTRATOR | False | Proceed with task execution without checking whether there is data to be processed |
2023-01-11 00:18:30 | PLATFORM | ORCHESTRATOR | True | Get the last successful datetime from task WHERE task_trigger_source IN ('SCHEDULER', 'ORCHESTRATOR') to check whether there is data to be processed |
2023-01-11 00:25:00 | SCHEDULER | PLATFORM | True | Get the last successful datetime from task WHERE task_trigger_source IN ('SCHEDULER', 'ORCHESTRATOR') to check whether there is data to be processed |
2023-01-11 00:27:00 | PYCAROL | SCHEDULER | False | Proceed with task execution without checking whether there is data to be processed |
2023-01-11 00:28:00 | PYCAROL | PYCAROL | True | Get the last successful datetime from task WHERE task_trigger_source IN ('SCHEDULER', 'ORCHESTRATOR') to check whether there is data to be processed |
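The rules in the table above can be sketched in Java as follows. This is a minimal in-memory illustration under stated assumptions: `TaskRecord`, `TriggerRules`, `shouldCheckForNewData` and `lastSuccessfulDatetime` are hypothetical names, and the real lookup would presumably be a database query rather than a stream over a list. The enum is repeated so the snippet compiles on its own.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.EnumSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Enum repeated here so this snippet compiles on its own.
enum TaskTriggerSource { SCHEDULER, USER, ORCHESTRATOR, PYCAROL, PLATFORM }

// Hypothetical stand-in for a persisted task execution (not an existing Carol Platform class).
final class TaskRecord {
    final TaskTriggerSource source;
    final boolean successful;
    final Instant executedAt;

    TaskRecord(TaskTriggerSource source, boolean successful, Instant executedAt) {
        this.source = source;
        this.successful = successful;
        this.executedAt = executedAt;
    }
}

final class TriggerRules {
    // Only executions from these sources serve as the "last successful" reference point.
    static final Set<TaskTriggerSource> REFERENCE_SOURCES =
            EnumSet.of(TaskTriggerSource.SCHEDULER, TaskTriggerSource.ORCHESTRATOR);

    private TriggerRules() {}

    // USER-triggered tasks always skip the check; for any other source the flag decides.
    static boolean shouldCheckForNewData(TaskTriggerSource source, boolean checkExistsDataToProcess) {
        return source != TaskTriggerSource.USER && checkExistsDataToProcess;
    }

    // In-memory equivalent of taking the latest successful execution
    // WHERE task_trigger_source IN ('SCHEDULER', 'ORCHESTRATOR').
    static Optional<Instant> lastSuccessfulDatetime(List<TaskRecord> history) {
        return history.stream()
                .filter(t -> t.successful)
                .filter(t -> REFERENCE_SOURCES.contains(t.source))
                .map(t -> t.executedAt)
                .max(Comparator.naturalOrder());
    }
}
```

For the 00:28:00 row, for example, `shouldCheckForNewData(PYCAROL, true)` returns true, and the reference datetime still comes only from SCHEDULER or ORCHESTRATOR executions, so a previous PYCAROL run does not move it.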
04 - WHO CAN USE THIS FEATURE (USER ROLES):
05 - ASSETS (FIGMA LINKS, RELEVANT DOCUMENTATION LINKS, JSON EXAMPLES, ETC):
06 - ACCEPTANCE CRITERIA:
- Every task must be assigned one of the sources available in TaskTriggerSource. The default value for any task is "PLATFORM", but SQL processing tasks must not keep the default value; they must be assigned one of "PYCAROL", "ORCHESTRATOR", "SCHEDULER", or "USER".
- Every request to any of the endpoints mentioned above that can create SQL processing tasks must create tasks whose source is USER, ORCHESTRATOR, or PYCAROL, and nothing else.
- The source depends on the User-Agent sent in the request header.
- Every scheduled task must have SCHEDULER as its source, unless the user triggers the reprocess task from the UI (in which case it must be USER).
- All other tasks created by the Platform must have PLATFORM as their source.
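A possible way to derive the source from the User-Agent header, sketched in Java. The marker strings ("pycarol", "orchestrator", "carol-ui") are placeholders, not the real User-Agent values sent by pyCarol, the orchestrator or the UI; those values must be confirmed before implementing this. `UserAgentSourceResolver` is likewise a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Enum repeated here so this snippet compiles on its own.
enum TaskTriggerSource { SCHEDULER, USER, ORCHESTRATOR, PYCAROL, PLATFORM }

// Hypothetical resolver: maps a request's User-Agent header to a trigger source.
final class UserAgentSourceResolver {
    // Placeholder markers (assumptions), checked in insertion order against the lowercased header.
    private static final Map<String, TaskTriggerSource> MARKERS = new LinkedHashMap<>();

    static {
        MARKERS.put("pycarol", TaskTriggerSource.PYCAROL);
        MARKERS.put("orchestrator", TaskTriggerSource.ORCHESTRATOR);
        MARKERS.put("carol-ui", TaskTriggerSource.USER);
    }

    private UserAgentSourceResolver() {}

    // Returns the matching source, or PLATFORM when the header is missing or unknown.
    static TaskTriggerSource resolve(String userAgent) {
        if (userAgent == null || userAgent.isBlank()) {
            return TaskTriggerSource.PLATFORM;
        }
        String normalized = userAgent.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, TaskTriggerSource> entry : MARKERS.entrySet()) {
            if (normalized.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        return TaskTriggerSource.PLATFORM;
    }
}
```

Falling back to PLATFORM follows the PLATFORM definition for requests without a known User-Agent; scheduled tasks would not go through this resolver, since they are assigned SCHEDULER (or USER for a UI reprocess) when they are created.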