[SQL Processing] Identify different trigger sources of a SQL Processing task

Description

CAPL - Story default text according to the team DoR (Definition of Ready)

01 - PERSON OF CONTACT (PERSON THAT CAN ANSWER QUESTIONS ABOUT THE PROBLEM):

@Renan Schroeder @Geny Isam Hamud Herrera

02 - PROBLEM (WHAT'S THE CURRENT PROBLEM SCENARIO OR PAIN TO BE RESOLVED?):

Today the task entity has a flag indicating whether it was triggered by a pipeline scheduler (mdmTaskFromSchedule). This field is a boolean with no default value assigned, so every task triggered by Carol Platform has this flag set to false, except when it was triggered by a scheduled pipeline (ScheduledTask.getTaskToSubmit).

The orchestrator also triggers SQL Processing tasks independently through HTTP requests to the following endpoints:

* /bigQuery/processQuery
* carolApps/pipelines/process
* carolApps/{carolAppId}/pipelines/process/{pipelineName}
* tenantApps/pipelines/process
* tenantApps/{carolAppId}/pipelines/process/{pipelineName}

03 - GOAL (DESCRIBE THE PROPOSED SOLUTION):

We need to identify the different sources that trigger a SQL Processing task on Carol Platform, so that we can control which sources are considered when getting the datetime of the last successful task.

One option is to create an enum on the Task class that identifies the trigger source, with the following possible values:

TaskTriggerSource (Enum):
- SCHEDULER:
  - All SQL Processing tasks scheduled by pipelines in Carol Apps.
- USER:
  - All SQL Processing tasks that were triggered directly by a user in the UI.
- ORCHESTRATOR:
  - All SQL Processing tasks that were triggered from the /processQuery endpoint by the orchestrator application.
- PYCAROL:
  - All SQL Processing tasks that were triggered from the /processQuery endpoint by the pyCarol Python library.
- PLATFORM:
  - Any other tasks which aren’t SQL Processing; or
  - Any HTTP requests to the endpoints mentioned above that do not carry a known User-Agent.
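
The options above could be sketched as a Java enum. This is only an illustration of the proposal, not the existing Task class: the `fromUserAgent` helper and the substrings `orchestrator` and `pycarol` are hypothetical, since this story does not state the actual User-Agent values sent by each client. SCHEDULER and USER are not derived from a header here, on the assumption that the platform already knows when it is running a scheduled pipeline or a UI action.

```java
import java.util.Locale;

/** Sketch of the proposed enum; the constants mirror the options listed in this story. */
enum TaskTriggerSource {
    SCHEDULER,     // scheduled by pipelines in Carol Apps
    USER,          // triggered directly by a user in the UI
    ORCHESTRATOR,  // triggered through /processQuery by the orchestrator application
    PYCAROL,       // triggered through /processQuery by the pyCarol library
    PLATFORM;      // non-SQL-Processing tasks, or requests without a known User-Agent

    /**
     * Hypothetical helper: maps the User-Agent of an HTTP request to a trigger source.
     * The matched substrings are assumptions, not real header values.
     */
    static TaskTriggerSource fromUserAgent(String userAgent) {
        if (userAgent == null) {
            return PLATFORM;
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        if (ua.contains("orchestrator")) {
            return ORCHESTRATOR;
        }
        if (ua.contains("pycarol")) {
            return PYCAROL;
        }
        // Unknown clients fall back to PLATFORM, as described in the list above.
        return PLATFORM;
    }
}
```

Falling back to PLATFORM for a missing or unknown User-Agent follows the last bullet of the list; any stricter matching would depend on header values that still need to be confirmed with the orchestrator and pyCarol teams.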

Example:

If the last execution of a SQL Processing task was triggered by a scheduled pipeline, the next execution of the same task may be triggered by an orchestrator request (because of an architectural deficiency or for any other reason), and a later execution may be triggered by the scheduler again.

The only exception to this rule is when the user triggers the SQL Processing task directly in the UI: in that case, the task execution efficiency control should ignore the last successful task execution and proceed with the task execution.

Datetime	Trigger Source	Last Trigger Source	checkExistsDataToProcess	Rule
2023-01-11 00:05:00	SCHEDULER	SCHEDULER	True	Get last successful datetime from task WHERE task_trigger_source = 'SCHEDULER/ORCHESTRATOR' to check if there is data to be processed
2023-01-11 00:17:00	ORCHESTRATOR	SCHEDULER	True	Get last successful datetime from task WHERE task_trigger_source = 'SCHEDULER/ORCHESTRATOR' to check if there is data to be processed
2023-01-11 00:18:00	PLATFORM	ORCHESTRATOR	False	Proceed with task execution without checking whether there is data to be processed
2023-01-11 00:18:30	PLATFORM	ORCHESTRATOR	True	Get last successful datetime from task WHERE task_trigger_source = 'SCHEDULER/ORCHESTRATOR' to check if there is data to be processed
2023-01-11 00:25:00	SCHEDULER	PLATFORM	True	Get last successful datetime from task WHERE task_trigger_source = 'SCHEDULER/ORCHESTRATOR' to check if there is data to be processed
2023-01-11 00:27:00	PYCAROL	SCHEDULER	False	Proceed with task execution without checking whether there is data to be processed
2023-01-11 00:28:00	PYCAROL	PYCAROL	True	Get last successful datetime from task WHERE task_trigger_source = 'SCHEDULER/ORCHESTRATOR' to check if there is data to be processed
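
The lookup described in the Rule column could be sketched as follows. This is a minimal in-memory illustration, not the platform's query: the `TaskRun` class, its fields, and `lastSuccessfulDatetime` are hypothetical names, and the trigger source is kept as a plain string so the block stands on its own.

```java
import java.time.LocalDateTime;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Illustrative only: TaskRun is a hypothetical stand-in for the real task entity. */
final class LastSuccessLookup {

    static final class TaskRun {
        final LocalDateTime finishedAt;
        final String triggerSource;
        final boolean successful;

        TaskRun(LocalDateTime finishedAt, String triggerSource, boolean successful) {
            this.finishedAt = finishedAt;
            this.triggerSource = triggerSource;
            this.successful = successful;
        }
    }

    // Sources whose successful runs are considered, per the Rule column of the table.
    static final Set<String> TRACKED_SOURCES = Set.of("SCHEDULER", "ORCHESTRATOR");

    /** Returns the datetime of the latest successful run triggered by a tracked source, if any. */
    static Optional<LocalDateTime> lastSuccessfulDatetime(List<TaskRun> history) {
        return history.stream()
                .filter(run -> run.successful)
                .filter(run -> TRACKED_SOURCES.contains(run.triggerSource))
                .map(run -> run.finishedAt)
                .max(LocalDateTime::compareTo);
    }
}
```

With a history holding successful runs at 00:05 (SCHEDULER), 00:17 (ORCHESTRATOR) and 00:18 (PLATFORM), the lookup would return 00:17, because the PLATFORM run is not a tracked source. An empty result means no tracked run has succeeded yet; how that case should be handled is not defined in this story.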
