[SQL Efficiency] Pipeline (repository) from customer tenant don't syncronize when unif tenant do a checkout

Description

PRDE - Bug default text according to the team DoR (Definition of Ready)

01 - PERSON OF CONTACT (PERSON THAT CAN ANSWER QUESTIONS ABOUT THE PROBLEM):

@Renan Schroeder @André Pereira de Oliveira

02 - PROBLEM (WHAT'S THE ISSUE?):

Today, the generic SQL efficiency saves the pipeline for each tenant in DB (Postgres), even the customer tenant. It is being saved to avoid getting the respective pipeline from unified on ES every time through the Tenant App (installed on the unified tenant). The main problem is that we are not updating the pipeline entity for customer tenants, even after pipeline checkouts on unified tenants.

The risk is that some requests that come from /processQuery (via orchestrator) try to check if there is new data to be processed and the staging table list stored in the Postgres database for the customer pipeline is out of date compared with some newer version from the same pipeline reference on the unified tenant (that is, original).

03 - STEPS TO REPRODUCE (STEP (1...N), VIDEO, SCREENSHOTS, LOGS FOLDER, HEARTBEAT, ETC. – IF IS NOT POSSIBLE TO REPRODUCE EXPLAIN THE REASON):

  • Create a UNified Tenant
  • Create a Customer Tenant
  • Create a Carol App
  • Attach the Carol app to the unified app
  • Link the Carol app to a git repo
  • Release the app for approval
  • Approve the carol app
  • Install the Carol app on Unified and Customer
  • Run the SQL processing by Orchestrator once
  • From git pipeline add a new staging table
  • From Unified tenant, perform a checkout
  • Run a Second ORCHESTRATOR SQL Processing now with the new table

04 - LINKS (ADD A LINK TO THE BUG OR TO THE TENANT):
05 - EXPECTED BEHAVIOR (LIST THE EXPECTED BEHAVIORS TO CONSIDER THIS BUG AS DONE):

SCOPE OF THIS CARD

  • Requests on /bigQuery/processQuery (via orchestrator) to run SQL tasks with efficiency enabled should respect the following:
    • If the tenant is Customer AND
      • Carol App Installed is SQL ONLY:
        • RULE 1
      • Carol App Installed is HYBRID and Customer Tenant is on the Allowed List:
        • RULE 1
      • Carol App Installed is Hybrid and Customer Tenant is not on the Allowed List:
        • RULE 2
      • Carol App Installed does not have a Unified Tenant linked
        • RULE 2
    • If the tenant is Development:
      • RULE 3
  • RULE 1: Search the pipelines of the Unified Tenant to check if there is new data on the stagings mapped to process the Task
  • RULE 2: Execute the Task without efficiency enabled, even though the request was asking to be enabled. We should log that on the task.
  • RULE 3: Use the pipelines materialized on the Development Tenant.

NOT ON SCOPE OF THIS CARD

  • If the tenant is Development AND carolApp used is not linked with the current tenant
    • Use the Conditionals for the Customer Tenant above

Activity

MARCOS STUMPF 4 April 2024, 19:06 Jira Internal Users

Testes concluídos com SUCESSO

Novas evidências abaixo:

ATUALIZAÇÃO SOBRE O CENÁRIO 2

Na primeira execução com o manifesto sem sofrer alteração (stg product), notou-se que o procedimento utilizado foi parcial, ou seja, havia no manifesto uma outra pipeline que fazia uso da mesma staging, portanto, deveria também ter sido executada a request do orquestrador para esta pipeline.

Ao refazer a primeira execução com novos dados e processando ambas as pipelines pelo orquestrador, o cenário foi concluído com sucesso, tendo na observabilidade o summary com o status PROCESSED.

image-20240404-185500.png

Na segunda execução com alteração no manifesto (stg newproduct), notou-se que o procedimento utilizado precisava seguir uma determinada sequência (1º checkout das alterações e depois o envio dos batches e summary), pois há um job interno que executa de tempos em tempos e por consequência acabou por colocar o status de NO_PIPELINE_EXECUTED devido eu ter executado o procedimento na ordem inversa (1º enviei os registros em batches e o summary e depois efetuei o checkout com as alterações). Aqui cabe um ponto de melhoria na implementação (novo card) para aumentarmos a resiliência do processo, haja visto que o procedimento da forma que foi executado por mim, pode vir a ser feito também nesta sequência pelos usuários internos (DE) e externos (clientes) em uma maior frequência.

Enfim, ao seguir a ordem sugerida, o evento de observabilidade foi concluído com sucesso, alterando o status do Summary para PROCESSED.

image-20240404-190519.png
MARCOS STUMPF 4 April 2024, 15:48 Jira Internal Users
Cenário 2 - Eficiência Batch - Orquestrador e checkout Unificada

Problema Atual

image-20240404-154424.png

Comportamento Esperado:

Sincronizar pipeline na tenant cliente (orquestrador) no checkout da Unificada no evento de observabilidade

Passos para Reprodução

  • [x] Da pipeline git, altere o parâmetro para "useBatchNotification": true

  • [x] Do Tenant Unificado, realize um checkout

  • [x] Adicione novos dados na tenant cliente utilizando batchID, batchSequenceID e Summary

  • [x] Atualize a organização para configurar o webhook de recebimento de eventos

  • [x] Verifique os dados de observabilidade na tabela

  • [x] Execute um primeiro processamento SQL do ORCHESTRATOR

  • [x] Da pipeline git, adicione uma nova tabela de staging

  • [x] Do Tenant Unificado, realize um novo checkout

  • [x] Adicione novos dados na tenant cliente utilizando batchID, batchSequenceID e Summary

  • [x] Verifique os dados de observabilidade na tabela

  • [x] Execute um segundo processamento SQL do ORCHESTRATOR agora com a nova tabela

  • [ ] Execute um terceiro processamento SQL do ORCHESTRATOR

Comportamento Apresentado: FALHA

Primeira execução - Eficiência Batch 🟢

image-20240404-154501.png
image-20240404-154512.png
image-20240404-154523.png
image-20240404-154534.png

CONSIDERAÇÕES

Embora tenha funcionado o processamento da task de eficiência, neste cenário, o status do summary ficou travado em processamento, devido o envio 4 registros ao invés de 2 como era esperado pelo Summary.


Segunda execução após checkout na Unificada (nova stg) - Eficiência Batch 🔴

image-20240404-154708.png

Demonstrando que há dados

image-20240404-154718.png

Summary confirmando a espera de 2 registros, porém no status de pipeline não executada

image-20240404-154730.png
image-20240404-154735.png
MARCOS STUMPF 3 April 2024, 17:09 Jira Internal Users
Evidências - Cenário 1 (Eficiência Genérica)- Orquestrador e checkout Unificada

Problema Atual

image-20240403-170531.png

Comportamento Esperado:

Sincronizar pipeline na tenant cliente (orquestrador) no checkout da Unificada no evento de observabilidade

Passos para Reprodução

  • [x] Crie um Tenant Unificado

  • [x] Crie um Tenant do Cliente

  • [x] Crie um App Carol

  • [x] Anexe o App Carol ao app unificado

  • [x] Vincule o App Carol a um repositório git

  • [x] Libere o app para aprovação

  • [x] Aprove o app Carol

  • [x] Instale o App Carol no Unificado e no Cliente

  • [x] Adicione novos dados na tenant cliente

  • [x] Execute o processamento SQL pelo Orquestrador uma vez

  • [x] Da pipeline git, adicione uma nova tabela de staging

  • [x] Do Tenant Unificado, realize um checkout

  • [x] Adicione novos dados na tenant cliente

  • [x] Execute um segundo processamento SQL do ORCHESTRATOR agora com a nova tabela

  • [x] Execute um terceiro processamento SQL do ORCHESTRATOR

Comportamento Apresentado: SUCESSO

Primeira execução - Eficiência Genérica (sem batch)

image-20240403-170609.png

Segunda execução após checkout na Unificada (nova stg) - Eficiência Genérica (sem batch)

image-20240403-170621.png

Terceira execução sem envio de novos dados - Eficiência Genérica (sem batch)

image-20240403-172335.png

Automation for Jira 27 March 2024, 20:13 Jira Internal Users

Esta issue foi automaticamente movida para REGRESSION, pois o PR foi mergeado na branch QA no Github.

Automation for Jira 27 March 2024, 20:13 Jira Internal Users

Esta issue foi automaticamente movida para REGRESSION, pois o PR foi mergeado na branch QA no Github.

Automation for Jira 25 March 2024, 19:30 Jira Internal Users

@MARCOS STUMPF ,
@Jonathan Willian Moraes , @Renan Schroeder , @André Pereira de Oliveira , @Douglas Coimbra Lopes , @Gabriel DAmore Marciano , @Renan Schroeder

This issue was planned to be delivered until 2024-04-15. You can check that by consulting the issue in the Due Date field.

Dates already planned for this issue: 2024-04-15, 2024-03-01, 2024-03-25

If External Issue Link field is filled, customer was also informed on JIRA TOTVS.

Automation for Jira 25 March 2024, 17:51 Jira Internal Users

This issue was automatically transitioned to TESTED & MERGED, as its PR was just merged into develop branch in Github. PR Approved by Damore,douglascoimbra,olivandre.

Automation for Jira 25 March 2024, 17:38 Jira Internal Users

Github user olivandre has just approved a PR (added as Shard Assignee in this Jira issue).

fix: https://totvslabs.atlassian.net/browse/CAPL-5388#icft=CAPL-5388 Pipelines from customer tenants don't synchronize when unif tenant do a checkout

Automation for Jira 25 March 2024, 11:59 Jira Internal Users

Github user douglascoimbra has just approved a PR (added as Shard Assignee in this Jira issue).

fix: https://totvslabs.atlassian.net/browse/CAPL-5388#icft=CAPL-5388 Pipelines from customer tenants don't synchronize when unif tenant do a checkout

Automation for Jira 25 March 2024, 11:32 Jira Internal Users

This issue was automatically transitioned to QA REVIEW, as its PR was just approved in Github.

Automation for Jira 19 March 2024, 18:33 Jira Internal Users

Testes finalizados frente a observabilidade (fluxo de batch)
• Cenaros de data subscription OK para dev + unificada. Abertos 2 cards de produto
• Automacao em andamento
• Serah testado a otimizacao generica uma vez que o card estiver na `develop` e o PR for atualizado com a holder branch

Mensagem editada no Slack - plataforma-carol-internal - Douglas Coimbra Lopes

Automation for Jira 19 March 2024, 18:32 Jira Internal Users

Testes finalizados frente a observabilidade (fluxo de batch)
• Automacao em andamento
• Serah testado a otimizacao generica uma vez que o card estiver na `develop` e o PR for atualizado com a holder branch

Mensagem enviada pelo Slack - plataforma-carol-internal - Douglas Coimbra Lopes

Automation for Jira 14 March 2024, 12:36 Jira Internal Users

@MARCOS STUMPF ,
@Renan Schroeder , @André Pereira de Oliveira , @Douglas Coimbra Lopes , @Gabriel DAmore Marciano , @Renan Schroeder

Flag was removed since you have just transitioned the issue status/column.

Automation for Jira 11 March 2024, 14:56 Jira Internal Users

@MARCOS STUMPF ,
@Gabriel DAmore Marciano , @Renan Schroeder , @André Pereira de Oliveira , @Douglas Coimbra Lopes , @Gabriel DAmore Marciano , @Renan Schroeder

This issue was just linked to issue(s) https://totvslabs.atlassian.net/browse/CAPL-5589#icft=CAPL-5589, according to mention in a prior comment.

Gabriel DAmore Marciano 11 March 2024, 14:56 Jira Internal Users

Flag added

Waiting analysis on https://totvslabs.atlassian.net/browse/CAPL-5589

Automation for Jira 1 March 2024, 19:01 Jira Internal Users

@MARCOS STUMPF ,
@Pedro Buzzi , @Renan Schroeder , @André Pereira de Oliveira , @Douglas Coimbra Lopes , @Gabriel DAmore Marciano , @Renan Schroeder

This issue was planned to be delivered until 2024-03-25. You can check that by consulting the issue in the Due Date field.

Dates already planned for this issue: 2024-03-01, 2024-03-25

If External Issue Link field is filled, customer was also informed on JIRA TOTVS.

Douglas Coimbra Lopes 27 February 2024, 21:18 Jira Internal Users

@Renan Schroeder Card validated by the QA team. It is pending only the code review cc @Gabriel DAmore Marciano

Douglas Coimbra Lopes 27 February 2024, 13:48 Jira Internal Users

@Renan Schroeder When a staging table is updated on the SQL Manifest, the SQL Processing Taks is working on the previous staging.

Steps:

  • Create a unified + customer + carol app (dev tenant)

  • Add the customer on the allowed list

  • Run a SQL Processing with the A Staging

  • Update the staging on the SQL Manifest

  • Perform a checkout on unified tenant

  • Now, Run a SQL Task with the B staging

previous staging: douglas

replacer staging: product

pipeline: https://github.com/douglascoimbra/test-sample-pipeline/blob/main/soapcheck.json

SQL Task:

Douglas Coimbra Lopes 27 February 2024, 13:45 Jira Internal Users
Automation for Jira 27 February 2024, 13:44 Jira Internal Users

:rotating_light: Cenario encontrado no
Update: topico resolvido :white_check_mark:
• Uma vez realizado um checkout de pipeline para alteracao de staging, a otimizacao generica nao esta sendo ativada para a nova staging
• Detalhes + evidencias no card.

Edited on Slack - platform-internal - Douglas Coimbra Lopes

Automation for Jira 26 February 2024, 22:17 Jira Internal Users

:rotating_light: Cenario encontrado no
• Uma vez realizado um checkout de pipeline para alteracao de staging, a otimizacao generica nao esta sendo ativada para a nova staging
• Detalhes + evidencias no card.

Sent by Slack - platform-internal - Douglas Coimbra Lopes

Douglas Coimbra Lopes 26 February 2024, 22:17 Jira Internal Users

@Renan Schroeder When we update a staging table from the pipeline, the generic optimization is not activated for the respective staging table

  • SQL Processing has been executed for ORCHESTRATOR scenario, but it reproduces for unified also

previous staging: product

new staging: douglas

pipeline: https://github.com/douglascoimbra/test-sample-pipeline/blob/main/soapcheck.json

sandbox: https://totvsdariusreset.qarol.ai/foradalista/carol-ui/tasks/activity/d9967e1ca14742c6bdf83506685aaba1?filters=%5B%7B%22dateUpdated%22:%5B%22after%22,%222024-02-26T16:51:54.800%2B00:00%22%5D%7D,%7B%22hideInternal%22:%22true%22%7D%5D

Douglas Coimbra Lopes 23 February 2024, 23:56 Jira Internal Users

@Renan Schroeder For tenants running SQL Processing using customer ORCHESTRATOR, when a customer is out of the Unified Allowed list, we are receiving errors that the tenant is not on the allowed list. We are logging already as a warning like the image below.

SANDBOX: https://totvsfeedbackqa.qarol.ai/klientefinalhybrid/carol-ui/tasks/activity/69628fc15b10466989572d606125627e?p=1&ps=25&sort=dateUpdated&order=DESC&filters=%5B%7B%22hideInternal%22:%22true%22%7D%5D

GCP: https://cloudlogging.app.goo.gl/8EwtsjP81nF5CkZH7

Douglas Coimbra Lopes 23 February 2024, 23:53 Jira Internal Users

ENABLING GENERIC OPTIMIZATION

DEV ORCHESTRATOR

FIRST TABLE ON GENERIC OPTIMIZATION

SWITCHING TABLE

ORCHESTRATION WITHOUT UNIFIED ALLOWED LIST

Automation for Jira 23 February 2024, 13:32 Jira Internal Users

This issue was automatically transitioned to REVIEW, as its PR (not DRAFT and not WIP) was just created in Github.

fix: https://totvslabs.atlassian.net/browse/CAPL-5388#icft=CAPL-5388 Pipelines from customer tenants don't synchronize when unif tenant do a checkout

Robson Thanael Poffo 12 February 2024, 19:24 Jira Internal Users

Query para mapear casos atuais send afetados:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

 
  
select appName, pipelineName, count(distinct tenantId) from (
SELECT
  string(JSON_EXTRACT(json_payload, '$.environmentId')) tenantId, 
  string(JSON_EXTRACT(json_payload, '$.message')) message, 
   REGEXP_EXTRACT(string(JSON_EXTRACT(json_payload, '$.message')), r"\bapp\s+(\w+)\b") as appName,
   REGEXP_EXTRACT(string(JSON_EXTRACT(json_payload, '$.message')), r"\bPipeline\s+(\w+)\b") as pipelineName,
  timestamp, severity, resource.type, log_name, text_payload, proto_payload, json_payload,
  --*
FROM
  `labs-app-mdm-production.global.mdmStack._AllLogs`
WHERE
  timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  and string(JSON_EXTRACT(json_payload, '$.message')) like '%not found for carol app %'
  and REGEXP_EXTRACT(string(JSON_EXTRACT(json_payload, '$.message')), r"\bapp\s+(\w+)\b") = 'clockinapp'
)
group by 1,2

https://cloudlogging.app.goo.gl/k8rHTBpkisdoLr6g9

Avaliando casos exceto mdmuser e frauds, temos uma média de 10% das tenants clockin sendo afetadas.

@Bruno Furtado , para acompanha ro tema.

Automation for Jira 12 February 2024, 19:19 Jira Internal Users

@MARCOS STUMPF ,
@Gabriel DAmore Marciano , @Gabriel DAmore Marciano , @Renan Schroeder

This issue was planned to be delivered until 2024-03-04. You can check that by consulting the issue in the Due Date field.

Dates already planned for this issue: 2024-03-04

If External Issue Link field is filled, customer was also informed on JIRA TOTVS.

André Pereira de Oliveira 12 February 2024, 15:07 Jira Internal Users

Check where is used the pipeline

Automation for Jira 9 February 2024, 16:45 Jira Internal Users

Cenario encontrado no .
• :white_check_mark: *Update: Topico Resolvido. Issue ja mapeada no card *
• Uma vez atualizado de `true`-> `false` ou o parametro ja em `false` value do parametro `checkExistsDataToProcess,` a otimizacao generica nao esta sendo desativada para o data model mdmpurchaseorder
• Detalhes no card + sandbox

Edited on Slack - platform-internal - Douglas Coimbra Lopes

Automation for Jira 9 February 2024, 16:44 Jira Internal Users

Cenario encontrado no .
• :white_check_mark: *Topico Resolvido: Issue ja mapeada no card *
• Uma vez atualizado de `true`-> `false` ou o parametro ja em `false` value do parametro `checkExistsDataToProcess,` a otimizacao generica nao esta sendo desativada para o data model mdmpurchaseorder
• Detalhes no card + sandbox

Sent by Slack - platform-internal - Douglas Coimbra Lopes