Online process was canceled by the operator
Description
Contact
- @Bruno Furtado
Problem
An online process from a Carol App was canceled. This process is used 24/7, so it must always be active.
The task had not yet been running for 30 days.
- Tenant: clockin/clockinweb
- Carol app: clockinwebapp
- Online process: clockinwebapp_api
- Task: 59f2c53da98d48b4adbb44f84b9e9187
- Error datetime: 2024-01-20 02:03 PM BRT
More details
It looks like MDM sent a request to the operator, and the operator deleted the app:
https://cloudlogging.app.goo.gl/AJPMv2hHzXq3oCqZ8
"DELETE /api/apps/clockinweb/clockinwebapp HTTP/1.1" 200
After that, MDM could no longer find the task:
https://cloudlogging.app.goo.gl/Q7pUpvomsQuKf7478
Application for task 59f2c53da98d48b4adbb44f84b9e9187 (clockinweb|clockinwebapp) is not running on the operator, marking as cleanup.
Links
- Tenant: https://clockin.carol.ai/clockinweb/
- Carol App: https://clockin.carol.ai/clockinweb/carol-ui/carol-app-dev/fa28987156a44bb0afab55ec70cf6d42/process
- Cancelled task: https://clockin.carol.ai/clockinweb/carol-ui/tasks/activity/59f2c53da98d48b4adbb44f84b9e9187
- Slack thread: https://totvscarol.slack.com/archives/C0330CSV2/p1706020061328879
Expected behavior
- The online process must always remain active and be canceled only by user actions (API or UI).
Activity
@MARCOS STUMPF ,
@Jonathan Willian Moraes , @Rodrigo Bechtold , @André Pereira de Oliveira , @Douglas Coimbra Lopes , @Gabriel DAmore Marciano , @Geny Isam Hamud Herrera
This issue was planned to be delivered by 2024-04-15. You can check this in the issue's Due Date field.
Dates already planned for this issue: 2024-04-15, 2024-02-12, 2024-03-01, 2024-03-25
If the External Issue Link field is filled in, the customer was also informed on JIRA TOTVS.
This issue was automatically transitioned to REGRESSION, as its PR was just merged into the qa branch in GitHub.
This issue was automatically transitioned to REGRESSION, as its PR was just merged into the qa branch in GitHub.
This issue was automatically transitioned to REGRESSION, as its PR was just merged into the qa branch in GitHub.
This issue was automatically transitioned to TESTED & MERGED, as its PR was just merged into the develop branch in GitHub. PR approved by olivandre, douglascoimbra.
GitHub user rodrigo-bechtold has just committed, and the issue was sent back to the REVIEW column.
@MARCOS STUMPF ,
@Pedro Buzzi , @Rodrigo Bechtold , @André Pereira de Oliveira , @Douglas Coimbra Lopes , @Geny Isam Hamud Herrera
This issue was planned to be delivered by 2024-03-25. You can check this in the issue's Due Date field.
Dates already planned for this issue: 2024-02-12, 2024-03-01, 2024-03-25
If the External Issue Link field is filled in, the customer was also informed on JIRA TOTVS.
GitHub user douglascoimbra has just approved a PR (added as Shared Assignee in this Jira issue).
fix: https://totvslabs.atlassian.net/browse/CAPL-5389#icft=CAPL-5389 - Add resiliency to OperatorReconcileJob
This issue was automatically transitioned to QA REVIEW, as its PR was just approved in GitHub.
@Rodrigo Bechtold The card has been validated by the QA team. Only the code review is pending.
• Bad Gateway error when trying to stop a Carol App online process.
• Details in the card + sandbox link
Edited on Slack - platform-internal - Douglas Coimbra Lopes
• Bad Gateway error when trying to stop a Carol App online process.
• Details in the card + sandbox link
Sent by Slack - platform-internal - Douglas Coimbra Lopes
EVALUATING DOCKER BUILD PROCESS
This issue was automatically transitioned to REVIEW, as its PR (not DRAFT and not WIP) was just created in GitHub.
fix: https://totvslabs.atlassian.net/browse/CAPL-5389#icft=CAPL-5389 - Add resiliency to OperatorReconcileJob
Problem:
Even though the document exists in ES (http://localhost:19201/*/mdmTenantAppAIProcess/_search?q=%2BmdmTenantAppId:fc4cad5f6557432e85bcc4e0469494df+%2B), the logs report that it was not found: https://console.cloud.google.com/logs/query;cursorTimestamp=2024-01-20T17:02:11.778366123Z;endTime=2024-01-20T17:02:13.000Z;query=timestamp%3D"2024-01-20T17:02:11.775717560Z" insertId%3D"upzbcb7p0dti1hsf";startTime=2024-01-20T17:02:08.000Z;summaryFields=:false:32:beginning?project=labs-app-mdm-production
Because this document was not found, the OperatorReconcileJob class invalidated the AI Process, which in turn called DELETE on the Operator (https://cloudlogging.app.goo.gl/AJPMv2hHzXq3oCqZ8). This killed the Pod and subsequently canceled the Task.
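A minimal Java sketch of the failure chain described above, under a simplified model: `ProcessIndex`, `Operator`, `reconcile`, and `RecordingOperator` are hypothetical stand-ins for the ES search by TenantApp FK, the OperatorReconcileJob check, and the Operator DELETE call (the path shape is taken from the log above). This is not the actual MDM code. It only illustrates how treating an empty search result as "the process no longer exists" turns a lookup miss on a document that does exist into a cancellation.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of the failure path; not the actual MDM code.
final class NaiveReconcileSketch {

    /** Hypothetical stand-in for the ES search of TenantAppAIProcess by TenantApp FK. */
    interface ProcessIndex {
        Optional<String> findProcessIdByTenantAppId(String tenantAppId);
    }

    /** Hypothetical stand-in for the Operator app API (DELETE /api/apps/{tenant}/{app}). */
    interface Operator {
        void deleteApp(String tenant, String app);
    }

    /** Flawed rule: any empty search result is treated as "the process no longer exists". */
    static void reconcile(ProcessIndex index, Operator operator,
                          String tenantAppId, String tenant, String app) {
        if (!index.findProcessIdByTenantAppId(tenantAppId).isPresent()) {
            // Invalidating the AI process ends in an Operator DELETE, which kills the pod;
            // the task is then marked for cleanup and canceled.
            operator.deleteApp(tenant, app);
        }
    }

    /** In-memory operator that records which apps were deleted. */
    static final class RecordingOperator implements Operator {
        final Set<String> deleted = new HashSet<>();

        @Override
        public void deleteApp(String tenant, String app) {
            deleted.add(tenant + "/" + app);
        }
    }

    public static void main(String[] args) {
        RecordingOperator operator = new RecordingOperator();
        // The document exists, but this lookup returns nothing (e.g. a transient search failure).
        ProcessIndex flakyIndex = tenantAppId -> Optional.empty();
        reconcile(flakyIndex, operator, "fc4cad5f6557432e85bcc4e0469494df", "clockinweb", "clockinwebapp");
        System.out.println(operator.deleted); // prints [clockinweb/clockinwebapp]
    }
}
```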
Improvements:
1. The generic getByTenantAppId method in the AbstractTenantAppFKWithSpaceService class always throws an exception saying that TENANT_APP_FILE was not found, even when it is searching for a different entity type, as in OperatorReconcileJob:139, which searches for TenantAppAIProcess.
2. I also think we do not need to search ES for the TenantAppAIProcess document through the TenantApp FK, since we already have the TenantAppAIProcess Id. In the OperatorReconcileJob class, this lookup could be changed to fetch by Id, so that it happens in CB first rather than in ES.
3. Since we are going to change this method to look up by Id, it is also worth taking the opportunity to add a FailSafe, so that if we face unavailability again, an Online Process is not canceled.
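The three improvements above could be combined roughly as follows. This is a hedged Java sketch, not the merged CAPL-5389 change: `ProcessStore`, `EntityNotFoundException`, `LookupUnavailableException`, `Outcome`, `getOrThrow`, and the retry count are hypothetical. It (1) names the entity type that was actually missing instead of always reporting TENANT_APP_FILE, (2) looks the TenantAppAIProcess up by its Id rather than through the TenantApp FK, and (3) acts as a fail-safe by retrying when the store is unavailable and, if it still cannot get an answer, skipping the cycle instead of canceling the online process.

```java
import java.util.Optional;

// Hypothetical sketch of the proposed improvements; not the merged CAPL-5389 code.
final class ResilientReconcileSketch {

    /** (1) Not-found error that names the entity type actually requested. */
    static final class EntityNotFoundException extends RuntimeException {
        EntityNotFoundException(String entityType, String id) {
            super(entityType + " not found: " + id);
        }
    }

    /** The store could not answer (timeout, outage); this says nothing about existence. */
    static final class LookupUnavailableException extends RuntimeException {
        LookupUnavailableException(String message) {
            super(message);
        }
    }

    /** (2) Hypothetical lookup keyed by the TenantAppAIProcess Id instead of an ES search by FK. */
    interface ProcessStore {
        Optional<String> findById(String processId); // may throw LookupUnavailableException
    }

    interface Operator {
        void deleteApp(String tenant, String app);
    }

    enum Outcome { KEPT, DELETED, SKIPPED_UNAVAILABLE }

    /** (1) Generic getter whose error reports the requested entity type. */
    static <T> T getOrThrow(Optional<T> result, String entityType, String id) {
        return result.orElseThrow(() -> new EntityNotFoundException(entityType, id));
    }

    /** (3) Fail-safe: retry transient failures and never delete when the lookup cannot answer. */
    static Outcome reconcile(ProcessStore store, Operator operator, String processId,
                             String tenant, String app, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                getOrThrow(store.findById(processId), "TenantAppAIProcess", processId);
                return Outcome.KEPT;
            } catch (EntityNotFoundException e) {
                // Only a definite "absent" answer from the by-Id lookup leads to cleanup.
                operator.deleteApp(tenant, app);
                return Outcome.DELETED;
            } catch (LookupUnavailableException e) {
                // Transient failure: retry (a real job would also back off between attempts).
            }
        }
        // Still no reliable answer: keep the online process running and retry next cycle.
        return Outcome.SKIPPED_UNAVAILABLE;
    }

    public static void main(String[] args) {
        Operator operator = (tenant, app) -> System.out.println("DELETE " + tenant + "/" + app);
        ProcessStore down = id -> { throw new LookupUnavailableException("store unavailable"); };
        // "example-process-id" is a placeholder, not a real TenantAppAIProcess Id.
        System.out.println(reconcile(down, operator, "example-process-id",
                "clockinweb", "clockinwebapp", 3)); // prints SKIPPED_UNAVAILABLE, no DELETE
    }
}
```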
cc @Gabriel DAmore Marciano @Rodrigo Bechtold
@MARCOS STUMPF ,
@Gabriel DAmore Marciano , @Rodrigo Bechtold , @Geny Isam Hamud Herrera
This issue was planned to be delivered by 2024-03-04. You can check this in the issue's Due Date field.
Dates already planned for this issue: 2024-02-12, 2024-03-04
If the External Issue Link field is filled in, the customer was also informed on JIRA TOTVS.
@MARCOS STUMPF ,
@Rodrigo Bechtold , @Geny Isam Hamud Herrera
This issue was planned to be delivered by 2024-02-12. You can check this in the issue's Due Date field.
Dates already planned for this issue: 2024-02-12
If the External Issue Link field is filled in, the customer was also informed on JIRA TOTVS.
Removing QA story points because they depend on the developer's analysis to identify where the problem was and how the QA team can test it.