Online process was canceled by the operator

Description

Contact

  • @Bruno Furtado

Problem

An online process of a Carol App was cancelled. It is used 24/7, so it must always be active.
The task had been running for less than 30 days.

  • Tenant: clockin/clockinweb
  • Carol app: clockinwebapp
  • Online process: clockinwebapp_api
  • Task: 59f2c53da98d48b4adbb44f84b9e9187
  • Datetime error: 2024-01-20 02:03pm BRT

More details

It looks like mdm sent a request to the operator and the operator deleted the app.
https://cloudlogging.app.goo.gl/AJPMv2hHzXq3oCqZ8

"DELETE /api/apps/clockinweb/clockinwebapp HTTP/1.1" 200

After that, mdm could no longer find the task:
https://cloudlogging.app.goo.gl/Q7pUpvomsQuKf7478

Application for task 59f2c53da98d48b4adbb44f84b9e9187
(clockinweb|clockinwebapp) is not running on the operator,
marking as cleanup.

Links

Expected behavior

  1. An online process must always stay active and be canceled only by user actions (API or UI).

Activity

Automation for Jira 25 March 2024, 20:30 Jira Internal Users

@MARCOS STUMPF ,
@Jonathan Willian Moraes , @Rodrigo Bechtold , @André Pereira de Oliveira , @Douglas Coimbra Lopes , @Gabriel DAmore Marciano , @Geny Isam Hamud Herrera

This issue was planned to be delivered by 2024-04-15. You can check this in the issue's Due Date field.

Dates already planned for this issue: 2024-04-15, 2024-02-12, 2024-03-01, 2024-03-25

If External Issue Link field is filled, customer was also informed on JIRA TOTVS.

Automation for Jira 25 March 2024, 12:25 Jira Internal Users

This issue was automatically transitioned to REGRESSION, as its PR was just merged into qa branch in Github.

Automation for Jira 18 March 2024, 18:13 Jira Internal Users

This issue was automatically transitioned to TESTED & MERGED, as its PR was just merged into develop branch in Github. PR Approved by olivandre,douglascoimbra.

Automation for Jira 11 March 2024, 14:09 Jira Internal Users

Github user rodrigo-bechtold has just committed and the issue was sent back to the REVIEW column.

Automation for Jira 1 March 2024, 20:01 Jira Internal Users

@MARCOS STUMPF ,
@Pedro Buzzi , @Rodrigo Bechtold , @André Pereira de Oliveira , @Douglas Coimbra Lopes , @Geny Isam Hamud Herrera

This issue was planned to be delivered by 2024-03-25. You can check this in the issue's Due Date field.

Dates already planned for this issue: 2024-02-12, 2024-03-01, 2024-03-25

If External Issue Link field is filled, customer was also informed on JIRA TOTVS.

Automation for Jira 22 February 2024, 15:34 Jira Internal Users

Github user douglascoimbra has just approved a PR (added as Shared Assignee in this Jira issue).

fix: CAPL-5389 Done - Add resiliency to OperatorReconcileJob

Automation for Jira 21 February 2024, 21:37 Jira Internal Users

This issue was automatically transitioned to QA REVIEW, as its PR was just approved in Github.

Douglas Coimbra Lopes 21 February 2024, 21:35 Jira Internal Users

@Rodrigo Bechtold The card has been validated by the QA team. It is pending only the code review.

Automation for Jira 21 February 2024, 15:35 Jira Internal Users

• Bad gateway error when trying to stop an online process of a Carol app.
• Details in the card + sandbox link

Edited on Slack - platform-internal - Douglas Coimbra Lopes

Automation for Jira 21 February 2024, 15:35 Jira Internal Users

• Bad gateway error when trying to stop an online process of a Carol app.
• Details in the card + sandbox link

Sent by Slack - platform-internal - Douglas Coimbra Lopes

Douglas Coimbra Lopes 21 February 2024, 15:33 Jira Internal Users

EVALUATING DOCKER BUILD PROCESS

Automation for Jira 21 February 2024, 00:27 Jira Internal Users

This issue was automatically transitioned to REVIEW, as its PR (not DRAFT and not WIP) was just created in Github.

fix: CAPL-5389 Done - Add resiliency to OperatorReconcileJob

Geny Isam Hamud Herrera 15 February 2024, 18:33 Jira Internal Users

Problem:
Although the document exists in ES http://localhost:19201/*/mdmTenantAppAIProcess/_search?q=%2BmdmTenantAppId:fc4cad5f6557432e85bcc4e0469494df+%2B the logs show that the document was not found https://console.cloud.google.com/logs/query;cursorTimestamp=2024-01-20T17:02:11.778366123Z;endTime=2024-01-20T17:02:13.000Z;query=timestamp%3D"2024-01-20T17:02:11.775717560Z" insertId%3D"upzbcb7p0dti1hsf";startTime=2024-01-20T17:02:08.000Z;summaryFields=:false:32:beginning?project=labs-app-mdm-production

Because this document was not found, the OperatorReconcileJob class invalidated the AI Process, which in turn called DELETE on the Operator https://cloudlogging.app.goo.gl/AJPMv2hHzXq3oCqZ8 , killing the Pod and subsequently cancelling the Task.

Improvements:

  1. The getByTenantAppId method in the AbstractTenantAppFKWithSpaceService class is generic, yet it always throws an exception reporting that TENANT_APP_FILE was not found, even when it is looking up another entity type, as in OperatorReconcileJob:139, which looks up TenantAppAIProcess.
    @Override
    public T getByTenantAppId(
        UserAccessDetails uad,
        String tenantAppId,
        EntitySpaceType entitySpaceType,
        boolean checkBothSpace)
        throws RecordNotFoundException {
      T result =
          getTenantAppFKDao()
              .findOneByForeignKeys(
                  uad.getTenantId(), getForeignKeys(tenantAppId), entitySpaceType, checkBothSpace);
      if (result == null) {
        throw new RecordNotFoundException(
            TypeConstants.TENANT_APP_FILE,
            MdmTenantAppFKEntityWithSpace.MDM_TENANT_APP_ID,
            tenantAppId,
            uad.getTenantId());
      }
      return result;
    }

  2. I also believe we do not need to search ES for the TenantAppAIProcess document through the TenantApp FK, since we already have the TenantAppAIProcess Id. This could be changed to look up by Id, so that the lookup goes to CB first instead of ES. In the OperatorReconcileJob class:

    tenantAppAIProcess =
        MdmServiceFactory.getInstance(TenantAppAIProcessService.class)
            .getByTenantAppId(uad, appTenantId, EntitySpaceType.PRODUCTION, false);

  3. Since we will be changing this method to look up by Id, it is also worth taking the opportunity to add a FailSafe, so that if we have unavailability again an Online Process is not cancelled.
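
A minimal sketch of how the three improvements could fit together, not the actual fix. Every type here (ReconcileSketch, ByIdService, StoreUnavailableException, Action, and the "p1" id) is a hypothetical stand-in rather than a real mdm class. The exception is unchecked only to keep the example short, and the in-memory map merely simulates a by-id document store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative stand-ins only; none of these types are the real mdm classes.
class ReconcileSketch {

    /** Carries the entity type that was actually searched (improvement 1). */
    static class RecordNotFoundException extends RuntimeException {
        final String entityType;

        RecordNotFoundException(String entityType, String id) {
            super(entityType + " not found: " + id);
            this.entityType = entityType;
        }
    }

    /** Raised when the store cannot answer, as opposed to a confirmed miss. */
    static class StoreUnavailableException extends RuntimeException {
        StoreUnavailableException(String message) {
            super(message);
        }
    }

    /** Generic by-id service: each subclass names its own entity type. */
    abstract static class ByIdService<T> {
        private final Map<String, T> store = new HashMap<>();
        boolean available = true; // set to false to simulate an outage

        protected abstract String entityType();

        void save(String id, T entity) {
            store.put(id, entity);
        }

        /** Improvement 2: direct lookup by document id, no foreign-key search. */
        T getById(String id) {
            if (!available) {
                throw new StoreUnavailableException("store unreachable");
            }
            return Optional.ofNullable(store.get(id))
                .orElseThrow(() -> new RecordNotFoundException(entityType(), id));
        }
    }

    static class AiProcessService extends ByIdService<String> {
        @Override
        protected String entityType() {
            return "TENANT_APP_AI_PROCESS";
        }
    }

    enum Action { KEEP_RUNNING, INVALIDATE }

    /** Improvement 3: only a confirmed miss may invalidate the process. */
    static Action reconcile(AiProcessService service, String processId) {
        try {
            service.getById(processId);
            return Action.KEEP_RUNNING;
        } catch (RecordNotFoundException e) {
            return Action.INVALIDATE;
        } catch (RuntimeException e) {
            // Fail safe: an inconclusive lookup never cancels an online process.
            return Action.KEEP_RUNNING;
        }
    }

    public static void main(String[] args) {
        AiProcessService service = new AiProcessService();
        service.save("p1", "clockinwebapp_api");
        System.out.println(reconcile(service, "p1"));      // KEEP_RUNNING
        service.available = false;
        System.out.println(reconcile(service, "p1"));      // KEEP_RUNNING
        service.available = true;
        System.out.println(reconcile(service, "unknown")); // INVALIDATE
    }
}
```

The key design choice is separating "the record does not exist" from "the lookup could not be completed". A stricter variant could also require several consecutive confirmed misses before invalidating.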

cc @Gabriel DAmore Marciano @Rodrigo Bechtold

Automation for Jira 12 February 2024, 19:44 Jira Internal Users

@MARCOS STUMPF ,
@Gabriel DAmore Marciano , @Rodrigo Bechtold , @Geny Isam Hamud Herrera

This issue was planned to be delivered by 2024-03-04. You can check this in the issue's Due Date field.

Dates already planned for this issue: 2024-02-12, 2024-03-04

If External Issue Link field is filled, customer was also informed on JIRA TOTVS.

Automation for Jira 7 February 2024, 18:01 Jira Internal Users

@MARCOS STUMPF ,
@Rodrigo Bechtold , @Geny Isam Hamud Herrera

This issue was planned to be delivered by 2024-02-12. You can check this in the issue's Due Date field.

Dates already planned for this issue: 2024-02-12

If External Issue Link field is filled, customer was also informed on JIRA TOTVS.

Gabriel DAmore Marciano 26 January 2024, 20:48 Jira Internal Users

Removing the QA story points, because estimating them depends on the developer's analysis to pinpoint where the problem was and how the QA team can test it.