[EU] Can't log in to the platform
Incident Report for Cycode
Postmortem

Incident Summary:
Users were unable to access the application.

Time of Incident:
Friday, 11:38 AM

Issue Detected:
Reports were received indicating that users could not log in to the application.

Initial Findings:

  • Numerous alerts from the authentication service indicated connectivity issues with the database.
  • An event log showed that a database failover occurred near the time of the incident.

Immediate Action Taken:

  • All authentication service pods were reset.
  • After the reset, new pods started functioning correctly, and users were able to log in successfully.

Root Cause:
The application’s authentication and authorization mechanism relies on auth-service. Following the database failover, auth-service instances did not reconnect to the database endpoint, causing login failures. Although tests indicate that auth-service is designed to handle database failovers, its instances failed to recover their connections in this case. The exact cause remains unclear, and the investigation is ongoing.
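
For illustration only, the kind of recovery behavior described above can be sketched as a wrapper that discards a broken connection and opens a fresh one with exponential backoff. This is not auth-service's actual code, which the report does not show. The names `ReconnectingClient`, `connect`, and `run` are hypothetical, and the sketch assumes that connection failures surface as `ConnectionError`.

```python
import time
from typing import Any, Callable, Optional


class ReconnectingClient:
    """Hypothetical wrapper: rebuilds the connection whenever a call fails."""

    def __init__(
        self,
        connect: Callable[[], Any],          # assumed factory returning a new connection
        max_attempts: int = 5,
        base_delay: float = 0.5,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._connect = connect
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self._sleep = sleep
        self._conn: Optional[Any] = None

    def run(self, operation: Callable[[Any], Any]) -> Any:
        """Run operation(conn), reconnecting with backoff on ConnectionError."""
        last_error: Optional[BaseException] = None
        for attempt in range(self._max_attempts):
            try:
                if self._conn is None:
                    # Open a fresh connection so a post-failover endpoint is re-resolved.
                    self._conn = self._connect()
                return operation(self._conn)
            except ConnectionError as exc:
                last_error = exc
                # Drop the possibly stale connection instead of reusing it.
                self._conn = None
                if attempt < self._max_attempts - 1:
                    # Delays grow as base_delay * 1, 2, 4, ...
                    self._sleep(self._base_delay * 2 ** attempt)
        raise ConnectionError(
            f"gave up after {self._max_attempts} attempts"
        ) from last_error
```

The key design choice in this sketch is that a failed call always throws away the cached connection, so the next attempt cannot keep talking to a database node that is no longer the primary.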

Actions Taken:

  1. Added error logs to a dedicated channel to monitor similar cases in the future.
  2. Increased database memory allocation to reduce the likelihood of future failovers.
  3. Continued investigation into why auth-service did not recover its connection after the DB failover.
  4. Implemented synthetic and domain monitoring to proactively identify login issues.
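
As a rough sketch of what a synthetic login check (item 4) might look like, the probe below performs one login attempt and classifies the result by status code and latency. The report does not say which monitoring tool was used, so everything here is an assumption: `probe_login`, `ProbeResult`, and the injected `attempt_login` callable (which would wrap a real login request with a test account and return the HTTP status) are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    healthy: bool
    status: Optional[int]
    latency_s: float
    detail: str


def probe_login(
    attempt_login: Callable[[], int],   # hypothetical: performs a login, returns HTTP status
    max_latency_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
) -> ProbeResult:
    """Run one synthetic login and report whether it looks healthy."""
    start = clock()
    try:
        status = attempt_login()
    except Exception as exc:
        # Network errors or timeouts count as a failed probe.
        return ProbeResult(False, None, clock() - start, f"login attempt raised: {exc}")
    latency = clock() - start
    if status != 200:
        return ProbeResult(False, status, latency, "unexpected status")
    if latency > max_latency_s:
        return ProbeResult(False, status, latency, "login too slow")
    return ProbeResult(True, status, latency, "ok")
```

A scheduler could call `probe_login` every minute or so and alert when `healthy` is false for several consecutive runs, which would surface login failures before users report them.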
Posted Dec 01, 2024 - 12:55 UTC

Resolved
This incident has been resolved.
Posted Nov 22, 2024 - 13:29 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Nov 22, 2024 - 09:55 UTC
Investigating
We are currently investigating this issue.
Posted Nov 22, 2024 - 09:51 UTC
This incident affected: EU Environment (Application/UI (EU Environment), API (EU Environment)).