[EU] Can't log in to the platform

Major incident · Affected components: API (EU Environment), Application/UI (EU Environment)
2024-11-22 11:51 IST · 3 hours, 38 minutes

Updates

Update

Incident Summary:
Users were unable to access the application.

Time of Incident:
Friday, November 22, 2024, 11:38 AM IST

Issue Detected:
Reports were received indicating that users could not log in to the application.

Initial Findings:

  • Numerous alerts from the authentication service indicated connectivity issues with the database.
  • An event log showed that a database failover occurred near the time of the incident.

Immediate Action Taken:

  • All authentication service pods were restarted.
  • After the restart, the new pods functioned correctly and users were able to log in successfully.
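Since restarting the pods is what restored logins, one commonly used way to automate that step is a liveness check tied to database connectivity. The sketch below is hypothetical and not part of auth-service: `DbLivenessCheck`, `ping`, and `grace_seconds` are illustrative names. It assumes the service runs on Kubernetes (the mention of pods suggests this) and that the returned status code would be served on an HTTP endpoint polled by a `livenessProbe` with `httpGet`, which restarts the container after repeated failures.

```python
import time
from typing import Callable


class DbLivenessCheck:
    """Hypothetical liveness check that reports unhealthy (HTTP 503) once the
    database has been unreachable for longer than `grace_seconds`."""

    def __init__(
        self,
        ping: Callable[[], bool],
        grace_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ping = ping            # assumed cheap DB round trip, e.g. "SELECT 1"
        self._grace = grace_seconds  # tolerate brief blips such as the failover itself
        self._clock = clock
        self._last_ok = clock()      # the grace window starts at process start

    def status(self) -> int:
        """Return an HTTP status code for the liveness endpoint."""
        try:
            ok = self._ping()
        except Exception:
            ok = False  # an exception from the ping counts as unreachable
        now = self._clock()
        if ok:
            self._last_ok = now
            return 200
        # Only fail liveness once the outage outlasts the grace period.
        return 503 if now - self._last_ok > self._grace else 200


if __name__ == "__main__":
    # Simulated timeline with a fake clock: healthy, then the DB stays unreachable.
    now = [0.0]
    db_up = [True]
    check = DbLivenessCheck(lambda: db_up[0], grace_seconds=30, clock=lambda: now[0])
    for t, up in [(10, True), (20, False), (41, False), (50, True)]:
        now[0], db_up[0] = float(t), up
        print(t, check.status())
```

Tying liveness to a shared dependency is a trade-off: during a genuine database outage every pod would fail the check and restart repeatedly, so the grace period should be long enough to ride out a normal failover.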

Root Cause:
The application’s authentication and authorization mechanism relies on auth-service. Following the database failover, auth-service instances did not reconnect to the database endpoint, which caused the login failures. Although testing indicates that auth-service is designed to handle database failovers, the running instances failed to re-establish their connections in this case. The exact cause remains unclear, and further investigation is ongoing.
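Because the exact cause is still under investigation, the following is only an illustrative sketch of one common pattern for this class of failure, not auth-service's actual code: a client wrapper that discards its connection as soon as it errors and reconnects with exponential backoff, so a handle left stale by a failover is never reused. `ReconnectingConnection`, its parameters, and the in-memory `FakeDatabase` used to simulate the failover are all hypothetical.

```python
import time
from typing import Any, Callable, Optional


class ReconnectingConnection:
    """Hypothetical DB client wrapper: on a connection error it drops the
    current handle and reconnects with exponential backoff."""

    def __init__(
        self,
        connect: Callable[[], Any],
        max_attempts: int = 5,
        base_delay: float = 0.5,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self._connect = connect
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self._sleep = sleep
        self._conn: Optional[Any] = None

    def execute(self, query: str) -> Any:
        last_error: Optional[ConnectionError] = None
        for attempt in range(self._max_attempts):
            try:
                if self._conn is None:
                    self._conn = self._connect()  # open a fresh connection via the factory
                return self._conn.execute(query)
            except ConnectionError as exc:
                last_error = exc
                self._conn = None  # drop the stale handle instead of retrying on it
                if attempt < self._max_attempts - 1:
                    self._sleep(self._base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
        assert last_error is not None
        raise last_error


class FakeDatabase:
    """In-memory stand-in: a failover invalidates every connection opened before it."""

    def __init__(self) -> None:
        self.generation = 0

    def connect(self) -> "FakeConnection":
        return FakeConnection(self)

    def failover(self) -> None:
        self.generation += 1


class FakeConnection:
    def __init__(self, db: FakeDatabase) -> None:
        self._db = db
        self._generation = db.generation

    def execute(self, query: str) -> str:
        if self._generation != self._db.generation:
            raise ConnectionError("connection is stale after failover")
        return f"ok: {query}"


if __name__ == "__main__":
    db = FakeDatabase()
    conn = ReconnectingConnection(db.connect, sleep=lambda s: None)  # no real waiting in the demo
    print(conn.execute("SELECT 1"))
    db.failover()
    print(conn.execute("SELECT 1"))  # first try hits the stale handle, second reconnects
```

Real database drivers raise their own exception classes, so in practice the `except ConnectionError` clause would need to match whatever the actual driver raises when a connection breaks.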

Actions Taken:

  1. Routed error logs to a dedicated channel to detect similar cases in the future.
  2. Increased database memory allocation to reduce the likelihood of future failovers.
  3. Continued investigation into why auth-service did not recover its connection after the DB failover.
  4. Implemented synthetic and domain monitoring to proactively identify login issues.
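The synthetic monitoring in action 4 is not described in detail in this report. As a rough, hypothetical illustration of the idea, the probe below runs a login check, raises an alert only after several consecutive failures (to avoid paging on a single blip), and sends one recovery notice once login works again. `check_login` stands in for whatever actually exercises the login flow, such as signing in with a dedicated test account; it, `alert`, and `threshold` are assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SyntheticLoginMonitor:
    """Hypothetical synthetic probe: alerts after `threshold` consecutive failed
    login checks and sends a single recovery notice when login works again."""

    check_login: Callable[[], bool]  # assumed to sign in with a dedicated test account
    alert: Callable[[str], None]     # assumed to post to the alerting channel
    threshold: int = 3
    consecutive_failures: int = 0
    alerting: bool = False

    def run_once(self) -> bool:
        try:
            ok = self.check_login()
        except Exception:
            ok = False  # a probe that crashes counts as a failed login
        if ok:
            if self.alerting:
                self.alert("RESOLVED: synthetic login is succeeding again")
            self.consecutive_failures = 0
            self.alerting = False
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.threshold and not self.alerting:
                self.alert(
                    f"FIRING: synthetic login failed {self.consecutive_failures} times in a row"
                )
                self.alerting = True
        return ok


if __name__ == "__main__":
    outcomes = iter([True, False, False, False, False, True])
    monitor = SyntheticLoginMonitor(check_login=lambda: next(outcomes), alert=print)
    for _ in range(6):  # in production this would run on a timer, e.g. every minute
        monitor.run_once()
```

With the simulated outcomes above, the monitor fires once on the third consecutive failure, stays quiet on the fourth, and resolves on the next successful login.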
December 1, 2024 · 14:55 IST
Resolved

This incident has been resolved.

November 22, 2024 · 15:29 IST
Monitoring

A fix has been implemented and we are monitoring the results.

November 22, 2024 · 11:55 IST
Investigating

We are currently investigating this issue.

November 22, 2024 · 11:51 IST
