Degraded Performance in SCA Scans

Minor incident · EU Environment API (EU Environment) · Application/UI (EU Environment)
2026-01-22 11:43 IST · 2 days, 21 hours, 13 minutes

Updates

Post-mortem

Summary
Between January 14 and January 25, 2026, customers experienced significant delays and failures in Software Composition Analysis (SCA) scans, particularly those involving dependency restoration. The issue resulted in SCA scans being blocked or taking much longer than usual to complete, while other scan types were largely unaffected. The root cause was traced to a process that repeatedly retried failed dependency restoration requests, leading to a backlog of messages and duplicate processing. This created a bottleneck, causing new SCA scan requests to be delayed or time out. The issue was resolved by updating the processing logic, increasing processing capacity, and clearing the backlog. SCA scan performance has since returned to normal.

Key Timeline (IST)

January 14, 2026, 19:00 IST: First alerts received about failing SCA scans and timeouts.
January 15, 2026, 10:00 IST: Significant increase in failed lock creation requests observed.
January 16–21, 2026: Investigation revealed repeated retries and message duplication causing a backlog.
January 22, 2026, 18:00 IST: Updates deployed to improve processing and prevent duplicate retries.
January 23, 2026, 09:00 IST: Backlog began to clear; SCA scan performance improved.
January 25, 2026, 09:00 IST: Incident closed after confirming normal scan processing and backlog resolution.

Root Cause
The incident was triggered by a process responsible for handling dependency restoration in SCA scans. When a restoration request failed or timed out, the process retried the request multiple times, resulting in a large volume of duplicate messages. This overwhelmed the processing system, causing a backlog and delays for new SCA scan requests. The issue was exacerbated by multiple processing units running the same retry logic in parallel, further increasing message duplication.
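
To make the failure mode concrete, the sketch below models it under assumed names: an at-least-once queue feeds parallel workers, restoration always times out, and each worker re-enqueues the failed request with no attempt cap and no deduplication. The function and field names (restore_dependencies, scan_id, attempt) are illustrative and not taken from the actual service.

```python
# Minimal sketch of the failure mode, assuming an at-least-once queue
# and parallel workers; all names here are illustrative assumptions.
import queue

work = queue.Queue()

def restore_dependencies(msg):
    # Stand-in for the real restoration call; here it always times out.
    raise TimeoutError("dependency restoration timed out")

def retry_on_failure(msg):
    # Problematic pattern: no attempt cap and no deduplication, so every
    # failure puts a fresh copy of the request back on the queue.
    try:
        restore_dependencies(msg)
    except TimeoutError:
        work.put({**msg, "attempt": msg["attempt"] + 1})

# Two workers receive the same delivery (normal for at-least-once queues)
# and each re-enqueues it, so one failed request becomes two queued retries.
original = {"scan_id": "sca-123", "attempt": 0}
for _ in range(2):
    retry_on_failure(original)

print(work.qsize())  # 2
```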

Actions Taken

  1. Increased processing capacity to handle the backlog of requests.
  2. Updated the retry logic to prevent duplicate processing of failed requests (a sketch follows this list).
  3. Cleared the backlog of duplicate messages.
  4. Monitored system performance to ensure normal operation resumed.
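
The retry-logic change referenced in item 2 could look roughly like the sketch below: retries are bounded and keyed per request and attempt, so duplicate deliveries of the same failed request schedule at most one retry. The attempt limit, key scheme, and in-memory set are assumptions; in the real pipeline the deduplication state would need to live in shared storage so the check is atomic across processing units.

```python
# Hedged sketch of bounded, deduplicated retries; limits and key scheme
# are assumed for illustration, not taken from the actual service.
import queue

MAX_ATTEMPTS = 3
work = queue.Queue()
scheduled = set()  # stand-in for a shared dedup table

def schedule_retry(msg):
    next_attempt = msg["attempt"] + 1
    if next_attempt >= MAX_ATTEMPTS:
        return False                 # give up and surface a scan failure instead
    key = (msg["scan_id"], next_attempt)
    if key in scheduled:
        return False                 # another worker already re-enqueued this one
    scheduled.add(key)
    work.put({**msg, "attempt": next_attempt})
    return True

# Duplicate deliveries of the same failed request now yield a single retry.
msg = {"scan_id": "sca-123", "attempt": 0}
print(schedule_retry(msg), schedule_retry(msg))  # True False
print(work.qsize())                              # 1
```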

Action Items

  1. Refactor the restoration retry process to ensure only a single instance handles retries (see the sketch after this list).
  2. Add safeguards to prevent duplicate message processing.
  3. Improve monitoring and alerting for abnormal retry or backlog patterns.
  4. Review and optimize dependency restoration workflows to prevent similar issues in the future.
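
For action item 1, one common pattern is a single retry owner elected via a short lease, sketched below. The in-memory lease table stands in for whatever shared lock service the pipeline uses (the timeline's failed lock creation requests suggest one exists), and the task name, TTL, and instance names are assumptions for illustration.

```python
# Hedged sketch of a single retry owner elected via a lease; the lease
# store, TTL, and names are assumptions, not the actual implementation.
import time

LEASE_TTL = 30.0   # seconds a holder keeps the retry lease (assumed value)
leases = {}        # stand-in for a shared lock table keyed by task name

def try_acquire(task, owner, now=None):
    # Grant the lease if it is free, expired, or already held by this owner.
    now = time.time() if now is None else now
    holder, expires = leases.get(task, (None, 0.0))
    if holder is None or expires <= now or holder == owner:
        leases[task] = (owner, now + LEASE_TTL)
        return True
    return False

def run_retry_pass(owner):
    # Only the instance holding the lease retries failed restoration
    # requests; every other instance skips the pass entirely.
    if not try_acquire("sca-restore-retries", owner):
        return "skipped"
    return "retried backlog"

print(run_retry_pass("instance-a"))  # retried backlog
print(run_retry_pass("instance-b"))  # skipped (instance-a holds the lease)
```
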
January 29, 2026 · 17:34 IST
Resolved

Resolved: The issue impacting SCA scan result ingestion has been fully mitigated and service performance has returned to normal. Backlogged events have been processed, and SCA scan results are now being ingested and reflected in the system as expected.

January 25, 2026 · 08:56 IST
Update

Mitigations have been applied to reduce the impact and restore performance, and the system is now recovering. However, there is still a backlog of queued events to process, which may continue to delay ingestion and availability of SCA scan results until processing catches up.

January 22, 2026 · 17:56 IST
Issue

We are experiencing degraded performance in the SCA scanning pipeline, which may cause delays in scan result ingestion and availability for some repositories.
