Repository Detections Were Auto-Resolved Upon Branch Deletion

Minor incident
EU Environment: API, Application/UI
US Environment: API, Application/UI
2025-06-19 12:21 IDT · 2 days, 23 hours, 39 minutes

Updates

Issue

Summary

On June 16, 2025, the service that holds the detections was updated with code that mistakenly sent an incorrect deletion message: a branch deletion was reported as a full repository deletion. Consequently, the service that holds the detections automatically resolved all violations across multiple repositories, even though these repositories had not actually been deleted. The faulty messages were published for approximately 14 hours, until June 17, 2025. The issue was discovered following customer reports, and the full remediation, including data restoration, continued until June 22, 2025.

Key Timeline

16.06.25 08:36 PM (IDT) – Commit was deployed to the service that holds the detections, sending incorrect repository-level deletion messages.
17.06.25 10:35 AM (IDT) – Last known messages with faulty deletion semantics were published.
18.06.25 (IDT) – Customer reports of erroneously resolved violations were received.
19.06.25 01:12 PM (IDT) – The problematic feature was disabled, and the responsible code was removed.
19.06.25 03:18 PM (IDT) – The affected repositories were identified, and work began on restoring the violations.
21–22.06.25 (IDT) – An internal API was developed and executed to restore violations in the affected repositories.

Root Cause Analysis

The root cause of the incident was an error in the message construction logic within the service that holds the detections. The code sent a “delete repository” message instead of a “delete branch” message, so the service that holds the detections treated a branch deletion as a repository deletion and resolved violations across the entire repository.
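
As an illustration of the failure mode described above, the following is a minimal sketch of a deletion event that carries an explicit scope, so that a branch deletion cannot be silently read as a repository deletion. All names here (DeletionScope, DeletionEvent, violations_to_resolve, and the dictionary keys) are hypothetical and are not taken from the actual service.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class DeletionScope(Enum):
    # Hypothetical scopes; the real message schema is not described in the report.
    BRANCH = "branch"
    REPOSITORY = "repository"


@dataclass(frozen=True)
class DeletionEvent:
    repository_id: str
    scope: DeletionScope
    branch: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject ambiguous events at construction time, before they are published.
        if self.scope is DeletionScope.BRANCH and not self.branch:
            raise ValueError("branch deletion requires a branch name")
        if self.scope is DeletionScope.REPOSITORY and self.branch is not None:
            raise ValueError("repository deletion must not carry a branch name")


def violations_to_resolve(
    event: DeletionEvent, violations: List[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Return only the violations that the given event should resolve."""
    in_repo = [v for v in violations if v["repository_id"] == event.repository_id]
    if event.scope is DeletionScope.REPOSITORY:
        return in_repo
    # Branch scope: leave violations on other branches untouched.
    return [v for v in in_repo if v["branch"] == event.branch]
```

With this shape, an event built for a deleted branch resolves only that branch's violations, and an event that omits the branch name fails when it is constructed rather than being handled as a repository-wide deletion.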

Actions Taken

  • Immediate Disablement: The feature causing the issue was disabled immediately, and the problematic code was removed.
  • Restoration Tool: An internal API was developed and executed to restore the mistakenly resolved violations.
  • Code Fix: The message construction logic was corrected to prevent recurrence.

Action Items

  • Explore solutions for message versioning in Kafka.
  • Add a pipeline step that runs integration tests against the images currently in production, to help identify dependencies between services.
  • Identify entry points and protect them with feature flags.
  • Detect event anomalies, such as an unusually large number of repository deletions within a short period.
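
The last action item could be approached with a simple sliding-window counter. The sketch below is one possible shape, not the implemented solution; the class name DeletionRateGuard, its parameters, and the thresholds in the usage note are assumptions chosen for illustration.

```python
from collections import deque
from typing import Deque


class DeletionRateGuard:
    """Flags when more than max_events deletions fall inside a sliding time window."""

    def __init__(self, max_events: int, window_seconds: float) -> None:
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._timestamps: Deque[float] = deque()

    def record(self, timestamp: float) -> bool:
        """Record one repository deletion; return True if the rate looks anomalous."""
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop deletions that have aged out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_events
```

For example, a guard created as DeletionRateGuard(max_events=3, window_seconds=60) returns True on the fourth deletion recorded within the same minute. A consumer could use that signal to pause automatic resolution and raise an alert instead of resolving violations.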

June 24, 2025 · 15:04 IDT

Issue

This incident has been resolved.

June 24, 2025 · 14:48 IDT
