PR Scans are Delayed
Updates
Summary:
PR scanning was delayed from 11:21 to 14:42 IST (Israel time) on 18 October 2024.
Timeline:
- 18.10.24 14:05 - Support ticket opened regarding “PR scans pending.”
- 18.10.24 14:12 - R&D team identified that the secret detector pods were down.
- 18.10.24 14:19 - DevOps team rebuilt the secret detector service.
- 18.10.24 14:24 - DevOps team deployed the rebuilt service.
- 18.10.24 14:28 - Lag began to decrease, and PR scans resumed processing.
- 18.10.24 14:42 - Incident fully resolved; lag reduced to zero.
Root Cause:
During our RCA, we identified that the secret detector Docker image was missing the latest-<env> tag. This tag marks the image as deployed in a specific environment (e.g., STG, EU, US) and is crucial because the Docker registry's image cleanup policy retains any image that has a tag with the latest prefix. Additionally, every image is tagged with a main-<sha> tag, and another cleanup policy removes images tagged this way after 90 days, unless they also have the latest tag. However, the image in question should not have been deleted, as it was created only seven days prior.
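The two cleanup policies described above can be modeled in a few lines. This is only an illustrative sketch of the rules as stated in this report, not the registry's actual configuration; the function name should_delete, the sample tags (main-abc123, latest-eu), and the dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Age limit for main-<sha> images, as stated in the RCA.
RETENTION = timedelta(days=90)


def should_delete(tags, created_at, now):
    """Return True if the modeled cleanup policies would remove the image."""
    # Policy 1: any tag with the "latest" prefix protects the image.
    if any(tag.startswith("latest") for tag in tags):
        return False
    # Policy 2: main-<sha> images are removed once they are older than 90 days.
    has_main_tag = any(tag.startswith("main-") for tag in tags)
    return has_main_tag and (now - created_at) > RETENTION


# Hypothetical sample data for illustration only.
now = datetime(2024, 10, 18, tzinfo=timezone.utc)
week_old = now - timedelta(days=7)
old = now - timedelta(days=120)

print(should_delete(["main-abc123"], week_old, now))          # False
print(should_delete(["main-abc123"], old, now))               # True
print(should_delete(["main-abc123", "latest-eu"], old, now))  # False
```

Under this model, a seven-day-old image is retained even without the latest-<env> tag, which matches the observation that the image should not have been deleted.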
Conclusion:
Adding the latest-<env> tag will prevent the image from being inadvertently deleted in the future.
Action Items:
- Update the secret detector CI pipeline to ensure every deployed image is tagged with latest-<env>.
- Ensure all services follow the same tagging logic.
- Update the alert system in the internal monitoring tool to clearly indicate when multiple failures occur, instead of sending aggregated notifications.
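As a rough illustration of the tagging action items, a CI step could compare the tags on a deployed image against the tags it is expected to carry. The sketch below assumes the tag list has already been fetched from the registry; the helper names expected_tags and missing_tags and the sample values are hypothetical and are not part of the actual pipeline.

```python
def expected_tags(sha, env):
    """Tags every deployed image should carry under the described scheme."""
    return {f"main-{sha}", f"latest-{env}"}


def missing_tags(actual_tags, sha, env):
    """Return the expected tags that are absent from the image, sorted."""
    return sorted(expected_tags(sha, env) - set(actual_tags))


# Hypothetical example: an image that was pushed without its environment tag.
print(missing_tags(["main-abc123"], "abc123", "eu"))               # ['latest-eu']
print(missing_tags(["main-abc123", "latest-eu"], "abc123", "eu"))  # []
```

A pipeline step could fail the deployment whenever the returned list is non-empty, and the same check could be reused across services to keep the tagging logic consistent.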
A fix has been implemented and we are monitoring the results.
We have identified an issue causing delays with our Secret Scan PR feature and are working to address it.