PR Scans are Delayed
Updates
Summary
On June 30, 2025, some Cycode customers using GitHub integrations experienced interruptions in their development workflows due to issues affecting pull request (PR) status checks and scan execution. These interruptions occurred in three distinct but related areas:
- Stuck PR Status Checks (GitHub only): For customers using GitHub, PR status checks initiated by Cycode were stuck in a pending state.
- Delays in PR Scan Execution (GitHub only): Both new and ongoing scans tied to GitHub were temporarily delayed.
- Scan Failures for Certain Policy Types: A subset of PR scans failed across the platform, specifically those related to SCA with custom policies, license, and CI/CD scans.
Timeline
07.08.25 07:10 AM (GMT+3) – Monitoring systems detected increasing failure rates in the GitHub dispatcher service. Webhook events were being dropped.
07.08.25 09:39 AM (GMT+3) – Initial suspicion fell on a recent base image upgrade (Alpine 3.22). The image was rolled back and memory limits increased without resolving the issue.
07.08.25 09:52 AM (GMT+3) – Alpine 3.21 was also observed to be unstable. Resource limits continued to be adjusted.
07.08.25 10:35 AM (GMT+3) – Dispatcher service was returning HTTP 500 during request authentication, with only brief periods of stability after restarts.
07.08.25 10:45 AM (GMT+3) – Logs indicated frequent failures while parsing incoming webhook payloads, suggesting internal overload.
07.08.25 10:49 AM (GMT+3) – Dispatcher confirmed to be overwhelmed. CPU request raised 10x.
07.08.25 10:59 AM (GMT+3) – Missing configuration identified as the root cause. Each incoming webhook request triggered repeated attempts to initialize the feature toggle client.
07.08.25 11:13 AM (GMT+3) – Fix deployed to restore the missing configuration. Webhook processing resumed normal operation. The issue causing stuck PR status checks in GitHub was fully resolved at this point.
07.08.25 03:50 PM (GMT+3) – After successful testing in staging, a full sync of all open PRs was triggered to recover any missed scans due to earlier webhook drops.
07.08.25 04:20 PM (GMT+3) – A moderate lag began forming in topics responsible for initiating PR scans.
07.08.25 04:35 PM (GMT+3) – Lag expanded to the topic that processes PR scan results and updates commit status checks. This affected both recovered and new PR scans.
07.08.25 04:40 PM (GMT+3) – A large number of “new” PRs had been synced, far exceeding the volume expected at this stage.
07.08.25 05:10 PM (GMT+3) – Processing lag subsided, but rate-limiting issues emerged for some tenants due to increased scan activity.
07.08.25 06:20 PM (GMT+3) – System appeared stable.
07.08.25 07:00 PM (GMT+3) – Some customers reported discrepancies between the Cycode platform and GitHub, where scans appeared successful on the platform despite missing commit status updates. This was likely caused by the earlier webhook issues and outdated status synchronization.
07.08.25 07:00 PM (GMT+3) – High failure rates were observed across PR scans due to rate limits and overall system load.
07.08.25 07:40 PM (GMT+3) – Analysis showed that SCA, License, and CI/CD scans were failing due to issues in the internal graph component.
07.08.25 08:00 PM (GMT+3) – We initiated re-scans of all failed PRs for Secrets, IaC, and SAST - components unaffected by the graph issue.
07.08.25 08:00 PM (GMT+3) – Root cause investigation revealed that one of our graph database instances had entered a stuck state. It was restarted but remained in a “Creating” state. Arango support was contacted, and a new instance was provisioned and used for production traffic.
07.08.25 08:30 PM (GMT+3) – The new graph instance stabilized and resumed full scanning capabilities. The issue causing failures in SCA, License, and CI/CD PR scans was fully resolved.
07.08.25 10:00 PM (GMT+3) – All failed scans were retried and completed successfully. System fully recovered.
Root Cause
The issues on June 30 were caused by a combination of configuration, scaling, and infrastructure limitations:
- The initial problem with stuck PR status checks in GitHub was caused by a missing configuration in our GitHub integration service, introduced during a recent backend update. This prevented webhook events from being processed correctly.
- During recovery, a large-scale reprocessing of open GitHub PRs was initiated. While effective in backfilling missed data, it led to unexpected system load and scan delays due to the volume of triggered scans.
- Finally, a backend graph service used in policy enforcement became unresponsive under load. This resulted in scan failures specifically for SCA with custom policies, license policies, and CI/CD security scans, until a replacement service was provisioned.
Each of these issues has been addressed, and targeted improvements are underway to enhance resiliency and recovery controls.
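To illustrate the first root cause, the following is a minimal sketch of one way a webhook handler can avoid re-initializing a feature toggle client on every request: the client is built once and shared, and a missing configuration value raises a clear error instead of triggering repeated initialization attempts. All names here (`FeatureToggleClient`, `FEATURE_TOGGLE_URL`, `get_client`, `handle_webhook`) are hypothetical and do not represent Cycode's actual code.

```python
import threading
from typing import Mapping, Optional


class MissingConfigError(RuntimeError):
    """Raised when a required setting is absent."""


class FeatureToggleClient:
    """Hypothetical stand-in for a feature toggle SDK client."""

    init_count = 0  # how many times a client has been constructed

    def __init__(self, url: str) -> None:
        FeatureToggleClient.init_count += 1
        self.url = url


_client: Optional[FeatureToggleClient] = None
_lock = threading.Lock()


def get_client(config: Mapping[str, str]) -> FeatureToggleClient:
    """Return the shared client, constructing it at most once."""
    global _client
    if _client is None:
        with _lock:
            # Re-check inside the lock so concurrent requests do not
            # each construct their own client.
            if _client is None:
                url = config.get("FEATURE_TOGGLE_URL")  # assumed setting name
                if not url:
                    raise MissingConfigError("FEATURE_TOGGLE_URL is not set")
                _client = FeatureToggleClient(url)
    return _client


def handle_webhook(payload: dict, config: Mapping[str, str]) -> str:
    """Process one webhook event using the shared client."""
    get_client(config)  # cheap after the first successful call
    return f"processed:{payload.get('action')}"
```

Calling `get_client(config)` once during service startup, before accepting traffic, would surface a missing setting at deploy time rather than under production load.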
Actions Taken:
- Restored webhook processing for GitHub PRs.
- Created a new graph instance for our PR scanning flow.
- Retried all affected scans across modules to ensure accurate, up-to-date results.
Action items:
- Time-bounded PR Syncing (GitHub only): We are adding functionality to allow PR syncing to be scoped to a specific time range. This will enable more controlled recovery in the future and reduce unnecessary system load.
- Support Tools for Targeted Recovery: A new option will be added to our internal tools allowing Support to trigger a manual sync and scan of individual PRs, expediting issue resolution for specific customer tickets.
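The time-bounded syncing idea can be sketched roughly as follows. This is an illustrative example only, assuming each open PR carries an `updated_at` timestamp; the `PullRequest` type and the `prs_in_window` and `batches` helpers are hypothetical names, not part of the Cycode platform. It selects only the PRs updated inside a given window and splits them into fixed-size batches so a recovery sync does not enqueue every open PR at once.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class PullRequest:
    """Hypothetical minimal view of an open pull request."""
    number: int
    updated_at: datetime


def prs_in_window(
    prs: Iterable[PullRequest], start: datetime, end: datetime
) -> List[PullRequest]:
    """Keep only PRs updated in the half-open window [start, end)."""
    return [pr for pr in prs if start <= pr.updated_at < end]


def batches(items: List[PullRequest], size: int) -> Iterator[List[PullRequest]]:
    """Yield consecutive batches of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for i in range(0, len(items), size):
        yield items[i : i + size]


if __name__ == "__main__":
    utc = timezone.utc
    open_prs = [
        PullRequest(1, datetime(2025, 1, 1, 6, 0, tzinfo=utc)),
        PullRequest(2, datetime(2025, 1, 1, 8, 30, tzinfo=utc)),
        PullRequest(3, datetime(2025, 1, 1, 9, 15, tzinfo=utc)),
        PullRequest(4, datetime(2025, 1, 1, 12, 0, tzinfo=utc)),
    ]
    # Only PRs touched during the assumed outage window are re-synced.
    window = prs_in_window(
        open_prs,
        start=datetime(2025, 1, 1, 7, 0, tzinfo=utc),
        end=datetime(2025, 1, 1, 11, 0, tzinfo=utc),
    )
    for batch in batches(window, size=1):
        print([pr.number for pr in batch])
```

The dates in the usage example are arbitrary placeholders. A smaller batch size trades recovery speed for lower peak load on downstream scan queues.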
PR scans that don’t involve custom policies or CI/CD security now work as expected. It’s important to note that a ‘FAILURE’ status on the Cycode UI’s PR scans page doesn’t mean the PR itself will fail in your SCM. We’re actively working to improve this flow.
We are actively working to resolve an issue with CI/CD security, SCA, and license scans.
Description: We’re currently investigating an issue affecting the synchronization of Pull Requests from GitHub to our platform. As a result, we’ve experienced delays and missed updates in the syncing of certain pull request statuses, particularly those created overnight.
This issue has been reported by multiple users who observed that their pull request statuses were not immediately updated on our platform.
Impact:
- Open pull requests may not appear with their most current status on our platform.
- Your actual code repositories on GitHub remain untouched and fully functional; the issue is with how our system processes and displays the pull request status.
Next Steps: Our engineering team is treating this as a critical incident and is actively working to identify the root cause of these synchronization issues. To ensure all delayed pull requests are updated, we’ve initiated a comprehensive sync of all open pull requests. We are gathering more data to accelerate our investigation and resolution. We’ll provide updates as soon as we have more information or a full resolution.