All systems are operational.
The issue is now completely resolved. We have not identified any ongoing problems. Our service was not impacted during the incident.
The upstream provider has traced the issue back to a recently applied change to network connection management. This change was intended to improve stability under high load but, in this case, had the opposite effect and left the host unstable under load.
We will now gradually begin routing traffic and bots back to Cluster 2. If any issues occur, we will back off and try again at a later, more stable point.
In addition, we will closely monitor the performance of the cluster and affected traffic.
The upstream provider has identified the issue and is working on a fix. We have also detected a second outage. No user traffic had been routed back to Cluster 2, not even partially, so our service was not affected.
This second outage now also appears to be resolved. We are awaiting further updates and a detailed report from the upstream provider. Traffic will not be routed back to Cluster 2 until the issue is fully resolved.
In the meantime, we will continue to monitor the situation for any potential future impact.
The upstream issue appears to be resolved. Cluster 2 is back online. As stated previously, our service was not impacted by the event.
We are currently awaiting a report on the root cause of the upstream outage. The upstream provider is still analysing the exact cause of the incident, and we will update you as soon as we receive further information.
In the meantime, we will continue to monitor the situation for any potential future impact.
Alerts were triggered due to upstream outages. We are investigating the issue. All services should continue to operate normally, as only one cluster is affected.
Incident UUID 8a6ca319-a28b-4ae1-852d-2dac53b09933