Resolved -
We believe this issue to be fully resolved, but some customers might experience problems caused by data loss from the database restoration process. Please raise a support ticket for help resolving any such issues. To be clear, any data loss is limited to the database. This might mean problems navigating folders within the UI. Files stored in S3 buckets in us-west-2 are not affected.
An RCA will be provided within the next 48 hours.
Jul 24, 17:57 PDT
Monitoring -
A fix has been implemented and we are monitoring the results.
Jul 24, 17:32 PDT
Update -
Service has been restored again. We have taken action to resolve the cause of the issue. We will continue to work with our database vendor and monitor the environment closely.
Jul 24, 17:32 PDT
Update -
We are again restoring the database from our July 24, 2024, 12:36 PM backup. We have identified the cause of the issue and will take additional actions once the restoration is complete to prevent further disruptions.
Jul 24, 17:13 PDT
Identified -
We are working with our database vendor to bring the cluster back online.
Jul 24, 16:58 PDT
Update -
While we were actively monitoring the database, it entered a bad state due to sudden spikes in resource consumption. We are working to resolve the issue.
Jul 24, 16:45 PDT
Monitoring -
We have finished bringing the database back to a known good state. Service has been restored, but some data loss may have occurred because we restored from a backup taken on July 24, 2024, at 12:36 PM. If your organization continues to experience issues, please raise a support ticket.
We are continuing to monitor and are evaluating preventative measures to keep the issue from recurring.
Jul 24, 16:12 PDT
Update -
Resharding continues. We are also pursuing other paths to restore service.
Jul 24, 15:31 PDT
Update -
We are working on resharding data in our database. The issue appears to be the result of a large spike in usage that our current database settings were not configured to handle. Our next update will be in 30 minutes.
Jul 24, 15:00 PDT
Identified -
We are continuing to work to restore this database. Our next update will be in 30 minutes.
Jul 24, 14:29 PDT
Update -
We have traced the problem to the database serving this region. We are currently working with our database vendor to bring it back online.
Jul 24, 14:00 PDT
Investigating -
We have been alerted to a service disruption affecting: ATS AWS US-WEST-2. Our engineers are currently investigating the incident and will provide updates when more information is available.
Jul 24, 13:22 PDT