Firebase Status Dashboard

SUMMARY:

All Realtime Databases were inoperable for at least some of the period between 12:00pm and 1:30pm on November 27th. This also affected database panels in the admin console, utilization statistics recorded during the incident, and Hosting deploys.

DETAILED DESCRIPTION OF IMPACT:

The first failures began at 12:00pm and affected all database read and write operations. At 12:04pm the Firebase on-call began investigating after receiving an alert from our monitoring tools. The on-call declared an incident and began restoring services. The final service was restored at 1:30pm, marking the end of the incident.

In addition to the loss of read/write to Realtime Database instances, this incident rendered database-related features in the Firebase Console inoperable. Other parts of the console were unaffected. Due to a dependency on the Realtime Database, developers were also unable to deploy to Firebase Hosting.

Additionally, database stats during this period accurately displayed abnormally high spikes in utilization (often more than 100% of available capacity), since many servers were overloaded during the incident.

ROOT CAUSE:

A widespread failure in a Cloud data center caused the Realtime Database's disk persistence layer to fail. Because reads and writes couldn't be committed to disk, the services eventually became overloaded and were unable to serve traffic. Since Realtime Database does not yet have multi-region redundancy, there was no failover mechanism to mitigate the failure.

REMEDIATION AND PREVENTION:

To reduce the time required to identify the root cause of issues like this in the future, we will add monitoring for disk failures and configure alerts based on that monitoring.
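
As an illustration only, an alert on disk failures could be driven by the failure rate over a sliding window of recent write attempts. The sketch below is not Firebase's monitoring; the class name `DiskFailureMonitor`, the window size, and the threshold are all hypothetical values chosen for the example.

```python
from collections import deque


class DiskFailureMonitor:
    """Hypothetical sketch: tracks recent disk write outcomes and flags a high failure rate."""

    def __init__(self, window_size=100, failure_threshold=0.05):
        # Keep only the most recent `window_size` outcomes (True = write succeeded).
        self.outcomes = deque(maxlen=window_size)
        # Fraction of failed writes above which an alert should fire (assumed value).
        self.failure_threshold = failure_threshold

    def record(self, succeeded):
        """Record the outcome of one disk write attempt."""
        self.outcomes.append(bool(succeeded))

    def failure_rate(self):
        """Return the fraction of failed writes in the current window."""
        if not self.outcomes:
            return 0.0
        failures = sum(1 for ok in self.outcomes if not ok)
        return failures / len(self.outcomes)

    def should_alert(self):
        """Return True when the failure rate exceeds the threshold."""
        return self.failure_rate() > self.failure_threshold
```

A caller would invoke `record()` after each write attempt and page the on-call when `should_alert()` returns True. A windowed rate is used here rather than a single-failure trigger so that isolated errors do not page anyone, while a sustained persistence failure does.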

We launched a full investigation of the service logs to understand the outage and determine additional remediation.

We will continue to work on improved redundancy. For example, we released our next-gen, highly scalable database, Firestore (currently in beta), to address some of these needs.

Firebase Hosting is reducing dependencies on the Realtime Database for mission-critical operations to avoid deploy outages.