Google says last month's outage of zone "europe-west2-a" happened because the data center could not maintain a safe operating temperature after multiple redundant cooling systems failed simultaneously during "extraordinarily high" outside temperatures.
The failure impacted numerous Google services, including Google Compute Engine, Persistent Disk (PD), and Google Cloud Storage, causing instance terminations, service degradation, and networking issues.
What actually happened?
Google engineers powered down the data center that hosted a portion of the impacted zone, europe-west2-a, while the cooling system was repaired.
The total duration of the impact on cloud services was estimated at 18 hours and 23 minutes.
This is fairly disturbing news, particularly given Google's claim that these regional services are "designed to survive the failure of a single zone".
Google attributed the wider regional impact to an inadvertent change to traffic routing for internal services, which steered traffic away from all three zones in the "europe-west2" region rather than just the impacted "europe-west2-a" zone.
The routing incident stopped customers from being able to access data from regional storage services, including GCS and BigQuery, across multiple zones.
Will this happen again?
News like this is understandably pretty scary if you are concerned about global warming, as the UK might well be seeing quite a few even warmer days in the future.
Luckily, Google made some commitments to stop these types of failures from impacting its cloud hosting ever again.
These included repairing and re-testing its failover automation in an attempt to ensure stronger resilience in its failover protocols during large-scale events such as this one.
The cloud giant is also committed to investigating and developing "more advanced methods" to progressively decrease the thermal load within a single data center space, reducing the probability that a full shutdown is required.
In addition, Google is supposedly set to examine its procedures, tooling, and automated recovery systems for gaps and will be conducting an audit of cooling system equipment and standards across the data centers that house Google Cloud globally.
Will McCurdy has been writing about technology for over five years. He has a wide range of specialities including cybersecurity, fintech, cryptocurrencies, blockchain, cloud computing, payments, artificial intelligence, retail technology, and venture capital investment. He has previously written for AltFi, FStech, Retail Systems, and National Technology News and is an experienced podcast and webinar host, as well as an avid long-form feature writer.