Google claims Gmail is available 99.9% of the time, but when that 0.1% of downtime hits, users notice.
Such was the case Sept. 23 when a dual network failure affected hordes of Gmail messages. Today, Google offered an explanation of what went wrong and how it plans to prevent a similar situation from happening again.
According to an Official Gmail Blog post, two separate, redundant network paths stopped working simultaneously. Though the two failures were unrelated, the downed paths put the squeeze on message delivery, and emails started piling up at 5:54 a.m. PT/1:54 p.m. BT/11:54 p.m. AEST.
Most of the message backlog was cleared by 1 p.m. PT, when Google restored and repurposed network capacity, with the rest heading out the virtual door just before 4 p.m. PT.
While 71% of messages weren't affected, Google said the remaining 29% were delayed on average by 2.6 seconds. A blink really, but a small population of messages, about 1.5%, were stalled by about two hours.
Users who tried to download large attachments from affected messages also experienced errors, the blog post acknowledged. Other Gmail functions, such as logging in and reading delivered messages, continued unperturbed.
Google announced in June that Gmail has 425 million users, so an interruption in service of any size is going to impact quite a few people.
To make sure something like this has a slimmer chance of happening again, Google said it has several fixes in the works.
First order of business is ensuring the search giant has sufficient capacity, including enough backup capacity for Gmail.
The firm didn't lay out specifics, but Google also said it plans to make Gmail message delivery more resilient to future network capacity shortfalls.
Finally, it will update its internal practices with the aim of developing quicker and more effective responses to network failures.
Work on these improvements and others will get going over the coming weeks.