Google has confessed that a capacity miscalculation was responsible for the outage of its Gmail webmail service on Monday.
The Gmail service was down for under two hours, but inevitably attracted major attention as millions were left without their email.
"We know how many people rely on Gmail for personal and professional communications, and we take it very seriously when there's a problem with the service," said Google on the Gmail blog.
"Thus, right up front, I'd like to apologise to all of you — today's outage was a Big Deal, and we're treating it as such. We've already thoroughly investigated what happened, and we're currently compiling a list of things we intend to fix or improve as a result of the investigation."
Google explained that the problem arose from a miscalculation made when servers were taken offline for routine maintenance.
"…we had slightly underestimated the load which some recent changes (ironically, some designed to improve service availability) placed on the request routers — servers which direct web queries to the appropriate Gmail server for response."
The company also said it has already started to introduce measures to stop this kind of outage happening again. Even so, the adverse publicity generated by the problem will not help Google's efforts to persuade businesses to leave traditional email systems and move to webmail.
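To make the miscalculation concrete, here is a minimal Python sketch of the sort of check an operator might run before taking request routers offline for maintenance. It is not Google's tooling. The function names, the 25% headroom figure and all of the load numbers are hypothetical, chosen only to show how a modest underestimate of load can turn a safe-looking maintenance window into an overload.

```python
import math


def routers_needed(peak_load, per_router_capacity, headroom=0.25):
    """Return how many routers are needed to carry peak_load
    while leaving the given fraction of each router's capacity spare.

    All parameters are illustrative assumptions, not Gmail figures.
    """
    usable_capacity = per_router_capacity * (1.0 - headroom)
    return math.ceil(peak_load / usable_capacity)


def safe_to_drain(total_routers, to_drain, peak_load,
                  per_router_capacity, headroom=0.25):
    """Return True if the routers left after maintenance can still
    serve peak_load with the requested headroom."""
    remaining = total_routers - to_drain
    needed = routers_needed(peak_load, per_router_capacity, headroom)
    return remaining >= needed


if __name__ == "__main__":
    # Hypothetical fleet: 15 routers, each handling 1,000 requests/s.
    # The plan is to take 3 of them offline.
    estimated_peak = 9_000   # load the operators believe they have
    actual_peak = 10_000     # load after an unaccounted-for change

    print(safe_to_drain(15, 3, estimated_peak, 1_000))
    print(safe_to_drain(15, 3, actual_peak, 1_000))
```

With these made-up numbers, the maintenance plan passes the check against the estimated load but fails it against the actual load, which is roughly the situation Google describes.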
"What's next: We've turned our full attention to helping ensure this kind of event doesn't happen again. Some of the actions are straightforward and are already done — for example, increasing request router capacity well beyond peak demand to provide headroom.
"Some of the actions are more subtle — for example, we have concluded that request routers don't have sufficient failure isolation
"We'll be hard at work over the next few weeks implementing these and other Gmail reliability improvements — Gmail remains more than 99.9% available to all users, and we're committed to keeping events like today's notable for their rarity."