Twitter down: why so many Fail Whales?

Twitter down, and the whale comes up

There are two things everybody knows about Twitter: lots of famous people use it, and Twitter is down a lot.

The Fail Whale, Twitter's cute "oops! We're over capacity!" page, was a regular sight in Twitter's early days, but the service has made lots of improvements to cope with demand. Or at least, that's what we've been told.

So why has the Fail Whale been as much a part of this year's World Cup as the dreaded vuvuzela?

The short answer is demand. Twitter's extraordinary growth has seen it go from 2.5 million tweets per day in January 2009 to 50 million in January 2010, and the numbers just keep on growing: it averaged 55 million tweets per day in April 2010 and is currently sitting at an enormous 65 million tweets per day.

On a typical day Twitter deals with an average of 750 tweets per second, but during popular events those numbers go through the roof. Writing on the official Twitter blog, Sean Garrett notes that June's game between the LA Lakers and the Boston Celtics generated a record 3,085 tweets per second. That record didn't last long: Japan's World Cup game against Denmark saw Twitter hit a new peak of 3,283 tweets per second.
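Those figures are easy to sanity-check. The sketch below uses only the numbers quoted above, with illustrative variable names, and assumes tweets are spread evenly across a 24-hour day, which real traffic is not.

```python
# Figures quoted in the article; the names are illustrative.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

TWEETS_PER_DAY = 65_000_000   # current daily volume
LAKERS_CELTICS_PEAK = 3_085   # tweets per second
JAPAN_DENMARK_PEAK = 3_283    # tweets per second


def average_per_second(daily_total: int) -> float:
    """Average rate, assuming tweets arrive evenly over the whole day."""
    return daily_total / SECONDS_PER_DAY


def peak_multiple(peak_rate: float, average_rate: float) -> float:
    """How many times larger a peak rate is than the average rate."""
    return peak_rate / average_rate


average = average_per_second(TWEETS_PER_DAY)
print(f"Average: {average:.0f} tweets per second")

for label, peak in [
    ("Lakers v Celtics", LAKERS_CELTICS_PEAK),
    ("Japan v Denmark", JAPAN_DENMARK_PEAK),
]:
    print(f"{label}: {peak_multiple(peak, average):.1f}x the average")
```

On these numbers the daily total works out at roughly 750 tweets per second, and both record peaks are more than four times that average.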

Those peaks in demand can overwhelm websites, as Mike Bromilow, country manager for UK, Middle East and Africa at website performance experts Keynote Systems, explains. "This year's World Cup has really tested online resources such as sporting sites and social networks as when goals are scored during big matches, there has been a huge reaction online," he says.

World Cup fail

"A good example of this was during England's World Cup match against Slovenia. According to our statistics, Twitter's availability dropped to just over 30 percent in the UK during the match."

Hence the Fail Whale. But why does Twitter fall down when other high profile sites don't? "Our statistics show that other free social networking sites such as LinkedIn and Facebook consistently outperform Twitter in terms of availability and download speeds," Bromilow says.

To be fair, LinkedIn is a business-focused network so you wouldn't expect it to have much traffic during a World Cup match, and while Facebook's status updates are as prone to peaks as Twitter's tweets are, Facebook has been dealing with hundreds of millions of visitors for considerably longer. Twitter has been beefing up its infrastructure, but it hasn't always got it right.

Sean Garrett admits that June has been "Twitter's worst month since last October." As he explains, the site has been trying to tweak its systems while dealing with record traffic levels.

"We have long-term solutions that we are working towards, but in the meantime we are making real-time adjustments so that we can grow our capacity and avoid outages during the World Cup," he says. "We have uncovered unexpected deeper issues and have even caused inadvertent downtime as a result of our attempts to make changes."

Complications

As Bromilow points out, "A site falling over is very rarely bad luck. With all sites, problems of this nature are typically due to a lack of planning, load testing and site performance monitoring which can leave sites unprepared for an influx of site visitors at one particular time."

Twitter says that it did plan, it did test and it did monitor performance - but just as traffic levels started to soar, it discovered that things were more complicated than expected. As Sean Garrett explains, "we were well aware of the likely impact of the World Cup. What we didn't anticipate was some of the complexities that have been inherent in fixing and optimizing our systems before and during the event."

So does any of this matter? Twitter is keen to point out that, according to Pingdom, it's still achieving over 98% uptime - but the same reports show that Twitter is getting slower, and of course 98% uptime is only impressive if that 98% includes the times when you actually want to use the service.
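To put that 98% figure in context, here is a rough sketch of how much downtime it leaves room for. The 30-day month and the 90-minute match length are assumptions made for illustration, not figures from Pingdom or Twitter.

```python
def allowed_downtime_hours(uptime_percent: float, period_days: int = 30) -> float:
    """Hours of downtime that fit inside a given uptime percentage."""
    period_hours = period_days * 24
    return period_hours * (100 - uptime_percent) / 100


MATCH_HOURS = 1.5  # assumed length of a football match, excluding stoppages

downtime = allowed_downtime_hours(98)
print(f"98% uptime over 30 days allows {downtime:.1f} hours of downtime")
print(f"That is about {downtime / MATCH_HOURS:.1f} full matches")
```

Under those assumptions, a service could miss several entire matches in a month and still report 98% uptime, which is why the timing of the outages matters as much as the headline number.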

As Bromilow notes, the danger for a real-time communications site such as Twitter is obvious. "If a particular social networking site isn't performing as it should and people are unable to voice their opinion in this way, they will soon lose patience and may look to using another site in the first instance in future."

So far, it seems that the outages aren't doing Twitter any serious damage: if you check out the traffic graphs on Alexa, Twitter just gets more and more popular.

But social network users are a fickle bunch, and fortunes can change overnight: one day you're surfing an ever-increasing wave of traffic, and the next you're watching all your users flee. Just ask MySpace.

Carrie Marshall
Contributor

Writer, broadcaster, musician and kitchen gadget obsessive Carrie Marshall has been writing about tech since 1998, contributing sage advice and odd opinions to all kinds of magazines and websites as well as writing more than a dozen books. Her memoir, Carrie Kills A Man, is on sale now. She is the singer in Glaswegian rock band HAVR.