AWS outage - Amazon confirms recovery after Zoom, Slack, Canva among work apps hit

AWS outage took down many major work apps

AWS re:Invent 2024
(Image credit: © Future / Mike Moore)

If you were struggling to connect to your work apps, you weren't alone - Amazon Web Services (AWS) suffered a major outage.

After an issue in one of its US regions overnight, users were unable to access services such as Zoom, Slack and Canva - all of which run on AWS systems - before the disruption reached users in the US as well.

Welcome to our live coverage of this major AWS outage.

We've since seen major outages for a number of consumer-focused services, along with work-focused tools - outage tracker site Downdetector is showing the following...

A Downdetector graph showing outages relating to Amazon Web Services

(Image credit: Downdetector)

Slack is one of the hardest-hit services, with issues across the board.

Slack outage AWS issues

(Image credit: Future / Mike Moore)

According to AWS's own status page, the issue seems to stem from Amazon DynamoDB, which is the company's managed NoSQL database platform - an important building block for many customers and apps.

It's not just Slack and Zoom - Downdetector is also showing issues for other workplace tools, with Asana, Atlassian, Xero and Jora all affected (although reports do seem to be falling now).

Over at Zoom, it seems several parts of the platform have been affected, with its status page reporting several issues.

My Slack access has just totally collapsed, meaning I can't contact my team or find out what they're working on - will no-one think of the poor editors?

Outage reports are now falling from their peak at both Slack and Zoom, but it seems like issues still persist across the board.

Zoom Slack outage Aws

(Image credit: Future / Mike Moore)

A new update from AWS - "Oct 20 2:22 AM PDT We have applied initial mitigations and we are observing early signs of recovery for some impacted AWS Services. During this time, requests may continue to fail as we work toward full resolution. We recommend customers retry failed requests."
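For anyone wondering what "retry failed requests" can look like in practice, here is a minimal Python sketch of retrying with exponential backoff and jitter. It is an illustration only, not AWS's own guidance or SDK code: the function name `retry_with_backoff`, its parameters and the default delays are all assumptions made for this example.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5,
                       max_delay=8.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is used up.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Double the delay ceiling on each attempt, capped at max_delay,
            # then pick a random wait below it so clients don't retry in lockstep.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

Spacing retries out like this matters during a recovery: if every client retried immediately, the extra traffic could slow the service's return to normal.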

And just like that - another update from AWS, and it's good news for all of us wanting to get on with work.

Slack and Zoom are both still reporting issues on their respective status pages, but both have promised an update within the next 30 minutes.

Downdetector AWS outage

(Image credit: Downdetector)

Some expert insight from James Capell, our Editor on Web Hosting here at TechRadar Pro...

"The outage appears to be caused by a DNS resolution error for DynamoDB in the US-EAST-1 region. The DynamoDB database is used for many core AWS services including IAM which is used for permissions. The DNS error means that this database service cannot be accessed by the services that require it to function. Since most AWS services rely on this service somewhere in the chain we’re seeing a lot of problems."

We're not sure exactly what happened at Slack - but it's suddenly just had another major spike in outage reports.

Downdetector slack outage

(Image credit: Downdetector)

AWS has updated the severity status of the issues to "degraded" on its status page - which again could mean a solution is imminent...

Good news - AWS now thinks it has solved this issue, and services should be returning to normal very soon.

Outage reports for Slack, Zoom, Canva and Xero have all basically fallen to nothing, although the status pages for the first two are still showing some issues, so we'll stay tuned for anything happening there...

Here's a seemingly-final update from AWS - the company is pretty satisfied the issue is now over, but is still urging caution for users...

"Some requests may be throttled while we work toward full resolution. Additionally, some services are continuing to work through a backlog of events such as Cloudtrail and Lambda. While most operations are recovered, requests to launch new EC2 instances (or services that launch EC2 instances such as ECS) in the US-EAST-1 Region are still experiencing increased error rates. We continue to work toward full resolution. If you are still experiencing an issue resolving the DynamoDB service endpoints in US-EAST-1, we recommend flushing your DNS caches. We will provide an update by 4:15 AM, or sooner if we have additional information to share."
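If you want to check whether an endpoint currently resolves from your own machine, a quick Python sketch along these lines would do it. This is an illustrative helper, not an AWS tool: the function name `can_resolve` and its `resolver` parameter are assumptions for this example, and it only tests name resolution, not whether the service itself is healthy. Flushing a DNS cache is a separate, operating-system-specific step that this snippet does not perform.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves to at least one address.

    Hypothetical helper for illustration; `resolver` is injectable so the
    check can be exercised without touching the network.
    """
    try:
        # Port 443 is used because AWS service endpoints are reached over HTTPS.
        return len(resolver(hostname, 443)) > 0
    except socket.gaierror:
        # Raised by getaddrinfo when the name cannot be resolved.
        return False


# Example usage (needs network access), with the regional DynamoDB endpoint:
# print(can_resolve("dynamodb.us-east-1.amazonaws.com"))
```

A `False` result after services have recovered may point to a stale local or upstream DNS cache, which is the case AWS's advice above is aimed at.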

The good news is that this incident doesn't seem to have been a cyberattack - but instead, a case of AWS' own systems suffering under their own weight.

"In this case it looks like it is an IT issue on the database side and they will be working to remedy it as an absolute priority."

AWS is now tying up all the loose ends - its latest update notes, "We are continuing to work towards full recovery for EC2 launch errors (which may manifest as an Insufficient Capacity Error). Additionally, we continue to work toward mitigation for elevated polling delays for Lambda, specifically for Lambda Event Source Mappings for SQS."

And in more "good" news - Zoom is now showing zero issues or problems, so you can connect and chat to your heart's content!

Zoom status page

(Image credit: Zoom)

As everything now seems to be back in order, we're going to take a quick break - but we'll keep monitoring for any further updates or issues, particularly as the US comes online over the next few hours - fingers crossed this doesn't cause any more issues!

Welcome back - we're still monitoring for any knock-on effects of this morning's AWS outage as the US comes online.

Fortunately though, all key customers depending on AWS do seem to be up and running again, with Slack, Zoom, Canva and more all working as expected.

What we feared might happen...has happened.

"We are investigating and will provide further update in 30 minutes or soon if we have additional information."

Downdetector is showing some slight increases in reports on AWS, plus a number of other services - including Zoom, Jira, Trello and more.

Belay that warning - AWS has already updated to say, "We are seeing early signs of recovery for the connectivity issues and are continuing to investigate the root cause."

We may be edging closer to a solution - AWS notes, "We have identified that the issue originated from within the EC2 internal network."

More from AWS on the root causes - "We have narrowed down the source of the network connectivity issues that impacted AWS Services. The root cause is an underlying internal subsystem responsible for monitoring the health of our network load balancers. We are throttling requests for new EC2 instance launches to aid recovery and actively working on mitigations."

And another quick update - it's great to have such transparency from AWS on this.

The journey towards a fix is continuing, with another significant step forward.

"Lambda is experiencing function invocation errors because an internal subsystem was impacted by the network load balancer health checks. We are taking steps to recover this internal Lambda system. For EC2 launch instance failures, we are in the process of validating a fix and will deploy to the first AZ as soon as we have confidence we can do so safely."

Elsewhere, Amazon has released a statement on its public press page - however there's no new information as far as we can see, just a truncated summary of what's happened already today, and what the cause was.

Amazon promised a further technical update - and here it is...

So the end may (finally) be in sight!

Things are continuing to improve, with the latest AWS update reading as follows:

"We are also experiencing significant improvements to Lambda invocation errors, especially when creating new execution environments (including for Lambda@Edge invocations)"

And that (as they say) should be that. AWS' latest update confirms, "We continue to observe recovery across all AWS services, and instance launches are succeeding across multiple Availability Zones in the US-EAST-1 Regions."
