AWS outage - Zoom, Slack, Canva among work apps hit by major issues

Work apps, banking service and more all affected in AWS outage

AWS re:Invent 2021 sign
(Image: © Future / Mike Moore)

If you're struggling to connect to your work apps today, you aren't alone - Amazon Web Services (AWS) is suffering a major outage.

After an issue in one of its US regions overnight, users in Europe have been unable to access services such as Zoom, Slack and Canva this morning - all of which run on AWS systems.

Follow our live updates below...

We may be edging closer to a solution - AWS notes, "We have identified that the issue originated from within the EC2 internal network."

"We continue to investigate and identify mitigations."

Belay that warning - AWS has already updated to say, "We are seeing early signs of recovery for the connectivity issues and are continuing to investigate the root cause."

What a day it's been!

What we feared might happen...has happened.

The latest update from AWS' status page notes, "We can confirm significant API errors and connectivity issues across multiple services in the US-EAST-1 Region."

"We are investigating and will provide further update in 30 minutes or soon if we have additional information."

DownDetector is showing some slight increases in reports on AWS, plus a number of other services - including Zoom, Jira, Trello and more.

Fortunately though, all key customers depending on AWS do seem to be up and running again, with Slack, Zoom, Canva and more all working as expected.

So bad luck - no Monday lie-in today!

Welcome back - we're still monitoring for any knock-on effects of this morning's AWS outage as the US comes online.

AWS says it is still hard at work on the issue, with its status page noting, "Oct 20 4:48 AM PDT We continue to work to fully restore new EC2 launches in US-EAST-1. We recommend EC2 Instance launches that are not targeted to a specific Availability Zone (AZ) so that EC2 has flexibility in selecting the appropriate AZ. The impairment in new EC2 launches also affects services such as RDS, ECS, and Glue. We also recommend that Auto Scaling Groups are configured to use multiple AZs so that Auto Scaling can manage EC2 instance launches automatically."

As everything now seems to be back in order, we're going to take a quick break. We'll keep monitoring for any further updates, particularly as the US comes online over the next few hours - fingers crossed that doesn't cause any more issues!

And in more "good" news - Zoom is now showing zero issues or problems, so you can connect and chat to your heart's content!

Zoom status page

(Image credit: Zoom)

AWS is now tying up all the loose ends - its latest update notes, "We are continuing to work towards full recovery for EC2 launch errors (which may manifest as an Insufficient Capacity Error). Additionally, we continue to work toward mitigation for elevated polling delays for Lambda, specifically for Lambda Event Source Mappings for SQS."

The good news is that this incident doesn't seem to have been a cyberattack - but instead, a case of AWS' own systems suffering under their own weight.

Rafe Pilling, Director of Threat Intelligence at Sophos, told us, "When anything like this happens the concern that it's a cyber incident is understandable. AWS has a far reaching and intricate footprint, so any issue can cause a major upset.

"In this case it looks like it is an IT issue on the database side and they will be working to remedy it as an absolute priority."

Here's a seemingly final update from AWS - the company is pretty satisfied the issue is now over, but is still urging caution for users...

"The underlying DNS issue has been fully mitigated, and most AWS Service operations are succeeding normally now," the company says.

"Some requests may be throttled while we work toward full resolution. Additionally, some services are continuing to work through a backlog of events such as Cloudtrail and Lambda. While most operations are recovered, requests to launch new EC2 instances (or services that launch EC2 instances such as ECS) in the US-EAST-1 Region are still experiencing increased error rates. We continue to work toward full resolution. If you are still experiencing an issue resolving the DynamoDB service endpoints in US-EAST-1, we recommend flushing your DNS caches. We will provide an update by 4:15 AM, or sooner if we have additional information to share."

Outage reports for Slack, Zoom, Canva and Xero have all basically fallen to nothing, although the status pages for the first two are still showing some issues, so we'll stay tuned for anything happening there...

Good news - AWS now thinks it has solved this issue, and services should be returning to normal very soon.

"We continue to observe recovery across most of the affected AWS Services," it says on its status page. "We can confirm global services and features that rely on US-EAST-1 have also recovered. We continue to work towards full resolution and will provide updates as we have more information to share."

AWS has updated the severity status of the issues to "degraded" on its status page - which again could mean a solution is imminent...

However, it's worth noting that the east coast of the US is about to wake up and log on - could this affect the recovery?

We're not sure exactly what happened at Slack - but it's suddenly just had another major spike in outage reports.

The status page is still showing widespread issues, so it may simply be more users logging on and seeing problems - or possibly something more?

Downdetector slack outage

(Image credit: Downdetector)

Some expert insight from James Capell, our Editor on Web Hosting here at TechRadar Pro...

"The outage appears to be caused by a DNS resolution error for DynamoDB in the US-EAST-1 region. The DynamoDB database is used for many core AWS services including IAM which is used for permissions. The DNS error means that this database service cannot be accessed by the services that require it to function. Since most AWS services rely on this service somewhere in the chain we’re seeing a lot of problems."
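For readers who want to see what a DNS resolution failure like this looks like from the client side, here is a minimal Python sketch. It is an illustration only, not AWS tooling: the `can_resolve` helper and its `resolver` parameter are hypothetical names introduced for this example, and `dynamodb.us-east-1.amazonaws.com` is used as the regional endpoint to check.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves to at least one address.

    `resolver` is injectable (a hypothetical parameter for this sketch)
    so the check can be exercised without touching the network.
    """
    try:
        # getaddrinfo raises socket.gaierror when the name cannot be resolved
        return len(resolver(hostname, 443)) > 0
    except socket.gaierror:
        return False


if __name__ == "__main__":
    endpoint = "dynamodb.us-east-1.amazonaws.com"
    status = "resolves" if can_resolve(endpoint) else "does NOT resolve"
    print(f"{endpoint} {status}")
```

If the name does not resolve, any service that depends on that endpoint cannot even open a connection, which is why a single DNS problem can ripple through so many products at once.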

Slack and Zoom are both still reporting issues on their respective status pages, but both have promised an update within the next 30 minutes.

As you can see from the screenshot below, DownDetector is showing a rapid drop-off in reports...

DownDetector AWS outage

(Image credit: DownDetector)

And just like that - another update from AWS, and it's good news for all of us wanting to get on with work.

"Oct 20 2:27 AM PDT We are seeing significant signs of recovery. Most requests should now be succeeding. We continue to work through a backlog of queued requests. We will continue to provide additional information."

A new update from AWS - "Oct 20 2:22 AM PDT We have applied initial mitigations and we are observing early signs of recovery for some impacted AWS Services. During this time, requests may continue to fail as we work toward full resolution. We recommend customers retry failed requests."

"While requests begin succeeding, there may be additional latency and some services will have a backlog of work to work through, which may take additional time to fully process. We will continue to provide updates as we have more information to share, or by 3:15 AM."
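AWS's advice to retry failed requests is usually implemented with exponential backoff, so that clients don't hammer a service that is still recovering. The following is a minimal Python sketch of that general pattern, not AWS's own code: `retry_with_backoff` and its parameters are hypothetical names chosen for this example, and the delay values are arbitrary.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5,
                       max_delay=8.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    All names and defaults here are illustrative assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                # Out of attempts: surface the last error to the caller
                raise
            # Double the delay ceiling on each attempt, capped at max_delay,
            # then pick a random wait below it ("full jitter") so that many
            # clients do not all retry at the same instant
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

In practice you would pass in a function that makes the failing request, and catch only the errors that are safe to retry rather than every exception.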

Outage reports are now falling from their peak at both Slack and Zoom, but it seems like issues still persist across the board.

Zoom Slack outage Aws

(Image credit: Future / Mike Moore)

My Slack access has just totally collapsed, meaning I can't contact my team or find out what they're working on - will no-one think of the poor editors?

Over at Zoom, it seems several parts of the platform have been affected, with its status page reporting several issues.

Zoom Chat, file transfers, Zoom Clips and Zoom Contact Center are among the services showing "degraded performance".

It's not just Slack and Zoom - DownDetector is also showing issues for other workplace tools, with Asana, Atlassian, Xero and Jira all affected (although reports do seem to be falling now).

According to AWS's own status page, the issue seems to stem from Amazon DynamoDB, which is the company's managed NoSQL database platform - an important building block for many customers and apps.

Slack is one of the hardest-hit services, with issues across the board.

We're Slack users here at TechRadar Pro, and have seen issues sending messages, links and more - so you're not alone.

Slack outage AWS issues

(Image credit: Future / Mike Moore)

We've since seen major outages for a number of consumer-focused services, along with work-focused tools - outage tracker site DownDetector is showing the following...

A Downdetector graph showing outages relating to Amazon Web Services

(Image credit: Downdetector)

Welcome to our live coverage of this major AWS outage.

The issues seemed to start in the early hours of Monday morning, with AWS' US-EAST-1 Region seeing problems, which have caused a knock-on effect across the globe.