AWS outage - Amazon nears solution as Zoom, Slack, Canva among work apps hit
AWS outage takes down many major work apps

If you're still struggling to connect to your work apps today, you aren't alone - Amazon Web Services (AWS) is suffering a major outage.
After an issue in one of its US regions overnight, users in Europe have been unable to access services such as Zoom, Slack and Canva this morning, all of which run on AWS systems. The issue has now spread to the US, where users are reporting problems, although Amazon claims a fix is coming soon.
Follow our live updates below...
Things are continuing to improve, with the latest AWS update reading as follows:
"Our mitigations to resolve launch failures for new EC2 instances continue to progress and we are seeing increased launches of new EC2 instances and decreasing networking connectivity issues in the US-EAST-1 Region."
"We are also experiencing significant improvements to Lambda invocation errors, especially when creating new execution environments (including for Lambda@Edge invocations)"
Amazon promised a further technical update - and here it is...
"Our mitigations to resolve launch failures for new EC2 instances are progressing and the internal subsystems of EC2 are now showing early signs of recovering in a few Availability Zones (AZs) in the US-EAST-1 Region. We are applying mitigations to the remaining AZs at which point we expect launch errors and network connectivity issues to subside."
So the end may (finally) be in sight!
Elsewhere, Amazon has released a statement on its public press page - however there's no new information as far as we can see, just a truncated summary of what's happened already today, and what the cause was.
The journey towards a fix is continuing, with another significant step forward.
"We continue to apply mitigation steps for network load balancer health and recovering connectivity for most AWS services," the latest update reads.
"Lambda is experiencing function invocation errors because an internal subsystem was impacted by the network load balancer health checks. We are taking steps to recover this internal Lambda system. For EC2 launch instance failures, we are in the process of validating a fix and will deploy to the first AZ as soon as we have confidence we can do so safely."
And another quick update - it's great to have such transparency from AWS on this.
"We have taken additional mitigation steps to aid the recovery of the underlying internal subsystem responsible for monitoring the health of our network load balancers and are now seeing connectivity and API recovery for AWS services. We have also identified and are applying next steps to mitigate throttling of new EC2 instance launches."
More from AWS on the root causes - "We have narrowed down the source of the network connectivity issues that impacted AWS Services. The root cause is an underlying internal subsystem responsible for monitoring the health of our network load balancers. We are throttling requests for new EC2 instance launches to aid recovery and actively working on mitigations."
We may be edging closer to a solution - AWS notes, "We have identified that the issue originated from within the EC2 internal network."
"We continue to investigate and identify mitigations."
Belay that warning - AWS has already updated to say, "We are seeing early signs of recovery for the connectivity issues and are continuing to investigate the root cause."
What a day it's been!
What we feared might happen...has happened.
The latest update from AWS' status page notes, "We can confirm significant API errors and connectivity issues across multiple services in the US-EAST-1 Region."
"We are investigating and will provide further update in 30 minutes or soon if we have additional information."
DownDetector is showing some slight increases in reports on AWS, plus a number of other services - including Zoom, Jira, Trello and more.
Fortunately though, all key customers depending on AWS do seem to be up and running again, with Slack, Zoom, Canva and more all working as expected.
So bad luck - no Monday lie-in today!
Welcome back - we're still monitoring for any knock-on effects of this morning's AWS outage as the US comes online.
AWS says it is still hard at work on the issue, with its status page noting, "Oct 20 4:48 AM PDT We continue to work to fully restore new EC2 launches in US-EAST-1. We recommend EC2 Instance launches that are not targeted to a specific Availability Zone (AZ) so that EC2 has flexibility in selecting the appropriate AZ. The impairment in new EC2 launches also affects services such as RDS, ECS, and Glue. We also recommend that Auto Scaling Groups are configured to use multiple AZs so that Auto Scaling can manage EC2 instance launches automatically."
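For readers who script their own EC2 launches, here is a rough sketch of what "not targeted to a specific Availability Zone" can look like in practice. It only builds a dictionary shaped like the keyword arguments of boto3's EC2 `run_instances` call. The helper name `build_launch_params` and the image ID are our own placeholders, not anything AWS has published, and it assumes you are not also passing a subnet, since a subnet pins the launch to that subnet's AZ.

```python
def build_launch_params(image_id, instance_type, availability_zone=None):
    """Build run_instances-style keyword arguments (hypothetical helper).

    When availability_zone is None, no Placement entry is included,
    which leaves EC2 free to choose the AZ itself.
    """
    params = {
        "ImageId": image_id,            # placeholder value supplied by the caller
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
    }
    if availability_zone is not None:
        # Pinning to one AZ removes the flexibility AWS is recommending here.
        params["Placement"] = {"AvailabilityZone": availability_zone}
    return params
```

With boto3 installed and credentials configured, the result could be passed along as `ec2_client.run_instances(**params)`, although we haven't run this against a live account during the outage.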
As everything now seems to be back in order, we're going to take a quick break - but we'll keep monitoring for any further updates or issues, particularly as the US comes online over the next few hours - fingers crossed this doesn't cause any more issues!
And in more "good" news - Zoom is now showing zero issues or problems, so you can connect and chat to your heart's content!
AWS is now tying up all the loose ends - its latest update notes, "We are continuing to work towards full recovery for EC2 launch errors (which may manifest as an Insufficient Capacity Error). Additionally, we continue to work toward mitigation for elevated polling delays for Lambda, specifically for Lambda Event Source Mappings for SQS."
The good news is that this incident doesn't seem to have been a cyberattack - but instead, a case of AWS' own systems suffering under their own weight.
Rafe Pilling, Director of Threat Intelligence at Sophos, told us, "When anything like this happens the concern that it's a cyber incident is understandable. AWS has a far reaching and intricate footprint, so any issue can cause a major upset.
"In this case it looks like it is an IT issue on the database side and they will be working to remedy it as an absolute priority."
Here's a seemingly-final update from AWS - the company is pretty satisfied the issue is now over, but is still urging caution for users...
"The underlying DNS issue has been fully mitigated, and most AWS Service operations are succeeding normally now," the company says.
"Some requests may be throttled while we work toward full resolution. Additionally, some services are continuing to work through a backlog of events such as Cloudtrail and Lambda. While most operations are recovered, requests to launch new EC2 instances (or services that launch EC2 instances such as ECS) in the US-EAST-1 Region are still experiencing increased error rates. We continue to work toward full resolution. If you are still experiencing an issue resolving the DynamoDB service endpoints in US-EAST-1, we recommend flushing your DNS caches. We will provide an update by 4:15 AM, or sooner if we have additional information to share."
Outage reports for Slack, Zoom, Canva and Xero have all basically fallen to nothing, although the status pages for the first two are still showing some issues, so we'll stay tuned for anything happening there...
Good news - AWS now thinks it has solved this issue, and services should be returning to normal very soon.
"We continue to observe recovery across most of the affected AWS Services," it says on its status page. "We can confirm global services and features that rely on US-EAST-1 have also recovered. We continue to work towards full resolution and will provide updates as we have more information to share."
AWS has updated the severity status of the issues to "degraded" on its status page - which again could mean a solution is imminent...
However it's worth noting that the east coast of the US is about to wake up and log on - could this affect the recovery?
We're not sure exactly what happened at Slack - but it's suddenly just had another major spike in outage reports.
The status page is still showing widespread issues, so it may simply be more users logging on and seeing problems - or possibly something more?
Some expert insight from James Capell, our Editor on Web Hosting here at TechRadar Pro...
"The outage appears to be caused by a DNS resolution error for DynamoDB in the US-EAST-1 region. The DynamoDB database is used for many core AWS services including IAM which is used for permissions. The DNS error means that this database service cannot be accessed by the services that require it to function. Since most AWS services rely on this service somewhere in the chain we’re seeing a lot of problems."
Slack and Zoom are both still reporting issues on their respective status pages, but both have promised an update within the next 30 minutes.
As you can see from the screenshot below, DownDetector is showing a rapid drop-off in reports...
And just like that - another update from AWS, and it's good news for all of us wanting to get on with work.
"Oct 20 2:27 AM PDT We are seeing significant signs of recovery. Most requests should now be succeeding. We continue to work through a backlog of queued requests. We will continue to provide additional information."
A new update from AWS - "Oct 20 2:22 AM PDT We have applied initial mitigations and we are observing early signs of recovery for some impacted AWS Services. During this time, requests may continue to fail as we work toward full resolution. We recommend customers retry failed requests."
"While requests begin succeeding, there may be additional latency and some services will have a backlog of work to work through, which may take additional time to fully process. We will continue to provide updates as we have more information to share, or by 3:15 AM."
Outage reports are now falling from their peak at both Slack and Zoom, but it seems like issues still persist across the board.
My Slack access has just totally collapsed, meaning I can't contact my team or find out what they're working on - will no-one think of the poor editors?
Over at Zoom, it seems several parts of the platform have been affected, with its status page reporting several issues.
Zoom Chat, file transfers, Zoom Clips and Zoom Contact Center are among the services showing "degraded performance".
It's not just Slack and Zoom - DownDetector is also showing issues for other workplace tools, with Asana, Atlassian, Xero and Jira all affected (although reports do seem to be falling now).
According to AWS's own status page, the issue seems to stem from Amazon DynamoDB, which is the company's managed NoSQL database platform - an important building block for many customers and apps.
Slack is one of the hardest-hit services, with issues across the board.
We're Slack users here at TechRadar Pro, and have seen issues sending messages, links and more - so you're not alone.
We've since seen major outages for a number of consumer-focused services, along with work-focused tools - outage tracker site DownDetector is showing the following...
Welcome to our live coverage of this major AWS outage.
The issues seemed to start in the early hours of Monday morning, with AWS' US-EAST-1 Region seeing problems, which have caused a knock-on effect across the globe.