After investigating the November 19 worldwide multi-factor-authentication outage, Microsoft's Azure team has revealed the root causes of the disruption that affected a number of its users.
The team has discovered three independent root causes, along with monitoring gaps, that left Azure, Office 365, Dynamics and other Microsoft users unable to authenticate for most of that day.
Microsoft's Azure Active Directory Multi-Factor Authentication (MFA) services were down for many customers for 14 hours on November 19, and because Office 365 and Dynamics also use this service to authenticate, their users were affected too.
The first root cause appeared as a latency issue in the MFA front-end's communication to its cache services. The second was a race condition in processing responses from the MFA back-end server. A code update rollout, which began in some data centres on Tuesday November 13, was responsible for these two causes.
A third root cause, which was triggered by the second, left the MFA back-end unable to process any more requests from the front-end, even though Microsoft's monitoring showed it to be working correctly.
Future MFA improvements
European, Middle Eastern and African (EMEA) and Asia-Pacific (APAC) customers were the first users to be affected by these issues. However, as the day continued, Western European and then later American data centres were hit.
Microsoft has laid out a series of next steps to further improve its MFA service, including a review of its update-deployment procedures, a review of its monitoring services, a review of the containment process and an update to the communications process for the Service Health Dashboard and monitoring tools.
The company plans to have most of these steps completed by December, with the exception of its containment process review, which it aims to complete by January.