Disaster Recovery & Identity: Entra ID can be a Single Point of Failure

When organisations plan for disaster recovery, they often start with infrastructure: servers, storage and virtual machines, the systems that host applications and hold data. While this is a logical place to begin, it's not the complete picture.

There is a dependency that now underlies almost every system that users access, every application businesses depend on, and every administrative action IT teams can take. It used to live on a server in your data centre, but no longer. It isn't always backed up as part of a standard DR runbook, and it's frequently the last thing tested in a recovery scenario.

That dependency is identity, and in many UK businesses, it is controlled by Microsoft Entra ID. This article explores why Entra ID has become the control plane of the modern organisation, what happens when it isn't included in disaster recovery services and planning, and what identity-resilient recovery looks like in practice.

What is Microsoft Entra ID?

Microsoft Entra ID, previously known as Azure Active Directory (Azure AD), is Microsoft's cloud-based identity and access management platform. It is the system that controls who can authenticate to your applications, who has administrative rights, how conditional access policies are enforced, and how services are permitted to access data.

If your organisation uses Microsoft 365, Azure, or any modern SaaS application with single sign-on, Entra ID is almost certainly involved. In many organisations, it is present in every login, every service-to-service interaction, and every privileged operation, whether users realise it or not.

That reach is what makes it so critical to disaster recovery. It is the layer through which access to your entire operating environment is granted, controlled, and revoked. When it functions normally, it's invisible. When it doesn't - or when it's recovered incompletely - the consequences are often felt immediately, and broadly.

The role of Entra ID in modern IT environments

Understanding why Entra ID matters so much to recovery requires understanding how deeply it's embedded in day-to-day operations, and most organisations underestimate this until they're under pressure.

Authentication for Microsoft 365 and Azure:

Every user accessing Teams, Outlook, SharePoint, or any Azure-hosted service authenticates through Entra ID. If the authentication layer isn't available or correctly configured, users cannot access these services, regardless of whether the underlying data or infrastructure has been restored.

SaaS single sign-on dependencies:

Modern SaaS stacks rely on federated identity. HR platforms, CRM tools, project management software, and finance systems - many of these are configured to authenticate through Entra ID via SAML or OpenID Connect. If Entra ID isn't correctly in place during recovery, access to these applications may fail even if those applications themselves are entirely unaffected by the incident.

Conditional access and MFA enforcement:

Conditional access policies govern how users can authenticate, from which devices, from which locations, and under which conditions. In a recovery scenario, the environment changes: devices may be different and locations may be unusual. If policies haven't been validated as part of DR planning, they can block legitimate access to recovered systems at exactly the moment it's needed.

Service-to-service authentication:

Most modern applications don't just rely on user authentication; they rely on application identities, managed identities, and service principals configured within Entra ID. These are the credentials that allow systems to communicate with each other automatically. If these identities aren't restored, or if their permissions have changed, application integrations break even when the applications themselves appear healthy.

Privileged role management:

Global Administrator, Security Administrator and Exchange Administrator are privileged roles in Entra ID, and their assignments determine who has the access needed to conduct recovery operations. If role assignments have been tampered with, corrupted, or simply not documented, the ability to manage the recovery environment itself may be compromised.

Identity risks in cyber incident recovery and ransomware

The identity risks in a standard infrastructure failure are significant, yet in a cyber incident, particularly ransomware, they are considerably more complex.

Ransomware actors rarely move directly to file encryption. Before any payload is deployed, the most capable threat actors spend time in an environment mapping it, escalating privileges, and establishing persistence. Entra ID is a particularly attractive target during this phase because control over identity means control over access, including access to the recovery environment itself.

This creates a set of risks that aren't resolved simply by restoring infrastructure from backup.

Identity as an attack vector:

Compromised accounts, including privileged administrator accounts, may have been used to stage the attack. Restoring an environment while those accounts remain active risks restoring into a position of continued exposure.

Privilege escalation and persistence:

Attackers may create new accounts, modify role assignments, or add credentials to existing service principals before triggering the primary incident. These changes can survive infrastructure recovery if Entra ID isn't restored to a known-good state.
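
One way to detect changes like these is to compare the current role assignments against a baseline captured when the environment was known to be good. The following is a minimal sketch in Python, assuming both snapshots have already been exported as simple (principal, role) pairs. The `RoleAssignment` type, the `diff_role_assignments` helper, and the account names are hypothetical and not part of any Microsoft API.

```python
from typing import NamedTuple


class RoleAssignment(NamedTuple):
    principal: str  # user or service principal name (hypothetical values below)
    role: str       # directory role name


def diff_role_assignments(baseline, current):
    """Return (added, removed) assignments relative to the known-good baseline."""
    baseline_set, current_set = set(baseline), set(current)
    added = sorted(current_set - baseline_set)
    removed = sorted(baseline_set - current_set)
    return added, removed


# Illustrative snapshots: the baseline was captured before the incident.
baseline = [
    RoleAssignment("alice@example.com", "Global Administrator"),
    RoleAssignment("bob@example.com", "Security Administrator"),
]
current = [
    RoleAssignment("alice@example.com", "Global Administrator"),
    RoleAssignment("bob@example.com", "Security Administrator"),
    RoleAssignment("svc-sync@example.com", "Global Administrator"),
]

added, removed = diff_role_assignments(baseline, current)
for item in added:
    print(f"UNEXPECTED: {item.principal} holds {item.role}")
for item in removed:
    print(f"MISSING: {item.principal} no longer holds {item.role}")
```

Any assignment that appears in the current snapshot but not in the baseline is a candidate for investigation before recovery proceeds.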

Validating identity integrity before failover:

Failing over to a recovery environment without first confirming the integrity of your identity platform may simply be handing control of a clean environment to an attacker who still has a foothold in identity. This is why identity validation must be an explicit step in cyber incident recovery - not an assumption made after the fact.

Segregating recovery control planes:

Best-practice recovery design separates the administrative controls used to manage the recovery environment from those of the production environment. This ensures that if production identity is compromised, the integrity of recovery operations isn't threatened by the same access paths.

Designing identity-resilient disaster recovery

Addressing identity within disaster recovery isn't about adding complexity. It's about recognising a dependency that already exists and planning for it with the same attention applied to infrastructure and data.

Mapping identity dependencies:

The starting point is understanding which systems, applications, and services depend on Entra ID and how. User authentication, service principals, managed identities, and external federated relationships all need to be documented. Without this map, recovery sequencing will be incomplete.
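
A dependency map of this kind can also be used to derive a recovery sequence. The sketch below is one possible approach, assuming the map has been written down as a simple dictionary in which each item lists what it depends on. The system names are illustrative only, and the ordering uses Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Each key depends on every item in its set. All names are hypothetical.
dependencies = {
    "entra-id": set(),
    "conditional-access": {"entra-id"},
    "microsoft-365": {"entra-id", "conditional-access"},
    "hr-platform-sso": {"entra-id"},
    "finance-integration": {"entra-id", "hr-platform-sso"},
}

# static_order() yields each item only after everything it depends on.
recovery_order = list(TopologicalSorter(dependencies).static_order())

for step, item in enumerate(recovery_order, start=1):
    print(f"{step}. {item}")
```

Because every other entry depends on it, the identity platform always comes first in the resulting sequence. If the map contains a circular dependency, `TopologicalSorter` raises `CycleError`, which is itself a useful finding.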

Validating access during DR testing:

A DR test that confirms infrastructure is online but doesn't validate that users can actually log in, that applications can authenticate, and that administrative access works correctly is an incomplete test. Access validation should be a formal checkpoint in every recovery exercise.
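
To make access validation a formal checkpoint rather than an informal spot check, the checks can be recorded and run as a named list. The sketch below shows one possible shape for this in Python. The `run_access_checkpoint` helper and the three checks are hypothetical placeholders; in a real exercise each would be replaced by an actual sign-in or token request against the recovery environment.

```python
from typing import Callable


def run_access_checkpoint(checks: dict[str, Callable[[], bool]]) -> dict[str, bool]:
    """Run each named check and record pass/fail; an exception counts as a fail."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


# Placeholder checks with hard-coded outcomes, for illustration only.
checks = {
    "test user can sign in": lambda: True,
    "application identity can obtain a token": lambda: True,
    "recovery admin can list role assignments": lambda: False,
}

results = run_access_checkpoint(checks)
checkpoint_passed = all(results.values())

for name, passed in results.items():
    print(f"{'PASS' if passed else 'FAIL'}: {name}")
print("Checkpoint passed" if checkpoint_passed else "Checkpoint failed")
```

The design point is that the exercise is only signed off when every access check passes, not when the infrastructure reports healthy.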

Segregated administrative controls:

Recovery environments should have their own administrative accounts, credentials, and access paths isolated from production identity and stored securely outside the systems that might be affected by an incident. This isn't just good practice for ransomware scenarios; it's good practice for any event that might leave production identity in an uncertain state.

Documentation and change management:

Entra ID configurations, including conditional access policies, role assignments, service principals and application registrations, all change over time. If those changes aren't documented and reflected in DR plans, there's a growing gap between what a recovery exercise validated and what the live environment actually requires. Keeping identity documentation current is just as important as keeping infrastructure runbooks current.

Continuous validation of identity resilience:

Just as DR testing should follow change rather than just a calendar, identity validation should be triggered whenever material changes are made to Entra ID configurations. A significant change to a conditional access policy, a change in privileged role assignments, or a new application integration are all reasons to confirm that recovery still works as expected.
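
A simple way to tie validation to change is to fingerprint the identity configuration that the last recovery test validated and compare it with the live configuration. The following is a minimal sketch, assuming the relevant settings can be exported as a JSON-serialisable dictionary. The `fingerprint` helper and the policy and role names are hypothetical.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Hash a configuration snapshot in a key-order-independent way."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Snapshot recorded at the last successful recovery test (illustrative).
validated = {
    "conditional_access": {"Require MFA for admins": "enabled"},
    "privileged_roles": {"Global Administrator": ["alice@example.com"]},
}

# Current configuration: a new policy has been added since the test.
live = {
    "conditional_access": {
        "Require MFA for admins": "enabled",
        "Block sign-ins outside UK": "enabled",
    },
    "privileged_roles": {"Global Administrator": ["alice@example.com"]},
}

needs_revalidation = fingerprint(validated) != fingerprint(live)
print("Revalidate recovery" if needs_revalidation else "No identity changes since last test")
```

A fingerprint only signals that something changed, not what changed, so it works best as a trigger for a fuller review.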

Common identity failures during disaster events

Across both planned recovery tests and live incidents, identity-related dependencies can extend recovery timelines. These failures are also easy to overlook, because access control often does not appear in traditional infrastructure monitoring.

Conditional access misalignment:

Policies configured for normal operating conditions can block access in a recovery scenario. Users on unfamiliar devices or from unusual locations are denied authentication even though the systems they need are fully available.
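
The effect is easy to reproduce with a toy model. The sketch below is a deliberately simplified illustration and not a representation of how Entra ID evaluates conditional access. The `Policy` and `SignIn` types, their fields, and the example values are all assumptions made for the purpose of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    allowed_countries: frozenset
    require_compliant_device: bool


@dataclass(frozen=True)
class SignIn:
    user: str
    country: str
    device_compliant: bool


def blocking_reasons(policy: Policy, sign_in: SignIn) -> list:
    """Return the reasons this policy would block the sign-in (empty if allowed)."""
    reasons = []
    if sign_in.country not in policy.allowed_countries:
        reasons.append(f"location {sign_in.country} not allowed")
    if policy.require_compliant_device and not sign_in.device_compliant:
        reasons.append("device is not compliant")
    return reasons


policy = Policy("Office staff baseline", frozenset({"GB"}), require_compliant_device=True)

# Same user, first under normal conditions, then from a hypothetical
# recovery site abroad on an unmanaged loan laptop.
normal = SignIn("alice@example.com", "GB", device_compliant=True)
recovery = SignIn("alice@example.com", "IE", device_compliant=False)

print(blocking_reasons(policy, normal))
print(blocking_reasons(policy, recovery))
```

Running expected recovery-time sign-ins through the real policy set during DR planning serves the same purpose: it surfaces blocks before they occur in a live incident.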

MFA enforcement issues:

Multi-factor authentication is an important security control, but during a recovery event, the devices and authentication methods users normally rely on may not be available. If recovery procedures don't account for this, MFA enforcement becomes a barrier to recovery rather than a protection against a threat.

Expired certificates:

Application registrations and service principals in Entra ID rely on certificates and secrets that have expiry dates. These are routinely managed in normal operations, but they're easy to miss in recovery scenarios, particularly if they expire while a recovery is in progress or if documentation is out of date.
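
Expiry dates lend themselves to an automated check ahead of a recovery exercise. The sketch below assumes application registrations have already been exported as dictionaries whose field names (`passwordCredentials`, `keyCredentials`, `endDateTime`) are modelled loosely on the Microsoft Graph application resource. The `expiring_credentials` helper, the 30-day window, and the sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def expiring_credentials(apps, now, window=timedelta(days=30)):
    """Return (app, credential, status) for secrets and certificates that
    have expired or will expire within the given window."""
    findings = []
    for app in apps:
        creds = app.get("passwordCredentials", []) + app.get("keyCredentials", [])
        for cred in creds:
            # Normalise a trailing "Z" so older Python versions can parse it.
            end = datetime.fromisoformat(cred["endDateTime"].replace("Z", "+00:00"))
            if end <= now + window:
                status = "expired" if end <= now else "expiring"
                findings.append((app["displayName"], cred["displayName"], status))
    return findings


# Illustrative export; names and dates are invented.
apps = [
    {
        "displayName": "HR Connector",
        "passwordCredentials": [
            {"displayName": "client-secret", "endDateTime": "2025-05-01T00:00:00Z"}
        ],
        "keyCredentials": [],
    },
    {
        "displayName": "Finance Sync",
        "passwordCredentials": [],
        "keyCredentials": [
            {"displayName": "signing-cert", "endDateTime": "2025-06-15T00:00:00Z"}
        ],
    },
    {
        "displayName": "CRM SSO",
        "passwordCredentials": [
            {"displayName": "client-secret", "endDateTime": "2026-01-01T00:00:00Z"}
        ],
    },
]

# A fixed reference time keeps the example reproducible.
now = datetime(2025, 6, 1, tzinfo=timezone.utc)
findings = expiring_credentials(apps, now)
for app_name, cred_name, status in findings:
    print(f"{status.upper()}: {app_name} / {cred_name}")
```

Choosing a window longer than the expected duration of a recovery helps catch credentials that would lapse while the recovery is still in progress.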

Undocumented identity recovery procedures:

Organisations that have recovery procedures for infrastructure but no equivalent documentation for identity are at risk. When something goes wrong with Entra ID during recovery, the team has no established process to follow, and the resulting delay compounds the impact of the original incident.

Recovery is about access, not just infrastructure

Recovery is only complete when users and systems can access what they need to operate. Infrastructure that is online but inaccessible isn't recovery - it's a different kind of outage.

Entra ID is the layer that determines whether access is possible. Identity-first recovery thinking means including Entra ID in your disaster recovery scope from the outset: mapping its dependencies, validating its integrity as part of every test, and ensuring it can be restored to a known-good state when it matters.

For organisations already using Microsoft 365 or Azure, the tools to protect identity exist. Our CloudCover 365 service offers Microsoft Entra ID backup as part of a comprehensive approach to Microsoft 365 resilience, ensuring that identity data is protected, restorable, and validated alongside the rest of your Microsoft environment.

 


Let's talk

If you'd like to understand how identity fits into your current DR posture, our team is ready to help you map the gap between where you are today and where resilience requires you to be. Get in touch to speak directly with our experts.
