Data Sovereignty: Keeping Files in Your Jurisdiction

In December 2024, Italy's data protection authority (the Garante) fined OpenAI €15 million in a landmark decision, the first major GDPR fine against a generative AI provider. The cited violations included processing personal data for model training without an adequate legal basis, insufficient transparency toward users, and failure to notify the regulator of a March 2023 data breach. Although the Court of Rome overturned the financial penalty on appeal in March 2026, the case established a blueprint for how European regulators intend to audit the data ingestion pipelines of LLM developers.

In March 2026, Microsoft changed a default setting for Microsoft 365 Copilot: new enterprise tenants now have "Flex Routing" enabled by default, meaning Copilot inference requests can be routed to US, Canadian, or Australian data centers when EU capacity is at peak. EU enterprise data has been leaving the EU boundary without administrators necessarily knowing it.

Since January 2026, Anthropic has also been a subprocessor for Microsoft 365 Copilot, and Anthropic's infrastructure is explicitly out of scope for Microsoft's EU Data Boundary. Whenever Copilot sends a request to an Anthropic model, that data leaves the EU regardless of your Flex Routing settings.

These are not hypothetical scenarios. They are the current state of using AI in a globally regulated environment. For security teams in organizations operating across the EU, UK, India, or Israel, or handling US export-controlled technical data, the data sovereignty question around AI tools is urgent, specific, and not yet adequately answered by any hyperscaler.

What Data Sovereignty Actually Means

Data sovereignty is often used interchangeably with data residency, but they're different problems.

Data residency is where data is physically stored. A data center in Frankfurt stores bits on servers in Germany.

Data sovereignty is who has legal authority over that data and under what conditions. The US CLOUD Act (Clarifying Lawful Overseas Use of Data Act) allows US law enforcement to compel US-based providers of communications and cloud services to produce data within their "possession, custody, or control," regardless of where the servers are physically located. A company storing EU data in a Frankfurt data center operated by a US parent corporation is therefore not operating under EU data sovereignty; the data remains within reach of US legal process.
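To make the distinction concrete, the Python sketch below models where data is stored separately from which jurisdictions can compel access to it. The `Provider` class and its fields are hypothetical, and the rule that a US-headquartered parent brings data within CLOUD Act reach is a deliberate simplification for illustration, not a legal test.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Provider:
    """Hypothetical model of a cloud or AI provider (illustrative only)."""
    name: str
    storage_country: str    # data residency: where the servers physically are
    parent_hq_country: str  # where the controlling corporate parent is headquartered

def compelling_jurisdictions(p: Provider) -> set[str]:
    """Jurisdictions whose authorities could plausibly compel access to the data.

    Simplifying assumption: the host country always has authority, and a
    US-headquartered parent brings the data within reach of the US CLOUD Act
    regardless of where the servers sit.
    """
    jurisdictions = {p.storage_country}
    if p.parent_hq_country == "US":
        jurisdictions.add("US")
    return jurisdictions

frankfurt_us_parent = Provider("example-hyperscaler", storage_country="DE", parent_hq_country="US")
frankfurt_eu_parent = Provider("example-eu-host", storage_country="DE", parent_hq_country="DE")

print(sorted(compelling_jurisdictions(frankfurt_us_parent)))  # ['DE', 'US']
print(sorted(compelling_jurisdictions(frankfurt_eu_parent)))  # ['DE']
```

Both providers offer identical data residency; only the second keeps the data under a single jurisdiction.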

This is the core contradiction in "sovereign cloud" offerings from Microsoft Azure, Google Cloud, and AWS. Data residency is what they're describing. Data sovereignty in the legal sense is something they cannot contractually provide, because US law doesn't give them the option.

The CLOUD Act problem means that for any organization whose data is genuinely sensitive under GDPR, ITAR, or India's DPDP Act, using a US-based AI provider creates legal exposure that data residency alone cannot fully resolve.

The Regulatory Landscape: What Each Jurisdiction Requires

GDPR (EU and UK)

GDPR Chapter V governs international data transfers. Sending personal data to a US-based AI API is a Chapter V transfer requiring a legal mechanism: EU-US Data Privacy Framework certification, Standard Contractual Clauses, or Binding Corporate Rules.

The EU-US Data Privacy Framework survived a September 2025 challenge before the EU General Court (Latombe v Commission), but it remains legally fragile: the ruling is open to appeal before the Court of Justice, and a future "Schrems III" decision could invalidate the DPF just as Schrems II invalidated Privacy Shield.

More importantly, SCCs alone are often insufficient for high-risk transfers. French and German data protection authorities have published guidance that organizations processing sensitive data via US AI services must implement "supplementary technical measures": specifically, encryption where the AI provider does not hold the decryption keys. This is the CNIL's explicit position. For most commercial AI API usage, the requirement is hard to satisfy, because the model needs to see plaintext to generate useful responses; client-side encryption meets it only for data the AI is never asked to read.
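The transfer logic described above can be sketched as a small triage function. This is a hypothetical simplification, not legal advice: the `Transfer` fields are illustrative, and the rule that high-risk transfers under SCCs or BCRs need encryption the provider cannot undo is an assumption drawn from the supplementary-measures guidance discussed above.

```python
from dataclasses import dataclass
from enum import Enum

class Mechanism(Enum):
    NONE = "none"
    DPF = "dpf"  # recipient certified under the EU-US Data Privacy Framework
    SCC = "scc"  # Standard Contractual Clauses
    BCR = "bcr"  # Binding Corporate Rules

@dataclass(frozen=True)
class Transfer:
    mechanism: Mechanism
    high_risk: bool            # e.g. special-category data or processing at scale
    provider_holds_keys: bool  # can the receiving AI provider decrypt the data?

def chapter_v_triage(t: Transfer) -> str:
    """Rough triage of a transfer to a US AI provider (illustrative, not legal advice)."""
    if t.mechanism is Mechanism.NONE:
        return "blocked: no Chapter V transfer mechanism"
    # DPF transfers rest on an adequacy decision; the DPF's own legal fragility is not modeled here.
    if t.mechanism in (Mechanism.SCC, Mechanism.BCR) and t.high_risk and t.provider_holds_keys:
        return "review: supplementary measures needed (e.g. encryption with keys the provider never holds)"
    return "permitted: mechanism in place"

print(chapter_v_triage(Transfer(Mechanism.SCC, high_risk=True, provider_holds_keys=True)))
```

The useful signal is the middle branch: a valid mechanism on paper can still leave a high-risk transfer short of what regulators expect.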

The European Commission renewed its adequacy decision for the UK on December 19, 2025, valid until December 2031, following passage of the UK's Data (Use and Access) Act 2025. However, the CLOUD Act exposure is the same in the UK: a US AI provider with UK-hosted infrastructure is still subject to US government data access demands.

India: Digital Personal Data Protection Act (DPDP)

India's DPDP Rules were notified in November 2025. Cross-border transfers follow a "negative list" model: transfers are permitted to all countries unless the Central Government explicitly restricts them. No such list has been published as of May 2026, so cross-border AI processing can largely continue legally, but consent and transparency requirements apply regardless of destination.
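The difference between a negative-list model and an allow-list model fits in a few lines. A minimal sketch: the function names are hypothetical, the blocked set is an empty placeholder, and `"XX"` stands in for any country the government might list. Consent and transparency obligations are deliberately not modeled, since they apply whatever the destination.

```python
# Placeholder for countries the Central Government restricts under the DPDP Act.
# Empty, reflecting that no such list had been published at the time of writing.
BLOCKED_DESTINATIONS: frozenset[str] = frozenset()

def dpdp_transfer_allowed(destination: str, blocked: frozenset[str] = BLOCKED_DESTINATIONS) -> bool:
    """Negative-list model: every destination is allowed unless explicitly blocked."""
    return destination not in blocked

def allow_list_transfer_allowed(destination: str, approved: frozenset[str]) -> bool:
    """Contrast with an allow-list model: only explicitly approved destinations pass."""
    return destination in approved

print(dpdp_transfer_allowed("US"))                             # True: nothing is blocked yet
print(dpdp_transfer_allowed("XX", blocked=frozenset({"XX"})))  # False once a country is listed
print(allow_list_transfer_allowed("US", frozenset({"SG"})))    # False: not on the approved list
```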

The significant risk for AI users: India is actively pursuing mandatory local storage requirements for AI model processing involving Indian citizens' data. Organizations using OpenAI or Anthropic APIs to process Indian user data may be required to host inference infrastructure in India. Rule 13 of the DPDP Rules also requires Significant Data Fiduciaries to exercise due diligence that the algorithmic software they use to process personal data, which includes AI and ML systems, is not likely to pose a risk to data principals' rights.

Israel: Privacy Protection Law (Amendment 13)

Amendment 13 took effect August 14, 2025, aligning Israel's framework more closely with GDPR. Pre-transfer risk assessments are now required, evaluating the receiving country's data security environment, potential for surveillance, and data sensitivity.

For AI tool usage, the Israeli Privacy Protection Authority published draft AI guidelines on April 28, 2025, which call for privacy by design and transparency for automated decisions and prohibit unlawful data scraping. For EU-originating data held in Israeli databases, the EEA Data Regulations (in force since January 2025) add a further layer of requirements.

ITAR and Export-Controlled Technical Data

For US defense subcontractors and any organization handling ITAR-controlled technical data, the AI sovereignty problem is also an export compliance problem.

ITAR's "deemed export" rule prohibits making controlled technical data accessible to foreign nationals. AI providers present three specific risk vectors: foreign national employees with system access, international data routing through non-US infrastructure, and foreign corporate ownership of the provider or its subprocessors.

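ITAR does not mention AI by name, but in practice it means that AI processing of controlled technical data, including inference and logging, must occur on US infrastructure accessible only to US persons, unless the organization holds an export authorization covering the foreign persons involved. Standard enterprise AI API usage does not satisfy this requirement without specific technical controls.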
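The three risk vectors translate naturally into a screening checklist for any AI provider that might receive controlled technical data. A minimal sketch: the `ProviderProfile` fields are hypothetical inputs from vendor due diligence, and an empty result is a prompt for export-compliance review, not a determination.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProviderProfile:
    """Hypothetical facts gathered during vendor due diligence."""
    name: str
    foreign_national_access: bool  # non-US persons can access systems, prompts, or logs
    non_us_routing: bool           # requests or logs may transit non-US infrastructure
    foreign_ownership: bool        # provider or a subprocessor has foreign corporate ownership

def itar_risk_vectors(p: ProviderProfile) -> list[str]:
    """Return the deemed-export risk vectors present for this provider."""
    checks = {
        "foreign national access": p.foreign_national_access,
        "non-US data routing": p.non_us_routing,
        "foreign ownership": p.foreign_ownership,
    }
    return [vector for vector, present in checks.items() if present]

typical_api = ProviderProfile("commercial-llm-api", True, True, False)
print(itar_risk_vectors(typical_api))  # ['foreign national access', 'non-US data routing']
```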

🛠️ CMMC Audit Readiness: For defense subcontractors handling ITAR-regulated data, unauthorized AI routing creates non-deferrable compliance liabilities under CMMC Level 2. Download our comprehensive CMMC Level 2 Compliance Checklist to evaluate your file boundaries and confirm your cryptographic implementations are assessment-ready.

Why "Sovereign Cloud" Isn't Enough

Microsoft's EU Data Boundary, Google's Sovereign Cloud, and AWS's EU regions all address data residency. They don't resolve the fundamental sovereignty problem:

The CLOUD Act still applies: Any US-headquartered company can be compelled to produce data in response to US government orders, regardless of where the data is physically stored. "EU data center" doesn't mean "EU jurisdiction."

Subprocessors matter: Microsoft's use of Anthropic as a Copilot subprocessor illustrates this directly. The EU Data Boundary covers Microsoft's own infrastructure. It does not extend to Anthropic. When Copilot routes a request to an Anthropic model, that data leaves the boundary by design.

Enabling protections can break the product: Double Key Encryption (DKE), Microsoft's strongest protection, under which Microsoft holds only one of the two keys and cannot decrypt content on its own, currently works only with Windows desktop applications and disables search, DLP, co-authoring, and eDiscovery. The features that make Copilot valuable are precisely the ones DKE disables.

The pattern is consistent across hyperscalers: the stronger the sovereignty protection, the more AI functionality it breaks. This creates a practical choice organizations haven't had to make before: use AI productively, or protect data sovereignty. Most current "sovereign AI" offerings don't eliminate that trade-off.

🔍 Is Your AI Stack Creating Data Sovereignty Exposure?

Use our Shadow AI Risk Assessment Checklist to identify every AI system accessing your sensitive data and assess its governance posture.

Download the Free Checklist

The Technical Answer: Encrypt Before the File Reaches the API

The approach that actually resolves the sovereignty tension, without breaking AI functionality for non-sensitive content, is client-side file encryption before data reaches any AI tool.

The logic: if a file containing sensitive data is encrypted with a key that the AI provider never holds, what the provider processes, stores, and potentially routes through US infrastructure is ciphertext. It cannot be read, summarized, or exposed by a data breach at the provider. A CLOUD Act subpoena to the AI provider yields nothing useful because the provider holds no readable data. The legal exposure is substantially reduced.
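A minimal sketch of that logic in Python, assuming the third-party `cryptography` package and AES-256-GCM (any authenticated cipher from a validated module would serve). The organization generates and keeps the key; only the nonce and ciphertext would ever be handed to an AI or storage provider. The function names are hypothetical, and this illustrates the principle rather than a FIPS-validated key-management design.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_for_upload(plaintext: bytes, key: bytes, file_id: str) -> tuple[bytes, bytes]:
    """Encrypt a file client-side; the file ID is bound in as associated data."""
    nonce = os.urandom(12)  # 96-bit nonce, must be unique per encryption under this key
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, file_id.encode())
    return nonce, ciphertext

def decrypt_locally(nonce: bytes, ciphertext: bytes, key: bytes, file_id: str) -> bytes:
    """Only a holder of the organization's key can recover the plaintext."""
    return AESGCM(key).decrypt(nonce, ciphertext, file_id.encode())

org_key = AESGCM.generate_key(bit_length=256)  # stays under the organization's control
document = b"Confidential design notes"
nonce, blob = encrypt_for_upload(document, org_key, "file-001")

# The provider only ever sees (nonce, blob). Holding any other key, it cannot decrypt:
provider_key = AESGCM.generate_key(bit_length=256)
try:
    decrypt_locally(nonce, blob, provider_key, "file-001")
except InvalidTag:
    print("provider cannot read the file")
```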

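For ITAR compliance specifically, this maps to the encryption carve-out in 22 CFR § 120.54(a)(5): sending, taking, or storing unclassified technical data is not an export if the data is secured with end-to-end encryption using FIPS 140-2 (or successor) compliant cryptographic modules, the means of decryption are not provided to any third party, and the data is not sent to, stored in, or sent from a country proscribed under § 126.1 or the Russian Federation. The key remains under the organization's control. Within those limits, the file can travel to third-party infrastructure, including AI providers with foreign national staff or non-US routing, without constituting an unauthorized export.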
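In the current ITAR, the encryption carve-out sits at 22 CFR § 120.54(a)(5), and it is conditional: every condition must hold. The checklist below is a hypothetical sketch of those conditions as we read them; the field names are illustrative, and the proscribed-country set is a placeholder that must be kept in sync with § 126.1. Treat an empty result as a prompt for compliance sign-off, not a ruling.

```python
from dataclasses import dataclass

# Placeholder: keep in sync with the current 22 CFR § 126.1 country list.
# The Russian Federation is named in § 120.54(a)(5) itself.
PROSCRIBED: frozenset[str] = frozenset({"RU"})

@dataclass(frozen=True)
class TechnicalDataTransfer:
    unclassified: bool
    end_to_end_encrypted: bool            # means of decryption never provided to any third party
    fips_validated_crypto: bool           # FIPS 140-2 (or successor) module, or >= AES-128 strength
    sent_to_or_stored_in: frozenset[str]  # countries of intended recipients and storage
    sent_from: str                        # country the data is sent from

def carve_out_gaps(t: TechnicalDataTransfer, proscribed: frozenset[str] = PROSCRIBED) -> list[str]:
    """List unmet conditions; an empty list means this checklist passes."""
    gaps = []
    if not t.unclassified:
        gaps.append("(i) data must be unclassified")
    if not t.end_to_end_encrypted:
        gaps.append("(ii) data must be end-to-end encrypted")
    if not t.fips_validated_crypto:
        gaps.append("(iii) encryption must use FIPS 140-2 (or successor) compliant modules")
    if t.sent_to_or_stored_in & proscribed:
        gaps.append("(iv) data must not be sent to or stored in a proscribed country")
    if t.sent_from in proscribed:
        gaps.append("(v) data must not be sent from a proscribed country")
    return gaps

upload = TechnicalDataTransfer(True, True, True, frozenset({"US", "IE"}), "US")
print(carve_out_gaps(upload))  # []
```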

This approach doesn't prevent AI tools from accessing non-sensitive content. It creates a deliberate, controllable boundary: files classified as sensitive are encrypted before they reach any AI API, files that can be processed freely still are, and sensitive files stay under the organization's cryptographic control regardless of where they travel.
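That boundary can be sketched as a pre-upload gate: sensitive content leaves only as ciphertext, everything else passes through. A minimal sketch: the `Label` values are hypothetical, and `encrypt` is a stand-in for whatever client-side encryption tool an organization actually uses.

```python
from enum import Enum
from typing import Callable

class Label(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    GDPR_PERSONAL = "gdpr_personal"
    ITAR_TECHNICAL = "itar_technical"

# Hypothetical choice of which labels count as sensitive for AI access.
SENSITIVE = {Label.GDPR_PERSONAL, Label.ITAR_TECHNICAL}

def prepare_for_ai(content: bytes, label: Label,
                   encrypt: Callable[[bytes], bytes]) -> tuple[bytes, bool]:
    """Return (payload, ai_can_read). Sensitive content is never released as plaintext."""
    if label in SENSITIVE:
        return encrypt(content), False
    return content, True

# Usage with a dummy stand-in for real encryption (illustration only).
payload, readable = prepare_for_ai(b"press release", Label.PUBLIC, encrypt=lambda b: b"<ciphertext>")
print(readable)  # True
```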

📋 Vendor Boundary Asset: Data sovereignty gaps don't just happen at the cloud API layer; they also happen when local files are shared downstream with external AI consulting teams, developers, or contractors. Use our Contractor Offboarding File Security Checklist to audit your technical controls and ensure corporate data doesn't permanently reside on third-party endpoints.

How Theodosian Addresses the AI Data Sovereignty Problem

Theodosian's per-file FIPS 140-3 validated encryption applies protection at the document level, before any file reaches an AI tool, a cloud platform, or cross-border infrastructure.

The zero-knowledge architecture means Theodosian holds FILE_SEEDs (not encryption keys) to facilitate authorized access. The decryption keys themselves are derived from seeds that the organization maintains in joint custody. No third party, including Theodosian, can access the plaintext content of a protected file. A US government order directed at Theodosian yields nothing readable.
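The paragraph above doesn't spell out how FILE_SEEDs become keys, so the following is only a generic illustration, not Theodosian's actual scheme. One common way to make a key depend on two custodians is to feed both secrets into HKDF (RFC 5869), so that neither the seed nor the organization's secret alone is enough. The function names, the salt, and the split of custody are assumptions.

```python
import hashlib
import hmac
import os

def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF with SHA-256 (RFC 5869): extract a pseudorandom key, then expand it."""
    if length > 255 * 32:
        raise ValueError("HKDF-SHA256 output is limited to 8160 bytes")
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                            # expand step: T(1), T(2), ...
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def derive_file_key(file_seed: bytes, org_secret: bytes, file_id: str) -> bytes:
    """Hypothetical joint-custody derivation: both inputs are required to compute the key.

    Length-prefixing the seed keeps different (seed, secret) splits from colliding.
    """
    ikm = len(file_seed).to_bytes(2, "big") + file_seed + org_secret
    return hkdf_sha256(ikm, salt=b"example-file-key-v1", info=file_id.encode("utf-8"))

file_seed = os.urandom(32)   # in this sketch, held by the vendor
org_secret = os.urandom(32)  # in this sketch, held only by the organization
key = derive_file_key(file_seed, org_secret, "file-001")
print(len(key))  # 32
```

Binding the file ID into HKDF's `info` parameter gives every file its own key, so exposure of one key does not unlock any other file.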

For organizations handling ITAR-controlled technical data via AI tools: protected files can fall within the encryption carve-out in 22 CFR § 120.54(a)(5), provided its conditions are met, including that the files are not sent to, stored in, or sent from proscribed countries. For EU organizations concerned about GDPR Article 46 supplementary measures, file-level encryption where the provider holds no key is the specific technical measure European DPAs have identified as appropriate for high-risk US transfers.

Context-aware access controls add the governance layer: a protected file can be configured to open only for users with the right identity, device posture, and location, regardless of which AI tool, cloud service, or endpoint holds a copy.
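A context-aware policy of this kind can be sketched as a function that checks every condition and reports the first failure, which also produces a reason for the audit trail. The field names and the order of checks are hypothetical, not Theodosian's policy language.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AccessRequest:
    user: str
    groups: frozenset[str]
    device_compliant: bool  # e.g. managed, patched, disk-encrypted
    country: str            # resolved location of the request

@dataclass(frozen=True)
class FilePolicy:
    allowed_groups: frozenset[str]
    require_compliant_device: bool
    allowed_countries: frozenset[str]

def evaluate(policy: FilePolicy, req: AccessRequest) -> tuple[bool, str]:
    """All conditions must pass; return the decision and a reason for logging."""
    if not (req.groups & policy.allowed_groups):
        return False, "identity: user not in an allowed group"
    if policy.require_compliant_device and not req.device_compliant:
        return False, "device posture: non-compliant device"
    if req.country not in policy.allowed_countries:
        return False, f"location: {req.country} not permitted"
    return True, "allowed"

eu_only = FilePolicy(frozenset({"eng-eu"}), True, frozenset({"DE", "FR"}))
print(evaluate(eu_only, AccessRequest("alice", frozenset({"eng-eu"}), True, "US")))
# (False, 'location: US not permitted')
```

Because the policy travels with the file, the same check applies whichever AI tool or cloud service happens to hold the copy.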

The practical result: AI tools can be deployed across global teams without the sovereignty trade-off. Sensitive files stay under jurisdictional control. Non-sensitive content flows freely.


A Framework for Global AI Deployment

For security teams navigating AI deployment across multiple jurisdictions, a practical starting point:

Step 1: Classify before you deploy

Know which data categories are subject to which regulatory frameworks: GDPR personal data, ITAR technical data, India DPDP personal data, HIPAA PHI. AI tools should access only the categories that can be lawfully processed in the infrastructure they run on.

Step 2: Audit your AI subprocessor chain

Your AI provider's contractual data boundary doesn't extend to their subprocessors. Identify every party in the chain that touches your data, and verify whether each satisfies the legal requirements for your most sensitive data categories.
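The audit can be sketched as a walk over the subprocessor tree that flags every party outside the boundary you rely on. The `Party` structure is hypothetical; the example encodes the Copilot and Anthropic relationship described earlier in this article.

```python
from dataclasses import dataclass, field

@dataclass
class Party:
    name: str
    within_boundary: bool  # covered by the contractual data boundary you rely on?
    subprocessors: list["Party"] = field(default_factory=list)

def parties_outside_boundary(root: Party) -> list[str]:
    """Depth-first walk of the subprocessor chain, collecting parties outside the boundary."""
    outside, stack, seen = [], [root], set()
    while stack:
        party = stack.pop()
        if party.name in seen:  # guard against shared or circular subprocessor entries
            continue
        seen.add(party.name)
        if not party.within_boundary:
            outside.append(party.name)
        stack.extend(party.subprocessors)
    return outside

copilot = Party("Microsoft 365 Copilot", True, [Party("Anthropic", False)])
print(parties_outside_boundary(copilot))  # ['Anthropic']
```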

Step 3: Apply file-level encryption to sensitive categories before AI access

For data that cannot be lawfully processed by US-based AI infrastructure, encrypt it before any AI tool touches it. The AI can still access and process non-sensitive content. Sensitive files carry their own access policy.

Step 4: Establish clear AI access policies by data classification

Define which AI tools can access which data categories, under what conditions, and with what audit requirements. Make this part of your information governance framework, not a separate AI policy.
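A policy like this can live as a simple table keyed by classification and tool, with default-deny for anything not listed. The classification labels, tool names (`copilot`, `internal-llm`), and decisions below are hypothetical examples, not recommendations for any specific product.

```python
# Hypothetical policy table: which AI tools may receive which classifications, and how.
AI_ACCESS_POLICY: dict[str, dict[str, str]] = {
    "public":         {"copilot": "allow", "internal-llm": "allow"},
    "internal":       {"copilot": "allow-with-logging", "internal-llm": "allow"},
    "gdpr_personal":  {"internal-llm": "allow-with-logging"},
    "itar_technical": {},  # no AI tool receives plaintext; encrypt before any access
}

def ai_access_decision(classification: str, tool: str) -> str:
    """Default-deny: unknown classifications and unlisted tools are both denied."""
    return AI_ACCESS_POLICY.get(classification, {}).get(tool, "deny")

print(ai_access_decision("gdpr_personal", "copilot"))  # deny
```

Keeping the table inside the existing information governance framework, rather than in a separate AI policy, means a new classification is denied by default until someone decides otherwise.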

Step 5: Generate evidence of controls

Regulators in the EU, India, and Israel are increasingly examining AI processing practices. Per-file audit logs, showing what was accessed, by whom, and whether access was denied, provide the contemporaneous evidence that demonstrates compliance is operational, not just documented.
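A per-file audit record needs little more than what was accessed, by whom, through which tool, and the decision. The sketch below serializes one decision as a JSON line suitable for an append-only log; the field names are hypothetical.

```python
import json
from datetime import datetime, timezone

def audit_event(file_id: str, user: str, tool: str, allowed: bool, reason: str) -> str:
    """Serialize one access decision as a JSON line for an append-only audit log."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),  # when the decision was made (UTC)
        "file_id": file_id,
        "user": user,
        "tool": tool,
        "decision": "allow" if allowed else "deny",
        "reason": reason,
    }
    return json.dumps(record, sort_keys=True)

print(audit_event("file-001", "alice@example.eu", "copilot", False, "classification: itar_technical"))
```

Logging denials as well as grants is what turns the record into evidence that the control actually fired.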


🛡️ Take Jurisdictional Control of Your AI Pipelines

Theodosian closes the loophole by applying zero-knowledge, FIPS 140-3 validated encryption to each file before it ever encounters an LLM or cross-border infrastructure. Deploy cutting-edge AI models globally without sacrificing legal sovereignty over your data.

Discover How With a Free 14-Day Pilot

FAQs: Data Sovereignty and AI

Does the EU-US Data Privacy Framework resolve GDPR transfer concerns for AI tools?

Partially. The EU-US DPF provides a legal mechanism for transferring personal data to DPF-certified US companies, but it remains open to legal challenge. For high-risk processing, particularly involving special categories of data or AI systems processing personal data at scale, EU data protection authorities have stated that SCCs alone may be insufficient. In those cases organizations are expected to implement supplementary technical measures, which, for AI tool usage, means encryption where the provider does not hold the keys.

How does ITAR apply to AI tool usage?

ITAR controls the export of US defense articles and technical data. The "deemed export" rule treats making controlled technical data accessible to a foreign national as an export, even within the US. AI providers typically have foreign national employees, international data routing, and sometimes foreign corporate ownership. Processing ITAR-controlled technical data through a standard commercial AI API may constitute a deemed export unless specific technical controls ensure the data is never accessible to unauthorized persons. The ITAR encryption carve-out (22 CFR § 120.54(a)(5)) provides protection when unclassified technical data is end-to-end encrypted with FIPS 140-2 (or successor) compliant modules, the means of decryption are not provided to any third party, and the data is not sent to, stored in, or sent from a proscribed country.

What is Microsoft Copilot Flex Routing, and why does it matter for GDPR?

Flex Routing is a Microsoft 365 Copilot feature (enabled by default for new tenants since March 2026) that routes inference requests to US, Canadian, or Australian data centers when EU capacity is at peak. This means EU personal data processed by Copilot may leave the EU unless an administrator disables the setting. Additionally, since January 2026, Anthropic has been a Copilot subprocessor, explicitly outside the scope of Microsoft's EU Data Boundary. To disable Flex Routing: Microsoft 365 Admin Center → Copilot → Settings → "Flexible inferencing during peak load periods" → uncheck "Allow flexible routing outside my data boundary during peak demand periods."
