What Is Data Classification, and Why It Falls Short

Your Purview scan finished last Tuesday; 14,000 files in SharePoint are now labeled "Highly Confidential." Your compliance team marks the data classification initiative complete.

On the same Tuesday, an analyst downloads three of those labeled files to her personal laptop for a client presentation. The files leave SharePoint, and the Purview labels travel as metadata, with no enforcement mechanism on the receiving end.

The classification program worked exactly as it should, but the protection failed because classification was never designed to provide it.

That gap, between discovering sensitive data and actually protecting it, is where most breach notifications originate in organizations with mature security programs.

What Data Classification Does

Data classification is a discovery and labeling discipline. Its job is to answer one question: what sensitive data do you have, and where is it?

The workflow is well-established.

Discovery scans data repositories (cloud storage, SharePoint, file servers, SaaS platforms) to identify files containing sensitive content. It combines pattern matching against known formats (SSNs, credit card numbers, healthcare identifiers) with ML-based classification for unstructured content like legal memos, engineering specifications, and financial models.
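
The pattern-matching half of discovery can be sketched in a few lines of Python. This is a minimal illustration, not how any named platform implements it: the two regular expressions, the `luhn_valid` helper, and the `find_sensitive` function are hypothetical names chosen for this example, and real scanners use much broader rule sets plus context checks to reduce false positives.

```python
import re

# Illustrative patterns only (assumptions for this sketch).
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def find_sensitive(text: str) -> dict:
    """Return SSN-like strings and Luhn-valid card-like strings found in text."""
    ssns = SSN_RE.findall(text)
    cards = [
        m.group().strip()
        for m in CARD_RE.finditer(text)
        if luhn_valid(m.group())
    ]
    return {"ssn": ssns, "card": cards}
```

The Luhn check is there because a bare 16-digit regex flags order numbers and tracking IDs; validating the checksum is one common way to cut that noise before a file is labeled.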

Labeling applies sensitivity tags that communicate the data's classification tier (Public, Internal, Confidential, Highly Confidential) and trigger policy-based rules. In Microsoft Purview, this means sensitivity labels that can attach metadata, apply visual markings, and — within the Microsoft ecosystem — enforce some access restrictions.
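
As a rough sketch of the labeling step, the snippet below maps detected content types to the four tiers named above and picks the most restrictive one. The `TIER_BY_FINDING` mapping, the finding names, and the choice to treat unknown findings as Internal are all assumptions made for illustration; they do not reflect Purview's or any other vendor's labeling logic.

```python
from enum import IntEnum


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    HIGHLY_CONFIDENTIAL = 3


# Hypothetical mapping from detected content type to tier.
TIER_BY_FINDING = {
    "ssn": Tier.HIGHLY_CONFIDENTIAL,
    "card": Tier.HIGHLY_CONFIDENTIAL,
    "financial_model": Tier.CONFIDENTIAL,
    "internal_memo": Tier.INTERNAL,
}


def label_for(findings: list[str]) -> Tier:
    """Return the most restrictive tier implied by the findings.

    Unknown finding types fall back to INTERNAL (an assumption);
    a file with no findings at all is treated as PUBLIC.
    """
    tiers = [TIER_BY_FINDING.get(f, Tier.INTERNAL) for f in findings]
    return max(tiers, default=Tier.PUBLIC)
```

Note what the function returns: a tag. Nothing in it restricts who can open the file, which is the distinction the rest of this article turns on.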

Monitoring runs continuous scanning to identify new sensitive data, detect classification drift, and alert when classified data appears in unexpected locations.
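
The "unexpected locations" alert can be pictured as a comparison between a labeled inventory and a location policy. The sketch below assumes a hypothetical `ALLOWED_PREFIXES` policy and made-up `sharepoint://` style paths; a real monitoring product would pull both from its own scan results and configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    path: str
    label: str


# Hypothetical policy: where files with each label are allowed to live.
ALLOWED_PREFIXES = {
    "Highly Confidential": ("sharepoint://legal/", "sharepoint://finance/"),
    "Confidential": ("sharepoint://",),
}


def unexpected_locations(records):
    """Return records whose path falls outside the prefixes allowed for their label.

    Labels with no entry in the policy are not checked.
    """
    alerts = []
    for rec in records:
        allowed = ALLOWED_PREFIXES.get(rec.label)
        # str.startswith accepts a tuple of candidate prefixes.
        if allowed is not None and not rec.path.startswith(allowed):
            alerts.append(rec)
    return alerts
```

An alert like this only covers repositories the scanner can reach. A copy saved to a personal laptop never appears in `records`, so it never triggers anything.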

This is highly valuable, because most organizations don't know what sensitive data they have or where it is. Classification solves that problem.

It is the map. Not the protection.

Turn Your Data Map into Active Enforcement

Theodosian closes this gap by transforming passive tags into self-defending files. The moment data is flagged as sensitive, it is automatically wrapped in FIPS 140-3 validated encryption that travels with the file everywhere.

Make Your Classification Self-Enforcing

The Leading Data Classification Platforms

Microsoft Purview

The dominant enterprise classification platform. Purview sensitivity labels integrate natively with M365, Azure, and integrated SaaS. Labels can enforce encryption within the Microsoft tenant, a meaningful capability when everything stays inside that ecosystem.

Purview's own documentation acknowledges the limitation directly: "Encryption and content marking for files in the Microsoft Purview Data Map are not currently supported." Outside the Microsoft ecosystem (downloaded files, third-party applications, email to non-Microsoft recipients), label-based enforcement depends on additional DLP policy configuration and is not inherent to the label.

A Purview-labeled Confidential document attached to an email going to outside counsel is a file with a header and no enforcement behind it.

Varonis

Varonis extends deep file-system monitoring expertise across M365, AWS, Azure, and SaaS platforms. Its DSPM capabilities identify where sensitive data is, who has access, and what behavioral patterns suggest risk. Strong access analytics and behavioral baselining.

What it doesn't do: encrypt files or enforce access at the document level. Classification is the output. Varonis's own data shows the scale of what it's identifying — the average employee has access to approximately 25,000 sensitive folders in their organization's M365 environment. Finding all those folders doesn't close them.

BigID

Strong cross-environment discovery covering structured and unstructured data. ML-based classification with particular depth in personal data under GDPR, CCPA, and HIPAA. The primary output is classification-and-compliance evidence reporting.

Cyera

Positioned as a DSPM market leader. Cyera's own marketing draws the boundary explicitly: "DSPM tells you where sensitive data is and who has access, while DLP enforces controls that prevent specific data movements."

The vendor draws the line itself. Discovery and classification are the scope. Enforcement is a different thing.

The Pattern Across All Four

Every major classification platform defines its primary output as discovery, labeling, and risk visibility. They are the map. For protection to travel with the data, something else needs to be in the stack.

The Classification-Enforcement Gap

Here is what actually happens when a sensitivity label is applied, but no file-level enforcement exists:

The download problem

A labeled file is downloaded by an authorized user to a contractor's personal laptop, a device that will be decommissioned next month, or a traveling engineer's personal machine. The classification metadata travels; the enforcement does not. The file sits on that device with whatever protection the operating system provides, which is none for the file itself.

Offboarding & Lifecycle Control: Classification tags indicate that an asset is sensitive, but they cannot revoke access once a contractor has saved that asset to a local hard drive. Download the Contractor Offboarding File Security Checklist to secure endpoints post-separation.

The email problem

A labeled confidential document is attached to an email and sent to outside counsel, a compliance auditor, or a supply chain partner. Outside the Microsoft tenant, the label is a visual marking. The recipient's email client, their device, their DMS — none of them enforce it.

The collaboration tool problem

A classified file is uploaded to Slack, Teams, or a project management tool for a cross-functional review. The platform's classification model doesn't map to yours. The file is available to everyone in the workspace.

The AI ingestion problem

A Highly Confidential file is pasted into a browser-based AI tool or uploaded to an enterprise LLM for summarization. The label communicates sensitivity. It doesn't prevent the AI tool from processing the content or routing the request through infrastructure outside your governance boundary.

In all four cases, classification did its job. The label is accurate. The problem is that labels are information; they communicate sensitivity. They are not technical controls on the file itself.

Is shadow AI accessing your classified files? Use our Shadow AI Risk Assessment Checklist to identify every AI system accessing your sensitive data and assess its governance posture.

What Regulatory Frameworks Actually Require

The confusion between classification and protection is embedded in how frameworks are written.

CMMC Level 2 / NIST SP 800-171: Requires identification and marking of Controlled Unclassified Information (CUI), which is a classification requirement. It also requires FIPS 140-3 validated encryption for CUI (SC.3.177), media protection on contractor-controlled devices (MP.2.121), and organization-controlled key management (SC.3.187). Classification is the precondition. The encryption controls are the obligation.

A defense contractor with a comprehensive CUI labeling program but no FIPS 140-3 validated file-level encryption will fail at SC.3.177. These controls cannot be deferred to a Plan of Action and Milestones (POA&M) under Phase 2 enforcement.

CMMC Audit Warning: C3PAO assessors will immediately fail an assessment if CUI is labeled but lacks a corresponding FIPS 140-3 Cryptographic Module Validation Program (CMVP) certificate number. Utilize our CMMC Level 2 Compliance Checklist to audit your technical controls before your window closes.

HIPAA: Requires organizations to classify PHI as part of their risk analysis. The safeguards — encryption, access controls, and audit logging under 45 CFR § 164.312 — are the actual requirements. The label is the diagnosis; the technical safeguard is the treatment.

GDPR / CCPA: Both require "appropriate technical and organizational measures" proportionate to the risk. Classification informs what those measures should be. It doesn't satisfy them.

The pattern is consistent: frameworks require classification as a foundation, then require technical controls for execution. Organizations that stop at classification have completed step one of a two-step compliance requirement.
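
One way to see whether a program has stopped at step one is to check the labeled inventory for files that carry a sensitive label but no encryption. The sketch below is an assumption-laden illustration: the inventory is a plain list of dictionaries with hypothetical `path`, `label`, and `encrypted` keys, not the export format of any particular tool.

```python
# Labels treated as sensitive for this sketch (an assumption).
SENSITIVE_LABELS = {"Confidential", "Highly Confidential"}


def enforcement_gaps(inventory):
    """Return paths of files that are labeled sensitive but not encrypted.

    A missing "encrypted" key is treated as not encrypted.
    """
    return [
        item["path"]
        for item in inventory
        if item["label"] in SENSITIVE_LABELS and not item.get("encrypted", False)
    ]
```

An empty result means both steps are covered for the files the inventory knows about; anything else is a list of labeled files that rely on the label alone.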

Where File-Level Enforcement Fits in the Classification Stack

Classification discovers and labels. File-level enforcement ensures that every labeled file carries its own protection wherever it travels.

When a document is classified as Highly Confidential or identified as containing CUI, PHI, or regulated IP, that classification should trigger per-file FIPS 140-3 validated encryption that travels with the document regardless of what happens next. Every file carries its own cryptographic key and access policy. Opening it is an access request that must satisfy the policy: correct identity, compliant device, and authorized location. If those conditions aren't met, on any device and in any environment, access is denied.
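
The access decision described above (identity, device, location, deny otherwise) can be sketched as a small policy check. This is a conceptual illustration only: `AccessRequest`, `FilePolicy`, and `evaluate` are hypothetical names, location is reduced to a country code, and the sketch leaves out the encryption and key release that a real file-level enforcement product would tie to this decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    user: str
    device_compliant: bool
    country: str


@dataclass(frozen=True)
class FilePolicy:
    allowed_users: frozenset
    allowed_countries: frozenset
    require_compliant_device: bool = True


def evaluate(request: AccessRequest, policy: FilePolicy) -> tuple[bool, str]:
    """Grant access only when identity, device, and location all satisfy the policy."""
    if request.user not in policy.allowed_users:
        return False, "identity not authorized"
    if policy.require_compliant_device and not request.device_compliant:
        return False, "device not compliant"
    if request.country not in policy.allowed_countries:
        return False, "location not authorized"
    return True, "granted"
```

For example, with `FilePolicy(frozenset({"alice"}), frozenset({"US"}))`, a request from `alice` on a non-compliant laptop is denied even though her identity checks out. The design choice worth noting is that every branch except the last one denies, so a condition the policy does not recognize can never grant access by accident.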

The classification program found the file and named the risk. The file-level layer ensures that risk doesn't materialize into a notification.

This is the architecture the market is converging on: DSPM for visibility, file-level encryption for enforcement. Classification without enforcement is a liability report. Classification with enforcement is a data protection program.

Platform Comparison

Capability	Purview	Varonis	BigID	Cyera	Theodosian
Data discovery	✅ M365/Azure	✅ On-prem + cloud	✅ Multi-cloud	✅ Cloud-native	Policy-Driven Integration
Sensitivity labeling	✅	✅	✅	✅	Policy-Driven Integration
Risk posture reporting	✅	✅	✅	✅	❌
File encryption	Partial (within M365)	❌	❌	❌	✅ FIPS 140-3 per-file
Post-download enforcement	❌	❌	❌	❌	✅ Travels with file
Unmanaged device coverage	❌	❌	❌	❌	✅ Device-agnostic
Access revocation after sharing	❌	❌	❌	❌	✅ Immediate, retroactive
AI ingestion control	Visibility only	Visibility only	Visibility only	Visibility only	✅ Access denied at the file level
CMMC SC.3.177 satisfaction	Partial (within M365)	❌	❌	❌	✅ FIPS 140-3 validated

Choosing the Right Approach

If your primary need is knowing what sensitive data you have and where it is, any of the classification platforms above addresses this. Purview is the natural choice for M365-heavy environments; Varonis for organizations with significant on-premises file share infrastructure; BigID and Cyera for multi-cloud and complex SaaS footprints.

If your classification program has produced a list of labeled files, but you're uncertain what happens after an authorized user downloads them, that uncertainty is the gap. The classification is accurate. The enforcement is missing.

If you're subject to CMMC, ITAR, HIPAA, or GDPR, classification satisfies the identification and marking requirements. It does not satisfy the encryption, access control, and audit logging requirements that follow. Both parts need to be in the stack.

Next steps? You choose.

1. Do Nothing — Accept that your data classification program is producing an accurate inventory of files you cannot guarantee are protected once they leave your governed environment.

2. Make Classification Self-Enforcing — Every file your classification program identifies as sensitive carries its own FIPS 140-3 validated encryption, access policy, and audit trail. Classification and protection become the same event, not two separate programs.

Close the Classification-Enforcement Loophole

Legacy DSPM platforms and compliance frameworks treat visibility as the final milestone. But in a highly distributed, AI-driven enterprise landscape, a data label without an active cryptographic enforcement control is an operational liability.

Close the Loophole With a Free 14-Day Pilot

FAQs: Data Classification and Sensitive Data Protection

What is data classification in cybersecurity?

Data classification is the process of identifying sensitive data across an organization's systems, applying sensitivity labels based on content type and regulatory requirements, and using those labels to inform security controls and compliance evidence. It answers "what sensitive data do we have and where is it?" — but it is a discovery and governance function, not an enforcement control. Labeling a file as Highly Confidential does not prevent an authorized user from downloading it to an unmanaged device or forwarding it to an uncontrolled recipient.

What is the difference between data classification and data protection?

Data classification identifies and labels sensitive data. Data protection enforces controls that determine what happens to that data — who can access it, under what conditions, on what devices, and whether access can be revoked after the fact. Both are required for a complete data security program. Classification without protection is an inventory of risks that aren't being managed. Protection without classification is encryption applied to files without understanding what they contain or why they matter.

Does Microsoft Purview satisfy data protection requirements?

Purview's sensitivity labels enforce some protections within the M365 ecosystem — including encryption and access restrictions for M365 files shared between M365 users. The enforcement breaks down for files that leave the Microsoft environment: downloaded to non-managed devices, shared via email with non-M365 recipients, or handled by applications outside the Purview policy scope. Purview also does not currently support encryption in the Purview Data Map for files outside the M365 ecosystem. Organizations subject to CMMC, ITAR, or HIPAA typically need file-level enforcement that operates independently of the Microsoft tenant.

Does data classification satisfy CMMC Level 2 requirements?

Partially. CMMC Level 2 requires CUI identification and marking, which is a classification function. It also requires FIPS 140-3 validated encryption (SC.3.177), organization-controlled key management (SC.3.187), and media protection on contractor-controlled devices (MP.2.121). These are enforcement requirements, not classification requirements. A CMMC compliance program that includes data classification but not FIPS-validated file-level encryption will fail at SC.3.177. These controls are on the POA&M-prohibited list for Phase 2 enforcement (November 2026) and must be implemented before assessment.

What data classification framework should organizations use?

Common frameworks include the US federal CUI Registry and NIST SP 800-60 for organizations handling government data, HIPAA's PHI classification requirements for healthcare, and organization-defined schemes (typically Public / Internal / Confidential / Highly Confidential) for commercial enterprises. The framework is less important than the enforcement that follows it. An organization with a precise four-tier classification scheme and no file-level enforcement is in a worse position than one with a simple two-tier scheme paired with robust encryption — because the detailed scheme creates documented evidence of what it knew was sensitive and did not protect.
