The latest 2025 State of Data Compliance and Security Report from Perforce Software reveals a troubling paradox in how enterprises view AI model training and sensitive data usage.
The study found that 91% of organizations believe sensitive data should be allowed in AI training, even though 78% of those same organizations admit to serious concerns about theft or breach of such data. Perforce warns that once sensitive data is ingested into an AI model, it can never truly be removed—making these practices a long-term privacy and compliance risk.
The report also underscores a worsening breach landscape. Sixty percent of organizations have already experienced data breaches or theft in software development, testing, AI, and analytics environments, an 11% increase from last year. Despite this, 84% continue to allow compliance exceptions in non-production environments, exposing themselves to preventable risks.
“The rush to adopt AI presents a dual challenge for organizations: immense pressure to innovate while facing heightened fear about data privacy,” said Steve Karam, Principal Product Manager at Perforce. “To navigate this complexity, organizations must adopt AI responsibly. You should never train your AI models with personally identifiable information (PII), especially when secure synthetic data can accelerate AI pipelines without compromising compliance.”
Echoing the urgency, Ross Millenacker, Senior Product Manager at Perforce, noted: “Too many organizations see masking and data protection as cumbersome, and allow risky exceptions instead. But this mindset leaves a significant vulnerability. It’s time to close these gaps and truly protect sensitive data.”
The findings come at a time when 86% of enterprises plan to invest in AI data privacy solutions over the next 1–2 years. To meet that demand, Perforce recently introduced AI-powered synthetic data generation in its Delphix DevOps Data Platform, which combines data masking, delivery, and synthetic generation to help enterprises balance innovation with compliance.