Agentic AI Data Quality: Why Ontology and Confidential Computing Must Come Before Enterprise Agents

By M. Mahmood | Strategist & Consultant | mmmahmood.com

TL;DR / Summary

Enterprises are moving into agentic AI with the same optimism that accompanied cloud migrations, data lake programs, and digital transformation offices. In each of those waves, the visible layer got funded first, the foundational work was deferred, and leadership eventually learned that the expensive part was not buying the technology but compensating for what had been ignored underneath. That is where many companies now sit with agentic AI data quality.

The warning signs are no longer abstract. In 2025, Amazon’s internal AI coding agent Kiro reportedly deleted the AWS Cost Explorer production environment and rebuilt it from scratch, causing a prolonged outage in a mainland China region. Amazon publicly described the incident as user error caused by misconfigured access controls. A few months later, reports on another Kiro-related event claimed that AI-assisted code changes contributed to a 99 percent drop in U.S. order volume for a full day, or roughly 6.3 million lost orders. You can debate the exact share of blame between people, process, and software, but the larger point is harder to escape: once agents are embedded in live workflows, small hidden weaknesses stop being theoretical and start becoming operational.

Most executive teams still interpret these failures as isolated tooling mistakes. That reading is convenient, but it misses what is actually going wrong. The deeper issue is that enterprises are trying to place autonomous reasoning on top of systems whose definitions do not line up, while also running those agents inside environments that cannot convincingly prove who or what had access to sensitive data during inference. The two least glamorous layers in the stack are becoming the ones that decide whether the whole program survives.

Why agentic AI data quality is the real deployment bottleneck

Gartner’s projection that more than 40 percent of enterprise agentic AI projects will be cancelled by the end of 2027 is not really a statement about weak demos or inadequate models. It is a statement about the gap between controlled pilots and production reality. The same pattern appears in a March 2026 survey showing that 78 percent of organizations had at least one agent pilot running, while only 14 percent had scaled any of them across the business. That is not a frontier-model story; it is a readiness story.

One reason the scaling gap persists is that data quality and integration failures sit behind more than 70 percent of enterprise AI initiatives that fail before production. In practical terms, that means most enterprises are not hitting the wall because the model is too small or the prompt is imperfect. They are hitting the wall because customer, account, product, claim, inventory, and pricing data do not mean the same thing across the systems an agent has to interpret.

This is where the conversation usually gets lazy. People hear “data quality” and think of stale records, missing values, or duplicates, because that is the world business intelligence (BI) trained them to see. But BI-era data quality and agent-era data quality are not the same problem. In the reporting world, a human reviews the output and catches a surprising number of issues before any consequential action is taken. In the agentic world, the system reasons across steps, often across multiple tools and sources, and carries semantic confusion forward until it becomes action. That is why a small inconsistency in meaning can quietly compound into a major operational error.
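
A rough way to see the compounding is a simple back-of-the-envelope model. Assume, purely for illustration, that each step of an agent run reads the underlying data correctly with probability p and that steps fail independently. The chance that an n-step chain never acts on a misreading is then p to the power n. The values of p and n below are hypothetical, not measurements.

```latex
P(\text{all } n \text{ steps read the data correctly}) = p^{n}
\qquad \text{e.g. } p = 0.98,\; n = 20:\quad 0.98^{20} \approx 0.67
```

Under those assumptions, a 2 percent misreading rate that would barely register in a single BI report leaves roughly a one-in-three chance that a twenty-step run acts on a wrong interpretation somewhere along the way. Real errors are rarely independent. A shared definitional conflict tends to recur at every step that touches the same entity, so the independence assumption is a simplification rather than a worst case.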

The sharpest explanation I have seen comes from enterprise data architecture work rather than AI marketing. The Inteq Group argues that agents usually fail not because the model is weak, but because they are reasoning against an inconsistent or incomplete definition of the business itself. That sentence gets to the heart of the issue. If “active account” means one thing in finance, something else in sales, and a third thing in customer operations, the agent does not resolve that ambiguity through intelligence; it inherits the ambiguity and accelerates it.

A retailer case study illustrates the point better than any abstract framework could. One global retailer deployed a customer service agent trained on historical chat logs, inconsistent product descriptions, and outdated policy information. The result was not an immediate, spectacular failure. It was something more realistic and more dangerous. The agent gave wrong return instructions to certain customer segments for months before the organization connected the errors back to the underlying data and semantic inconsistencies. That is what these failures often look like in practice. They hide in plain sight until enough damage accumulates to force attention.

I have seen a version of the same pattern firsthand in a multi-region enterprise deployment. The model looked fine in testing. The demos looked strong. Once the system hit production, however, the agent had to reason across five upstream systems that used three different definitions of “active account.” The recommendations it produced looked perfectly sensible if you examined each source system in isolation, yet they made no strategic sense once viewed at the portfolio level. Nothing about that failure called for a bigger model or a new orchestration layer. It required a simple but uncomfortable act of discipline: the business had to decide what the term meant and enforce that meaning before the agent was trusted with the workflow.

That is why ontology matters here, not as an academic exercise but as operating discipline. If an enterprise has not explicitly defined its core entities, their relationships, and the rules for resolving conflict between systems, then “agentic AI data quality” is mostly branding. You are asking the agent to reason over a world the enterprise has never properly described. The same point helps explain why specialized AI architectures and agent interoperability protocols such as A2A (Agent2Agent) and MCP (Model Context Protocol) only create durable value when the semantic layer below them is strong enough to support real reasoning.

Why confidential computing belongs in the same decision

Even if the semantic layer gets fixed, the architecture is still incomplete if the trust model is weak. This is the part vendor presentations usually slide past because it is harder to package neatly. They will talk about encryption, policy, monitoring, and governance, but they rarely sit with the question that becomes unavoidable once agents are handling sensitive workflows: what can be seen while the model is actually running, and who has technical access to that memory at that moment?

Standard cloud controls answer that question poorly. They protect data at rest in storage and data in transit across networks, but they do not inherently protect data in use while an agent is actively processing prompts, retrieved documents, credentials, account context, and intermediate reasoning artifacts inside shared infrastructure. That distinction mattered less when many AI workloads were narrow inference calls with limited persistence. It matters much more now that enterprises want agents to work across systems, retain state, and act on sensitive information with less human friction in the loop.

The broader operating environment should already make executives uneasy. A recent security survey found that 63 percent of employees using AI tools in 2025 pasted sensitive company data into external AI systems, including source code and customer information, while prompt injection and agent hijacking remain near the top of industry risk lists. Seen from that angle, the Kiro story does not look like an odd engineering embarrassment. It looks like an early warning that enterprises are still letting software inherit trust boundaries designed for people.

This is why confidential computing needs to sit inside the same executive discussion as ontology and data quality. Technologies such as Intel TDX, AMD SEV-SNP, and AWS Nitro Enclaves create trusted execution environments (TEEs) that isolate workloads from the hypervisor and even the cloud operator; TDX and SEV-SNP do this by encrypting the memory of the protected virtual machine. Just as important, those environments produce attestation: hardware-signed evidence of what code ran, where it ran, and whether the isolation boundary was in place. That is not a paperwork improvement. It is a change in what can be proven.
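
The practical value of attestation is that a secret, such as a database credential the agent needs, can be released only after the environment proves what it is running. The sketch below is a deliberately simplified model of that check, written with the Python standard library only. It is not the API of any real TEE. Real reports are signed with asymmetric keys that chain to the CPU vendor (and Nitro Enclaves use their own attestation document format), whereas here an HMAC key stands in for the hardware signature. The names `simulated_tee_report`, `release_secret_if_attested`, and the image strings are hypothetical.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import List, Optional, Set

# Stand-in for the hardware vendor's signing key. Real TEEs sign reports with
# asymmetric keys chained to a vendor root certificate; HMAC is used here only
# so the sketch runs with the standard library.
_SIMULATED_HW_KEY = secrets.token_bytes(32)

@dataclass(frozen=True)
class AttestationReport:
    measurement: str  # hash of the image loaded into the TEE
    nonce: bytes      # verifier's challenge, echoed back to prove freshness
    signature: bytes

def _payload(measurement: str, nonce: bytes) -> bytes:
    return measurement.encode() + b"|" + nonce

def simulated_tee_report(image: bytes, nonce: bytes) -> AttestationReport:
    """Hypothetical: the report a TEE would emit after launching `image`."""
    m = hashlib.sha384(image).hexdigest()
    sig = hmac.new(_SIMULATED_HW_KEY, _payload(m, nonce), hashlib.sha384).digest()
    return AttestationReport(m, nonce, sig)

def release_secret_if_attested(report: AttestationReport, allowlist: Set[str],
                               nonce: bytes, secret: bytes,
                               audit_log: List[dict]) -> Optional[bytes]:
    """Hand a credential to the agent only if the report verifies; log every decision."""
    expected = hmac.new(_SIMULATED_HW_KEY, _payload(report.measurement, report.nonce),
                        hashlib.sha384).digest()
    if not hmac.compare_digest(expected, report.signature):
        reason = "bad signature"
    elif not hmac.compare_digest(report.nonce, nonce):
        reason = "stale or replayed report"
    elif report.measurement not in allowlist:
        reason = "measurement not allowlisted"
    else:
        reason = "ok"
    audit_log.append({"measurement": report.measurement, "reason": reason})
    return secret if reason == "ok" else None

# Usage: only the approved agent image receives the database credential.
approved_image = b"agent-runtime v1.4"
allowlist = {hashlib.sha384(approved_image).hexdigest()}
challenge = secrets.token_bytes(16)
audit_log: List[dict] = []

good = simulated_tee_report(approved_image, challenge)
tampered = simulated_tee_report(b"agent-runtime v1.4 + debug shell", challenge)
print(release_secret_if_attested(good, allowlist, challenge, b"db-credential", audit_log))      # b'db-credential'
print(release_secret_if_attested(tampered, allowlist, challenge, b"db-credential", audit_log))  # None
print([entry["reason"] for entry in audit_log])  # ['ok', 'measurement not allowlisted']
```

The audit log is the part boards and regulators care about: every release or refusal is tied to a specific measured image, which is the “change in what can be proven” described above.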

This is already moving into live sectors rather than staying trapped in proof-of-concept decks. Healthcare organizations are using confidential computing for diagnostic AI in public cloud, and banks are using it for cross-institution fraud analysis without exposing raw data. Once confidential computing is economically viable for workloads like these, the harder question is no longer whether it is practical. The harder question is why any enterprise would continue placing sensitive agent workloads inside an environment that cannot technically prove isolation during inference.

If the semantic layer is weak, your agents will reason badly. If the trust boundary is weak, your agents may reason correctly and still create a compliance or security event. Either way, the board ends up inheriting consequences that should have been prevented at design time. That is why I do not see ontology and confidential computing as separate workstreams; they are the two halves of the same readiness decision.

What to do in the next 90 to 180 days

You do not need a grand transformation program to start fixing this. What you need is sequencing, discipline, and the willingness to stop calling unfinished infrastructure a pilot strategy. The first move is to identify the handful of agent workflows closest to production and force a semantic audit of the core entities involved. The second is to add observability and stop conditions so that the agent escalates when data confidence drops instead of improvising. The third is to place every sensitive or regulated workload inside a real confidential-computing boundary and separate agent permissions from inherited human credentials.
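
The second move, stop conditions that make the agent escalate rather than improvise, can be sketched as a small gate in front of every consequential action. This is a minimal illustration under stated assumptions: the signal names (`entity_resolution`, `freshness`, `completeness`), their scores, and their thresholds are hypothetical, and in a real deployment the scores would come from whatever data-quality monitoring the organization already runs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class QualitySignal:
    """One monitored data-quality metric for a workflow (names and values are hypothetical)."""
    name: str
    score: float      # 0.0 to 1.0, higher is better
    threshold: float  # below this, the agent may not act on its own

def failing_signals(signals: List[QualitySignal]) -> List[str]:
    """Names of every signal that is below its threshold."""
    return [s.name for s in signals if s.score < s.threshold]

def act_or_escalate(
    action: Callable[[], str],
    signals: List[QualitySignal],
    escalate: Callable[[List[str]], str],
) -> Tuple[str, str]:
    """Run the agent's action only when every signal clears its threshold;
    otherwise hand the case to a human with the reasons, instead of improvising."""
    reasons = failing_signals(signals)
    if reasons:
        return "escalated", escalate(reasons)
    return "executed", action()

# Usage: stale data trips the gate, so the refund is never issued automatically.
signals = [
    QualitySignal("entity_resolution", score=0.97, threshold=0.95),
    QualitySignal("freshness", score=0.80, threshold=0.90),
    QualitySignal("completeness", score=0.99, threshold=0.98),
]
status, detail = act_or_escalate(
    action=lambda: "refund issued",
    signals=signals,
    escalate=lambda reasons: "ticket opened: " + ", ".join(reasons),
)
print(status, detail)  # escalated ticket opened: freshness
```

Keeping thresholds explicit per signal, rather than folding them into one blended score, makes the escalation path auditable: the ticket says which part of the data was not trustworthy, which is also what makes the path testable in a live scenario.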

Phase | What to do | Owner | Gate
Days 0 to 30 | Identify the five workflows closest to production and map every conflicting definition of the business entities they depend on. | CDO / Head of Data | One authoritative definition exists for each core entity.
Days 30 to 60 | Create the semantic resolution layer and add quality monitoring with explicit stop-and-escalate rules. | CDO + CIO | Data quality thresholds and escalation paths are tested in a live scenario.
Days 60 to 90 | Run a confidential-computing pilot on the highest-risk regulated workflow and generate attestation evidence. | CISO + CIO | The organization can prove who or what can access agent memory during inference.
Days 90 to 180 | Expand confidential computing to all sensitive agent workflows and move agent permissions away from inherited human credentials. | CISO + CIO + General Counsel | Board reporting includes semantic risk, TEE coverage, and agent decision traceability.
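
Moving agent permissions away from inherited human credentials means the agent gets its own principal, scoped to its workflow, instead of running with whatever rights the person who launched it happens to hold. The sketch below shows the idea in miniature. The `AgentPrincipal` type, the action names, the `HIGH_IMPACT` list, and the approver name are all hypothetical; a real system would enforce this in the identity and access management layer rather than in application code.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Actions that always need a named human approver in production (hypothetical list).
HIGH_IMPACT: FrozenSet[str] = frozenset({"delete_environment", "rebuild_environment"})

@dataclass(frozen=True)
class AgentPrincipal:
    """A dedicated identity for the agent, scoped to its workflow,
    rather than a copy of the permissions of whoever launched it."""
    agent_id: str
    allowed_actions: FrozenSet[str]

def authorize(agent: AgentPrincipal, action: str, environment: str,
              approver: Optional[str] = None) -> bool:
    if action not in agent.allowed_actions:
        return False  # outside the agent's own scope, regardless of the human's role
    if environment == "production" and action in HIGH_IMPACT and approver is None:
        return False  # destructive change in production needs a named approver
    return True

cost_agent = AgentPrincipal(
    agent_id="cost-report-agent",
    allowed_actions=frozenset({"read_cost_data", "rebuild_environment"}),
)
print(authorize(cost_agent, "read_cost_data", "production"))                         # True
print(authorize(cost_agent, "rebuild_environment", "staging"))                       # True
print(authorize(cost_agent, "rebuild_environment", "production"))                    # False
print(authorize(cost_agent, "delete_environment", "staging"))                        # False
print(authorize(cost_agent, "rebuild_environment", "production", approver="j.doe"))  # True
```

The design choice that matters is that the human engineer’s broader role is never consulted. The agent can do only what its own principal allows, and the most destructive production actions require a second, named party, which is also what makes each agent decision traceable in board reporting.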

That sequence also makes adjacent governance work more useful. If you are already thinking through AI governance at the board level or refining how managers supervise an AI-enabled workforce through an AI workforce manager playbook, those efforts become much more credible once the semantic and trust foundations are in place. Otherwise, governance language becomes a cosmetic layer on top of systems whose assumptions are still unstable.

The vendor-whitepaper version of this story says companies should move fast, learn fast, and mature over time. I think that is exactly how many enterprises end up owning preventable failures. The losers in this cycle will not necessarily be the laggards. They will be the organizations that went live early, confused motion with readiness, and only later discovered that their agents were reasoning over ambiguity inside infrastructure they could not technically defend.

If that sounds close to your current roadmap, it is worth slowing down now. First define the business semantically. Then prove the trust boundary technically. Then deploy the agents. That is the order that gives you a program you can scale, defend, and explain.

Frequently Asked Questions (FAQ):

Why is agentic AI data quality different from traditional data quality?

Traditional data quality was designed for reporting and human review. Agentic AI data quality requires machine-readable, conflict-resolved business definitions because agents act on the data directly rather than presenting it to a person first.

Why does confidential computing matter for enterprise agents?

It protects data while the agent is actively processing it in memory, not just when the data is stored or transmitted, and it gives the enterprise attestation evidence that can be used with auditors, boards, and regulators.

What should a CIO do first before scaling enterprise agents?

Start with the five workflows closest to production, force a semantic audit of the entities involved, and do not let any high-risk workflow go live until both data meaning and trust boundaries are explicit.

For leaders working through these choices, AI Strategy: The Executive Playbook goes deeper on governance, operating models, and infrastructure tradeoffs.

If you need outside help pressure-testing your architecture before agents move deeper into production, MD-Konsult is built for exactly that kind of work.