Why AI cannot fix badly structured engineering data
Right now, somewhere in an engineering organisation, someone is building an AI agent for their team. The idea is good. The agent will monitor requirements, flag when something changes upstream, check whether the design still satisfies what it is supposed to satisfy. They have seen what these tools can do. They are not naive.
The data problem
What they have not fully reckoned with is the data. Not the demo data, which was curated. The actual engineering record: requirements split across two systems and a shared drive, design parameters in manually copied spreadsheets, analysis results in files named `thermal_final_v4_REVIEWED_use-this-one.xlsx`. The agent will read it all with complete confidence and produce wrong answers, because it has no way to know which document is current or whether any link between a requirement and a design element is still valid.
The agent is not broken. The data is.
Why this persists
Engineering data was built to satisfy review processes, not to be queried by machines. A requirements document needs to be readable by a reviewer, signed off, and placed under change control. It does not need a unique identifier for every shall statement, a structured link to the design element that satisfies it, or a timestamp on every field that changes.
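For contrast, here is roughly what one shall statement looks like when it is stored for a machine rather than a reviewer. This is a minimal sketch, not any particular tool's schema: the `Requirement` class, its fields and the IDs `REQ-042` and `DE-THERM-07` are hypothetical. It carries the three things the review process never asked for: a unique identifier, a structured link to the satisfying design element, and a timestamp on every field change.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Requirement:
    """One shall statement stored as a record (hypothetical schema, for illustration only)."""

    req_id: str                                            # stable, unique identifier
    text: str                                              # the shall statement itself
    satisfied_by: list[str] = field(default_factory=list)  # IDs of satisfying design elements
    history: list[tuple[datetime, str, object]] = field(default_factory=list)

    def update(self, name: str, value: object) -> None:
        """Change a field and record when it changed and what it changed to."""
        setattr(self, name, value)
        self.history.append((datetime.now(timezone.utc), name, value))

    def last_changed(self, name: str) -> datetime | None:
        """Timestamp of the most recent recorded change to a field, or None if never changed."""
        for when, changed, _ in reversed(self.history):
            if changed == name:
                return when
        return None


req = Requirement("REQ-042", "The enclosure shall keep the battery below 45 degrees C.")
req.update("satisfied_by", ["DE-THERM-07"])
req.update("text", "The enclosure shall keep the battery below 40 degrees C.")

print(req.satisfied_by)                      # ['DE-THERM-07']
print(req.last_changed("text") is not None)  # True
```

Nothing here is clever. The point is that each of these facts is a field a program can read, rather than something a reviewer infers from context.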
These are not oversights. They are rational choices given how reviews work. The reviewer reads the document. The reviewer checks whether it is coherent. The reviewer signs. Nobody in that loop is a machine.
So the data accumulates in formats that humans can parse and machines cannot. When an AI agent arrives, it is reading documents, not databases. It is pattern-matching over prose, not querying structured records. The confident-sounding output it produces is only as reliable as the documents it reads. Which is to say, reliable only when the documents happen to be right, and the agent cannot tell when that is.
The deeper problem is that even when data is structured, it is rarely connected. A requirement lives in one system. The design element that satisfies it lives in another. The analysis that validates the design lives in a third. Each has its own version history and no formal link to the others. The agent can read all three. It cannot know whether they still agree.
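Systems that do keep formal links usually answer the "do they still agree" question mechanically. One common approach, which some requirements management tools call a suspect link, is to stamp each link with the version of both ends at the moment someone last confirmed it. If either end has moved on since, the link is flagged for re-review. The sketch below assumes integer version numbers and uses a plain dictionary to stand in for the three separate systems; the `Link` type and all IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A formal link between two elements, stamped with the versions confirmed together."""
    source: str
    target: str
    source_version: int  # version of the source when the link was last confirmed
    target_version: int  # version of the target when the link was last confirmed


def suspect_links(links: list[Link], current: dict[str, int]) -> list[Link]:
    """Return the links where either end has changed since the link was last confirmed."""
    return [
        link for link in links
        if current[link.source] != link.source_version
        or current[link.target] != link.target_version
    ]


# Current versions as reported by three separate systems (hypothetical IDs).
current_versions = {"REQ-042": 3, "DE-THERM-07": 5, "AN-THERM-12": 2}

links = [
    Link("DE-THERM-07", "REQ-042", source_version=5, target_version=2),      # requirement moved on
    Link("AN-THERM-12", "DE-THERM-07", source_version=2, target_version=5),  # still in step
]

for link in suspect_links(links, current_versions):
    print(f"{link.source} -> {link.target} needs re-review")
# DE-THERM-07 -> REQ-042 needs re-review
```

This only says where to look. Deciding whether the design still satisfies the changed requirement remains an engineering judgement.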
What it costs
The costs show up in two places.
The first is obvious: the agent does not deliver what was promised. Teams spend weeks pre-processing data before it can run. The pre-processing becomes a project of its own. The efficiency gain evaporates before it starts.
The second is less visible: the data problem gets worse. Once teams know an AI tool is reading their data, they start cleaning it locally, for the tool, in ways that do not improve the underlying structure. A curated subset gets created. The curated subset drifts from the live data. Now there are two representations of the truth, and a new coordination burden to keep them aligned.
The tool that was supposed to reduce engineering overhead creates more of it.
What good looks like
Structured engineering data is not the same as digitised engineering data. Most teams have digitised data: files, documents, exports. Very few have structured data: every element with a defined type, a unique identity, a version history, and a formal relationship to other elements.
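One way to picture "a defined type, a unique identity, a version history, and a formal relationship" is as a small typed graph in which the relationships are themselves checked data. The sketch below is hypothetical throughout. The three element kinds, the two relationship names and the rule that only design elements may satisfy requirements are illustrative choices, not a standard schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    """Defined element types (hypothetical set)."""
    REQUIREMENT = "requirement"
    DESIGN_ELEMENT = "design_element"
    ANALYSIS = "analysis"


# Relationship types and the (source kind, target kind) each one permits. Hypothetical rules.
ALLOWED = {
    "satisfies": (Kind.DESIGN_ELEMENT, Kind.REQUIREMENT),
    "verifies": (Kind.ANALYSIS, Kind.DESIGN_ELEMENT),
}


@dataclass
class Element:
    uid: str                                            # unique identity
    kind: Kind                                          # defined type
    versions: list[dict] = field(default_factory=list)  # version history, oldest first

    @property
    def current(self) -> dict:
        """The latest version of this element's attributes."""
        return self.versions[-1]


class Model:
    def __init__(self) -> None:
        self.elements: dict[str, Element] = {}
        self.relations: list[tuple[str, str, str]] = []  # (source uid, relation, target uid)

    def add(self, element: Element) -> None:
        """Register an element, refusing a second element with the same identity."""
        if element.uid in self.elements:
            raise ValueError(f"duplicate identity {element.uid}")
        self.elements[element.uid] = element

    def relate(self, source: str, relation: str, target: str) -> None:
        """Record a formal relationship, rejecting any the schema does not allow."""
        expected = ALLOWED.get(relation)
        actual = (self.elements[source].kind, self.elements[target].kind)
        if expected != actual:
            raise TypeError(f"'{relation}' is not allowed from {source} to {target}")
        self.relations.append((source, relation, target))


model = Model()
model.add(Element("REQ-042", Kind.REQUIREMENT, [{"limit_c": 45}, {"limit_c": 40}]))
model.add(Element("DE-THERM-07", Kind.DESIGN_ELEMENT, [{"fin_count": 12}]))
model.relate("DE-THERM-07", "satisfies", "REQ-042")  # accepted

try:
    model.relate("REQ-042", "satisfies", "DE-THERM-07")  # wrong direction, rejected
except TypeError as err:
    print(err)  # 'satisfies' is not allowed from REQ-042 to DE-THERM-07

print(model.elements["REQ-042"].current)  # {'limit_c': 40}
```

Digitised data has none of these checks. A spreadsheet will accept a link in the wrong direction, a duplicated ID, or an overwritten value without complaint.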
When data is structured at that level, AI delivers real leverage. A change to a requirement triggers an automatic check of every design element linked to it. An analysis output writes back to the model, not to a spreadsheet. A query about current design status returns a live answer, not a document that was accurate last quarter.
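The first of those behaviours is, underneath, a graph traversal. When a requirement changes, follow the trace links outward and collect everything downstream of it. Below is a minimal sketch using breadth-first search over a plain adjacency map. The element IDs are hypothetical. Following links transitively (requirement to design element to the analyses that validate it) is an assumption about how a team would want impact reported.

```python
from collections import deque


def impacted_by(changed: str, downstream: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk of trace links, returning every element that depends on `changed`."""
    seen = {changed}
    queue = deque([changed])
    order = []
    while queue:
        current = queue.popleft()
        for nxt in downstream.get(current, []):
            if nxt not in seen:  # guards against cycles and shared dependants
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Hypothetical trace graph: requirement -> design elements -> analyses that validate them.
downstream = {
    "REQ-042": ["DE-THERM-07", "DE-ENCL-02"],
    "DE-THERM-07": ["AN-THERM-12"],
    "DE-ENCL-02": ["AN-THERM-12", "AN-STRUCT-04"],
}

print(impacted_by("REQ-042", downstream))
# ['DE-THERM-07', 'DE-ENCL-02', 'AN-THERM-12', 'AN-STRUCT-04']
```

The `seen` set matters in practice. Trace graphs routinely share dependants, as `AN-THERM-12` does here, and without it the same analysis would be reported twice.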
This is not AI doing something novel. It is AI doing what retrieval systems have always done, but against data that is actually queryable. The capability was always there. The bottleneck was the data.
The teams getting real value from AI in engineering are not the ones with the best models. They are the ones who built their data structure before they brought in the model.
The diagnostic
One question: if you asked an AI agent to tell you which requirements are currently satisfied by your design and which are not, what would it need to do to answer that?
If the answer involves reading documents, retrieving files, cross-referencing spreadsheets, and relying on whoever last touched the data to have been careful, the bottleneck is not the AI. The intelligence is available. The data is not ready for it.
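For comparison, here is what answering the diagnostic question can look like when requirements, satisfy links and verification results are already records. This is a minimal sketch under stated assumptions. The IDs are hypothetical, and `verdicts` stands in for analysis results written back to the model. A requirement counts as satisfied only when every design element linked to it has a passing verdict.

```python
from enum import Enum


class Status(Enum):
    SATISFIED = "satisfied"
    NOT_SATISFIED = "not satisfied"
    UNVERIFIED = "unverified"
    UNALLOCATED = "no design element linked"


def requirement_status(
    requirements: list[str],
    satisfies: list[tuple[str, str]],  # (design element ID, requirement ID) links
    verdicts: dict[str, bool],         # design element ID -> did its latest verification pass?
) -> dict[str, Status]:
    """Classify each requirement using only links and verification results."""
    result: dict[str, Status] = {}
    for req in requirements:
        designs = [d for d, r in satisfies if r == req]
        if not designs:
            result[req] = Status.UNALLOCATED
        elif any(verdicts.get(d) is False for d in designs):
            result[req] = Status.NOT_SATISFIED  # any failing element fails the requirement
        elif any(d not in verdicts for d in designs):
            result[req] = Status.UNVERIFIED     # some linked element has no result yet
        else:
            result[req] = Status.SATISFIED
    return result


# Hypothetical records.
requirements = ["REQ-042", "REQ-043", "REQ-044", "REQ-045"]
satisfies = [
    ("DE-THERM-07", "REQ-042"),
    ("DE-ENCL-02", "REQ-043"),
    ("DE-MOUNT-01", "REQ-044"),
]
verdicts = {"DE-THERM-07": True, "DE-ENCL-02": False}

for req, status in requirement_status(requirements, satisfies, verdicts).items():
    print(req, status.value)
# REQ-042 satisfied
# REQ-043 not satisfied
# REQ-044 unverified
# REQ-045 no design element linked
```

A fuller version would also treat a verdict as missing when it was recorded against an older version of the design element than the current one.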
Fix the data structure first. The AI will follow.