New AI tools are met with hesitancy because businesses believe their data needs to be ‘perfect’. We talked to industry professional Gao Maribe, a Data and AI Executive, and our very own founder, Jay van Zyl, to get their take on what businesses should be focusing on instead.

The real world is not perfect, so why should your data be?

Jay put it bluntly – the definition of ‘perfect’ data relies completely on what you are intending to use it for. Some cases require high accuracy, while others thrive in messy data environments.

In some cases, it is vital that your data is structured to ensure accuracy and precision, minimizing noise so your predictions are as sharp as possible. In data science, noise refers to variability in data—like typos in customer names or missing fields—that doesn’t carry a useful signal about the outcome you’re trying to predict. Noise can weaken predictions when models confuse irrelevant variation with real patterns. In an environment like adjudicating medical claims, accuracy and precision take precedence because weak predictions can have severe repercussions.


Human beings are complex, unstructured and unpredictable. Trying to represent human behavior in structured, static datasets is not only tiresome but also obscures a world of nuance.

There are prominent fears that imperfect data leads to “flawed AI decisions, negating the benefits of adopting AI technology”. But data that is too perfect can also lead to inaccurate representations of the world. 

When modeling human behavior or analyzing customer experience, noise is often part of the truth. In churn prediction, for example, a model built only on clean, “perfect” histories can be slow to pick up, or miss entirely, the rare real‑time signals that matter most. A sudden spike in late payments might look like noise, but in some cases it is an early warning that a customer under financial stress is about to churn. Jay said, “If you leave the data dirty, you have a real reflection of the real world.”
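To make the late-payment example concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than anything Jay or Gao described: the customer data is made up, `clip_outliers` stands in for a typical outlier-capping cleaning step, and `recent_spike` is a deliberately simple early-warning rule with hand-picked thresholds.

```python
from statistics import median

# Hypothetical monthly late-payment counts per customer (made-up data).
late_payments = {
    "cust_a": [0, 1, 0, 0, 1, 0],
    "cust_b": [0, 0, 1, 0, 4, 5],  # sudden spike in the last two months
}

def clip_outliers(series, k=3.0):
    """Illustrative cleaning step: cap values more than k MADs above the median."""
    med = median(series)
    # Fall back to 1.0 when the median absolute deviation is zero.
    mad = median([abs(x - med) for x in series]) or 1.0
    cap = med + k * mad
    return [min(x, cap) for x in series]

def recent_spike(series, window=2, factor=5.0):
    """Flag a customer whose recent average is far above their earlier average."""
    earlier, recent = series[:-window], series[-window:]
    # Floor the baseline so a history of zeros does not flag every small change.
    baseline = max(sum(earlier) / len(earlier), 0.5)
    return sum(recent) / len(recent) >= factor * baseline

for customer, series in late_payments.items():
    raw_flag = recent_spike(series)
    cleaned_flag = recent_spike(clip_outliers(series))
    print(customer, "raw:", raw_flag, "cleaned:", cleaned_flag)
```

With these made-up numbers, the spike for `cust_b` is flagged on the raw series but disappears once the outliers are capped. The thresholds were chosen to show the effect, so treat this as an illustration of the risk, not as a recommended churn model.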


Choose your tooling wisely

Businesses hesitate to adopt AI tools because they zoom in on perfection rather than seeing the bigger picture. The focus should not be on having perfect data, but on whether the AI tooling can use that data properly.

Jay often emphasizes how generative tooling gets misused. The ongoing copyright lawsuits are a case in point: regardless of the legal outcomes, generative models are not designed to reproduce exact copyrighted text on demand, and they may invent details. Built for linguistic usefulness, generative tools are not reliable for accuracy and precision unless they are bound by rules.

Gao, a trained statistician, noted that common practice has long been to use simulations on controlled, synthetic datasets to test the effectiveness of models. However, this can be harmful, as synthetic datasets created by generative AI are often far removed from your business’s reality.

Over-cleaning or perfecting data can destroy human nuance. What is often mistaken for noise holds the clues to understanding human behavior.

A practical way to make generative AI useful is to use agents, according to Jay. “If you use a generative tool to generate synthetic data, you want it to adhere to, let’s say, the categories that you have in your business. So, your generative process needs to go find your agents that will tell you what the valid categories are. You don’t want it to make those up.” 
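As a rough illustration of the idea in Jay's quote, the Python sketch below checks generated records against a list of valid categories before they are accepted. The names are hypothetical: `valid_categories_agent` is a stand-in for whatever agent or service holds your real business categories, and the records are made up for the example.

```python
def valid_categories_agent():
    """Hypothetical stand-in for an agent that knows the business's real categories."""
    return {"retail", "wholesale", "online"}

def constrain(records, valid):
    """Split generated records into those with a known category and those without."""
    accepted, rejected = [], []
    for record in records:
        if record.get("category") in valid:
            accepted.append(record)
        else:
            rejected.append(record)
    return accepted, rejected

# Made-up output from a generative tool; the second category is invented.
generated = [
    {"customer_id": 1, "category": "retail"},
    {"customer_id": 2, "category": "premium-gold"},
    {"customer_id": 3, "category": "online"},
]

accepted, rejected = constrain(generated, valid_categories_agent())
print("accepted:", [r["customer_id"] for r in accepted])
print("rejected:", [r["customer_id"] for r in rejected])
```

Here the check runs after generation. In practice you might also pass the valid categories into the generation step itself, so that invented values are less likely to appear in the first place.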

Data in production will always be imperfect, so your model should learn from that reality. Chasing ‘perfect’ data is an unrealistic goal that wastes time, energy and resources. Instead, the focus should be on using the correct tools for your end goal – working with imperfection when it comes to humans, and clearing out noise when precision is vital.