Why small language models are the smarter choice
Instinct might tell you that models trained on more data, capable of answering virtually any question with linguistic flair, would be the best choice for complex agentic journeys.
As recent research in the field states, “Large language models (LLMs) are often praised for exhibiting near-human performance on a wide range of tasks and valued for their ability to hold a general conversation.”
(Very) large language models power the core components of most AI agents. It has become common practice in the industry to use one generalist LLM to serve a large volume of diverse requests. This general knowledge is used to inform the agent’s strategic decisions, to perform reasoning and to break down tasks into subtasks.
However, the research paper, titled “Small Language Models are the future of Agentic AI”, goes further, presenting a compelling argument that LLMs are far from ideal for agentic AI frameworks.
The potential of agentic AI will not be fully realised if businesses and tech professionals attempt to copy-and-paste what has worked before. The technology requires the industry to confront the weaknesses and harness the strengths of generative AI.
Without reliability, accuracy and explainability built into the frameworks, businesses will never be able to fully outsource work to these ‘autonomous’ agents.
Where LLMs fail
Agentic AI uses language models to perform specialized tasks
By integrating LLMs with tools, APIs, and memory systems, and by implementing guardrails, agentic frameworks give LLMs the means to act in an autonomous and human-like manner.
However, the nature of LLMs, as holders, interpreters and conveyors of enormous swathes of information, makes them inherently general. Pairing this with agentic AI’s requirement to perform small, repetitive, specialized tasks has had some adverse consequences.
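To make that mismatch concrete, here is a minimal sketch of an agentic loop. The `call_model` function is a hypothetical stand-in for any inference API, and the subtasks are illustrative; the point is that every small, repetitive subtask pays the full cost of the same generalist model.

```python
# Minimal agent-loop sketch. `call_model` is a hypothetical stand-in
# for an LLM inference call; every subtask, however small, routes
# through the same generalist model.

def call_model(model: str, prompt: str) -> str:
    # Placeholder: a real framework would hit an inference API here.
    return f"[{model} response to: {prompt[:40]}]"

def run_agent(user_request: str) -> list[str]:
    # 1. Break the request into subtasks (itself one model call).
    plan = call_model("generalist-llm", f"Plan steps for: {user_request}")
    # Subtasks hard-coded here for illustration; `plan` would drive them.
    subtasks = ["extract intent", "look up data", "draft reply"]

    results = []
    for task in subtasks:
        # 2. Each small, repetitive subtask is another full-cost call.
        results.append(call_model("generalist-llm", f"{task}: {user_request}"))
    return results

for line in run_agent("Find glowworm books in Southern Kentucky"):
    print(line)
```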
As our founder Jay van Zyl puts it:
“The more information a model possesses, the longer the thinking process, the more verbose the answers, and the less successful the model is at detecting intent.”
What you need instead are models with a narrower focus and more specialized knowledge.
Consider for a moment the unlikely event that you are visiting a public library. Now, imagine the equally unlikely event that you are looking for a book on the Migratory Patterns of Glowworms in the Limestone Caves of Southern Kentucky.
You approach a pasty-skinned librarian at the front desk who, upon hearing your request, runs one bony finger down the catalog screen. He then scratches his nose, looks up, and informs you that no such book exists. You walk out of the library, disheartened by the prospect that you may never learn about the Migratory Patterns of Glowworms in the Limestone Caves of Southern Kentucky.
Then, from the corner of your eye, down an arbitrary side street, you spot a bookstore called Things That Glow. A signboard propped up next to the entrance reads: “We have books about Glowworms in Southern Kentucky!” Your eyes brighten with excitement and, as you enter the store, a friendly glowworm enthusiast hands you the book you have been searching for.
That unassuming bookstore, which served no function except that of satisfying the community of glowworm fanatics like yourself, is a small language model (SLM).
Why small language models (SLMs) are better for agentic frameworks
LLMs have the primary objective of extracting patterns from large amounts of data and producing novel content. While this is useful in contexts where a user wants feedback on a well-documented topic, LLMs are inefficient at tasks that require a more specialized approach.
SLMs, on the other hand, are trained on smaller amounts of data, which automatically restricts their context. This means that SLMs have highly specific knowledge about a certain domain, and cannot return answers for anything that falls outside their narrow window of expertise. With the rise of AI hallucinations, businesses increasingly recognise that no answer is better than a convincing-yet-entirely-incorrect one.
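One way to picture that behaviour is a narrow model wrapper that simply declines anything outside its domain. The keyword gate below is a toy stand-in for a real domain classifier, not a product feature, but it captures the “no answer beats a wrong answer” principle.

```python
# Sketch of a narrow-scope answering policy: respond only inside the
# domain, refuse everything else rather than guess. The keyword gate
# is a toy stand-in for a real domain classifier.

DOMAIN_KEYWORDS = {"glowworm", "glowworms", "bioluminescence", "cave"}

def narrow_answer(question: str) -> str:
    words = set(question.lower().split())
    if words & DOMAIN_KEYWORDS:
        # In-scope: the specialized model answers confidently.
        return "Here is what we know about glowworms..."
    # Out-of-scope: refusal beats a confident hallucination.
    return "Out of scope: no answer."

print(narrow_answer("Where do glowworm colonies live?"))  # in scope
print(narrow_answer("What is the GDP of France?"))        # refused
```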
Research has shown that SLMs outperform LLMs on several fronts:
- SLMs have higher throughput and lower latency than LLMs. This means they can (A) process higher volumes of data within a given time period, and (B) return individual responses with less delay.
- SLMs not only stick within the bounds of a specific task, but do so more efficiently. With far fewer parameters, SLMs’ inference cost is 10 to 30 times lower than that of an LLM.
- Additionally, their lower pre-training and fine-tuning costs make them far more flexible and easier to retrain.
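A back-of-the-envelope sketch shows what the 10–30x inference-cost gap means at volume. The per-call price and request volume below are assumed illustrative figures; only the 10–30x ratio comes from the research cited above.

```python
# Back-of-the-envelope cost comparison. The per-call price and volume
# are assumed illustrative numbers; only the 10-30x ratio is from the
# research cited in the text.

llm_cost_per_call = 0.01      # assumed: $0.01 per LLM call
calls_per_month = 1_000_000   # assumed request volume

llm_monthly = llm_cost_per_call * calls_per_month
# SLM inference is 10-30x cheaper, per the cited research.
slm_monthly_best = llm_monthly / 30
slm_monthly_worst = llm_monthly / 10

print(f"LLM: ${llm_monthly:,.0f}/month")
print(f"SLM: ${slm_monthly_best:,.0f}-${slm_monthly_worst:,.0f}/month")
```

At these assumed figures the generalist model costs $10,000/month while the same traffic on an SLM lands between roughly $333 and $1,000/month.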
The potential of agentic AI will not be fully realised if businesses and tech professionals do not confront the weaknesses and harness the strengths of generative AI. Your customers turn to you for information specific to their context, requiring your agents to be highly focused and able to perform specialized tasks. Essentially, you want to be the bookstore called Things That Glow – the expert your customers can rely on to deliver precise, relevant, and trustworthy answers every time.
Watch this space for weekly insights from our team!
