How two different languages can sound the same
Since the dawn of the industrial revolution, humans and machines have come into closer and closer proximity to one another. This trend began with the displacement of manual labour and evolved into machines becoming intimately involved in almost every aspect of our lives, from the very first sound we hear when we wake up, to our primary interface with knowledge, communication, business, and entertainment.

Many of these technological innovations acted as aids to humans to complete menial tasks so that they could spend more time and energy working with other humans doing uniquely human things. It is only recently that machines have begun to occupy a space that tries to emulate humans themselves. There is nothing about a train that tries to look like a thousand human legs running along a track, nor a telephone that emulates a human mouth. However, artificial intelligence, or more specifically, large language models, purposefully try to emulate human language. Reminding ourselves where machines end and humans begin not only preserves our pride, but is essential for using machines for their strengths and not being fooled by their uncanny mimicry of the human voice.
The mathematical origins of language models
In the 1800s, the French philologist Michel Bréal spent his time studying languages – their structure, how they change over time, and the relationships between words. By the end of World War II, the need for conflict resolution revealed the importance of communication, and growing global connections required that humans do this on a massive scale – a feat that required superhuman help.

The earliest forms of language models were used to translate one human language into another. But this required that machines find a commonality between different human languages, which proved far more complicated than anticipated. Rather than revealing a logical pattern, human language unveiled its underlying chaos and broken rules – something a machine could not begin to understand.
Technologists had to find a way to ‘translate’ human language into a language machines could understand. Human language, as a result, had to be reduced to a mathematical format, stripping it of cultural anomalies and nuance. In 1966, MIT computer scientist Joseph Weizenbaum developed the first program using natural language processing (NLP), known as ELIZA. This program functioned by identifying keywords and responding with pre-programmed responses. In the 1980s, IBM developed the first small language model, which functioned slightly differently – it used training data to predict the next most probable word in a sentence. This was a major step for NLP, which had, prior to the 1980s, relied on hand-written rules.
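The two early approaches described above can be sketched in a few lines of code. The rules, keywords, and toy corpus below are purely illustrative, not drawn from Weizenbaum’s or IBM’s actual systems:

```python
from collections import Counter, defaultdict

# 1. Keyword matching with pre-programmed responses (in the spirit of ELIZA).
RULES = {
    "mother": "Tell me more about your family.",
    "sad": "Why do you feel sad?",
}
DEFAULT = "Please go on."

def keyword_reply(utterance):
    """Return the canned response for the first keyword found in the input."""
    lowered = utterance.lower()
    for keyword, response in RULES.items():
        if keyword in lowered:
            return response
    return DEFAULT

# 2. Statistical next-word prediction: count which word most often
#    follows each word in the training data (a simple bigram model).
def train_bigrams(corpus):
    counts = defaultdict(Counter)
    words = corpus.lower().split()
    for a, b in zip(words, words[1:]):
        counts[a][b] += 1
    return counts

def most_probable_next(counts, word):
    """Predict the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

model = train_bigrams("the cat sat on the mat and the cat slept")
```

With this toy corpus, `most_probable_next(model, "the")` returns `"cat"`, because “cat” follows “the” more often than “mat” does – prediction from frequency, with no understanding involved.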
Strength in one another’s weaknesses
NLP combines linguistics with computer science and artificial intelligence to enable computers to create mathematical models of language, informing interpretation and generation of language by computers. But can human language really be translated into mathematical rules? Linguist Ferdinand de Saussure developed a theoretical framework for understanding language called Structuralism. This theory claims that languages follow hidden rules that practitioners know implicitly, but are unable to articulate. Claude Lévi-Strauss took this theory further in his anthropological research, and argued that human culture itself is composed of hidden rules that govern human behavior… a ‘human algorithm’, if you like.
Alan Turing predicted in 1947 that the only way machines would begin to truly understand, and therefore imitate, human language at its most intimate level would be to give a machine the ability to “alter its own instructions”. This would bring machines closer to thinking as humans do, and what might be perceived as ‘broken rules’ could instead be seen as rules that change based on context.
Modern-day LLMs come close to this by using artificial neural networks. However, while these networks imitate the structure of a brain – an interlinked web of nodes (neurons) which receive and process signals – human brains are far more complex, with factors like neurotransmitters, re-uptake inhibitors, ion channels and refractory periods adding uniquely human nuance to every interpretation and subsequent action. Additionally, LLMs lack the feedback loops necessary for self-reflection, leaving them vulnerable to error and unable to adjust predictions based on immediate context.
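The “interlinked web of nodes” is simpler than it sounds. Each artificial node merely takes a weighted sum of its incoming signals and squashes it through an activation function; the weights and layer sizes below are arbitrary, chosen only to illustrate the idea:

```python
import math

def neuron(inputs, weights, bias):
    """One artificial node: weighted sum of signals, squashed to (0, 1)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid activation

def tiny_network(inputs):
    """A minimal 'web': the outputs of one layer feed the next."""
    hidden = [
        neuron(inputs, [0.5, -0.6], 0.1),
        neuron(inputs, [-0.3, 0.8], 0.0),
    ]
    return neuron(hidden, [1.2, -0.7], 0.2)
```

Everything a node does is captured by that one arithmetic step – a stark contrast with the electrochemical machinery of a biological neuron.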
LLMs’ relative simplicity, however, allows them to process vast amounts of text data at far greater speeds than a human brain. Additionally, they are able to ‘translate’ various forms of data: text to image, audio to text, 3D models to videos. LLMs’ translation capabilities go as far as being able to translate DNA into music, using genetic sequencing as a pattern for sound. LLMs therefore excel in a realm where humans are limited.

Large language models and human language evolved from different origins and for different purposes. Human language evolved as a means of expression of thought, feelings, and abstractions. Cuneiform script, one of the earliest forms of writing, took the form of wedge-shaped characters pressed mainly into clay tablets in the ancient writing systems of Mesopotamia, Persia, and Ugarit. It first evolved as a way of keeping record of trade and ownership, and only later served the function of literature. To pass down scribal knowledge, glossaries were structured as lexical lists, copied and recopied across generations. This began the standardization of the writing system, but inherently tied to it was a way of seeing the world. Lexical lists formed a taxonomy, sorting words into groups of similar meaning. Because human beings act as both encoders and decoders, this way of thinking was internalised, developing a new way of seeing a world that ordered reality into a fixed architecture upon which more scribal knowledge was formed. Writing and language became a form of cognitive outsourcing and a tool for self-reflection.
Human beings’ biological senses enrich words with greater meaning – relating to experience and emotion – and in turn affect how we perceive the world. Our sense of smell is particularly volatile when exposed to linguistic suggestions. Subjects in a study exposed to a manufactured odor and told that it was “cheddar cheese” showed activation of a specific area of the brain that processes olfactory information; when told that the scent was in fact “body odor”, different areas were activated; and when presented with clean air and told it was “cheddar cheese,” the same “cheese area” was again activated. Not only do we use language to describe sensations, but the cognitive feedback that occurs affects the orientation of our senses.
This flexibility and dynamic posturing towards the physical environment is absent in machines, not only because they lack sensory faculties, but because their design is rigid, geared towards efficiency, speed and usefulness. The vulnerability of our perception, and its reciprocal relationship with language, may appear to be a hindrance for accuracy, but in fact it is in these flaws that human cognition, and therefore language, becomes tied to experience – enriched with data from the real world, filtered through emotion, reproduced with nuance.
Conversations between humans and machines
Human beings developed language as a tool for intellect, a way to formulate thought and externalise it for consumption by other humans. Moving beyond our era of clay tablets, sculptures and rock paintings, we have not only mastered the ability to externalise our thoughts, but have now outsourced these very linguistic capabilities to machines. Rather than seeking knowledge from elders or respected friends, we have entered into dialogue with a series of artificial nodes, thinking that the information they produce is somehow superior.
LLMs like ChatGPT are purposefully designed to engage in a back-and-forth conversation with human beings, in a human-like manner. LLMs are trained to mirror human preferences, with built-in politeness, often dishing out saccharine compliments before delivering a response. But in conversation with machines, humans need to recognise the difference between the language computers speak and that which we speak. Although the output appears the same or even superior, the internal processes of producing language in machines and humans are vastly different. Perceiving a machine’s language capabilities as equivalent or superior to a human’s is not only misleading, but misses the opportunity of playing to both parties’ strengths. Human language is informed by individual experience, societal norms, and emotional intelligence, and is enriched by signals from biological systems; LLMs, by contrast, run on sophisticated statistical neural networks, delivering content as an aggregation of millions of sources.
Human and machine language may appear the same at a superficial level, but the underlying mechanisms and intentions behind language diverge fundamentally. Human language serves as a means of expression, abstract thought, and cognition, with the primary goal of connection. Large language models were designed for linguistic utility, and are oriented towards efficiency. The strengths of LLMs should be viewed as inherently complementary to human strengths – excelling where we falter, and vice versa. To preserve the richness of human communication and make the most of our technological tools, we must engage with them critically, using them not as replacements, but as extensions of our cognitive abilities.
