For the past few weeks, I’ve been engaged in an email exchange with my favourite anarcho-syndicalist, Noam Chomsky. I initially reached out to ask whether recent developments in ANNs (artificial neural networks) had caused him to reconsider his famous linguistic theory, Universal Grammar. Our conversation touched on the possible limitations of deep learning and how well ANNs really model biological brains, and it also meandered into more philosophical territory. I’m not going to quote Professor Chomsky directly in this article, as our discussion was informal, but I will attempt to summarise the key takeaways.
Noam Chomsky is first and foremost a professor of linguistics (considered by many to be “the father of modern linguistics”), but he is probably better known outside academic circles as an activist, philosopher and historian. He is the author of over 100 books and was voted the world’s leading public intellectual in a 2005 poll conducted by the magazines Foreign Policy and Prospect.
For the record, I am an admirer of Chomsky’s work, particularly his critiques of American imperialism, neo-liberalism and the media. Where our views have diverged slightly is in relation to his dismissal of continental philosophers (especially the French post-structuralists). Perhaps I was poisoned by drawing too often from the wells of Foucault, Lacan and Derrida in early adulthood, but I’ve always found Chomsky’s analytical approach to philosophy morally appealing yet a little too “clean” to satisfactorily explain our world. While his disdain for these post-structuralist luminaries is conspicuous, Chomsky’s philosophical views are more nuanced than his detractors give him credit for.
I will declare from the outset that I am not a linguist, but in this section I will try to give an overview of Universal Grammar theory. Before Chomsky, the predominant hypothesis in linguistics was that humans are born with minds that are “tabula rasa” (like a blank slate) and acquire language through reinforcement. That is, children hear their parents speak, they mimic the sounds that they hear, and when they correctly use a word or structure a sentence, they are praised. What Chomsky showed was that reinforcement is only part of the story and that there must be innate structures within the human brain that are universal and that facilitate language acquisition. His primary arguments were:
This theory of a genetically hard-coded language faculty became widely accepted in the scientific community, but the obvious next question was “what does this Universal Grammar actually look like?”. Intrepid researchers soon set out to discover shared properties across all human languages, but there remains no consensus on what form our innate linguistic capacities take. It’s safe to assume that Universal Grammar does not consist of concrete syntactic rules but is more likely to be a fundamental cognitive function. Chomsky has postulated that at some point in our history, humans developed the ability to perform a simple, recursive process called “Merge”, and that this is responsible for the properties and constraints of the syntactic structures we see within human languages. It’s a little bit abstract (and too involved to address properly here), but essentially “Merge” is the process of taking two objects and combining them to form a new object. While seemingly prosaic, the ability to mentally combine concepts, and to do this recursively, is deceptively powerful and allows us to construct an “infinite variety of hierarchically structured expressions”. Not only may this small but crucial genetic leap forward explain our aptitude for verbal communication, but it could also be responsible (at least in part) for our mathematical talents and for human creativity more broadly. This “Merge” mutation, thought to have occurred in one of our ancestors roughly 100,000 years ago, might be one of the key things that separate humans from other animals.
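To make “Merge” a little more concrete, here is a toy sketch in Python. It is my own illustration rather than anything from Chomsky or from a linguistics library: it assumes the common description of Merge as forming the unordered set {X, Y} from two objects, and the helper names `merge`, `depth` and `show` are hypothetical. The point it shows is that one trivial operation, applied again to its own output, is enough to build nested, hierarchical structures of any depth.

```python
from typing import FrozenSet, Union

# A syntactic object is either a lexical item (a plain string) or the
# result of an earlier Merge (an unordered set of two objects).
# Toy model only: real syntactic theory attaches far more information.
SyntacticObject = Union[str, FrozenSet["SyntacticObject"]]


def merge(x: SyntacticObject, y: SyntacticObject) -> FrozenSet[SyntacticObject]:
    """Combine two objects into a new, unordered object {x, y}."""
    return frozenset({x, y})


def depth(obj: SyntacticObject) -> int:
    """Count nested Merge layers; a bare lexical item has depth 0."""
    if isinstance(obj, str):
        return 0
    return 1 + max(depth(part) for part in obj)


def show(obj: SyntacticObject) -> str:
    """Render an object with braces, sorting parts so the output is deterministic."""
    if isinstance(obj, str):
        return obj
    return "{" + ", ".join(sorted(show(part) for part in obj)) + "}"


# Build "she saw the cat" bottom-up by repeatedly merging.
noun_phrase = merge("the", "cat")
verb_phrase = merge("saw", noun_phrase)
clause = merge("she", verb_phrase)

print(show(clause))   # {she, {saw, {cat, the}}}
print(depth(clause))  # 3
```

Because each call to `merge` returns an object that can itself be merged, nothing in the code caps how deep the structure can go; the hierarchy comes from recursion, not from a list of rules.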
The primary reason I got in touch with Professor Chomsky was that I wanted to hear his views on artificial neural networks (a topic I know materially more about than linguistics). ANNs are a subset of machine learning models that are loosely modelled on the human brain and learn in a similar way (by seeing lots of examples). These models require very little hard-coding and can perform quite a broad array of complex tasks (e.g. image tagging, voice recognition, text generation) with relatively simple architectures. An instructive example of this approach is the AlphaGo Zero model developed by Google’s DeepMind, which learnt to play the game Go (a complex and challenging board game) and ultimately surpassed the earlier versions of AlphaGo that had beaten human world champions. Most impressively, it learnt all of this through self-play, with no human game data and little hard-coding beyond the rules of the game, that is, “tabula rasa”. ANNs are certainly not a perfect analogy for the human brain, but I asked Professor Chomsky whether these models suggest that we do not, in fact, need hard-coded cognitive structures to learn from scattered data.
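For readers unfamiliar with what “learning from examples” means for an ANN, the toy Python sketch below shows the idea at its smallest scale: a single perceptron (one artificial neuron) that learns logical OR from four labelled examples, starting from all-zero weights. This is my own illustrative assumption, far simpler than the deep networks behind systems such as AlphaGo Zero, and the names `EXAMPLES`, `predict` and `train` are hypothetical.

```python
# A single perceptron learning logical OR purely from labelled examples.
# It starts from all-zero weights (a "blank slate") and is never told the rule.

EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def predict(weights, bias, inputs):
    """Output 1 if the weighted sum of the inputs plus the bias is positive, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train(examples, epochs=10, lr=1.0):
    """Classic perceptron learning rule: adjust the weights only when a prediction is wrong."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)
            # Nudge each weight in the direction that reduces the error.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


weights, bias = train(EXAMPLES)
print(weights, bias)  # [1.0, 1.0] 0.0
print([predict(weights, bias, x) for x, _ in EXAMPLES])  # [0, 1, 1, 1]
```

Even in this tiny case, the form of the model and the learning rule are fixed in advance by the designer; only the weights are learned from the data.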
Chomsky correctly pointed out that ANNs are useful for highly specialised tasks, but these tasks must be sharply constrained (although their scope can appear vast given the memory and speed of modern computers). He compared ANNs to a massive crane working on a high-rise building: while certainly impressive, both tools operate within systems that have fixed bounds. This line of reasoning is congruent with my observation that all of the deep learning breakthroughs I have witnessed have occurred in very specific domains, and we do not appear to be approaching anything like artificial general intelligence (whatever that means). Chomsky also pointed to mounting evidence that ANNs do not accurately model human cognition, which is so comparatively rich that the computational systems involved may even extend to the cellular level.
If Chomsky is right (and for what it’s worth, I think he is), what are the implications for deep learning research moving forward? Ultimately, there is nothing magical about the human brain. It is simply a physical structure composed of atoms, and therefore it is entirely rational to believe that at some point in the future we may be able to create an artificial version that is capable of general intelligence. With that said, current ANNs offer only a simulacrum of this kind of cognition, and by Chomsky’s logic, we won’t reach this next frontier without first improving our understanding of how organic neural networks operate.
The ethical use of AI is a salient concern for modern data scientists, but at times this domain can feel vague and subjective in an otherwise concrete field. Not only does Chomsky’s work provide a unique technical perspective on the future of deep learning, but Universal Grammar also has profound moral implications, since language is how we discuss and interpret the world. For example, Chomsky’s view is that the aforementioned innate neural structures preclude moral relativism and that there must exist universal moral constraints. There are many different flavours of moral relativism, but the core tenet is that there can be no objective basis for ethical determinations. Moral relativists assert that while we might believe deeply in statements such as “slavery is immoral”, we have no empirical way of proving this to somebody who disagrees, since any proof will necessarily rely on value judgements, and our values are ultimately exogenous, determined by culture and experience.
Chomsky contends that morality manifests in the brain and is, therefore, by definition, a biological system. All biological systems have variation (both natural and due to divergent stimuli), but they also have limits. Consider the visual system: experiments have shown that it contains some plasticity and is shaped by experience (especially in early childhood). By varying the input provided to the developing visual system, you can literally alter the distribution of orientation-sensitive neurons and thereby change the way that an individual perceives horizontal and vertical lines. What you cannot do, however, is turn a human eye into an insect eye, or give somebody the ability to see X-rays. According to Chomsky, biological systems (including morals) can vary quite widely but not infinitely. He goes on to say that even if you believe that our morality is entirely derived from culture, you still need to acquire that culture in the same way that you acquire any system (as a result of innate cognitive structures that are universal).
My initial reservation about this reading is that, if morality is simply a consequence of “Merge” (or something equally primitive), it may impose theoretical constraints, but my intuition is that our morals could still vary so wildly that it would be practically impossible to make universal statements. In the past, Chomsky has discussed how moral progress appears to follow certain trends (e.g. acceptance of difference, rejection of oppression), but I struggle to see how these broad trends would emerge consistently from such simple atomic cognitive structures. When I put this to Professor Chomsky, he argued that this view was illusory and that when we don’t understand things, they seem more diverse and complex than they really are. He gave the example of the variety seen in animal body plans since the Cambrian explosion. Merely 60 years ago, the dominant view in biology was that organisms vary so drastically that each must be studied on an individual basis, but we now know that this is completely wrong and that genetic variation between species is fairly slight. Variation in complex acquired systems must be minimal; otherwise, we wouldn’t be able to acquire them.