We Need Universal Teaching Machines as Much as We Do Universal Learning Machines

From Alan Turing's fascination with the question "Can machines think?" (from his seminal 1950 paper, "Computing Machinery and Intelligence") to teaching machines long-range and short-range attention (from the 2017 paper "Attention Is All You Need"), the best brains and the biggest organizations on this planet have been marshaling their faculties and resources to develop what we now know as Universal Learning Machines: machines that can learn almost anything. Now that we have Universal Learning Machines (ULMs) with undeniable (and possibly uncontrollable) capabilities, I think it's time we paid some attention to Universal Teaching Machines (UTMs).

How did Computing machines become Learning Machines?

Despite the misleading term, "machine learning" (ML) algorithms did not learn in the true sense, because we still needed to do "feature engineering". For example, say we wanted an ML model to predict the species of an Iris flower (the "Hello, world" of ML problems). A data scientist needed measurements (or, in most cases, engineered the features themselves) of the petal's length and width as well as the sepal's length and width.

Deep learning arguably started "learning" because neural networks are universal function approximators (a property described by the Universal Approximation Theorem): they learn the features from the input data by themselves, i.e., manual feature engineering was no longer necessary. Continuing the above example, a deep learning model could predict the Iris species directly from a picture of the flower.

GenAI is a totally different ball game when it comes to learning.

[Image: a poem about the Iris flower, generated by GenAI]

That's a poem generated by GenAI about Iris. In the style of Tagore. In a few milliseconds. I can assure you (with a tinge of hubris) that no one has ever attempted to write a poem on Iris, but GenAI was able to. This leaves us with little doubt about the ability of machines to learn on par with humans, i.e., Artificial General Intelligence has been achieved. How? Because we thought the machines deserved to learn from the humongous totality of all Reddit conversations; emojis and slang welcome. We thought the machines deserved to learn about Dante as well as Diogenes. We taught machines to pay "attention" to the next word as well as every other word that plausibly occurs alongside it.

Is this a Luddite post complaining about all these incredible changes? Hell, no. I'm fascinated that I'm living in an era where I'm able to witness this marvelous change. What worries me is that we are not enabling humans to learn almost anything (they want to) the same way we are enabling machines to learn almost anything.

A case for UTM

What's a Universal Teaching Machine (UTM)?

A Universal Teaching Machine is a tool that can teach any human any topic they want, by making the teaching personalized and as efficient as possible.

Why do we need UTMs?

Information asymmetry does exist in education

Information asymmetry very much exists in the education domain, and the advent of ULMs is going to skew it further. Why does this asymmetry exist? Well, there's a socio-economic angle to it. There's an educational policy angle to it. But I am primarily concerned with the intrinsic aspect of the learning process itself.

We did not learn how to learn

We are rarely taught how to learn in schools or in colleges, yet our success in academics, and as knowledge workers, largely depends on how well we learn. We have been misled by the myths of "innate talent" and "inborn passion," which dissuade us from exploring new subjects and building expertise in diverse domains. It's the same notion that prevents an individual from switching domains (most employees think they're married to their domain). However, the truth is far from these myths: without putting in the grind to attain a certain proficiency, no one has ever developed a sustainable passion in any domain. The job of a teaching agent [human or AI] should then be to help us fearlessly delve into these uncharted territories and immerse ourselves in the pursuit of knowledge, thus helping us unlock the gates to genuine passion and greatness, and unravel the true potential that lies within us.

The 2 sigma problem [and solution]

The 1984 paper on the 2 sigma problem by Benjamin S. Bloom reports that students who received 1:1 personalized tutoring performed about two standard deviations (2 sigma) better than students taught in a conventional classroom, which puts the average tutored student at roughly the 98th percentile of the conventional class. This means that with 1:1 tutoring, you can convert an average student into an excellent student and a weak student into a good student.
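To make "2 sigma" concrete, here is a minimal sketch that converts a shift measured in standard deviations into a percentile. It assumes scores in the conventional class are roughly normally distributed, which is an assumption made for illustration; the helper name `percentile_after_shift` is hypothetical.

```python
from statistics import NormalDist


def percentile_after_shift(start_percentile: float, sigma_shift: float = 2.0) -> float:
    """Return the percentile (0-100) a student reaches after improving by
    `sigma_shift` standard deviations, measured against the original class.

    Assumption: class scores follow a normal distribution.
    """
    if not 0.0 < start_percentile < 100.0:
        raise ValueError("start_percentile must be strictly between 0 and 100")
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    z_before = standard_normal.inv_cdf(start_percentile / 100.0)
    return 100.0 * standard_normal.cdf(z_before + sigma_shift)


if __name__ == "__main__":
    # An average student (50th percentile) and a weaker one (16th percentile).
    for start in (50, 16):
        after = percentile_after_shift(start)
        print(f"{start}th percentile -> {after:.1f}th percentile")
```

Under this normality assumption, a student who starts at the 50th percentile ends up close to the 98th, and a student who starts about one standard deviation below average ends up about one standard deviation above it.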

It was posed as a problem back then because 1:1 tutoring was considered too costly to scale. Even today, it's a pertinent problem, since we have not effectively solved it even with the advent of ULMs. Sal Khan of Khan Academy is one of the benevolent few who are thinking seriously about this topic, and he touches on some of the points that I'm discussing here. So if you have not watched his TED talk on this topic, I'd say ditch this article and watch that first.

Is UTM a utopia?

A tool that can overcome the above obstacles, a tool we can in good faith call a Universal Teaching Machine (UTM), is not utopian wishful thinking. Those who are pretty good at learning stuff intrinsically employ a few techniques that help them gobble up massive amounts of knowledge and make them 100x more productive. Obviously, those techniques themselves aren't some magical touch of genius; most of them, such as the Feynman technique, spaced repetition, interleaving and generation, are rooted in cognitive psychology research and have been shown to be highly effective in enhancing learning and retention. I'm here to show you that most of these techniques can be modularized and implemented, some with the help of LLMs, but most with common sense, creativity and commitment. We just need to implement the following components to build a barebones UTM.

The Whole to part processor

This is a technique employed in land surveying. When you want to measure an entire landscape, you try to get a preliminary understanding of the "whole" area before starting to survey the "parts". Think of this as skimming a book and literally trying to judge a book by its cover. The core idea is to form a preliminary understanding of a complex topic, so that we can identify and set a goal for completing the topic at hand. This helps us break the topic into smaller, manageable components and track our progress against the identified goal.

The Feynman processor

Think of any topic that you are good at. Can you explain this topic to a 5-year-old? A high school student? An elderly person who has no idea whatsoever about your domain? If so, then you have mastered the topic according to the Feynman technique. Named after the Nobel laureate Richard Feynman, the Feynman technique can also be used to teach complex concepts, by explaining a concept at multiple levels matched to your understanding at different points in time. Add in a little personalization according to your language of choice and throw in examples based on your interests (all of which is possible using current-generation LLMs), and you've got yourself a great Feynman processor.
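As a rough illustration of what the input side of a Feynman processor could look like, the sketch below builds one prompt per audience level, personalized by language and interest. Everything here is hypothetical (the `LearnerProfile` fields, the `build_feynman_prompts` helper and the prompt wording), and no LLM is actually called; the prompts would be passed to whichever model you use.

```python
from dataclasses import dataclass

# Audience levels taken from the questions above; adjust to taste.
LEVELS = (
    "a 5-year-old",
    "a high school student",
    "an adult with no background in the field",
)


@dataclass(frozen=True)
class LearnerProfile:
    """Hypothetical personalization settings for one learner."""
    language: str = "English"
    interest: str = "football"


def build_feynman_prompts(topic: str, learner: LearnerProfile) -> dict[str, str]:
    """Return one explanation prompt per audience level for the given topic."""
    prompts = {}
    for level in LEVELS:
        prompts[level] = (
            f"Explain {topic} to {level} in {learner.language}. "
            f"Use examples drawn from {learner.interest}. "
            "Avoid jargon this audience would not know, and end with "
            "one question that checks understanding."
        )
    return prompts


if __name__ == "__main__":
    learner = LearnerProfile(language="Tamil", interest="chess")
    for level, prompt in build_feynman_prompts("gradient descent", learner).items():
        print(f"[{level}] {prompt}")
```

Keeping the prompt construction separate from the model call makes it easy to swap models, or to reuse the same levels when asking the learner to explain the topic back.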

The WIRED 5 Levels series is a great inspiration in this regard. Here's one of my favorites.

Spaced repetition module

A module that strategically chooses the right intervals at which to learn and re-learn a concept. This is based on the concept of forgetting curves, which indicate that you quickly forget much of what you try to learn (in Ebbinghaus's classic experiments, roughly 40% within 20 minutes and about two-thirds within a day). The rate at which you forget information can be approximated by an exponential decay function.

Duolingo (yes, the language learning app!) published a paper demonstrating how the decay of this curve can be captured using half-life regression. Drawing inspiration from the famous quote often attributed to Peter Drucker, I'd like to say, "If you can measure it, you can try to improve it". And that's exactly what Duolingo does: it strategically uses spaced repetition based on the decay curves, helping you remember the words and concepts you're learning in the app. With this explanation, I believe the feasibility of implementing this module is well-established.
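Here is a minimal sketch of the half-life idea. It models recall probability as p = 2^(-delta/h), where delta is the time since the last review and h is the item's half-life, and estimates h as 2 raised to a weighted sum of features about the learner's history. The feature names and weights below are made up for illustration and are not Duolingo's; the 0.9 recall target is likewise an assumption.

```python
import math


def recall_probability(days_since_review: float, half_life_days: float) -> float:
    """Exponential forgetting curve: p = 2^(-delta / h)."""
    return 2.0 ** (-days_since_review / half_life_days)


def estimated_half_life(weights: dict[str, float], features: dict[str, float]) -> float:
    """Half-life as 2^(weights . features); unknown features get weight 0."""
    score = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 2.0 ** score


def days_until_review(half_life_days: float, target_recall: float = 0.9) -> float:
    """Solve 2^(-delta / h) = target_recall for delta."""
    if not 0.0 < target_recall < 1.0:
        raise ValueError("target_recall must be strictly between 0 and 1")
    return -half_life_days * math.log2(target_recall)


if __name__ == "__main__":
    # Illustrative weights only; a real system would fit them to review logs.
    weights = {"bias": 1.0, "times_correct": 0.5, "times_wrong": -0.5}
    history = {"bias": 1.0, "times_correct": 4.0, "times_wrong": 1.0}

    half_life = estimated_half_life(weights, history)
    print(f"estimated half-life: {half_life:.2f} days")
    print(f"recall after 3 days: {recall_probability(3.0, half_life):.2f}")
    print(f"review again in:     {days_until_review(half_life):.2f} days")
```

Each correct answer lengthens the half-life and each mistake shortens it, so well-known items are scheduled further apart while shaky ones come back sooner.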

The Part to whole processor

This module is responsible for tying everything back to the big picture, i.e., the main idea that the learner wanted to learn. It also evaluates progress against the learning goal. This is just an analytics metric problem in my opinion (and hence the feasibility of implementation should not be in question).
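One way to frame the metric, as a minimal sketch: represent the learning goal as a tree of topics and roll mastery scores up from the leaves to the root. The `Topic` class, the 0-to-1 mastery scale and the weighting scheme are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A node in a hypothetical learning-goal tree."""
    name: str
    mastery: float = 0.0   # used for leaf topics only, from 0.0 to 1.0
    weight: float = 1.0    # relative importance among sibling topics
    subtopics: list["Topic"] = field(default_factory=list)

    def progress(self) -> float:
        """Weighted average of subtopic progress, or own mastery for a leaf."""
        if not self.subtopics:
            return self.mastery
        total_weight = sum(topic.weight for topic in self.subtopics)
        if total_weight == 0:
            return 0.0
        weighted = sum(topic.weight * topic.progress() for topic in self.subtopics)
        return weighted / total_weight


if __name__ == "__main__":
    goal = Topic("Statistics", subtopics=[
        Topic("Probability", mastery=1.0),
        Topic("Inference", subtopics=[
            Topic("Estimation", mastery=0.5),
            Topic("Hypothesis testing", mastery=0.0),
        ]),
    ])
    print(f"{goal.name}: {goal.progress():.1%} complete")
```

The same tree can serve the whole-to-part step (breaking the goal down) and the part-to-whole step (reporting how far along the learner is).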

The Goldilocks module

This module maintains the right amount of challenge while you learn a topic, after which it tries to move on to a new topic. The concept of interleaving (where you mix up diverse topics while studying) can be experimented with, to see what level of interleaving achieves the Goldilocks threshold.
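A minimal sketch of the challenge-keeping part (interleaving is left out): track the learner's recent success rate and nudge the difficulty level up or down to keep that rate inside a target band. The class name, the band of 60% to 85% and the window size are assumptions for illustration, not values from research cited here.

```python
from collections import deque


class GoldilocksController:
    """Keeps the recent success rate inside a target band by adjusting difficulty."""

    def __init__(self, low: float = 0.6, high: float = 0.85,
                 window: int = 5, max_level: int = 10) -> None:
        self.low = low              # below this success rate, make it easier
        self.high = high            # above this success rate, make it harder
        self.max_level = max_level
        self.level = 1
        self.recent: deque[bool] = deque(maxlen=window)

    def record(self, correct: bool) -> int:
        """Record one answer and return the difficulty level for the next item."""
        self.recent.append(correct)
        if len(self.recent) == self.recent.maxlen:
            rate = sum(self.recent) / len(self.recent)
            if rate > self.high and self.level < self.max_level:
                self.level += 1
                self.recent.clear()  # judge the new level on fresh answers
            elif rate < self.low and self.level > 1:
                self.level -= 1
                self.recent.clear()
        return self.level


if __name__ == "__main__":
    controller = GoldilocksController(window=3)
    for answer in (True, True, True, False, False, False):
        print(answer, "->", controller.record(answer))
```

The band itself is the thing to experiment with: too high and the learner coasts, too low and they give up.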

Measuring motivation sure is a hell of a task. However, Douglas W. Hubbard makes a pretty convincing case in his book "How to Measure Anything" that it can be done.

Conclusion

I love learning. Because learning new things gives me joy. A joy that I do not want to outsource to machines yet. But I intend to use them to make learning a little bit easier, a little bit more interesting. And a little bit more democratized. Hence I'm building the Universal Teaching Machine. In a small way. Contributions, criticisms and a coffee chat are most welcome.