I’ve had so many behind-the-scenes requests recently that I thought I’d write a quick note now; the longer AI update will come in a month or so, as planned.
Thoughts On DeepSeek
For some reason, the fact that a Chinese hedge fund has developed an LLM that could be trained relatively cheaply has shocked everyone. I was not surprised.
I foresaw everything in AI the past decade, assembled colossal models, and now, standing on that Mount Olympus of predictions, I have precisely zero idea what comes next. (Ilya Sutskever at NeurIPS)
Some thoughts:
The history of innovation is littered with Horatio Alger stories of businesses started by a few people in a garage: Hewlett-Packard, Apple, Amazon, Google, Nike, Dell, etc. That a few people can create something new is hardly a surprise. Instagram had only 13 employees before its acquisition; WhatsApp had about 50. Hedge fund managers are the new geeks in garages.
The power of incentives is strong. U.S. export controls (often lumped in with the CHIPS Act) restricted the sale of NVIDIA’s more advanced H100 chip to China. The primary difference between the banned and unbanned chips is the data transfer rate: the H800 that NVIDIA is allowed to export is nerfed by effectively halving the chip-to-chip data transfer rate compared to the H100. So it is hardly surprising that creative people found a way to innovate around the constraint. That is essentially what DeepSeek did, and a big part of why their cost was lower.
There is a well-developed history of smaller, more efficient models being distilled from, and trained on the outputs of, the larger LLMs. OpenAI, Anthropic, and Meta (with Llama) have all done so. We are now beginning to hear claims that DeepSeek trained on the larger US models.
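This “train a small model from a big one” recipe is usually called distillation. As a minimal illustration (my own PyTorch sketch of the classic soft-label recipe, not any lab’s actual training code), the distillation loss pushes the student’s next-token distribution toward the teacher’s:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Classic knowledge-distillation loss: nudge the student's
    next-token distribution toward the teacher's.
    Both logit tensors have shape (batch, seq_len, vocab_size)."""
    t = temperature
    # Soften both distributions with a temperature, then compare.
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # KL divergence, scaled by t^2 to keep gradient magnitudes stable.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t * t)
```

In practice this term is usually mixed with the ordinary next-token loss on real data; the sketch shows only the teacher-matching half.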
Let’s step back for a second.
First off, when we first spoke about Large Language Models (LLMs), we were talking about specific next-word (next-token) prediction models trained with back-propagation and other standard deep-learning techniques.
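For the technically inclined, that core pre-training objective is simple to state. A toy sketch, assuming any model that maps token IDs to logits:

```python
import torch.nn.functional as F

def next_token_loss(model, token_ids):
    """Standard LLM pre-training objective: predict token t+1
    from tokens up to t. token_ids: (batch, seq_len) integer tensor."""
    inputs, targets = token_ids[:, :-1], token_ids[:, 1:]
    logits = model(inputs)              # (batch, seq_len - 1, vocab_size)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
```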
Now, when we talk about the latest LLMs, we are talking about complex systems - aggregations of models. The best models today are “mixture of experts” models: collections of specialized sub-models, each trained to perform a specific function. In the DeepSeek case, the cost is low because each query activates only the needed expert sub-models rather than the full network. One of DeepSeek’s sub-models specifically evaluates the quality of the “chain of thought” reasoning process used. OpenAI and Anthropic, to my understanding, do something similar, but it is not fully disclosed.
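To make “mixture of experts” concrete, here is a toy routing layer (a sketch for illustration, not DeepSeek’s architecture): a learned router sends each token to its top-k experts, so only a fraction of the parameters do work on any given input. DeepSeek-V3 reportedly activates about 37B of its 671B parameters per token.

```python
import torch
import torch.nn.functional as F
from torch import nn

class TinyMoELayer(nn.Module):
    """Toy mixture-of-experts layer: a router picks the top-k experts
    for each token, so most parameters stay idle per input."""
    def __init__(self, dim=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)   # scores each expert
        self.experts = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(n_experts)])
        self.k = k

    def forward(self, x):                          # x: (n_tokens, dim)
        weights, idx = F.softmax(self.router(x), dim=-1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                 # run only chosen experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```

Real systems add load-balancing objectives and run the experts in parallel across GPUs; the loops here are just for readability.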
Second, DeepSeek is an open-source (more precisely, open-weight) model like Meta’s Llama; on many benchmarks, these open models are now competitive with proprietary models from OpenAI.
Third, one innovation from DeepSeek was how it implemented reinforcement learning. Reinforcement learning “rewards” good answers, and different models have used different reward functions. DeepSeek appears to have combined several reward signals into one very effective function.
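As an illustration of what combining reward signals can look like, here is a sketch in the spirit of the rule-based rewards described in the DeepSeek-R1 report (an accuracy reward plus a format reward); the weights below are my own placeholders:

```python
import re

def combined_reward(response: str, reference_answer: str) -> float:
    """Illustrative combined reward: blend a format check and an
    accuracy check into one scalar. Weights are made up."""
    # Format reward: did the model wrap its reasoning in the expected tags?
    has_format = bool(re.search(r"<think>.*</think>", response, re.DOTALL))
    format_reward = 1.0 if has_format else 0.0

    # Accuracy reward: does the final answer match the reference?
    answer = response.split("</think>")[-1].strip()
    accuracy_reward = 1.0 if answer == reference_answer.strip() else 0.0

    # Combine the signals into the single scalar used by the RL update.
    return 0.8 * accuracy_reward + 0.2 * format_reward
```

The actual R1 recipe optimizes a reward like this with a policy-gradient method (GRPO); the point of the sketch is only that several simple signals can be blended into one effective reward.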
The implication of all this is that the cost of artificial intelligence keeps falling - but that has been the case for a while. If the energy needed to train and run these models goes down, that is a good thing (for the cost of AI, for global warming, etc.).
But the lower cost may induce even faster and broader growth. You may have heard the phrase Jevons Paradox thrown around: when efficiency makes a resource cheaper to use, total consumption of it often rises rather than falls.
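The arithmetic is simple, and a toy example (with hypothetical numbers) shows why cheaper AI can mean more total spending on AI, not less:

```python
# Toy Jevons Paradox arithmetic (all numbers hypothetical):
# efficiency cuts the price per query, but demand can grow even faster.
old_cost_per_query = 0.010   # dollars
new_cost_per_query = 0.001   # 10x cheaper
old_queries = 1_000_000
new_queries = 30_000_000     # usage grows 30x at the lower price

print(old_queries * old_cost_per_query)   # $10,000 before
print(new_queries * new_cost_per_query)   # $30,000 after: spend tripled
```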
As we have been saying from the beginning, things are growing exponentially fast. Remember that this LLM revolution is really just three years old; Covid supply chain concerns are older than these LLMs.
I went back to look at what I wrote about LLMs in the past, and where things were at the time.
Before GPT-3, we talked about AI in a broader sense, but the criticism was that it couldn’t understand context.
In 2020, when I first wrote about GPT-3 (the model family that would later power ChatGPT) on the Substack, the response was:
“it’s just stochastic parroting,”
“it doesn’t know basic facts,”
“it’s biased, sexist and racist,”
“it can’t generalize outside of its training data,”
“it thinks 2 + 2 = 5,” and
“it’s too slow to be practical.”
By 2023, the criticism had moved on to:
“yeah, but it still hallucinates,”
“ok, it can do simple math, but it fails multi-step problems and has trouble with calculus,”
“it tries to plan, but gets stuck in loops,”
“now it’s too censored - it won’t answer basic historical questions,” and
“its knowledge isn’t current.”
In 2024, the critical bar was raised yet again:
“it can sort of plan, but gets distracted,”
“lol, it struggles with visual-based logic puzzles,” and
“ok, it can get information from the internet, but it just pastes internet search results.”
The next stage will be existential panic:
“It’s getting too powerful!”
“Everything’s changing too fast!”
“Is AI secretly making decisions we don’t know about?”
I think we’re entering a phase where people will stop laughing at AI’s failures and start worrying about its successes.
First, people laugh at it.
Then, they complain about its flaws.
Then, they start using it anyway.
Finally, they worry about its impact.