The Bitter Lesson’s Bitter Lessons (opinion piece)
Date read: 20th October 2025
Blog link
Key Points
- Addresses Richard Sutton’s claims on the Dwarkesh Podcast, where he argued (roughly) that babies don’t learn through imitation, so LLMs shouldn’t either.
- Not mimicking humans and instead learning from scratch gives up huge potential compute savings… it gives up all the knowledge humans have strived so hard to accumulate.
- Estimates that reaching human intelligence took ~10^50 operations (10^30 organisms alive in parallel for 4.9×10^9 years)
- LLMs tend to train on ~10^26 FLOPs… so we’re a long way off this, roughly a 10^24× gap (see the back-of-envelope sketch after this list)
- Humans utilised information technology to overcome this compute deficit:
- Broadcasting: spread info to others
- Broad listening: distill info from multiple sources into a single world model
- LLMs excel at broad listening… but still only act on a tiny proportion of human data, as most of it is locked up in private databases
- This private data could enable huge performance gains in LLMs, as much of it is also very high quality (business records, health records etc.)
- Challenge: allow knowledge transfer whilst maintaining copyright, security, privacy etc. (a toy sketch of one such approach follows below)
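
To make the scale of that gap concrete, here is a back-of-envelope check of the quoted figures in Python. The rate of ~10^3 operations per organism-second is my own assumption, chosen so the total lands near the post’s 10^50; the post may derive its estimate differently.

```python
# Back-of-envelope check of the evolution-vs-LLM compute gap quoted above.
SECONDS_PER_YEAR = 3.15e7

organisms = 1e30            # parallel organisms alive at any time (from the post)
years = 4.9e9               # duration of evolution (from the post)
ops_per_organism_sec = 1e3  # my assumption, picked so the total lands near 10^50

evolution_ops = organisms * years * SECONDS_PER_YEAR * ops_per_organism_sec
llm_flops = 1e26            # rough scale of a large LLM training run (from the post)

print(f"evolution: ~{evolution_ops:.1e} operations")    # ~1.5e50
print(f"LLM run:   ~{llm_flops:.0e} FLOPs")
print(f"gap:       ~{evolution_ops / llm_flops:.0e}x")  # ~1.5e24, prints 2e+24
```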
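
On the knowledge-transfer challenge, one family of techniques that fits the constraints is federated learning, where data holders train locally and share only model updates, never raw records. The post doesn’t prescribe a method; this FedAvg-style toy (with made-up linear-regression data standing in for private records) is purely illustrative.

```python
# Toy federated averaging (FedAvg): learn from private data silos without
# centralising the raw data. Illustrative only; not the post's proposal.
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, steps=10):
    """A few steps of linear-regression gradient descent on one holder's data."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # MSE gradient
        w -= lr * grad
    return w

# Three "data holders" (e.g. businesses, hospitals), each with a private
# dataset drawn from the same underlying relationship y = X @ [2, -1].
true_w = np.array([2.0, -1.0])
holders = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    holders.append((X, X @ true_w + rng.normal(scale=0.1, size=50)))

global_w = np.zeros(2)
for _ in range(20):
    # Each holder trains locally; only the updated weights leave the silo.
    local_ws = [local_update(global_w, X, y) for X, y in holders]
    global_w = np.mean(local_ws, axis=0)  # server averages the updates

print("recovered weights:", global_w)  # close to [2, -1]
```

Real deployments layer secure aggregation and differential privacy on top; the averaging step alone does not by itself guarantee privacy.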