The State of Learning to Code - 2024 Report


I’ve been building a learning curriculum for backend developers for the last 3 years, but I’ve mostly been relying on qualitative feedback and my own intuitions.

Well now I have my own quantitative data, and as the founder/grand magus of Boot.dev, I’m gonna use it. I figured I’d also share it with you, if you’re interested.

What is this report?

This is primarily an info dump of our Boot.dev learners’ stats (aggregated and anonymized of course) with some of my own commentary. Obviously, I’ll only include data that makes me look correct and smart.

High-level numbers

I’m not trying to flaunt our growth numbers (but if you want to acquire Boot.dev for north of $10 billion, hmu), but to understand the data that follows, it’s important to know a bit about our scale. In some cases we have enough data for statistical significance; in others we might not.

User data

| Metric | Value | Description |
| --- | --- | --- |
| Total Registered Users | 336,271 | Everyone who has made an account, free or paid |
| Total Individual Members | 18,255 | Folks who are paying (or were gifted) a membership |
| Total Team Members | 193 | Folks who are part of a team membership |

Course data

| Metric | Value | Description |
| --- | --- | --- |
| Total Lessons | 2,090 | All active lessons. A lesson is a single pass/fail assignment that usually takes ~2-10 minutes, with outliers of 1+ hours |
| Total Chapters | 217 | Chapters are collections of lessons grouped by concept. They usually have 6-16 lessons. |
| Total Courses | 21 | Courses are collections of lessons broken into chapters, and are building blocks of tracks. Courses primarily teach new concepts. |
| Total Projects | 9 | Projects are also collections of lessons and building blocks of tracks. Projects have much larger lessons with less guidance, and primarily practice known concepts. |
| Total Tracks | 1 | A track is an ordered list of courses and projects. We only have one currently: a backend developer track. Working on new ones. |

Usage data

| Metric | Value | Description |
| --- | --- | --- |
| Lesson Completions | 10,725,530 | Total number of lessons that have been completed by all users |
| Course/Project Completions | 64,286 | Total number of courses and projects that have been completed by all users |

Here’s a chart of lesson completions by month (blue bars) and active members (orange line):

*[Chart: lesson completions by month vs. active members]*

The hardest concepts when learning to code

Let’s start at the beginning. Where do people give up?

Here’s the drop-off funnel of chapters 4-14 of the very first course: “Learn to code with Python”.

*[Chart: Python course chapter drop-off funnel]*

The first bar represents the number of users who did at least one lesson in chapter 4. The next bar represents the number of users who went on to do at least one lesson in chapter 5, and so on.

I excluded the first 3 chapters because the interactivity paywall is at the end of chapter 3, which skews the results.

By percentage, chapters 7, 8, and 9 have the biggest drop-off rates: 11.5%, 10.5%, and 15% respectively. These chapters cover comparison operators, loops, and lists.

How does Go compare to Python?

It’s important to understand that our Python course is an “intro to coding” course. It starts from zero. Our Go course is “Go for developers” and assumes that you already understand coding concepts, and want to learn the Go-specific syntax and idioms. Most new learners on Boot.dev start in one of two places:

  1. Brand new coders in Python
  2. Experienced coders who just want to learn back-end development in Go

*[Chart: Go course chapter drop-off funnel]*

The largest drop-offs are in chapters 4, 5, and 8, with 24%, 14%, and 22% respectively. These chapters cover structs, interfaces, and slices.
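If you’ve never fought with Go slices, here’s a classic example of the kind of behavior that makes that chapter tricky (an illustration, not a lesson from the course):

```go
package main

import "fmt"

func main() {
	// b is a re-slice of a, so they share the same backing array
	a := []int{1, 2, 3, 4}
	b := a[:2]

	// b has spare capacity, so append writes into a's backing
	// array instead of allocating a new one
	b = append(b, 99)

	fmt.Println(a) // [1 2 99 4]
	fmt.Println(b) // [1 2 99]
}
```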

Giving up vs struggling

Now, to be fair, giving up isn’t a pure measure of difficulty. It’s a combination of difficulty, motivation, and probably a few other factors. So let’s look specifically at the hardest chapters in each course.

It’s interesting to distinguish between “hard” in the absolute sense and “hard” relative to how much you struggle when you arrive at a concept. For example, learning about functions is easier than learning about recursion in the absolute sense, but is it easier in the relative sense? You won’t encounter recursion in Boot.dev until you’ve had 6 additional courses of programming practice. We mostly care about relative difficulty. We see it as our job to introduce the right concepts at the right time, with the right amount of practice.

We calculate a “difficulty” score for each lesson based on a few metrics:

  • Number of attempts before passing (weight 0.4)
  • Number of solution views before passing (weight 0.4)
  • Number of chats with AI before passing (weight 0.2)

We’d like to add “time to complete” as a factor, but it’s surprisingly tricky to measure accurately… we plan to add it in the future.

We then normalize the scores using the mean and standard deviation (assuming a roughly normal distribution). Finally, we map everything onto a scale of 1-10 so it’s easy for users to understand what the numbers mean.

This means difficulty scores are relative to the other lessons in the course. It would be impossible to have all 10s, or all 1s.
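Roughly, the calculation looks something like this sketch (not our production code; the weights are the ones listed above, while the clamping and the exact 1-10 mapping here are illustrative):

```go
package main

import (
	"fmt"
	"math"
)

// LessonStats holds the per-lesson averages we weight together.
type LessonStats struct {
	Attempts      float64 // avg attempts before passing
	SolutionViews float64 // avg solution views before passing
	AIChats       float64 // avg AI chats before passing
}

// rawScore is the weighted combination of the three metrics.
func rawScore(s LessonStats) float64 {
	return 0.4*s.Attempts + 0.4*s.SolutionViews + 0.2*s.AIChats
}

// difficulties converts raw scores to z-scores within the course,
// then maps them onto a 1-10 scale, so every score is relative to
// the other lessons in the same course.
func difficulties(lessons []LessonStats) []float64 {
	raw := make([]float64, len(lessons))
	var mean float64
	for i, l := range lessons {
		raw[i] = rawScore(l)
		mean += raw[i]
	}
	mean /= float64(len(raw))

	var variance float64
	for _, r := range raw {
		variance += (r - mean) * (r - mean)
	}
	std := math.Sqrt(variance / float64(len(raw)))
	if std == 0 {
		std = 1 // all lessons identical; avoid dividing by zero
	}

	scores := make([]float64, len(raw))
	for i, r := range raw {
		z := math.Max(-3, math.Min(3, (r-mean)/std)) // clamp to ±3σ
		scores[i] = 1 + (z+3)/6*9                    // map [-3,3] onto [1,10]
	}
	return scores
}

func main() {
	fmt.Println(difficulties([]LessonStats{
		{Attempts: 1.2, SolutionViews: 0.1, AIChats: 0.2},
		{Attempts: 3.5, SolutionViews: 0.9, AIChats: 1.4},
		{Attempts: 2.0, SolutionViews: 0.4, AIChats: 0.6},
	}))
}
```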

Here are the chapters of the Go course, sorted by average lesson difficulty, hardest to easiest:

| Avg Difficulty Score | Chapter |
| --- | --- |
| 6.667 | Loops |
| 6.000 | Channels |
| 4.769 | Maps |
| 4.636 | Slices |
| 4.476 | Functions |
| 4.250 | Generics |
| 4.250 | Enums |
| 4.200 | Conditionals |
| 4.091 | Structs |
| 4.091 | Pointers |
| 4.080 | Variables |
| 4.000 | Interfaces |
| 4.000 | Errors |
| 3.750 | Mutexes |
| 3.556 | Packages and Modules |
| 3.500 | Quiz |

Of course, we run into a new problem… because we’re averaging, a chapter with several hard lessons will appear easier if it also has a lot of easy lessons. If anything, more lessons (of any kind) should indicate the chapter is harder, if only by a little. It certainly doesn’t mean it’s easier.

Let’s try sorting by the chapters with the most lessons over a difficulty score of 6:

| Num Difficult Lessons | Chapter |
| --- | --- |
| 4 | Channels |
| 3 | Maps |
| 3 | Loops |
| 3 | Slices |
| 2 | Functions |
| 1 | Generics |
| 1 | Conditionals |
| 1 | Variables |
| 0 | Errors |
| 0 | Structs |
| 0 | Mutexes |
| 0 | Enums |
| 0 | Packages and Modules |
| 0 | Interfaces |
| 0 | Pointers |
| 0 | Quiz |

As expected (by me), channels (and by extension concurrency) bumped up a spot - anecdotally, students seem to ask for more help with this chapter in our Discord than any other.

I picked 7-10 as an arbitrary cutoff for “hard” lessons because summing raw difficulties gets weird: three lessons of difficulty 3 are significantly easier than a single 9. That said, I think we can improve one last thing. Let’s sum the difficulty scores, but only those over 6. That way we give a bit more weight to the harder lessons:

| Sum of Difficulty Scores | Chapter |
| --- | --- |
| 33 | Channels |
| 23 | Loops |
| 22 | Maps |
| 22 | Slices |
| 15 | Functions |
| 7 | Generics |
| 7 | Conditionals |
| 7 | Variables |
| 0 | Errors |
| 0 | Structs |
| 0 | Mutexes |
| 0 | Enums |
| 0 | Packages and Modules |
| 0 | Interfaces |
| 0 | Pointers |
| 0 | Quiz |

This feels really close to what our students report - so let’s roll with it. Maybe in the future we’ll add some sort of exponential scaling so that we don’t need a cutoff, but I’d need to sit down and test it, and we would probably need more data to make that worthwhile.
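In code, the cutoff-sum is trivial; the exponential version below is one hypothetical way to drop the cutoff entirely (a sketch, not something we’ve tested):

```go
package main

import (
	"fmt"
	"math"
)

// chapterScore sums lesson difficulties, counting only lessons
// above the cutoff (6 in the tables above).
func chapterScore(difficulties []float64, cutoff float64) float64 {
	var sum float64
	for _, d := range difficulties {
		if d > cutoff {
			sum += d
		}
	}
	return sum
}

// chapterScoreExp is a hypothetical cutoff-free alternative:
// exponential weighting makes a single 9 count for far more
// than three 3s, so no threshold is needed.
func chapterScoreExp(difficulties []float64) float64 {
	var sum float64
	for _, d := range difficulties {
		sum += math.Pow(2, d)
	}
	return sum
}

func main() {
	lessons := []float64{3, 3, 3, 9}
	fmt.Println(chapterScore(lessons, 6)) // 9: only the hard lesson counts
	fmt.Println(chapterScoreExp(lessons)) // 536: dominated by the 9 (2^9 = 512)
}
```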

Now let’s use this calculation for every chapter on the platform. Here’s the data formatted as a Markdown table:

| Score | Course | Chapter |
| --- | --- | --- |
| 76 | Learn Python | Loops |
| 56 | Learn Data Structures | Binary Trees |
| 46 | Learn Kubernetes | Storage |
| 46 | Learn HTTP Servers | JSON |
| 44 | Learn Data Structures | Tries |
| 42 | Learn Functional Programming | Recursion |
| 39 | Learn Algorithms | P vs NP |
| 37 | Learn SQL | Joins |
| 34 | Learn Data Structures | Linked Lists |
| 33 | Learn Advanced Algorithms | Linear Programming |
| 33 | Learn Go | Channels |
| 33 | Learn Algorithms | Exponential Time |
| 32 | Learn Memory Management | Mark and Sweep GC |
| 31 | Learn Functional Programming | First Class Functions |
| 31 | Learn Advanced Algorithms | Dijkstra’s |
| 31 | Learn Functional Programming | Decorators |
| 30 | Learn Python | Functions |
| 29 | Learn Object Oriented Programming | Polymorphism |
| 29 | Personal Project 2 | Placeholder |
| 28 | Learn Data Structures | Red Black Trees |
| 28 | Learn HTTP Servers | Authentication |
| 27 | Learn Cryptography | DES |
| 26 | Learn Functional Programming | Sum Types |
| 26 | Learn Functional Programming | Pure Functions |
| 26 | Build a Static Site Generator | Website |
| 25 | Learn HTTP Servers | Servers |
| 25 | Build a Blog Aggregator | Following |
| 24 | Learn Functional Programming | Currying |
| 24 | Learn SQL | Aggregations |
| 23 | Learn Memory Management | Advanced Pointers |
| 22 | Learn Python | Variables |
| 22 | Learn Go | Maps |
| 22 | Learn Object Oriented Programming | Inheritance |
| 22 | Build a Static Site Generator | Inline |
| 22 | Learn Kubernetes | Nodes |
| 22 | Learn Go | Slices |
| 20 | Learn Data Structures | Hashmaps |
| 19 | Learn Git | Config |
| 18 | Learn Memory Management | Pointers |
| 18 | Learn Functional Programming | Closures |
| 18 | Learn Pub/Sub Architecture | Subscribers & Routing |
| 18 | Learn Algorithms | Sorting Algorithms |
| 18 | Learn SQL | Introduction |
| 18 | Learn Data Structures | BFS and DFS |
| 17 | Learn Advanced Algorithms | Edit Distance |
| 17 | Learn Pub/Sub Architecture | Delivery |
| 17 | Learn Memory Management | Objects |
| 17 | Learn CI/CD | Database |
| 17 | Learn Functional Programming | What is Functional Programming? |
| 17 | Learn Cryptography | RSA |
| 17 | Learn Pub/Sub Architecture | Serialization |
| 16 | Learn Object Oriented Programming | Classes |
| 16 | Learn JavaScript | Arrays |
| 16 | Build a Blog Aggregator | RSS |
| 16 | Learn HTTP Clients | DNS |
| 16 | Learn Object Oriented Programming | Abstraction |
| 15 | Learn Memory Management | Stack Data Structure |
| 15 | Learn Advanced Algorithms | Heaps |
| 15 | Learn CI/CD | Build |
| 15 | Learn Cryptography | Hash Functions |
| 14 | Learn Python | Dictionaries |
| 14 | Learn SQL | Basic Queries |
| 14 | Learn Python | Errors |
| 14 | Learn Python | Lists |
| 14 | Learn Pub/Sub Architecture | Pub/Sub Architecture |
| 14 | Learn Advanced Algorithms | A* Search |
| 14 | Learn Pub/Sub Architecture | Publishers & Queues |
| 10 | Learn Cryptography | Digital Signatures |
| 10 | Learn HTTP Servers | Routing |
| 10 | Learn Pub/Sub Architecture | Scalability |
| 9 | Learn Cryptography | AES |
| 9 | Learn Pub/Sub Architecture | Message Brokers |
| 9 | Learn SQL | Performance |
| 9 | Learn Data Structures | Stacks |
| 8 | Learn HTTP Servers | Webhooks |
| 8 | Learn SQL | Constraints |
| 8 | Learn Cryptography | Block Ciphers |
| 8 | Learn Memory Management | Refcounting GC |
| 8 | Build a Static Site Generator | Blocks |
| 8 | Learn CI/CD | Formatting |
| 8 | Learn Cryptography | Encoding |
| 8 | Learn Memory Management | Stack and Heap |
| 7 | Learn Object Oriented Programming | Encapsulation |
| 7 | Learn Go | Generics |
| 7 | Learn Go | Conditionals |
| 7 | Learn Cryptography | KDFs |
| 7 | Learn Cryptography | Caesar Cipher |
| 7 | Learn HTTP Servers | Authorization |
| 7 | Learn Functional Programming | Function Transformations |
| 7 | Learn HTTP Clients | Paths |
| 7 | Learn Memory Management | Unions |
| 7 | Learn SQL | Subqueries |
| 7 | Learn Cryptography | Asymmetric Encryption |
| 7 | Learn Cryptography | Stream Ciphers |
| 7 | Learn CI/CD | Tests |
| 7 | Learn Cryptography | Brute Force |
| 7 | Learn HTTP Clients | Async |
| 7 | Learn HTTP Clients | URIs |
| 7 | Learn HTTP Clients | Methods |
| 0 | Learn How to Find a Programming Job | Relocation |
| 0 | Learn HTTP Clients | Headers |
| 0 | Learn Git 2 | Squash |
| 0 | Learn Git 2 | Stash |
| 0 | Learn Git | Repositories |
| 0 | Learn Algorithms | Math |
| 0 | Learn Shells and Terminals | Terminals and Shells |
| 0 | Build a Static Site Generator | Static Sites |
| 0 | Learn Cryptography | Symmetric Encryption |
| 0 | Learn Git 2 | Reflog |
| 0 | Learn Docker | Publish |
| 0 | Learn CI/CD | Linting |
| 0 | Learn HTTP Servers | Documentation |
| 0 | Learn Shells and Terminals | Permissions |
| 0 | Learn Go | Packages and Modules |
| 0 | Learn Git | Internals |
| 0 | Learn Advanced Algorithms | Bellman Ford |
| 0 | Learn Git | Reset |
| 0 | Learn Docker | Dockerfiles |
| 0 | Learn Git 2 | Tags |
| 0 | Learn Data Structures | Graphs |
| 0 | Learn CI/CD | Continuous Integration |
| 0 | Learn How to Find a Programming Job | Applying |
| 0 | Learn CI/CD | Security |
| 0 | Learn Python | Scope |
| 0 | Learn SQL | Normalization |
| 0 | Learn Git | Rebase |
| 0 | Learn Data Structures | Queues |
| 0 | Learn How to Find a Programming Job | Resume |
| 0 | Learn Git 2 | Rebase Conflicts |
| 0 | Build a Blog Aggregator | Aggregate |
| 0 | Learn Git | Merge |
| 0 | Learn CI/CD | Deploy |
| 0 | Learn Algorithms | Polynomial Time |
| 0 | Learn How to Find a Programming Job | Strategy |
| 0 | Learn SQL | Structuring |
| 0 | Learn How to Find a Programming Job | LinkedIn Profile |
| 0 | Learn Advanced Algorithms | Dynamic Programming |
| 0 | Learn Kubernetes | Namespaces |
| 0 | Learn Docker | Command Line |
| 0 | Learn Memory Management | Enums |
| 0 | Learn How to Find a Programming Job | Networking |
| 0 | Learn Git | Branching |
| 0 | Learn Memory Management | C Basics |
| 0 | Learn Git 2 | Revert |
| 0 | Learn Git 2 | Worktrees |
| 0 | Learn Go | Mutexes |
| 0 | Learn Shells and Terminals | Packages |
| 0 | Learn Shells and Terminals | Programs |
| 0 | Learn Python | Comparisons |
| 0 | Learn How to Find a Programming Job | Interviewing |
| 0 | Learn Docker | Networks |
| 0 | Learn Python | Testing and Debugging |
| 0 | Learn Git | GitHub |
| 0 | Build Asteroids | Asteroids |
| 0 | Learn Python | Sets |
| 0 | Learn SQL | Tables |
| 0 | Learn Git | Gitignore |
| 0 | Learn Python | Computing |
| 0 | Learn Python | Quiz |
| 0 | Learn Shells and Terminals | Filesystems |
| 0 | Learn SQL | CRUD |
| 0 | Build Asteroids | Player |
| 0 | Learn Kubernetes | Scaling |
| 0 | Learn Git 2 | Cherry Pick |
| 0 | Learn Kubernetes | ConfigMaps |
| 0 | Learn Memory Management | Structs |
| 0 | Learn Git 2 | Fork |
| 0 | Learn HTTP Clients | HTTPS |
| 0 | Learn How to Find a Programming Job | GitHub Profile |
| 0 | Learn Git 2 | Bisect |
| 0 | Build Asteroids | Pygame |
| 0 | Learn Kubernetes | Pods |
| 0 | Learn Git | Setup |
| 0 | Learn Cryptography | XOR |
| 0 | Learn Git | Remote |
| 0 | Learn How to Find a Programming Job | Projects |
| 0 | Learn HTTP Servers | Architecture |
| 0 | Learn Go | Interfaces |
| 0 | Learn HTTP Clients | Why HTTP? |
| 0 | Learn Kubernetes | Ingress |
| 0 | Learn JavaScript | Runtimes |
| 0 | Learn Kubernetes | Services |
| 0 | Learn Git 2 | Merge Conflicts |
| 0 | Learn Kubernetes | Install |
| 0 | Learn HTTP Clients | cURL |
| 0 | Learn Kubernetes | Deployments |
| 0 | Learn Python | Challenges |

To simplify, let’s aggregate all this data by course:

| Score | Course |
| --- | --- |
| 222 | Learn Functional Programming |
| 209 | Learn Data Structures |
| 141 | Learn HTTP Servers |
| 136 | Learn Go |
| 129 | Learn Cryptography |
| 128 | Learn Memory Management |
| 110 | Learn Advanced Algorithms |
| 110 | Learn Python |
| 108 | Learn Algorithms |
| 99 | Learn Pub/Sub Architecture |
| 99 | Learn SQL |
| 90 | Learn Object Oriented Programming |
| 78 | Build a Static Site Generator |
| 73 | Learn HTTP Clients |
| 70 | Build a Blog Aggregator |
| 37 | Learn CI/CD |
| 29 | Build a Web Crawler |
| 24 | Learn JavaScript |
| 7 | Learn Kubernetes |
| 0 | Personal Project 1 |
| 0 | Build a Pokedex |
| 0 | Build a Bookbot |
| 0 | Learn Docker |
| 0 | Personal Project 2 |
| 0 | Build a Maze Solver |
| 0 | Learn How to Find a Programming Job |
| 0 | Learn Shells and Terminals |
| 0 | Capstone Project |
| 0 | Learn Git 2 |
| 0 | Learn Git |
| 0 | Build Asteroids |

Now one might look at this and think ThePrimeagen writes courses for n00bs, whilst TJ writes courses for the elite…

… but to be fair to Prime, his course’s data is skewed for two reasons:

  • The Git courses don’t yet have solutions to view (it’s tricky to implement solutions in a way that’s easy for us to maintain and also easy for the student to grok, but we think we have a solution coming soon)
  • The courses that are completed on your local machine (like Git) are harder to “screw up” on submission, because there aren’t any hidden test cases currently

Again, adding a “time to complete” metric should really help some of these courses have more accurate scores.

All that said, the current calculation seems to work really well for the courses that are made up mostly of self-contained coding lessons.

AI Tutors

It’s important to understand that viewing a solution before completing a lesson costs a “seer stone” (10 gems) or 75% of the lesson’s XP. Chatting with AI costs a “baked salmon” (2 gems) or 50% of the lesson’s XP.

The total number of messages sent to our AI mentor, Boots, was about 50% higher than the total number of times users viewed solutions.

*[Chart: total Boots chats vs. solution views]*

What’s interesting to me is that if we aggregate by “count per use average”, Boots chats are almost 3-4x more common than solution views:

*[Chart: Boots chats vs. solution views, per-user average]*

I think this is because the people who do use the AI mentor use it more - but fewer people use it overall (I’m guessing PostHog excludes the zeros from the average calculation’s denominator).

Now here’s my favorite part:

*[Chart: Boots chats and solution views, before vs. after lesson completion]*

Boots is disproportionately more popular before a student has completed a lesson, while viewing the solution is more popular after the lesson is complete.

This finding aligns with our hunch: students prefer being guided via the Socratic method (which is what our AI is prompted to do) over “cheating”. However, once a lesson is complete, students like to see how the instructor solved the problem - more than they like to ask follow-up questions.

Breaking it down by language

It’s important to remember that our Python course is “Python for beginners” and our Go course is “Go for developers”. And because we start students with Python, it has about 4.5x more total lesson submissions than Go.

| Language | % Boots Used | % Solutions Used | Total Daily Lessons |
| --- | --- | --- | --- |
| Learn Python Course | 9.05% | 8.99% | ~22,000 |
| Learn Go Course | 7.14% | 9.45% | ~4,500 |
| Learn FP (Hardest Python Course) | 32.52% | 21.04% | ~1,700 |
| Learn HTTP Servers (Hardest Go Course) | 35.6% | 13.59% | ~350 |

It seems that as the problems get larger and more complex (HTTP Servers lessons span multiple files, and FP problems tend to be larger, more complex functions), people lean more and more on the AI mentor over direct solutions.

Breaking it down by model

For the last 60 days we’ve been powering Boots 50% with OpenAI’s GPT-4o and 50% with Anthropic’s Sonnet 3.5. Once you start a conversation with a model, you stick with that model for the duration of the conversation, but each new conversation is randomly assigned to a model.
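The assignment logic is about as simple as it sounds - something like this sketch (the real conversation record has more fields, of course):

```go
package main

import (
	"fmt"
	"math/rand"
)

var models = []string{
	"claude-3-5-sonnet-20240620",
	"gpt-4o-2024-08-06",
}

// Conversation pins its model at creation time, so every message
// in the conversation goes to the same model.
type Conversation struct {
	ID    string
	Model string
}

// newConversation assigns one of the two models at random (50/50).
func newConversation(id string) Conversation {
	return Conversation{
		ID:    id,
		Model: models[rand.Intn(len(models))],
	}
}

func main() {
	c := newConversation("conv-123")
	fmt.Println(c.Model) // sticky for the lifetime of conv-123
}
```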

| Model | Like Count | Dislike Count | Total Messages | Like Ratio | Dislike Ratio |
| --- | --- | --- | --- | --- | --- |
| claude-3-5-sonnet-20240620 | 932 | 162 | 430,023 | 0.0022 | 0.0004 |
| gpt-4o-2024-08-06 | 846 | 267 | 360,359 | 0.0023 | 0.0007 |

One thing that’s really interesting: about 20 days into the experiment, GPT-4o had a decisive lead (iirc ~50% better performance), but it has become closer over time. Honestly, I think we just need a lot more data; we probably need to update our UI to make the thumbs-up/thumbs-down buttons more prominent so we get more feedback per conversation.

We haven’t done this yet, but we plan to also start versioning our system prompts and using the same thumbs up/down system to continually improve the quality of the prompts and context we provide to the models.

AI content help

Generating content with AI is, by and large, a terrible idea. I’ve experimented with it a lot because content production is one of our single biggest time costs - and anything we can do to produce more and better content, faster and cheaper, is a win.

I’m convinced that as the world becomes flooded with more and more AI slop, highly curated, high-quality content with a *chef’s kiss* human flourish will become more valuable, not less. But that doesn’t mean we haven’t found some use cases for AI in our content management.

Use case 1: Diagnosing student errors

We get a lot of reports on lessons. And we take them very seriously. As of Oct 17, we’ve closed 10,964 tickets and currently have only 73 open. The median time a ticket stays open is 1 day. And that’s not even counting all the reports we get, because reports on the same lesson get aggregated into the same ticket until it’s closed and a new one is started. 1-3 reports per ticket is common.

Anyhow, the point is that we’ve been building internal systems to make this process more manageable, because although we get a ton of reports, only a fraction of those reports are actionable. We’ve broken reports down into 5 categories:

  • diagnose-studenterror: The student is confused about the lesson, but the lesson is correct
  • diagnose-enhancement: The student has a good idea for an improvement
  • diagnose-bug: The student has found a valid bug
  • diagnose-badchange: The student is identifying a bug that’s not real, or suggesting a change that would make the lesson worse
  • diagnose-question: The student is asking a question, not reporting an issue

Of these, only diagnose-enhancement and diagnose-bug are actionable. Everything else can be safely closed without modifying content - although it would be nice to respond to the student and let them know if they’re confused or mistaken.

Well, up until about a month ago, this diagnosing and responding was done manually. In fact, we only responded to students <5% of the time that we would have liked to, because it was so time-consuming.

However, now we use GPT-4o to diagnose each report and, based on the diagnosis, provide a simple response to the student (there’s a sketch of the diagnosis step after this list). The diagnosis is helpful for us because we can more quickly close the ticket or start on a content change with confidence. The response is helpful for the student because they instantly understand that:

  1. This isn’t the place to ask a question
  2. They are likely confused - the lesson is in fact correct, and they should more carefully inspect their code, chat with Boots, or view the solution.
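For the curious, here’s roughly the shape of that diagnosis call. The prompt and plumbing here are illustrative, not our production system:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// diagnose asks GPT-4o to label a lesson report with one of the
// five diagnosis categories described above.
func diagnose(report string) (string, error) {
	body, err := json.Marshal(map[string]any{
		"model": "gpt-4o",
		"messages": []map[string]string{
			{"role": "system", "content": "Classify this lesson report as exactly one of: " +
				"diagnose-studenterror, diagnose-enhancement, diagnose-bug, " +
				"diagnose-badchange, diagnose-question. Reply with the label only."},
			{"role": "user", "content": report},
		},
	})
	if err != nil {
		return "", err
	}

	req, err := http.NewRequest("POST", "https://api.openai.com/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("OPENAI_API_KEY"))
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	var out struct {
		Choices []struct {
			Message struct {
				Content string `json:"content"`
			} `json:"message"`
		} `json:"choices"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	if len(out.Choices) == 0 {
		return "", fmt.Errorf("no choices returned")
	}
	return out.Choices[0].Message.Content, nil
}

func main() {
	label, err := diagnose("The expected output says 42, but my code is right and prints 43")
	if err != nil {
		panic(err)
	}
	fmt.Println(label) // e.g. "diagnose-bug" or "diagnose-studenterror"
}
```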

We’ve only been running this for a couple weeks (long enough to diagnose 229 tickets), but so far here’s how it breaks down by the numbers:

| Diagnosis | Count | % of total |
| --- | --- | --- |
| diagnose-studenterror | 63 | ~27% |
| diagnose-enhancement | 86 | ~38% |
| diagnose-badchange | 8 | ~4% |
| diagnose-bug | 35 | ~15% |
| diagnose-question | 37 | ~16% |

Use case 2: Intentionally making content worse

Alright, that’s a bit tongue in cheek, but… only a bit. I’ve tried many times to get AI to generate new lessons. I’ve given it our repository of lessons via fine-tuning, I’ve tried shoving examples of great human-written lessons into a context window, etc. But what I’ve gotten back is mostly slop (at least by our standards). Sure, it’s usually correct, and it reads like correct English, but it’s just so boring. If you try to get the AI to use more creative language, it only does so superficially and cringily.

So I’ve mostly given up on that for now - waiting for GPT 5 I guess.

Anyhow, what it is good at is reducing and reformatting. I’ll spoil a new feature that we’re working on here, called “spellbooks”. Your spellbook is a UI feature, easily accessible via a keyboard command, that fuzzy searches through the “pages” you’ve unlocked. You unlock a spellbook page as you complete lessons. A spellbook page is just a condensed lesson: a short no-nonsense description of the concept, a few code examples, and links to documentation. The intention is:

  1. You won’t need to take notes
  2. You won’t need to bookmark lessons
  3. You won’t need cheatsheets
  4. You won’t need to navigate back to earlier lessons for examples and documentation

For example, you might be halfway through the Go course but have forgotten the syntax for a struct. So you hit cmd+k (or whatever keymap we choose) and type “struct” and you get a spellbook entry that looks something like this:

A struct in Go is a collection of fields. They're a convenient way to group different types of values together in a single place:

```go
type Person struct {
    Name string
    Age  int
}
```

- [Go reference](https://go.dev/ref/spec#Struct_types)
- [Go tour](https://go.dev/tour/moretypes/2)
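The fuzzy search part doesn’t need to be fancy. A simple case-insensitive subsequence match over page titles (a sketch, not necessarily what we’ll ship) is enough to feel instant:

```go
package main

import (
	"fmt"
	"strings"
)

// fuzzyMatch reports whether every character of query appears in
// title, in order (case-insensitive), so "strct" matches "Structs".
// Assumes an ASCII query, which is fine for a sketch.
func fuzzyMatch(query, title string) bool {
	q, t := strings.ToLower(query), strings.ToLower(title)
	i := 0
	for _, r := range t {
		if i < len(q) && rune(q[i]) == r {
			i++
		}
	}
	return i == len(q)
}

func main() {
	pages := []string{"Structs", "Slices", "Channels"}
	for _, p := range pages {
		if fuzzyMatch("strct", p) {
			fmt.Println(p) // Structs
		}
	}
}
```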

What’s this have to do with AI? Well, as it turns out, AI is really good at taking a larger, well-written lesson and condensing it down to a spellbook entry. We’ve written a script that recursively generates spellbook pages for each lesson in a directory. It still requires human verification and touch-ups of course, but it’s a lot faster than writing them from scratch.
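That script is essentially a directory walk plus a condense step - something like this sketch (the directory layout and names are hypothetical, and the AI call is stubbed out):

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// condense would call the AI to shrink a full lesson into a
// spellbook page; stubbed here since the prompt is our own.
func condense(lessonMarkdown string) string {
	return lessonMarkdown // placeholder for the AI call
}

func main() {
	root := "./lessons" // hypothetical content directory
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() || !strings.HasSuffix(path, ".md") {
			return err
		}
		lesson, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		page := condense(string(lesson))
		out := strings.TrimSuffix(path, ".md") + ".spellbook.md"
		fmt.Println("writing", out)
		return os.WriteFile(out, []byte(page), 0o644)
	})
	if err != nil {
		panic(err)
	}
}
```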

Resetting lessons

Students like to review stuff. Spellbooks will be nice for recall, but not so much for practice. In the future we have plans for a better “practice” experience, but the quick and dirty solution was to allow users to reset lessons. When they reset, progress bars and navigation are reset, but re-completing the same lessons does not award XP or chests.

Since adding lesson resets (July 24th, 2024), we’ve had 3,561,569 successful lesson submissions and 418,289 resets. That’s 11.7% of all successful lesson submissions.

I was actually blown away by this number - I thought it would be in the 2-3% range, but apparently people like to redo lessons more than I thought.

People like to RTFM

We have a “listen to the lesson” button that reads the lesson to you in a soothing British AI voice:

*[Screenshot: the “listen to the lesson” audio button]*

I was surprised that people only seem to use it about 1.4% of the time. People like to read more than I expected.

*[Chart: successful lesson submissions vs. audio listens]*

Vim mode

Our in-browser editor has a vim mode, and since moving the setting to our backend database (it used to live in local storage) we’ve had 684 people enable it. That’s 684 people in one week, and we have roughly 10,000 weekly active users.

So around 6.8% of active learners are using vim mode, honestly more than I expected!

Thanks for making it this far

I hope this was interesting to you in one way or another! It was really helpful for me just to sit down and gather all this data, even just for our own product and content development. I figured while I was at it, I might as well turn it into ~~a marketing stunt~~ share it with the world altruistically.
