Data-driven Scala Programming

Analyzing sbt compilation logs

Claudio Caletti

Experience

Discussions about programming languages are generally driven by experience and intuition.

The first time I heard someone talking about monads in programming languages I thought it was nonsense. Why would you add something so complex to your codebase? At first glance it doesn’t look like it makes your life easier. It took me time to realize it actually makes sense.

Even almost universally accepted statements such as “try not to use mutable variables” are usually presented in a dogmatic way.

Evidence

A sounder and more effective way to convince people is to give them evidence that supports your claim. “A monad is an endofunctor together with two natural transformations” doesn’t count as evidence.

By “evidence” I mean “numbers”.

Here’s a better one: “70% of my production errors are type errors: that’s why I stopped using JavaScript, and started using TypeScript instead”. That makes sense even if you never experienced how horrible writing JavaScript code is (😇).

A quantitative analysis makes your arguments stronger, and drives you to better conclusions. Coming up with a quantitative approach is harder than just having an opinion: you need to take time to collect your data, analyze them and synthesize a conclusion. It takes more effort, but it works better.

With this in mind, at buildo we started analyzing our compilation logs. We didn’t know in advance what we were searching for: we just wanted to explore them and see what they contained.

Collecting And Analyzing Logs

To begin, we had to collect the data. At buildo we use sbt, the go-to build tool for Scala projects. It seemed to be a good place to search for data.

After a few attempts we ended up with the following alias:

alias sbt='f() { sbt "$@" |tee -a ~/.sbtlogs; };f'

Once you add it to your dotfiles, you’re transparently logging all your sbt output, compilation errors included, to ~/.sbtlogs. It works with multiple parameters, in both bash and zsh. 🎉 First step done.

After four months we looked at the logs and analyzed them. Here are my top three errors by type:

  • 🥇Not found
  • 🥈Type mismatch
  • 🥉Too many arguments for method parameters
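A ranking like this one can be computed with a few lines of Python. What follows is a minimal sketch, not the script we actually used: it assumes error headers look like `[error] /path/File.scala:12: not found: value foo` (optionally with a column number after the line number), that the log contains no ANSI color codes, and that the text before the first `:` or `;` is a good enough category. The names `ERROR_HEADER`, `error_kind` and `count_error_kinds`, as well as the sample log, are made up for illustration.

```python
import re
from collections import Counter

# Assumed shape of a compiler error header in the sbt output, e.g.
#   [error] /src/Main.scala:12: not found: value foo
# Some versions print "line:column:" instead of "line:", hence the optional group.
ERROR_HEADER = re.compile(
    r"^\[error\]\s+\S+\.scala:\d+:(?:\d+:)?\s*(?P<message>.+)$"
)


def error_kind(message):
    """Reduce a compiler message to a coarse category: the text before the first ':' or ';'."""
    return re.split(r"[:;]", message, maxsplit=1)[0].strip().lower()


def count_error_kinds(lines):
    """Count error categories, skipping source excerpts and other non-header lines."""
    counts = Counter()
    for line in lines:
        match = ERROR_HEADER.match(line.rstrip("\n"))
        if match:
            counts[error_kind(match.group("message"))] += 1
    return counts


# Hypothetical log excerpt, only to exercise the functions above.
sample_log = """\
[info] Compiling 3 Scala sources to /tmp/demo/target/classes...
[error] /tmp/demo/src/Main.scala:12: not found: value foo
[error]     foo(1)
[error]     ^
[error] /tmp/demo/src/Main.scala:20: type mismatch;
[error]  found   : String
[error]  required: Int
[error] /tmp/demo/src/Api.scala:7:15: not found: type User
[error] two errors found
""".splitlines()

ranking = count_error_kinds(sample_log).most_common()
print(ranking)  # prints [('not found', 2), ('type mismatch', 1)]
```

To run it on a real log, pass an open file instead of `sample_log`, for example `count_error_kinds(open(os.path.expanduser("~/.sbtlogs")))` after `import os`. If your sbt version formats errors differently, the regular expression is the part to adjust.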

The “Not found” error appeared 3175 times in my logs. The error is caused by missing imports. It means I was forgetting imports about 40 times a day. At the time I was writing Scala in Vim, with no Ensime or IDE… Apparently not a great idea if you don’t like wasting your time.
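As a quick sanity check, the total and the daily rate can be related to each other. The number of days actually covered by the logs is not stated here, so the figure below is only what the two reported numbers imply; it is in the range of the working days you would expect in about four months.

```python
not_found_errors = 3175  # total "Not found" errors reported above
errors_per_day = 40      # daily rate reported above

# Days of logging implied by the two figures (an inference, not a measured value).
implied_days = not_found_errors / errors_per_day
print(round(implied_days))  # prints 79
```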

The second error, “Type mismatch”, appeared 1771 times in the logs. That’s good news. It means the type system is working well: it catches type errors pretty often. Cool! I can finally honestly claim that I use Scala for a good practical reason.

The third error is a little trickier. Digging into why it appeared so often, we found out it was an instance of a class of errors related to the Magnet Pattern. The Magnet Pattern is a programming pattern used by Spray (the HTTP library) to implement its routing DSL. The error was due to the approach we used to write HTTP routes; this story is discussed in depth here.

Other programmers showed similar patterns. @gabro won the “apples and oranges contest” with 45 type mismatch errors a day.

What’s Next?

If you really want to improve your programming habits, data are better than opinions. Even from a simple experiment we were able to draw interesting conclusions:

  • Writing Scala using an IDE saves you time (Vim + Ensime counts as an IDE)
  • The type system is useful in practice — type errors are common
  • Our approach for writing HTTP routes was not so good

In practice, analyzing sbt logs had two relevant outcomes for our team. First, we almost entirely stopped writing Scala without an IDE (as before, Ensime counts as an IDE). Second, we decided to change the way we wrote HTTP routes. The fact that we made so many errors writing routes was one of the main reasons why we implemented wiro, a library that automatically generates HTTP routes from controllers.

Here you can find the code we used for the analysis. It’s written in Python. Had we known the outcome in advance, we would have used a language with a type system 🍊🍎.

I suggest you alias your sbt command and start logging your compilation errors. I’d love to get back to you in a few months and discuss what you discovered.

If you want to work in a place where we care about the quality of our development workflow, take a look at https://buildo.io/careers