Our Love-Hate Relationship with Jupyter Notebooks


About the Author:

I am the Co-Founder and CEO of Impeccable.AI. Previously, I worked as an ML Engineer on AI/ML problems at Microsoft and Tractable.

At Impeccable.AI we are building an AI Development Platform (it has a space for Notebooks too!) to help teams developing AI products work faster and collaborate better. We take a horizontal approach, offering tools that act as "intelligent assistants" while leaving maximum flexibility to the user. We know that our customers, the experts in their domains, know exactly what to do and how to do it.

Project Jupyter (started in 2014) is an open-source project and community. Its goal is to “develop open-source software, open standards and services for interactive computing across dozens of programming languages”. [Wikipedia]

Most people associate Project Jupyter with the Jupyter Notebook (now JupyterLab) — a web-based, interactive computational environment for creating notebook documents.


Over the years, Notebooks became very popular in the Data Science and Machine Learning communities. Thanks to their unique set of features, they are a powerful tool for Data Scientists around the world. At the same time, some of their characteristics provoke strong negative opinions.

I investigated which features in particular are loved and which are hated. I sourced most of them from multiple Reddit threads, YouTube videos and DS/ML Slack communities, as well as from my own experience.

The Love

I will start with the characteristics that are generally considered positive.

REPL (Read-Eval-Print-Loop) programming environment

At its core, a Jupyter Notebook is a simple interactive programming environment similar to a command-line shell. Such an environment takes a single user input, executes it and returns the result. This technique works particularly well with scripting languages such as Python, which is very popular in the DS community. Notebooks encourage incremental development of code thanks to their visual organisation into cells.
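To illustrate, here is a minimal sketch of that cell-by-cell style (the data is made up): each cell is run and inspected on its own, and the last expression of a cell is printed automatically, just like in a REPL.

```python
# Cell 1: define something and evaluate it immediately;
# the last expression in a cell is printed, REPL-style
numbers = [3, 1, 4, 1, 5, 9]
sorted(numbers)

# Cell 2: build on the state created by the previous cell
total = sum(numbers)
total

# Cell 3: refine the idea further without re-running anything above
total / len(numbers)
```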

Kernel state caching

Jupyter Notebook’s most powerful addition to REPL is its concept of the Kernel. The kernel, the backend responsible for code execution, maintains a computational state that is carried between each executed piece of code. This is different from a simple REPL implementation such as in bash. The feature is useful for anyone whose work relies on processing big chunks of data. Once the data is loaded, it stays in memory for the lifetime of the kernel (which usually means until it is restarted or shut down). This saves network resources, computational resources and time.
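A sketch of why this matters in practice (the file name and column names below are made up): the expensive load happens exactly once, and every later cell reuses the in-memory result.

```python
# Cell 1: run once; the loaded DataFrame stays in kernel memory afterwards
import pandas as pd

df = pd.read_csv("large_dataset.csv")   # hypothetical path; imagine many GB

# Cell 2: re-run and tweak as often as needed -- no reload,
# because the kernel still holds `df` from Cell 1
df.groupby("category")["value"].mean()  # hypothetical columns

# Cell 3: only a kernel restart wipes `df` and forces Cell 1 to run again
df.describe()
```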

It doesn’t come for free, though. All data referenced in the notebook sits in the machine’s memory. The state can consume all available memory, making the machine unresponsive. Depending on the size of the data, the framework used and the code organisation, this may affect not only the directly responsible individual, but everyone else using the same physical machine.

Another limitation is that the cached kernel state, e.g. a semi-processed dataset, is tied to just one Notebook instance and can’t be freely shared. To take full advantage of state caching, the Notebook instance must keep running for as long as it is needed, because it can’t be hibernated to free resources. This can generate extra costs. I know a Data Scientist who used to work comfortably with 50GB+ datasets in Jupyter Notebooks. His secret sauce was a machine with 500GB of memory, running 24/7 on AWS for ~3.3k USD/month.

Mixed media rendering

Jupyter Notebook cells are not limited to storing code. They do a great job rendering different kinds of media, e.g. images, tables, Markdown and more. The output of each execution is also formatted to be presented in the best possible way. This is great for, e.g., Pandas dataframes and Matplotlib plots.
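A small example of what a single cell can render side by side; a sketch assuming nothing beyond pandas, Matplotlib and IPython’s display utilities.

```python
# One cell can emit rendered Markdown, an HTML table and an embedded plot
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import Markdown, display

df = pd.DataFrame({"x": range(10), "y": [v ** 2 for v in range(10)]})

display(Markdown("**Quadratic growth** of `y` against `x`:"))
display(df.head())        # shown as a formatted HTML table, not plain text
df.plot(x="x", y="y")     # the figure is embedded directly into the output
plt.show()
```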

Interleaving code, text comments and images helps present thoughts and ideas in a clear and concise manner. The whole content of a notebook (incl. media) is saved in the .ipynb file and is shareable. This has both benefits and drawbacks. The benefit is that the notebook can be rendered for review without any execution (GitHub supports that). On the other hand, rich media included in the notebook file can drastically increase its size.

Ease of setup

The JupyterLab server is easy to set up and run. It is part of the Anaconda data science distribution. Users with limited knowledge of DevOps can spin up a Docker container running one of the publicly available all-in-one data science images in a matter of minutes. The only requirement is an SSH connection to the host. Project Jupyter also includes JupyterHub for more advanced use cases in a multi-user setting.

Notebooks can utilise all underlying compute resources, incl. GPUs. This makes them a powerful tool for developing AI models in a Cloud environment. It is also encouraged for organisations that must follow various data protection laws: the right setup limits risky data transfers between secure Cloud storage and personal devices.
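For example, a quick sanity check that the kernel actually sees the host’s GPU might look like this (a sketch assuming PyTorch is installed; other frameworks have equivalents):

```python
# Verify GPU visibility before kicking off an expensive training run
import torch

if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
else:
    print("No GPU visible -- falling back to CPU")
```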

The perfect combination

The community agrees that the features mentioned above make Jupyter Notebooks a perfect tool for the following tasks:

Learning, prototyping, POCs

The Notebook structure is a perfect implementation of the Literate programming paradigm. The gist of this paradigm, introduced by Donald Knuth, is that pieces of code are juxtaposed with natural-language explanations of their logic. Well-structured Notebooks are easy to understand and play with, making them great for learning (e.g. a new library), prototyping and POCs.

Sharing ideas, documenting thought processes, presentations, blogs

Well-structured Notebooks are easy to share with colleagues to present an idea or walk the reader through the author’s thought process. The structure is also well suited for presentations and blogs, because it allows the author to tell a simple story of going from A to B.

Exploratory Data Analysis (EDA)

Notebooks are a great tool for all kinds of jobs that involve analysing, investigating and summarising datasets. They are convenient for compiling comprehensive reports that include not only summaries in the form of plots and tables, but also the source code used to generate them.
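A typical EDA opening might look like the sketch below (dataset and column names are made up), with the summaries rendered right under the code that produced them:

```python
# First-pass EDA: structure, summary statistics and a quick distribution plot
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("customers.csv")   # hypothetical dataset

df.info()                           # column types and missing values
df.describe()                       # summary statistics for numeric columns

df["age"].hist(bins=30)             # hypothetical column
plt.title("Age distribution")
plt.show()
```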

Access to compute resources

Well-configured deployments give users easy access to compute resources and to data with restricted access. A good example is Google Colab, a product that is a bit more than simple JupyterLab, but built on the same principles. Colab gives everyone free access to decent compute resources that would otherwise sit idle. It is very popular amongst university students and DS enthusiasts.

The Hate

Let’s move on to characteristics that were usually mentioned in a negative context.

The .ipynb file structure

Notebooks in their raw form are simple text files containing a JSON representation of everything that should be rendered: the code, the text and Markdown comments, and the images. Whereas for pure code and text there is not much to worry about, the disk size of a Notebook that contains lots of images can’t be ignored, because the images are embedded inline as base64-encoded strings.
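You can see this for yourself by loading a notebook with the standard json module (the file name below is hypothetical); every cell, and every output it ever produced, lives inline in the same file:

```python
# Peek at the raw JSON of a notebook to see where the bulk comes from:
# outputs -- including base64-encoded images -- are stored inline
import json

with open("analysis.ipynb") as f:   # hypothetical notebook
    nb = json.load(f)

for cell in nb["cells"]:
    outputs = cell.get("outputs", [])   # markdown cells have no outputs
    print(cell["cell_type"], "| outputs:", len(outputs))
```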

Poor or missing support for typical software engineering tools

Jupyter Notebooks, unfortunately, don’t have a dedicated linter, debugger or testing framework. Their file structure makes the standard tools used in the software development process not very useful. This is partly because Notebooks are meant to be language agnostic. The same applies to the lack of dedicated dependency management software.

Self-containment

Jupyter Notebooks are very often self-contained files: all pieces of code are stored in the same place. This comes in very handy if all we want to share is the analysis and the thought process. On the other hand, it is problematic for long-term maintenance, because code gets duplicated very often.

Selective execution

Both a blessing and a cause for worry, selective (out of order) execution is something that has lots of enemies. Notebooks allow pieces of code to be executed in any order, not necessarily from top to bottom, and the kernel state is preserved between executions. This brings huge benefits but very often is also a curse. Running the cells in the wrong order may corrupt the kernel state and force the user to restart everything from the beginning. This usually means redoing all the data transfers and computations.
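A minimal sketch of how hidden state bites: the result below depends on how many times Cell 2 was executed, not on anything visible on the screen.

```python
# Cell 1
prices = [10, 20, 30]

# Cell 2: mutates state; accidentally running it twice doubles everything
prices = [p * 2 for p in prices]

# Cell 3: prints 120 after one run of Cell 2, 240 after two --
# the code on screen no longer tells the whole story
sum(prices)
```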

The implications

The following implications of the characteristics mentioned above were commonly noted:

Difficulties with version control

Notebooks need special attention in version control systems. People usually don’t want to include rich media in the repository, so the output of each cell should be removed before the file is versioned. The code cells should be treated with a linter to ensure code quality. Code testing for Notebooks still looks like uncharted territory. It is almost impossible for a human to review raw Notebook files, so the code review process must include one of the community-developed Notebook diff tools.
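Community tools such as nbstripout automate the output-stripping part; the sketch below shows the gist of what they do to an .ipynb file before it is committed (file handling kept deliberately minimal):

```python
# Clear outputs and execution counts so only code and text get versioned
import json
import sys

path = sys.argv[1]          # e.g. python strip_outputs.py analysis.ipynb
with open(path) as f:
    nb = json.load(f)

for cell in nb["cells"]:
    if cell["cell_type"] == "code":
        cell["outputs"] = []
        cell["execution_count"] = None

with open(path, "w") as f:
    json.dump(nb, f, indent=1)
```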

Issues around reproducibility

The lack of proper dependency management makes Notebooks tricky to reproduce. The possibility of selective execution adds even more complexity to this problem. I heard of a situation where a data scientist shared a Notebook with the following instructions on how to start using it: “First, run the 4th cell from the top, then the 2nd twice, because it won’t work at the first attempt”.
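A common partial mitigation is to pin exact dependency versions in the very first cell, so at least the environment matches the author’s (the versions below are purely illustrative):

```python
# Cell 1 of a shared notebook: pin the versions it was written against
%pip install pandas==2.1.4 matplotlib==3.8.2 scikit-learn==1.3.2

# Then record what the kernel actually sees
import pandas, matplotlib, sklearn
print(pandas.__version__, matplotlib.__version__, sklearn.__version__)
```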

They encourage bad habits

In this context, people usually mean bad engineering practices. Whether or not Jupyter Notebooks actively encourage those, one thing can be agreed on: people under time pressure will take shortcuts. Such shortcuts reduce delivery time, which is good for business (at least in the short term); in the longer term, however, they incur tech debt that may pile up and cause serious problems at the worst possible time.

Tech debt is not necessarily a terrible thing, as long as an organisation has the discipline to pay it back in a timely manner. Some may prefer to reduce the amount of tech debt from the beginning instead.

Difficulties with productionisation

The productionisation of Jupyter Notebooks is a separate topic, one that attracts even stronger opinions. The term productionisation requires clarification, though. Some mean the execution of Notebooks as part of automated pipelines that process customers’ requests; others, the use of Notebooks in the broader process of AI/ML model development.

For the former, some claim that it is doable with the right approach. Netflix published a post in 2018 describing how heavily they rely on Notebooks in production, their methods and the benefits this brings. A year later they also released Metaflow. According to one of the engineers in the MLOps community on Slack, that is their main tool for the AI development process. Whether they are successful with productionising Notebooks or not, it is important to remember that 1) we don’t have full knowledge of their stack or culture, and 2) every organisation is different, and what works for one doesn’t have to work for another.
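For a concrete picture of what “Notebooks in automated pipelines” can look like, here is a sketch using Papermill, an open-source tool for parameterised notebook execution featured in Netflix’s posts on the topic; all paths and parameters below are made up.

```python
# Run a notebook as a pipeline step, injecting parameters so the same
# notebook can serve many inputs; the executed copy is kept as an artifact
import papermill as pm

pm.execute_notebook(
    "train_model.ipynb",               # hypothetical input notebook
    "runs/train_model_2024_01.ipynb",  # hypothetical output location
    parameters={"dataset_path": "s3://bucket/data.parquet", "epochs": 10},
)
```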

For the latter, the practice is quite common, because such jobs are usually done by Data/Applied Scientists, who are generally in favour of the tool. Depending on the problem being solved, the process might be treated as an extension of EDA. Jupyter Notebooks bring most of the benefits mentioned above, including access to compute resources, state caching, etc.

Yet there is also a second perspective to mention. The process of developing and productionising an AI model is rarely a job done by a single person. What was trained in “isolation” on a training set must be put into production to solve the business problem. The job of productionisation is usually done by ML/Software Engineers, who are typically not in favour of Notebooks. The complaint raised most often in the community was that understanding the internals of a model produced by a process scattered across multiple Notebooks is a nightmare. Some also mentioned the issues around reproducibility. This indicates that Notebooks have more problems that surface when a process requires collaboration.

Although treated as a Swiss Army knife for data science, the community in general agrees on Notebooks’ fit for prototyping and exploratory work. Whether they should be used for other tasks or in production is situational. They definitely are a powerful personal tool. Unfortunately, their design makes them less viable in settings where collaboration is required.

The cause is definitely not lost. Jupyter Notebooks have a strong position in the Data Science community, well deserved thanks to their unique set of features. The open-source community has developed many extensions to expand the areas where they are useful and to minimise their drawbacks. However, these are mainly iterative improvements, and not every issue can be corrected this way. Due to their design decisions, some of the problems with Jupyter Notebooks can only be overcome with better employee training, discipline and the right culture. And while changing processes and tools is easy, changing culture is almost impossible.

Do you agree or disagree with the statements included in this essay? Please let me know in the comments!