Joel Grus’ presentation on why he does not like notebooks sparked a flurry of notebook-related discussion.
I like the idea of notebooks more than I like actual notebooks. I tried to use them in my analyses for a long time, but eventually gave up because there are too many small annoyances. The talk goes over some of them, but not others, such as the fact that notebooks do not integrate well with git.
Here is how I think they should work instead:
- There is no hidden state. Cells are always run from top to bottom.
- If you change a cell in the middle, its output and the outputs of all cells below it are cleared immediately, and the notebook is re-run from the top up to and including the edited cell.
For example:
[1] : Code
Output
[2] : Code
Output
[3] : Code
Output
[4] : Code
Output
[5] : Code
Output
Now, if you edit Cell 3, you would get:
[1] : Code
Output
[2] : Code
Output
[3] : New Code
New Output
[ ] : Code
[ ] : Code
If you want, you can run the whole thing now and get the full output:
[1] : Code
Output
[2] : Code
Output
[3] : New Code
New Output
[4] : Code
New Output
[5] : Code
New Output
This way, the whole notebook is always up to date.
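As a rough illustration, here is a toy Python model of these two rules. It is only a sketch under simplifying assumptions: the `LinearNotebook` class and its methods are made up for this example and have nothing to do with Jupyter's actual API, cells are plain Python strings, and "output" just means whatever a cell prints.

```python
import contextlib
import io


class LinearNotebook:
    """Toy model (hypothetical): cells are source strings, outputs are captured stdout."""

    def __init__(self, sources):
        self.sources = list(sources)
        self.outputs = [None] * len(self.sources)  # None means "not run yet"

    def _run_through(self, last):
        """Run cells 0..last in a fresh namespace, so no hidden state survives."""
        namespace = {}
        for i in range(last + 1):
            buffer = io.StringIO()
            with contextlib.redirect_stdout(buffer):
                exec(self.sources[i], namespace)
            self.outputs[i] = buffer.getvalue()

    def run_all(self):
        """Run every cell from the top."""
        if self.sources:
            self._run_through(len(self.sources) - 1)

    def edit(self, index, new_source):
        """Replace one cell, clear it and everything below, then re-run up to it."""
        self.sources[index] = new_source
        for i in range(index, len(self.outputs)):
            self.outputs[i] = None
        self._run_through(index)


nb = LinearNotebook([
    "x = 1\nprint(x)",
    "y = x + 1\nprint(y)",
    "z = y * 10\nprint(z)",
    "print(z + 1)",
    "print(z + 2)",
])
nb.run_all()
nb.edit(2, "z = y * 100\nprint(z)")  # edit the third cell
print(nb.outputs)  # the last two entries are None until run_all() is called again
```

Because every run starts from an empty namespace, an output can never depend on a cell that was deleted or executed out of order. The price is that every edit replays everything above it.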
But won’t this be incredibly slow if you always have to run it from the top?
Yes, if you implement it naïvely, with the kernel really re-running everything from the top every time, it is unlikely to be usable. But you could do a bit of smart caching and keep some intermediate states alive. It would require some engineering, but I think you could keep a few live kernels in intermediate states to make the experience usable. If you edit cell number 35, there is then no need to go back to the first cell: maybe there is a cached kernel that has the state after cell 30, and only cells 31 onwards need to be rerun.
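The caching idea can be sketched in a few lines of Python. This is a toy under loud assumptions: instead of live kernels it stores deep copies of the variables every few cells, which only works for plain data (imported modules, open files, and functions that close over the namespace would not survive the copy). `CheckpointedRunner`, `snapshot`, and the `every` parameter are hypothetical names invented for this example.

```python
import copy


def snapshot(namespace):
    """Deep-copy user variables; skip the __builtins__ entry that exec() adds."""
    user_vars = {k: v for k, v in namespace.items() if k != "__builtins__"}
    return copy.deepcopy(user_vars)


class CheckpointedRunner:
    """Toy stand-in (hypothetical) for keeping a few kernels in intermediate states."""

    def __init__(self, sources, every=5):
        self.sources = list(sources)
        self.every = every
        # checkpoints[n] holds the variables as they were after the first n cells.
        self.checkpoints = {0: {}}
        self.executed = []  # indices of cells run by the most recent call

    def _run_from(self, start):
        """Restore the state after `start` cells, then run the remaining cells."""
        namespace = snapshot(self.checkpoints[start])
        self.executed = []
        for i in range(start, len(self.sources)):
            exec(self.sources[i], namespace)
            self.executed.append(i)
            if (i + 1) % self.every == 0:
                self.checkpoints[i + 1] = snapshot(namespace)
        return namespace

    def run_all(self):
        return self._run_from(0)

    def edit(self, index, new_source):
        """Edit a cell and re-run from the nearest checkpoint that is still valid."""
        self.sources[index] = new_source
        # A checkpoint taken after n cells is stale if it includes the edited cell.
        self.checkpoints = {n: ns for n, ns in self.checkpoints.items() if n <= index}
        return self._run_from(max(self.checkpoints))


# 40 cells; with 0-based indices, index 34 is "cell 35" in the text above.
sources = ["total = 0"] + [f"total += {i}" for i in range(1, 40)]
runner = CheckpointedRunner(sources, every=5)
runner.run_all()
final = runner.edit(34, "total += 1000")
print(runner.executed)  # which cells were actually re-executed
print(final["total"])
```

With a checkpoint every five cells, editing cell 35 restores the state saved after cell 30 and re-executes only the cells after it. A real implementation would have to snapshot whole processes rather than dictionaries, which is where most of the engineering effort would go.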
It would take a lot of engineering, and it may even be impossible with the current structure of Jupyter kernels, but, from a human point of view, I think this would be a better user experience.
Published by luispedro
Luis Pedro Coelho is a computational biologist at EMBL.