Insights from a Conversation with a Geoscientist
Introduction
Electronic Lab Notebooks are commonly used in labs to track the steps in a research project. They are a valuable evolution of traditional pen and paper, but they have limitations. My team and I talked with Gunnar Pruß, a laboratory engineer at the GFZ Helmholtz Centre for Geosciences, about how more general-purpose workflow solutions, such as our open source project SpiffWorkflow, could help address some of the limitations of existing lab notebooks.
This article summarizes my understanding of the role workflows play in modern research and how BPMN (Business Process Model and Notation) tools can help address some of these issues. We also discuss our ongoing efforts to improve our project, and how a new user interface and reporting system might benefit Gunnar’s work. Many of the thoughts here are my own, but I’ll quote Gunnar from time to time, and I’m deeply grateful for the opportunity to learn from him.
The Limitations of Lab Notebooks
There are many ELNs (Electronic Lab Notebooks) on the market. These systems tend to focus on a specific type of lab (pharmaceuticals, biotechnology, etc.) and build in assumptions, or opinions, about how those labs are structured. Software that bakes in assumptions about workflow like this is often described as “opinionated.”
On the plus side, opinionated software is easy to use and understand. On the downside, you may have to give up some of your own hard-won practices to adopt it.
ELNs (and other opinionated software) also tend to solve only a small part of the overall process. They focus on what happens in a single part of the lab or on a small set of equipment, so they never tell the full story. As Gunnar explains, “There is not an ELN that will bridge the logistics of field work to the process of preparing those samples, and the actual analysis.”
Tools for managing communication and collaboration between the teams responsible for each of these areas are scarce. Inter-team communication is difficult, and so is getting several teams to adopt a common system. I suspect these are the reasons software vendors have avoided tackling the problem.
BPMN tools can help create a shared understanding of a process, even as it stretches across teams and between disparate systems, methodologies, and problem sets. BPMN is a simple, high-level, common language that improves communication and raises the standard for working together efficiently.
Transparency and Reproducibility
“If you torture data long enough, it will confess to anything” — Ronald H. Coase
It hardly needs repeating, but science faces a reproducibility crisis: we keep discovering that much of our research cannot be reproduced. My awareness of this comes mostly from my background in psychology, and the problem may be less prevalent in geoscience. But I have heard it described as a crisis even by mathematicians, whose field I would have assumed was immune.
Just as lab notebooks focus on subsets of the overall process, so do our team structures and communication. This brings us straight to Conway’s Law, which states: “Any organization that designs a system will produce a design whose structure is a copy of the organization’s communication structure.” We rarely notice it. Our processes are so ingrained in our day-to-day patterns of communication that we are often unaware they exist, and these patterns can diverge sharply even within a single small organization.
I cannot say for certain that this is part of why reproducibility in science is so difficult, but it seems a reasonable hypothesis: we are trying to replicate a finding while remaining partly ignorant of the process that produced it. This blindness is deeply human. We notice what is new and changing; constants escape our attention.
BPMN can help capture and distill the communication structure that existed when a study was first undertaken. As a general-purpose workflow notation, it can record the many steps involved in carrying out a study, so that people from very different backgrounds and norms can still read and understand what is required to replicate it. It also forces us, to some extent, to examine the mundane steps we take and the assumptions we make.
All this detail could also be a great distraction. The artist Hans Hofmann reminds us that we should “eliminate the unnecessary so that the necessary may speak.” Because BPMN allows sub-processes, diagrams can be nested, so you can look as deeply as you want, opening a sub-process only when you need its detail. You can design your diagrams to tuck the more trivial steps away and draw focus to the core efforts.
In the over-simplified example below, the collection of samples (the first task) opens onto a nested process with more detail. This nesting can go as many levels deep as needed. That way we don’t over-communicate, yet anyone interested can drill into increasingly granular processes and investigate the steps we took to reach our findings.
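To make the idea of nesting concrete, here is a minimal Python sketch. It embeds a small, hypothetical BPMN 2.0 XML document in which “Collect samples” is a sub-process with its own steps (one of which is itself a sub-process), and prints an outline of the process down to a chosen depth. The process, task, and function names are invented for illustration; only the namespace and element names (`process`, `subProcess`, `task`, `userTask`) come from the BPMN 2.0 specification.

```python
import xml.etree.ElementTree as ET

# BPMN 2.0 model namespace, as defined by the OMG specification.
BPMN = "{http://www.omg.org/spec/BPMN/20100524/MODEL}"

# Hypothetical, heavily simplified diagram: sequence flows and layout
# information are left out because only the nesting matters here.
DIAGRAM = """<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL">
  <process id="study">
    <subProcess id="collect" name="Collect samples">
      <userTask id="plan" name="Plan field trip"/>
      <subProcess id="site" name="Sample each site">
        <userTask id="record" name="Record GPS position"/>
        <userTask id="bag" name="Bag and label rock"/>
      </subProcess>
      <task id="ship" name="Ship samples to lab"/>
    </subProcess>
    <task id="prepare" name="Prepare samples"/>
    <task id="analyze" name="Analyze samples"/>
  </process>
</definitions>"""

ACTIVITIES = {"task", "userTask", "subProcess"}

def outline(element, max_depth, depth=0):
    """List activity names as an indented outline.

    max_depth is how many levels of sub-process to open; sub-processes
    that are not opened are shown collapsed and marked "[+]".
    """
    lines = []
    for child in element:
        kind = child.tag.replace(BPMN, "")
        if kind not in ACTIVITIES:
            continue
        expand = kind == "subProcess" and depth < max_depth
        collapsed = kind == "subProcess" and not expand
        lines.append("  " * depth + child.get("name") + (" [+]" if collapsed else ""))
        if expand:
            lines.extend(outline(child, max_depth, depth + 1))
    return lines

process = ET.fromstring(DIAGRAM).find(BPMN + "process")
print("\n".join(outline(process, max_depth=0)))  # the high-level summary
print("\n".join(outline(process, max_depth=2)))  # every level of detail
```

With `max_depth=0` the reader sees only three top-level steps; raising it reveals the field-work detail without changing the diagram itself.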
As a final note, Gunnar remarked that all research must be funded somehow. Whether the funding is public or private, someone is holding you accountable. Diagrams like these convey a more complete picture of the effort required to conduct research, which may well help you win additional funding.
Automating and Optimizing Research Processes
Gunnar asserted that “In a lab, everything you can automate should be automated. For speed. For consistency. For replicability.”
It is not just a pretty picture. The core strength of BPMN is that it is as easy for software to read as it is for people. The lines and arrows between tasks are stored in a structured XML format that software can quickly parse and follow. The image below is taken from another one of our articles, but it helps illustrate some of what can be automated in a BPMN diagram. User tasks can contain forms that people complete to control the workflow. Gateways let us define alternate paths and choose between them. A workflow can talk to other software applications and robotic tools through API calls, and assign work to specific people, groups, and departments.
Finally, a diagram can describe how to respond to external events: pausing a workflow until notified, or allowing another application to interrupt a task and handling that interruption in a pre-defined way.
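To illustrate what “easy for software to read” means, the sketch below parses a small, hypothetical BPMN fragment with Python’s standard `xml.etree.ElementTree`, builds a map of the sequence flows (the arrows), and walks from the start event to the end event. At the exclusive gateway it takes the first outgoing flow whose condition holds, otherwise the gateway’s `default` flow. As a simplifying assumption, each condition here is just the name of a boolean variable looked up in a dictionary; real engines evaluate full expressions, and this sketch also ignores parallel branches.

```python
import xml.etree.ElementTree as ET

NS = {"bpmn": "http://www.omg.org/spec/BPMN/20100524/MODEL"}

# Hypothetical diagram: run a measurement, then report it or flag it for review.
DIAGRAM = """<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL">
  <process id="measure">
    <startEvent id="start"/>
    <serviceTask id="run" name="Run spectrometer"/>
    <exclusiveGateway id="check" name="Within tolerance?" default="to_review"/>
    <userTask id="report" name="Write up results"/>
    <userTask id="review" name="Flag sample for review"/>
    <endEvent id="end"/>
    <sequenceFlow id="f1" sourceRef="start" targetRef="run"/>
    <sequenceFlow id="f2" sourceRef="run" targetRef="check"/>
    <sequenceFlow id="to_report" sourceRef="check" targetRef="report">
      <conditionExpression>within_tolerance</conditionExpression>
    </sequenceFlow>
    <sequenceFlow id="to_review" sourceRef="check" targetRef="review"/>
    <sequenceFlow id="f3" sourceRef="report" targetRef="end"/>
    <sequenceFlow id="f4" sourceRef="review" targetRef="end"/>
  </process>
</definitions>"""

def condition_holds(flow, data):
    # Simplification: the condition text is a variable name looked up in `data`.
    cond = flow.find("bpmn:conditionExpression", NS)
    return cond is not None and bool(data.get(cond.text.strip(), False))

def walk(xml_text, data):
    """Follow the arrows from the start event and return the named steps visited."""
    process = ET.fromstring(xml_text).find("bpmn:process", NS)
    nodes = {el.get("id"): el for el in process}
    outgoing = {}
    for flow in process.findall("bpmn:sequenceFlow", NS):
        outgoing.setdefault(flow.get("sourceRef"), []).append(flow)

    path = []
    current = process.find("bpmn:startEvent", NS).get("id")
    while True:
        node = nodes[current]
        if node.get("name"):
            path.append(node.get("name"))
        flows = outgoing.get(current, [])
        if not flows:  # no outgoing arrows: we have reached an end event
            return path
        if node.tag.endswith("exclusiveGateway"):
            # First flow whose condition is true, otherwise the gateway's default flow.
            chosen = next((f for f in flows if condition_holds(f, data)), None)
            if chosen is None:
                chosen = next(f for f in flows if f.get("id") == node.get("default"))
        else:
            chosen = flows[0]  # sketch only: parallel branches are not handled
        current = chosen.get("targetRef")

print(walk(DIAGRAM, {"within_tolerance": True}))
print(walk(DIAGRAM, {"within_tolerance": False}))
```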
Running these diagrams is exactly what our open source software SpiffWorkflow does. It is an execution engine for BPMN: much as an interpreter runs code, SpiffWorkflow runs BPMN diagrams.
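The interpreter analogy can be sketched in a few lines. The toy engine below is not SpiffWorkflow’s API; it is a hypothetical illustration of the general idea. Automated (script) steps run as soon as they are reached, while a user step (a form someone must fill in) or a message step (waiting on an external system) pauses the workflow until input arrives. The class, method, and step names are invented, and the process is a straight line with no gateways to keep it short.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Step:
    name: str
    kind: str  # hypothetical labels: "script", "user", or "message"
    action: Optional[Callable[[dict], None]] = None  # used only by script steps

@dataclass
class ToyEngine:
    steps: list
    data: dict = field(default_factory=dict)
    position: int = 0

    def waiting_on(self):
        """Name of the current step, or None once the workflow has finished."""
        return self.steps[self.position].name if self.position < len(self.steps) else None

    def run(self):
        """Advance through automated steps until a person or external event is needed."""
        while self.position < len(self.steps):
            step = self.steps[self.position]
            if step.kind != "script":
                return  # pause: someone (or something) else must act first
            step.action(self.data)
            self.position += 1

    def complete(self, name, payload):
        """Submit a form or deliver a message for the step the workflow is paused on."""
        step = self.steps[self.position]
        if step.name != name:
            raise ValueError(f"workflow is waiting on {step.name!r}, not {name!r}")
        self.data.update(payload)
        self.position += 1
        self.run()

engine = ToyEngine([
    Step("Weigh sample", "user"),
    Step("Convert to grams", "script", lambda d: d.update(grams=d["kg"] * 1000)),
    Step("Wait for instrument result", "message"),
    Step("Flag heavy samples", "script", lambda d: d.update(heavy=d["grams"] > 5000)),
])
engine.run()
print(engine.waiting_on())  # Weigh sample
engine.complete("Weigh sample", {"kg": 6.5})
print(engine.waiting_on())  # Wait for instrument result
engine.complete("Wait for instrument result", {"ratio": 0.71})
print(engine.waiting_on())  # None
```

A real engine also has to persist this state between pauses, which is what lets a workflow wait hours or days for a person or an instrument.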
Creating a Good UI for Scientific Workflows
As we continued talking with Gunnar, the conversation turned to our efforts to redesign the SpiffWorkflow UI and how the redesign might affect its use in the lab. The next paragraphs discuss ways we might improve SpiffWorkflow so it is easier to use in this context.
Mobile device support is critical. Many clean labs won’t allow laptops, but cell phones and tablets are often permitted. “Local-first” software, which keeps working when disconnected from the internet in the lab or far out in the field, is also an important feature.
Mobile devices have a limited viewing area, but being able to view the diagram and see where you are in the workflow is still highly valuable. Maintaining this visual connection helps people understand the process and see why a particular step is necessary. It also encourages iterative improvement of the diagram: if you are always aware of what is guiding you, you will start to see where the flow can be improved. The ability to look ahead, to see what is coming next or three or four steps down the line, helps people plan and potentially work ahead while they wait for the current step (a chemical process or an automated system) to complete.
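Once a diagram is treated as a graph, looking ahead is a short breadth-first search. The hypothetical sketch below lists every step reachable within the next `n` moves of the current one, along with its distance; the step names and the plain adjacency dictionary are illustrative stand-ins for flows parsed out of a real BPMN diagram.

```python
from collections import deque

# Hypothetical flows, already parsed out of a diagram: step -> next steps.
FLOWS = {
    "Crush sample": ["Sieve fractions"],
    "Sieve fractions": ["Acid digestion"],
    "Acid digestion": ["Wait for digestion (12 h)"],
    "Wait for digestion (12 h)": ["Dilute", "Prepare standards"],
    "Dilute": ["Run ICP-MS"],
    "Prepare standards": ["Run ICP-MS"],
    "Run ICP-MS": [],
}

def look_ahead(flows, current, n):
    """Return (step, distance) pairs for steps reachable within n moves of `current`."""
    seen = {current}
    upcoming = []
    queue = deque([(current, 0)])
    while queue:
        step, dist = queue.popleft()
        if dist == n:
            continue  # do not look further than n steps
        for nxt in flows.get(step, []):
            if nxt not in seen:
                seen.add(nxt)
                upcoming.append((nxt, dist + 1))
                queue.append((nxt, dist + 1))
    return upcoming

for step, distance in look_ahead(FLOWS, "Acid digestion", 3):
    print(f"+{distance}: {step}")
```

A phone screen could show just this short list under the current step, so someone waiting on the digestion can see what will be needed soon.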
Finally, a good interface will let us delve into specific steps when needed, while allowing experienced engineers to skip signing off on each step of a process they have performed hundreds of times.
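One possible way to express “skip the sign-off for routine steps” is a rule based on each person’s history. The function name, the threshold of 100 completions, and the `always_sign` list are all hypothetical; a lab would set its own policy and would likely keep mandatory sign-off for safety-critical steps regardless of experience.

```python
from collections import Counter

def needs_signoff(history, person, step, threshold=100, always_sign=frozenset()):
    """Return True if `person` should explicitly sign off on `step`.

    history: (person, step) pairs, one per previously completed step.
    threshold: completions after which routine sign-off is skipped (hypothetical).
    always_sign: steps that need sign-off regardless of experience.
    """
    if step in always_sign:
        return True
    completed = Counter(history)
    return completed[(person, step)] < threshold

history = [("experienced_tech", "Weigh sample")] * 250 + [("new_student", "Weigh sample")] * 3
print(needs_signoff(history, "experienced_tech", "Weigh sample"))  # False
print(needs_signoff(history, "new_student", "Weigh sample"))       # True
print(needs_signoff(history, "experienced_tech", "Handle HF acid",
                    always_sign={"Handle HF acid"}))               # True
```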
SpiffWorkflow is moving in this direction now. We have a mobile-friendly interface, and a new diagram editor that can execute workflows without a connection to a “backend” on the internet. There is still work to do to make all of this seamless, but it is well within our reach.
Leveraging Analytics without Creating Fear
We’ve spent the last few months building an Analytics tool for SpiffWorkflow that, among other things, provides heatmap overlays of BPMN diagrams. We talked to Gunnar about analytics for lab workflows to better understand how helpful it would be to answer questions like:
- How many processes are running right now?
- Are any processes stuck?
- Where are the “bottlenecks” — the slow places in the workflow that could be refactored?
- What kind of goals should we set for the future?
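The first three questions can be answered from a simple log of task start and end times. The sketch below assumes a hypothetical event log (process id, task name, start, and end, where a missing end means the task is still open) and a hypothetical 48-hour cutoff for “stuck.” It counts running processes, lists stuck ones, and ranks tasks by mean duration, which is the raw material for a bottleneck heatmap. It is an illustration, not how SpiffWorkflow’s Analytics tool is implemented.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical log: (process id, task, started, finished or None if still open).
LOG = [
    ("p1", "Collect samples", datetime(2024, 5, 1, 8), datetime(2024, 5, 1, 16)),
    ("p1", "Prepare samples", datetime(2024, 5, 2, 9), datetime(2024, 5, 4, 9)),
    ("p1", "Analyze samples", datetime(2024, 5, 4, 10), None),
    ("p2", "Collect samples", datetime(2024, 5, 3, 8), datetime(2024, 5, 3, 12)),
    ("p2", "Prepare samples", datetime(2024, 5, 6, 9), None),
]

def summarize(log, now, stuck_after=timedelta(hours=48)):
    """Return (running process count, stuck process ids, tasks by mean hours)."""
    open_tasks = [(pid, start) for pid, _, start, end in log if end is None]
    running = {pid for pid, _ in open_tasks}
    # A process counts as stuck if any open task has waited longer than the cutoff.
    stuck = sorted({pid for pid, start in open_tasks if now - start > stuck_after})

    durations = defaultdict(list)
    for _, task, start, end in log:
        if end is not None:
            durations[task].append((end - start).total_seconds() / 3600)
    mean_hours = {task: sum(h) / len(h) for task, h in durations.items()}
    # Longest average first: the likeliest bottlenecks.
    bottlenecks = sorted(mean_hours.items(), key=lambda item: item[1], reverse=True)
    return len(running), stuck, bottlenecks

running, stuck, bottlenecks = summarize(LOG, now=datetime(2024, 5, 7, 9))
print(running, stuck, bottlenecks)
```

Note that nothing in this summary needs to know who performed each task, which matters for the concerns below.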
Gunnar expressed concern about the dangers of comparing people, which can create unnecessary and unproductive anxiety. Some people may be slower in the lab but produce more accurate, high-value results; others might be highly efficient at certain steps simply because what they are studying isn’t easily affected by those steps. We aren’t certain how to address this concern, but we are thinking about it.
Data of any form can be used and misused. It might cause needless stress or, worse, misguided management decisions. Used well, it can also build greater empathy and accountability within a team. It is a sharp stick, to be used with care.
Gunnar also mentioned that analytics may be a way to reduce costs. He pointed out that reviewing completed processes would help his team establish how large a sample set should be collected from a site. Collecting too much has costs. The direct cost is that rocks are heavy and expensive to ship. There are also indirect costs: samples must be kept for some number of years, and storing and managing larger sample sets than necessary is expensive.
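As a worked example of that feedback loop, suppose each completed process records how much material was shipped from a site and how much the lab actually consumed. The field names, the numbers, and the rule “largest amount ever used plus a 25% safety margin” are all hypothetical; a lab would choose its own rule, but the shape of the calculation is the point.

```python
# Hypothetical records from completed processes, masses in kilograms.
completed = [
    {"site": "A", "shipped_kg": 12.0, "used_kg": 2.1},
    {"site": "B", "shipped_kg": 10.0, "used_kg": 3.4},
    {"site": "C", "shipped_kg": 15.0, "used_kg": 2.8},
    {"site": "D", "shipped_kg": 11.0, "used_kg": 1.9},
]

def recommended_collection_kg(records, margin=0.25):
    """Largest amount ever consumed, plus a safety margin (a hypothetical rule)."""
    return max(r["used_kg"] for r in records) * (1 + margin)

target = recommended_collection_kg(completed)
# Mass that was shipped (and must now be stored) beyond the recommended amount.
excess = sum(r["shipped_kg"] - target for r in completed if r["shipped_kg"] > target)
print(f"collect about {target:.2f} kg per site; {excess:.1f} kg of past shipments were surplus")
```

Using the maximum is deliberately conservative; a percentile of past usage would accept a small risk of running short in exchange for lower shipping and storage costs.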
These feedback loops are of high value, and BPMN tools like SpiffWorkflow are an excellent way to measure details and feed them into subsequent iterations. Perhaps the best answer, at least for labs, is to keep the analytics impersonal or fully anonymous, and to let the team solve problems as a group rather than point fingers at individuals.
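One way to keep analytics impersonal is to aggregate by task only, and to withhold any figure that draws on so few people that individuals could be identified. The sketch below drops the person field from the report and suppresses a task’s average (returning `None`) unless at least `k` different people contributed; `k = 3` and the records are hypothetical. This is a simple suppression rule in the spirit of k-anonymity, not a complete privacy guarantee.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical completed-task records: (person, task, hours taken).
RECORDS = [
    ("ana", "Prepare samples", 5.0),
    ("ben", "Prepare samples", 7.0),
    ("cho", "Prepare samples", 6.0),
    ("ana", "Calibrate ICP-MS", 1.5),
    ("ben", "Calibrate ICP-MS", 2.5),
]

def team_report(records, k=3):
    """Mean hours per task; None where fewer than k distinct people contributed."""
    hours, people = defaultdict(list), defaultdict(set)
    for person, task, h in records:
        hours[task].append(h)
        people[task].add(person)
    # Names never leave this function: the report is keyed by task only.
    return {
        task: round(mean(hours[task]), 2) if len(people[task]) >= k else None
        for task in hours
    }

print(team_report(RECORDS))
```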
Conclusion
Drawing on Gunnar’s experience as a geoscientist, we outlined several ways that adding a BPMN tool like SpiffWorkflow to your lab can be beneficial. We discussed improving or augmenting lab notebooks in ways that increase transparency, enable more automation, improve reproducibility, and generate useful metrics that help you improve your lab over time.
We had a great conversation with our geoscientist friend. It is highly rewarding to build something that someone else finds useful; it’s really the pinnacle of this line of work. I hope we continue to have more conversations with Gunnar, and if you are out there using SpiffWorkflow, we’d love to talk to you about how we can make it better for your line of work as well.
If you would like to learn more about SpiffWorkflow, please check out our website at spiffworkflow.org.