Heroku for Science
dennyluan.tumblr.com
I studied biomedical engineering at Hopkins. Before I started there, research was the promised land. I dreamt of spending my time thinking about how to solve critical problems and testing solutions.
What I saw instead were people spending the vast majority of their time pipetting. All the way up the ladder, up to and including postdocs. I sometimes thought our PI had it worse for having to spend most of her time applying for grants.
The AWSification of synbio research would be a game changer. Some labs at Hopkins have tried to build robots but with limited success. Given how cheap labor is at research institutions, competing on price will be incredibly difficult.
I also thought research was the promised land. Went to Cornell for undergrad studying biological sciences and was amazed at the research opportunities... but then worked in a lab studying type II diabetes. I pipetted, cleaned beakers, measured out chemicals to prepare solutions, sucked up cell cultures and extracted DNA all day making $8/hour in extreme boredom. There were postdocs with Ph.Ds and loads of experience from prestigious schools doing the same grunt work alongside me, often confessing they were completely miserable and wishing they could start their life over again. They worked 7 days a week and long, long hours each day. I got the hell out of research and am enjoying my life much better now in tech. I also feel like the work I do is a lot more impactful. All of my peers did the same and went to consulting, finance or tech. In my view the basic research field is in total crisis...
So depressing. This (and the parent comment) is the reason I didn't go into research -- it's a life of pipetting and manual labour that no-one's interested in automating, either because it's too complex or because labour is so cheap that there's no financial incentive to do so.
I'm happy to leave someone else to do that. I'd rather be in a job I actually enjoy the day-to-day of.
And that's to say nothing of the problems of PhDs: namely that there are ten times more PhD positions than there are postdoc positions. That ten-to-one crunch when it comes to finding a job sure does sound fun...
With the utmost respect to you and the post you responded to, research is about answering questions about things you find interesting; for a biologist, pipetting is simply the means of getting there.
If you want to contribute to Firefox or any other non-trivial open source project, you need to spend time creating a development environment and it likely will take weeks to months before you can make a substantive contribution.
If anyone is reading the comment I'm responding to or its parent comment, keep in mind that the manual labor is in pursuit of a goal.
I think your parent is alluding to the fact things could be automated in biological research but aren't because of the disincentives; and that at the same time things in tech are more amenable to automation and often parts of it are indeed automated.
Yes, it's all a means to an end, but how much time one wants to spend in the "means" (which can get extremely repetitive, apparently) is what counts for the parent (I'm supposing).
This is modern molecular science for you - you spend years getting really good at pipetting and when you are at your peak you get shunted into writing grants :)
Hi! This is the Yelling Math Fairy and THE WORD EXPONENTIAL DOES NOT MEAN MORE. IT IS A MATH WORD. IT MEANS e^kx. IT MEANS THE SOLUTION OF y' = ky. DOES EACH NEW ASSAY DOUBLE THE TOTAL AMOUNT OF WORK? NO IT DOES NOT. THE ADDED OVERHEAD FOR EACH ASSAY IS LINEAR GROWTH. NOT EXPONENTIAL GROWTH. I KNOW IT SEEMS LIKE A LOT TO YOU BUT THAT DOES NOT MAKE IT EXPONENTIAL. EXPONENTIAL IS NOT A SYNONYM FOR BIG AND EXPONENTIAL GROWTH IS NOT A SYNONYM FOR FAST. THANK YOU.
Keep fighting the good fight, Eliezer.
To add on: Some words have valuable and specific meanings! We don't have a good substitute word for "exponential" that means the same thing. Please make an effort not to do this!
Sorry, but in this case, it's actually N^x. Even running PCR and gel analysis requires steps where the time is dependent on the number of samples (e.g. pipetting, walking to the centrifuge, making multiple gels).
Generally speaking, running double the number of samples requires approximately double the amount of work, maybe a little less since you're doing it in bulk.
Are you really saying that going from 10 samples to 11 samples causes a doubling (or 1.2x-ing or tripling or whatever) of the time/work required? That's what exponential means.
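To make the distinction in this subthread concrete, here is a rough sketch (not from any commenter) comparing the two growth models. The function names are hypothetical and the constants are invented for illustration, not measured lab data. Under linear growth each extra sample adds the same fixed amount of work; under exponential growth each extra sample multiplies the total by the same factor.

```python
# Illustrative only: minutes_per_sample, base_minutes and factor are
# made-up numbers, not measurements from any real lab workflow.

def linear_work(n_samples, minutes_per_sample=5.0):
    """Total work when each added sample costs a fixed amount."""
    return minutes_per_sample * n_samples


def exponential_work(n_samples, base_minutes=5.0, factor=2.0):
    """Total work when each added sample multiplies the total by `factor`."""
    return base_minutes * factor ** n_samples


def added_work(model, n_samples):
    """Extra work caused by going from n_samples to n_samples + 1."""
    return model(n_samples + 1) - model(n_samples)


def growth_ratio(model, n_samples):
    """Factor by which total work grows when one sample is added."""
    return model(n_samples + 1) / model(n_samples)


if __name__ == "__main__":
    for n in (10, 20):
        print(
            n,
            added_work(linear_work, n),         # same increment at every n
            growth_ratio(linear_work, n),       # ratio shrinks towards 1
            growth_ratio(exponential_work, n),  # same ratio at every n
        )
```

With these assumed numbers, going from 10 to 11 samples in the linear model adds the same five minutes as going from 20 to 21, while the exponential model doubles the total each time. That is the difference the 10-versus-11-samples question is getting at.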
I believe this startup is as close as it gets (for now!) to what you're describing, https://www.transcriptic.com/.
(I don't work there or anything)
There have been quite a few efforts in this area. See Transcriptic and Emerald Therapeutics (http://www.emeraldcloudlab.com/); there are also the more traditional suppliers for things like short oligos or expression vectors (https://www.dna20.com/ and http://www.idt.com/).
I think there have also been a lot of independent academic attempts at this (see: http://klavinslab.org/ which is CS/BioE at UWash), but they all kind of waded around in the shallow water.
The reason I think this is compelling is that almost every synthetic biologist has an existing workflow. It's basically design using some sort of CAD software, order from IDT, receive materials next day, run test by hand, ship to Genewiz for sequencing, etc. That's just one example of a workflow involving 4-5 specialized 'steps'. As the steps get cheaper/faster/better, consolidating and automating this is just a no-brainer.
Emerald doesn't really exist yet, and I believe they misrepresent their automation. They've posted a very pretty site with a bunch of mockups. They have a set of workflows that work for their internal antiviral research, but "Heroku for Science" is a completely different game.
Transcriptic, on the other hand, started taking orders six months ago and has customers at Stanford, Caltech, Harvard, and more.
There are probably others much more familiar with cloud infrastructure who can chime in, but the AWS of science and the Heroku of science are two very different challenges, and I feel the analogies probably cross over pretty well.
Definitely having the infrastructure 'warehouse' layer that Transcriptic is building (with a real API! wow!) will be valuable. And like you hint at, power users won't need hand-holding, but 99% of the market of users will. That's where packaging, ease of use, and limited configuration seem to be the difference maker (Heroku starting exclusively with Rails).
I'm a biomedical engineering graduate student at a research university. While I haven't investigated Transcriptic's pricing in depth, based on a conversation with one of their employees about collecting some basic growth curves (https://www.transcriptic.com/guides/3-growth-curves.html), their prices were simply far too high (even with the steep discount they offered) for repetitively collecting a large amount of data - precisely the situation in which you'd want to use something like Transcriptic. This is especially true for labs (such as mine) that have already made the upfront capital investment for the equipment used to take these measurements. As for the repetitive labor, you can usually find undergrads to do that for far less (often completely free) than what Transcriptic charges.
However, a service like Transcriptic may make sense if (a) you're in a company (no free undergrad labor, though summer interns may be a suitable alternative) or (b) you don't already have the equipment and just want to do a one-off collection of a large amount of data. Also, maybe prices will significantly drop as Transcriptic scales up and streamlines their operations. I'll definitely be checking back in the coming years to see if they ever reach the point where it makes sense to use their services.
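The trade-off described above can be sketched as a simple break-even calculation. This is only a back-of-the-envelope model with hypothetical function names. Every number passed in is a placeholder you would have to supply yourself, and nothing here reflects Transcriptic's actual pricing or any lab's real costs.

```python
# Illustrative only: all prices are hypothetical placeholders.

def in_house_cost(n_runs, labor_per_run, equipment_cost=0.0):
    """Total cost of doing n_runs yourself.

    equipment_cost is 0 when the lab already owns the instruments.
    """
    return equipment_cost + n_runs * labor_per_run


def service_cost(n_runs, price_per_run):
    """Total cost of sending n_runs to an outside service."""
    return n_runs * price_per_run


def break_even_runs(price_per_run, labor_per_run, equipment_cost):
    """Number of runs above which doing the work in-house is cheaper.

    Returns 0.0 when in-house wins from the first run (equipment already
    owned and labor cheaper than the service), and None when the service
    is never more expensive per run than in-house labor.
    """
    if price_per_run <= labor_per_run:
        return None
    return equipment_cost / (price_per_run - labor_per_run)
```

For example, with an assumed service price of 50 per run, labor of 10 per run, and 20000 of equipment still to buy, `break_even_runs(50.0, 10.0, 20000.0)` gives 500 runs: below that the one-off user is better off with the service. With the equipment already paid for (`equipment_cost=0.0`) the break-even point is 0, which matches the complaint that a lab with sunk capital and cheap undergrad labor has little reason to outsource.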
Hey! Founder of Transcriptic here. This is exactly what we are. We're growing quickly and have customers at over a dozen academic institutions now. One of our key issues right now is that biologists aren't programmers and so we're doing a lot of hand-holding - we'd get really excited about someone building higher level tools (Heroku to our AWS) on top of us for specific domains that you know better than we do.
If anyone in this thread thinks this is an interesting topic I'm easy to reach at max@transcriptic.com.
Hi Max, I took a look at your jobs page and my heart sank at this: https://jobs.lever.co/transcriptic/e1cfcb93-05d8-4026-8f70-3...
The first two bullet points there are like the two biggest red flags possible in an ops job post. It reads as a development team that has built a fragile and unreliable system and is looking for a superman to dump it on.
It will matter much more if your VP of Engineering position can capacity plan than it will matter if your operations position can code. No amount of ops rockstars can fight a (larger) dev team that won't design with real world workload capacity and reliability as not just a concern but a focus.
Hey, sorry that it came across that way. The ops position is definitely not about looking for a superman to dump a buggy system on. Issues are root-caused and it's a point of culture that we don't see the same bug twice.
Another Transcriptic just Slacked everyone your comment here which has prompted a discussion about what we're really looking for in an "ops" person. The "exceptional coding skills" bullet is in almost all of our engineering job postings, and we thought such skills would apply to really good "devops" people, too; maybe this is wrong and asking for the wrong skill set. (The SREs I know at Google are all really good developers.)
Being an "on-call position" is a side effect of our volume and the fact that cells don't stop dividing at 8pm. Depending on when projects get started we often end up running reactions all the time, and so yes there is a (metaphorical) pager involved. Even minor failures here are very time sensitive due to the biological nature, and lost samples can be extremely costly (and devastating to our reputation with customers). I think this ops role is more about setting up the processes rather than being the only person (people) to respond to issues.
We'll be reflecting on that job description and updating the posting.
Cool, thanks for taking the feedback. And careful with the "because google" line of reasoning, what's right for them is very often not right for non-drowning-in-cash/monopoly-holders.
The problem with this fantasy is that easily automated and distributed tasks are not the rate-limiting steps in most biomedical research. The hard parts (in addition to designing the right experiments and analyzing data) are in constructing and validating relevant model systems and doing the specific experiments to address questions of interest.
These are extremely dependent on the question being studied and often are not amenable to automation, and may require very rare, expensive, and difficult-to-handle samples. For example, my collaborators work with transgenic mice that are a model for a particular disease, and these mice have to be bred then aged to 12 weeks until they exhibit the phenotype before we can even start doing an experiment. In another model, they have to do brain surgery on each mouse and then wait several weeks for the phenotype.
The 'easy' parts, such as DNA synthesis and sequencing, are already highly standardized and automated, and there is fierce competition to improve the technology and bring costs down.
This is certainly the bottleneck in my research. I ran thousands of core years of computer simulation in the first year of my PhD. I have all the data I need to write a PhD thesis but I'm still years from graduating due to the aforementioned bottlenecks. An arsenal of software I've written in numpy/scipy/pandas saves time, but only goes so far when you're trying to carve out stories from your data to write papers.
A big problem is that scientists are traditionally very secretive. This would increase the possibility of leaks, and there'd need to be some way of assuring that the experiment was conducted correctly. Good idea, though.
If this was available, convenience would trump security for many people.
I just have experience working in 3 different labs in Germany and I can tell you: secrecy is the holy grail, and even using private repos (for code, not data) on GitHub/SourceForge is forbidden. But this is perhaps just small sample size and a German habit ;)
Most worthwhile research is about the mundane. One of the first research projects I did required painstakingly adjusting and modifying conditions to the point that I could actually start collecting data. That process took weeks, but the day it worked was insanely satisfying. In the process I became a master at making small incremental changes, recording them, and learning exactly what didn't work. Years later, as a computational scientist, the process was much the same, except that there were no pipettes and beakers involved.
Any worthwhile work I have ever done has mostly been about grunt work. Along the way there have been cool things (after all Leno made fun of our research [1] once) and insanely fun times. I may not be in research now, but every day I apply the lessons learned from patiently repeating and iterating.
1. http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=723226...
I've taken a very tiny first step towards something like this for computer graphics and vision: trying to make user studies as easy as possible: http://www.imcompadre.com It's not a ready product by any means, but two paper submissions have been made with it so far.
It's a difficult problem to solve, because these pesky researchers are always trying out new things that you didn't anticipate - who would've thought! But still, for the mundane things that can be automated, something like this is definitely the way to go. Of course, as other people here point out, figuring out what to actually test is always the hardest part.
In our lab today we are consistently dealing with the opposite problem. The experiments themselves are easy in comparison with the design and analysis.
The startup equivalent would be "it's easy to build the app, it's hard to get users". Not all apps can yield users, but lowering the bar to launch will certainly yield more opportunities for successful apps.
You are right. I think we would all appreciate making more of what we do automatable and outsourceable. I guess where I was trying to go with the comment is that it would be nice if there were similar tools for design and analysis. Oh, automated paper writing would also be much appreciated.
Experiment is working on that. :)
The Center for Open Science positions itself as something similar to what you describe. http://centerforopenscience.org/
I believe they call themselves more of the GitHub of Science for scientific collaboration. Adding hooks to 'push' the tasks and 'checkout' the findings could maybe be built as an extension on their platform.
Experimenting with a simulation is a great time saver as far as it goes; however, all models are just models, subject to the assumptions that went into them.
There is a great deal we do not know about cellular biology. Any simulation would be a fairly gross approximation. The point of many experiments is to further our understanding of the model of cellular mechanics.
In my experience (having done a similar project as an undergrad), the first problem is convincing people to switch and take the risk/time to use your new workflow, even if your workflow allows them to continue to use their existing infrastructure / machines.
DIYbio has already started on this: Experimental Robot for $4k: http://www.opentrons.com/
There could be some synergies with (yc12) Science Exchange.