How DNA could store all the world's data (nature.com)
The real problem isn't storing the data, it's accessing it. There is no way to address DNA; you can only "shotgun sequence" it. In doing so, you get random fragments of around 200 bases (400 bits). You can't get just one such fragment, you get half a billion in one go, currently at a cost of around $5000. (Older, much more expensive technology got up to 1000 bases... sometimes, and only 100 fragments per machine run.) So how are you going to access your archive? By sequencing the whole thing and (temporarily) storing it on a hard drive?
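For a sense of scale, here's a back-of-the-envelope sketch in Python using the figures from that comment (~200-base fragments, ~half a billion fragments per run, ~$5000 per run, 2 bits per base); the constants are rough assumptions taken from the comment, not vendor specs:

    # Rough throughput/cost of one shotgun-sequencing run, using the
    # figures quoted above (assumptions, not vendor specifications).
    BITS_PER_BASE = 2          # A/C/G/T carries at most 2 bits per base
    FRAGMENT_LEN = 200         # bases per fragment
    FRAGMENTS_PER_RUN = 5e8    # ~half a billion fragments per run
    COST_PER_RUN_USD = 5000

    bits_per_run = FRAGMENT_LEN * BITS_PER_BASE * FRAGMENTS_PER_RUN  # 2e11 bits
    gb_per_run = bits_per_run / 8 / 1e9                              # ~25 GB of raw reads
    usd_per_gb_read = COST_PER_RUN_USD / gb_per_run                  # ~$200 just to read a GB
    print(f"~{gb_per_run:.0f} GB raw per run, ~${usd_per_gb_read:.0f}/GB to read")

So even before any redundancy or assembly overhead, reading the archive back costs on the order of $200 per gigabyte.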
The manufacturers of modern sequencers (both Illumina and ABI) have been talking about this for at least 7 years (i.e. as long as they've been selling high-throughput sequencers). They actually made a weaker claim: according to them, it makes no sense to keep a sequenced genome, because just sequencing it again would be cheaper than storing the data. In these 7 years, it hasn't happened. Instead, ABI's SOLiD technology all but vanished. Actually storing data in DNA is one step further; it's not going to happen for a long time.
(Source: My employer does a lot of sequencing. I talked to sales representatives of both companies, and I work on data sequenced using Illumina's machines. We store that data on spinning rust.)
From what I gathered from my own research, the talk around HGP-write, and a few chats with Nick Goldman himself (who is a very funny guy), the main problem is neither storage nor access (access can be improved by probing, and it matters less anyway since the primary application would be archival) but synthesis, which still costs at minimum $1 per 10 bp.
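To see why that synthesis price is the bottleneck, here's a quick Python estimate assuming the quoted floor of $1 per 10 bp and a plain 2-bits-per-base encoding with no redundancy (both are simplifying assumptions, not the actual encoding schemes in use):

    # Write cost per megabyte at ~$1 per 10 bp, i.e. $0.10 per base,
    # assuming 2 bits per base and no error-correction overhead.
    USD_PER_BASE = 1 / 10
    BITS_PER_BASE = 2

    def write_cost_usd(n_bytes: int) -> float:
        bases_needed = n_bytes * 8 / BITS_PER_BASE
        return bases_needed * USD_PER_BASE

    print(f"1 MB: ${write_cost_usd(10**6):,.0f}")  # roughly $400,000 per megabyte

At roughly $400,000 per megabyte written, synthesis dwarfs both the storage and the sequencing costs.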
And sequencing will become even cheaper once you don't have to do it from a library prep but can do it in a controlled buffer environment. It's just not getting cheaper right now because Illumina has no incentive to make it so (similar to Intel's position in CPUs). Let's hope that ONT, BGI and whoever else still hopes to grab some market share (Ion Torrent, PacBio ...) can force them to evolve (project firefly, yeah).
Synthesis is dropping fast, and will drop even faster in the near future. There are a couple of 'humps' in the demand for synthesis, with plateaus in between. Synthesis between 0 and ~200 bp gets you all you need for PCR (copy/paste). But if you can't do ~3000 bp, you can't make a full-sized gene. So people have gotten used to PCRing everything, and there is simply no real demand for anything larger.
But with a few new players on the block (Twist, Gen9, and a few other smaller/newer startups), the goal is to hit an economical ~2-3 kb, at which point the race is back on and whole new markets will open up. The moment that happens, expect competition to kick back in and everyone's prices to drop again.
The size of a moderate plasmid (~5,000-7,000 bp) is another hurdle, and the size of a small chromosome (~100,000 bp) is another.
Also, if you're ordering DNA in pools or in bulk (and have a good compression algorithm), you can get the price per bp to come down even more.
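As a toy illustration of why compression matters before ordering, here's a sketch that maps bytes to bases with a simple 2-bit mapping, with and without zlib compression; the mapping and names are my own, not any vendor's pipeline:

    # Compressing the payload first reduces the number of bases to order.
    import zlib

    BASES = "ACGT"  # 00->A, 01->C, 10->G, 11->T

    def bytes_to_bases(data: bytes) -> str:
        return "".join(BASES[(byte >> shift) & 0b11]
                       for byte in data
                       for shift in (6, 4, 2, 0))

    payload = b"the quick brown fox jumps over the lazy dog " * 100
    plain = bytes_to_bases(payload)
    packed = bytes_to_bases(zlib.compress(payload, 9))
    print(f"{len(plain)} bases uncompressed vs {len(packed)} after compression")

Since synthesis is priced per base, every byte the compressor removes is paid for only once, before it ever reaches the synthesizer.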
There are many ways to address DNA for sequencing that don't involve shotgunning.
I've often thought that if we ever decide to send nano-spaceships filled with engineered DNA to populate other planets like spores, we should include human knowledge in the DNA, so that when the spores grow into an advanced civilization they can read it and learn about their progenitors.
Sounds like the plot of a sci-fi novel: scientists discover that so-called junk DNA contains physics equations, along with what appear to be coordinates to a distant star system with an Earth-like planet.
Take my money.
Star Trek:TNG did an episode that is basically this[0][1].
[0] - http://memory-alpha.wikia.com/wiki/The_Chase_(episode)
[1] - https://en.wikipedia.org/wiki/The_Chase_(Star_Trek:_The_Next...
Pretty cool if there were a SETI equivalent for this on our own DNA.
> The researchers' biggest worry was that DNA synthesis and sequencing made mistakes as often as 1 in every 100 nucleotides. This would render large-scale data storage hopelessly unreliable — unless they could find a workable error-correction scheme. Could they encode bits into base pairs in a way that would allow them to detect and undo the mistakes? “Within the course of an evening,” says Goldman, “we knew that you could.”
How does this work? Are the mistakes consistent enough that we can design encodings that account for them?
FEC (forward error correction) basically adds extra information that can be used to detect and fix errors. A simple scheme is to just duplicate all the info (like a backup), but there are much cleverer schemes that are far more efficient.
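Here's a minimal sketch of the simplest scheme mentioned above (repeat everything and take a majority vote on readback); real codes such as Reed-Solomon or fountain codes give similar protection with far less overhead:

    # Triple-repetition code over bases with majority-vote decoding.
    from collections import Counter

    def encode_repeat(bases: str, copies: int = 3) -> str:
        return "".join(b * copies for b in bases)

    def decode_repeat(encoded: str, copies: int = 3) -> str:
        return "".join(Counter(encoded[i:i + copies]).most_common(1)[0][0]
                       for i in range(0, len(encoded), copies))

    msg = "ACGTACGT"
    noisy = list(encode_repeat(msg))
    noisy[4] = "T"                                 # simulate one bad base
    assert decode_repeat("".join(noisy)) == msg    # majority vote recovers it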
Check the "making memories" infographic midway down the article. One part is that they transcribe the data four times, another is that they only use three of four bases at a certain point. There's probably more checksums though.
Even if you can make it work, DNA stability is poor.
I don't see why you wouldn't use a higher fidelity atomic storage solution.
DNA stability is quite high, to the point where there is actually a movement to get scientists to stop freezing DNA for long-term storage, because it uses large amounts of energy for no reason.
Well, freezing/thawing creates shear forces that destroy the DNA, so I think the reason is different.
You are seriously mistaken. At ambient temperature, DNA is very stable. Even in an environment as active as the human body it stays reasonably intact for about 100 years, and isolated it lasts multiple millennia. That is the reason we can sequence Neanderthals etc. nowadays. It won't be stable for millions of years, but with some redundancy you could easily make it to 100,000.
I'm not sure this is correct. Maybe if you completely isolate it from radiation. The human body is constantly repairing DNA damage.
DNA repair mostly deals with replication errors and damage from normal biological processes, not radiation damage.
Isolated DNA is pretty stable.
Thanks
Do we need to keep the DNA away from bacteria? Would it not be digested for nutrients or food? Or is that just propaganda from the salesman pushing memory carbon for my looongterm data storage needs?
DNase contamination would be a big problem too. It would also be a relatively easy way to "securely" erase your data.
And what about additional layers of redundancy?
DNA is also compressed in a very spectacular way. I wonder if similar compression could be applied to data.
http://thenextweb.com/insider/2016/04/28/microsoft-turning-d...
Microsoft has already started working toward this.