FRONTIER RESEARCH
It’s in the genes – data storage turns to DNA
22 April 2013
KEY THEME: INNOVATION UNION
by Deborah Forsyth
Nick Goldman is a scientist at the European Bioinformatics Institute at Cambridge in the UK © EMBL
‘It’s in your genes!’ How often have you been reminded by friends or relatives that you look the
way you do because of the genetic code stored in your DNA? But next time you hear this
expression used, you might stop to wonder what else could be stored in those genes.
According to the latest research to come out of the Cambridge-based European Bioinformatics Institute
(EBI), DNA can do more than store genetic information: it also has the potential to store massive
volumes of man-made data.
The research is now receiving EU funding, which could help refine the technique so that it can be
scaled up to store all of the data that exists on Earth – estimated at three zettabytes, or 3 000 billion
billion bytes. For those who don’t think in ‘bytes’, that is roughly equivalent to a pile of 750 billion
DVDs.
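A quick check of the DVD comparison: dividing three zettabytes by 750 billion discs gives about 4 GB per disc, close to the 4.7 GB capacity of a standard single-layer DVD, so the figures are consistent.

```latex
3\ \text{ZB} = 3 \times 10^{21}\ \text{bytes},
\qquad
\frac{3 \times 10^{21}\ \text{bytes}}{7.5 \times 10^{11}\ \text{DVDs}}
  = 4 \times 10^{9}\ \text{bytes per DVD} \approx 4\ \text{GB}
```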
In the future, a cup of DNA could store 100 million hours of video.
Storing information in a minuscule form that cuts down on space and does away with the need for
energy-guzzling and costly hard disks would be a timely innovation in the digital age. As more and
more data is generated, the need for economical and durable forms of data storage also rises.
It was this pressing issue that prompted the key authors of the EBI research project, Nick Goldman
and EBI Associate Director Ewan Birney, to act.
‘At the Institute, we share biological data with other scientists to improve their insights into life,’ said
Goldman. ‘We add value to it and send it back into the research community via the Internet. But we
realised that, as the volume of biological data we receive grows exponentially, our budget to handle and
store it does not. Disks are expensive. We needed to find a way of storing large volumes of data in a
small space, cheaply – and ensure that it could be retrieved efficiently.’
The pair hit upon their approach to resolving the problem three years ago. ‘Ewan and I were chatting
one evening after a work conference in Hamburg. We were joking about, thrashing out ideas for
alternative data storage methods,’ said Goldman. ‘And then, after we’d batted a few ideas back and
forth, we just turned to each other and said, “How about using DNA?”’
Much of the funding for such research at the non-profit EBI comes from the European Union, under the
Directorate-General Research & Innovation’s Sixth and Seventh Framework Programmes. In 2012, the
Institute received EUR 7.3 million from the European Commission.
Before they started, Goldman and Birney put together a project research team at the EBI, which forms
part of the EU-wide European Molecular Biology Laboratory (EMBL). They also enlisted another partner –
Agilent Technologies, a California-based biomedical technologies company with expertise in writing
DNA – to complete the research network. ‘Agilent saw it as a challenge and a fun piece of research,’
says Goldman. ‘They provided the required DNA samples to us for free.’
Shall I compare thee to a DNA?
‘We already know that DNA is a robust way to store information because we can extract it from bones
of woolly mammoths, which date back tens of thousands of years, and still make sense of it. It is also
incredibly small, dense and does not need any power for storage, so shipping and keeping it is easy,’
Goldman said.
The experiment to see whether they could actually use DNA to store information took place in three
stages:
1. First up were the EBI team. ‘Our role was to invent a DNA code into which digital information could
be translated,’ said Goldman.
Typically, a file on a computer hard disk is stored in binary code, made up of zeros and ones. The
computer ‘knows’ the rules of the code and translates the information it reads accordingly. It was up to
the EBI team to rewrite that binary code as a DNA sequence, held in a computer file.
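To see why a plain translation of binary into DNA is not enough, here is a minimal Python sketch that maps each pair of bits to one base (00 to A, 01 to C, 10 to G, 11 to T). That pairing is an arbitrary illustration, not the EBI code, and the function names are hypothetical. A byte of all zeros comes out as a run of four identical letters, and long runs of a single base are known to be hard to synthesise and sequence accurately.

```python
# Deliberately naive sketch: two bits per base.
# The pairing below is an arbitrary choice for illustration, not the EBI code.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}

def naive_encode(data: bytes) -> str:
    """Translate bytes into DNA letters at two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def longest_run(seq: str) -> int:
    """Length of the longest stretch of one repeated letter."""
    best = run = 1 if seq else 0
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

sequence = naive_encode(b"\x00\xff")
print(sequence)               # AAAATTTT
print(longest_run(sequence))  # 4
```

This packs data densely, but nothing stops the same letter from repeating, which is exactly the weakness a smarter code has to remove.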
The coding system of DNA – or deoxyribonucleic acid – is built on four nitrogenous bases, identified
by the letters A (adenine), C (cytosine), G (guanine) and T (thymine). The trick was to write a DNA
sequence in which the same letter never appeared twice in a row. Another way of reducing the risk of
errors was to write only short strings of DNA.
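A code that never writes the same letter twice in a row can be built by working in base 3: because each new base only has to differ from the one before it, there are always three letters to choose from, so each position can carry one ternary digit. The sketch below is a simplified illustration, not the EBI implementation. It assumes a fixed six ternary digits per byte (3^6 = 729, enough for all 256 byte values), whereas the published scheme first compressed bytes with a Huffman code; the order in which the three candidate letters are listed, the starting letter "A" and the helper names are all hypothetical choices.

```python
BASES = "ACGT"

def byte_to_trits(byte: int) -> list[int]:
    """Write one byte as six base-3 digits, most significant first."""
    trits = []
    for _ in range(6):
        byte, digit = divmod(byte, 3)
        trits.append(digit)
    return trits[::-1]

def trits_to_dna(trits: list[int], previous: str = "A") -> str:
    """Choose each base from the three letters that differ from the previous one.

    `previous` is an assumed starting context, not part of any published scheme.
    """
    out = []
    for t in trits:
        choices = [b for b in BASES if b != previous]  # always exactly three letters
        previous = choices[t]
        out.append(previous)
    return "".join(out)

def encode(data: bytes) -> str:
    """Hypothetical helper: bytes -> base-3 digits -> DNA with no repeated neighbours."""
    trits = [t for byte in data for t in byte_to_trits(byte)]
    return trits_to_dna(trits)

dna = encode(b"\x00\xff")
print(dna)                                        # CACACAGACGCA
print(all(a != b for a, b in zip(dna, dna[1:])))  # True
```

The price of the no-repeat rule is density: each base carries one ternary digit (about 1.58 bits) instead of two bits.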
‘We figured, let’s break up the code into lots of overlapping fragments going in both directions, with
indexing information showing where each fragment belongs in the overall code, and make a coding
scheme that doesn’t allow repeats. That way, you would have to have the same error on four different
fragments for it to fail – and that would be very rare,’ Birney said.
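Birney’s description of overlapping fragments read in both directions can be sketched as follows. The fragment length of 100 bases and the offset of 25 bases are assumptions chosen so that every interior base lands in four fragments, matching the ‘four different fragments’ in the quote. Every second fragment is stored as its reverse complement, that is, as the opposite strand. Here the index is kept as a plain integer next to each fragment, whereas a real design has to write it into the DNA itself; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """The same stretch read from the opposite strand: swap A/T and C/G, then reverse."""
    return seq.translate(COMPLEMENT)[::-1]

def fragment(seq: str, length: int = 100, step: int = 25) -> list[tuple[int, str]]:
    """Cut seq into overlapping (index, piece) pairs; odd-numbered pieces are flipped."""
    if len(seq) < length or (len(seq) - length) % step != 0:
        raise ValueError("pad the sequence so the windows tile it exactly")
    pieces = []
    for index, start in enumerate(range(0, len(seq) - length + 1, step)):
        piece = seq[start:start + length]
        if index % 2 == 1:
            piece = reverse_complement(piece)  # alternate pieces go on the other strand
        pieces.append((index, piece))
    return pieces

def coverage(seq_len: int, length: int = 100, step: int = 25) -> list[int]:
    """Count how many pieces contain each position of the original sequence."""
    counts = [0] * seq_len
    for start in range(0, seq_len - length + 1, step):
        for i in range(start, start + length):
            counts[i] += 1
    return counts

seq = "ACGT" * 50               # 200 bases, a stand-in for an encoded file
print(len(fragment(seq)))       # 5
print(max(coverage(len(seq))))  # 4
```

In this simple version, bases near the two ends of the sequence are covered fewer than four times; only the interior enjoys the full fourfold redundancy.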
2. Once they had their DNA sequence design in place, they used it to encode an MP3 clip of Martin
Luther King’s famous ‘I Have a Dream’ speech, a photo of the EMBL-EBI lab, an image of the famous
DNA double helix structure as identified by James Watson and Francis Crick in 1953, and a text file of
all 154 of Shakespeare’s sonnets.
The encoded computer files were sent to Dr Emily Leproust of Agilent Technologies in California. ‘We
downloaded the files from the web and used them to synthesise hundreds of thousands of pieces of
DNA. The result looks like a tiny speck of dust,’ Leproust said.
During the synthesis process, Agilent manufactured DNA that matched the sequence sent to them by
the EBI. Using technology that works a bit like an inkjet printer, they fired the encoded DNA in the
form of minuscule droplets onto a glass microscope slide. The fluid was then freeze-dried, and the
resulting speck of dust, containing 739 kilobytes of data, was flown back to Cambridge.
3. Reconstituted in water, the substance was shipped on to the EMBL’s Heidelberg office in Germany,
where it was read back by sequencing machinery and the digital information reconstructed with 100
percent accuracy, the researchers said.
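Reading the data back reverses the encoding. The Python sketch below re-declares the same kind of simplified base-3 code (six digits per byte, an arbitrary letter ordering, hypothetical function names) so that it runs on its own, and adds a per-position majority vote across several reads of the same stretch of sequence. That vote is one simple way to let correct copies outvote an error on a single copy; it assumes the reads have already been put in order and turned back to the same strand using their indexes, a step the sketch does not show.

```python
from collections import Counter

BASES = "ACGT"

def encode(data: bytes, previous: str = "A") -> str:
    """Six base-3 digits per byte; each digit picks a letter different from the last."""
    out = []
    for byte in data:
        digits = []
        for _ in range(6):
            byte, d = divmod(byte, 3)
            digits.append(d)
        for d in reversed(digits):  # most significant digit first
            previous = [b for b in BASES if b != previous][d]
            out.append(previous)
    return "".join(out)

def decode(seq: str, previous: str = "A") -> bytes:
    """Turn letters back into base-3 digits, then rebuild one byte from every six digits."""
    digits = []
    for base in seq:
        choices = [b for b in BASES if b != previous]
        digits.append(choices.index(base))  # raises ValueError on a repeated letter
        previous = base
    out = bytearray()
    for i in range(0, len(digits), 6):
        value = 0
        for d in digits[i:i + 6]:
            value = value * 3 + d
        out.append(value)  # raises ValueError if value > 255, a sign of corruption
    return bytes(out)

def consensus(reads: list[str]) -> str:
    """Majority letter at each position across equal-length, aligned reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))

original = encode(b"sonnets")
damaged = original[:3] + ("A" if original[3] != "A" else "C") + original[4:]
reads = [original, damaged, original, original]  # one of four copies has an error
print(decode(consensus(reads)))  # b'sonnets'
```

Decoding also acts as a check: a letter that repeats its neighbour, or six digits that add up to more than 255, cannot come from a valid encoding, so the sketch raises an error rather than returning wrong bytes.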
The EBI exists in large part thanks to funds received from the EMBL’s 20 member states, but Goldman
sees EU funding as playing a vital indirect role in expanding its work. ‘In this research project, for
example, we really benefited from being able to call on team members whose skills had been honed on
schemes funded by the EU and who could assist in data analysis and data modelling. Sometimes, of
course, the EBI gains essential hardware through funding, but here it was the EU’s “investment in
people” that counted for us.’
More of a long-term thing
So, do the results of their research mean the end of the hard disk? Not quite yet. At the moment, the
team sees its main application as storing information that needs to be archived for a long period of time
and accessed on an infrequent basis.
‘From a cost point of view, DNA data storage really comes into its own over the long term,’ says
Goldman. ‘The one-off cost for DNA sequencing is still very high. But once that expenditure has been
made, it becomes a very cheap way of archiving information. With DNA, maintenance costs are
minimal, as the cost of endlessly transferring information from one outdated medium to another – such
as videotape to CD – can be dispensed with. It costs virtually nothing to store and, unlike videotape,
which degrades rapidly with time, lasts thousands of years.’
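Goldman’s cost argument can be written as a simple comparison. The symbols are hypothetical and do not come from the article: C_DNA is the one-off cost of writing (and eventually reading) an archive in DNA, C_0 the cost of first storing it on a conventional medium, C_m the cost of each migration to a newer medium, and T the number of years between migrations. Counting only migration costs for the conventional archive, and treating DNA storage as costing virtually nothing, as Goldman describes, the conventional archive becomes the more expensive option over a horizon of t years once:

```latex
C_0 + \left\lfloor \frac{t}{T} \right\rfloor C_m > C_{\text{DNA}}
\quad\Longleftrightarrow\quad
\left\lfloor \frac{t}{T} \right\rfloor > \frac{C_{\text{DNA}} - C_0}{C_m}
```

Because the left-hand side keeps growing with t while C_DNA is paid only once, a long enough archiving horizon always favours DNA, even when its up-front cost is much higher.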
People will start using DNA to store data within the next 50 years, Goldman believes, as the cost of
DNA sequencing goes down.
‘Right now, I could see it as providing an excellent way of storing data that is now held on magnetic
tapes – it’s not impossible to imagine that those vast dusty archives of tapes, whose corridors are
currently patrolled by data retrieval robots, could be done away with once and for all with our method.’
More info
European Bioinformatics Institute
European Molecular Biology Laboratory
Agilent Technologies