Interviews with scientists:
How ‘big data’ is transforming biology
Teaching Notes
Introduction and context
The human genome project. High-throughput genotyping. Big data’s in the news – but what are
the implications for the world of biology? In this video, expert Professor David Salt of the
University of Aberdeen talks about his views.
Teacher summary
The ability to collect data at a very high rate allows very large data sets to be generated. These
very large data sets are called ‘big data’. Big data provides researchers with new opportunities to
develop new tools to mine data sets that were not previously available. These very large data
sets must be mined with a clear hypothesis which will be tested against the data set. This means
that rather than generating an experiment to test the hypothesis, researchers can go straight to
the big data to test the hypothesis.
There are many types of data sets that are described as ‘big data’. DNA sequencing produces
big data sets. The genome of Arabidopsis thaliana was sequenced in 2000. The human genome
and many other genomes have been sequenced since then at an ever-increasing rate, thanks to
new technologies such as next-generation sequencing, which can now sequence the Arabidopsis
thaliana genome in less than a day.
There are many other types of instrumentation that will generate large data sets, for example
proteomic data sets generated by advances in mass spectrometry, metabolomic data sets, transcript
data sets and ionomic data sets. As instrumentation develops, more measurements can be made in more
samples, more quickly and also more cheaply.
If a hypothesis is to be tested against a large data set, the data set has to be manipulated using a
computer programme that won’t be overwhelmed by the large quantity of data. These aren’t
complicated computer programmes and researchers who can write their own computer
programmes can increase the efficiency with which they can mine specific data sets.
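As a rough illustration of the kind of short programme meant here, the Python sketch below reads a data set one row at a time, so the whole file never has to fit in memory, and compares the mean of a measurement between groups. The column names (line, leaf_sodium), the function name group_means and the sample values are all invented for this example; they do not come from the interview or from any real data set.

```python
import csv
import io
from collections import defaultdict


def group_means(lines, group_col, value_col):
    """Stream rows one at a time and return the mean of value_col per group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    # csv.DictReader pulls one row at a time from the underlying file or
    # iterable, so only the running totals are held in memory.
    for row in csv.DictReader(lines):
        totals[row[group_col]] += float(row[value_col])
        counts[row[group_col]] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Hypothetical ionomics-style data: leaf sodium for two made-up plant lines.
sample = io.StringIO(
    "line,leaf_sodium\n"
    "A,10.0\n"
    "B,30.0\n"
    "A,14.0\n"
    "B,34.0\n"
)
means = group_means(sample, "line", "leaf_sodium")
print(means)
```

With a real file, the io.StringIO object would be replaced by an open file handle. A hypothesis such as "line B accumulates more sodium than line A" could then be checked against the group means, followed by a proper statistical test.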
Plant science has made tremendous strides using genetic model systems such as Arabidopsis
thaliana, where lab-based research has allowed genes to be deleted in order to find out what
those genes do. However, what is now being realised is that what you learn about gene function
by deleting genes in the lab isn’t necessarily the same as what those genes might do in the
environment. More particularly, when you study one particular genetic model, then what you are
looking at is a small snap-shot of the genetic variation that is actually present within that species,
so you are missing a huge amount of information.
One of the big opportunities now is actually translating what we’ve started to learn, using
molecular genetics tools (to generate big data), into the ecological function of genes. Professor
Salt thinks that once we begin to understand the ecological function of genes, that information
can help us to decide how to adapt our crops to particular ecological situations like high salinity,
drought etc. A fusion of ecological and genomic research is going to become more important as
we are facing a changing climate and the current crops that we have are not going to be adapted
to the early frosts, late rains or high winds that a changing climate may bring.
Science & Plants for Schools: www.saps.org.uk
Interviews with scientists – wheat genome and yield: p. 1
Questions
1. What is ‘big data’?
Very large data sets.
2. What is ‘data mining’?
Sorting through large data sets to identify patterns and establish relationships.
3. What is next generation sequencing?
Next-generation sequencing refers to non-Sanger-based high-throughput DNA
sequencing technologies. Millions or billions of DNA strands can be sequenced in
parallel, yielding substantially more throughput and minimizing the need for the
fragment-cloning methods that are often used in Sanger sequencing of genomes. (Definition from:
http://www.nature.com/subjects/next-generation-sequencing).
4. Define the term proteomics.
Proteomics is the large-scale study of proteomes. A proteome is a set of proteins
produced in an organism, system, or biological context. (From:
https://www.ebi.ac.uk/training/online/course/proteomics-introduction-ebi-resources/whatproteomics)
5. Define the term metabolomics.
Metabolomics is the large-scale study of small molecules, commonly known as
metabolites, within cells, biofluids, tissues or organisms. (From:
https://www.ebi.ac.uk/training/online/course/introduction-metabolomics/whatmetabolomics)
6. Define the term ionomics.
Ionomics, the study of the ionome, involves the quantitative and simultaneous
measurement of the elemental composition of living organisms and changes in this
composition in response to physiological stimuli, developmental state, and genetic
modifications. (From: https://www.ncbi.nlm.nih.gov/pubmed/18251712)
7. What information is found in a transcript data set?
Information about all the expressed genes in an organism.
8. Why should researchers in the biological sciences “know about computing”?
So that they can write computer programmes that will allow them to manipulate big data
sets when testing their hypotheses against the data.
9. Why might lab-based studies into the function of a particular plant gene be unreliable
models for the role of that gene in the environment?
In the lab, the population of plants that you are experimenting with may represent a small
snap-shot of the genetic variation that is actually present within that species, so you are
missing a huge amount of information.
10. Why does Professor Salt think that a fusion of ecological and genomic research is going
to become important in the future?
So that we can begin to understand the ecological function of genes, which can help us
to decide how to adapt our crops to particular ecological situations like high salinity,
drought etc. We are facing a changing climate and the current crops that we have are not
going to be adapted to the early frosts, late rains or high winds etc. that climate change
may bring.