Interviews with scientists: How ‘big data’ is transforming biology

Teaching Notes

Introduction and context

The human genome project. High-throughput genotyping. Big data is in the news, but what are the implications for the world of biology? In this video, Professor David Salt of the University of Aberdeen gives his views.

Teacher summary

The ability to collect data at a very high rate allows very large data sets to be generated. These very large data sets are called ‘big data’. Big data gives researchers the opportunity to develop new tools for mining data sets that were not previously available. A very large data set must be mined with a clear hypothesis in mind, which is then tested against the data. This means that, rather than designing a new experiment to test the hypothesis, researchers can go straight to the big data.

Many types of data set are described as ‘big data’. DNA sequencing produces big data sets. The genome of Arabidopsis thaliana was sequenced in 2000. The human genome and many other genomes have been sequenced since then at an ever-increasing rate, thanks to new technologies such as next-generation sequencing, which can now sequence the Arabidopsis thaliana genome in less than a day. Many other types of instrumentation also generate large data sets, for example proteomic data sets generated by advances in mass spectrometry, metabolomic data sets, transcript data sets and ionomic data sets. As instrumentation develops, more measurements can be made in more samples, more quickly and more cheaply.

If a hypothesis is to be tested against a large data set, the data set has to be manipulated using a computer programme that won’t be overwhelmed by the large quantity of data. These programmes need not be complicated, and researchers who can write their own can mine specific data sets more efficiently.
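
For teachers who want to show what such a programme might look like, here is a minimal Python sketch. It is an illustration only and does not come from the video: the hypothesis (plants from coastal sites carry more sodium in their leaves than plants from inland sites), the column names and every number in the sample table are invented. The point it tries to make is that the programme reads one row at a time and keeps only a running count and total for each group, so it would cope with a file far too large to hold in memory.

```python
import csv
import io
from collections import defaultdict

# Invented sample data, for illustration only. In practice this would be
# a very large file on disk rather than a short string.
SAMPLE_CSV = """accession,habitat,leaf_sodium
A1,coastal,410
A2,inland,250
A3,coastal,380
A4,inland,300
A5,coastal,450
A6,inland,270
"""


def mean_by_group(rows, group_col, value_col):
    """Read the rows once, keeping only a count and a total per group."""
    count = defaultdict(int)
    total = defaultdict(float)
    for row in rows:
        group = row[group_col]
        count[group] += 1
        total[group] += float(row[value_col])
    return {group: total[group] / count[group] for group in count}


# csv.DictReader yields one row at a time, so the whole table is never
# loaded into memory at once.
reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
means = mean_by_group(reader, "habitat", "leaf_sodium")

# Hypothetical hypothesis: coastal plants have higher mean leaf sodium.
hypothesis_supported = means["coastal"] > means["inland"]
print(means, hypothesis_supported)
```

Comparing two averages is only a first look. A real analysis would follow it with a statistical test to check whether the difference could have arisen by chance.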

Plant science has made tremendous strides using genetic model systems such as Arabidopsis thaliana, where lab-based research has allowed genes to be deleted in order to find out what those genes do. However, it is now being realised that what you learn about gene function by deleting genes in the lab isn’t necessarily the same as what those genes do in the environment. More particularly, when you study one particular genetic model, you are looking at a small snapshot of the genetic variation that is actually present within that species, so you are missing a huge amount of information. One of the big opportunities now is translating what we’ve started to learn using molecular genetics tools (which generate big data) into an understanding of the ecological function of genes.

Professor Salt thinks that once we begin to understand the ecological function of genes, that information can help us to decide how to adapt our crops to particular ecological situations such as high salinity or drought. A fusion of ecological and genomic research is going to become more important because we are facing a changing climate, and our current crops are not going to be adapted to the early frosts, late rains or high winds that a changing climate may bring.

Science & Plants for Schools: www.saps.org.uk Interviews with scientists – wheat genome and yield: p. 1

Questions

1. What is ‘big data’?
Very large data sets.

2. What is ‘data mining’?
Sorting through large data sets to identify patterns and establish relationships.

3. What is next-generation sequencing?
Next-generation sequencing refers to non-Sanger-based high-throughput DNA sequencing technologies. Millions or billions of DNA strands can be sequenced in parallel, yielding substantially more throughput and minimizing the need for the fragment-cloning methods that are often used in Sanger sequencing of genomes. (Definition from: http://www.nature.com/subjects/next-generation-sequencing)

4. Define the term proteomics.
Proteomics is the large-scale study of proteomes. A proteome is a set of proteins produced in an organism, system, or biological context. (From: https://www.ebi.ac.uk/training/online/course/proteomics-introduction-ebi-resources/whatproteomics)

5. Define the term metabolomics.
Metabolomics is the large-scale study of small molecules, commonly known as metabolites, within cells, biofluids, tissues or organisms. (From: https://www.ebi.ac.uk/training/online/course/introduction-metabolomics/whatmetabolomics)

6. Define the term ionomics.
Ionomics, the study of the ionome, involves the quantitative and simultaneous measurement of the elemental composition of living organisms and changes in this composition in response to physiological stimuli, developmental state, and genetic modifications. (From: https://www.ncbi.nlm.nih.gov/pubmed/18251712)

7. What information is found in a transcript data set?
Information about all the expressed genes in an organism.

8. Why should researchers in the biological sciences “know about computing”?
So that they can write computer programmes that will allow them to manipulate big data sets when testing their hypotheses against the data.

9. Why might lab-based studies into the function of a particular plant gene be unreliable models for the role of that gene in the environment?
In the lab, the population of plants that you are experimenting with may represent only a small snapshot of the genetic variation that is actually present within that species, so you are missing a huge amount of information.

10. Why does Professor Salt think that a fusion of ecological and genomic research is going to become important in the future?
So that we can begin to understand the ecological function of genes, which can help us to decide how to adapt our crops to particular ecological situations such as high salinity or drought. We are facing a changing climate, and our current crops are not going to be adapted to the early frosts, late rains or high winds that climate change may bring.
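
As an optional classroom extension to question 2, the short Python sketch below shows one very simple way of "establishing a relationship" between two sets of measurements: calculating a Pearson correlation coefficient. It is an illustration only. The two lists of values are invented, and the variable names (soil_salinity, leaf_sodium) are hypothetical rather than taken from any real data set or from the video.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of the deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each list.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented measurements for five imaginary sampling sites.
soil_salinity = [1.0, 2.0, 3.0, 4.0, 5.0]
leaf_sodium = [210.0, 260.0, 330.0, 390.0, 460.0]

r = pearson(soil_salinity, leaf_sodium)
print(f"correlation coefficient r = {r:.3f}")
```

A value of r close to +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or none. Students can be reminded that a correlation on its own does not show that one variable causes the other.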