iPlant Pods - iPlant Collaborative
iPlant Collaborative: Powering a New Biology
David Micklos, Cold Spring Harbor Laboratory

"My own suspicion is that the universe is not only queerer than we suppose, but queerer than we can suppose."
J.B.S. Haldane, Possible Worlds and Other Essays (1927)

The Egalitarian Gene
• 1958: Matt Meselson and the ultracentrifuge, $500,000
• 1973: Sharp, Sambrook, and Sugden, agarose gel electrophoresis chamber, $250

The Egalitarian Genome
Next Generation Sequencing, 2005: from bacterial colonies to hundreds of millions of PCR colonies (clusters, features)
[Figure: Su, Andrew (2013): Cumulative sequenced genomes. figshare. http://dx.doi.org/10.6084/m9.figshare.722952]
[Figure: PacBio sequencer, 14x coverage of rice IR6; axis: read length (nucleotides)]

The Egalitarian Genome: Next Generation Sequencing, 2014
• 2003, ABI 3730 sequencer: human genome, $2.7 billion, 13 years
• 2014, Oxford Nanopore MinION: human genome, $900, 6 hours

The Big Data Problem: Storage and Analysis
"BGI, based in China, is the world's largest genomics research institute, with 167 DNA sequencers producing the equivalent of 2,000 human genomes a day. BGI churns out so much data that it often cannot transmit its results to clients or collaborators over the Internet or other communications lines because that would take weeks. Instead, it sends computer disks containing the data, via FedEx."

Biology's Other Big Data
• Phenomics
• Visualization

"The useful lifetime of our analysis toolchains is now 6 months." Matthew Trunnell, Broad Institute
• This requires a platform that can support diverse and constantly evolving needs.
• Cyberinfrastructure is the platform for a biological "App Store" that allows scientists to run the tools and workflows they need.
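The BGI quote turns on simple arithmetic: data volume divided by link bandwidth. The minimal sketch below estimates how long one day of that output would take to transmit. Every input is an assumption, not a reported figure: a 3.1 Gb human genome, 30x mean coverage, roughly 2 bytes per base (one for the base call, one for its quality score), and a 1 Gbit/s link running flat out. The function names are hypothetical.

```python
"""Back-of-envelope estimate of sequencing data volume and network transfer time.

All inputs are illustrative assumptions, not figures reported by BGI.
"""

HUMAN_GENOME_BASES = 3.1e9  # assumed human genome size, in bases


def bases_sequenced(genome_size: float, coverage: float) -> float:
    """Total bases needed for a mean coverage C.

    Coverage is C = N * L / G (N reads of length L over a genome of size G),
    so the total bases sequenced, N * L, equals C * G.
    """
    return coverage * genome_size


def daily_bytes(genomes_per_day: int, coverage: float,
                bytes_per_base: float = 2.0,
                genome_size: float = HUMAN_GENOME_BASES) -> float:
    """Uncompressed bytes per day, assuming ~1 byte per base call plus ~1 byte per quality score."""
    return genomes_per_day * bases_sequenced(genome_size, coverage) * bytes_per_base


def transfer_days(n_bytes: float, link_gbit_per_s: float) -> float:
    """Days needed to send n_bytes over a link running at full speed (no protocol overhead)."""
    bytes_per_second = link_gbit_per_s * 1e9 / 8
    return n_bytes / bytes_per_second / 86_400


if __name__ == "__main__":
    # Hypothetical inputs: 2,000 genomes/day (from the quote), 30x coverage, a 1 Gbit/s link.
    volume = daily_bytes(genomes_per_day=2_000, coverage=30)
    print(f"One day of output: {volume / 1e12:.0f} TB")
    print(f"Time to send it over 1 Gbit/s: {transfer_days(volume, 1.0):.1f} days")
```

Under these assumptions, one day of output is about 372 TB and would take roughly 34 days to send. That is consistent with the quote's "weeks". Compression, faster links, or lower coverage shrink the estimate proportionally.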
Paradigm Shift
• Data-limited world → data-unlimited world
• Hypotheses underdetermined by data → data underdetermined by hypotheses
• Reductive biology → constructive biology

iPlant Collaborative
A 10-year NSF project to develop a computing cyberinfrastructure that applies computational thinking to biological problems:
• Virtual organization
• High-performance computing
• Data and data analysis
• Learning and workforce

iPlant Collaborative: A Virtual Organization
Cold Spring Harbor Laboratory (CSHL), University of Arizona (UA), Texas Advanced Computing Center (TACC)

High-Performance Computing: Texas Advanced Computing Center (TACC)
• Two of the three largest parallel computers in the XSEDE (formerly TeraGrid) system
• 500,000 compute cores
• New Intel MIC (Many Integrated Core) chips contain 61 cores
• Up to 1 TB of shared memory
Dan Stanzione, Acting Director

iPlant Collaborative: Ways to Access iPlant
• iData Store: all data, large and small
• Atmosphere: virtual hosting of web apps, sites, and databases
• Discovery Environment: integrated web apps
• MyPlant: social networking
• DNA Subway: annotation and more
• Standalone apps: TNRS, TreeViewer, PhytoBisque, etc.
• The API: for programmers embedding iPlant capabilities
• Command line: for experts (through TeraGrid/XSEDE)

Data Store: Texas Advanced Computing Center
Dan Stanzione: "We hit a billion files about a year ago, so when people ask us what we're going to do about a billion files. The answer is we're going to do this."
• 100,000 terabytes of disk and tape
• The Data Store moves files larger than 2 GB with ease

Atmosphere: Cloud Computing for Biology
• Handles big data
• Analogous to Amazon Elastic Compute Cloud (EC2)
• The default virtual machine (VM) has 6 CPUs and 16 GB of RAM, compared with 1-2 CPUs and 1-4 GB of RAM on a typical desktop or laptop
• VMs with up to 16 CPUs and 32 GB of RAM can be assigned on request
• Co-locate computation with your data in the iPlant Data Store
• Configure a machine and its data transformations to share with collaborators or as a use case for students
Discovery Environment: A Rich Web Client
• Free app store for bio research
• Consistent interface to a range of bioinformatics tools
• Integrated, extensible system of applications and services
• Add tools and build custom workflows
Other major projects are beginning to adopt the iPlant cyberinfrastructure (CI) as their underlying infrastructure, some completely and some in limited ways:
• CoGe (authentication service, hosting)
• BioExtract (web service platform)
• CIPRES (computation)
• Gates Integrated Breeding Platform (hosting, development)
• Galaxy (storage, for now)
[Figure: The Biology App Store; labels: iPlant APIs, Resources]

iPlant Audiences: Converge on the Middle Ground
• Expert: bioinformaticians and computational biologists
• Intermediate: bright biological researchers who need to solve problems but who aren't bioinformaticians and don't know one down the hall
• Novice: high school and college faculty engaged primarily in teaching

Educational Challenge
For the first time in the history of biology, students can work with the same data, at the same time, and with the same tools as research scientists.
[Figure: research and education in the context of scientific discovery]

Insights from Genomics in Education
Washington University, June 16-19, 2009: 44 participants from three worlds and three kingdoms
• Bioinformatics: Students have limited patience for pure computer work and want a wet-bench hook.
• Student-scientist partnerships: Someone has to care about the data generated by students.
• Students as co-investigators: Projects should have the potential to lead to publication.
• Scale: We need to move from individual classroom experiments to distributed projects.
Walk or… ride an educational Discovery Environment

DNA Subway: An Agnostic Education and Research Tool
• Simplified bioinformatics workflows
• Developed with 25 collaborators at 11 institutions
• Since the March 2010 launch: 7,510 registered users
• Red Line: predict and annotate genes in sequences of <150 kb (2,670 projects in the last six months)
• Yellow Line: identify homologs in sequenced genomes
• Blue Line: analyze DNA barcodes and build gene trees (7,700 projects in the last six months)
• Green Line: align and analyze RNA-seq data (beta)

DNA Subway: An Educational Discovery Environment
• Developed in parallel with the Discovery Environment
• Simplified workflow for gene discovery, annotation, and comparison
• 25 collaborators at 11 institutions
• Since the March 2010 launch: 4,218 registered users; 69,660 visits, 31,587 unique visits

DNA Subway Concepts (Big Ideas)
• Genomes are complex and dynamic ("queer").
• DNA sequence is information.
• DNA sequence is biological identity.
• Gene annotation adds meaning to DNA sequence.
• The concept of the gene continues to evolve.
• A genome is more than genes.

DNA Subway
Producers: Uwe Hilgert, David Micklos, Jason Williams
Designers: Eun-Sook Jeong, Susan Lauter
Programmers: Cornel Ghiban, Mohammed Khalfan, Sheldon McKay
Contributors: Matt Vaughn, Rion Dooley, Anthony Biondo, Jim Burnette, Scott Cain, Ed Lee, Zhenyuan Lu
Advisors: Matt Conte, Carson Holt, Bruce Nash, Oscar Pineda-Catalan
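The Blue Line's barcode comparisons rest on measuring how much aligned sequences differ. The minimal sketch below computes the uncorrected p-distance: the fraction of compared sites that differ. It assumes the sequences are already aligned to equal length and skips gap or ambiguous ('N') sites pairwise. The function names and sample sequences are hypothetical, and this is not how DNA Subway itself is implemented.

```python
"""Uncorrected pairwise distance (p-distance) between aligned DNA barcode sequences.

A hypothetical illustration of barcode comparison; not DNA Subway's code.
"""
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of compared sites that differ, skipping gaps ('-') and unknown bases ('N')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # pairwise deletion: ignore any site with a gap or unknown base
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distances for every unordered pair of named, aligned sequences."""
    return {(x, y): p_distance(aligned[x], aligned[y])
            for x, y in combinations(sorted(aligned), 2)}


if __name__ == "__main__":
    # Toy, made-up sequences; real barcodes are hundreds of bases long.
    samples = {
        "sample_1": "ATGTCACCACAAACAGAG",
        "sample_2": "ATGTCACCACAAACGGAG",
        "sample_3": "ATGTCA-CACAATCGGAG",
    }
    for (x, y), d in distance_matrix(samples).items():
        print(f"{x} vs {y}: p = {d:.3f}")
```

A pairwise distance matrix like this is one possible input for distance-based tree building, such as neighbor joining. Corrected distances (for example, Jukes-Cantor) are usually preferred when sequences are more divergent.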