Download RNA-Seq - iPlant Pods

DNA Subway Green Line
Onramp to HPC in Biology Education
Dave Micklos and
Uwe Hilgert
iPlant Collaborative
DNA Learning Center,
Cold Spring Harbor
Laboratory; Bio5 Institute,
University of Arizona
…ride
an educational Discovery Environment
Green Line:
RNA Sequence (RNA-Seq) Analysis
• First fully GUI interface for RNA-Seq analysis — no
command line or data conversions
• Accesses XSEDE system through the iPlant Agave API
• Co-localizes up to 100 GB of data in iPlant Data Store
• Look for differential gene expression in different
tissues, life stages, or treatment
• Generate lists of expressed genes and fold-changes
• Annotate sequenced genomes; add results to Red
Line projects
RNA code
represents “active”
DNA in genome
150 feet
Homo sapiens bitter taste receptor
(TAS2R38) DNA code > RNA code
CCTTTCTGCACTGGGTGGCAACCAGGTCTTTAGATTAGCCAACTAGAGAAGAGAAGTAGAATAGCC
AATTAGAGAAGTGACATCATGTTGACTCTAACTCGCATCCGCACTGTGTCCTATGAAGTCAGGAGT
ACATTTCTGTTCATTTCAGTCCTGGAGTTTGCAGTGGGGTTTCTGACCAATGCCTTCGTTTTCTTG
GTGAATTTTTGGGATGTAGTGAAGAGGCAGGCACTGAGCAACAGTGATTGTGTGCTGCTGTGTCTC
AGCATCAGCCGGCTTTTCCTGCATGGACTGCTGTTCCTGAGTGCTATCCAGCTTACCCACTTCCAG
AAGTTGAGTGAACCACTGAACCACAGCTACCAAGCCATCATCATGCTATGGATGATTGCAAACCAA
GCCAACCTCTGGCTTGCTGCCTGCCTCAGCCTGCTTTACTGCTCCAAGCTCATCCGTTTCTCTCAC
ACCTTCCTGATCTGCTTGGCAAGCTGGGTCTCCAGGAAGATCTCCCAGATGCTCCTGGGTATTATT
CTTTGCTCCTGCATCTGCACTGTCCTCTGTGTTTGGTGCTTTTTTAGCAGACCTCACTTCACAGTC
ACAACTGTGCTATTCATGAATAACAATACAAGGCTCAACTGGCAGATTAAAGATCTCAATTTATTT
TATTCCTTTCTCTTCTGCTATCTGTGGTCTGTGCCTCCTTTCCTATTGTTTCTGGTTTCTTCTGGG
ATGCTGACTGTCTCCCTGGGAAGGCACATGAGGACAATGAAGGTCTATACCAGAAACTCTCGTGAC
CCCAGCCTGGAGGCCCACATTAAAGCCCTCAAGTCTCTTGTCTCCTTTTTCTGCTTCTTTGTGATA
TCATCCTGTGCTGCCTTCATCTCTGTGCCCCTACTGATTCTGTGGCGCGACAAAATAGGGGTGATG
GTTTGTGTTGGGATAATGGCAGCTTGTCCCTCTGGGCATGCAGCCATCCTGATCTCAGGCAATGCC
AAGTTGAGGAGAGCTGTGATGACCATTCTGCTCTGGGCTCAGAGCAGCCTGAAGGTAAGAGCCGAC
CACAAGGCAGATTCCCGGACACTGTGCTGAGAATGGACATGAAATGAGCTCTTCATTAATACGCCT
GTGAGTCTTCATAAATATGCC
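The "DNA code > RNA code" step above can be illustrated with a minimal Python sketch. It assumes the sequence shown is the coding (sense) strand, so the RNA reads the same with U in place of T. The function name `dna_to_rna` is hypothetical and only for illustration.

```python
def dna_to_rna(coding_strand: str) -> str:
    """Return the RNA sequence matching a DNA coding strand (T replaced by U)."""
    seq = coding_strand.upper()
    # Reject anything that is not a plain nucleotide code (N = unknown base).
    unexpected = set(seq) - set("ACGTN")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq.replace("T", "U")


# First 30 bases of the TAS2R38 sequence shown above.
fragment = "CCTTTCTGCACTGGGTGGCAACCAGGTCTT"
print(dna_to_rna(fragment))
```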
Differential Gene Expression
RNA Sequence (RNA-Seq) gives “snapshot” of genes
active in different cells at different times
RNA Sequence (RNA-Seq) Analysis
Design RNA-Seq experiment, i.e., differential expression
Isolate total RNA; convert to DNA library
Sequence experiment and control libraries
Analyze sequence data on DNA Subway Green Line
Follow-up experimental validation
Image source: http://www.bgisequence.com
1) Manage Data: Quality Assessment
with FastQC; ~100 Million 75/150
nucleotide reads in < 1hr
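To give a feel for what a quality assessment looks at, here is a minimal Python sketch that decodes the quality line of one FASTQ read into Phred scores and averages them. It assumes the common Phred+33 encoding; the function names are hypothetical, and FastQC itself reports many more metrics than this.

```python
def phred_scores(quality_line: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def mean_quality(quality_line: str, offset: int = 33) -> float:
    """Average Phred score of one read."""
    scores = phred_scores(quality_line, offset)
    if not scores:
        raise ValueError("empty quality string")
    return sum(scores) / len(scores)


# 'I' encodes Phred 40 and '5' encodes Phred 20 under Phred+33.
print(phred_scores("II55"))
print(mean_quality("II55"))
```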
2) FastX ToolKit: Quality Control with FastX
Toolkit; ~100M 75/150 nucleotide reads in
<1 hr (some took up to 19 hours…)
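A quality-control pass of this kind can be sketched in Python as a filter that keeps a read only if enough of its bases reach a minimum Phred score. The thresholds (20 and 80%) and all function names are assumptions chosen for illustration; the slide does not say which settings the Green Line uses.

```python
def read_fastq(lines):
    """Yield (header, sequence, quality) from an iterable of FASTQ lines.

    Assumes plain 4-line records; an incomplete trailing record is dropped.
    """
    stripped = (line.rstrip("\n") for line in lines)
    for header, seq, _plus, qual in zip(*[stripped] * 4):
        yield header, seq, qual


def passes_filter(qual, min_quality=20, min_percent=80.0, offset=33):
    """True if at least min_percent of bases have Phred >= min_quality."""
    if not qual:
        return False
    good = sum(1 for ch in qual if ord(ch) - offset >= min_quality)
    return 100.0 * good / len(qual) >= min_percent


# Two made-up reads: the second has mostly low-quality bases ('!' = Phred 0).
example = [
    "@read1", "ACGTA", "+", "IIIII",
    "@read2", "ACGTA", "+", "II!!!",
]
kept = [header for header, _seq, qual in read_fastq(example) if passes_filter(qual)]
print(kept)
```

Passing a file handle opened on a FASTQ file instead of the `example` list should work the same way, since the reader only needs an iterable of lines.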
3) TopHat: Aligns ~100 million 75/150
nucleotide (paired-end) reads to a reference
genome of 100M–5B bases in 6–19 hr
TopHat Alignment
JBrowse
4) CuffLinks: Assembles transcripts and
calculates abundance on BAM files,
1–12GB in 6–19hr
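Cufflinks reports transcript abundance in FPKM (fragments per kilobase of transcript per million mapped fragments). The Python sketch below shows only the basic arithmetic behind that unit; the real program uses a statistical model to assign fragments to transcripts, and the numbers here are made up for illustration.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical transcript: 500 fragments on a 2,000 bp transcript,
# in a library with 10 million mapped fragments.
print(fpkm(500, 2_000, 10_000_000))
```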
5) CuffDiff: Merges assemblies from Cufflinks
and performs differential expression
analysis on 4–9 samples in 6–19 hr
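The fold-changes reported by a differential expression analysis can be illustrated with a short Python sketch. It computes a log2 fold change between a control and an experimental abundance value and ranks genes by the size of the change. The gene names, values, and the pseudocount of 1 are assumptions for illustration; CuffDiff additionally tests whether each change is statistically significant, which this sketch does not attempt.

```python
import math


def log2_fold_change(control: float, treatment: float, pseudocount: float = 1.0) -> float:
    """log2 of treatment/control; the pseudocount avoids division by zero."""
    return math.log2((treatment + pseudocount) / (control + pseudocount))


# Hypothetical abundance values: gene -> (control, treatment).
abundance = {
    "gene_a": (3.0, 15.0),
    "gene_b": (7.0, 7.0),
    "gene_c": (63.0, 7.0),
}

changes = {gene: log2_fold_change(c, t) for gene, (c, t) in abundance.items()}

# Largest absolute change first, whether up- or down-regulated.
ranked = sorted(changes, key=lambda gene: abs(changes[gene]), reverse=True)
for gene in ranked:
    print(gene, round(changes[gene], 2))
```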
Green Line
Queue time vs Run time
• Asking for a long run time leads to longer queue times
• Asking for a short run time may lead to the job being
terminated
• Users don't like to wait too long
• Users want the results right away
• Finding the right balance is not easy
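One simple way to think about this trade-off is to request the expected run time plus a safety margin, capped at whatever the queue allows. The Python sketch below is only an illustration of that idea: the safety factor, the 48-hour queue limit, and the function name are all assumptions, not settings taken from DNA Subway or XSEDE.

```python
def requested_walltime_hours(expected_hours: float,
                             safety_factor: float = 1.5,
                             queue_limit_hours: float = 48.0) -> float:
    """Expected run time plus a margin, capped at a (hypothetical) queue limit."""
    if expected_hours <= 0:
        raise ValueError("expected_hours must be positive")
    if safety_factor < 1:
        raise ValueError("safety_factor below 1 would request less than the expected run time")
    return min(expected_hours * safety_factor, queue_limit_hours)


# The 6 and 19 hour figures are the run-time range quoted on the earlier slides.
for expected in (6, 19, 40):
    print(expected, "->", requested_walltime_hours(expected))
```

A larger safety factor lowers the risk of a job being terminated but, as the slide notes, tends to lengthen the wait in the queue.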
Green Line
Dealing with the unexpected
• Systems taken offline
• Maintenance
• Network outages, data transfer issues
• Science API glitches
• Authentication
Green Line
“Monitoring XSEDE”
DNA Subway
“Power Desktop”
• Intuitive interface to support seamless genome
“round trip” for eukaryote of choice
• Access high performance computing to analyze whole
genome data (RNA-seq, initially)
• Scaffold data to sequenced genomes available in
iPlant Data Store
• Directly upload RNA-seq reads as biological evidence
for genome annotation using Red Line
NSF CCLI Project Retreat
June 8–20, 2014, CSHL
• 11 faculty from PUIs
• Program included lectures/practical sessions
Wet lab: RNA library prep
Green Line analysis & bioinformatics
Pedagogy/teaching resources
Virtual training materials
NSF CCLI Project Retreat
Faculty Participants
Agnes Ayme-Southgate, College of Charleston, SC: Flight muscle development during life-stage transitions in Apis mellifera (honeybee)
Judy Brusslan, California State University, Long Beach, CA: Leaf development and senescence in Arabidopsis thaliana
Raymond Enke, James Madison University, VA: Retina development in Gallus gallus
Shaye Lewis, Prairie View A&M University, TX: Testes development from juvenile to puberty in caprine (goat)
Irina Makarevitch, Hamline University, MN: Response to cold stress in maize
Judith Ogilvie, Saint Louis University, MO: Retinal changes of mice with retinitis pigmentosa
Jeremy Seto, New York City College of Technology, CUNY, NY: Differentiation of rat pheochromocytoma line cells (PC12) to a neuronal-like phenotype
Carrie Thurber, Abraham Baldwin Agricultural College, IL: Seed abscission in Sorghum bicolor
George Ude, Bowie State University, MD: Floral inflorescence genes in banana/plantains
Deirdre Vaden, Prairie View A&M University, TX: Peripheral blood mononuclear cells from hypertensive rats treated with captopril
Scott Woody, University of Wisconsin, WI: Gibberellic acid exposure in Brassica rapa (Fast Plants) gibberellic acid (gad) mutants
NSF CCLI Project Retreat
Flight muscle development during life-stage
transitions in Apis mellifera (honeybee)
Agnes Ayme-Southgate, College of Charleston, SC
All honeybees begin as worker bees, flying short distances.
Some honeybees transition into foragers, flying long distances.
This transition necessitates major changes in flight muscles.
Goal is to identify the gene expression changes in flight muscles
during this transition
Courses
• Biol 322: Developmental Biology, 30–38 students
• Genetics, 100 students
• Undergraduate research in lab, 2–3 students
NSF CCLI Project Retreat
Differential gene expression in Capra hircus (goat)
testes during juvenile development
Shaye Lewis, Prairie View A&M University, TX
Fertility phenotypes show low heritability, and semen analysis
parameters cannot determine fertility status. Molecular
biomarkers can increase efficiency of artificial insemination
and embryo transfer in goats. Goal is to identify genes
important for normal testes development and function
Courses
• 4533: Animal Breeding & Genetics, 20 students
• Undergraduate research in lab, 4 students
NSF CCLI Project Retreat
Understanding transcriptional response to cold
stress in maize
Irina Makarevitch, Hamline University, MN
Maize is grown worldwide and is a staple for >1 billion people.
Maize is thermophilic and sensitive to low temperatures, and
understanding how plants respond to cold can improve yields.
Goal is to identify genes that are differentially expressed when
maize is grown under cold stress
Courses
• Biol 201: Principles of Genetics, 80 students
• Biol 301: Genomics & Bioinformatics, 20 students
• Undergraduate research in lab, 4 students
NSF CCLI Project Retreat
RNA-Seq Datasets Generated and Analyzed
Using the Green Line of DNA Subway
• 8 eukaryotic organisms
• 21 controls paired with 26
experimental conditions
• 402 Gbases sequenced
• 837 jobs submitted to TACC
• 87% jobs completed
• 695 hours total CPU time
• 16 threads/processors running
concurrently
Intended Implementation 2014-15
[Chart: courses grouped by level (100, 200, 300, 400, 500 level), each labeled with a number]
Genetics, 270
Genetics, 220
Cell & Molecular Biology, 75
Genomics, 40
Genomics & Bioinformatics, 70
Animal Breeding & Genetics, 20
Developmental Biology, 35
Independent Research, 5
Cell Structure & Function, 30
Synthetic Biology, 30
Anatomy/Physiology, 50
Advanced Genetic Techniques, 15
[Labels whose numbers are not clearly paired in the chart text: Intro, Biology, Molecular & Cell Biology, Molecular Biology, Molecular Applications in Crop Improvement, Undergrad Research; numbers 50, 100, 15, 20, 15]
[Bottom row: 100s, 320, 550, 140]
DNA Subway is…
Producers
Uwe Hilgert
David Micklos
Jason Williams
Designers
Eun-Sook Jeong
Susan Lauter
Programmers
Cornel Ghiban
Mohammed Khalfan
Sheldon McKay
Contributors
Matt Vaughn
Rion Dooley
Anthony Biondo
Jim Burnette
Scott Cain
Ed Lee
Zhenyuan Lu
Advisors
Matt Conte
Carson Holt
Bruce Nash
Oscar Pineda-Catalan
HPC in Undergraduate Biology Education
Banbury Center, CSHL, September 3-5, 2014
Contact Dave Micklos ([email protected])
A Great Gatsby era estate on
Long Island’s “Gold Coast”
Funded by NSF and the Alfred
P. Sloan Foundation