Workshop on FCP Accelerated NGS
Srinivas Aluru
Iowa State University
The Big Data Challenge
Then (2005)
ABI 3700
96 ~800 bp reads
76.8 × 10³ bases
~$1 per kilobase
Now
Illumina HiSeq 2500
6 billion 100 bp reads
600 × 10⁹ bases
~$1 per 200 million bases
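The two data points above follow from simple arithmetic. A minimal sketch that takes the slide's numbers at face value (the variable names are illustrative, not from the talk):

```python
# Throughput and cost figures from the slide, taken at face value.
then_reads, then_read_len = 96, 800            # ABI 3700 (2005)
now_reads, now_read_len = 6_000_000_000, 100   # Illumina HiSeq 2500

then_bases = then_reads * then_read_len        # 76.8 x 10^3
now_bases = now_reads * now_read_len           # 600 x 10^9

# Cost per base in dollars: $1 per kilobase vs. $1 per 200 million bases.
then_cost_per_base = 1 / 1_000
now_cost_per_base = 1 / 200_000_000

throughput_gain = now_bases / then_bases
cost_reduction = then_cost_per_base / now_cost_per_base

print(f"bases: {then_bases:,} -> {now_bases:,}")
print(f"throughput gain: ~{throughput_gain:,.0f}x")
print(f"cost reduction per base: ~{cost_reduction:,.0f}x")
```

Under these numbers the output per instrument grows by roughly seven orders of magnitude, while the cost per base falls by a factor of about 200,000.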
Many NGS Technologies
Why FCP?
• 1 NGS experiment = ~100 GB data
• Sequencing center a decade ago → small-budget
individual investigator today
• Many FCP technologies are inexpensive and
widely available
Genomes Galore – Big Data Analytics for High
Throughput DNA Sequencing
Driving Grand Challenges
• Identification of complex disease traits
• Detection of biological threats
• Microbial studies and human health
• Plant genotype to phenotype
⁞
Research and Dissemination Approach
Vision and Goals
• Empower community migration to HPC
• Preserve ability to create new solutions
• Target researchers & software developers
The Team
• Srinivas Aluru (ISU)
• Jaroslaw Zola (Rutgers)
• Kunle Olukotun (Stanford)
• Wu Feng (V. Tech)
• Domain Experts:
– Patrick Schnable (ISU)
– Charles Sing (U. of Michigan)
NGS Application: Assembly
Reconstruct longer original sequences from the high-coverage
sampling of short fragments produced by NGS
[Figure: multiple copies of the same genome source → randomly fragment the copies → sequence → unordered fragments]
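The sampling process that assembly has to invert can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: the genome string, the function names, and the number of cuts are made up for illustration, and sequencing errors and read-length limits are ignored.

```python
import random

def fragment_copy(genome, num_cuts, rng):
    """Cut one copy of the genome at num_cuts distinct random positions."""
    cuts = sorted(rng.sample(range(1, len(genome)), num_cuts))
    bounds = [0] + cuts + [len(genome)]
    return [genome[a:b] for a, b in zip(bounds, bounds[1:])]

def shotgun_fragments(genome, num_copies, num_cuts, seed=0):
    """Fragment several copies independently and pool the pieces.

    The pooled fragments are shuffled, so their original order and
    position in the genome are lost.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(num_copies):
        fragments.extend(fragment_copy(genome, num_cuts, rng))
    rng.shuffle(fragments)
    return fragments

genome = "ACGTTGCATGCCGATAGCTAGGCTAACGTTAGC"  # hypothetical toy genome
fragments = shotgun_fragments(genome, num_copies=5, num_cuts=4)

# Average coverage: total sampled bases divided by genome length.
coverage = sum(len(f) for f in fragments) / len(genome)
print(len(fragments), "fragments, average coverage", coverage)
```

Because every copy is cut at different positions, fragments from different copies overlap one another, and those overlaps are what an assembler exploits to recover the original order.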
NGS Application: Assembly
• resequencing → genome mapping
• de novo sequencing → genome assembly
• gene expression analysis → transcriptome assembly
• metagenomic sampling → metagenomic clustering and/or assembly
Graph Abstractions for Assembly
• Overlap graphs
– node: an NGS read
– edge: suffix-prefix alignment between a pair of
reads
• De Bruijn graphs
– node: a kmer from an NGS read
– edge: length (k-1) suffix-prefix match between
two kmers
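Both abstractions can be sketched on a toy read set. The code below is an illustration only, with several assumptions: overlaps are exact matches rather than alignments, the quadratic all-pairs loop stands in for the indexed methods a real tool would use, and de Bruijn edges are added only between kmers that are adjacent in some read. The function names and reads are hypothetical.

```python
from collections import defaultdict

def suffix_prefix_overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b
    (exact match only), or 0 if no overlap of at least min_len exists."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0

def overlap_graph(reads, min_len):
    """Nodes are reads; edge (a, b) is weighted by the overlap length."""
    edges = {}
    for a in reads:
        for b in reads:
            if a != b:
                length = suffix_prefix_overlap(a, b, min_len)
                if length:
                    edges[(a, b)] = length
    return edges

def de_bruijn_graph(reads, k):
    """Nodes are kmers; an edge joins consecutive kmers of a read,
    which by construction share a (k-1)-length suffix-prefix match."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            graph[read[i:i + k]].add(read[i + 1:i + k + 1])
    return graph

reads = ["ACGTAC", "GTACGG", "ACGGTT"]  # toy reads
overlaps = overlap_graph(reads, min_len=3)
dbg = de_bruijn_graph(reads, k=3)
print(overlaps)
print(dict(dbg))
```

The overlap graph keeps each read intact as a node, while the de Bruijn graph breaks reads into kmers, so identical kmers from different reads collapse into a single node.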
Graph Operations for Assembly
• Graph construction from reads
• Collapsing chains
• Features in local neighborhood to identify
errors
• Path walking subject to distance constraints
on pairs of edges
• Operations on multiple assembly graphs, or
multiple genomes in a combined graph
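As one concrete instance of these operations, collapsing chains can be sketched as merging maximal non-branching paths of a kmer graph into single strings. This is a minimal single-threaded illustration, assuming the graph is given as a dictionary from each kmer to its successor set; the function name and the toy graph are hypothetical, and isolated cycles are skipped for brevity.

```python
from collections import defaultdict

def collapse_chains(succ):
    """Merge maximal non-branching paths of a kmer graph into strings.

    succ maps each kmer to the set of its successor kmers.
    """
    pred = defaultdict(set)
    nodes = set(succ)
    for u, vs in succ.items():
        for v in vs:
            pred[v].add(u)
            nodes.add(v)

    def out_edges(u):
        return succ.get(u, set())

    def extends_chain(u, v):
        # Edge u -> v is collapsible if it is u's only exit and v's only entry.
        return len(out_edges(u)) == 1 and len(pred[v]) == 1

    def is_chain_start(v):
        # v starts a chain unless its single incoming edge is collapsible.
        if len(pred[v]) != 1:
            return True
        (u,) = pred[v]
        return not extends_chain(u, v)

    contigs = []
    for start in sorted(nodes):
        if not is_chain_start(start):
            continue
        label, cur = start, start
        while len(out_edges(cur)) == 1:
            (nxt,) = out_edges(cur)
            if not extends_chain(cur, nxt):
                break
            label += nxt[-1]  # consecutive kmers overlap by k-1 characters
            cur = nxt
        contigs.append(label)
    return contigs

# Toy 3-mer graph in which ACG branches to CGT and CGG.
succ = {
    "ACG": {"CGT", "CGG"},
    "CGT": {"GTA"}, "GTA": {"TAC"}, "TAC": {"ACG"},
    "CGG": {"GGT"}, "GGT": {"GTT"},
}
contigs = collapse_chains(succ)
print(contigs)
```

Here the seven kmer nodes reduce to two chains, one ending at the branching node ACG and one leaving it, which is the kind of reduction that shrinks an assembly graph before error detection and path walking.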
NGS Error Correction
• Hamming/Edit distance graphs
– Node: a kmer in an NGS read
– Edge: two kmers separated by a short Hamming/edit
distance
• Graph operations needed
– Concurrent access to many nodes for neighbor
queries
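The neighbor-query pattern can be sketched for the simplest case. The code below assumes Hamming distance exactly 1 as the "short distance" threshold and a plain in-memory set as the kmer index; both are illustrative choices, as are the function names and the toy reads.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every kmer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def hamming_neighbors(kmer, kmer_set, alphabet="ACGT"):
    """Return the kmers in kmer_set at Hamming distance exactly 1 from kmer.

    Each of the 3k single-base substitutions is one lookup in the set.
    """
    found = set()
    for i, original in enumerate(kmer):
        for base in alphabet:
            if base != original:
                candidate = kmer[:i] + base + kmer[i + 1:]
                if candidate in kmer_set:
                    found.add(candidate)
    return found

def hamming_graph(kmer_set):
    """Adjacency sets of the distance-1 Hamming graph over kmer_set."""
    return {kmer: hamming_neighbors(kmer, kmer_set) for kmer in kmer_set}

# Two copies of a read plus one copy with a single substitution (G -> C).
reads = ["ACGTA", "ACGTA", "ACCTA"]
counts = kmer_counts(reads, k=4)
graph = hamming_graph(set(counts))
print(counts)
print(graph)
```

Each node issues 3k independent lookups, and a low-count kmer adjacent to a high-count one (ACCT next to ACGT here) is a candidate sequencing error. That access pattern is why concurrent access to many nodes matters at scale.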