Short read alignment, genome alignment, and high performance

Short read alignment
BNFO 601
Short read alignment
• Input:
– Reads: short DNA sequences, usually up to 100 base pairs (bp) long, produced by a sequencing machine
• Reads are fragments of a longer DNA sequence present in the sample given to the machine
• A single run usually produces millions of reads
– Genome sequence: a reference DNA sequence much longer than the read length
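To make the problem concrete, here is a minimal brute-force sketch. It assumes ungapped alignment on the forward strand only, with a hypothetical `max_mismatches` threshold. It tries every reference offset for each read, so its cost grows with genome length times read length for every read. With millions of reads against a genome of billions of bases, that is why practical aligners first build an index (hashing or Burrows-Wheeler, as the next slide lists).

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def naive_align(read: str, reference: str, max_mismatches: int = 2):
    """Return (position, mismatches) for the best ungapped placement of `read`
    on the forward strand of `reference`, or None if no placement has at most
    `max_mismatches` mismatches. Ties go to the leftmost position."""
    best = None
    for pos in range(len(reference) - len(read) + 1):
        mm = hamming(read, reference[pos:pos + len(read)])
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (pos, mm)
    return best


reference = "ACGTTGCATGTCGCATGATGCATGAGAGCT"
for r in ["GCATGTCG", "TGCATGAG", "AAAAAAAA"]:
    print(r, naive_align(r, reference))
# GCATGTCG (5, 0)
# TGCATGAG (18, 0)
# AAAAAAAA None
```

The second read also fits at position 4 with two mismatches. The exact hit at position 18 replaces that placement because it has fewer mismatches.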
Short read alignment
• Applications
– Genome assembly
– RNA splicing studies
– Gene expression studies
– Discovery of new genes
– Discovery of cancer-causing mutations
Short read alignment
• Two approaches
– Hashing-based algorithms
• BFAST
• SHRiMP
• MAQ
• Stampy (statistical alignment)
– Burrows-Wheeler transform (BWT) based algorithms
• Bowtie
• BWA
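Bowtie and BWA both index the reference with the Burrows-Wheeler transform (an FM index) and locate a read by backward search over it. The following is a toy sketch of exact matching only, under these assumptions: a naive suffix-array construction, an uncompressed occurrence table, and no mismatches or gaps. The real tools use compressed, sampled structures and add backtracking so they can tolerate mismatches and gaps.

```python
from collections import Counter


def build_fm_index(text: str):
    """Build a toy FM index: suffix array, BWT, C table and Occ table.
    The naive suffix-array construction is only suitable for short texts."""
    text += "$"  # unique sentinel, lexicographically smallest character
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total  # number of characters in text smaller than ch
        total += counts[ch]
    # occ[ch][k] = number of occurrences of ch in bwt[:k]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for k, b in enumerate(bwt):
        for ch in occ:
            occ[ch][k + 1] = occ[ch][k] + (ch == b)
    return sa, bwt, c_table, occ


def backward_search(pattern: str, fm):
    """Return the sorted start positions of exact occurrences of `pattern`."""
    sa, bwt, c_table, occ = fm
    lo, hi = 0, len(bwt)  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):  # extend the match one character leftwards
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


fm = build_fm_index("ACGTTGCATGTCGCATGATGCATGAGAGCT")
print(backward_search("GCATG", fm))  # [5, 12, 19]
print(backward_search("CAT", fm))    # [6, 13, 20]
```

Each pattern character costs a constant number of table lookups, so search time depends on read length rather than genome length. That property is what makes BWT-based aligners fast.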
BFAST overview
PLoS ONE 4(11): e7767.
BFAST algorithm
PLoS ONE 4(11): e7767.
BFAST masked keys
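BFAST hashes the reference using spaced seeds, or "masks": only the positions marked 1 in a mask contribute to the lookup key, so a mismatch that falls on a 0 position does not prevent a hit. The sketch below is minimal and rests on several assumptions. It uses a single hypothetical 8-position mask `11011011` and a toy reference. BFAST itself typically uses longer masks across several indexes and runs a gapped local alignment at each candidate location. Here a simple mismatch count stands in for that step, and all function names are illustrative.

```python
from collections import defaultdict

# Hypothetical mask for illustration: '1' = position is part of the key,
# '0' = position is ignored, so a mismatch there does not change the key.
MASK = "11011011"


def masked_key(seq: str, mask: str = MASK) -> str:
    """Keep only the characters at the '1' positions of the mask."""
    return "".join(c for c, m in zip(seq, mask) if m == "1")


def build_index(reference: str, mask: str = MASK):
    """Hash every reference window of len(mask) bases by its masked key."""
    index = defaultdict(list)
    for pos in range(len(reference) - len(mask) + 1):
        index[masked_key(reference[pos:pos + len(mask)], mask)].append(pos)
    return index


def candidate_locations(read: str, index, mask: str = MASK):
    """Look up every mask-length window of the read and convert each hit
    into the implied start position of the whole read on the reference."""
    cands = set()
    for offset in range(len(read) - len(mask) + 1):
        key = masked_key(read[offset:offset + len(mask)], mask)
        for pos in index.get(key, []):
            if pos - offset >= 0:
                cands.add(pos - offset)
    return sorted(cands)


def align(read, reference, index, max_mismatches=2):
    """Verify candidates by counting mismatches (a stand-in for the gapped
    local alignment a real aligner would run)."""
    hits = []
    for start in candidate_locations(read, index):
        window = reference[start:start + len(read)]
        if len(window) == len(read):
            mm = sum(a != b for a, b in zip(read, window))
            if mm <= max_mismatches:
                hits.append((start, mm))
    return hits


reference = "ACGTTGCATGTCGCATGATGCATGAGAGCT"
index = build_index(reference)
print(align("TGAATGAGAG", reference, index))  # [(18, 1)]
```

The read is `reference[18:28]` with one substitution at its third base. With a contiguous mask (`11111111`), every 8-base window of this read includes the substituted base, so no seed hits. With `11011011`, the substitution falls on an ignored position in the first window, and the read is still placed.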
Short read alignment
Empirical performance:
• Simulated data:
– Extract random substrings of fixed length from the reference and introduce random mutations and gaps
– Realign the reads to the reference genome and check whether each maps back to the position it came from
• Real data:
– Paired reads: the two ends of the same DNA molecule
– Count the number of read pairs whose two ends align within 500 to 10,000 bases of each other
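The two evaluation strategies above can be sketched as follows. The code makes several assumptions that go beyond the slide: the mutation model (independent per-base substitutions plus single-base insertions and deletions at arbitrary rates), the 5-base tolerance for counting a simulated read as correctly placed, and every function name are hypothetical. Only the 500 to 10,000 base window for read pairs comes from the slide.

```python
import random


def mutate(seq, sub_rate, indel_rate, rng):
    """Apply random substitutions and single-base insertions/deletions."""
    out = []
    for base in seq:
        r = rng.random()
        if r < indel_rate / 2:
            continue  # deletion: drop this base
        if r < indel_rate:
            out.append(base)
            out.append(rng.choice("ACGT"))  # insertion after this base
            continue
        if rng.random() < sub_rate:
            base = rng.choice([b for b in "ACGT" if b != base])  # substitution
        out.append(base)
    return "".join(out)


def simulate_reads(reference, n, length, sub_rate=0.02, indel_rate=0.005, seed=0):
    """Return (true_position, read) pairs from random fixed-length substrings."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n):
        pos = rng.randrange(len(reference) - length + 1)
        reads.append((pos, mutate(reference[pos:pos + length], sub_rate, indel_rate, rng)))
    return reads


def accuracy(simulated, aligner, tolerance=5):
    """Fraction of reads whose reported position is within `tolerance` of the truth."""
    correct = 0
    for true_pos, read in simulated:
        reported = aligner(read)
        if reported is not None and abs(reported - true_pos) <= tolerance:
            correct += 1
    return correct / len(simulated)


def concordant_pairs(pair_positions, min_dist=500, max_dist=10_000):
    """Count read pairs whose two mapped ends lie min_dist..max_dist bases apart.
    `pair_positions` holds (pos1, pos2) tuples, with None for an unmapped end."""
    return sum(
        1 for p1, p2 in pair_positions
        if p1 is not None and p2 is not None and min_dist <= abs(p2 - p1) <= max_dist
    )


def exact_aligner(read):
    """Hypothetical baseline: exact substring search on the forward strand."""
    i = reference.find(read)
    return i if i >= 0 else None


rng = random.Random(1)
reference = "".join(rng.choice("ACGT") for _ in range(5000))
sims = simulate_reads(reference, n=200, length=50)
print(accuracy(sims, exact_aligner))
print(concordant_pairs([(100, 900), (100, 150), (None, 4000), (0, 10_000)]))  # 2
```

The exact-match baseline fails on any read that picked up a mutation, which is exactly the behavior the simulation is designed to expose. Any aligner that maps a read to a position, or to None, can be substituted for `exact_aligner`.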
Short read alignment
Courtesy of Genome Res. June 2011 21: 936-939.
Short read alignment