Short read alignment, genome alignment, and high performance
Short read alignment
BNFO 601

Short read alignment
• Input:
  – Reads: short DNA sequences, usually up to 100 base pairs (bp), produced by a sequencing machine
    • Reads are fragments of a longer DNA sequence present in the sample given as input to the machine
    • Usually number in the millions
  – Genome sequence: a reference DNA sequence much longer than the read length

Short read alignment
• Applications
  – Genome assembly
  – RNA splicing studies
  – Gene expression studies
  – Discovery of new genes
  – Discovery of cancer-causing mutations

Short read alignment
• Two approaches
  – Hashing-based algorithms
    • BFAST
    • SHRIMP
    • MAQ
    • STAMPY (statistical alignment)
  – Burrows-Wheeler transform
    • Bowtie
    • BWA

BFAST overview
PLoS ONE 4(11): e7767.

BFAST algorithm
PLoS ONE 4(11): e7767.

BFAST masked keys

Short read alignment
Empirical performance:
• Simulated data:
  – Extract random substrings of fixed length with random mutations and gaps
  – Realign them to the reference genome
• Real data:
  – Paired reads: two ends of the same molecule
  – Count the number of paired reads within 500 to 10000 bases of each other

Short read alignment
Courtesy of Genome Res. June 2011 21: 936-939
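
The two evaluation protocols on the empirical performance slide can be sketched in a few lines of Python. This is only an illustrative toy, not the method of BFAST or of any tool named in the slides: the function names, the non-overlapping k-mer seeding, and the Hamming-distance verification are all assumptions made for the example, and the simulation introduces substitutions only (no gaps), which is simpler than what the slide describes.

```python
import random
from collections import defaultdict


def build_kmer_index(reference, k):
    """Hash every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def align_read(read, reference, index, k, max_mismatches):
    """Seed with non-overlapping k-mers of the read, then verify each
    candidate start by Hamming distance (substitutions only, no gaps).
    Returns (start, mismatches) for the best candidate, or None."""
    best = None
    seen = set()
    for offset in range(0, len(read) - k + 1, k):
        for hit in index.get(read[offset:offset + k], ()):
            start = hit - offset
            if start < 0 or start + len(read) > len(reference) or start in seen:
                continue
            seen.add(start)
            d = hamming(read, reference[start:start + len(read)])
            if d <= max_mismatches and (best is None or d < best[1]):
                best = (start, d)
    return best


def simulate_read(reference, read_len, n_mutations, rng):
    """Extract a random fixed-length substring and substitute n_mutations bases."""
    start = rng.randrange(len(reference) - read_len + 1)
    read = list(reference[start:start + read_len])
    for i in rng.sample(range(read_len), n_mutations):
        read[i] = rng.choice([b for b in "ACGT" if b != read[i]])
    return start, "".join(read)


def evaluate(reference, n_reads, read_len, k, n_mutations, seed=0):
    """Simulated-data protocol: fraction of reads realigned to their true origin."""
    rng = random.Random(seed)
    index = build_kmer_index(reference, k)
    correct = 0
    for _ in range(n_reads):
        true_start, read = simulate_read(reference, read_len, n_mutations, rng)
        hit = align_read(read, reference, index, k, n_mutations)
        if hit is not None and hit[0] == true_start:
            correct += 1
    return correct / n_reads


def count_concordant_pairs(pairs, min_dist=500, max_dist=10000):
    """Real-data protocol: count read pairs whose two mapped positions lie
    within min_dist..max_dist bases of each other (bounds inclusive)."""
    return sum(1 for a, b in pairs if min_dist <= abs(a - b) <= max_dist)
```

In this sketch the seed length matters: with a 36 bp read, k = 9 and at most two substitutions, at least two of the four non-overlapping seeds remain exact, so the true position is always among the candidates. A gap inside the read would break the fixed-offset verification, which is one reason real aligners use more elaborate schemes.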