Repetitive DNA and next-generation sequencing: computational challenges and solutions
TJ Treangen, SL Salzberg
Nature Reviews Genetics, 2011
Chen Bichao
Scope
1. Introduction to Repetitive DNA
2. Mapping Assembly
3. De novo Assembly
4. RNA-Seq
5. Conclusions
Introduction to Repetitive DNA
Repetitive DNA in the human genome
Mapping Assembly
Mapping assembly---problems
Mapping assembly---mapping strategies
- Discard all multi-reads (reads that map to multiple locations)
    - Might result in biologically important variants being missed.
- Best match approach
    - Will provide a reasonable estimate of coverage.
- Report all alignments
    - Avoids making a possibly erroneous choice about read placement.
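The three strategies above can be sketched on toy data. This is only an illustration, not the behaviour of any particular aligner: the read names, positions and mismatch counts are made up, and breaking ties at random in the best match approach is an assumption of this sketch.

```python
import random

# Hypothetical alignments: read id -> list of (position, mismatches).
alignments = {
    "read1": [(100, 0)],                       # unique read
    "read2": [(250, 1), (900, 1), (1500, 3)],  # multi-read with two equally good hits
}

def discard_multireads(hits):
    """Strategy 1: keep a read only if it has exactly one alignment."""
    return hits if len(hits) == 1 else []

def best_match(hits, rng):
    """Strategy 2: keep the alignment with the fewest mismatches.

    Ties are broken at random (an assumption of this sketch).
    """
    fewest = min(mismatches for _, mismatches in hits)
    best = [hit for hit in hits if hit[1] == fewest]
    return [rng.choice(best)]

def report_all(hits, max_hits):
    """Strategy 3: report every alignment, best first, up to max_hits."""
    return sorted(hits, key=lambda hit: hit[1])[:max_hits]

rng = random.Random(0)  # fixed seed so the random tie-break is repeatable
for read, hits in alignments.items():
    print(read, discard_multireads(hits), best_match(hits, rng), report_all(hits, 2))
```

Note how strategy 1 drops read2 entirely, which is how variants inside repeats can be missed, while strategy 3 leaves the placement decision to downstream analysis.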
De novo Assembly
De novo assembly---problems
- Repeats that are longer than the read length create gaps in the assembly.
    - The human genome has millions of copies of repeats in the range of 200-500 bp.
    - An assembler cannot distinguish between the copies of a repeat.
- Assemblers create graphs and traverse them to reconstruct the genome (de Bruijn graph).
    - Repeats cause branches in the graph.
    - The assembler must either guess which branch to follow or break the assembly.
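A minimal sketch of how a repeat produces branches in a de Bruijn graph. The toy genome, the 8 bp error-free reads and k = 4 are all assumptions chosen for illustration; real assemblers use much larger k and must cope with sequencing errors.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def branch_nodes(graph):
    """Nodes with more than one successor, where a traversal must guess or break."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)

# Toy genome in which "GATTACA" occurs twice with different flanking sequence.
genome = "CCGATTACATTGGATTACAGG"
reads = [genome[i:i + 8] for i in range(len(genome) - 8 + 1)]  # error-free 8 bp reads

graph = de_bruijn(reads, k=4)
print(branch_nodes(graph))
```

The node at the end of the repeat has two outgoing edges, one per copy, so the graph alone does not say which flank follows which copy.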
De novo assembly---strategies
- Using mate-pair information
- Compute statistics on the depth of coverage
    - Assume the genome is uniformly covered
    - Identify regions with unusually deep coverage as repeats
- Combination of strategies
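The depth-of-coverage idea can be sketched as follows, assuming uniform coverage so that a collapsed repeat shows up as a contig with unusually high read depth. The contig names, the depths and the 1.5x threshold are hypothetical.

```python
from statistics import median

def flag_repeats(coverage, factor=1.5):
    """Return contigs whose read depth exceeds factor times the median depth."""
    typical = median(coverage.values())
    return sorted(name for name, depth in coverage.items() if depth > factor * typical)

# Hypothetical mean depth of coverage per contig.
coverage = {
    "contig1": 30.0,
    "contig2": 29.0,
    "contig3": 61.0,  # roughly 2x the typical depth: likely a two-copy repeat
    "contig4": 31.0,
    "contig5": 95.0,  # roughly 3x the typical depth
}
print(flag_repeats(coverage))
```

The median is used as the "typical" depth because, unlike the mean, it is not pulled upward by the repeats themselves.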
RNA-Seq
RNA-seq---problems and strategies
- Read splicing
    - Aligning a read to two physically separate locations
    - False positives
    - Strategy for spliced alignment
        - Longer sequences must align on both sides of each splice site; doesn’t work on fusion genes
        - Exclude any read with more than one (or N) alignment(s)
- Estimate gene expression level
    - Strategy for estimating gene expression
        - Distribute multi-reads in proportion to the number of reads that map to unique regions of each transcript
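A worked sketch of the proportional allocation described above. The transcript names and counts are hypothetical, and the even split used when no candidate transcript has unique reads is an assumption of this sketch, not something stated on the slide.

```python
def allocate_multireads(unique_counts, multireads):
    """Split each multi-read among its candidate transcripts in proportion
    to the number of uniquely mapped reads each transcript already has."""
    totals = {transcript: float(count) for transcript, count in unique_counts.items()}
    for candidates in multireads:
        weight = sum(unique_counts[t] for t in candidates)
        for t in candidates:
            # Assumption: fall back to an even split if no candidate has unique reads.
            share = unique_counts[t] / weight if weight else 1 / len(candidates)
            totals[t] += share
    return totals

# Hypothetical unique-read counts, plus 8 multi-reads that map to both txA and txB.
unique_counts = {"txA": 30, "txB": 10, "txC": 5}
multireads = [("txA", "txB")] * 8

print(allocate_multireads(unique_counts, multireads))
```

Here txA has three times as many unique reads as txB, so each shared read contributes 0.75 to txA and 0.25 to txB: 6 of the 8 multi-reads go to txA and 2 to txB. The result is raw counts; normalising by transcript length is left out.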
Conclusions
- Mapping assembly
    - Best match
- De novo assembly
    - Paired-end information
- RNA-seq
    - Allocate multi-reads based on statistical information to estimate expression level
Future
- Increased read length
- Role of repeats in disease, gene function, genome structure and evolution
- Longer paired-end libraries improved contiguity in the potato genome
Thank you.
Q&A