10/09/2015
Tools and Algorithms in Bioinformatics
GCBA815, Fall 2015
Week 13: Next Generation Sequencing (NGS) Analysis
Adam Cornish
Graduate Student
Guda lab
Department of Genetics, Cell Biology and Anatomy
University of Nebraska Medical Center
__________________________________________________________________________________________________
Fall 2015
GCBA 815
Introduction
n  Vector
NTI is an integrated suite of
sequence analysis and design tools that
help you manage, view, analyze,
transform, share, and publicize diverse
types of molecular biology data, in a
graphically rich analysis environment.
Eisenstein. Nature. 2015
__________________________________________________________________________________________________
Sources of NGS data
- PacBio
- Illumina
- Ion Torrent
__________________________________________________________________________________________________
Single Cell Sequencing
__________________________________________________________________________________________________
Applications of NGS
- Genome
  - Targeted sequencing panels (cancer, newborns, autism, etc.)
  - Whole exome sequencing
  - Whole genome sequencing
  - Copy number analysis
  - Reconstruction of extinct species’ genomes
- Transcriptome
  - Whole transcriptome (poly-A selection)
  - Small and other non-coding RNA analysis (siRNA, snoRNA, lincRNA, etc.)
  - Gene expression profiling for selected target genes
  - Rare cell identification
- Metagenome
  - Bulk sequencing of many types of bacteria
  - Examples: human gut microbiome, pollen composition, bacterial composition, viral studies
- Epigenome
  - Chromatin Immunoprecipitation Sequencing (ChIP-Seq)
  - Methylation Sequencing (Methyl-Seq)
__________________________________________________________________________________________________
Variant calling using NGS data
__________________________________________________________________________________________________
Important file types
The big three:
- Fastq: raw sequencing data, usually straight from the sequencer
- SAM/BAM: sequence data that has usually been aligned to a reference genome
- VCF: a tab-delimited text file that contains a list of possible variants:
  - SNV (single nucleotide variant)
  - Insertion and deletion (indel)
  - Duplication
  - Copy number variation
  - Inversion
  - Tandem duplication
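A minimal sketch of how the REF and ALT columns of a VCF data line separate some of the variant types listed above. It assumes an uncompressed VCF whose data lines carry the eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The function names and the example record are hypothetical, and structural variants, which VCF usually writes as symbolic alleles such as <INV> or <DUP:TANDEM>, are only labeled here, not resolved.

```python
from typing import NamedTuple


class VcfRecord(NamedTuple):
    chrom: str
    pos: int            # 1-based position of the first REF base
    variant_id: str
    ref: str
    alts: list[str]     # ALT may list several alleles separated by commas
    qual: str
    filter_status: str
    info: str


def parse_vcf_line(line: str) -> VcfRecord:
    """Split one tab-delimited VCF data line into its eight fixed columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, variant_id, ref, alt, qual, filter_status, info = fields[:8]
    return VcfRecord(chrom, int(pos), variant_id, ref, alt.split(","),
                     qual, filter_status, info)


def classify(ref: str, alt: str) -> str:
    """Rough variant class for one REF/ALT pair (hypothetical helper)."""
    if alt.startswith("<") and alt.endswith(">"):
        # Symbolic structural-variant allele, e.g. <INV>, <DUP:TANDEM>, <CNV>
        return "symbolic " + alt[1:-1]
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "MNV or complex"


# Made-up record for illustration only: one SNV allele and one insertion allele.
record = parse_vcf_line("chr22\t100\t.\tA\tG,AT\t50\tPASS\tDP=20")
for alt in record.alts:
    print(record.ref, "->", alt, classify(record.ref, alt))
```

Comparing allele lengths is only a first pass; a fuller classifier would also need to handle multi-base substitutions, breakend notation, and the INFO fields (such as SVTYPE) that callers use to describe structural variants.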
__________________________________________________________________________________________________
Fastq
@SRR098401.11403008/1
GAGGCTATAGCATGGTCAAGGCACAAGAAGATCACTGGACTGCCCTCGCTCAGCCCTCAGCTACTG
+
>>?>?@>?>@@>?@@=@@@@@??>??@??@?@A?>@@@?>@@???A@:@A@@A@@@A@@AAB@@BB
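Row 1: The read identifier, which always begins with '@'. Illumina identifiers typically encode where the read's cluster sat on the flow cell; this example carries an SRA run accession and read number, and /1 marks the first read of a pair.
Row 2: The sequence
Row 3: A separator line beginning with '+', optionally followed by the read identifier again
Row 4: Quality scores, one character per nucleotide in the sequence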
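The four-line layout above is simple enough to parse by hand. A minimal sketch, assuming every record occupies exactly four lines (no wrapped sequence lines) and plain-text input; the class and function names are hypothetical, and real pipelines would normally rely on an established parser instead.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str    # row 1 without the leading '@'
    sequence: str  # row 2
    quality: str   # row 4, one character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming exactly four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality strings differ in length")
        yield FastqRecord(header[1:], sequence, quality)


# The record shown on this slide.
example = (
    "@SRR098401.11403008/1\n"
    "GAGGCTATAGCATGGTCAAGGCACAAGAAGATCACTGGACTGCCCTCGCTCAGCCCTCAGCTACTG\n"
    "+\n"
    ">>?>?@>?>@@>?@@=@@@@@??>??@??@?@A?>@@@?>@@???A@:@A@@A@@@A@@AAB@@BB\n"
)
for rec in read_fastq(io.StringIO(example)):
    print(rec.header, len(rec.sequence))
```

The length check between rows 2 and 4 is the cheapest sanity test for a truncated or corrupted file, since every base must have exactly one quality character.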
__________________________________________________________________________________________________
Fastq continued
Quality scores are Phred-scaled:
Seq:   TCAGCCCTCAGCTACTGCTCT
Score: A@@A@@@A@@AAB@@BBABAB

Phred quality score   Probability that the base is called wrong   Accuracy of the base call
20                    1 in 100                                    99%
30                    1 in 1,000                                  99.9%
40                    1 in 10,000                                 99.99%
50                    1 in 100,000                                99.999%
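The table follows from the definition of the Phred scale, Q = -10 log10(P), where P is the probability that the base call is wrong. A small sketch (the function names are our own, not from any particular tool) that converts in both directions and regenerates the rows above:

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability P for a Phred score Q, inverting Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score Q for an error probability 0 < P <= 1."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


# Regenerate the table: score, "1 in N" odds of an error, and accuracy.
for q in (20, 30, 40, 50):
    p = phred_to_error_prob(q)
    print(f"{q}  1 in {round(1 / p):,}  {100 * (1 - p):g}%")
```

Every 10 points on the Phred scale means a tenfold drop in the error probability, which is why the rows step by one order of magnitude.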
Phred-33 is the most common encoding and is based on ASCII values: the quality score of a base call is the ASCII value of its character minus 33.
Example: the ASCII value of ‘A’ is 65, and 65 - 33 = 32. A score of 32 means the base call has a 10^(-3.2) chance, roughly 1 in 1,600, of being wrong.
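Decoding Phred-33 is one subtraction per character. A minimal sketch, assuming the quality string is Phred-33 encoded (older Illumina pipelines used an offset of 64 instead, hence the hypothetical `offset` parameter), applied to the Seq/Score pair shown on this slide:

```python
def decode_quality(quality: str, offset: int = 33) -> list[int]:
    """Phred score for each character: its ASCII code minus the offset (33 by default)."""
    scores = [ord(ch) - offset for ch in quality]
    if any(q < 0 for q in scores):
        raise ValueError("character below the encoding offset; wrong offset?")
    return scores


seq = "TCAGCCCTCAGCTACTGCTCT"
qual = "A@@A@@@A@@AAB@@BBABAB"
scores = decode_quality(qual)
for base, q in zip(seq, scores):
    # Odds of a miscall are 1 in 10^(Q/10); Q = 32 gives roughly 1 in 1,585.
    print(base, q, f"1 in {round(10 ** (q / 10)):,}")
```

In this example '@' decodes to 31, 'A' to 32, and 'B' to 33, so every base in the snippet has well under a 0.1% chance of being miscalled.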
__________________________________________________________________________________________________
Sequence Alignment / Map (SAM / BAM)
SRR098401.104031357 83 chr22 17445857 60 76M = 17445512 -421
ACTGTTACCAGATCAAGAACTGATAGGGACAGGGATCATTATTCCCCCTTTACAGATGAGAAGGCCGTCACGCCTC
@@>>B@@@BBAAAB9A@@>:@@?=A@?@?@A???>?@??=???@@@@@>@>>@@@><??@>@>@@8?>?=:@>?>>
BD:Z:NOJKPQQQQMONOMKKKLNOMNLLLJLMINLJLMLMLKKKKJLJJJMKCKLINJMMLJKKKMOOMNNOLPQSNMKK PG:Z:MarkDuplicates RG:Z:NA12878
BI:Z:OOMLRRPPRPPQQONOLOPOONOOOKLNMONJKMNONMMMMLMKKKMLGMNLNMMNNJMJLNOMLNMPNONONNMM NM:i:0 MQ:i:60 AS:i:76 XS:i:0
Like a Fastq file, a SAM/BAM record contains the read sequence and its quality scores.
It also tells you where the read aligned to the genome and how confident the aligner is in that placement (the mapping quality, MAPQ, which is also Phred-scaled).
In this case, the read aligned to chromosome 22 at position 17445857 and has a mapping quality of 60, i.e., a 1 in 1,000,000 chance of being placed incorrectly.
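A SAM record is one tab-separated line with 11 mandatory fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL) followed by optional TAG:TYPE:VALUE fields. The sketch below uses hypothetical helper names and handles only uncompressed SAM text, not binary BAM; it splits such a line and turns MAPQ back into a misplacement probability. For brevity, the example line is the record above with the long BD and BI tags left out.

```python
from typing import NamedTuple


class SamAlignment(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int        # 1-based leftmost mapping position
    mapq: int       # Phred-scaled mapping quality (255 = unavailable)
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: dict[str, str]   # optional fields keyed by two-letter tag


def parse_sam_line(line: str) -> SamAlignment:
    """Split one SAM alignment line into the 11 mandatory fields plus optional tags."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 tab-separated fields")
    tags = {}
    for extra in fields[11:]:
        tag, _type, value = extra.split(":", 2)   # TAG:TYPE:VALUE
        tags[tag] = value
    return SamAlignment(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], tags,
    )


def mapq_to_misplacement_prob(mapq: int) -> float:
    """Probability that the read is placed incorrectly: 10^(-MAPQ/10)."""
    if mapq == 255:
        raise ValueError("MAPQ 255 means the mapping quality is unavailable")
    return 10 ** (-mapq / 10)


line = "\t".join([
    "SRR098401.104031357", "83", "chr22", "17445857", "60", "76M",
    "=", "17445512", "-421",
    "ACTGTTACCAGATCAAGAACTGATAGGGACAGGGATCATTATTCCCCCTTTACAGATGAGAAGGCCGTCACGCCTC",
    "@@>>B@@@BBAAAB9A@@>:@@?=A@?@?@A???>?@??=???@@@@@>@>>@@@><??@>@>@@8?>?=:@>?>>",
    "PG:Z:MarkDuplicates", "RG:Z:NA12878",
    "NM:i:0", "MQ:i:60", "AS:i:76", "XS:i:0",
])
aln = parse_sam_line(line)
odds = round(1 / mapq_to_misplacement_prob(aln.mapq))
print(aln.rname, aln.pos, aln.mapq, f"1 in {odds:,}")  # chr22 17445857 60 1 in 1,000,000
```

Tag values are kept as strings here; converting them by their TYPE letter (i for integer, Z for text, and so on) would be the natural next step.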
__________________________________________________________________________________________________
Variant Call Format (VCF)
__________________________________________________________________________________________________
ExAC Browser