Introduction to Next Generation Sequencing (NGS) Data Analysis and Pathway Analysis
Jenny Wu
Outline
• Introduction to NGS data analysis in cancer genomics
– NGS applications in cancer research
– Typical NGS workflows and pipelines
– Open source software with GUI
• Pathway analysis and software
– Pathway analysis goals and concepts
– Commercial and open source pathway analysis software
• Data analysis resources
• Summary
Next Generation Sequencing
Massively parallel sequencing: one can generate hundreds of millions of short sequences (up to 250 bp) in a single run, in a short period of time, at low per-base cost.
• Illumina/Solexa GA II, HiSeq 2500, 3000, X
• Roche/454 FLX, Titanium
• Life Technologies/Applied Biosystems SOLiD
Reviews: Michael Metzker (2010) Nature Reviews Genetics 11:31
Quail et al. (2012) BMC Genomics 13:341
NGS in Cancer Genomics
Shyr et al. 2013
Data Analysis Is the Bottleneck
Informatics (wall.hms.harvard.edu)
Basic NGS Workflow
• Isolation of material
• PCR amplification
• End repair, size selection
• Library QC
• Cluster generation
• Instrument operation
• QC and pipeline analysis
• Data interpretation
Olson et al.
High-Throughput Data Analysis Overview
Olson et al.
Many Analysis Pipelines Start with Read Mapping
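The seed-and-extend idea behind read mapping can be illustrated with a toy mapper. This is a minimal sketch, not how BWA or STAR work internally (BWA uses a compressed Burrows-Wheeler index, and real aligners handle gaps, both strands, and base qualities). The function names `build_kmer_index` and `map_read` are hypothetical, and the seed is assumed to be the read's first k bases.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict:
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read: str, reference: str, index: dict, k: int,
             max_mismatches: int = 1) -> list:
    """Seed with the read's first k bases, then extend by comparing the whole read.

    Returns (start position, mismatch count) for every candidate location whose
    mismatch count is within max_mismatches. Forward strand only, no indels.
    """
    hits = []
    for start in index.get(read[:k], []):
        window = reference[start:start + len(read)]
        if len(window) < len(read):
            continue  # the read would run past the end of the reference
        mismatches = sum(1 for r, g in zip(read, window) if r != g)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


reference = "ACGTACGTTAGCCGATAGGCTTACGATCG"
index = build_kmer_index(reference, k=4)
print(map_read("TAGCCGATAG", reference, index, k=4))  # [(8, 0)]
print(map_read("TAGCCGTTAG", reference, index, k=4))  # [(8, 1)]
```

Because only one seed is used, a read with a mismatch inside its first k bases is missed; production aligners use many seeds per read to avoid exactly this.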
Typical Data Analysis Pipelines
Genotyping (GATK)
http://www.broadinstitute.org/gsa/wiki/images/7/7a/Overall_flow.jpg
http://www.broadinstitute.org/gatk/guide/topic?name=intro
RNA-seq (Tuxedo)
http://www.nature.com/nprot/journal/v7/n3/full/nprot.2012.016.html
Cancer NGS Data Analysis Pipeline: Software
Each step: data, task (software)
• Raw reads: quality control and trimming (FastQC, FASTX-Toolkit, Trimmomatic)
• Analysis-ready reads: mapping (BWA, STAR)
• Mapped reads: visualization (IGV, IGB, UCSC Genome Browser ……)
……
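The step from raw reads to analysis-ready reads usually includes trimming low-quality 3' ends. Below is a simplified sketch of sliding-window trimming in the spirit of Trimmomatic's SLIDINGWINDOW step, not its exact rule. It assumes Phred+33 quality encoding (Sanger/Illumina 1.8+); the 4-base window, the Q20 threshold, and the function names are illustrative assumptions.

```python
def phred_scores(quality: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string, assuming Phred+33 encoding by default."""
    return [ord(ch) - offset for ch in quality]


def sliding_window_trim(seq: str, quality: str, window: int = 4,
                        min_mean_q: float = 20.0) -> tuple:
    """Scan 5'->3' and cut the read at the start of the first window whose
    mean Phred quality falls below min_mean_q. Returns (sequence, quality)."""
    scores = phred_scores(quality)
    if not scores:
        return seq, quality
    # Reads shorter than the window are evaluated as a single window.
    for start in range(max(len(scores) - window + 1, 1)):
        win = scores[start:start + window]
        if sum(win) / len(win) < min_mean_q:
            return seq[:start], quality[:start]
    return seq, quality


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
print(sliding_window_trim("ACGTACGTACGT", "IIIIIIII####"))  # ('ACGTACG', 'IIIIIII')
```

Cutting at the start of the failing window is deliberately conservative; real trimmers refine the cut point within the window.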
Cancer NGS Application-Specific Software
Mapped reads → ……
• SomaticSniper, VarScan2, MuTect
• freeBayes, Pindel, CNVnator
• Cufflinks, MISO
• DESeq2, GATK
• MACS2, SISSRs
• Bismark, BS Seeker
Open Source Software with GUI
Galaxy: web-based platform for analysis of large datasets
http://hpc-galaxy.oit.uci.edu/root
https://main.g2.bx.psu.edu/
https://usegalaxy.org/
GENE-E: Java-based matrix visualization and analysis platform; includes heat maps, clustering, filtering, etc.
http://www.broadinstitute.org/cancer/software/GENE-E
Commercial software for NGS analysis
Pros:
• Easy to use; no command-line skills required
• Usually platform independent
• Little to no learning curve
Cons:
• Limited flexibility
• Harder to publish
Why Pathway Analysis?
• Logical next step after any high-throughput experiment
• Goal: to characterize the biological meaning of joint changes in gene expression
• Why? Groups of genes with related functions often change together
Pathway and Network Analysis
Pathway analysis methods:
• Functional category over-representation: discrete test for significance (BiNGO, DAVID, IPA, etc.)
• Continuous test (GSEA, PAGE)
• Signaling Pathway Impact Analysis (iPathwayGuide)
Network analysis: WGCNA, Cytoscape, etc.
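Continuous tests such as GSEA avoid a hard gene-list cutoff: all genes are ranked, and the test asks whether a gene set's members cluster near the top or bottom. Below is a minimal sketch of the weighted running-sum enrichment score described for GSEA, assuming the list is already sorted and using a weight exponent `p` (p=0 gives the classic Kolmogorov-Smirnov-style statistic). Permutation-based significance, score normalization, and PAGE are not shown, and the function name is hypothetical.

```python
def enrichment_score(ranked_genes: list, ranked_stats: list,
                     gene_set: set, p: float = 1.0) -> float:
    """GSEA-style running-sum enrichment score.

    ranked_genes is assumed sorted from the top to the bottom of the ranking,
    with ranked_stats the matching per-gene statistics. Each gene in the set
    raises the running sum by |stat|**p (normalized over all hits); each gene
    outside the set lowers it by 1 / (number of non-members). The score is the
    running sum's maximum deviation from zero.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(in_set)
    hit_norm = sum(abs(s) ** p for s, hit in zip(ranked_stats, in_set) if hit)
    if n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must overlap the ranked list without covering all of it")
    running, best = 0.0, 0.0
    for stat, hit in zip(ranked_stats, in_set):
        running += (abs(stat) ** p / hit_norm) if hit else (-1.0 / n_miss)
        if abs(running) > abs(best):
            best = running
    return best


genes = ["A", "B", "C", "D"]
stats = [2.0, 1.0, -1.0, -2.0]
print(enrichment_score(genes, stats, {"A", "B"}))  # positive: set at the top
print(enrichment_score(genes, stats, {"C", "D"}))  # negative: set at the bottom
```

A positive score means the set is concentrated among the top-ranked genes, a negative one among the bottom-ranked genes.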
Functional Category Enrichment
• Discrete tests: enrichment for groups in gene lists
– Select a gene list at some predefined cutoff
– For each gene list and functional category, cross-tabulate to get a 2x2 contingency table
– Test for significance using Fisher's exact test
– Apply FDR correction for multiple hypothesis testing
                     Differentially expressed   Not differentially expressed   Total
In the pathway       a                          b                              a+b
Not in the pathway   c                          d                              c+d
Total                a+c                        b+d                            n
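The 2x2 table above can be tested directly. Below is a minimal sketch, assuming a one-sided (over-representation) Fisher's exact test computed from the hypergeometric distribution with the standard library, followed by Benjamini-Hochberg FDR adjustment across categories. Tools such as DAVID or IPA may use variants of this test, and the function names are hypothetical. With SciPy installed, `scipy.stats.fisher_exact([[a, b], [c, d]], alternative="greater")` should give the same one-sided p-value.

```python
from math import comb


def fisher_over_representation(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: in / not in the pathway; columns: differentially expressed / not.
    Returns P(X >= a), where X is hypergeometric with the table's margins fixed.
    """
    n = a + b + c + d
    in_pathway = a + b  # row total a+b
    n_de = a + c        # column total a+c
    total = comb(n, n_de)
    tail = sum(comb(in_pathway, x) * comb(n - in_pathway, n_de - x)
               for x in range(a, min(in_pathway, n_de) + 1))
    return tail / total


def benjamini_hochberg(pvalues: list) -> list:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(fisher_over_representation(3, 0, 0, 3))  # 0.05
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

In practice one table, and hence one p-value, is built per functional category, and the full list of p-values is passed to `benjamini_hochberg` before applying an FDR cutoff.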
Functional Categories in Pathway Analysis
• Gene Ontology
– Biological Process
– Molecular Function
– Cellular Component
• Pathway databases
– KEGG
– BioCarta
– Broad Institute (MSigDB)
– Commercial knowledge bases such as IPA
• Other
– Transcription factor targets
– Protein complexes
– Self-defined gene sets
Commercial and Open Source Pathway Analysis Software
Ingenuity Pathway Analysis Tool
IPA Input file
IPA results page
Resources in NGS data analysis
Public forums:
Computational resources available at UCI:
• HPC: open source software
• CLCbio, IPA, JMP Genomics…
Summary
• NGS technologies are transforming cancer research.
• Data analysis is a crucial part of NGS applications
• Pathway analysis concepts and software
• Data analysis resources
Thank you!