Analyzing human variation with Galaxy
Belinda Giardine and Cathy Riemer
Feb 8, 2012
Outline
Part 1: Filtering out SNPs found in genomes of healthy individuals
  - Uploading files
  - Using Galaxy libraries
  - Basic filtering
Part 2: Selecting known coding SNPs predicted to be damaging, then finding their genes and associated pathways
  - PolyPhen2
  - Gene-based analysis
Part 3: Running new predictions for coding SNPs likely to be detrimental
  - SIFT
  - Workflows
Part 4: Finding SNPs that fall in any given set of intervals
  - Predicted regulatory regions, ENCODE functional data, phyloP conserved regions
Fake example dataset
- SNP calls from Complete Genomics GS12880
- 5 known disease variants added for illustration
- Various genes and parts of genes represented (coding, regulatory, splicing, …)
- Realistic background for searching, but not a realistic combination of SNPs
Uploading a file
Converting file format
Shared data
Importing datasets from library
Filtering SNPs
Filter results
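Conceptually, the Part 1 filter is a set difference: a candidate SNP is kept only if it was not also called in a healthy genome. The slides do not show which Galaxy tool settings or columns were used, so the following is a minimal Python sketch under assumptions: SNPs are represented as hypothetical (chromosome, position, allele) tuples, and `filter_out_healthy` is an illustrative name, not a Galaxy function.

```python
from typing import Iterable

# Hypothetical SNP key: (chromosome, position, alternate allele).
Snp = tuple[str, int, str]


def filter_out_healthy(candidates: Iterable[Snp], healthy: Iterable[Snp]) -> list[Snp]:
    """Return the candidate SNPs that were not called in any healthy genome.

    Matching on the full (chrom, pos, allele) key is an assumption; a looser
    filter could match on position alone.
    """
    seen = set(healthy)  # set lookup keeps the filter linear in input size
    return [snp for snp in candidates if snp not in seen]


if __name__ == "__main__":
    sample = [("chr1", 1000, "A"), ("chr2", 2000, "T"), ("chr7", 3000, "G")]
    healthy = [("chr2", 2000, "T")]
    print(filter_out_healthy(sample, healthy))
```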
Part 2: Selecting known coding SNPs predicted to be damaging, then finding their genes and associated pathways
PolyPhen2
Filtering PolyPhen2 results
PolyPhen2 results
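PolyPhen-2 classifies each amino-acid substitution as benign, possibly damaging, or probably damaging, so filtering its results amounts to keeping the rows in the chosen classes. The sketch below assumes a tab-separated table with a header row and a column named `prediction`; that layout and the `damaging_rows` function are hypothetical, and the slides do not state which classes were kept.

```python
import csv
import io


def damaging_rows(tsv_text: str,
                  keep: tuple[str, ...] = ("probably damaging",)) -> list[dict[str, str]]:
    """Keep rows whose PolyPhen-2 'prediction' value is one of `keep`.

    Assumes a tab-separated table whose header includes a 'prediction'
    column (a hypothetical layout for illustration).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row["prediction"].strip() in keep]


if __name__ == "__main__":
    table = ("snp_id\tprediction\n"
             "snp1\tbenign\n"
             "snp2\tprobably damaging\n"
             "snp3\tpossibly damaging\n")
    print(damaging_rows(table))
    # Widen the filter to include both damaging classes.
    print(damaging_rows(table, keep=("probably damaging", "possibly damaging")))
```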
Linking identifiers
Identifier fields
Join identifiers to result
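Linking identifiers works like a relational inner join: each result row is matched to the rows of an identifier table that share the same key, and the matched fields are appended. A minimal Python sketch, assuming both tables are already parsed into dictionaries; the key name `acc`, the example values, and `join_on` are hypothetical.

```python
def join_on(left: list[dict[str, str]], right: list[dict[str, str]],
            key: str) -> list[dict[str, str]]:
    """Inner join: append fields from `right` rows that share `key` with a `left` row.

    Left rows without a match are dropped; a left row with several matches
    yields one output row per match.
    """
    index: dict[str, list[dict[str, str]]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})  # on name clashes, right-hand values win
    return joined


if __name__ == "__main__":
    results = [{"acc": "ID1", "prediction": "probably damaging"},
               {"acc": "ID2", "prediction": "benign"}]
    id_table = [{"acc": "ID1", "gene": "GENE_A"}]  # hypothetical identifiers
    print(join_on(results, id_table, "acc"))
```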
Comparative Toxicogenomics Database (CTD)
Part 3: Running new predictions for coding SNPs likely to be detrimental
SIFT inputs
Shared data
Workflow
Your workflows
Running the workflow
Running SIFT
Filter SIFT results
SIFT results
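SIFT scores range from 0 to 1, and lower scores indicate substitutions that are less likely to be tolerated; 0.05 is the commonly used cutoff. Filtering SIFT results can be sketched as below, assuming the scores have already been read into a dictionary keyed by a hypothetical SNP identifier; `deleterious` is an illustrative name, and the slides do not state which cutoff was used.

```python
def deleterious(scores: dict[str, float], cutoff: float = 0.05) -> list[str]:
    """Return SNP IDs whose SIFT score is below `cutoff`, lowest score first.

    Lower SIFT scores mean the substitution is less likely to be tolerated;
    the 0.05 default is an assumption matching the commonly cited cutoff.
    """
    hits = sorted((score, snp) for snp, score in scores.items() if score < cutoff)
    return [snp for _, snp in hits]


if __name__ == "__main__":
    print(deleterious({"snpA": 0.01, "snpB": 0.30, "snpC": 0.0}))
```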
Part 4: Finding SNPs that fall in any given set of intervals
Import predicted regulatory regions
Filter with intersect tool
PRPs results
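The intersect step keeps the SNPs that fall inside at least one interval of the second dataset (here the predicted regulatory regions; the same step is repeated for the DNase HSS). A minimal Python sketch of that overlap test, assuming BED-style 0-based, half-open coordinates: intervals are merged per chromosome and kept sorted, so a binary search finds the single interval that could contain each SNP. The function names are hypothetical, and this is not Galaxy's implementation.

```python
from bisect import bisect_right

Interval = tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED


def merge(intervals: list[Interval]) -> dict[str, list[tuple[int, int]]]:
    """Group intervals by chromosome, merging overlapping or touching ones."""
    by_chrom: dict[str, list[tuple[int, int]]] = {}
    for chrom, start, end in sorted(intervals):
        spans = by_chrom.setdefault(chrom, [])
        if spans and start <= spans[-1][1]:
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return by_chrom


def snps_in_intervals(snps: list[tuple[str, int]],
                      intervals: list[Interval]) -> list[tuple[str, int]]:
    """Keep SNPs (chrom, 0-based position) lying inside any interval."""
    merged = merge(intervals)
    starts = {chrom: [s for s, _ in spans] for chrom, spans in merged.items()}
    kept = []
    for chrom, pos in snps:
        spans = merged.get(chrom, [])
        # Index of the last merged span starting at or before pos (-1 if none).
        i = bisect_right(starts.get(chrom, []), pos) - 1
        if i >= 0 and pos < spans[i][1]:
            kept.append((chrom, pos))
    return kept


if __name__ == "__main__":
    regions = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 10)]
    calls = [("chr1", 99), ("chr1", 250), ("chr2", 10), ("chr3", 5)]
    print(snps_in_intervals(calls, regions))
```

Merging first means the spans on each chromosome are disjoint, which is what makes a single binary-search probe sufficient.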
Using ENCODE data
Again filter with intersect
DNase HSS results
Conservation
Histogram of phyloP scores
Filter on phyloP greater than or equal to 0.5
phyloP results
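phyloP scores are positive at sites that appear conserved and negative at sites evolving faster than expected under neutrality, so a histogram helps in choosing a threshold before filtering. The sketch below bins scores and then applies the ≥ 0.5 cutoff from the slides; the 0.5 bin width, the dictionary input, and both function names are assumptions for illustration.

```python
import math
from collections import Counter


def histogram(scores: list[float], bin_width: float = 0.5) -> dict[float, int]:
    """Count scores per bin; each key is the lower edge of its bin."""
    counts = Counter(math.floor(s / bin_width) * bin_width for s in scores)
    return dict(sorted(counts.items()))


def conserved(snp_scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Keep SNP IDs whose phyloP score is greater than or equal to the threshold."""
    return [snp for snp, score in snp_scores.items() if score >= threshold]


if __name__ == "__main__":
    print(histogram([-1.2, 0.1, 0.3, 0.7, 2.0]))
    print(conserved({"snpA": 0.5, "snpB": 0.49, "snpC": 3.1}))
```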
What we covered
Part 1: Filtering out SNPs found in genomes of healthy individuals
  - Uploading files
  - Using Galaxy libraries
  - Basic filtering
Part 2: Selecting known coding SNPs predicted to be damaging, then finding their genes and associated pathways
  - PolyPhen2
  - Gene-based analysis
Part 3: Running new predictions for coding SNPs likely to be detrimental
  - SIFT
  - Workflows
Part 4: Finding SNPs that fall in any given set of intervals
  - Predicted regulatory regions, ENCODE functional data, phyloP conserved regions
Editing the dataset name and build