UCSC Genome Browser
Workshop
Intro to the Browser
• Interactive website providing access to genome
data from >45 species
• Multiple annotation datasets (“tracks”) available
for each genome
– Include information on known genes, disease
associations, variants, expression, regulation,
conservation…
– Can search by gene/region/accession numbers or
upload your own data
Intro to the Site
• Also includes:
– Information mining tools (Table Browser)
– Fast sequence alignment (BLAT)
– Visualization of GWAS results (Genome Graphs)
– Gene grouping by shared features, e.g. tissue of expression, pathways, etc. (Gene Sorter)
– Predicted amplicons given primers (in silico PCR)
Brief Basics
• http://genome.ucsc.edu
• Select Genome Browser
• Option to select any
genome/species (for
simplicity we’ll use the
default)
• In search term enter
gene/region/accession#
and click submit
Pick a gene…
Example: GPR89A
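The same search can also be written as a link. The sketch below assumes the hgTracks page accepts `db` and `position` parameters; the assembly name `hg38` is an assumption, since the slides simply use the site default.

```python
from urllib.parse import urlencode

BROWSER_CGI = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_url(position, db="hg38"):
    """Return a Genome Browser link for a gene symbol, region, or accession.

    `db` is an assumed assembly name; the workshop uses the default genome.
    """
    return f"{BROWSER_CGI}?{urlencode({'db': db, 'position': position})}"


gene_url = browser_url("GPR89A")
region_url = browser_url("chr22:20100000-20100900")
print(gene_url)
print(region_url)
```

Note that `urlencode` percent-encodes the colon in a region string, which the browser decodes on arrival.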
Interface:
Gene Features
Click “Base” in zoom bar to get to sequence level
Track Highlights
• Click on gene name to acquire information on
chemical interactions, haplotypes, related
genes, expression and protein domains
• Mapping and Sequencing:
– Base position
– BAC end pairs: find BACs containing your gene
• Genes and Gene Predictions
– UCSC Genes (includes CCDS)
– Pfam (domains=functional regions of the protein)
– MGC Genes: use to order clones
Track Highlights
• Phenotype and Literature: we’ll come back to
this
• mRNA and EST: useful for identifying exons
and transcript variants not provided in Genes
track.
• Comparative Genomics
– Conservation (dark values/higher histograms =
higher conservation scores)
– Primate/vertebrate alignments
Track Highlights
• Regulation
– ENCODE regulation: shows multiple regulatory
features
• Histone modifications
• Transcription factor binding sites
• DNaseI hypersensitivity sites
 These all complement one another to highlight regions
that are likely to be regulatory in nature
– CpG islands: associated with transcription start
sites, often near promoters
– ENCODE methylation: gene regulation via silencing,
enriched in regulatory regions
Pick a region…
Example: 1q21.1
Track Highlights
• Expression
– RNAseq expression data (example: Burge)
– Sestan Brain data
– Exon array expression data (skip for now but keep
in mind for later lectures)
– Proteogenomics and peptide data (expressed
proteins)
– qPCR pre-designed primers
Track Highlights
• Variation
– SNPs
– Structural Variation
– SNP/CNV arrays
• Repeats
– Repeatmasker
– Segmental Duplications
• Clinically oriented track: Phenotypes and
Literature
– Decipher
– OMIM
Other Site Functions
• BLAT: alignment tool
– https://genome.ucsc.edu/cgi-bin/hgBlat
– Can paste sequence directly into search box, or
upload a text file containing the sequence
– Example:
AGGGAGATGCAGAAGGCTGAAGAAAAGGAAGTCCCTG
AGGACTCACTGGAGGAATGTGCCATCACTTGTTCAAATA
GCCACGGCCCTTATGACTCCAACCAGCCTCACAGGAAC
ACCAAAATCACATTTGAGGAAGACAAAGTCGACTCAAC
TCTGGTTGTAGA
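The example sequence is wrapped across several lines on the slide, so it helps to join it before pasting or uploading. The sketch below joins the lines, writes them as FASTA text for the file-upload route, and builds a BLAT link. The `userSeq`, `type`, `db` and `output` parameters and the `hg38` assembly name are assumptions here; check them against the hgBlat page before relying on them. The record name `workshop_query` is made up.

```python
from urllib.parse import urlencode

# Query sequence from the slide, joined across its line breaks.
QUERY = (
    "AGGGAGATGCAGAAGGCTGAAGAAAAGGAAGTCCCTG"
    "AGGACTCACTGGAGGAATGTGCCATCACTTGTTCAAATA"
    "GCCACGGCCCTTATGACTCCAACCAGCCTCACAGGAAC"
    "ACCAAAATCACATTTGAGGAAGACAAAGTCGACTCAAC"
    "TCTGGTTGTAGA"
)


def to_fasta(name, seq, width=60):
    """Format a sequence as FASTA text with lines of at most `width` bases."""
    chunks = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(chunks) + "\n"


def blat_url(seq, db="hg38"):
    """Build a BLAT query link (parameter names are assumptions)."""
    params = {"userSeq": seq, "type": "DNA", "db": db, "output": "json"}
    return "https://genome.ucsc.edu/cgi-bin/hgBlat?" + urlencode(params)


fasta_text = to_fasta("workshop_query", QUERY)
query_url = blat_url(QUERY)
print(fasta_text)
print(query_url)
```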
Other Site Functions
• In Silico PCR:
– Predicts amplicons based on defined primers
– http://genome.ucsc.edu/cgi-bin/hgPcr
– Example
• Left: GCCTTATTAGCATCCCAAGACAA
• Right: CCCTGAACAGCCTTTCCTTCT
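To see what "predicting an amplicon" means, here is a toy version of the idea. It is not how the UCSC tool is implemented: it only looks for exact matches on one strand of a short made-up template. The left primer is searched for as written, and the right primer is searched for as its reverse complement, because it anneals to the opposite strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def predict_amplicon(template, left, right, max_size=4000):
    """Return the first exact-match amplicon on the forward strand, or None."""
    start = template.find(left)
    if start < 0:
        return None
    right_site = reverse_complement(right)
    end = template.find(right_site, start + len(left))
    if end < 0:
        return None
    amplicon = template[start:end + len(right_site)]
    return amplicon if len(amplicon) <= max_size else None


LEFT = "GCCTTATTAGCATCCCAAGACAA"
RIGHT = "CCCTGAACAGCCTTTCCTTCT"

# Toy template, made up for illustration only (not genomic sequence).
INSERT = "ACGTTGCA" * 5
template = "TTTT" + LEFT + INSERT + reverse_complement(RIGHT) + "GGGG"

amplicon = predict_amplicon(template, LEFT, RIGHT)
print(amplicon)
```

The real tool searches a whole genome and tolerates some mismatches, so treat this only as a picture of the primer orientation logic.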
Other Site Functions
• Creating Custom Tracks (your data!):
– Annotation data can be in standard GFF format or
bedgraph, GTF, PSL, BED, bigBed, WIG, bigWIG,
BAM, VCF, MAF, BED detail, Personal Genome SNP,
broadPeak, narrowPeak and microarray (BED15)
– Can upload files directly, type information into the
track input, or link to URLs containing the data of
interest
Custom Tracks: Examples
browser position chr22:20100000-20100900
track name=coords description="Chromosome coordinates list" visibility=2
chr22 20100000 20100100
chr22 20100011 20100200
chr22 20100215 20100400
chr22 20100350 20100500
chr22 20100700 20100800
chr22 20100700 20100900
browser position chr22:20100000-20140000
track name=spacer description="Blue ticks every 10000 bases" color=0,0,255
chr22 20100000 20100001
chr22 20110000 20110001
chr22 20120000 20120001
track name=even description="Red ticks every 100 bases, skip 100" color=255,0,0
chr22 20100000 20100100 first
chr22 20100200 20100300 second
chr22 20100400 20100500 third
browser position chr21:33,031,597-33,041,570
track type=bigBed name="bigBed Example One" description="A bigBed file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigBedExample.bb
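Before uploading a text-based custom track, it can be worth checking that it parses cleanly. The sketch below reads the first example above, separates browser lines, track lines and BED rows, and rejects rows whose coordinates are out of order. The function name `parse_custom_track` is hypothetical, and the checks are a minimal subset of what the browser itself enforces.

```python
EXAMPLE_TRACK = """\
browser position chr22:20100000-20100900
track name=coords description="Chromosome coordinates list" visibility=2
chr22 20100000 20100100
chr22 20100011 20100200
chr22 20100215 20100400
chr22 20100350 20100500
chr22 20100700 20100800
chr22 20100700 20100900
"""


def parse_custom_track(text):
    """Split custom-track text into browser lines, track lines, and BED rows."""
    browser_lines, track_lines, features = [], [], []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("browser "):
            browser_lines.append(line)
        elif line.startswith("track "):
            track_lines.append(line)
        else:
            fields = line.split()
            if len(fields) < 3:
                raise ValueError(f"line {number}: BED needs chrom, start, end")
            chrom, start, end = fields[0], int(fields[1]), int(fields[2])
            if start < 0 or end < start:
                raise ValueError(f"line {number}: bad interval {start}-{end}")
            features.append((chrom, start, end))
    return browser_lines, track_lines, features


browser_lines, track_lines, features = parse_custom_track(EXAMPLE_TRACK)
for chrom, start, end in features:
    # BED starts are 0-based and ends are exclusive, so length is end - start.
    print(chrom, start, end, end - start)
```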
Custom Tracks: Key Points
Browser Lines: how you instruct the browser to display innate features such as
genome position and track visibility
Basic format:
browser attribute_name attribute_value
Track Lines: how you instruct the browser to display features of your data such as
name, color, file type, quality score, etc
Basic format:
track attribute1=value1 attribute2=value2 (one or more attribute=value pairs)
browser position chr22:10000000-10020000
browser hide all
track name=clones description="Clones" visibility=2 color=0,128,0 useScore=1 url="http://genome.ucsc.edu/goldenPath/help/clones.html#$$"
chr22 10000000 10004000 cloneA 960
chr22 10002000 10006000 cloneB 200
chr22 10005000 10009000 cloneC 700
chr22 10006000 10010000 cloneD 600
chr22 10011000 10015000 cloneE 300
chr22 10012000 10017000 cloneF 100
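If your features already live in a script or spreadsheet, the track text can be generated rather than typed. The sketch below rebuilds the clones example from a Python list; the `url` attribute is left out for brevity. The helper names are hypothetical, and the 0 to 1000 score check reflects the BED score range that `useScore=1` shading relies on.

```python
# Rows from the clones example: (name, start, end, score).
CLONES = [
    ("cloneA", 10000000, 10004000, 960),
    ("cloneB", 10002000, 10006000, 200),
    ("cloneC", 10005000, 10009000, 700),
    ("cloneD", 10006000, 10010000, 600),
    ("cloneE", 10011000, 10015000, 300),
    ("cloneF", 10012000, 10017000, 100),
]


def quote(value):
    """Quote a track attribute value only if it contains whitespace."""
    text = str(value)
    return f'"{text}"' if any(ch.isspace() for ch in text) else text


def build_custom_track(chrom, rows, position, **track_attrs):
    """Return custom-track text: browser lines, one track line, BED rows."""
    lines = [f"browser position {position}", "browser hide all"]
    attrs = " ".join(f"{key}={quote(val)}" for key, val in track_attrs.items())
    lines.append(f"track {attrs}")
    for name, start, end, score in rows:
        if not 0 <= score <= 1000:
            raise ValueError(f"{name}: score {score} outside 0-1000")
        lines.append(f"{chrom}\t{start}\t{end}\t{name}\t{score}")
    return "\n".join(lines) + "\n"


track_text = build_custom_track(
    "chr22", CLONES, "chr22:10000000-10020000",
    name="clones", description="Clones", visibility=2,
    color="0,128,0", useScore=1,
)
print(track_text)
```

The printed text can be pasted into the custom track input box or saved to a file for upload.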
Other Site Functions: Genome Graphs
• Visualizes GWA data (SNP, linkage, etc)
• Your data in UCSC-readable format
— chromosome base: e.g. chr1 130000 (Note that the first base in a chromosome is
considered position 0.)
— STS Marker: e.g. RH75228
— dbSNP rsID: e.g. rs12345
— Affymetrix 500k Gene Chip: e.g. SNP_A-1780270
— Affymetrix Genome-Wide SNP Array 6: e.g. SNP_A-8575125
— Affymetrix SNP Array 6 Structural-Variation: e.g. CN_47396
— Illumina HumanHap300 Bead Chip: e.g. rs3934834
— Illumina HumanHap550 Bead Chip: e.g. rs3094315
— Illumina HumanHap650 Bead Chip: e.g. rs3094315
— Agilent CGH 244A: e.g. A_14_P112718
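Because the chromosome-base format counts the first base as position 0, positions taken from a 1-based source need shifting by one before upload. The sketch below assumes an upload file with one marker per line in the form "chromosome position value"; that three-column layout and the sample values are assumptions for illustration, not data from the workshop.

```python
def genome_graphs_line(chrom, position_1based, value):
    """Format one 'chromosome base' line, converting to a 0-based position."""
    if position_1based < 1:
        raise ValueError("1-based positions start at 1")
    return f"{chrom} {position_1based - 1} {value}"


# Made-up measurements for illustration only: (chrom, 1-based position, value).
measurements = [("chr1", 130001, 0.25), ("chr1", 250001, 1.5)]

upload_text = "\n".join(genome_graphs_line(*row) for row in measurements)
print(upload_text)
```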
Genome Graphs
• Example: ChIPSeq data
– Go to genome graphs, click upload and paste following
into URL box:
• http://genometest.cse.ucsc.edu/ABRF2010/chr21_extended.txt_redbin.sgr
• Key features: can set significance threshold, browse
significant hits, gene sorter for information on genes
Other Site Functions: Table Browser
• Retrieve data associated with tracks, intersect
data, retrieve sequences and output data
• Go to ToolsTable Browser:
–
–
–
–
Group: “Genes and gene predictions”
Table: “Known Genes”
Click [paste list]
Copy and paste the list at:http://genometest.cse.ucsc.edu/~kuhn/workshops/ashg2014/genelis
t
– [get output]
– Select fields you want
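Lists copied from a web page or spreadsheet often carry stray whitespace, blank lines and repeats. A small clean-up before using [paste list] could look like the sketch below. The helper name is hypothetical, and the symbols other than GPR89A are arbitrary examples rather than the workshop's gene list.

```python
def clean_identifier_list(raw_text):
    """Return one identifier per entry: trimmed, non-empty, first-seen order."""
    seen = set()
    cleaned = []
    for line in raw_text.splitlines():
        identifier = line.strip()
        if identifier and identifier not in seen:
            seen.add(identifier)
            cleaned.append(identifier)
    return cleaned


# Arbitrary example input with padding, a blank line, and a duplicate.
raw = "GPR89A\n  BRCA1 \n\nTP53\nGPR89A\n"
identifiers = clean_identifier_list(raw)
print("\n".join(identifiers))
```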
Other Site Functions
• Gene Sorter: sort genes by similarity
measures, tissue expression features, etc
• VisiGene: find IHC/other imaging data on
genes of interest
• Utilities: features liftOver tools, formatting
optimization, code downloads
Additional Resources
• http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html
• http://www.nature.com/scitable/ebooks/guide-to-the-ucsc-genome-browser16569863/contents
• http://genomewiki.ucsc.edu/index.php/ABRF2010_Tutorial