Sequence Comparison and Genome Alignment in the Human Genome
Jian Ma
PowerPoint by Casey Hanson
Jian Ma | Sequence Comparison and Genome Alignment | 2015
Introduction
The goals of this lab are as follows:
1. Gain experience using BLAST and Genome Browsers by looking at repeat
families in the VHL gene.
2. Become familiar with BLAT and the UCSC website by discovering the
identity of a mystery sequence.
3. Visualize pairwise multi-genome alignment and chromosomal
rearrangements.
4. View phylogeny-based multi-genome alignment.
5. Use UCSC tools and Galaxy to intersect annotated functional regions
between human and other placental mammals.
Step 0: Local Files
For viewing and manipulating the files needed for this laboratory
exercise, insert your flash drive.
Denote the path to the flash drive as the following:
[course_directory]
We will use the files found in:
[course_directory]/07_Comparative_Genomics/data/
BLAST & Genome Browser
In this exercise, we will use BLAST (Basic Local Alignment Search Tool) to search for significant
occurrences of a class of transposable elements (TEs) called Short INterspersed Elements
(SINEs), specifically of the ALU family, in the well-known VHL tumor suppressor gene.
The goal of this exercise is to gain experience using BLAST, particularly blastN, and the UCSC
genome browser to answer biologically relevant questions.
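To make the term "local alignment" concrete, here is a minimal sketch of exact local alignment scoring by dynamic programming (the Smith-Waterman method). This is not how BLAST is implemented: BLAST is a faster heuristic, and the exact method is swapped in here only for illustration. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for the example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b.

    Smith-Waterman with a linear gap penalty. Scoring values are
    illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, prev[j - 1] + step, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared substring "ACG" is found even though the flanks differ.
score = local_alignment_score("TTACGTTT", "GGACGGG")
```

The floor at zero is what makes the alignment local: a poorly matching prefix never drags down the score of a well-matching region later in the sequence.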
Step 1A: BLAST VHL in ALU Database
Go to the following web page: http://blast.ncbi.nlm.nih.gov/Blast.cgi
Click nucleotide_blast
In the Enter Query Sequence box, paste the accession # for VHL:
AF010238
In the Database drop-down list, select the following:
Human ALU repeat elements (alu_repeats)
Click the BLAST button.
Step 1B: BLAST VHL in ALU Database
Step 2A: Interpreting BLAST Results
[Figure: BLAST graphic summary. The horizontal axis shows the coordinates of the VHL gene. Color indicates the quality of each match: very good, good, or okay.]
A match is a significant similarity between a region of the query and a
region of a database sequence.
Lines between boxes indicate 'gaps' between matches in the query
sequence. (The next slide has a legend for interpretation.)
Step 2B: Interpreting BLAST Results
Exonic regions are less likely to have ALU repeats.
Matches like this are likely to be located in intronic regions.
Note the following legend for interpreting a match.
[Figure: legend showing an excellent match, a good match, and an okay match, each drawn relative to the introns and exons of the gene.]
Step 3A: Examine VHL in UCSC Browser
Let’s look at the structure of the VHL gene in a Genome Browser to verify
that ALU elements are confined to the introns.
Go to the following web page: http://genome.ucsc.edu/
Click Genome Browser
Select genome, Human
In the search term, type VHL
Click submit
Click the 2nd link: VHL (uc003bvd.3) at chr3:10183319-10195354
Step 3B: Examine VHL in UCSC Browser
Enter chr3:10,181,000-10,196,000 into the input box and click go.
Right click on tracks NOT shown below and hide them.
Right click on the RepeatMasker track and click full. It is dense by default.
Adjust the zoom until you get a view you are comfortable with.
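The position string entered above can also be handled programmatically. The sketch below parses a browser-style position (chrom:start-end, commas allowed) and checks that the VHL coordinates from Step 3A fall inside the viewing window. The function name is hypothetical, and the code assumes that browser positions are 1-based and inclusive at both ends.

```python
import re


def parse_position(position):
    """Split 'chrom:start-end' (commas allowed) into (chrom, start, end)."""
    m = re.fullmatch(r"\s*(\w+):([\d,]+)-([\d,]+)\s*", position)
    if m is None:
        raise ValueError(f"not a chrom:start-end position: {position!r}")
    start = int(m.group(2).replace(",", ""))
    end = int(m.group(3).replace(",", ""))
    if start > end:
        raise ValueError("start must not exceed end")
    return m.group(1), start, end


# Viewing window from this step and the VHL location from Step 3A.
chrom, start, end = parse_position("chr3:10,181,000-10,196,000")
gene_chrom, gene_start, gene_end = parse_position("chr3:10183319-10195354")

window_size = end - start + 1  # inclusive coordinates
gene_in_window = gene_chrom == chrom and start <= gene_start and gene_end <= end
upstream_flank = gene_start - start
downstream_flank = end - gene_end
```

Under these assumptions the window leaves a couple of kilobases of flanking sequence on one side of the gene and several hundred bases on the other, which is why repeats outside the gene body are visible in the view.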
Step 3C: Examine VHL in UCSC Browser
Repeat tracks are 3’ to the gene, 5’ to the gene, or in the intronic region. This
validates our hypothesis.
ALUs are not the only family of SINEs located in the intronic regions. What
other SINE families does VHL have? What about TE classes other than
SINEs?
BLAT
In this exercise, we will use BLAT (BLAST-Like Alignment Tool) to search for the identity of a
mystery gene annotated in the human genome.
The goal of this exercise is to gain experience using BLAT and the UCSC genome browser to
answer biologically relevant questions.
BLAST v. BLAT
BLAST
✓ Can find matches to a query in any set of GenBank sequences.
✓ Not limited to a given k-mer size.
× Consumes a lot of memory.
× Slow compared to BLAT.
BLAT
× Limited to matches to a query in a particular reference genome.
× Limited to non-overlapping 11-mers for DNA.
✓ Can fit an entire genome in memory (< 1 GB of RAM).
✓ Fast compared to BLAST.
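The "non-overlapping 11-mers" entry in the comparison can be illustrated with a toy index. The sketch below stores only the non-overlapping k-mers of a reference sequence and then looks up every k-mer of a query against that index. It is a simplified illustration of the seeding idea, not BLAT's actual implementation, and it uses k=4 on a made-up sequence so the result is easy to follow. All names are hypothetical.

```python
from collections import defaultdict


def build_index(genome, k=11):
    """Map each non-overlapping k-mer of the genome to its start offsets."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):  # step by k: no overlap
        index[genome[pos:pos + k]].append(pos)
    return index


def find_seeds(query, index, k=11):
    """Look up every (overlapping) k-mer of the query in the index.

    Returns (query_offset, genome_offset) pairs for exact k-mer hits.
    """
    hits = []
    for q in range(len(query) - k + 1):
        for g in index.get(query[q:q + k], ()):
            hits.append((q, g))
    return hits


# Toy data (made up): the query occurs in the genome at offset 3.
toy_genome = "ACGTTGCAGGATCCAT"
toy_index = build_index(toy_genome, k=4)
seeds = find_seeds("TTGCAGGATC", toy_index, k=4)
```

Indexing only every k-th position keeps the index roughly k times smaller than indexing every position, which is consistent with the memory advantage listed above. Both seeds in the toy example lie on the same diagonal (genome offset minus query offset equals 3), which is the kind of evidence an aligner would then extend into a full alignment.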
Step 1A: BLAT the Mystery Sequence
Go to the following web page: http://genome.ucsc.edu/
Click BLAT
Open our mystery sequence, located below, in Notepad.
[course_directory]/07_Comparative_Genomics/data/mystery_sequence.txt
Paste the sequence into the text area.
Click Submit
Step 1B: BLAT the Mystery Sequence
Screenshot of the web form for BLAT.
Step 2A: Identify Mystery Sequence
BLAT will return a list of significant matches in the genome.
Investigate the matches in the list by clicking browser for each match.
For example, click the first browser link here.
Step 2B: Identify Mystery Sequence
The screenshot below shows UCSC and RefSeq genes aligned to the mystery sequence, in
particular CYP2A13.
Examine the other matches on the previous slide in the genome browser.
Keep in mind 2 questions: (Answers provided at the end of the document)
A. How many potential genes does the mystery sequence come from?
B. What is the relationship among these genes?
Pairwise Whole Genome Alignments
In this exercise, we will utilize the UCSC Genome Browser to view whole genome alignments
computed by LASTZ of the following genomes individually to human: orangutan, mouse, dog,
and opossum. We will investigate these alignments to see if we can discover chromosomal
rearrangements.
Step 1: Create a Custom UCSC Track
Go to the UCSC Genome Browser: http://genome.ucsc.edu/index.html
Under the My Data Tab, click Create Custom Tracks:
In the Paste URLs textbox paste the following and click submit: (no commas)
chr13 58481798 58486558
On the next page, click Go to Genome Browser
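The pasted line has three whitespace-separated fields: chromosome, start, and end. A minimal sketch of reading it follows, assuming the line is interpreted as a three-column BED record (0-based start, end not included). The function names are hypothetical.

```python
def parse_bed_line(line):
    """Read the first three BED fields: chrom, start (0-based), end (exclusive)."""
    chrom, start, end = line.split()[:3]
    start, end = int(start), int(end)
    if not 0 <= start < end:
        raise ValueError(f"invalid interval: {start}-{end}")
    return chrom, start, end


def bed_to_position(chrom, start, end):
    """Convert a BED interval to a 1-based, inclusive browser position string."""
    return f"{chrom}:{start + 1:,}-{end:,}"


chrom, start, end = parse_bed_line("chr13 58481798 58486558")
length = end - start  # half-open interval, so no +1
position = bed_to_position(chrom, start, end)
```

The "no commas" note in this step matters for the same reason the parser uses int() directly: the fields must be plain integers.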
Step 2A: Track Addition
The track should look similar to what is below:
Step 2B: Track Addition and Removal
To get ‘Pairwise Alignments’ we need to turn a few tracks on and one track off.
Specifically, we need to select:
Primate Chain/Net
Placental Chain/Net
Vertebrate Chain/Net
Underneath the Comparative Genomics Tab, turn these tracks to dense.
Additionally, set Conservation to hide and click refresh.
Step 2C: Track Addition
The resulting view should look like the figure below.
There is one problem: our species of interest are not being displayed.
Step 2D: Species Selection
To select the correct species, go back to the Comparative Genomics Tab.
Click on the Primate Chain/Net link.
In the resulting window, set Chains to hide and make sure only Orangutan is
selected.
Click Submit
Step 2E: Species Selection Continued
Conduct Step 2D for the other two tracks:
Placental Chain/Net
Vertebrate Chain/Net
Make sure your configuration resembles the screenshots below:
[Screenshots: settings pages for Placental Chain/Net and Vertebrate Chain/Net.]
Step 2F: Expand Tracks
On the tracks for each species, Right Click and select Full.
The resulting Genome Browser (after moving the tracks to the top) should look like the
following:
Step 3: Whole Genome Alignment Analysis
Investigate the tracks for each species and answer the
following questions.
A. Are the sequence counterparts co-linear with respect
to human? If not, is there evidence of genomic
rearrangements in this region? Which kind?
B. Can you infer when these rearrangements happened
evolutionarily on the diagram to the right?
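One way to think about question A is as a check on the order and orientation of aligned blocks. The sketch below is a toy classifier, not part of any UCSC tool: it takes made-up blocks of the form (human_start, other_start, strand) and reports whether they are co-linear. The function name, labels, and example coordinates are all hypothetical.

```python
def describe_region(blocks):
    """Classify aligned blocks given as (human_start, other_start, strand).

    Toy rule of thumb: any reverse-strand block suggests an inversion;
    forward-strand blocks whose order differs suggest a reordering.
    """
    ordered = sorted(blocks, key=lambda block: block[0])  # walk along human
    other_starts = [other for _, other, _ in ordered]
    if any(strand == "-" for _, _, strand in ordered):
        return "possible inversion"
    if all(x < y for x, y in zip(other_starts, other_starts[1:])):
        return "co-linear"
    return "possible transposition or other reordering"


# Hypothetical blocks for three imaginary species.
species_a = [(100, 500, "+"), (200, 600, "+"), (300, 700, "+")]
species_b = [(100, 500, "+"), (200, 900, "-"), (300, 700, "+")]
species_c = [(100, 700, "+"), (200, 500, "+")]
```

In the browser, the same reasoning is done by eye: blocks drawn in the opposite orientation or jumping to a different location in the other genome are the visual counterparts of the two non-co-linear cases above.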
Phylogeny-Based Whole Genome Alignment
In this exercise, we will utilize the UCSC Genome Browser to view a refined whole genome
alignment of orangutan, mouse, dog, and opossum genomes to human. This alignment is
produced by Multiz, a program that utilizes pairwise whole genome alignments of many
species and, using a phylogenetic tree, improves the alignment.
Step 1: Setup Multiz Visualization
Go to the UCSC Genome Browser: http://genome.ucsc.edu/index.html
Upload the following as a Custom Track and go to the genome browser, as
in the previous exercise: (no commas)
chr20 61733467 61733528
Under the Comparative Genomics tab in the genome browser, click on
Conservation.
Ensure the following settings are in place on the next 2 pages:
Step 1B: Setup Multiz Visualization
Step 1C: Setup Multiz Visualization
Once your configuration resembles the last 2 figures, click submit.
Step 2: Multiz Visualization Analysis
After rearranging tracks, the genome browser should resemble the figure
below:
Investigate the tracks for each species and answer the following
questions:
A. Is this region highly conserved in mammals?
B. Look closely at the Multiz track. Do you see anything strange in the
human sequence compared to the other species? What could be the
reason for this discrepancy?
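Question B amounts to scanning the alignment column by column for places where every other species agrees but human does not. A minimal sketch of that scan follows. The alignment rows are made up for illustration and are not the real sequences from this region; the function name is hypothetical.

```python
def human_specific_columns(alignment, reference="human"):
    """Return column indices where all other species agree and the reference differs.

    alignment maps species name to an aligned row; '-' marks a gap.
    """
    lengths = {len(row) for row in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned rows must have the same length")
    ref_row = alignment[reference]
    others = [row for name, row in alignment.items() if name != reference]
    columns = []
    for i, ref_base in enumerate(ref_row):
        seen = {row[i] for row in others}
        if len(seen) == 1 and ref_base not in seen:
            columns.append(i)
    return columns


# Made-up toy alignment: human has a gap where the other species share a base.
toy_alignment = {
    "human":     "ACGTA-GT",
    "orangutan": "ACGTACGT",
    "mouse":     "ACGTACGT",
    "dog":       "ACTTACGT",
    "opossum":   "ACGTACGT",
}
odd_columns = human_specific_columns(toy_alignment)
```

Column 2 is not reported in the toy example because the other species disagree among themselves there; only the column where human alone is the outlier is flagged.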
Intersection of Annotated Regulatory Regions in Human and Placental Mammals
In this exercise, we will use Galaxy to intersect annotated regulatory regions in
human with annotated regions in other placental mammals.
We will then view the intersection in the UCSC genome browser.
Step 1A: Place Regulatory Data in Galaxy
Connect to Galaxy : https://usegalaxy.org/
Upload the sequence of predicted regulatory regions in hg19 to Galaxy:
[course_directory]/07_Comparative_Genomics/data/PRe_Mod_hg19.bed
Make sure to identify hg19 as your reference genome.
Acquire all conserved regions in placental mammals from the UCSC Main Table Browser in Galaxy:
Step 1B: Place Regulatory Data in Galaxy
Select Comparative Genomics for Group
Select 100 Vert. El (phastConsElements100way) for table.
Select Genome for region.
Select Galaxy for send output to.
Click Get Output.
On the next screen, click Send Query to Galaxy.
Step 2: Intersect Datasets
Go to Operate on Genomic Intervals in Galaxy and select Intersect.
Select the parameters below and click Execute.
When finished, click display at UCSC in the history pane.
[Figure: UCSC results showing chr19 regulatory regions.]
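The intersect operation itself is simple to state: keep the parts of the genome covered by an interval from each dataset. The sketch below computes overlapping pieces for two small lists of BED-style intervals. It is an illustration of the idea, not the Galaxy tool's code, and the tool's actual parameters (shown only in the screenshot) may behave differently. The function name and the example intervals are made up.

```python
def intersect(first, second):
    """Overlapping pieces of two lists of (chrom, start, end) intervals.

    Intervals are half-open (BED-style). Quadratic in the input sizes,
    which is fine for a sketch but not for genome-scale data.
    """
    pieces = []
    for chrom_a, start_a, end_a in first:
        for chrom_b, start_b, end_b in second:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # non-empty overlap
                pieces.append((chrom_a, start, end))
    return sorted(pieces)


# Made-up intervals standing in for the two datasets.
regulatory = [("chr19", 100, 200), ("chr19", 500, 600)]
conserved = [("chr19", 150, 250), ("chr19", 590, 700), ("chr1", 100, 200)]
shared = intersect(regulatory, conserved)
```

For real datasets of this size, a sorted sweep or an interval tree would replace the nested loop; the quadratic version is used here only because it makes the overlap rule easy to read.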
Step 3: Predicted Modules Overlap with PAX5 Regulators
Exploratory Exercise
Pick a gene of interest. (VHL, CMYC, ETS1, TBP, USF2, GATA-1, …)
Visualize the intersected intervals in the UCSC Genome Browser.
See how this region correlates with results from ENCODE to
assess their functional roles.
We will come around to help.