Alternative Splicing Analysis Tools Through the UCSC Genome
Tutorial on UCSC & GALAXY. March 2009. Schraga Schwartz & Eddo Kim

Browsing the genome with an eye on splicing: An introduction to the UCSC genome browser, GALAXY and alternative splicing / 03.03.09

Hands-On session

Exercise 1: basic use of UCSC and of servers for finding/analyzing splicing signals

Outline: analyze the SLC35B3 gene in terms of alternative splicing; focus on exon 5, which is a cassette exon; examine the strength of the splicing signals flanking this exon.

Examining the SLC35B3 gene in the UCSC genome browser:
1. Open the UCSC genome browser (http://genome.ucsc.edu/).
2. Choose the “Genomes” tab.
3. Select the human genome and the Mar. 2006 assembly.
4. Enter “SLC35B3” into the text box (this box is very versatile and accepts text terms as well as coordinates).
5. Click on the link to the RefSeq gene, accession NM_015948. What are the two other RefSeq genes? (They are two additional alternatively spliced forms.)

Overview of SLC35B3: integrating information from different tracks
1. Scroll down to the bottom of the page and load the following tracks using the “full” option, setting all other tracks to "hide":
   Under “Genes and Gene Prediction Tracks” select UCSC Genes, Alt Events, RefSeq genes.
   Under “mRNA and EST Tracks” select Human mRNAs, Spliced ESTs.
   Under “Comparative Genomics” select the 17-Way Cons track.
   Under “Variation and Repeats” select “RepeatMasker”.
2. Classify the 11 exons of the SLC35B3 RefSeq into alternative and constitutive exons, based on evidence from the mRNA and spliced EST tracks. What types of alternative splicing can you observe among the different exons? Can you find exons that are not included in the RefSeq annotation? (Hint: you should.)
3. Compare your classifications to the annotations in the Alt Events track. Who did a better job?

Extracting Sequence of Specific Exon: extracting the sequence of the alternatively spliced exon 5 of SLC35B3
1. Click on the SLC35B3 RefSeq gene (the highlighted one).
2. Note the 100.0% identity indicated. What does it signify? (Hint: it means that when you align the RefSeq sequence against the DNA, there is a perfect match.)
3. Click on the link to the mRNA/Genomic alignment.
4. This screen contains a representation of the RefSeq cDNA, with mismatches in comparison to the genome, and a representation of the genome, with mismatches in comparison to the RefSeq. On the left side of the screen there are links to 11 blocks of consecutive alignments; these blocks correspond to exons. Make sure you understand the color coding (splice junctions, UTR, etc.).
5. Click on “Block 5” (representing exon 5). Exon 5 now appears on the upper part of the screen, in the right pane, in blue capital letters.
6. Select and copy the entire sequence of exon 5, along with 14 nucleotides of the upstream intron and 6 nucleotides of the downstream intron (you’ll also be copying the numbers, but don’t worry).
7. Getting rid of the numbers: use sequence massager (http://www.attotron.com/cybertory/analysis/seqMassager.htm) to remove the numbers, white spaces and line breaks from the sequence of exon 5. You should now have the full sequence of exon 5 (in upper case), along with 14 nucleotides of the upstream intronic sequence and 6 nucleotides of the downstream intronic sequence (in lower case). Save this sequence in a temporary text file, for your convenience.

Analysis of Splicing Signals
1. Determining the strength of the 3’ and 5’ splice sites of the SLC35B3 exon: open the webserver at http://ast.bioinfo.tau.ac.il/SpliceSiteFrame.htm, and copy-paste the 15-nt 3’ss (the first 15 nt of your sequence) and the 9-nt 5’ss (the last 9 nt of your sequence) into the server.
2. Submit and examine the results. Note that you receive two sets of results for the 5’ss, but only one for the 3’ss. What do they represent?
3. Finding regulatory sequences: open the webserver at http://rulai.cshl.edu/cgi-bin/tools/ESE3/esefinder.cgi?process=home, which contains data on experimentally identified splicing regulatory sequences. Paste the exon sequence into the server. Note the different thresholds. What do they mean? Using the default thresholds, submit and examine the results.
4. (Optional step, if you’ve got time: use the server at http://ast.bioinfo.tau.ac.il/ESR.htm to find other types of potential regulatory sequences. Compare the results with those found in step 3.)

Examining an alternative exon in the UCSC genome browser:
1. Return to the UCSC genome browser main page. Click on Blat. Paste the exonic sequence (without the intronic parts) into the box. Submit.
2. You obtain several possible alignments of your sequence against the genome. What is the difference between their score, identity and span? Which measure or combination of measures serves as an indicator for the “best” alignment?
3. Click on the “browser” link of the best alignment (i.e. the first one). The genome browser is now focused on exon 5 of the SLC35B3 gene.
4. Scroll down to the conservation track. How conserved is this exon among the 17 species?
5. Scroll back up, and click on the “zoom out x10” button. Examine the intronic region flanking the exon. Is it conserved?
6. Examine the “Repeating Elements by RepeatMasker” track. Does the exon overlap with any repetitive elements? Do its flanking introns overlap with such elements?

Exercise 2: basic use of Galaxy

Goal
Finding skipped exons originating from Alu transposable elements on human chromosome 1.

Outline of steps
Extract coordinates of alternatively skipped exons.
Extract coordinates of transposable elements.
Intersect between the two.

Steps
1. Open the Galaxy web server (http://main.g2.bx.psu.edu/).
2. On the left, click on "Get Data" → "UCSC Main table browser".
3. On the right, select genome: "Human", assembly: "Mar. 2006", group: "Genes and Gene Prediction Tracks", track: "Alt Events". Click on "Describe table schema" to view the structure of the Alt Events track. In the upper table, click on the "values" button of the "name" field to view the possible types of annotations for alternatively spliced exons. Which one do you need? (Hint: you are interested in cassetteExon.) In the browser window (of Explorer), hit "back" twice to return to the "Get Data" screen in Galaxy.
4. Make sure that "genome" is checked in "region". Create a "filter" by clicking the "create" button. In the "chrom" field enter "does match" and "chr1" (to obtain exons only from chromosome 1). In the "name" field enter "does match" and "cassetteExon". Hit "submit".
5. Make sure that the "BED" format is selected as "output format" and that "Send output to Galaxy" is checked (the BED output begins with the columns chromosome, start, end and name; strand is the sixth column).
6. Hit "get output", and on the following screen hit "Send query to Galaxy", leaving the other parameters as they are.
7. You will be returned to the main page of Galaxy. On the right you can see your request being processed. Hit "refresh" every now and then to check if it has finished.
8. Next, repeat steps 2-7 to obtain a dataset of all Alu repeating elements on chr1. A mapping of the Alu elements is found in the "Variation and Repeats" group, in the RepeatMasker track. Using "Describe table schema", identify how Alus are represented in this track and use it as a filtering option. (Hint: Alu is the name of a family of repeating elements.) In addition, ensure that you request only repeating elements on chr1. (Hint: in this track the chromosome field is called "genoName".)
9. Quality Control: click on the eye icon in each of the two processes on the right to review the results (make sure both requests have ended). Randomly select one result from each file, and enter the coordinates into the UCSC genome browser to ensure the results match your query.
10. Intersection between the two datasets: click on "Operate on Genomic Intervals" → "Intersect the intervals of two queries". Intersect the exons data with the repeats data (in this order!), using the "Overlapping Intervals" option. Select a minimum overlap threshold ("for at least" option) of 1 (this way even a 1-nt overlap between an Alu element and an alternative exon is sufficient). Hit "Execute".
11. Quality Control: click on the eye icon of the intersection process to review the results (make sure it has ended). Randomly select one result and enter the coordinates into the UCSC genome browser to ensure the results are, indeed, exons that overlap with Alus.

Exercise 3: more advanced use of Galaxy

Goal
Comparing the conservation level of intronic positions flanking alternatively spliced exons to the conservation level of intronic positions flanking constitutive exons, on chromosome 3.

Outline of steps
A. Extract the coordinates of all RefSeq exons on chromosome 3.
B. Extract the coordinates of alternatively spliced exons on chromosome 3.
C. Remove all coordinates from A that intersect with B (this leaves the constitutive exons).
D. Extract the coordinates of alternatively skipped exons (cassette exons) of RefSeq genes on chromosome 3.
E. Calculate the coordinates of the 100 bp in the downstream flanking introns of constitutive and alternatively skipped exons, and compare their conservation.

Steps
1. On the left, click on "Get Data" → "UCSC Main table browser".
2. On the right, select genome: "Human", assembly: "Mar. 2006", group: "Genes and Gene Prediction Tracks", track: "RefSeq genes".
3. Use the filter to extract data for the plus (+) strand of chromosome 3 only (see step 4 of Exercise 2, and also use the "strand" field).
4. Click on "get output" and on the next screen make sure you mark "Exons plus 0 bases…" before sending the query to Galaxy.
5. Follow step 3 of Exercise 2, with the necessary adjustments, to extract a dataset of all alternatively spliced exons on the plus strand of chromosome 3.
6. In order to create a dataset of constitutive exons, you will now remove from the dataset of all exons the coordinates that overlap with coordinates of alternative exons. To do that, use "Operate on Genomic Intervals" → "Subtract". Select alternative exons as the first dataset, and all RefSeq exons as the second one. Make sure "Intervals with no overlap" is selected, and hit Execute. You have now created a dataset of constitutive exons.
7. You will now create a dataset of cassette exons using the alternative exons dataset you created in step 5. Use "Filter and Sort" → "Filter". Select the alternative dataset as input and use "c4 == 'cassetteExon'". Execute.
8. To extract the coordinates of the 100 bp of downstream intron for each of the exons in both datasets, use "Text Manipulation" → "Compute". Use the expression "c3+100" to add a new column to each of the two datasets, defining the coordinate of the position 100 bp downstream of the exon/intron junction (Note: you must run this step twice, once for each of the two datasets). Make sure the "Round" option is set to "Yes". Because c3 is the exon end, adding 100 points into the downstream intron only for plus-strand exons, which is why step 3 restricts the data to the plus strand.
9. You now need to redefine the "start" and "end" attributes so that Galaxy will refer to the interval of the 100 bp intronic region rather than to the exon. In order to do this, click on the pencil icon on the right next to the dataset name (between the eye and X icons). Click on "Auto-detect" for Galaxy to identify the newly created column. Then, click again on the pencil icon and select the previous end column (column 3) as the start column, and the newly created column (column 7) as the end column. Repeat this step for the other dataset. For each of the two datasets, you now have the coordinates of the 100 bp flanking introns as the start/end coordinates for further manipulations.
10. To compute the conservation level of these intronic regions for both constitutive and cassette exons, use "Get Genomic Scores" → "Aggregate datapoints". Repeat this step for each of the datasets. (The Aggregate datapoints function uses the 17-way conservation track in UCSC, which contains a conservation score for each nucleotide in the genome. Galaxy computes mean, maximum and minimum conservation scores for each of the intervals supplied as input, and outputs this data as three new columns in a new dataset.)
11. You are now one step away from accomplishing the mission. In order to calculate and compare the mean conservation level of the intronic region downstream of constitutive and cassette exons, use "Statistics" → "Summary Statistics". Use this on both new datasets and enter "c8" (representing column 8, which is the column containing the mean conservation scores of the intervals) as the requested column. What are your conclusions? Compare the insights you gained over the last 20 minutes to the ones reported by Sorek and Ast, 2003, Genome Research, "Intronic sequences flanking alternatively spliced exons are conserved between human and mouse".
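
For readers who want to see the interval logic of Exercises 2 and 3 outside Galaxy, here is a minimal Python sketch of the three operations used above: intersecting exons with repeats, removing exons that overlap alternative events, and computing the 100 bp downstream flank on the plus strand. It is not Galaxy's implementation. The `Interval` type, the function names and all coordinates are hypothetical toy values chosen for illustration (everything is placed on chr3 for brevity), and coordinates are assumed to be BED-style (0-based start, exclusive end).

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """One BED-like record (assumed 0-based start, end-exclusive)."""
    chrom: str
    start: int
    end: int
    name: str
    strand: str


def overlaps(a: Interval, b: Interval, min_overlap: int = 1) -> bool:
    """True if a and b share at least min_overlap bases on the same chromosome."""
    if a.chrom != b.chrom:
        return False
    shared = min(a.end, b.end) - max(a.start, b.start)
    return shared >= min_overlap


def intersect(first, second, min_overlap=1):
    """Keep intervals of `first` that overlap any interval of `second`."""
    return [a for a in first if any(overlaps(a, b, min_overlap) for b in second)]


def subtract_overlapping(all_items, to_remove):
    """Keep intervals of `all_items` that overlap nothing in `to_remove`."""
    return [a for a in all_items if not any(overlaps(a, b) for b in to_remove)]


def downstream_flank(exon: Interval, size: int = 100) -> Interval:
    """Plus strand only: the `size` bases after the exon end (c3 to c3+size)."""
    if exon.strand != "+":
        raise ValueError("this sketch handles plus-strand exons only")
    return exon._replace(start=exon.end, end=exon.end + size)


# Hypothetical toy coordinates, not real annotations.
refseq_exons = [
    Interval("chr3", 1000, 1100, "exonA", "+"),
    Interval("chr3", 2000, 2150, "exonB", "+"),
    Interval("chr3", 3000, 3080, "exonC", "+"),
]
alt_events = [
    Interval("chr3", 2000, 2150, "cassetteExon", "+"),
    Interval("chr3", 3050, 3080, "altThreePrime", "+"),
]
alus = [Interval("chr3", 2140, 2440, "AluSx", "+")]

# Exercise 3, step 6: exons with no overlapping alternative event.
constitutive = subtract_overlapping(refseq_exons, alt_events)
# Exercise 3, step 7: the equivalent of the filter c4 == 'cassetteExon'.
cassette = [e for e in alt_events if e.name == "cassetteExon"]
# Exercise 3, steps 8-9: new start = old end, new end = old end + 100.
cassette_flanks = [downstream_flank(e) for e in cassette]
# Exercise 2, step 10: cassette exons sharing at least 1 nt with an Alu.
alu_derived = intersect(cassette, alus)

print([e.name for e in constitutive])
print([(f.start, f.end) for f in cassette_flanks])
print(len(alu_derived))
```

The all-against-all comparison is quadratic and only suitable for toy data; it is written this way to keep each operation readable. Raising an error for minus-strand exons mirrors the reason the exercise filters on the plus strand: for a minus-strand exon the downstream intron lies before the start coordinate, so "end + 100" would select the upstream intron instead.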