* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Expander6.3.1_workshop_handouts
Survey
Document related concepts
Transcript
1 Expander v6.3.1 – hands-on session: Launching Expander: Go to C:/ Expander6.3.1/ in this directory double click on Expander_2GB.bat. Before starting please do the following: 1) Create a directory named “analysis” and a directory named “data” in the Expander directory. 2) Copy the files serum.human.txt and stemCells.raw.txt from H:/classes/bioinfo/Expander/data/ into the data directory Part I: Human serum dataset analysis: In this section, we will analyze high-density oligonucleotide array data, which contains expression profiles measured in several time points after serum stimulation of human cell line. In this experiment expression levels were recorded using Affymetrix Human Genome U133 Plus 2.0 arrays. 1.1. Loading the data: We will load the gene expression data matrix and an ID conversion file. The conversion file maps each Affymetrix probe ID in the data file to its corresponding official gene ID (Entrez-Gene IDs). From the “File” menu, select “New Session” ”Expression Data” ”Tabular Data File”. In the “Load Study” dialog box, make sure the Organism is set to “Human”. Use the “Browse” buttons to load the expression data file and the ID conversion file from the following paths: Data: /data/serum.human.txt Conversion: /organisms/human/conversionFiles/ExpanderData/HGU133Plus2_affyids2Geneids.txt Set the “Data type” to “Oligonucleotide array”. Press the “OK” button. Look at the resulting box plots and identify the array that is biased toward the lower intensities. 1.2. Normalizing the data: Normalization of the data is required in order to remove systematic variation, i.e. variation arising from reasons other than biological differences between RNA samples. o From the “Visualizations” menu, select “Scatter Plots”. A dialog box will appear. o In the dialog box, select “t0” as the Base condition and “t2” as the Test condition. The option "simple plot" is selected by default. o Press the “OK” button. o Look at the resulting plot. If data was normalized, points should have been scattered around the y=x line (marked on the scatter plot). If a systematic bias exists in the data, the scatter plot is shifted from the x=y diagonal. o Right click on the bottom of the scatter plot tab ( )על הלשוניתand select “close” From the “Preprocessing” menu, select “Normalization” ”Quantile”. Press “OK” in the “Confirm operation” dialog box. A new data sheet will appear. Look at “Preprocessed Data Box-plot” in the data view (bottom right), and compare it to the “Raw Data Box-plot” (top right). 2 o From the “Visualizations” menu, select “Scatter Plots”. A dialog box will appear. o In the dialog box, select “t0” as the Base condition and “t2” as the Test condition. The option "simple plot" is selected by default. o Press the “OK” button. o Look at the resulting plot (Data Scatter 1.2). Points should now be scattered around the y=x line o Right click on the bottom of the scatter plot tab ( )על הלשוניתand select “close” 1.3. Filtering the data: From the “Preprocessing” menu, select “Filter Probes” ”Fold Change”. Set the fold-change factor to 3, leave the 'base condition' as "t0"; This will select all the probes whose expression levels were changed (up- or down-regulated) by at least a factor of 3 with respect to their expression level in the untreated sample (t0). Press “Ok”. A message box indicating the number of probes that will remain appears (750 probes)-> Click ”Yes”. 1.4. Standardizing the expression patterns: o In order to bring all expression levels to the same scale, we will apply standardization to the data. o From the “Preprocessing” menu, select “Standardization” ”Mean 0 and Variance 1”. o Look at the Y-axis values in the “Preprocessed Data Box-plots” at the right side of the data sheet (demonstrating the change in data scale after standardization). o Select the “Preprocessed Expression Matrix” tab at the right side of the data sheet to view the standardized expression values. 1.5. Clustering the data: The goal of clustering is to partition the genes into distinct sets such that genes that share common expression patterns are assigned to the same cluster, while genes assigned to different clusters should have "non-similar" expression patterns. Here we will use the CLICK algorithm to perform the clustering. From the “Unsupervised grouping” menu, select Clustering CLICK. Press “Ok” in the “CLICK” dialog box. After running the “CLICK” algorithm, the results will be displayed in a new tab. For each cluster you can see the mean pattern with 1 STD err bars as well as size and homogeneity Note the number of clusters detected as well as the average homogeneity and separation values presented at the top left pane. Click on the pattern chart of cluster #4. The genes assigned to this cluster will appear in the list on the right pane. Click on the “Gene Symbol” column header of the list to sort it according to the gene symbols. Scroll down the list to get an impression of its content. Click on a row in the list and look at the displayed pattern on the bottom (repeat this for a few probes). Select the “Cluster Matrix” tab to view the expression matrix of cluster #4. Repeat the previous steps for another cluster of your choice. 3 In the “File” menu select “Export to Text”. In the dialog box browse into the analysis directory and save the result as “Click1.1.txt”. Open the file analysis/Click1.1.txt to view its content using a text editor (note that this file can be loaded as a clustering solution using Unsupervised GroupingClusteringLoad Solution From the “Grouping” menu, select Clustering K-Means. Input the number of clusters detected by click as the expected number of clusters and press “Ok”. Browse through the K-means results. Compare values of average homogeneity and separation to the ones obtained by CLICK. o Repeat k-means run using a different number of clusters of your choice. o Browse through the K-means results. Compare values of average homogeneity and separation to the ones obtained by CLICK and previous K-Means runs. 1.6. Performing hierarchical clustering on the data: From the “Unsupervised Grouping” menu, select Hierarchical ClusteringCluster. In the dialog box, select “Average” as linkage type. Press the “OK” button. Look at the resulting display. Use the “Zoom in” and "Zoom out" buttons from tool bar to change zoom, and get a closer or more general view of the expression patterns arranged in the hierarchical clustering order. From the “Unsupervised Grouping” menu, select Hierarchical ClusteringGenerate Groups By distance. Type the number 0.4 in the distance threshold field and press OK. Browse through the resulting clustering solution. Check the mean homogeneity and separation values in the upper left panel. Click on one of the clusters in the display to open the corresponding cluster display on the right. 1.7. Saving the session: From the file menu select Save Session. /Analysis/SerumSession.zip and click OK. 1.8. the dialog box put the path In the dialog box put the path Loading the session: From the file menu select Load Session. /Analysis/SerumSession.zip and click OK. 1.9. In Identifying enriched GO categories within clusters: From the “Enrichment analysis” menu, select Functional Analysis TANGO. In the “Functional Analysis” dialog box select “CLICK 1.1” as the clustering on which analysis is to be performed. Press the “OK” button. Look at the resulting display. In which sets (clusters) has functional enrichment been detected? Open the “Enrichment Table” tab, press on the “Raw p-value” column title, to sort the results according to their statistical significance. Click on one of the columns in the enrichment diagram of cluster 2. The “Enrichment Info” dialog box will appear. In the “Enrichment info” dialog box, click on one of the Gene IDs in the table. An internet browser will open, containing information about the corresponding gene, from the Enterz-Gene site. Go back to the CLICK 1.1 solution tab. Open the content of the clusters for which a GO enrichment was identified by clicking on the corresponding row in the clusters table. The panel on the right now contains an additional tab with the title: CLICK 1.1 GO Enrich.1, which contains the GO enrichment diagram and information corresponding to the active cluster. 4 1.10 Identifying enriched pathways within clusters: From the “Enrichment Analysis” menu, select Pathway Analysis KEGG…. In the “Pathway Analysis” dialog box select “CLICK1.1” as the clustering solution on which the analysis will be performed, and set the p-value threshold to 0.05 with Bonferroni correction. Press the “OK” button. Look at the resulting display (diagrams and enrichment table). Which pathway enrichments have been detected? Click on one of the columns in the enrichment diagram. The “Enrichment Info” dialog box will appear. Click on the “Show genes in KEGG pathways” link – a web browser will open with the corresponding pathway map on which the set of enrichment inducing genes will be highlighted. 1.11 Identifying enriched promoter signals within clusters: From the “Enrichment Analysis” menu, select Promoter Analysis PRIMA In the “Promoter Analysis” dialog box select “CLICK1.1” as the clustering solution on which the analysis will be performed, and set the p-value threshold to 0.05 with Bonferroni correction. Press the “OK” button. Look at the resulting display (diagrams and enrichment table). Which transcription factor binding site enrichments have been detected? Click on one of the columns in the enrichment diagram. The “Enrichment Info” dialog box will appear. Click on the “View Binding Sites” tool button ( promoters ) to view binding site positions on the Part II: Human stem cells analysis: In this section, we will analyze a dataset published by Mueller et al. (Nature 2008) in which highdensity Illumina oligonucleotide array data was used to measure expression profiles from different cell lines, including embryonic stem cells (ESCs), neural stem cells (NSCs), differetiated cells and two placenta cancers." 2.1. Loading the data: From the “File” menu, select “New Session” ”Expression Data” “Tabular Data File”. In the “Load Study” dialog box, make sure the Organism is set to “Human”. Use the “Browse” buttons to load the expression data file from the following directory: Data: /data/ stemCells.raw.txt Select the option “Use probe Ids as gene Ids”. Make sure that the “Data type” is set to “Absolute Intensities”. Press the “Advanced” button. In the “Advanced” Dialog box select “Autofill symbols” and press “OK”. Press “OK” in the ”Load Study” dialog box. 5 2.2. Preprocessing the data: From the “Preprocessing” menu, select “Normalization” “Quantile”. From the “Preprocessing” menu, select “Filter Probes” “Variation Filter”. Set the number of genes parameter to 1000, and press “OK”. From the “Preprocessing” menu, select “Log Data”. 2.3. Using t-test statistics to detect differential expression: From the “Supervised Grouping” menu, select “Differential Expression” “t-Test”. o In the “Requested type of change” combo-box select “Differential”. o Press the “Select” button for the “Group 1 conditions”. o In the “Select Conditions” dialog box, check (mark) all groups (check boxes) except for the last one (“Cancer”) and press “OK”. o Press the “Select” button for the “Group 2 conditions”. o In the “Select Conditions” dialog box, check (mark) the “Cancer” check box’ and press “OK”. o Press “Ok”. Look at the resulting groups on the left. Click on the pattern chart of the down-regulated group. A table will appear on the right pane. It contains the relevant t-test results. Click on the p-value column title to sort the probes according to their p-values. Click on the fold-change column title, results will be sorted according to their fold-change. Select the “Expression Matrix” tab to view the expression matrix of the down-regulated probes. Optional: Repeat the previous steps for the up-regulated probes. From the “Supervised Grouping” menu, select “Differential Expression” “Save Solution”. Save the results to “results/ttest1.1.txt” Open the “results” directory and browse through it to see the content of this output file. 2.4. Operating the Matisse algorithm to detect modules (connected sub-networks whose components exhibit coherent expression profiles): From the “Unsupervised Grouping” menu, select “Network” “Matisse”. In the network file path select the file /organisms/human/interactions/ Expander.hsa.IntAct.sif. In the Matisse dialog box, press “OK”. After running the “Matisse” algorithm, the results will be displayed in a new tab. Click on the pattern chart of module #1. The genes assigned to this module will appear in the list on the right pane. Note that the back nodes (i.e. genes that do not appear in the filtered expression data, but are part of the module) are marked in pink. Click on the “Interactions” tab on the right to view the module interaction graph. Press on the tool bar button for automatic layout of the graph. Note that the back nodes (i.e. nodes that correspond to genes that do not appear in the filtered expression data, but are part of the module) are marked in pink. Select the “Expression Matrix” tab to view the expression matrix of module #1. 6 Select the “Positions” tab to view the chromosomal positions of the genes in the module. Optional: Repeat the previous steps for another module of your choice. 2.5. Saving the session From the file menu select Save Session. /Analysis/StemCellsSession.zip and click OK. 2.6. the dialog box put the path box put the path Loading the session From the file menu select Load Session. /Analysis/StemCellsSession.zip and click OK. 2.7. In In the dialog Identifying enriched chromosomal locations within the differential groups: From the “Enrichment Analysis” menu, select “Location Analysis” “Detect Enrichment”. Note that in the “Location Analysis” dialog box the “t-test 1.1” is selected as the target set on which analysis is to be performed. Uncheck the “Chromosomes” check box (leave only ”Arms” and “Bands” checked), press the “OK” button. Look at the two significantly enriched locations. Search for the corresponding colors at the top left panel. Select the “Diagrams” tab on the right and click on one of the corresponding columns. The “Enrichment Info” dialog box will appear. Press on the tool bar button (appears on top of the "Diagram" tab) to view chromosomal locations select the group in which the most highly enriched location was detected. Note that the genes in the enriched locations are colored in the same color assigned to that location. Close the chromosomal locations view. 2.8. Identifying enriched miRNA targets within the modules: From the “Enrichment Analysis” menu, select “miRNA Analysis” “FAME”. In the “miRNA Analysis” dialog box select “Matisse 2.1” as the grouping solution on which analysis is to be performed. Set the number of iterations to 500 (in practice should be at least 1000 for stability) and press the “OK” button. The analysis might take a few minutes. Look at the resulting display (diagrams and enrichment table). Which miRNA has been detected as enriched in the modules generated by Matisse? Click on one of the columns in the enrichment diagram. The “Enrichment Info” dialog box will appear. In the “Enrichment info” dialog box search for the raw p-value. 2.9. Identifying enriched KEGG pathways within the modules: 7 From the “Enrichment analysis” menu, select “Pathway Analysis” “KEGG…”. Make sure that in the “Pathway Analysis” dialog box “Matisse 2.1” is selected as the grouping solution on which analysis is to be performed. Press the “OK” button. Look at the resulting display (diagrams and enrichment table). Which pathway enrichments have been detected? Click on one of the columns in the enrichment diagram. The “Enrichment Info” dialog box will appear. 2.10. Browsing results for module 1: Go back to “Matisse 2.1” tab. Select “Module 1” from the list on the left. The entire set of results produced for module 1 will appear on the right. Browse through the results. Open the “Expression Matrix” tab to view the columns added on the left for each enrichment that has been detected. References 1. Müller FJ, Laurent LC, Kostka D, Ulitsky I, Williams R, Lu C, Park IH, Rao MS, Shamir R, Schwartz PH, Schmidt NO, Loring JF. Regulatory networks define phenotypic classes of human stem cell lines. Nature. 2008 Sep 18;455(7211):401-5. Epub 2008 Aug 24. s