Expander v6.3.1 – hands-on session:
Launching Expander:
Go to C:/Expander6.3.1/ and double-click Expander_2GB.bat in that directory.
Before starting, please do the following:
1) Create a directory named “analysis” and a directory named “data” in the Expander directory.
2) Copy the files serum.human.txt and stemCells.raw.txt from H:/classes/bioinfo/Expander/data/
into the data directory.
Part I: Human serum dataset analysis:
In this section, we will analyze high-density oligonucleotide array data containing expression
profiles measured at several time points after serum stimulation of a human cell line. In this
experiment, expression levels were recorded using Affymetrix Human Genome U133 Plus 2.0
arrays.
1.1. Loading the data:
We will load the gene expression data matrix and an ID conversion file. The conversion file maps
each Affymetrix probe ID in the data file to its corresponding official gene ID (Entrez Gene ID).
• From the “File” menu, select “New Session” → “Expression Data” → “Tabular Data File”.
• In the “Load Study” dialog box, make sure the Organism is set to “Human”. Use the “Browse”
buttons to load the expression data file and the ID conversion file from the following paths:
Data: /data/serum.human.txt
Conversion: /organisms/human/conversionFiles/ExpanderData/HGU133Plus2_affyids2Geneids.txt
• Set the “Data type” to “Oligonucleotide array”.
• Press the “OK” button.
• Look at the resulting box plots and identify the array that is biased toward the lower intensities.
1.2. Normalizing the data:
Normalization is required in order to remove systematic variation, i.e. variation arising
from sources other than biological differences between RNA samples.
o From the “Visualizations” menu, select “Scatter Plots”. A dialog box will appear.
o In the dialog box, select “t0” as the Base condition and “t2” as the Test condition. The
option “simple plot” is selected by default.
o Press the “OK” button.
o Look at the resulting plot. If the data were already normalized, the points would be scattered
around the y=x line (marked on the scatter plot). If a systematic bias exists in the data, the
scatter plot is shifted away from the y=x diagonal.
o Right-click on the tab at the bottom of the scatter plot and select “close”.
• From the “Preprocessing” menu, select “Normalization” → “Quantile”.
• Press “OK” in the “Confirm operation” dialog box. A new data sheet will appear.
• Look at the “Preprocessed Data Box-plot” in the data view (bottom right), and compare it to the
“Raw Data Box-plot” (top right).
o From the “Visualizations” menu, select “Scatter Plots”. A dialog box will appear.
o In the dialog box, select “t0” as the Base condition and “t2” as the Test condition. The option
“simple plot” is selected by default.
o Press the “OK” button.
o Look at the resulting plot (Data Scatter 1.2). The points should now be scattered around the
y=x line.
o Right-click on the tab at the bottom of the scatter plot and select “close”.
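To make the effect of the “Quantile” option concrete, here is a minimal sketch of textbook quantile normalization in plain Python. It is not taken from Expander, whose implementation may differ (for example in how ties are handled), and the small matrix is made-up illustration data.

```python
from statistics import mean


def quantile_normalize(matrix):
    """Quantile-normalize a probes x arrays matrix given as a list of rows.

    Each array (column) is sorted, the mean of the k-th smallest values
    across arrays becomes the reference distribution, and every value is
    replaced by the reference value of its rank. Ties keep row order.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution shared by all arrays after normalization.
    reference = [mean(sc[k] for sc in sorted_cols) for k in range(n_rows)]
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized


# Hypothetical intensities: 4 probes (rows) measured on 3 arrays (columns).
raw = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(raw)
for row in normalized:
    print(row)
```

After this transformation every array has exactly the same set of values, which is why the box plots of the preprocessed data line up and the scatter plot falls around the y=x line.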
1.3. Filtering the data:
• From the “Preprocessing” menu, select “Filter Probes” → “Fold Change”.
• Set the fold-change factor to 3 and leave the “base condition” as “t0”. This will select all the
probes whose expression levels changed (up- or down-regulated) by at least a factor of 3 with
respect to their expression level in the untreated sample (t0). Press “OK”.
• A message box will appear, indicating the number of probes that will remain (750 probes). Click
“Yes”.
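The fold-change criterion can be sketched as follows. This is an illustration only: it assumes positive (non-log) intensities and assumes that a probe is kept if at least one condition differs from the base condition by the given factor, which may not match Expander's exact rule. The probe names and values are hypothetical.

```python
def passes_fold_change(values, base_index=0, factor=3.0):
    """Return True if any condition differs from the base condition
    by at least `factor`, in either direction (up or down)."""
    base = values[base_index]
    for i, value in enumerate(values):
        if i == base_index:
            continue
        ratio = value / base
        if ratio >= factor or ratio <= 1.0 / factor:
            return True
    return False


# Hypothetical intensities for three probes at t0, t1 and t2.
probes = {
    "probe_up": [100.0, 150.0, 320.0],    # 3.2-fold increase at t2
    "probe_down": [900.0, 280.0, 600.0],  # more than 3-fold decrease at t1
    "probe_flat": [200.0, 250.0, 180.0],  # never changes 3-fold
}
kept = [name for name, values in probes.items() if passes_fold_change(values)]
print(kept)
```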
1.4. Standardizing the expression patterns:
o In order to bring all expression levels to the same scale, we will apply standardization to the
data.
o From the “Preprocessing” menu, select “Standardization” → “Mean 0 and Variance 1”.
o Look at the Y-axis values in the “Preprocessed Data Box-plots” at the right side of the data
sheet (demonstrating the change in data scale after standardization).
o Select the “Preprocessed Expression Matrix” tab at the right side of the data sheet to view the
standardized expression values.
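As a rough illustration of what “Mean 0 and Variance 1” does to a single probe, the sketch below rescales one expression pattern across its conditions. Using the population standard deviation is an assumption here, and Expander may use the sample version instead. The values are made up.

```python
from statistics import mean, pstdev


def standardize(values):
    """Shift a pattern to mean 0 and scale it to variance 1."""
    m = mean(values)
    sd = pstdev(values)  # population standard deviation (assumption)
    if sd == 0:
        # A completely flat pattern has no shape to preserve.
        return [0.0 for _ in values]
    return [(v - m) / sd for v in values]


# Two hypothetical probes with the same shape but very different scales.
low = standardize([2.0, 4.0, 6.0])
high = standardize([200.0, 400.0, 600.0])
print(low)
print(high)
```

Both probes end up with the same standardized pattern, which is the point of this step: clustering then compares the shapes of the patterns and not their absolute levels.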
1.5. Clustering the data:
The goal of clustering is to partition the genes into distinct sets such that genes that share common
expression patterns are assigned to the same cluster, while genes assigned to different clusters
have dissimilar expression patterns. Here we will use the CLICK algorithm to perform
the clustering.
• From the “Unsupervised Grouping” menu, select Clustering → CLICK. Press “OK” in the
“CLICK” dialog box.
• After the “CLICK” algorithm has run, the results will be displayed in a new tab. For each
cluster you can see the mean pattern with ±1 STD error bars, as well as the cluster size and homogeneity.
• Note the number of clusters detected, as well as the average homogeneity and separation values
presented in the top left pane.
• Click on the pattern chart of cluster #4. The genes assigned to this cluster will appear in the list
in the right pane.
• Click on the “Gene Symbol” column header of the list to sort it according to the gene symbols.
Scroll down the list to get an impression of its content.
• Click on a row in the list and look at the pattern displayed at the bottom (repeat this for a few
probes).
• Select the “Cluster Matrix” tab to view the expression matrix of cluster #4.
• Repeat the previous steps for another cluster of your choice.
• In the “File” menu select “Export to Text”. In the dialog box browse to the analysis directory
and save the result as “Click1.1.txt”.
• Open the file analysis/Click1.1.txt in a text editor to view its content (note that this file can
be loaded as a clustering solution using Unsupervised Grouping → Clustering → Load Solution).
• From the “Unsupervised Grouping” menu, select Clustering → K-Means. Enter the number of clusters
detected by CLICK as the expected number of clusters and press “OK”.
• Browse through the K-Means results. Compare the average homogeneity and separation values
to the ones obtained by CLICK.
o Repeat the K-Means run using a different number of clusters of your choice.
o Browse through the K-Means results. Compare the average homogeneity and separation values
to the ones obtained by CLICK and by previous K-Means runs.
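When comparing solutions, it helps to have an idea of what homogeneity and separation measure. The sketch below uses one common definition, which is an assumption and not necessarily Expander's exact formula: homogeneity is the average Pearson correlation between each pattern and the mean pattern of its cluster, and separation is the size-weighted average correlation between the mean patterns of different clusters. The patterns are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long patterns."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def centroid(patterns):
    """Mean pattern of a cluster."""
    return [sum(col) / len(patterns) for col in zip(*patterns)]


def homogeneity(clusters):
    """Average similarity of each pattern to its own cluster's mean pattern."""
    scores = [pearson(p, centroid(c)) for c in clusters for p in c]
    return sum(scores) / len(scores)


def separation(clusters):
    """Size-weighted average similarity between mean patterns of different clusters."""
    total = 0.0
    weight = 0
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            w = len(clusters[i]) * len(clusters[j])
            total += w * pearson(centroid(clusters[i]), centroid(clusters[j]))
            weight += w
    return total / weight


# Hypothetical clustering solution: one rising and one falling cluster.
rising = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 9.0]]
falling = [[4.0, 3.0, 2.0, 1.0], [9.0, 5.0, 4.0, 1.0]]
clusters = [rising, falling]
print(homogeneity(clusters), separation(clusters))
```

Under this definition a good solution has high homogeneity (members resemble their cluster mean) and low separation (cluster means do not resemble each other), so the two values have to be read together when comparing CLICK and K-Means.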
1.6. Performing hierarchical clustering on the data:
• From the “Unsupervised Grouping” menu, select Hierarchical Clustering → Cluster.
• In the dialog box, select “Average” as the linkage type. Press the “OK” button.
• Look at the resulting display.
• Use the “Zoom in” and “Zoom out” buttons on the tool bar to change the zoom level and get a closer or
more general view of the expression patterns arranged in the hierarchical clustering order.
• From the “Unsupervised Grouping” menu, select Hierarchical Clustering → Generate
Groups By distance. Type the number 0.4 in the distance threshold field and press OK.
• Browse through the resulting clustering solution. Check the mean homogeneity and separation
values in the upper left panel.
• Click on one of the clusters in the display to open the corresponding cluster display on the
right.
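The two operations above (average-linkage clustering, then generating groups by a distance threshold) can be sketched together in a few lines. This is a naive illustration, not Expander's code: it assumes that the distance between two patterns is 1 minus their Pearson correlation, and it stops merging once the closest pair of clusters is farther apart than the threshold. For average linkage this gives the same groups as cutting the dendrogram at that distance. The patterns are made up.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long patterns."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def distance(x, y):
    # Assumed dissimilarity: 0 for identical shapes, 2 for opposite shapes.
    return 1.0 - pearson(x, y)


def average_linkage(a, b, patterns):
    """Mean pairwise distance between the members of clusters a and b."""
    total = sum(distance(patterns[i], patterns[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def cluster_by_distance(patterns, threshold):
    """Merge the closest clusters until none are within `threshold`."""
    clusters = [[i] for i in range(len(patterns))]
    while len(clusters) > 1:
        candidates = (
            ((p, q), average_linkage(clusters[p], clusters[q], patterns))
            for p, q in combinations(range(len(clusters)), 2)
        )
        (p, q), best = min(candidates, key=lambda item: item[1])
        if best > threshold:
            break
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical patterns: two rising, two falling and one oscillating.
patterns = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 5.0],
    [4.0, 3.0, 2.0, 1.0],
    [5.0, 3.0, 2.0, 1.0],
    [1.0, 3.0, 1.0, 3.0],
]
groups = cluster_by_distance(patterns, threshold=0.4)
print(groups)  # -> [[0, 1], [2, 3], [4]]
```

Raising the threshold merges more patterns into fewer, looser groups, and lowering it yields more, tighter groups. That is the trade-off to keep in mind when checking the homogeneity and separation values of the generated solution.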
1.7. Saving the session:
From the file menu select Save Session. In the dialog box put the path
/Analysis/SerumSession.zip and click OK.
1.8. Loading the session:
From the file menu select Load Session. In the dialog box put the path
/Analysis/SerumSession.zip and click OK.
1.9. Identifying enriched GO categories within clusters:
• From the “Enrichment Analysis” menu, select Functional Analysis → TANGO. In the
“Functional Analysis” dialog box select “CLICK 1.1” as the clustering on which the analysis is to
be performed. Press the “OK” button.
• Look at the resulting display. In which sets (clusters) has functional enrichment been detected?
• Open the “Enrichment Table” tab and click on the “Raw p-value” column title to sort the results
according to their statistical significance.
• Click on one of the columns in the enrichment diagram of cluster 2. The “Enrichment Info”
dialog box will appear.
• In the “Enrichment Info” dialog box, click on one of the Gene IDs in the table. An internet
browser will open, containing information about the corresponding gene from the Entrez Gene
site.
• Go back to the CLICK 1.1 solution tab. Open the content of the clusters for which a GO
enrichment was identified by clicking on the corresponding row in the clusters table. The panel
on the right now contains an additional tab with the title CLICK 1.1 GO Enrich.1, which
contains the GO enrichment diagram and information corresponding to the active cluster.
1.10. Identifying enriched pathways within clusters:
• From the “Enrichment Analysis” menu, select Pathway Analysis → KEGG….
• In the “Pathway Analysis” dialog box select “CLICK 1.1” as the clustering solution on which
the analysis will be performed, and set the p-value threshold to 0.05 with Bonferroni
correction. Press the “OK” button.
• Look at the resulting display (diagrams and enrichment table). Which pathway enrichments
have been detected?
• Click on one of the columns in the enrichment diagram. The “Enrichment Info” dialog box will
appear. Click on the “Show genes in KEGG pathways” link. A web browser will open with the
corresponding pathway map, on which the set of enrichment-inducing genes will be
highlighted.
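The sketch below illustrates how a p-value threshold of 0.05 with Bonferroni correction behaves when several pathways are tested. It assumes the enrichment p-value comes from a hypergeometric tail, which is a common choice for this kind of analysis but is not confirmed here as Expander's exact statistic. The gene counts, pathway names and p-values are hypothetical.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` belong to the pathway."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def bonferroni_significant(p_values, alpha=0.05):
    """Compare each raw p-value with alpha divided by the number of tests."""
    cutoff = alpha / len(p_values)
    return {name: p <= cutoff for name, p in p_values.items()}


# Hypothetical cluster: 5 genes out of a background of 20, 3 of them in a
# pathway that contains 5 background genes in total.
p = hypergeom_pvalue(population=20, annotated=5, sample=5, hits=3)
print(round(p, 4))

# Hypothetical raw p-values for three tested pathways (cutoff = 0.05 / 3).
raw_p = {"pathway_a": 0.001, "pathway_b": 0.02, "pathway_c": 0.5}
print(bonferroni_significant(raw_p))
```

Note that pathway_b would pass an uncorrected 0.05 threshold but fails after correction, because the cutoff shrinks as the number of tested pathways grows.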
1.11. Identifying enriched promoter signals within clusters:
• From the “Enrichment Analysis” menu, select Promoter Analysis → PRIMA.
• In the “Promoter Analysis” dialog box select “CLICK 1.1” as the clustering solution on which
the analysis will be performed, and set the p-value threshold to 0.05 with Bonferroni
correction. Press the “OK” button.
• Look at the resulting display (diagrams and enrichment table). Which transcription factor
binding site enrichments have been detected?
• Click on one of the columns in the enrichment diagram. The “Enrichment Info” dialog box will
appear.
• Click on the “View Binding Sites” tool button to view binding site positions on the
promoters.
Part II: Human stem cells analysis:
In this section, we will analyze a dataset published by Müller et al. (Nature 2008), in which
high-density Illumina oligonucleotide arrays were used to measure expression profiles from different
cell lines, including embryonic stem cells (ESCs), neural stem cells (NSCs), differentiated cells and
two placenta cancers.
2.1. Loading the data:
• From the “File” menu, select “New Session” → “Expression Data” → “Tabular Data File”.
• In the “Load Study” dialog box, make sure the Organism is set to “Human”. Use the “Browse”
button to load the expression data file from the following path:
Data: /data/stemCells.raw.txt
• Select the option “Use probe Ids as gene Ids”.
• Make sure that the “Data type” is set to “Absolute Intensities”.
• Press the “Advanced” button.
• In the “Advanced” dialog box select “Autofill symbols” and press “OK”.
• Press “OK” in the “Load Study” dialog box.
2.2. Preprocessing the data:
• From the “Preprocessing” menu, select “Normalization” → “Quantile”.
• From the “Preprocessing” menu, select “Filter Probes” → “Variation Filter”.
Set the number of genes parameter to 1000, and press “OK”.
• From the “Preprocessing” menu, select “Log Data”.
2.3. Using t-test statistics to detect differential expression:
• From the “Supervised Grouping” menu, select “Differential Expression” → “t-Test”.
o In the “Requested type of change” combo box select “Differential”.
o Press the “Select” button for the “Group 1 conditions”.
o In the “Select Conditions” dialog box, check all groups (check boxes) except for
the last one (“Cancer”) and press “OK”.
o Press the “Select” button for the “Group 2 conditions”.
o In the “Select Conditions” dialog box, check the “Cancer” check box and press
“OK”.
o Press “OK”.
• Look at the resulting groups on the left.
• Click on the pattern chart of the down-regulated group. A table containing the relevant t-test
results will appear in the right pane.
• Click on the p-value column title to sort the probes according to their p-values.
• Click on the fold-change column title to sort the results according to their fold change.
• Select the “Expression Matrix” tab to view the expression matrix of the down-regulated
probes.
• Optional: Repeat the previous steps for the up-regulated probes.
• From the “Supervised Grouping” menu, select “Differential Expression” → “Save Solution”.
• Save the results to “results/ttest1.1.txt”.
• Open the “results” directory and browse through it to see the content of this output file.
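For orientation, the quantities behind the t-test table can be sketched for a single probe as follows. The sketch uses Welch's unequal-variance form of the t statistic, which is an assumption (the handout does not say which variant Expander applies), and it works on log-scale values, as produced by the “Log Data” step. The values are made up.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group1, group2):
    """Welch's t statistic for the difference between two group means."""
    m1, m2 = mean(group1), mean(group2)
    v1, v2 = variance(group1), variance(group2)  # sample variances
    standard_error = sqrt(v1 / len(group1) + v2 / len(group2))
    return (m1 - m2) / standard_error


# Hypothetical log2 expression values of one probe.
other = [8.0, 8.2, 7.9, 8.1]  # Group 1: all conditions except "Cancer"
cancer = [6.0, 6.3, 5.9]      # Group 2: "Cancer"

t = welch_t(other, cancer)
# On a log scale, the difference of the means is the log fold change.
log_fold_change = mean(other) - mean(cancer)
print(t, log_fold_change)
```

A large positive t here means the probe is lower in the cancer samples than in the other groups. Turning t into a p-value requires the t distribution, which the Python standard library does not provide. SciPy's `scipy.stats.ttest_ind(other, cancer, equal_var=False)` is one way to obtain it.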
2.4. Running the Matisse algorithm to detect modules (connected
sub-networks whose components exhibit coherent expression
profiles):
• From the “Unsupervised Grouping” menu, select “Network” → “Matisse”.
• In the network file path select the file /organisms/human/interactions/Expander.hsa.IntAct.sif.
• In the Matisse dialog box, press “OK”.
• After the “Matisse” algorithm has run, the results will be displayed in a new tab.
• Click on the pattern chart of module #1. The genes assigned to this module will appear in the
list in the right pane. Note that the back nodes (i.e. genes that do not appear in the filtered
expression data, but are part of the module) are marked in pink.
• Click on the “Interactions” tab on the right to view the module interaction graph. Press the
tool bar button for automatic layout of the graph. Note that the back nodes (i.e. nodes that
correspond to genes that do not appear in the filtered expression data, but are part of the
module) are marked in pink.
• Select the “Expression Matrix” tab to view the expression matrix of module #1.
• Select the “Positions” tab to view the chromosomal positions of the genes in the module.
• Optional: Repeat the previous steps for another module of your choice.
2.5. Saving the session
From the file menu select Save Session. In the dialog box put the path
/Analysis/StemCellsSession.zip and click OK.
2.6. Loading the session
From the file menu select Load Session. In the dialog box put the path
/Analysis/StemCellsSession.zip and click OK.
2.7. Identifying enriched chromosomal locations within the
differential groups:
• From the “Enrichment Analysis” menu, select “Location Analysis” → “Detect Enrichment”.
Note that in the “Location Analysis” dialog box “t-test 1.1” is selected as the target set on
which the analysis is to be performed. Uncheck the “Chromosomes” check box (leave only
“Arms” and “Bands” checked) and press the “OK” button.
• Look at the two significantly enriched locations. Search for the corresponding colors in the top
left panel.
• Select the “Diagrams” tab on the right and click on one of the corresponding columns. The
“Enrichment Info” dialog box will appear.
• Press the tool bar button (at the top of the “Diagrams” tab) to view chromosomal locations, and
select the group in which the most highly enriched location was detected. Note that the genes in the
enriched locations are shown in the color assigned to that location. Close the chromosomal
locations view.
2.8. Identifying enriched miRNA targets within the modules:
• From the “Enrichment Analysis” menu, select “miRNA Analysis” → “FAME”. In the
“miRNA Analysis” dialog box select “Matisse 2.1” as the grouping solution on which the analysis
is to be performed. Set the number of iterations to 500 (in practice, use at least 1000 for
stability) and press the “OK” button.
• The analysis might take a few minutes.
• Look at the resulting display (diagrams and enrichment table). Which miRNA has been
detected as enriched in the modules generated by Matisse?
• Click on one of the columns in the enrichment diagram. The “Enrichment Info” dialog box will
appear.
• In the “Enrichment Info” dialog box search for the raw p-value.
2.9. Identifying enriched KEGG pathways within the modules:
• From the “Enrichment Analysis” menu, select “Pathway Analysis” → “KEGG…”. Make sure
that in the “Pathway Analysis” dialog box “Matisse 2.1” is selected as the grouping solution on
which the analysis is to be performed. Press the “OK” button.
• Look at the resulting display (diagrams and enrichment table). Which pathway enrichments
have been detected?
• Click on one of the columns in the enrichment diagram. The “Enrichment Info” dialog box will
appear.
2.10. Browsing results for module 1:
• Go back to the “Matisse 2.1” tab.
• Select “Module 1” from the list on the left. The entire set of results produced for module 1 will
appear on the right. Browse through the results.
• Open the “Expression Matrix” tab to view the columns added on the left for each enrichment
that has been detected.
References
1. Müller FJ, Laurent LC, Kostka D, Ulitsky I, Williams R, Lu C, Park IH, Rao MS, Shamir R,
Schwartz PH, Schmidt NO, Loring JF. Regulatory networks define phenotypic classes of human stem
cell lines. Nature. 2008 Sep 18;455(7211):401-5. Epub 2008 Aug 24.