Affymetrix® Data Mining Tool User’s Guide
Version 3.0
For Research Use Only.
Not for use in diagnostic procedures.
Affymetrix Confidential
700233 Rev. 3
Trademarks
Affymetrix®, GeneChip®, HuSNP™, GenFlex™, Jaguar™, EASI™, MicroDB™, 417™,
418™, 427™, 428™, Pin-and-Ring™, Flying Objective™, NetAffx™ and CustomExpress™
are trademarks owned or used by Affymetrix, Inc.
Microsoft® is a registered trademark of Microsoft Corporation.
Oracle® is a registered trademark of Oracle Corporation.
Limited License
PROBE ARRAYS, INSTRUMENTS, SOFTWARE AND REAGENTS ARE
LICENSED FOR RESEARCH USE ONLY AND NOT FOR USE IN
DIAGNOSTIC PROCEDURES. NO RIGHT TO MAKE, HAVE MADE, OFFER
TO SELL, SELL, OR IMPORT OLIGONUCLEOTIDE PROBE ARRAYS OR
ANY OTHER PRODUCT IN WHICH AFFYMETRIX HAS PATENT RIGHTS
IS CONVEYED BY THE SALE OF PROBE ARRAYS, INSTRUMENTS,
SOFTWARE, OR REAGENTS HEREUNDER. THIS LIMITED LICENSE
PERMITS ONLY THE USE OF THE PARTICULAR PRODUCT(S) THAT THE
USER HAS PURCHASED FROM AFFYMETRIX.
Patents
Software products may be covered by one or more of the following patents:
U.S. Patent Nos. 5,733,729; 5,795,716; 5,974,164; 6,066,454; 6,090,555;
6,185,561 and 6,188,783; and other U.S. or foreign patents.
Copyright
©1999, 2001 Affymetrix, Inc. All rights reserved.
Contents
CHAPTER 1: Welcome
Data Mining Tool User’s Guide
What’s New in DMT 3.0
Conventions Used
On-line Documentation
Technical Support
Your Feedback is Welcome
CHAPTER 2: Installing Data Mining Tool 3.0
Before You Begin
Microsoft® SQL Server LIMS Users
Oracle® LIMS Users
MicroDB™ Users
Installing Data Mining Tool
Creating an Oracle® Alias
Oracle 8.1.7 Alias Configuration
CHAPTER 3: Affymetrix® Data Mining Tool Overview
Access Data
Affymetrix Publish Database
Affymetrix® Analysis Data Model
DMT Windowpanes
Query Data
Building and Running a Query
Viewing Query Results
Tables
Graphs
Analyze Query Results
Statistical Analyses
Cluster Analysis
Matrix Analysis
CHAPTER 4: Getting Started
Starting DMT
Managing Database Connections
Registering a Database
Unregistering a Database
Selecting a Database
Specifying the Default Directory
CHAPTER 5: Building and Running a Query
Building a Query
Starting a New Query
Specifying the Filters
Query Builder
Selecting Analyses for the Query
Specifying Analysis Filters
Running a Query
Normalizing GeneChip® Signal Data
Choosing Normalization Before a Query or Pivot
Choosing Normalization After a Query or Pivot
Normalization Options
CHAPTER 6: Managing Queries
Saving a Query
Using the Save As Command
Opening a Previously Saved Query
Deleting a Query
CHAPTER 7: Query Results Tables
Experiment Information Table
GeneChip® Data Mode
Spot Data Mode
Query Table
Pivot Data Table
Selecting Results for the Pivot Table
Running the Pivot Operation
Including Probe Descriptions in the Pivot Table
Including Annotations in the Pivot Table
Sorting Pivot Table Columns
Pivot Options
Working with Tables
Finding Probes
Viewing Descriptions & Obtaining Further Gene Information
Annotating Probes
Adding Probes to the Filter Grid
Copying Tables
Exporting Data
Expanding the Results Pane
Clearing the Results Pane
CHAPTER 8: Annotations
Annotating Probes
Loading Annotations
Querying Annotations
Adding Probes to the Filter Grid
Deleting Annotations
CHAPTER 9: Probe Lists
Creating Probe Lists
Creating a Probe List from the Query or Pivot Table
Creating a Probe List from Cluster Analysis
Creating a Probe List from Search Array Descriptions
Creating a Probe List from Filter
Creating a Probe List by Combining Existing Lists
Loading a Probe List
Specifying Probe List Members
Specifying an Input File
Using Probe Lists
Adding a Probe List to the Filter Grid
Displaying Selected Probe List Members
Managing Probe Lists
Viewing and Editing Probe List Members
Combining Probe Lists
Exporting a Probe List
Deleting a Probe List
CHAPTER 10: Array Sets
Creating an Array Set
Working with Array Sets
Viewing Array Sets
Managing Array Sets
Editing an Array Set
Deleting an Array Set
CHAPTER 11: Graphing Results
Scatter Graph
Plotting the Scatter Graph
Working with the Scatter Graph
Scatter Graph Options
Fold Change Graph
Plotting the Fold Change Graph
Working with the Fold Change Graph
Fold Change Graph Options
Series Graph
Plotting the Series Graph
Working with the Series Graph
Series Graph Options
Histogram
Plotting the Histogram
Working with the Histogram
Histogram Options
Other Graphing Features
Enlarging the Graph Pane
Changing Graph Colors
Copying and Clearing Graphs
Printing Graphs
CHAPTER 12: Statistical Analyses
Selecting an Operator
Average, Median, Standard Deviation or Inter-Quartile Range
Fold Change
T-Test
Mann-Whitney Test
Count & Percentage
CHAPTER 13: Matrix Analysis
Overview
Population Size
Running a Matrix Analysis
CHAPTER 14: Cluster Analysis
Self Organizing Map (SOM) Algorithm
Running a SOM Cluster Analysis
Saving a Probe List
SOM Filters
SOM Parameters
Correlation Coefficient Clustering Algorithm
Running the Correlation Coefficient Cluster
Correlation Coefficient Clustering Options
Effect of Changing Algorithm Parameters
Saving and Importing Seed Patterns
Saving a Probe List
CHAPTER 15: DMT Tutorial
Introduction
Step 1: Restoring the MicroDB™ Database
Step 2: Starting DMT
Step 3: Registering the Database
Step 4: Selecting the Tutorial Database
Step 5: Opening the DMT Session
Lesson 1: Identifying Highly Expressed Genes
Step 1: Specifying a Filter
Step 2: Selecting Analyses for the Query
Step 3: Pivoting on Signal & Detection Call
Step 4: Querying and Pivoting the Data
Step 5: Sorting the Pivot Table by Signal
Step 6: Saving a Probe List
Step 7: Plotting the Series Line Graph
Lesson 1 Summary
Suggested Exercise
Lesson 2: Calculating Averages of Replicates
Step 1: Specifying a Probe List for the Filter
Step 2: Selecting Analyses for the Query
Step 3: Pivoting on Signal
Step 4: Query and Pivot the Data
Step 5: Selecting Average & Standard Deviation Operators
Step 6: Sorting the Pivot Table
Step 7: Displaying Probe Set Descriptions
Lesson 2 Summary
Suggested Exercise
Lesson 3: Summarizing Qualitative Data
Step 1: Pivoting on Detection Call
Step 2: Performing Count & Percentage Analysis
Step 3: Sorting Pivot Table Results
Step 4: Saving a Probe List
Step 5: Annotating Probe List Members
Lesson 3 Summary
Suggested Exercise
Lesson 4: Evaluating Difference Between Two Tissues
Step 1: Pivoting on Signal
Step 2: Mann-Whitney Test
Step 3: Annotating Probe Sets
Step 4: Saving a Probe List
Lesson 4 Summary
Suggested Exercise
Lesson 5: Evaluating Change Call Consistency
Step 1: Clearing the Filter Grid & Selecting Comparison Analyses
Step 2: Pivoting on Difference Call
Step 3: Comparison Ranking
Step 4: Annotating Probe Sets
Step 5: Saving a Probe List
Lesson 5 Summary
Suggested Exercise
Lesson 6: Self Organizing Map (SOM) Cluster Analysis
Step 1: Clearing the Filter Grid & Selecting Analyses
Step 2: Pivoting on Signal
Step 3: Computing Average Signal
Step 4: SOM Cluster Analysis
Step 5: Saving & Annotating a Probe List
Lesson 6 Summary
APPENDIX A: Filter Grid
GeneChip Data Mode
Statistical Expression Algorithm
Empirical Expression Algorithm
Spot Data Mode
APPENDIX B: Working with Windows & Tables
Query Windowpanes
Expanding a Windowpane
Resizing a Windowpane
Clearing the Results or Graph Pane
Tables
Selecting the Entire Table
Selecting Rows
Resizing Columns
Hiding Columns
Reordering Columns
APPENDIX C: Query Table Data
GeneChip® Data Mode
Statistical Expression Algorithm Metrics
Empirical Expression Algorithm Metrics
Spot Data Mode
APPENDIX D: DMT Algorithms
The SOM Algorithm
Neighborhood
Learning Rate
The Correlation Coefficient Clustering Algorithm
The Matrix Algorithm
APPENDIX E: Toolbars & Shortcuts
DMT Main Toolbar
Session Toolbar
Shortcut Descriptions
Chapter 1
Welcome
Welcome to the Affymetrix® Data Mining Tool (DMT) User’s Guide.
The DMT filters, queries and analyzes publish databases of
GeneChip® or spotted array expression data.
Data Mining Tool User’s Guide
This manual explains how to use DMT to:
■
Build a query.
■
Display the query results in table or graph format.
■
Evaluate and compare replicate data using statistical analyses.
■
Calculate the overlap significance between two lists of GeneChip® probe
sets or spot probes.
■
Apply cluster analysis to experimental results to help identify gene
expression patterns.
This manual also includes a tutorial that demonstrates: 1) a data mining
strategy to identify genes that significantly change expression level,
2) statistical analyses of replicate data, and 3) cluster analysis.
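Overlap significance asks whether two probe lists share more members than chance would predict. DMT’s own Matrix algorithm is described in Appendix D and is not reproduced here; as a rough illustration only, the sketch below scores an overlap with the upper tail of the hypergeometric distribution, a common choice for this kind of question. It assumes both lists are drawn from one population of probes (for example, all probe sets on one array type), and the function name is hypothetical.

```python
from math import comb


def overlap_p_value(population: int, size_a: int, size_b: int, overlap: int) -> float:
    """Chance of at least `overlap` shared probes between two random lists.

    Illustrative hypergeometric upper tail; DMT's Matrix algorithm
    (Appendix D) may compute significance differently.
    population: number of probes both lists are drawn from (assumed)
    size_a, size_b: number of probes in each list
    overlap: number of probes found in both lists
    """
    if max(size_a, size_b) > population:
        raise ValueError("a list cannot be larger than the population")
    if not 0 <= overlap <= min(size_a, size_b):
        raise ValueError("overlap must lie between 0 and the smaller list size")
    total = comb(population, size_b)
    # Sum P(X = k) for every overlap k at least as large as the observed one.
    tail = sum(
        comb(size_a, k) * comb(population - size_a, size_b - k)
        for k in range(overlap, min(size_a, size_b) + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Two 5-probe lists drawn from 10 probes that share all 5 members.
    print(overlap_p_value(10, 5, 5, 5))
```

For two 5-probe lists drawn from 10 probes that share all 5 members, the function returns 1/252 (about 0.004), the chance of a complete overlap arising at random.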
What’s New in DMT 3.0
Compatible with Microarray Suite Statistical or Empirical Expression Algorithm
DMT can query and analyze experimental results generated by the Statistical
Expression algorithm (in Microarray Suite 5.0) as well as the Empirical
Expression algorithm (in versions of Microarray Suite prior to 5.0).
The filter includes both Statistical and Empirical metrics, so a single query
can specify both types of metrics, combined in “OR” fashion.
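As an illustration of the OR behavior, the sketch below keeps a probe record when it satisfies either a Statistical-metric limit or an Empirical-metric limit. The metric names (Signal and Detection for the Statistical algorithm, Avg Diff and Abs Call for the Empirical algorithm), the record layout, the present call code "P" and the 500 threshold are assumptions chosen for the example, not DMT’s internal representation.

```python
from typing import Callable, Dict, List

# Hypothetical record layout: one dict per probe set result.
Record = Dict[str, object]


def statistical_ok(rec: Record) -> bool:
    # Assumed Statistical (Microarray Suite 5.0) metrics: Signal and Detection.
    return rec.get("Detection") == "P" and float(rec.get("Signal", 0.0)) >= 500.0


def empirical_ok(rec: Record) -> bool:
    # Assumed Empirical (pre-5.0) metrics: Avg Diff and Abs Call.
    return rec.get("Abs Call") == "P" and float(rec.get("Avg Diff", 0.0)) >= 500.0


def or_filter(records: List[Record], criteria: List[Callable[[Record], bool]]) -> List[str]:
    """Return probe names whose record meets at least one criterion."""
    return [str(r["probe"]) for r in records if any(test(r) for test in criteria)]


records: List[Record] = [
    {"probe": "A", "Signal": 820.0, "Detection": "P"},   # Statistical result
    {"probe": "B", "Avg Diff": 640.0, "Abs Call": "P"},  # Empirical result
    {"probe": "C", "Signal": 120.0, "Detection": "A"},   # fails both limits
]

if __name__ == "__main__":
    print(or_filter(records, [statistical_ok, empirical_ok]))
```

Records A and B pass, each through a different algorithm’s criterion; C fails both.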
Publish Database Security
Each publish database requires a login password to prevent unauthorized
database access.
Conventions Used
This manual provides a detailed outline for all tasks associated with
Affymetrix® Data Mining Tool. Various conventions are used throughout the
manual to help illustrate the procedures described. Explanations of these
conventions are provided below.
Steps
Instructions for procedures are written in a step format. Immediately
following the step number is the action to be performed. The line below
the step may begin with the symbol ⇒, which indicates the system
response to the user action: what you see, and what has happened that
you may not see.
Any additional information about the step follows the response, in
paragraph format.
For example:
9.
Click Yes to continue.
⇒ The Delete task proceeds.
In the lower right pane the status is displayed.
To view more information pertaining to the delete task, right-click
Delete and select View Task Log from the shortcut menu.
Font Styles
Bold fonts indicate names of commands, buttons, options or titles within a
dialog box. When asked to enter specific information, such input appears in
italics within the procedure being outlined.
For example:
1.
To select another server, enter the server name in the Oracle Alias box.
2.
Enter DMT_3_Tutorial in the Publish Database box, then click
Register.
⇒ The tutorial database is available to DMT.
Screen Captures
The steps outlining procedures are frequently supplemented with screen
captures to further illustrate the instructions given.
The screen captures depicted in this manual may not exactly match
the windows displayed on your screen.
Additional Comments
Throughout the manual, text and procedures are occasionally accompanied
by special notes. These comments and their meanings are described below.
Tips provide helpful advice or shortcuts for completing a task.
The Note format presents important information pertaining to the text
or procedure being outlined.
Caution notes advise you that the consequence(s) of an action may be
irreversible and/or result in lost data.
Warnings alert you to situations where physical harm to person or
damage to hardware is possible.
On-line Documentation
The CD with DMT includes an electronic version of this user’s guide. The
on-line documentation is in Adobe Acrobat format (a *.pdf file) and is
readable with the Adobe Acrobat® Reader software, available at no charge
from Adobe at http://www.adobe.com.
The electronic user’s guide is printable, searchable and fully indexed. You
can have it open and minimized on screen while using the DMT software.
Technical Support
Affymetrix provides technical support to all licensed users via phone or
e-mail. To contact Affymetrix Technical Support:
Affymetrix Inc.
3380 Central Expressway
Santa Clara, CA 95051
USA
Tel: 1-888-362-2447 (1-888-DNA-CHIP)
Fax: 1-408-731-5441
E-mail: [email protected]
Affymetrix UK Ltd.,
Voyager, Mercury Park,
Wycombe Lane, Wooburn Green,
High Wycombe HP10 0HH
United Kingdom
Tel: +44 (0) 1628 552550
Fax: +44 (0) 1628 552585
E-mail: [email protected]
www.affymetrix.com
Your Feedback is Welcome
Affymetrix Technical Publications is dedicated to continually improving the
quality of our documentation and helping you get the information that you
need. We welcome any comments or suggestions you may have regarding
this manual. Please contact us at:
[email protected]
Chapter 2
Installing Data Mining Tool 3.0
Installing Data Mining Tool 3.0 will uninstall any previous version of
DMT. You will no longer be able to use your previous version of
DMT after installing Data Mining Tool 3.0.
Before You Begin
This section guides you through the installation of Data Mining Tool 3.0.
Listed below is an overview of the steps needed to complete the installation.
Microsoft® SQL Server LIMS Users
1.
Obtain the name of the LIMS Server from your IT personnel if not
known (this is needed during installation).
2.
Install Data Mining Tool 3.0.
Oracle® LIMS Users
1.
Install Oracle Client Utilities on the workstation (Oracle Client Utilities
must be the same version installed on the LIMS Server).
2.
Install SQL*Loader (for better performance).
3.
Create an Oracle Alias. (Refer to the section Creating an Oracle® Alias,
on page 17.)
4.
Install Data Mining Tool 3.0.
MicroDB™ Users
Install Data Mining Tool 3.0.
Installing Data Mining Tool
The following are detailed instructions for installing DMT. Please note that
the screen captures depicted in this section may not exactly match the
windows displayed on your screen.
You must be logged in as an administrator to install the DMT 3.0
software.
1.
Log in as an administrator.
2.
Insert the Affymetrix® DMT 3.0 CD-ROM.
3.
If the autorun feature does not start the program:
a.
Click Start → Run.
b.
Type <cd drive letter>:\setup.exe.
c.
Click OK.
⇒ The Affymetrix Software Setup window appears.
4.
Click DMT 3.0 Setup.
⇒ The Welcome window appears (Figure 2.1).
Figure 2.1
Welcome window
5.
Click Next.
6.
Several consecutive Software License Agreement windows appear.
Review the contents in each and click Yes to accept the terms of the
agreement.
⇒ The Customer Information window appears (Figure 2.2).
Figure 2.2
Customer Information window
7.
Enter your Name, Company and Serial Number.
The serial number is located on the Affymetrix® Software Product
Registration card.
If you do not have a serial number, contact Affymetrix Technical
Support. If you are upgrading from a previous version, the Serial
Number field populates automatically.
8.
Click Next.
⇒ The Choose Destination Location window appears (Figure 2.3).
Figure 2.3
Choose Destination Location window
9.
Select the destination where Data Mining Tool will be installed.
10.
Click Next.
⇒ The Select Database Compatibility window appears (Figure 2.4).
Figure 2.4
Select Database Compatibility window
11.
Select the type of database that DMT will connect with.
■
Affymetrix® LIMS - if connecting to a LIMS Server.
■
Affymetrix® MicroDB - if connecting to a local publish database
using MicroDB™.
12.
Click Next.
⇒ If connecting to a LIMS server, the Select Database Type window
appears (Figure 2.5).
If using MicroDB™, go to step 16.
Figure 2.5
Select Database Type window
13.
Select the type of database used on the LIMS server, either SQL Server
or Oracle.
If you do not know the type of database you are using with the LIMS
Server, please contact your IT personnel or DBA.
14.
Click Next.
⇒ The Enter Information window appears (Figure 2.6).
Figure 2.6
Enter Information windows for the SQL Server database (left) or the Oracle® database (right)
15.
In the Enter Information window, complete one of the following:
■
If SQL Server is selected, enter the SQL Server Name (usually the
name of the LIMS Server).
■
If Oracle® is selected, enter the Oracle Alias Name.
16.
Click Next.
⇒ Database connectivity is verified and the Start Copying Files
window appears.
17.
In the Start Copying Files window, verify the information and click
Next.
⇒ Program files are copied and the system configures the registry. The
Setup Complete window appears.
⇒ For Oracle systems: If a warning message regarding SQL*Loader
appears, continue the DMT installation until it is complete. Then install
SQL*Loader (part of Oracle) for better DMT performance. After
SQL*Loader is installed, reinstall DMT.
18.
Select Yes, I want to restart my computer now and click Finish.
Creating an Oracle® Alias
To create an Oracle alias, use the Net8 Assistant. The following steps guide
you through creating an alias.
Oracle 8.1.7 Alias Configuration
1.
Click Start → Programs → <Oracle directory> → Network
Administration → Net8 Assistant.
⇒ The Oracle Net8 Assistant window appears (Figure 2.7).
Figure 2.7
Oracle® Net8 Assistant window
2.
Expand Local.
3.
Highlight Service Naming, then from the menu bar click Edit →
Create.
⇒ The Net Service Name Wizard Welcome window appears
(Figure 2.8).
Figure 2.8
Net Service Name Welcome window
4.
Enter the Net Service Name (which is the alias name).
The name must be the same name as the local LIMS server.
If creating a remote publish server alias, Host Name must be the same
as the computer name of the remote publish server.
5.
Click Next.
⇒ The Networking Protocol window appears (Figure 2.9).
Figure 2.9
Networking Protocol window
6.
Select TCP/IP (Internet Protocol).
7.
Click Next.
⇒ The Host Name window appears (Figure 2.10).
Figure 2.10
Host Name window
The Host Name is the name of the local LIMS Server.
The Port Number is left as the default value 1521, unless it has been
changed.
If creating a remote publish server alias, the Host Name must be the
name of the remote publish server.
8.
Click Next.
⇒ The Database SID window appears (Figure 2.11).
Figure 2.11
Database SID window
9.
Select the (Oracle8i) Service Name option. Enter the name of the Oracle
database instance on the local LIMS server.
If creating a remote publish database server alias, the Database SID
name should be the instance created on the remote publish server.
10.
Click Next.
⇒ The Test Service window appears (Figure 2.12).
Figure 2.12
Test Service window
11.
Click Test... to test the alias created.
⇒ A Connection Test Information window appears (Figure 2.13).
Figure 2.13
Connection Test Information window
12.
If the connection was successful, go to step 13.
If the connection was unsuccessful, follow the instructions below.
a.
If the test fails, click Change Login....
Figure 2.14
Change Login window
b.
Enter Username and Password, then click OK.
c.
Repeat step 11.
13.
Click Close.
14.
Click Finish.
15.
Repeat the above steps to create and test the second alias if using a
remote publish database server.
16.
Save the configuration settings.
If your test was unsuccessful, verify that your listener is listening for
your alias.
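Net8 Assistant saves local net service names in the Oracle client’s tnsnames.ora file. As a sketch for checking the result, the function below builds the kind of TCP/IP entry that the wizard steps above describe (alias, host name, port 1521, service name). The names LIMSSERVER and LIMSDB are hypothetical, and the exact layout Net8 Assistant writes may differ slightly; use it to compare against your file rather than to hand-edit it unless your DBA agrees.

```python
def tnsnames_entry(alias: str, host: str, service_name: str, port: int = 1521) -> str:
    """Build a tnsnames.ora net service entry for a TCP/IP connection.

    alias:        Net Service Name (step 4), e.g. the LIMS server name
    host:         Host Name entered in the Host Name window
    service_name: (Oracle8i) Service Name of the database instance
    port:         listener port; 1521 unless it has been changed
    """
    lines = [
        f"{alias} =",
        "  (DESCRIPTION =",
        "    (ADDRESS_LIST =",
        f"      (ADDRESS = (PROTOCOL = TCP)(HOST = {host})(PORT = {port}))",
        "    )",
        "    (CONNECT_DATA =",
        f"      (SERVICE_NAME = {service_name})",
        "    )",
        "  )",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical names for a local LIMS server and its database instance.
    print(tnsnames_entry("LIMSSERVER", "LIMSSERVER", "LIMSDB"), end="")
```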
Chapter 3
Affymetrix® Data Mining Tool Overview
Affymetrix® Data Mining Tool (DMT) provides a flexible and intuitive
query interface to a large data warehouse of published expression
databases and helps you sift through hundreds or thousands of
experimental results.
This chapter provides an overview of DMT and how it interacts with
publish databases. It explains the steps involved in running a query
and the options available to you for viewing and analyzing results.
Access Data
DMT operates in GeneChip® data or spot data mode. It enables you to
access, query and analyze data found in a publish database populated with
Affymetrix GeneChip® probe array expression analysis results (*.chp) or
spotted probe array intensity results (*.spt).
The data mode and location of the publish database determine the DMT
features available.
Affymetrix Publish Database
An Affymetrix publish database is created by an Affymetrix publishing
application (see Table 3.1, on page 26). These applications import or publish
analysis data (*.chp or *.spt) to a publish database located on the LIMS
server or a local workstation (MicroDB™).
Published data are available to DMT or other third-party analysis tools, as
well as database management tools such as Microsoft Access® 2000.
Table 3.1
Affymetrix® publishing applications
Publishing Application | Data Published | Publish Database Location
Affymetrix® LIMS | GeneChip® probe array expression analysis data (*.exp, *.cel, *.chp) | LIMS server
Affymetrix® MicroDB™ | GeneChip® probe array expression analysis data (*.chp) | Local workstation
Affymetrix® MicroDB™ | Affymetrix® Jaguar™ spotted array intensity data (*.spt) | Local workstation
You can also use DMT to query other appropriately formatted
databases populated with Affymetrix® GeneChip® expression analysis
results (*.chp) or spotted probe array intensity results (*.spt).
Affymetrix® Analysis Data Model
DMT is compatible with any Affymetrix® Analysis Data Model (AADM)
compliant database populated with Affymetrix GeneChip® probe array
expression analysis results (*.chp), and with any AADM-derived database
populated with spotted probe array intensity results (*.spt). AADM is
available at www.affymetrix.com.
DMT Windowpanes
The DMT session appears when you start a new query or open a previously
saved query. DMT has four panes for filtering and displaying expression
data (Figure 3.1, Figure 3.2). The panes are:
Filter grid: Enables you to specify the filters and the limits the data must
meet to be returned by the query.
Data tree: Displays analyses, array sets and probe lists. You can select
analyses or array sets from the data tree for the query.
Graph pane: Displays graphs (scatter, fold change, series, or histogram
graph) and cluster analysis results.
Results pane: Displays the experiment information, query and pivot tables.
Use the filter grid and data tree to specify query conditions. The graph and
results panes display query and analysis results.
Figure 3.1
DMT display in GeneChip® data mode
Figure 3.2
DMT display in spot data mode
Query Data
A query searches a publish database for the experimental and expression
data you need. User-defined filters specify the search criteria. The query
returns only those records that meet the criteria or limits specified by the
query filters.
Building and Running a Query
To build a query:
■
Specify the filter conditions that the expression data must satisfy (Table 3.2
lists the types of filters for the various DMT data modes).
■
Select the analyses for the query from the data tree (Figure 3.1, Figure 3.2).
Analysis filters are available in GeneChip LIMS data mode. You can
filter the current database so that the data tree displays only the
analyses that meet user-specified criteria.
Table 3.2
DMT Filters
DMT Data Mode | Publishing Application | Analysis Filters (a) | Expression Filters (b)
GeneChip® probe array | Affymetrix® LIMS | Sample template, Experiment template, Attribute, Attribute value, Sample project, Probe Array, Sample type, Operator, Sample name | Absolute or comparison expression metrics (See Appendix A)
GeneChip® probe array | MicroDB™ | Not available | Absolute or comparison expression metrics (See Appendix A)
Spotted probe array | MicroDB™ | Not available | Intensity result, Standard deviation intensity, Pixel intensity, Background, Standard deviation background, Ratio (See Appendix A)
a. In GeneChip LIMS mode, analysis filters interrogate the publish database and determine the
analyses displayed in the data tree.
b. Filters interrogate the analyses selected in the data tree.
In LIMS data mode only, you can specify analysis filters in the Filter
Analysis dialog box (Figure 3.3). These filters interrogate the current
database and determine the analyses displayed in the data tree.
To filter the analyses, select View → Analysis Filters from the menu bar.
Figure 3.3
Filter Analysis dialog box (available when connected to a publish database on the LIMS server in GeneChip® data mode)
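The two filtering stages can be pictured with the sketch below. An analysis filter first chooses which analyses reach the data tree (here by sample type, one of the LIMS analysis filters listed in Table 3.2), and an expression filter then keeps the probe results of the selected analyses that meet a limit, producing one row per probe/analysis pair as the query table does. The Analysis class, the signal threshold and the sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Analysis:
    """Hypothetical stand-in for one published analysis."""
    name: str
    sample_type: str
    signals: Dict[str, float] = field(default_factory=dict)  # probe set -> signal


def analysis_filter(analyses: List[Analysis], sample_type: str) -> List[Analysis]:
    """Stage 1: keep only analyses whose attributes match (LIMS mode)."""
    return [a for a in analyses if a.sample_type == sample_type]


def expression_filter(selected: List[Analysis], min_signal: float) -> List[Tuple[str, str, float]]:
    """Stage 2: one (probe, analysis, signal) row per passing probe/analysis pair."""
    rows = []
    for a in selected:
        for probe, signal in sorted(a.signals.items()):
            if signal >= min_signal:
                rows.append((probe, a.name, signal))
    return rows


analyses = [
    Analysis("liver_1", "liver", {"p1": 900.0, "p2": 50.0}),
    Analysis("liver_2", "liver", {"p1": 750.0, "p2": 610.0}),
    Analysis("brain_1", "brain", {"p1": 30.0, "p2": 980.0}),
]

if __name__ == "__main__":
    for row in expression_filter(analysis_filter(analyses, "liver"), 500.0):
        print(row)
```

Probe set p1 passes in both liver analyses, so it appears in two rows; brain_1 never reaches the expression filter.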
Viewing Query Results
You can view the data retrieved from the database in both tables and graphs.
This section describes the various types of tables and graphs available in
DMT.
Tables
The results pane (Figure 3.1) contains three tables:
■
Experiment Information table
■
Query table
■
Pivot table
The query and pivot tables provide two different views of expression data.
Experiment Information Table
The experiment information table contains information about analyses or
array sets selected in the data tree.
In GeneChip® data mode, the experiment information table (Figure 3.4)
displays:
■
User-specified information such as project and experiment name.
■
Information automatically captured by Affymetrix® LIMS during
hybridization, scanning and analysis of GeneChip probe arrays including
experiment template parameters.
■
Values for user-modifiable expression algorithm parameters (used to
calculate the expression metrics).
In spot data mode, the experiment information table (Figure 3.5) displays:
■
Probe array and operator name.
■
Parameters associated with the analysis.
To populate the experiment information table:
■
Select analyses or array sets in the data tree.
■
Click the Info toolbar button, or select Query → Experiment
Information.
Figure 3.4
Experiment information table, GeneChip® data mode
Figure 3.5
Experiment information table, spot data mode
Query Table
The query table displays the expression data for probes (probe sets or spot
probes) that met the query criteria (Figure 3.6 and Figure 3.7). Each row
displays the probe name, the analysis name and the expression data (for
example, signal and detection) from that analysis.
If the query results include the same probe from different analyses, the query
table displays a separate row for each probe/analysis pair. For example, if a
query returned the same probe set from four different analyses, the query
table would display four rows of results for the same probe set (one row per
analysis).
To populate the query table:
■
Select analyses or array sets in the data tree.
■
Specify filters (optional).
■
Click the Query button, or select Query → Run Query from the menu
bar.
Figure 3.6
Query table, GeneChip® data mode
Figure 3.7
Query table, spot data mode
Pivot Table
When a query returns the same probe from several different analyses, it is
often more convenient to view the probe data (probe set or spot probe) from
each analysis side by side in the same row.
DMT can retrieve analyses from the database and organize the data in the
pivot table so that all analysis results for the same probe are displayed in one
row (Figure 3.8 and Figure 3.9). In the pivot table, the column headers display
the analysis names; the columns display the expression data.
The pivot table columns are available for graphing, statistical analysis, or
cluster analysis.
To populate the pivot table:
■
Select analyses or array sets in the data tree.
■
Specify filters (optional).
■
Click the Run Pivot button, or select Query → Retrieve Data from the
menu bar.
Figure 3.8
Pivot table, GeneChip® data mode
Figure 3.9
Pivot table, spot data mode
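Conceptually, the pivot operation is a long-to-wide reshape of query rows. The sketch below, which assumes query rows shaped as (probe, analysis, value) tuples, gathers every analysis result for a probe into one row with one column per analysis, leaving a blank cell where an analysis has no result for that probe. It illustrates the idea only and is not DMT’s implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str, float]  # (probe, analysis, value), as in the query table


def pivot(rows: List[Row]) -> Tuple[List[str], Dict[str, Dict[str, float]]]:
    """Return the analysis column order and {probe: {analysis: value}}."""
    columns: List[str] = []
    table: Dict[str, Dict[str, float]] = defaultdict(dict)
    for probe, analysis, value in rows:
        if analysis not in columns:
            columns.append(analysis)  # first-seen order becomes column order
        table[probe][analysis] = value
    return columns, dict(table)


def render(columns: List[str], table: Dict[str, Dict[str, float]]) -> List[str]:
    """Tab-separated lines, one per probe; missing results stay blank."""
    lines = ["\t".join(["Probe"] + columns)]
    for probe in sorted(table):
        cells = [str(table[probe].get(c, "")) for c in columns]
        lines.append("\t".join([probe] + cells))
    return lines


query_rows: List[Row] = [
    ("p1", "liver_1", 900.0),
    ("p1", "liver_2", 750.0),
    ("p2", "liver_2", 610.0),
]

if __name__ == "__main__":
    cols, tbl = pivot(query_rows)
    print("\n".join(render(cols, tbl)))
```

The three query rows collapse into two pivot rows, one for p1 and one for p2, with p2’s liver_1 cell left blank.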
Graphs
DMT can display the pivot table data in graphical formats. The types of
graphs available include:
■
Scatter graph
■
Fold Change graph
■
Series graph
■
Histogram graph
Each type of graph is displayed in a separate tab of the graph pane (Figure 3.1
and Figure 3.2).
The graphing functions are only available for the analyses displayed
in the pivot table.
Scatter Graph
The scatter graph plots multiple pairs of user-specified numeric columns
from the pivot table using a traditional scatter plot (Figure 3.10). Each point
represents a probe (probe set or spot probe) common to both columns in the
comparison. The x and y coordinates of each point are the probe’s values in
the two compared columns.
Figure 3.10
Scatter graph, GeneChip® data mode
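The rule that only probes common to both columns are plotted amounts to intersecting the two pivot columns. In the sketch below, each column is assumed to be a mapping from probe name to value, with None standing for a blank pivot cell; the representation and the sample values are illustrative.

```python
from typing import Dict, List, Optional, Tuple

Column = Dict[str, Optional[float]]  # probe -> value (None = blank cell)


def scatter_points(x_col: Column, y_col: Column) -> List[Tuple[str, float, float]]:
    """(probe, x, y) for every probe that has a value in both columns."""
    points = []
    for probe in sorted(x_col.keys() & y_col.keys()):
        x, y = x_col[probe], y_col[probe]
        if x is not None and y is not None:
            points.append((probe, x, y))
    return points


liver: Column = {"p1": 900.0, "p2": 610.0, "p3": None}
brain: Column = {"p1": 30.0, "p3": 400.0, "p4": 75.0}

if __name__ == "__main__":
    # p2 and p4 appear in only one column; p3 is blank in the liver column.
    print(scatter_points(liver, brain))
```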
Fold Change Graph
The fold change graph (Figure 3.11) compares multiple pairs of user-specified
numeric pivot table columns (base and comparison columns). It displays a
scatter plot of the fold change of the comparison column compared to the
base column. (See Appendix A for the fold change calculation.)
Each point represents a probe (probe set or spot probe) that is common to the
base and comparison columns. The y-axis coordinate is the average fold
change for all of the base-comparison pairs that contain the probe. The x-axis
coordinate is the average of the comparison column values for all of the
comparison analyses that contain the probe.
Figure 3.11
Fold change graph, GeneChip® data mode
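The sketch below computes these coordinates for a set of base/comparison column pairs. DMT’s own fold change formula is given in Appendix A and is not reproduced here, so the sketch takes the formula as a parameter and uses a plain comparison/base ratio only as a placeholder. The pivot layout (column name mapped to probe values, None for blank cells) and the sample data are assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List, Optional, Tuple

Column = Dict[str, Optional[float]]  # probe -> value (None = blank cell)


def ratio(base: float, comparison: float) -> float:
    """Placeholder fold change; substitute the Appendix A calculation."""
    return comparison / base


def fold_change_points(
    pivot: Dict[str, Column],
    pairs: List[Tuple[str, str]],
    fold_change: Callable[[float, float], float] = ratio,
) -> Dict[str, Tuple[float, float]]:
    """Map probe -> (x, y) for the fold change graph.

    y: average fold change over all base/comparison pairs containing the probe
    x: average of the probe's values over the comparison columns containing it
    """
    fcs: Dict[str, List[float]] = defaultdict(list)
    comp_values: Dict[str, Dict[str, float]] = defaultdict(dict)
    for base_name, comp_name in pairs:
        base, comp = pivot[base_name], pivot[comp_name]
        for probe in base.keys() & comp.keys():
            b, c = base[probe], comp[probe]
            if b is None or c is None or b == 0:
                continue  # blank cell, or a ratio that would divide by zero
            fcs[probe].append(fold_change(b, c))
            comp_values[probe][comp_name] = c  # each comparison column counted once
    return {
        probe: (mean(list(comp_values[probe].values())), mean(fcs[probe]))
        for probe in sorted(fcs)
    }


pivot_columns: Dict[str, Column] = {
    "base_1": {"p1": 100.0, "p2": 50.0},
    "comp_1": {"p1": 200.0, "p2": 25.0},
    "base_2": {"p1": 100.0},
    "comp_2": {"p1": 400.0},
}

if __name__ == "__main__":
    print(fold_change_points(pivot_columns, [("base_1", "comp_1"), ("base_2", "comp_2")]))
```

Here p1 occurs in both pairs (ratios 2 and 4), so it is plotted at x = 300, y = 3; p2 occurs only in the first pair.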
Series Graph
The series graph plots any numeric pivot table column in a line or bar graph
format (Figure 3.12, Figure 3.13). The series graph is a useful way to monitor
gene expression across different experiments or over a time course.
Figure 3.12
Series line graph, GeneChip® data mode
Figure 3.13
Series bar graph, GeneChip® data mode
Histogram
The histogram plots a frequency distribution of any numeric pivot table
column (Figure 3.14). The histogram sorts the metric values into groups or bins
(x-axis coordinate) and plots the number of probes (probe sets or spot
probes) in each bin (y-axis coordinate).
For example, a histogram of probe set expression signal values can help
evaluate the proportion of genes expressed at different levels.
Figure 3.14
Histogram, expression signal data
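As an illustration of the binning step, the following Python sketch counts values per fixed-width bin. The bin width and start point are hypothetical parameters; this section does not state how DMT chooses its bins.

```python
def histogram(values, bin_width, start=0.0):
    """Count values per bin; each bin is keyed by its lower edge (x-axis)."""
    counts = {}
    for value in values:
        # Floor division finds which bin the value falls into.
        lower_edge = start + bin_width * ((value - start) // bin_width)
        counts[lower_edge] = counts.get(lower_edge, 0) + 1  # y-axis count
    return dict(sorted(counts.items()))  # bins in ascending order
```

For example, `histogram(signals, 100)` groups signal values into bins of width 100 starting at zero.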
Analyze Query Results
You can apply statistical and cluster analyses to the results displayed in the
pivot table. This section describes the various types of statistical and cluster
analysis available in DMT.
Statistical Analyses
DMT can apply the following statistical analyses to numeric pivot table
columns:
■ Average
■ Median
■ Standard deviation
■ Inter-Quartile range
■ Fold change
■ T-Test
■ Mann-Whitney test
■ Count & Percentage
The pivot table displays the resulting data in new columns that are available
for graphing (scatter, fold change, or series graph), clustering, or further
statistical analysis.
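To check column statistics outside DMT, the Python sketch below computes four of the listed measures with the standard library `statistics` module. It assumes the sample (n - 1) standard deviation and Python's default quartile method; DMT's exact conventions are not stated here. The T-test and Mann-Whitney test are omitted; SciPy's `scipy.stats.ttest_ind` and `scipy.stats.mannwhitneyu` are common choices for those.

```python
from statistics import mean, median, quantiles, stdev


def column_summary(values):
    """Summary statistics for one numeric pivot table column (sketch)."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points (Python default method)
    return {
        "average": mean(values),
        "median": median(values),
        "std_dev": stdev(values),  # sample standard deviation (n - 1); an assumption
        "iqr": q3 - q1,            # inter-quartile range
    }
```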
Cluster Analysis
Cluster analysis finds expression profiles that have similar shapes. DMT
provides two different algorithms, the Self Organizing Maps (SOM) and
Correlation Coefficient algorithms, for finding those clusters.
Self Organizing Map Algorithm
The self organizing map (SOM) algorithm is designed to identify patterns in
expression signals. However, any numeric pivot table column may be
selected for cluster analysis.
The algorithm represents the selected data for probe sets in n experiments as
points in k-dimensional space. Initially, the algorithm randomly maps a grid
of nodes in space, then iteratively adjusts the node positions toward
collections of points until the nodes reflect clusters of probe sets with similar
expression patterns. (See Appendix D for more information about the SOM
algorithm.)
Figure 3.15 shows the patterns and probe set members of clusters found by the
SOM algorithm.
Figure 3.15
SOM cluster results
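The Python sketch below implements a basic self organizing map to make the description above concrete. It starts with random nodes and then repeatedly moves the winning node and its grid neighbors toward randomly chosen profiles. The learning-rate and neighborhood schedules are assumptions chosen for illustration. Appendix D, not this sketch, describes the algorithm DMT actually uses.

```python
import math
import random


def train_som(points, rows, cols, iterations=500, seed=0):
    """Fit a rows x cols grid of nodes to expression profiles (lists of floats)."""
    rng = random.Random(seed)
    dim = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dim)]
    highs = [max(p[d] for p in points) for d in range(dim)]
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Random initial node positions inside the data's bounding box.
    nodes = {g: [rng.uniform(lows[d], highs[d]) for d in range(dim)] for g in grid}
    for t in range(iterations):
        progress = t / iterations
        rate = 0.5 * (1.0 - progress)                          # assumed decay to 0
        radius = max(0.3, max(rows, cols) * (1.0 - progress))  # assumed shrinkage
        point = rng.choice(points)
        winner = min(grid, key=lambda g: math.dist(nodes[g], point))
        for g in grid:
            # Nodes near the winner on the grid move more toward the point.
            influence = math.exp(-math.dist(g, winner) ** 2 / (2 * radius ** 2))
            nodes[g] = [w + rate * influence * (x - w) for w, x in zip(nodes[g], point)]
    return nodes


def assign_clusters(points, nodes):
    """Label each profile with the grid position of its nearest node."""
    return [min(nodes, key=lambda g: math.dist(nodes[g], p)) for p in points]
```

With a 1 x 2 grid, two well-separated groups of profiles would be expected to end up on different nodes. Larger grids give more, finer-grained clusters.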
Correlation Coefficient Algorithm
The correlation coefficient algorithm uses a nearest neighbor approach to
find groups of probe sets with similar patterns. The average pattern of a group
defines a cluster seed. Probe sets whose patterns closely match the
seed pattern are assigned to the seed’s cluster.
Figure 3.16
Correlation coefficient cluster results
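A simplified sketch of seed-based assignment by correlation follows. It assumes the seeds are already known and uses a hypothetical `threshold` of 0.8 to stand for "closely matched". This section does not describe how DMT picks its seeds or what match criterion it applies.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length patterns."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den  # undefined (ZeroDivisionError) for a perfectly flat pattern


def seed_from_group(patterns):
    """Cluster seed = point-by-point average of a group's patterns."""
    return [mean(values) for values in zip(*patterns)]


def assign_to_seeds(profiles, seeds, threshold=0.8):
    """Put each probe set in its best-correlated seed's cluster if r >= threshold.

    threshold is a hypothetical stand-in for DMT's matching criterion.
    """
    clusters = {name: [] for name in seeds}
    for probe, pattern in profiles.items():
        best = max(seeds, key=lambda name: pearson(pattern, seeds[name]))
        if pearson(pattern, seeds[best]) >= threshold:
            clusters[best].append(probe)
    return clusters
```

Correlation compares the shape of a pattern rather than its magnitude, so a profile such as `[2, 4, 6, 8]` matches a rising seed as well as `[1, 2, 3, 4]` does.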
Matrix Analysis
Matrix analysis enables you to compare probe lists and determine the
overlap between two lists (Figure 3.17).
The matrix algorithm computes the probability (P-value) that the observed
overlap would occur by random chance. The algorithm converts the P-value to
an overlap significance value, which is displayed in the matrix. The overlap
significance value = -log P, so it ranges from near zero (P close to 1) to a
large number (P close to 0). Appendix D provides further information on the
Matrix algorithm.
The matrix highlights values that exceed the overlap significance threshold
(pink) and values that exceed the non-overlap significance threshold
(yellow).
Figure 3.17
Matrix displays the overlap significance values for two probe lists
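This section does not give the P-value formula (see Appendix D). A common choice for the overlap of two lists drawn from the same pool of probes is the hypergeometric upper tail. The Python sketch below uses that choice with a base-10 logarithm. Both are assumptions, not confirmed DMT behavior.

```python
from math import comb, log10


def overlap_p_value(total, size_a, size_b, overlap):
    """P(two random lists share >= overlap probes), hypergeometric upper tail.

    total: probes that could appear in either list; size_a, size_b: list sizes.
    """
    tail = sum(comb(size_a, i) * comb(total - size_a, size_b - i)
               for i in range(overlap, min(size_a, size_b) + 1))
    return tail / comb(total, size_b)


def overlap_significance(total, size_a, size_b, overlap):
    """-log10 P (assumed base): near zero for chance-level overlap."""
    return -log10(overlap_p_value(total, size_a, size_b, overlap))
```

A non-overlap (depletion) score could be built the same way from the lower tail of the distribution.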
Chapter 4
Getting Started
This chapter provides step-by-step instructions for completing the
basic tasks that are necessary to start and run Affymetrix® Data
Mining Tool (DMT).
Starting DMT
1. Click the Windows Start button, then select Programs → Affymetrix →
Data Mining Tool.
⇒ The Publish Database Login dialog box appears (Figure 4.1).
This dialog does not appear in MicroDB mode.
Figure 4.1
Publish Database login for LIMS mode
2. Enter the password for the publish database and click Login.
⇒ The main window appears (Figure 4.2).
Figure 4.2
DMT main window, Database02 selected
In the DMT main window, you can:
■ Register or unregister a publish database.
■ Select a database for the query.
■ Start a new DMT session.
■ Open or delete a previously saved query.
Managing Database Connections
DMT connects with databases created using the Affymetrix® LIMS or
Affymetrix® MicroDB applications (or other appropriately formatted
databases). The tasks involved with managing these database connections
include registering a database, selecting a database for use with DMT, or
unregistering a database.
Registering a Database
You must register a publish database to make it available to DMT. To
register a database, use the procedure below that matches your system.
Publish Database on Windows Workstation (MicroDB™ System)
1. Select Edit → Register Database from the menu bar.
⇒ The Register Database dialog box appears (Figure 4.3).
Figure 4.3
Register Database dialog box, publish database on Windows NT workstation
2. Select a database from the Publish Database drop-down list, then click
Register.
⇒ The publish database is now available to DMT.
Publish Database on LIMS Server (Affymetrix® LIMS)
1. Select Edit → Register Database from the menu bar.
⇒ The Register Database dialog box appears (Figure 4.4).
The Publish Database box contains a list of publish databases on the
server.
Figure 4.4
Register Database dialog box, publish database on LIMS server
2. Enter the SQL Server name or Oracle alias name in the Server Name box.
3. Click List Databases to display the publish databases for the server in
the Publish Database drop-down list.
4. Select a database from the Publish Database drop-down list.
5. Click Register.
⇒ The Publish Database login dialog box appears (Figure 4.5).
Figure 4.5
Publish Database Login
6. Enter the database password and click Login.
⇒ The database is available to DMT.
Unregistering a Database
Unregistering a database removes it from the list of registered databases
that can be queried.
1. Select Edit → Unregister Database from the menu bar.
⇒ The Unregister Database dialog box appears (Figure 4.6).
Figure 4.6
Unregister Database dialog box
2. Select a database, then click Unregister.
⇒ The database is no longer available to DMT.
Selecting a Database
DMT connects to a single publish database at a time. By default, DMT
connects to the most recently registered database.
Select the database of interest before opening a DMT session.
1. Close all the DMT sessions and return to the main DMT window
(Figure 4.2).
2. Select Edit → Select Database from the menu bar, then select a
database.
⇒ The Publish Database Login dialog box appears (Figure 4.7).
Figure 4.7
Publish Database Login
3. Enter the password, then click Login.
⇒ The status bar at the bottom of the main window displays the
current database name (Figure 4.2).
If the status bar is not displayed, select View → Status Bar from the
menu bar.
Specifying the Default Directory
You must specify a default directory that identifies the location of files for
import (for example, when loading probe lists or annotations) and for export
(when the data export option is selected).
1. Open a DMT session.
2. Click the Options button. Alternatively, select View → Options from the
menu bar.
⇒ The Data Mining Options dialog box appears (Figure 4.8).
Figure 4.8
Data Mining Options dialog box, Default Directory tab
3. Click the Default Directory tab.
4. Click the Browse button.
⇒ The Browse for Folder dialog box appears (Figure 4.9).
Figure 4.9
Browse for Folder dialog box
5. Locate the default directory, then click OK.
Chapter 5
Building and Running a Query
A query is the key to obtaining interesting data for subsequent
analysis using Affymetrix® Data Mining Tool. This chapter explains
how to define a query (specify the conditions that the data must meet
to be retrieved from the database) and select analyses for the query
from the current database.
Building a Query
The three main steps for building a query are:
■ Open a DMT session.
■ Specify the filters.
■ Select analyses or array sets for the query.
Affymetrix® Data Mining Tool operates in GeneChip® data mode or spot
data mode. The data mode and location of the publish database (LIMS server
or Windows NT workstation) determine the DMT features available.
Starting a New Query
To start a new query in GeneChip® data mode, select Data → New →
GeneChip Mining from the menu bar.
⇒ A new DMT session starts (Figure 5.1).
To start a new query in spot data mode, select Data → New → Spotted
Array Mining from the menu bar.
⇒ A new DMT session starts (Figure 5.2).
You can open more than one DMT session at a time. Select
Window → Cascade, or Window → Tile from the menu bar to
organize the open windows.
Figure 5.1
DMT session, GeneChip® data mode (graph pane not displayed until a graph or cluster result is generated)
Figure 5.2
DMT session, spot data mode (graph pane not displayed until a graph or cluster result is generated)
DMT Session Components
Session toolbar: Provides access to additional functions specific to the DMT session. See Appendix E for detailed toolbar information.
Filter grid: Provides a flexible interface for selecting expression metrics for filtering and entering the limits the data must meet to be returned by the query.
Data tree: Displays the analyses in the current publish database. When the database is on the LIMS server, the Filter Analysis dialog box can be used to filter the analyses displayed in the data tree. The data tree also displays array sets and probe lists. Select the analyses for the query from the data tree.
Results pane: Displays the experiment information, query, and pivot tables, which contain information about the analyses or array sets selected in the data tree and the query results.
Graph pane: Displays graphs or cluster analysis results. This pane is not displayed until a graph or cluster analysis is generated.
Specifying the Filters
The filter grid (Figure 5.3, Figure 5.4) enables you to select expression metrics
for filtering and specify the limits that the data must meet to be returned by
the query.
Figure 5.3
Filter grid, GeneChip® data mode
Figure 5.4
Filter grid, spot data mode
Filter Grid Components
Column headers: Displays the probe set or spot probe name and the expression metrics available for the filter.
GeneChip® data mode: Any absolute or comparison expression analysis metric generated by the Statistical Expression algorithm or the Empirical Expression algorithm (in versions of Microarray Suite earlier than 5.0). (See the Affymetrix Microarray Suite User’s Guide for more information about the expression algorithms and metrics.)
Spot data mode: Intensity, intensity standard deviation, pixel count, background, background standard deviation, ratio.
Sort: Specifies a sort order (ascending, descending, or none) in the query table for a results column.
Note: This sort specification does not affect the pivot table. To sort a pivot table column, right-click the column header and select a sort option from the shortcut menu.
Line 1 through n: Accommodates the entries that specify metric limits. Limits entered in two or more cells of the same row are combined in AND fashion (intersection). Limits entered in subsequent rows are combined in OR fashion (union).
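The AND-within-a-line, OR-across-lines rule can be expressed as a small Python predicate, sketched below to illustrate the logic only. Records and grid lines are hypothetical data structures (dicts, and lists of `(metric, operator, value)` tuples), and treating a completely empty grid as "no filter" is an assumption.

```python
import operator

# Comparison operators from Table 5.1 mapped to Python functions.
OPS = {"=": operator.eq, "!=": operator.ne, ">": operator.gt,
       "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def row_matches(record, row):
    """AND: every (metric, op, value) limit in one grid line must hold."""
    return all(OPS[op](record[metric], value) for metric, op, value in row)


def grid_matches(record, lines):
    """OR: a record is returned if any non-blank grid line matches."""
    filled = [line for line in lines if line]  # blank lines have no effect
    if not filled:
        return True  # assumption: an empty grid applies no filter
    return any(row_matches(record, line) for line in filled)
```

For example, the filter in Figure 5.6 corresponds to the single line `[("Signal", ">", 400), ("Detection p-value", "<", 0.1)]`.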
Entering Limits
1. Double-click the cell of interest in Line 1 of the grid (not the Sort row).
⇒ The blinking cursor in the cell indicates DMT is ready to accept
typed input (Figure 5.5).
Table 5.1 and Table 5.2 describe query operators and statements and
provide example limits.
Figure 5.5
Filter grid, GeneChip® data mode
If you double-click a cell in the last row of the filter grid, DMT
automatically adds another row to the grid.
2. Enter the limit, then do one of the following:
■ Double-click the next cell where you want to enter a limit.
■ Press the ENTER key to complete the entry and move the cursor to
the grid cell below in Line 2.
■ Press the TAB key to complete the entry and move the cursor right
to the next cell in the row.
Limits may be entered in all columns and many rows of the filter grid.
Limits in two or more cells in the same row are logically connected with
an AND (intersection) statement. Limits entered in subsequent rows are
logically connected with an OR (union) statement.
Enter limits for Statistical algorithm metrics and Empirical algorithm
metrics on separate lines in the filter grid.
Figure 5.6 shows an example filter that specifies probe sets with a signal
greater than 400 AND detection p-value < 0.1.
Figure 5.6
Filter grid, GeneChip® data mode
Use Probe Lists to quickly add a group of associated probes to the
filter. Right-click the cell in the Probe Set Name column, and select
Probe List from the shortcut menu. See page 137 for more
information.
Entering Multiple Limits in a Single Cell
Limits containing AND (intersection) or OR (union) operators may be
entered in a single cell.
For example, the limit in Figure 5.7 defines the range between 500 and 1000
(the intersection of the range > 500 and the range < 1000). The query returns
probe sets where 500 < signal < 1000. Probe sets with signal ≤ 500 or
≥ 1000 are not returned.
Figure 5.7
Filter grid, GeneChip® data mode
DMT automatically adds a blank row to the bottom of the grid to
accommodate another OR entry. The last row of the grid may remain blank
with no effect on the query.
Editing Limits
Double-click the cell to highlight the entire limit, then do one of the
following:
■ Enter a new limit (overwrites the old limit).
■ Right-click and make a selection from the shortcut menu of edit
commands.
■ Use the mouse to select part of the limit, then enter new text.
An Oracle® database is case-sensitive.
Specifying a Sort Order for the Query Table
You may specify a sort order (ascending, descending, or not sorted) for the
query table.
1. In the filter grid, click the cell in the Sort row (first row) for the metric
you want to sort (for example, Signal in Figure 5.8).
⇒ An arrow button appears.
Figure 5.8
Filter grid, GeneChip® data mode
2. Click the arrow, and select a sort order from the drop-down list that
appears (Figure 5.9).
Figure 5.9
Filter grid, sort options (GeneChip® data mode)
3. Repeat steps 1 and 2 for each additional metric you want to sort.
If a sort order is specified for two or more metrics, the sort is prioritized
from left to right. For example, the limits in Figure 5.10 sort the query
results first by descending signal, then by ascending detection p-value.
Figure 5.10
Filter grid, multiple column sort (GeneChip® data mode)
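A left-to-right prioritized sort can be sketched in Python using the stability of `list.sort`: sort by the rightmost sorted column first and the leftmost one last. The row dicts and column names below are hypothetical, and this is an illustration of the ordering rule rather than DMT's own code.

```python
def sort_results(rows, sort_spec):
    """Sort rows by several columns, prioritized left to right.

    sort_spec: list of (column, "ascending" | "descending") in grid order;
    columns whose Sort cell is set to none are simply left out.
    """
    ordered = list(rows)
    # list.sort is stable (also with reverse=True), so sorting by the
    # lowest-priority column first and the highest-priority column last
    # yields the combined order.
    for column, order in reversed(sort_spec):
        ordered.sort(key=lambda row: row[column], reverse=(order == "descending"))
    return ordered
```

With the Figure 5.10 settings, `sort_spec` would be `[("Signal", "descending"), ("Detection p-value", "ascending")]`.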
Table 5.1
Query operators and example query statements

Comparison operator | Definition | Example limit | Returns the record for metric data...
= | Equal (number or character field) | =3 | Equal to 3
 | | ='P' | Called present
> | Greater than | >5 | Greater than 5
< | Less than | <20 | Less than 20
>= | Greater than or equal to | >=6 | Greater than or equal to 6
<= | Less than or equal to | <=19 | Less than or equal to 19
!= | Not equal to (number or character field) | !=25 | Not equal to 25

Range | Definition | Example limit | Returns the record for metric data...
BETWEEN | Returns records with the metric value between the user-specified limits | BETWEEN 2 AND 5 | Between 2 and 5
NOT BETWEEN | Returns records where the metric value is not between the user-specified limits | NOT BETWEEN 1 AND 1.5 | Not between 1 and 1.5

List | Definition | Example limit | Returns the record for metric data...
IN | Returns records that match any one of the values in the list | IN ('cre', 'bioB') | cre or bioB
NOT IN | Returns records that do not match any of the values in the list | NOT IN ('cre', 'bioB') | Neither cre nor bioB
LIKE | Searches character fields such as probe name and returns records that match the pattern in the LIKE statement | LIKE 'cre' | cre
 | | LIKE 'cr_' | cr followed by any single character (the underscore symbol (_) is the wildcard for a single character)
 | | LIKE 'cr%' | cr followed by any string of zero or more characters (the % symbol is the wildcard for any string of zero or more characters)
NOT LIKE | Searches character fields such as probe name and returns records that do not match the pattern in the NOT LIKE statement | NOT LIKE 'cr%' | Not cr followed by any string of zero or more characters

Logical operator or complex statement | Definition | Example statement | Returns the record for metric data...
AND | Connects two conditions and only returns results when both conditions are true | >5 AND <6 | Greater than 5 and less than 6
OR | Connects two conditions and returns results when either condition is true | <5 OR >9 | Less than 5 or greater than 9
NOT | Negates a condition when combined with various operators (for example, NOT LIKE, NOT IN) | NOT < 5000 | Not less than 5000
() | Used to force the order of evaluation of two or more combined conditions | (>5 AND <10) OR (>200 AND <500) | Greater than 5 and less than 10, or greater than 200 and less than 500
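The Python sketch below shows what the `%` and `_` wildcards in Table 5.1 match by translating a LIKE pattern into a regular expression. It is an illustration only. It ignores escape characters, and it is case-sensitive, in line with the note that an Oracle database is case-sensitive.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern: % = any string (incl. empty), _ = one character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """True if the whole value matches the pattern (case-sensitive)."""
    return like_to_regex(pattern).fullmatch(value) is not None
```

As in the table, `like("cre", "cr_")` is true, while `like("cr", "cr_")` is false because `_` requires exactly one character.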
Table 5.2
Expression call search strings (GeneChip® data mode)

Absolute call | Limit
Present | ='P'
Marginal | ='M'
Absent | ='A'
No call | ='No Call'

Difference call | Limit
Increased | ='I'
Marginally increased | ='MI'
No change | ='NC'
Marginally decreased | ='MD'
Decreased | ='D'
An Oracle database is case-sensitive. Use upper case letters to specify
the call, except for ‘No Call’.
Query Builder
The Query Builder helps you input complex limits in the filter grid without
prior knowledge of correct syntax for operators such as BETWEEN and
LIKE. You need only specify text or numbers where appropriate. The Query
Builder inserts the logical operators and syntactically correct limit into the
user-specified cell of the filter grid.
Entering Limits
1. Right-click the cell of interest in the filter grid (do not click the Sort
row).
2. Select Show Query Builder from the shortcut menu that appears.
⇒ The Build Filter dialog box appears for the chosen type of result
(Figure 5.11).
Figure 5.11
Build Avg Diff Filter dialog box
3. Click an operator or statement button.
See Table 5.1, on page 66, for information on operators and statements.
4. Enter appropriate text to complete the limit.
Lowercase text in the Query Builder is a placeholder that must be
replaced with your input. A text search string must be enclosed in single
quotation marks (for example, LIKE 'YDR154C/').
5. Click OK or press the ENTER key to place the limit in the filter grid.
Editing Limits
1. Click Undo in the Build Filter dialog box.
⇒ The last entry is deleted.
2. Alternatively, select the text you want to edit, then make a new entry or
right-click to open a shortcut menu of edit commands.
The BACKSPACE, DELETE, and arrow keys are supported during
editing in the Build Filter dialog box.
Selecting Analyses for the Query
An analysis includes the GeneChip® expression analysis results (*.chp) or
spotted array intensity results (*.spt) derived from an experiment. An
analysis is computed using particular values for user-modifiable algorithm
parameters.
Selecting Analyses from the Data Tree
The data tree displays analyses in the current database as well as array sets.
See Chapter 10 for more information about array sets.
■ To select analyses for the query, click the analyses or array sets in the
data tree.
■ To select adjacent analyses, press and hold the SHIFT key while you click
the first and last analysis in the selection. To select non-adjacent
analyses, press and hold the CTRL key while you click the analyses.
Specifying Analysis Filters
If the publish database is on the LIMS server, you may specify analysis
filters that determine the analyses displayed in the data tree.
■ Click the Display Analysis Filters button. Alternatively, select View
Analysis → Filters from the menu bar.
⇒ The Filter Analysis dialog box appears (Figure 5.12).
The Filter Analysis dialog box contains an Attribute section (top) and a
Sample section (bottom). Analysis filters can be specified in the
Attribute section, Sample section, or both sections.
Analysis filters specified in the Attribute and Sample sections are
combined in OR fashion (union).
Figure 5.12
Filter Analysis dialog box, GeneChip® data mode (publish database on the LIMS server)
Attribute Section
The Attribute section includes the template tree, attribute list, and value list
(Figure 5.13). Together these components comprise a hierarchy that enables
you to specify particular attribute values as analysis filters (see Table 5.3).
The data tree displays the analyses that contain the selected attribute values
when the Apply button is clicked.
Figure 5.13
Filter Analysis dialog box, Attribute section
Table 5.3
Filter Analysis dialog box, Attribute section components

Component | Displays... | Select...
Template tree | sample and experiment templates in the current database | one or more templates from the template tree to display the associated attributes in the attribute list
Attribute list | attributes associated with the templates selected in the template tree | attributes from this list to display all values for the selected attributes in the value list
Value list | all the values in the current database for the attributes selected in the attribute list | particular attribute values from this list for use as analysis filters
Selecting Analysis Filters in the Attribute Section
To select adjacent items in the template tree, attribute list, or value list, press
and hold the SHIFT key, then click the first and last row in the selection. To
select non-adjacent items, press and hold the CTRL key, then click the
desired rows.
1. Click the template names of interest in the template tree.
⇒ The Attribute list displays all attributes associated with the
selected templates (Figure 5.14).
Figure 5.14
Template tree (top) and attribute list (bottom)
2. Click the attribute(s) of interest in the Attribute list (Figure 5.14).
⇒ The value list displays all values for the selected attribute(s)
(Figure 5.15).
Figure 5.15
Value list displays all values for the selected attribute(s)
3. In the value list, click the attribute value(s) you want to use as
analysis filter(s).
4. Click Clear to clear all selections from the Attribute section.
5. When finished specifying analysis filters, click Apply.
⇒ The data tree in the Query window displays the analyses selected by
the filters.
If analysis filters are specified in both the Attribute and Sample
sections, DMT combines the filters in OR fashion (union).
If no attribute values are highlighted in the value list, then all values
are selected.
Finding Templates or Attributes
The Find function in the Filter Analysis dialog box searches for templates,
attribute names, or attribute values. The Find button is located in the lower
right corner of the Attribute section in the Filter Analysis dialog box.
1. To begin a search, click Find in the Filter Analysis dialog box
(Figure 5.12).
⇒ The Find dialog box appears (Figure 5.16).
Figure 5.16
Find dialog box
2. Enter the text string for the search (up to 256 alphanumeric characters
and spaces) in the Find what box.
3. Select Templates, Attribute Names, or Attribute Values from the
Look in drop-down list.
4. Click Find Now.
⇒ A template search highlights templates in the template tree that
contain the search text string.
⇒ An attribute name search highlights (1) the attributes in the attribute
list that contain the search text string and (2) the corresponding attribute
values in the value list.
⇒ An attribute value search highlights the attribute values in the value list
that contain the search text string.
5. Click Close to close the Find dialog box.
Sample Section
The Sample section of the Filter Analysis dialog box (Figure 5.17) displays the
attributes that LIMS requires during sample registration and experiment
setup (see Table 5.4).
Figure 5.17
Sample section of the Filter Analysis dialog box
Table 5.4
Filter Analysis dialog box, Sample section

Component | Contents
Sample Project | Projects in the current database. You can assign a sample to a project before publishing data. Several samples can be assigned to the same project for faster selection in DMT. If samples have been assigned to multiple projects, select all pertinent projects from the sample project list.
Probe Array | GeneChip® probe array types in the current database.
Sample Type | Sample types in the current database. You can organize experiments according to sample type before publishing analysis results data. The sample type may be used to create groups of results for a project. Many experiments may be associated with one sample type for faster selection in DMT. For example, experiment results may be assigned to Treated Liver or Untreated Liver sample types in the Liver project.
Operator | Logon user names of operators who created experiments.
Sample Name | Identifies the RNA source of the target hybridized to the GeneChip® probe array. You can assign the same sample name to different GeneChip probe arrays or experiments, then select the name to conveniently obtain all results for the sample from different experiments.
Selecting Filters in the Sample Section
The Sample section of the Filter Analysis dialog box (Figure 5.17) organizes
attributes with increasing specificity from left to right.
1. Starting at the left in the Sample section, click the items of interest in
each component list.
Selected items in the same component list are combined in OR fashion
(union). Selections from different component lists are combined in
AND fashion (intersection).
Table 5.5
Filter Analysis dialog box, Sample section

Select this from component list | To display...
Sample Project | in the data tree, the analyses associated with the projects
Probe Array | in the data tree, the analyses associated with the probe arrays
Sample Type | operators and sample names associated with the sample types
Operator | sample names associated with the selected sample types AND operators
Sample Name | in the data tree, the analyses associated with the selected sample types AND operators AND sample names
If no items in a list have been highlighted, then all of the items in the
list are selected by default.
2. When finished specifying analysis filters, click Apply.
⇒ The data tree in the Query window displays the analyses selected by
the filters.
If analysis filters are specified in both the Attribute and Sample sections,
DMT combines the filters in OR fashion (union).
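The Sample section combination rules can be sketched in Python. Items selected within one list combine with OR, selections in different lists combine with AND, and a list with nothing highlighted counts as all items selected. The dict-based analysis records and the operator names are hypothetical, chosen only to illustrate the logic.

```python
def sample_section_filter(analyses, selections):
    """Return the analyses that pass the Sample section selections (sketch).

    analyses: list of dicts, one per analysis, keyed by component name.
    selections: {component: set of highlighted items}; an empty set means
    nothing is highlighted, so every item in that list counts as selected.
    """
    def passes(analysis):
        # OR within one list (membership in the set),
        # AND across lists (all() over the components).
        return all(not chosen or analysis[component] in chosen
                   for component, chosen in selections.items())
    return [a for a in analyses if passes(a)]
```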
Running a Query
After specifying the filter and selecting the analyses from the data tree, the
query is ready to run.
To run the query, do one of the following:
■ Click the Query button or select Query → Run Query from the menu
bar.
⇒ The query table displays the query results.
■ Click the Pivot button or select Query → Pivot Data from the menu
bar.
⇒ The pivot table displays the query results.
For more information about results tables, see Chapter 7, Query Results
Tables.
Normalizing GeneChip® Signal Data
Normalization is a mathematical technique that minimizes discrepancies in
results data from different experiments due to non-biological variables such
as sample preparation, hybridization conditions, staining, amount of spotted
probe, or GeneChip® probe array lot.
Results data may be normalized prior to publishing in Affymetrix®
Microarray Suite (GeneChip data) or Affymetrix® Jaguar™ (spotted array
data).
If GeneChip signal data were not normalized or were not normalized
consistently, normalization can be performed in DMT. In DMT, you may
normalize the data before or after a query or pivot operation.
The normalization option is only available in GeneChip® data mode.
Choosing Normalization Before a Query or Pivot
1. Click the Options button.
⇒ The Data Mining Options dialog box appears.
2. Click the Normalization tab.
⇒ The normalization options are displayed (Figure 5.18).
Figure 5.18
Data Mining Options dialog box, Normalization tab
3. Select the Compute Normalization option and confirm that the All Probe
Set Normalization algorithm is selected.
4. Click OK.
⇒ After a query, the query and pivot tables display normalized signal
values for each probe set.
If the pivot table does not display the normalized data column, verify
that the pivot data includes Norm Signal or Norm Avg Diff (select
Query → Pivot Data from the menu bar).
Choosing Normalization After a Query or Pivot
1. After a query or pivot operation is run, select Query → Normalize
from the menu bar.
2. To display the normalized signal data in the query table, click the Query
tab in the results pane (displays the query table), then select Query →
Normalize from the menu bar.
If the Query → Normalize menu item is not available, verify that the All
Probe Set Normalization algorithm is selected in the Data Mining
Options dialog box (click the Options button, then click the
Normalization tab).
3. To display the normalized signal data in the pivot table, click the Pivot
tab in the results pane (displays the pivot table), then select Query →
Normalize from the menu bar.
If the pivot table does not display the normalized data values, check to
make sure the pivot data includes Norm Signal or Norm Avg Diff
(select Query → Pivot Data from the menu bar).
Normalization Options
1. Click the Options button.
⇒ The Data Mining Options dialog box appears.
2. Click the Normalization tab.
⇒ The normalization options are displayed (Figure 5.19).
Figure 5.19
Data Mining Options dialog box, Normalization tab
3. Click Settings.
⇒ The All Probe Set Normalization Settings dialog box appears
(Figure 5.20).
Figure 5.20
All Probe Set Normalization Settings
Target Intensity
Select this option to normalize the signal data to a user-specified target
intensity (default = 5000). When selected, DMT computes the
Normalization Factor (NF) for an analysis n so that:
$$\text{Target Intensity} = NF_n \times \text{average signal}_n$$
If the user-specified Target Intensity option is not selected, DMT sets the
Target Intensity equal to the average signal of all analyses queried, not just
the analyses returned by the query.
Intensity Threshold
Select the Intensity Threshold option to specify a threshold for the signal
values used to compute the average signal.
When the signal of a probe set is less than the intensity threshold, DMT
omits the probe set from the average signal calculation.
Low and High Percentage
DMT does not include a signal value in the average signal calculation when
it falls in the Low Percentage or High Percentage range. The default
values are the bottom 2% and the top 2%.
If an Intensity Threshold is specified, the Low and High Percentage range
is applied to the signal values above threshold.
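The following Python sketch combines the three settings to compute a normalization factor for one analysis. The order of operations follows the text above: apply the threshold first, then trim the values above the threshold. Rounding the trimmed counts down with `int` is an assumption, because the exact rounding rule is not stated. The function names are illustrative, not DMT's.

```python
def average_signal(signals, intensity_threshold=None, low_pct=2.0, high_pct=2.0):
    """Average signal after the threshold and Low/High Percentage trims."""
    # Drop probe sets whose signal is below the intensity threshold, if one is set.
    kept = sorted(s for s in signals
                  if intensity_threshold is None or s >= intensity_threshold)
    n = len(kept)
    n_low = int(n * low_pct / 100)    # values trimmed from the bottom (rounding assumed)
    n_high = int(n * high_pct / 100)  # values trimmed from the top (rounding assumed)
    trimmed = kept[n_low:n - n_high]
    return sum(trimmed) / len(trimmed)


def normalization_factor(signals, target_intensity=5000.0, **trim_options):
    """NF_n such that target_intensity = NF_n * average signal_n."""
    return target_intensity / average_signal(signals, **trim_options)
```

Multiplying every signal in analysis n by its factor brings that analysis's trimmed average signal to the target intensity.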
Chapter 6
Managing Queries
Saving and opening filters saves time when one or more complex
filters are used on a regular basis. This chapter outlines the tasks of
saving a query, opening a previously saved query, and deleting a
query.
Saving a Query
You may save the filter parameters you specify. You can apply the saved
filter parameters to subsequent experimental results or use them to
regenerate the current query results in a future session.
1.
When a DMT session is open, select Data → Save from the menu bar.
⇒ The Save dialog box appears (Figure 6.1).
Figure 6.1
Save dialog box
2.
Enter a name for the query in the Name box, then click Save.
⇒ This saves the filter parameters.
Using the Save As Command
Queries created by other users may be opened as read-only. Changes to
read-only queries cannot be saved unless the query is renamed. This
prevents users from modifying queries created by other users.
You can also use the Save As command if you want to modify, but not
overwrite, one of your own queries.
1.
Select Data → Save As from the menu bar.
⇒ The Save dialog box appears (Figure 6.2).
Figure 6.2
Save dialog box
2.
Enter a new name for the modified query, then click Save.
⇒ This saves the filter parameters.
Opening a Previously Saved Query
1.
When a DMT session is open, select Data → Open from the menu bar.
⇒ The Open dialog box appears (Figure 6.3).
The Open dialog box displays all saved queries in the default directory,
unless the Only show my queries option box is selected. You may open
any saved query.
Figure 6.3
Open dialog box
2.
Select a query, then click Open.
⇒ The DMT session starts.
Deleting a Query
1.
Select Data → Delete Query from the menu bar.
⇒ The Delete dialog box appears (Figure 6.4).
Figure 6.4
Delete dialog box
2.
Select a query, then click Delete.
⇒ The selected query is permanently removed from the system.
Users (identified by the logon name) cannot delete queries created by
other users.
Chapter 7
Query Results Tables
The results tables display experimental information and expression
data that satisfy the query filter conditions. This chapter explains how
to use these results tables.
The results tables are generated independently. Therefore, you can
change the analyses displayed in one table without affecting the
contents of the other tables.
Experiment Information Table
The experiment information table displays information about the analyses or
array sets selected in the data tree.
1.
To view experiment information for several analyses or array sets:
a.
In the data tree, select the analyses or array sets you want to view.
To select adjacent analyses, press and hold the SHIFT key while you
click the first and last analysis in the selection. To select non-adjacent
analyses, press and hold the CTRL key while you click the analyses.
b.
Click the Info button, or select Query → Experiment Information from
the menu bar.
⇒ The selected analyses are displayed in the experiment
information table (Figure 7.1, Figure 7.2).
2.
To view experiment information for one analysis, right-click the
analysis in the data tree and select Experiment Info from the shortcut
menu.
3.
If necessary, click the Experiment Info tab to view the table.
The experiment information table displays each analysis in a separate
column. You can resize, reorder, or hide columns as desired (see
Appendix B).
GeneChip® Data Mode
The experiment information table for GeneChip® data (Figure 7.1) displays
information about analyses or array sets selected in the data tree, including:
■
Information entered during GeneChip® probe array experiment setup.
■
Data and experiment attributes automatically captured during
hybridization, scanning and analysis.
Figure 7.1
Experiment information table, GeneChip® data mode
Spot Data Mode
The experiment information table for spot data (Figure 7.2) displays
information about selected analyses, including:
■
Probe array and operator name.
■
Parameters associated with the analysis.
Figure 7.2
Experiment information table, spot data mode
Query Table
The query table presents query data in rows that identify an analysis and
probe name (probe set or spot probe), followed by the expression metrics
that met the limits specified by the filter. Appendix C describes the metrics
and other types of data included in the query table.
To populate the query table:
1.
In the data tree, click analyses or array sets of interest.
2.
Specify filters (optional).
3.
Click the Query button.
⇒ The query table displays the query results (Figure 7.3, Figure 7.4).
If you specified a sort order for a particular metric(s) in the filter grid, the
corresponding query table rows are arranged accordingly (ascending,
descending, or no sort). The columns may be resized, reordered, or hidden as
desired (see Appendix B).
Figure 7.3
Query table, GeneChip® data mode
Figure 7.4
Query table, spot data mode
In the Spot column, the spot coordinates (in parentheses) follow the probe
name.
Pivot Data Table
The results of a query frequently include the same probe (probe set or spot
probe) from different analyses. The pivot operation organizes the query
results so that all analysis results for a particular probe are displayed side by
side in the same row of the pivot data table (Figure 7.5). The pivot table is
blank until the pivot operation is run.
The pivot table makes it easier to review and compare the query results from
different analyses that are associated with a particular probe.
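To make the pivot operation concrete, here is a minimal Python sketch. It assumes each query table row is a dictionary with `analysis`, `probe` and metric keys (a hypothetical layout; DMT does not expose its tables this way), and that `metrics` plays the role of the results checked on the Pivot tab. The function name `pivot` is also hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def pivot(rows: Iterable[dict], metrics: Sequence[str]) -> Tuple[List[Tuple[str, str]], List[list]]:
    """Reshape query rows (one per analysis/probe pair) into one row per probe."""
    analyses: List[str] = []                      # analysis order as first encountered
    by_probe: Dict[str, Dict[Tuple[str, str], object]] = {}
    for row in rows:
        analysis, probe = row["analysis"], row["probe"]
        if analysis not in analyses:
            analyses.append(analysis)
        cells = by_probe.setdefault(probe, {})
        for metric in metrics:                    # only the selected result types
            if metric in row:
                cells[(analysis, metric)] = row[metric]
    # One column per (analysis, metric) pair, placed side by side.
    columns = [(a, m) for a in analyses for m in metrics]
    # A probe that was not returned for an analysis gets None in that cell.
    table = [[probe] + [cells.get(col) for col in columns]
             for probe, cells in by_probe.items()]
    return columns, table
```

Three query rows for probes P1 (in analyses A1 and A2) and P2 (in A1 only) collapse into two pivot rows, with an empty cell where P2 has no A2 result.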
Figure 7.5
Pivot table, GeneChip® data mode
Figure 7.6
Pivot table, spot data mode
Selecting Results for the Pivot Table
Before running the pivot operation, specify the type of expression metrics
you want to view in the pivot table.
1.
Click the Options button.
⇒ The Data Mining Options dialog box appears.
2.
Click the Pivot tab.
⇒ This tab displays the results available for the pivot operation
(Figure 7.7, Figure 7.8).
Figure 7.7
Data Mining Options dialog box, Pivot tab (GeneChip® data mode)
Figure 7.8
Data Mining Options dialog box, Pivot tab (spot data mode)
3.
Place (or remove) a check mark next to each result you want to include in
(or exclude from) the pivot operation.
DMT applies the data selections to the next pivot operation. The pivot
table displays only the types of results selected for the pivot operation.
4.
Click OK to close the Data Mining Options dialog box.
Viewing Results Selected for the Pivot Table
The menu bar also shows the metrics selected for the pivot table.
1.
To view Statistical algorithm metrics, select Query → Select Pivot
Data → Statistical Algorithm Results from the menu bar.
2.
To view Empirical algorithm metrics, select Query → Select Pivot
Data → Empirical Algorithm Results from the menu bar.
⇒ This displays a drop-down list of metrics (Figure 7.9).
Check marks indicate items selected for the pivot table.
To include (or exclude) a result in the pivot table, click the result to add
(or remove) a check mark.
Figure 7.9
Pivot data drop-down list (GeneChip® data mode), Empirical algorithm (left) and Statistical
algorithm (right)
Running the Pivot Operation
You can query analyses or array sets selected in the data tree and display the
results in the pivot table.
1.
In the data tree, click the analyses you want to query and pivot.
To select adjacent analyses, press and hold the SHIFT key while you
click the first and last analysis in the selection. To select non-adjacent
analyses, press and hold the CTRL key while you click the analyses.
2.
To view the query results in the pivot table, do one of the following:
■
Click the Pivot button.
■
Right-click a highlighted analysis or array set in the data tree and select
Pivot Data from the shortcut menu.
■
Select Query → Pivot from the menu bar.
⇒ This displays the pivot table (Figure 7.5, Figure 7.6).
The pivot table must be populated before the scatter, fold change, or
bar graph can be plotted.
Including Probe Descriptions in the Pivot Table
The pivot table can include probe (probe set or spot probe) descriptions. The
descriptions are derived from public databases (except for custom probe
arrays).
■
To display probe descriptions, select Query → Pivot Descriptions from
the menu bar.
⇒ This adds the Description column to the pivot table.
Including Annotations in the Pivot Table
The pivot table can include several columns of annotations. For more
information about annotating probes, see Chapter 8, Annotations.
1.
Select Query → Pivot Annotations from the menu bar.
⇒ If there is more than one annotation type, the Select Pivot
Annotation Type dialog box appears (Figure 7.10).
Figure 7.10
Select Pivot Annotation Type dialog box
2.
Select an annotation type, then click OK.
⇒ This adds a column of annotations (one annotation type per column)
to the pivot table (far right).
Sorting Pivot Table Columns
You can specify a sort order for up to four columns in the pivot table.
1.
Click the Pivot tab in the results pane.
2.
Select Edit → Sort from the menu bar. Alternatively, right-click the
pivot table and select Sort from the shortcut menu.
⇒ The Sort dialog box appears (Figure 7.11).
Figure 7.11
Sort dialog box
3.
Click the Sort By drop-down arrow.
4.
Select a pivot column from the drop-down list (Figure 7.12).
Figure 7.12
Sort dialog box
5.
Select the Ascending or Descending sort order option.
6.
To specify another sort order, click the next drop-down arrow in the
Then By box, and repeat steps 4 and 5.
7.
Click OK when finished.
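The Sort By / Then By behavior is a standard multi-key sort. One way to sketch it in Python, assuming rows are dictionaries keyed by column name (a hypothetical layout) and a hypothetical function name `sort_pivot_rows`, is to rely on the stability of `list.sort`: sort by the least significant key first and by the Sort By column last.

```python
from operator import itemgetter


def sort_pivot_rows(rows, sort_keys):
    """Sort rows by up to four (column, ascending) pairs, most significant first.

    Python's sort is stable, so applying the keys from least to most significant
    yields the combined "Sort By ... Then By ..." order.
    """
    if len(sort_keys) > 4:
        raise ValueError("the Sort dialog box accepts at most four columns")
    result = list(rows)
    for column, ascending in reversed(sort_keys):
        result.sort(key=itemgetter(column), reverse=not ascending)
    return result
```

For example, `sort_pivot_rows(rows, [("Signal", False), ("Probe Set Name", True)])` would order rows by descending Signal and break ties alphabetically by name (column names here are illustrative).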
Pivot Options
The Data Mining Options dialog box displays the pivot options (Figure 7.13).
1.
To open the Data Mining Options dialog box, do one of the following:
■
Click the Options button.
■
Right-click the pivot table and select Options from the shortcut
menu.
■
Select View → Options from the menu bar, then click the Pivot tab.
Figure 7.13
Data Mining Options dialog box, GeneChip® data mode (left) and spot data mode (right)
Show Order Analyses Dialog
Select this option to display the Order Pivot Analyses dialog box (Figure 7.14)
before the pivot operation begins. This dialog box enables you to specify an
order for the analyses (columns) in the pivot table. The analysis order in the
pivot table determines the order of the analyses in the series bar and
histogram graphs.
Figure 7.14
Order Pivot Analyses dialog box
1.
Use the drag-and-drop method to change the order of the analyses in the
Order Pivot Analyses dialog box.
2.
Click OK to pivot the data.
You can also reorder columns in the pivot table using the drag-and-drop method (see Appendix B).
Working with Tables
Working with results tables is the same in GeneChip® data mode
(shown in this section) or spot data mode.
Finding Probes
DMT can perform a text search in the query or pivot table.
1.
To specify the text string for the search, do one of the following:
■
Click the Find button.
■
Right-click the query or pivot table and select Find In Results from
the shortcut menu.
■
Select Edit → Find In Results from the menu bar.
⇒ The Find Probe dialog box appears (Figure 7.15).
Figure 7.15
Find Probe dialog box, GeneChip® data mode
2.
Enter the text string for the search in the Find What box, or select a
previously entered text string from the Find What drop-down list.
3.
Select the Match and Direction search options, then click Find Next.
If the pivot table includes descriptions, the find function searches the
probe set or spot probe name and description columns.
4.
Click Find Next again to continue the search.
The Find Next command finds all strings that match the search text
string. For example, using the Find Next command to search for the
text string biob would find AFFX-BioB-5 as well as other occurrences
of BioB (unless either the Match whole word only or Match case
option is selected).
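The Match and Direction options can be illustrated with a small Python sketch. The function names are hypothetical, and treating a "word" as a whitespace-delimited token is an assumption chosen so that the sketch reproduces the example above: `biob` matches `AFFX-BioB-5` only when neither Match whole word only nor Match case is selected. `find_next` mimics Find Next by scanning the rows after (or, searching up, before) the current one.

```python
from typing import Optional, Sequence


def matches(cell: str, text: str, whole_word: bool = False, match_case: bool = False) -> bool:
    """Return True if cell contains text under the chosen Match options."""
    if not match_case:
        cell, text = cell.casefold(), text.casefold()
    if whole_word:
        # Assumption: a word is a whitespace-delimited token, so "biob"
        # does not match inside "AFFX-BioB-5".
        return text in cell.split()
    return text in cell


def find_next(rows: Sequence[Sequence[str]], text: str, start: Optional[int] = None,
              down: bool = True, **options) -> Optional[int]:
    """Return the index of the next row with a matching cell, or None if there is none."""
    if start is None:
        start = -1 if down else len(rows)
    indices = range(start + 1, len(rows)) if down else range(start - 1, -1, -1)
    for i in indices:
        # Each row holds the searchable cells, e.g. probe name and description.
        if any(matches(cell, text, **options) for cell in rows[i]):
            return i
    return None
```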
Viewing Descriptions & Obtaining Further Gene Information
The Description dialog box (Figure 7.16) is available in the query or pivot
table. It enables you to:
■
View descriptions.
■
View or enter annotations for the selected probe (probe set or spot
probe).
■
Access an Internet website for further gene information.
1.
Double-click the query or pivot table row that contains the probe of
interest.
⇒ The Description dialog box appears (Figure 7.16).
The Description dialog box displays:
■
The probe name and a brief description.
■
The target sequence the probe set is designed to interrogate.
■
Annotations.
The Description dialog box is automatically updated when you click
another probe in the query or pivot table.
Figure 7.16
Description dialog box, GeneChip® data mode
2.
To obtain further gene information, select an Internet website from the
drop-down list, then click Information.
⇒ The default Internet browser is started and automatically opens the
selected website.
3.
To annotate the selected probe, click Annotate.
See the following section for information about annotating probes.
Annotating Probes
You can annotate probes (probe sets or spot probes) displayed in the query or
pivot table.
1.
Select one or more probe names in the query or pivot table.
To select adjacent names, press and hold the SHIFT key while you click
the first and last name in the selection. To select non-adjacent names,
press and hold the CTRL key while you click the names.
2.
Right-click the query or pivot table and select Annotate Probes from
the shortcut menu. Alternatively, select Annotations → Annotate
Probes from the menu bar.
⇒ The Annotate dialog box appears and displays the selected probe
names in the Probe Set(s) box (Figure 7.17).
3.
Enter an Annotation Type or make a selection from the drop-down list.
4.
Enter comments in the Annotation box.
Figure 7.17
Annotate dialog box
5.
Click OK to add the annotation and close the Annotate dialog box.
Adding Probes to the Filter Grid
You can add all or selected probes (probe sets or spot probes) in the query or
pivot table to the current filter. DMT saves the selected probes as a probe
list, then adds the probe list to the filter. (See Chapter 9 for more information
about probe lists.)
Adding Selected Probes
1.
Select one or more probe names in the query or pivot table.
To select adjacent names, press and hold the SHIFT key while you click
the first and last name in the selection. To select non-adjacent names,
press and hold the CTRL key while you click the names.
2.
Right-click the table and select Add Selected Rows to Filter from the
shortcut menu. Alternatively, select Edit → Add Selected Rows to
Filter from the menu bar.
⇒ The selected probe set names are added to the Probe Set Name
column (or the selected spot probe names to the Spot column) in the
filter grid.
Adding All Probes
1.
Right-click the query or pivot table and select Add All Rows to Filter
from the shortcut menu. Alternatively, select Edit → Add All Rows to
Filter from the menu bar.
⇒ All probe set names are added to the Probe Set Name column (or
all spot probe names to the Spot column) in the filter grid.
If the option Always Prompt to Create List is chosen (select Edit
→ Lists → Always Prompt to Create List from the menu bar),
DMT prompts you to create a list of the probe sets (or spot probes)
you want to add to the filter.
DMT adds the list name to the filter instead of the probe names. See
Chapter 9 for more information about lists.
Copying Tables
All or a selected portion of a results table can be copied to the system
clipboard, then pasted into other applications.
1.
To select the entire results table, click the upper left corner of the table
(Figure 7.18).
Figure 7.18
Query table (GeneChip® data mode), all rows selected
2.
To select part of a results table, do one of the following:
■
Click and drag the mouse to select the desired rows.
■
Click a row header to select the entire row.
To select adjacent rows, press and hold the SHIFT key while you click
the first and last row in the selection. To select non-adjacent rows, press
and hold the CTRL key while you click the rows.
3.
To copy the selection to the system clipboard, do one of the following:
■
Click the Copy Cells button.
■
Right-click the table and select Copy Cells from the shortcut menu.
■
Select Edit → Copy Cells from the menu bar.
⇒ The selected table cells are copied to the system clipboard.
4.
To copy the selection to Excel, select Edit → Copy to Excel from the
menu bar.
⇒ Microsoft® Excel opens and the selection is pasted into a new
spreadsheet.
Exporting Data
The experiment information, query, or pivot table data may be exported
(saved) to a tab-delimited text file (*.txt), then imported into other
applications.
Hidden table columns are not exported.
1.
Select Data → Export As from the menu bar.
⇒ The Export As dialog box appears (Figure 7.19).
Figure 7.19
Export As dialog box
2.
Select a directory from the Save in drop-down box.
3.
Enter a File name, then click Save.
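Because the export format is plain tab-delimited text, a file with the same shape can be produced or checked outside DMT. The Python sketch below is an illustration only: the function name and the column names in the check are invented, and Python's `csv` module quotes fields that contain tabs, quotes or line breaks, which DMT may not do. Like the export, it skips hidden columns.

```python
import csv
import io
from typing import AbstractSet, Iterable, Sequence


def export_tab_delimited(header: Sequence[str], rows: Iterable[Sequence[object]],
                         hidden: AbstractSet[str] = frozenset()) -> str:
    """Return table text in tab-delimited form, leaving out hidden columns."""
    keep = [i for i, name in enumerate(header) if name not in hidden]
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([header[i] for i in keep])
    for row in rows:
        writer.writerow([row[i] for i in keep])
    return buffer.getvalue()
```

To save the result, write the returned string to a *.txt file.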
Expanding the Results Pane
When the Query window displays both the graph and results pane, you can
enlarge the results pane.
1.
Right-click a table in the results pane and select Expand Results from
the shortcut menu. Alternatively, select View → Expand Results from
the menu bar.
⇒ The graph pane is hidden and the results pane is enlarged.
2.
To return the results pane to its original size, repeat step 1.
Clearing the Results Pane
To clear the results pane, select Edit → Clear Results from the menu
bar.
⇒ All tables from the results pane are cleared.
Chapter 8
Annotations
You can annotate probes (probe sets or spot probes) and view the
annotations in the pivot table. The annotations may be queried and
the query results may be added to the filter.
Creating and working with annotations is the same in GeneChip® data
mode (shown in this chapter) or spot data mode.
Annotating Probes
1.
Select one or more probes in the query or pivot table.
2.
Right-click the query or pivot table and select Annotate Probes from
the shortcut menu. Alternatively, select Annotations → Annotate
Probes from the menu bar.
⇒ The Annotate dialog box appears and displays the selected probes
in the Probe Set(s) box (Figure 8.1).
Figure 8.1
Annotate dialog box, GeneChip® data mode
3.
Enter the Annotation Type, or select one from the drop-down list.
4.
Enter comments in the Annotation box, then click OK.
Loading Annotations
You can add or load annotations previously saved in a text file (*.txt) to the
system.
Creating a Text File
Use the following procedure to create an annotation text file.
1.
Create a text file (*.txt) following the tab delimited format shown in
Figure 8.2.
2.
In the first line, enter the column names (as defined in Table 8.1),
delimited by tabs (Figure 8.2).
Table 8.1
Annotation text file, column names
Column Number    Column Name
1                Probe name
2                Type
3                Annotation
3.
In each subsequent line, enter the probe name, annotation type and
annotation, delimited by tabs (Figure 8.2).
Enter only one annotation per line. Each annotation can include up to
2000 characters.
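If you generate annotation files with a script rather than by hand, a Python sketch like the following enforces the rules stated above: a header line with the Table 8.1 column names, tab-delimited fields, one annotation per line and at most 2000 characters per annotation. The function name and the rejection of embedded tabs or line breaks are assumptions added for illustration.

```python
from typing import Iterable, Tuple

HEADER = ("Probe name", "Type", "Annotation")   # column names from Table 8.1
MAX_ANNOTATION_CHARS = 2000                     # per-annotation limit stated above


def build_annotation_file(entries: Iterable[Tuple[str, str, str]]) -> str:
    """Return annotation file text: a header line, then one annotation per line."""
    lines = ["\t".join(HEADER)]
    for probe, annotation_type, annotation in entries:
        if len(annotation) > MAX_ANNOTATION_CHARS:
            raise ValueError(f"annotation for {probe} exceeds {MAX_ANNOTATION_CHARS} characters")
        fields = (probe, annotation_type, annotation)
        # A tab or line break inside a field would break the three-column, one-line layout.
        if any(ch in field for field in fields for ch in "\t\r\n"):
            raise ValueError(f"tab or line break in entry for {probe}")
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"
```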
Figure 8.2
Annotations text file (*.txt), GeneChip® data mode
Loading the Annotations
Use the following procedure to add an annotations text file to the system.
1.
Select Annotations → Load Annotations from the menu bar.
⇒ The Open dialog box appears (Figure 8.3) and displays the contents of
the default directory specified in the Data Mining Options dialog
box (Default Directory tab).
Figure 8.3
Open dialog box
2.
Select the text file that contains the annotations, then click Open.
⇒ The annotations are added to the GeneInfo database and are
available to DMT.
Querying Annotations
The Query Annotations window (Figure 8.4) enables you to build an
annotation query (top pane) and view the returned results (bottom pane).
1.
Select Annotations → Query Annotations from the menu bar.
⇒ The Query Annotations window appears (Figure 8.4).
Figure 8.4
Query Annotations window
2.
Click the drop-down arrow in the Field column to display a drop-down
list of field types. Select <None> if you want to clear a previously
selected field type.
3.
Use the scroll bar to view the list and select a field type (or none) from
the drop-down list.
4.
Enter the search text string in the Search For box.
⇒ DMT combines the field type with the text string in AND fashion
(intersection) (see Table 8.2).
5.
To edit the text string, highlight the entry and right-click the cell.
⇒ A shortcut menu of edit commands is displayed.
Table 8.2
Field types
Field Type         Returns Annotations...
Probe              for probe set or spot probe names that contain the text string
User               created by a user whose name contains the text string
Description        for probe sets with descriptions that contain the text string
Annotation Type    of the specified type that contain the text string
6.
To enter another row of criteria, click the Operation column.
7.
Click the drop-down arrow, then select the AND (intersection) or OR
(union) operator.
⇒ DMT automatically adds another row to the Query Annotations
filter grid.
8.
To specify additional query criteria, repeat step 2 through step 6.
9.
To run the query, do one of the following:
■
Click the Query Annotations button.
■
Right-click the top pane and select Run Query from the shortcut
menu.
■
Select Annotations → Run Query from the menu bar.
⇒ The bottom pane of the Query Annotations window displays the
returned results (Figure 8.5).
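The way criteria rows combine can be sketched in Python. Several details are assumptions rather than documented DMT behavior: rows are evaluated left to right with no operator precedence, matching is a case-insensitive substring test, the Annotation Type field is compared against the annotation's type name, and the function names and dictionary keys in `FIELD_ATTRIBUTE` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical mapping from the Field column (Table 8.2) to annotation attributes.
FIELD_ATTRIBUTE = {
    "Probe": "probe",
    "User": "user",
    "Description": "description",
    "Annotation Type": "type",
}

Criterion = Tuple[Optional[str], str, str]   # (operation, field, search text)


def criterion_matches(annotation: Dict[str, str], field: str, text: str) -> bool:
    """One filter grid row: the field AND the search text (case-insensitive substring)."""
    value = annotation.get(FIELD_ATTRIBUTE[field], "")
    return text.casefold() in value.casefold()


def run_annotation_query(annotations: Sequence[Dict[str, str]],
                         criteria: Sequence[Criterion]) -> List[Dict[str, str]]:
    """Evaluate criteria rows left to right, joining them with AND / OR."""
    results = []
    for annotation in annotations:
        keep: Optional[bool] = None
        for operation, field, text in criteria:
            hit = criterion_matches(annotation, field, text)
            if keep is None:              # the first row has no operation
                keep = hit
            elif operation == "AND":
                keep = keep and hit
            elif operation == "OR":
                keep = keep or hit
            else:
                raise ValueError(f"unknown operation: {operation!r}")
        if keep:
            results.append(annotation)
    return results
```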
Figure 8.5
Query Annotations window, query criteria (top) and query results (bottom)
Annotation Query Results
Type
Annotation type selected when the annotation was created.
Annotation
Text entered by the user who created the annotation.
User
Windows NT name of the user who logged onto the
workstation when the annotation was created.
Date
Date when the annotation was created or last updated.
Description
Probe description (derived from a public database).
Copying Annotation Query Results
Annotation query results may be copied to the system clipboard and pasted
into other applications. The row numbers are also copied with the selected
cells for reference.
1.
Click the row number in the query results to select the entire row.
2.
Select Annotations → Copy Cells from the menu bar.
⇒ The selection is copied to the system clipboard.
Clearing the Annotation Query or Query Results
1.
To clear the annotation filter grid (top pane of the Query Annotations
window), select Annotations → Clear Query from the menu bar.
2.
To clear the annotation query results (bottom pane of the Query
Annotations window), select Annotations → Clear Results from the
menu bar.
Adding Probes to the Filter Grid
Probes (probe sets or spot probes) returned by an annotation query may be
added to the current filter.
1.
Select one or more probes in the bottom pane of the Query Annotations
window.
2.
Right-click the selection, then select Add Selected Results To Filter
from the shortcut menu. Alternatively, select Annotations → Add
Selected Results To Filter from the menu bar.
⇒ The selected probe set names are added to the Probe Set Name
column (or the selected spot probe names to the Spot column) in the
filter grid.
If the option Always Prompt to Create List is selected (Edit →
Lists → Always Prompt to Create List from the menu bar), DMT
prompts you to create a list of the probe sets (or spot probes) you
want to add to the filter.
DMT adds the list name to the filter instead of the probe names. See
Chapter 9 for more information about lists.
Deleting Annotations
An annotation may only be removed from the database by the user who
created it. The delete command permanently removes an annotation from the
system.
1.
Select Annotations → Query Annotations from the menu bar.
⇒ The Query Annotations window appears (Figure 8.6).
Figure 8.6
Query Annotations window, specifying search for annotations created by the user
2.
Select User from the Field drop-down list.
3.
Enter your user name in the Search For box.
4.
Click the Query Annotations button.
⇒ All of the annotations that meet the criteria are displayed.
5.
To select a row, click the row number. To select all rows, click the
upper left corner of the query results pane (Figure 8.7).
Figure 8.7
Query Annotations window
6.
Select Annotations → Delete Selected Annotations from the menu
bar. Alternatively, right-click a selected annotation, then select Delete
Selected Annotations from the shortcut menu.
⇒ The selected annotations are permanently removed.
Chapter 9
Probe Lists
A user-specified group of probes (probe sets or spot probes) can be
saved as a probe list. Probe lists are displayed in the data tree and
may be added to the filter grid (probe set name or spot column), or
used to view specific query results. A text file (comma delimited *.txt)
that specifies a probe list may also be added to the system.
This chapter describes how to create or load probe lists and how to use
and manage them in Affymetrix® Data Mining Tool.
Creating and working with probe lists is the same in GeneChip® data
mode (shown in this chapter) or spot data mode.
Creating Probe Lists
A probe list may be generated from probes selected from:
■
The query or pivot table.
■
Cluster analysis results.
■
Search array descriptions.
■
The filter grid.
Additionally, existing probe lists may be combined to create new lists. This
section outlines the various procedures for creating probe lists.
Creating a Probe List from the Query or Pivot Table
1.
Select one or more probes in the query or pivot table.
2.
Right-click the table and select Create Probe List from the shortcut
menu.
⇒ The Save Probe List dialog box appears (Figure 9.1).
Figure 9.1
Save Probe List dialog box
3.
Enter a name for the list in the Name box, then click Save.
⇒ The Probe List Members dialog box appears and displays the
members in the saved list (Figure 9.2).
Figure 9.2
Probe List Members dialog box
4.
Click Close when finished viewing the probe list members.
⇒ The data tree displays the probe list in the Probe Lists directory
(Figure 9.3).
5.
In the data tree, click the plus sign (+) next to the probe list name to
display the probe list members.
For example, in Figure 9.3, the probe list L1 contains five members.
Figure 9.3
Data tree, Probe Lists Directory
Creating a Probe List from Cluster Analysis
The cluster members identified by cluster analysis may be saved as a probe
list. (See Chapter 14 for more information about cluster analysis.)
1.
After the cluster analysis results are returned, click the cluster of interest
in the Clusters tab of the graph pane.
⇒ The cluster members (probe sets or spot probes) are displayed in the
Probes box (Figure 9.4).
Figure 9.4
Graph pane, Clusters tab (GeneChip® data mode)
2.
Enter a Probe List Name, then click Save Selected.
⇒ A probe list is created that includes the members of the selected
cluster and is displayed in the data tree Probe Lists directory.
Creating a Probe List from Search Array Descriptions
1.
Select Edit → Search Array Descriptions from the menu bar.
⇒ The Search Array Descriptions dialog box appears (Figure 9.5).
Figure 9.5
Search Array Descriptions dialog box
2.
In the Search for box, enter the description, or partial description, then
click Find.
⇒ Results for the search are displayed in the list box (Figure 9.6).
Figure 9.6
Search Array Descriptions dialog box with search results
3.
Press and hold the CTRL key while you click to select the desired probe
set names.
4.
Click Add to Filter.
⇒ The Save Probe List dialog box appears.
5.
Enter a Name for the probe list, then click Save.
⇒ The Probe List Members dialog box appears.
6.
Click Close when finished.
⇒ The data tree displays the probe list.
Creating a Probe List from Filter
1.
Enter the probe set names in the Probe Set Name column of the filter
grid.
2.
Select Edit → Probe Lists → Create Probe List from Filter from the menu bar.
⇒ The Save Probe List dialog box appears.
3.
Enter a Name for the probe list, then click Save.
⇒ The Probe List Members dialog box appears.
4.
Click Close when finished.
⇒ The data tree displays the probe list.
Creating a Probe List by Combining Existing Lists
1.
Select Edit → Probe Lists → Combine Probe Lists.
⇒ The Combine Probe Lists dialog box appears (Figure 9.7).
Figure 9.7
Combine Probe Lists dialog box
2.
Select or clear the Only show my probe lists option, as desired.
3.
In the Combine Probe List drop-down box, select a probe list.
4.
Select a second probe list from the lower drop-down box.
5.
Select either the And or Or option.
6.
■
And specifies that probe names must belong to both lists to be
included in the new list.
■
Or specifies probe names belonging to either one or both lists will be
included in the new list.
Enter a new probe list name.
Figure 9.8
Combining all probes belonging to lists Hu6800 or Like_Affx IN_cre
7.
Click OK.
⇒ The Probe List Members dialog box appears and displays all probes
in the new probe list.
8.
Click Close when finished.
⇒ The data tree displays the new probe list.
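The And and Or options correspond to set intersection and union. A minimal Python sketch, with a hypothetical function name and an assumed alphabetical ordering of the resulting members (the order DMT uses is not specified here):

```python
from typing import Iterable, List


def combine_probe_lists(first: Iterable[str], second: Iterable[str], operation: str) -> List[str]:
    """Combine two probe lists: "And" = intersection, "Or" = union."""
    a, b = set(first), set(second)
    if operation == "And":
        members = a & b          # probes present in both lists
    elif operation == "Or":
        members = a | b          # probes present in either or both lists
    else:
        raise ValueError(f"operation must be 'And' or 'Or', not {operation!r}")
    # Assumption: alphabetical order for the new list.
    return sorted(members)
```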
Loading a Probe List
In addition to creating a probe list (described in the preceding section), a
probe list may be loaded or added to the system. There are two methods
available for loading a probe list:
■
Specify members. Select this option to manually enter the probe list
members.
■
Specify input file. Select this option to load a previously saved text file
(*.txt) that specifies the probe list members.
Specifying Probe List Members
1.
Select Edit → Probe Lists → Load Probe List from the menu bar.
⇒ The Load List dialog box appears (Figure 9.9).
2.
Enter a Probe List name.
3.
Select the Specify members (comma delimited) option and enter the
probe set or spot probe names using a comma delimited format
(terminate the entry with a comma) (Figure 9.9).
Figure 9.9
Load List dialog box, Specify members option
4.
Click OK.
⇒ The list is created and displayed in the data tree Probe Lists
directory.
Specifying an Input File
To load a probe list using the Specify input file option, you must first create
the list file (*.txt) so that you can select it from the Load List dialog box.
Creating the Input File
1.
Create a text file (*.txt) following the comma delimited format shown in
Figure 9.10.
2.
Enter the probe names (probe set or spot probe) in comma delimited
format (terminate the entry with a comma) (Figure 9.10).
Figure 9.10
Comma delimited probe list entries
3.
Save the text file.
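The comma delimited format is simple enough to write or read with a few lines of Python. This sketch (hypothetical function names) writes the trailing comma the procedure asks for; tolerating surrounding whitespace and line breaks when reading is an assumption, not documented DMT behavior.

```python
from typing import List, Sequence


def format_probe_list(probes: Sequence[str]) -> str:
    """Write probe names comma delimited, terminating the entry with a comma."""
    for name in probes:
        if not name.strip() or "," in name:
            raise ValueError(f"invalid probe name: {name!r}")
    return ",".join(probes) + ","


def parse_probe_list(text: str) -> List[str]:
    """Read a comma delimited probe list, ignoring the trailing comma and whitespace."""
    return [name.strip() for name in text.split(",") if name.strip()]
```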
Selecting the Input File
1.
Select Edit → Probe Lists → Load Probe List from the menu bar.
⇒ The Load List dialog box appears (Figure 9.11).
Figure 9.11
Load List dialog box
2.
Enter a Probe List name.
3.
Select the Specify input file option and enter the name of the text file
(*.txt) that contains the list members.
Alternatively,
a.
Click the Browse button.
⇒ The Select List dialog box appears (Figure 9.12).
Figure 9.12
Select List dialog box
b.
Select a text file, then click Open.
⇒ The Load List dialog box displays the selected input file
(Figure 9.13).
Figure 9.13
Load List dialog box
4.
Click OK.
⇒ The probe list is created and displayed in the data tree Probe Lists
directory.
Using Probe Lists
Probe lists provide a convenient way to quickly add a group of associated
probes (probe sets or spot probes) to the filter, or to highlight and view
results for only selected probes.
Adding a Probe List to the Filter Grid
You can add an existing probe list to the filter.
1.
In the filter grid, right-click a cell in the Probe Set Name or Spot column
and select Probe List... from the shortcut menu (Figure 9.14).
⇒ The Open Probe List dialog box appears (Figure 9.14).
Figure 9.14
Shortcut menu and Open Probe List dialog box
The Open Probe List dialog box displays all probe lists contained on the
server, unless the Only show my probe lists option is selected.
2.
From the Open Probe List dialog box, select the probe list that you want to
add to the filter.
3.
Click Open.
⇒ The probe list is added to the filter.
Displaying Selected Probe List Members
Use probe lists to highlight probe list members in the scatter or fold change
graph, or to display only those members in the pivot table or a series line
graph.
Pivot the analyses of interest and plot the scatter, fold change, or
series line graph before highlighting one or more probe lists.
1.
Select one or more probe lists in the data tree.
To select adjacent probe lists, press and hold the SHIFT key while you
click the first and last list in the selection. To select non-adjacent probe
lists, press and hold the CTRL key while you click the lists.
2.
Right-click a selected probe list and select Display Selected Probes
from the shortcut menu.
⇒ If the scatter or fold change graph is the active (selected) graph, the
corresponding points are highlighted. If the series graph is active,
only the data for the selected probe list members is displayed
(Figure 9.15).
⇒ The pivot table displays only the rows for the probe list members
(Figure 9.16).
3.
To restore all rows to the pivot table, right-click the pivot table and
select Show All Pivot Rows from the shortcut menu.
Figure 9.15
Series line graph of probe list L5
Figure 9.16
Pivot table displaying probe list L5
Managing Probe Lists
List management is the same in GeneChip® data mode (shown in this
section) and spot data mode.
Viewing and Editing Probe List Members
1.
Select Edit → Lists → View Members from the menu bar.
⇒ The Probe List Members dialog box appears (Figure 9.17).
Figure 9.17
Probe List Members dialog box
2.
Select a Probe List from the drop-down list.
⇒ The Probe List Members box displays the list members.
If the Only show my probe lists option is selected (Figure 9.17), the
Probe Lists drop-down list only displays lists created by you (identified
by the logon name).
3.
To add a probe set to the list, enter the probe set name in the bottom box
(Figure 9.18), then click Add Member.
Figure 9.18
Probe List Members dialog box
4.
To remove a probe from the list, highlight it in the Probe List Members
box (Figure 9.19), then click Remove Member.
Figure 9.19
Probe List Members dialog box
5.
Click Close when finished viewing or editing the list.
Combining Probe Lists
1.
Select Edit → Lists → Combine Lists from the menu bar.
⇒ The Combine Probe Lists dialog box appears (Figure 9.20).
Figure 9.20
Combine Probe Set Lists dialog box
2.
Select a probe list from each of the upper and lower Combine Probe List drop-down lists (Figure 9.21).
Figure 9.21
Combine Probe Lists dialog box
3.
Select the And (intersection) or Or (union) combination option for the
lists.
4.
Enter a New probe list name for the new list, then click OK.
⇒ If the Show members after saving option is selected (Figure 9.21),
the Probe List Members dialog box appears and displays the new
list members (Figure 9.22).
Figure 9.22
Probe List Members dialog box
5.
Click Close when finished viewing or editing list members.
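The And and Or options correspond to set intersection and union. The Python sketch below shows the idea for two lists of probe names; the function names and the order in which members are returned are assumptions for illustration, not DMT's documented behavior.

```python
# Illustrative sketch only: names and result ordering are hypothetical.


def _unique(items):
    """Drop repeated names while keeping first-seen order."""
    return list(dict.fromkeys(items))


def combine_probe_lists(first, second, mode):
    """Combine two probe lists.

    mode "and": intersection (members found in both lists)
    mode "or":  union (members found in either list)
    Results keep the order of `first`, followed by new members of `second`.
    """
    if mode == "and":
        in_second = set(second)
        return [name for name in _unique(first) if name in in_second]
    if mode == "or":
        return _unique(list(first) + list(second))
    raise ValueError("mode must be 'and' or 'or'")


if __name__ == "__main__":
    upper = ["p1", "p2", "p3"]  # hypothetical list members
    lower = ["p3", "p4", "p2"]
    print(combine_probe_lists(upper, lower, "and"))  # ['p2', 'p3']
    print(combine_probe_lists(upper, lower, "or"))   # ['p1', 'p2', 'p3', 'p4']
```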
Exporting a Probe List
A probe list may be exported as a text file (*.txt).
1.
Right-click the probe list you want to export and select Export Probe List
from the shortcut menu. Alternatively, select Probe Lists → Export Probe
Lists from the menu bar.
⇒ The Save As dialog box appears (Figure 9.23).
Figure 9.23
Save As dialog box
2.
Choose a directory from the Save in drop-down list.
3.
Enter a name for the text file and click Save.
Deleting a Probe List
Using the Shortcut Menu
1.
Right-click the probe list you want to delete and select Delete Probe
List from the shortcut menu.
⇒ DMT prompts you to confirm the probe list to be deleted
(Figure 9.24).
Figure 9.24
Delete probe list prompt
2.
Click OK to delete the probe list.
Using the Menu Bar
1.
Select Edit → Probe Lists → Delete Saved Probe List from the menu
bar.
⇒ The Delete Probe List dialog box appears (Figure 9.25).
Figure 9.25
Delete Probe List dialog box
2.
Select the list you want to delete and click Delete.
Chapter 10
Array Sets
An array set is a user-specified group of GeneChip® probe array
analyses. An array set provides a convenient way to select a group of
analyses for a query, the pivot operation, graphing, statistical
analyses, or clustering.
Array sets are only available for GeneChip® probe array analyses.
Creating an Array Set
1.
In the data tree, click the analyses you want to include in an array set
(Figure 10.1).
To select adjacent analyses, press and hold the SHIFT key while you
click the first and last analysis in the selection. To select non-adjacent
analyses, press and hold the CTRL key while you click the analyses.
Figure 10.1
Right-click selected analyses in data tree for the shortcut menu
2.
Right-click a selected analysis, then select Create Set from the shortcut
menu (Figure 10.1).
Alternatively, select Edit → Sets → Create Set from the menu bar.
⇒ The Save Array Set dialog box appears (Figure 10.2).
Figure 10.2
Save Array Set dialog box
The Virtual Set option is available if the analyses selected for the array
set are derived from different GeneChip® probe array types. When a
virtual set is pivoted, DMT merges the analyses and displays them in a
single column of the pivot table. A virtual set is a convenient way to
manage the analyses from a GeneChip® probe array set made up of multiple
arrays.
If the same probe set occurs in more than one analysis, the pivot table
displays each probe set-analysis combination in a separate row to ensure
no data are lost.
For example, a control probe that is found across a set of four probe
arrays will generate four pivot table rows. Each row is distinguished by
the probe set-analysis name in the row header.
3.
Enter a Name for the array set.
4.
Select the Virtual Set option if you want to merge the analyses into a
single column in the pivot table.
5.
Click Save.
⇒ The array set is saved and displayed in the data tree under the My
Array Sets directory (Figure 10.3).
Figure 10.3
Data tree, My Array Sets directory
Saved Array Sets are stored in the registry on the computer and are
only available when using that specific computer.
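The row layout of a pivoted virtual set, as described above, can be sketched in Python. The sketch assumes each analysis is a dictionary of probe set name to result value; the function name, the sample names, and the exact row header format (probe set name, a hyphen, then the analysis name) are hypothetical.

```python
# Illustrative sketch only: the function and header format are hypothetical.
from collections import Counter


def pivot_virtual_set(analyses):
    """Merge a virtual set's analyses into one pivot column.

    `analyses` maps analysis name -> {probe set name: result value}.
    Returns {row header: value}. A probe set found in only one analysis
    keeps its own name as the row header; a probe set found in several
    analyses gets one row per probe set-analysis combination.
    """
    occurrences = Counter(
        probe_set for results in analyses.values() for probe_set in results
    )
    column = {}
    for analysis, results in analyses.items():
        for probe_set, value in results.items():
            if occurrences[probe_set] > 1:
                header = f"{probe_set}-{analysis}"  # hypothetical header format
            else:
                header = probe_set
            column[header] = value
    return column


if __name__ == "__main__":
    # Hypothetical analyses from two array types that share one control.
    virtual_set = {
        "ArrayA": {"a1_at": 120.0, "AFFX-ctrl_at": 15.0},
        "ArrayB": {"b1_at": 80.0, "AFFX-ctrl_at": 17.0},
    }
    for header, value in pivot_virtual_set(virtual_set).items():
        print(header, value)
```

The shared control yields two rows, one per analysis, so neither value is overwritten.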
Working with Array Sets
An array set is available for graphing (see Chapter 11), statistical analysis
(see Chapter 12) and cluster analysis (see Chapter 14).
Viewing Array Sets
The results tables are displayed independently. Therefore, changing
the analyses displayed in the experiment information table does not
affect the query or pivot table contents.
Experiment Information Table
1.
Click an array set(s) in the data tree.
To select adjacent array sets, press and hold the SHIFT key while you
click the first and last array set in the selection. To select non-adjacent
array sets, press and hold the CTRL key while you click the array sets.
2.
Right-click a highlighted array set and select Experiment Info from the
shortcut menu.
⇒ The experiment information table displays information for the
analyses in the array set(s).
Pivot Table
1.
Select an array set(s) in the data tree.
2.
Right-click a highlighted array set in the data tree and select Pivot Data
from the shortcut menu.
⇒ The pivot table displays the analysis results from the selected array
set(s).
The pivot table displays a single column of results for a virtual array set.
Managing Array Sets
Array sets that you have created can be edited or deleted from DMT. Only
array sets created by you, as identified by the logon name, are displayed in
the data tree.
Editing an Array Set
1.
Select an array set in the data tree.
2.
Select Edit → Sets → Edit Set from the menu bar.
⇒ The Array Set Members dialog box appears and displays the
selected array set and its members (Figure 10.4).
Figure 10.4
Array Set Members dialog box
3.
Do one or both of the following:
■
Add a member to the array set: Enter the analysis name (from the
current database) in the bottom box, then click Add member.
■
Remove a member from the array set: Select the analysis in the
Array Set Members box, then click Remove member.
4.
Click Close when finished editing the array set.
Deleting an Array Set
1.
In the data tree, select the array set(s) you want to delete.
2.
Right-click a selected array set and select Delete Sets from the shortcut
menu. Alternatively, select Edit → Sets → Delete Sets from the menu
bar.
⇒ DMT prompts you to confirm the array set(s) to be deleted.
3.
Click OK to delete the array set(s).
Chapter 11
Graphing Results
DMT can plot user-specified columns of numeric pivot table data in a
scatter, fold change, series, or histogram graph. These data include:
■
Analysis results
■
Statistical data generated using the analysis function (see Chapter 12).
The graph pane of the DMT session displays each type of graph in a
separate tab (Figure 11.1).
The pivot operation must be run before the graphs can be plotted.
Figure 11.1
Graph pane, Scatter graph tab
Scatter Graph
The scatter graph (Figure 11.1) is an x-y graph that compares numeric pivot
table data (from user-specified columns) using a traditional scatter plot.
Multiple pivot table columns may be assigned to each axis. This enables
quick comparison of the results from different experiments.
Each point in the scatter graph represents a probe common to the two pivot
table columns in the comparison. The point's x and y coordinates are the
probe's result values in the x-axis and y-axis columns, respectively.
The scatter graph displays up to eight fold change lines (four pairs) to help
identify results that have changed significantly. The fold change lines are
defined in pairs, y = mx and y = x/m, where m = 2, 3, 5 and 10 by default.
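As a worked illustration of the fold change lines: a point (x, y) with positive values lies on or beyond the pair for multiplier m exactly when y >= mx or y <= x/m, that is, when max(y/x, x/y) >= m. The Python sketch below is a hypothetical helper, not a DMT function; it reports the largest multiplier a point reaches, using the default multipliers stated above as a parameter.

```python
# Illustrative sketch only: fold_band is a hypothetical helper, not part of DMT.


def fold_band(x, y, multipliers=(2, 3, 5, 10)):
    """Largest multiplier m whose fold change line pair the point reaches.

    A point (x, y) lies on or beyond the pair y = m*x, y = x/m when
    y >= m*x or y <= x/m, i.e. when max(y/x, x/y) >= m. Returns 1 if the
    point lies between the innermost pair of lines. Requires x, y > 0.
    """
    if x <= 0 or y <= 0:
        raise ValueError("x and y must be positive")
    fold = max(y / x, x / y)
    reached = [m for m in multipliers if fold >= m]
    return max(reached, default=1)


if __name__ == "__main__":
    for x, y in [(50, 60), (100, 250), (100, 10)]:
        print((x, y), fold_band(x, y))  # bands 1, 2 and 10
```

Passing multipliers=(2, 3, 10, 30) applies the values listed under Fold Change Lines later in this chapter.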
In GeneChip® data mode, average difference and fold change metrics are
generally the most informative because probe sets with significant changes
in expression levels can be easily identified.
Plotting the Scatter Graph
Plotting the scatter graph is the same in GeneChip® data mode (shown
in the following section) and spot data mode.
1.
Click the Scatter Graph button.
Alternatively, select Graph → Scatter from the menu bar.
⇒ The Scatter Graph dialog box appears and displays the pivot table
columns available for the scatter graph (Figure 11.2).
Figure 11.2
Scatter Graph dialog box (GeneChip® data mode), pivot table columns available for the scatter
graph
2.
Use the drag-and-drop method to select each x-axis column in the
Available Columns box and place it in the Select X-Axis Column(s)
box (Figure 11.3).
Alternatively, select one or more columns in the Available Columns box,
then click the down arrow above the Select X-Axis Column(s) box.
To select adjacent columns, press and hold the SHIFT key while you
click the first and last column in the selection. To select non-adjacent
columns, press and hold the CTRL key while you click the columns.
3.
Use the drag-and-drop method to select each y-axis column in the
Available Columns box and place it in the Select Y-Axis Column(s)
box (Figure 11.3).
Alternatively, select one or more columns in the Available Columns box,
then click the down arrow above the Select Y-Axis Column(s) box.
The analysis results for a GeneChip® probe array set must be ordered
identically because the scatter graph compares the first analysis on
the x-axis to the first analysis on the y-axis (and so forth). If the
analyses are not identically ordered, many probe sets will not be
compared and plotted (only the common probe sets such as the
controls).
Figure 11.3
Scatter Graph dialog box, GeneChip® probe data mode
4.
To change the order of a column in the Select X-Axis (or Select Y-Axis)
Column(s) box, use the drag-and-drop method to move the column to a new
position in the list.
Alternatively, select the column, then click the up or down arrow
located inside the Select X-Axis (or Select Y-Axis) Column(s) box.
5.
To change the scatter graph axes from log scale (default) to linear scale,
click the Log Scale option to remove the check mark.
6.
Click OK.
⇒ The graph pane displays the scatter graph (Figure 11.4).
The points are color-coded using the display option colors in the Scatter
Graph tab of the Data Mining Options dialog box (click the Options
button).
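The ordering requirement in the note above follows from positional pairing: the first x-axis column is compared only with the first y-axis column, the second with the second, and so on, and only probes present in both columns of a pair are plotted. A minimal Python sketch, assuming each column is a name plus a dictionary of probe values; the function and sample names are hypothetical, and rejecting unequal column counts is an assumption.

```python
# Illustrative sketch only: scatter_points is hypothetical, not a DMT API.


def scatter_points(x_columns, y_columns):
    """Pair x-axis and y-axis columns by position and list plotted points.

    Each column is (column name, {probe name: value}). The first x column
    is compared with the first y column, the second with the second, and
    so on; only probes present in both columns of a pair are plotted.
    """
    if len(x_columns) != len(y_columns):
        raise ValueError("x and y axes need the same number of columns")
    points = []
    for (x_name, x_vals), (y_name, y_vals) in zip(x_columns, y_columns):
        for probe in sorted(x_vals.keys() & y_vals.keys()):
            points.append((probe, x_name, y_name, x_vals[probe], y_vals[probe]))
    return points


if __name__ == "__main__":
    # Hypothetical two-array set: arrays A and B share one control probe.
    base_a = ("baseA", {"a1_at": 100.0, "AFFX-ctrl": 10.0})
    base_b = ("baseB", {"b1_at": 200.0, "AFFX-ctrl": 12.0})
    exp_a = ("expA", {"a1_at": 300.0, "AFFX-ctrl": 11.0})
    exp_b = ("expB", {"b1_at": 150.0, "AFFX-ctrl": 13.0})
    print(len(scatter_points([base_a, base_b], [exp_a, exp_b])))  # 4
    print(len(scatter_points([base_a, base_b], [exp_b, exp_a])))  # 2, controls only
```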
Figure 11.4
Scatter graph, GeneChip® data mode, signal metric
Working with the Scatter Graph
Working with the scatter graph is the same in GeneChip® data mode
(shown in the following section) and spot data mode.
Magnifying the Graph
1.
Press and hold the SHIFT key while using the click-and-drag method to
draw a rectangle over the graph area of interest (Figure 11.5).
2.
Release the mouse button.
⇒ The area selected by the rectangle is magnified (Figure 11.6).
Figure 11.5
Scatter graph, rectangle selects an area to magnify (GeneChip® data mode)
Figure 11.6
Magnified area in the scatter graph
3.
To zoom out and restore the graph, right-click the graph and select Full
Out Zoom from the shortcut menu.
Locating a Probe
Select a probe in the pivot table to quickly locate it in the scatter graph.
1.
Click and hold the probe name in the pivot table.
⇒ The corresponding point in the scatter graph is highlighted
(Figure 11.7).
The highlighting is removed when the mouse button is released.
Figure 11.7
Scatter graph highlights the probe selected in the pivot table (GeneChip® data mode)
Viewing Probe Information & Annotating Probes
1.
To display probe and corresponding gene information, click a point in
the scatter graph.
⇒ The probe name, analysis names, metrics from the pivot table and a
brief description of the gene are displayed to the right of the graph
(Figure 11.8).
Figure 11.8
Scatter graph displaying probe information (GeneChip® data mode)
2.
To obtain further gene information, select an Internet website from the
drop-down list, then click Information.
⇒ The default Internet browser is started and automatically opens the
selected website.
3.
Double-click a point in the graph or a pivot table row to display the
Description dialog box (Figure 11.9).
Figure 11.9
Description dialog box
⇒ The Description dialog box displays a brief description of the probe
(probe set or spot probe), the sequence that it is designed to
interrogate and any annotations associated with the probe.
4.
To enter an annotation, click Annotate.
⇒ The Annotate dialog box appears (Figure 11.10).
Figure 11.10
Annotate dialog box
5.
Enter comments in the Annotation box, then click OK.
⇒ The annotation is added to the Description dialog box.
Selecting Points in the Graph
The lasso feature enables you to quickly select and focus on points of
interest in the scatter graph by drawing a line around them (roping). The
pivot table displays only the rows that correspond to the roped points
(all other rows are hidden).
Probes selected by roping may be conveniently annotated as a group or
saved in a probe list that can be applied to the filter grid of a subsequent
query.
1.
Click the Lasso Points button.
Alternatively, select Graph → Lasso Points from the menu bar.
⇒ The mouse pointer changes to a pair of cross hairs (+) when it is
positioned over the scatter graph.
2.
To rope points of interest, position the cross hairs near the group of
points, then do one of the following:
■
Click and hold the mouse button while you draw a complete circle
around the points (Figure 11.11); or
■
Click the mouse, move it to draw a line segment, then click the
mouse again to start drawing a new line segment. Repeat until you
return the cross hairs to the starting point and the line segments
enclose the points of interest (Figure 11.12).
Figure 11.11
Roped points in the scatter graph
Figure 11.12
Roped points in the scatter graph
3.
To terminate the roping operation, double-click the mouse or press the
ESC key.
⇒ The scatter graph displays the selected points in orange (the default
selected-point color, which can be changed in the Options dialog box; see
Changing Graph Colors on page 202).
⇒ The pivot table displays only the rows that correspond to the roped
points (all other rows are hidden).
4.
To restore the hidden rows to the pivot table, right-click the pivot table
and select Show All Pivot Rows from the shortcut menu.
5.
To clear the selection from the graph, right-click the graph and select
Clear Selection from the shortcut menu.
⇒ The roped points are deselected and all rows (probes) are restored to
the pivot table.
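Deciding which points a lasso encloses is a point-in-polygon test. DMT's internal method is not documented here; the sketch below uses the standard even-odd (ray casting) rule, treating the drawn line segments as a closed polygon whose last vertex joins the first. The function names and sample data are hypothetical, and the sketch assumes the points and the lasso share one coordinate space (for a log-scale graph, the plotted positions).

```python
# Illustrative sketch only: a standard ray casting test, not DMT's own code.


def point_in_lasso(px, py, polygon):
    """Even-odd (ray casting) test: is (px, py) inside the closed polygon?

    `polygon` is a list of (x, y) vertices in drawing order; the last
    vertex is joined back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray running right from the point cross this edge?
        # (Horizontal edges never pass this check, so no division by zero.)
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def rope(points, polygon):
    """Names of the points (name, x, y) enclosed by the lasso polygon."""
    return [name for name, x, y in points if point_in_lasso(x, y, polygon)]


if __name__ == "__main__":
    lasso = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]  # hypothetical
    points = [("p1_at", 5.0, 5.0), ("p2_at", 15.0, 5.0), ("p3_at", 1.0, 9.0)]
    print(rope(points, lasso))  # ['p1_at', 'p3_at']
```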
Scatter Graph Options
Preferences for the scatter graph display may be set in the Data Mining
Options dialog box (Figure 11.13). Newly selected options are applied
immediately to the existing graph and are kept for your subsequent sessions.
1.
Click the Options button, then click the Scatter Graph tab.
Alternatively:
■
Right-click the scatter graph and select Options from the shortcut
menu; or
■
Select View → Options from the menu bar, then click the Scatter
Graph tab.
⇒ The Data Mining Options dialog box appears and displays the
scatter graph options (Figure 11.13).
Figure 11.13
Data Mining Options dialog box, Scatter graph tab, GeneChip® data mode (left) and spot data mode (right)
Point Options
Point size
The point size number determines the dot size for a
graph point. Enter a larger point size for easier viewing.
Use a smaller point size for higher resolution graphs.
Color by Absolute Call
In GeneChip® data mode, select this option to color-code the points
according to the colors assigned to the absolute or detection call
combination of the x- and y-axis analyses (as displayed in the Scatter
Graph tab of the Data Mining Options dialog box, see Table 11.1).
Note: You must pivot the absolute or detection call data.
Color by Difference Call
In GeneChip® data mode, select this option to color-code the points
according to the colors assigned to the difference or change call for the
x-axis analysis (as displayed in the Colors section of the Data Mining
Options dialog box). There are five possible difference calls: decrease (D),
marginal decrease (MD), no change (NC), marginal increase (MI) and
increase (I).
If the x-axis analysis does not have a difference or change call, the
difference or change call for the y-axis analysis is used. If neither the
x- nor the y-axis analysis has a difference or change call, Point Color is
used.
Use Point Color
Select this option to display all graph points using the
Point Color (default is black) in the Colors section of the
Data Mining Options dialog box.
Table 11.1
Absolute or detection call combinations in the scatter graph (GeneChip® data mode)
                 Absent in Y    Marginal in Y    Present in Y
Absent in X      A-A            A-M              A-P
Marginal in X    M-A            M-M              M-P
Present in X     P-A            P-M              P-P
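The two color-by rules can be summarized in a short Python sketch. Table 11.1 forms a category from the x-axis call followed by the y-axis call, and the difference call falls back from the x-axis analysis to the y-axis analysis and then to Point Color. The function names are hypothetical, and returning None for an unrecognized absolute call is an assumption (the manual only states that the call data must be pivoted).

```python
# Illustrative sketch only: names are hypothetical, not DMT APIs.
ABSOLUTE_CALLS = {"A", "M", "P"}
DIFFERENCE_CALLS = {"D", "MD", "NC", "MI", "I"}


def absolute_call_category(x_call, y_call):
    """Table 11.1 category: x-axis call, hyphen, y-axis call (e.g. 'P-A')."""
    if x_call in ABSOLUTE_CALLS and y_call in ABSOLUTE_CALLS:
        return f"{x_call}-{y_call}"
    return None  # assumption: no category, so Point Color would apply


def difference_call_for_point(x_call, y_call):
    """Difference call used to color a point, or None for Point Color.

    The x-axis analysis call is preferred; the y-axis call is used only
    when the x-axis analysis has none.
    """
    for call in (x_call, y_call):
        if call in DIFFERENCE_CALLS:
            return call
    return None


if __name__ == "__main__":
    print(absolute_call_category("P", "A"))       # P-A
    print(difference_call_for_point(None, "MI"))  # MI
    print(difference_call_for_point(None, None))  # None
```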
Colors
The colors of the absolute and difference call categories as well as other
scatter graph items (graph points, graph background, selected or roped
points, fold change lines) may be changed. (For further information, see
Changing Graph Colors on page 202.)
Fold Change Lines
The default fold change lines are defined in four pairs: y = 2x and y = x/2,
y = 3x and y = x/3, y = 10x and y = x/10, and y = 30x and y = x/30.
1.
To redraw the fold change lines, enter new values in the edit boxes.
Only integer values may be entered.
2.
To turn off the display of a pair of fold change lines, remove its check
mark.
Fold Change Graph
The fold change graph is a scatter plot that displays the fold change for a
user-specified set of base and comparison columns. (Appendix A describes
the fold change calculation.) Numeric pivot table columns are available for
the fold change graph.
Each point in the graph represents a probe (probe set or spot probe) that is
common to the base and comparison column. The y-axis coordinate of a
point is the average fold change for all of the base-comparison column pairs
that contain the probe. The x-axis coordinate is the average result value for
all of the comparison columns that contain the probe.
The fold change graph supports calculations with replicates. All pairs of
replicate comparison and base columns contribute to the fold change graph.
The fold change is averaged when the probe is repeated (for example, when
the query returns analysis results from several different GeneChip® probe or
spot arrays of the same type, or analysis results from the same probe found
on different types of GeneChip probe or spot arrays).
For the example replicate data in Table 11.2, DMT calculates the average fold
change values from rows 1 and 2, 3 and 4, 5 and 6, and 7 and 8 (excluding
the control probes).
Table 11.2
Sample replicate data for the fold change calculation
Row   Base Column     Comparison Column
1     rep1base000A    rep1samp030A
2     rep2base000A    rep2samp030A
3     rep1base000B    rep1samp030B
4     rep2base000B    rep2samp030B
5     rep1base000C    rep1samp030C
6     rep2base000C    rep2samp030C
7     rep1base000D    rep1samp030D
8     rep2base000D    rep2samp030D
Multiple pivot table columns may be assigned to each axis. For example,
Figure 11.14 displays the fold change for two sets of base and comparison
columns. N002AS-Avg Diff and N004AS-Avg Diff are the base columns.
N006AS-Avg Diff and N008AS-Avg Diff are the comparison columns.
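The coordinate rules described earlier in this section can be sketched in Python. Each base-comparison pair is a pair of dictionaries mapping probe names to values; y is the mean fold change over the pairs whose columns both contain the probe, and x is the mean value over the comparison columns that contain the probe. The fold_change function is a hypothetical placeholder (a plain ratio), not the calculation described in Appendix A, and the sketch does not exclude control probes.

```python
# Illustrative sketch only: fold_change is a HYPOTHETICAL placeholder,
# not the DMT fold change calculation described in Appendix A.
from collections import defaultdict
from statistics import mean


def fold_change(base_value, comparison_value):
    # Plain ratio for illustration; base values must be nonzero.
    return comparison_value / base_value


def fold_change_points(pairs):
    """Fold change graph coordinates per probe.

    `pairs` is a list of (base column, comparison column), each column a
    {probe: value} dict. Returns {probe: (x, y)} where
      y = mean fold change over the pairs whose base and comparison
          columns both contain the probe, and
      x = mean value over the comparison columns that contain the probe.
    """
    fold_changes = defaultdict(list)
    comparison_values = defaultdict(list)
    for base, comparison in pairs:
        for probe, value in comparison.items():
            comparison_values[probe].append(value)
            if probe in base:
                fold_changes[probe].append(fold_change(base[probe], value))
    return {
        probe: (mean(comparison_values[probe]), mean(changes))
        for probe, changes in fold_changes.items()
    }


if __name__ == "__main__":
    # Two hypothetical replicate pairs for one array type (compare rows 1
    # and 2 of Table 11.2).
    rep1 = ({"a_at": 100.0}, {"a_at": 200.0})
    rep2 = ({"a_at": 100.0}, {"a_at": 400.0})
    print(fold_change_points([rep1, rep2]))  # {'a_at': (300.0, 3.0)}
```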
Figure 11.14
Fold change graph
Plotting the Fold Change Graph
Plotting the fold change graph is the same in GeneChip® data mode
(shown in the following section) and spot data mode.
1.
Click the Fold Change Graph button.
Alternatively, select Graph → Fold Change from the menu bar.
⇒ The Fold Change Graph dialog box (Figure 11.15) appears and
displays the pivot table columns available for the fold change
graph.
Figure 11.15
Fold Change Graph dialog box (GeneChip® data mode), pivot table columns available for the fold
change graph
2.
Use the drag-and-drop method to select each base column in the
Available Columns box and place it in the Select Base Column(s) box
(Figure 11.16).
Alternatively, select one or more base columns in the Available
Columns box, then click the down arrow above the Select Base
Column(s) box.
To select adjacent columns, press and hold the SHIFT key while you
click the first and last column in the selection. To select non-adjacent
columns, press and hold the CTRL key while you click the columns.
3.
Use the drag-and-drop method to select each comparison column in the
Available Columns box and place it in the Select Comparison
Column(s) box (Figure 11.16).
Alternatively, select one or more comparison columns in the Available
Columns box, then click the down arrow above the Select
Comparison Column(s) box.
Figure 11.16
Fold Change Graph dialog box, GeneChip® probe data mode
4.
To change the order of a column in the Select Base (or Select
Comparison) Column(s) box, use the drag-and-drop method to move
the column to a new position in the list.
Alternatively, select the column, then click the up or down arrow
located inside the Select Base (or Select Comparison) Column(s) box.
5.
To change the fold change graph axes from log scale (default) to linear
scale, click the Log Scale option to remove the check mark.
6.
Click OK.
⇒ The graph pane displays the fold change graph (Figure 11.17).
Figure 11.17
Fold change graph (GeneChip® data mode)
Working with the Fold Change Graph
Working with the fold change graph is the same in GeneChip® data
mode (shown in the following section) and spot data mode.
Magnifying the Graph
1.
Press and hold the SHIFT key while using the click-and-drag method to
draw a rectangle over the area of interest in the graph (Figure 11.18).
2.
Release the mouse button.
⇒ The area selected by the rectangle is magnified (Figure 11.19).
Figure 11.18
Fold change graph, rectangle selects area to magnify
Figure 11.19
Magnified area in the fold change graph
3.
To zoom out and restore the graph, right-click the graph and select Full
Out Zoom from the shortcut menu.
Locating Probes in the Graph
Select a probe in the pivot table to quickly locate it in the fold change graph.
1.
Click and hold the probe name in the pivot table.
⇒ The corresponding point is highlighted in the fold change graph
(Figure 11.20).
The highlighting is removed when the mouse button is released.
Figure 11.20
Click a probe name in the pivot table to highlight the corresponding point in the fold change graph
Viewing Probe Information & Annotating Probes
1.
To display probe and corresponding gene information, click a point in
the fold change graph.
⇒ The probe name, analysis names, results from the pivot table and a
brief description of the gene are displayed to the right of the graph
(Figure 11.21).
Figure 11.21
Fold change graph displaying probe information (GeneChip® data mode)
2.
To obtain further gene information, select an Internet website from the
drop-down list, then click Information.
⇒ The default Internet browser is started and automatically opens the
selected website.
3.
Double-click a point in the graph or a pivot table row to display the
Description dialog box (Figure 11.22).
Figure 11.22
Description dialog box
⇒ The Description dialog box appears and displays a brief description
of the probe (probe set or spot probe), the sequence that it is
designed to interrogate and any annotations associated with the
probe.
4.
To enter an annotation, click Annotate.
⇒ The Annotate dialog box appears (Figure 11.23).
Figure 11.23
Annotate dialog box
5.
Enter comments in the Annotation box, then click OK.
⇒ The annotation is added to the Description dialog box.
Selecting Points in the Graph
The lasso feature enables you to quickly select and focus on points of
interest in the fold change graph by drawing a line around them (roping).
The pivot table displays only rows that correspond to the roped probes (all
other rows are hidden). Probes selected by roping may be conveniently
annotated as a group or included in a probe list that can be applied to the
filter grid of a subsequent query.
1.
Click the Lasso Points button. Alternatively, select Graph → Lasso
Points from the menu bar.
⇒ The mouse pointer changes to a pair of cross hairs (+) when
positioned over the fold change graph.
2.
To rope points of interest, position the cross hairs near the group of
points, then do one of the following:
■
Click and hold the mouse button while you draw a complete circle
around the points (Figure 11.24); or
■
Click the mouse, move it to draw a line segment, then click the
mouse again to start drawing a new line. Repeat until the cross hairs
return to the starting point and the line segments enclose the points
of interest (Figure 11.25).
Figure 11.24
Roped points in the fold change graph
Figure 11.25
Roped points in the fold change graph
3.
To terminate the roping operation, double-click the mouse or press the
ESC key.
⇒ The fold change graph displays the selected points in orange (the
default selected-point color, which can be changed in the Options dialog
box; see Changing Graph Colors on page 202).
⇒ The pivot table displays only the rows that correspond to the roped
points (all other rows are hidden).
4.
To restore the hidden rows to the pivot table, right-click the pivot table
and select Show All Pivot Rows from the shortcut menu.
5.
To clear the selection from the graph, right-click the graph and select
Clear Selection from the shortcut menu.
⇒ The roped graph points are deselected and all probes (rows) are
restored to the pivot table.
Fold Change Graph Options
Preferences for the fold change graph display may be set in the Data Mining
Options dialog box (Figure 11.26). Newly selected options are applied
immediately to the existing graph and are kept for your subsequent sessions.
1.
Click the Options button, then click the Fold Change tab.
Alternatively, do either of the following:
■
Right-click the fold change graph and select Options from the
shortcut menu; or
■
Select View → Options from the menu bar, then click the Fold
Change tab.
⇒ The Data Mining Options dialog box appears and displays the fold
change options (Figure 11.26).
Figure 11.26
Data Mining Options dialog box, Fold Change tab
Point Options
Point size
The point size number determines the dot size for a
graph point. Enter a larger point size for easier viewing,
but use a smaller point size for higher resolution
graphs.
Fold Change Calculation
Default Threshold
The intensity threshold value used to calculate the fold
change is a function of the noise, scaling or
normalization factor and the noise multiplier of the two
analyses. The intensity threshold value is calculated by
the expression algorithm and is stored in the Mining
database.
If the intensity threshold value is not found in the
database, DMT uses the value entered in the Default
Threshold box instead.
(Appendix A describes the fold change calculation.)
Note: In spot data mode, set the default threshold to zero
(Data Mining Options dialog box, Fold Change tab).
Y-Axis Gridlines
The fold change graph displays major and minor y-axis gridlines as
horizontal lines with y-intercepts specified in the edit boxes (only integer
values may be entered).
A solid line labeled with the y-intercept value represents the major Y-axis
gridline and a dotted line represents the minor gridline.
Colors
The colors assigned to the fold change graph points, background, or selected
(roped) points may be changed. (See Changing Graph Colors on page 202.)
Series Graph
The series graph displays numeric pivot table columns in a line (default) or
bar graph format (Figure 11.27 and Figure 11.28). Both graph formats plot
numeric pivot columns on the x-axis and the data associated with each probe
(probe set or spot probe) in the column on the y-axis.
The series graph is an extremely useful way to:
■
Monitor gene expression across different experiments or over a time
course.
■
View probes roped in the scatter or fold change graph.
■
View individual data for cluster members (saved in a probe list).
Figure 11.27
Series line graph (GeneChip® data mode)
Figure 11.28
Series bar graph (GeneChip® data mode)
Plotting the Series Graph
Plotting the series graph is the same in GeneChip® data mode (shown
in the following section) and spot data mode.
1. Click the Series Graph button. Alternatively, select Graph →
Series from the menu bar.
⇒ The Series Graph dialog box appears and displays the pivot table
columns available for the series graph (Figure 11.29).
Figure 11.29
Series Graph dialog box, pivot table columns available for the series graph
2. Select the pivot table columns for the series graph.
To select adjacent columns, press and hold the SHIFT key while you
click the first and last column in the selection. To select non-adjacent
columns, press and hold the CTRL key while you click the columns.
3. Click OK.
⇒ The graph pane displays the series graph (Figure 11.30).
The line graph format is the default. (See Series Graph Options on
page 191 to specify the bar graph format.)
The series graph does not include probe sets that are hidden in the
pivot table.
Working with the Series Graph
Working with the series graph is the same in GeneChip® data mode
(shown in the following section) and spot data mode.
Locating Probes in the Graph
Select a probe in the pivot table to quickly locate it in the series graph.
1. Click and hold the probe name in the pivot table.
⇒ In the series line graph, the corresponding line in the graph is
highlighted.
⇒ In the series bar graph, the portion of the graph that contains the
probe is displayed.
The highlighting is removed when the mouse button is released.
Figure 11.30
Series line graph, highlighted line (top) corresponds to the selected pivot table row
Viewing Probe Information & Annotating Probes
1. To display information about a probe, move the pointer over a point (or
bar) in the series graph.
⇒ A pop-up tool tip displays the probe name and associated data
(Figure 11.31).
2. To view sequence information, double-click a point or bar in the series
graph, or a pivot table row.
⇒ The Description dialog box appears (Figure 11.32) and displays a brief
description of the gene, its sequence or the portion of the gene
sequence the probe is designed to interrogate.
Figure 11.31
Series graph
Figure 11.32
Description dialog box
3. To view further gene information, select an Internet website from the
drop-down list, then click Information.
⇒ The default Internet browser starts and automatically opens the
selected website.
4. To enter an annotation, click Annotate.
⇒ The Annotate dialog box appears (Figure 11.33).
Figure 11.33
Annotate dialog box
5. Enter comments in the Annotation box, then click OK.
⇒ The annotation is added to the Description dialog box.
Series Graph Options
Preferences for the series graph display may be set in the Data Mining
Options dialog box (Figure 11.34). Newly selected options are applied
immediately to the existing graph and are saved for your subsequent sessions.
1. Click the Options button, then click the Series Graph tab.
Alternatively:
■ Right-click the series graph and select Options from the shortcut menu;
or
■ Select View → Options from the menu bar, then click the Series
Graph tab.
⇒ The Data Mining Options dialog box appears and displays the
Series Graph options (Figure 11.34).
Figure 11.34
Data Mining Options dialog box, Series Graph tab
Graph Type
Bar Graph or Line Graph Option
Select a format option to display information in the
series graph as described in Table 11.3.
Table 11.3
Series graph formats
Format | X-axis | Y-axis
Bar | Probe set or spot probe names | User-specified data for each probe in the analysis
Line | Pivot table column or probe | User-specified data for each probe in the column
Series Bar Graph Options
Probe Set Width (%)
Determines the width of the graph bar.
X-Axis Parameters
Visible
The number of probes displayed on the x-axis in the
viewable portion of the graph pane.
Series Line Graph Options
X-Axis
Select whether probes or pivot table columns are
displayed on the x-axis.
Point Size
Determines the dot size for a graph point. Enter a
larger point size for easier viewing, but use a smaller
point size for higher resolution graphs.
X-Axis Parameters
Visible
Specifies the number of probes or columns
displayed on the x-axis in the viewable portion of
the graph pane.
Colors
Up to 25 different colors are applied to the bars or lines. If there are more
than 25 bars or lines, the colors are re-used.
The color of the series graph points, background, lines, or bars may be
changed. (See Changing Graph Colors on page 202.)
Histogram
The histogram plots a frequency distribution of data from numeric pivot
table columns. DMT sorts the data into groups or bins (x-axis coordinate)
and plots the number of probes (probe sets or spot probes) per bin (y-axis
coordinate) for each analysis. The resulting data distribution helps evaluate
the proportion of genes expressed at a particular level.
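The binning and counting described above can be sketched in Python. This is a minimal illustration, not DMT's implementation, and it rests on stated assumptions: the bin edges are supplied explicitly, every bin includes its upper edge (the rule this guide states for the first bin under Variable Bin Size, assumed here for all bins), and values outside the outermost edges are not counted. The function names and sample column names are hypothetical.

```python
from bisect import bisect_left


def histogram_counts(values, edges):
    """Number of values per bin for one pivot table column.

    edges is ascending [e0, e1, ..., en]; bin i covers (e_i, e_(i+1)],
    and the first bin also includes e0. Out-of-range values are skipped.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue
        # bisect_left returns the first edge >= v, so a value equal to an
        # upper edge stays in the bin below that edge.
        counts[max(bisect_left(edges, v) - 1, 0)] += 1
    return counts


def histogram_by_column(columns, edges):
    """One frequency distribution per column (one bar per column per bin)."""
    return {name: histogram_counts(vals, edges) for name, vals in columns.items()}


def combined_histogram(columns, edges):
    """Per-bin totals across all columns (one bar per bin)."""
    per_column = histogram_by_column(columns, edges)
    return [sum(c[i] for c in per_column.values()) for i in range(len(edges) - 1)]
```

The per-column counts correspond to the Separate Histograms option, and summing them bin by bin corresponds to the Combined Histogram option described under Histogram Options.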
Plotting the Histogram
Plotting the histogram is the same in GeneChip® data mode (shown in
the following section) and spot data mode.
1. Click the Histogram button. Alternatively, select Graph →
Histogram from the menu bar.
⇒ The Histogram dialog box (Figure 11.35) appears and displays the
numeric pivot table columns available for the histogram.
Figure 11.35
Histogram dialog box
2. Select the desired columns for the histogram.
To select adjacent columns, press and hold the SHIFT key while you
click the first and last column in the selection. To select non-adjacent
columns, press and hold the CTRL key while you click the columns.
3. Click OK.
⇒ The graph pane displays the histogram (Figure 11.36).
Figure 11.36
Histogram of average difference data (GeneChip® data mode)
Working with the Histogram
Working with the histogram is the same in GeneChip® data mode
(shown in the following section) and spot data mode.
Viewing Histogram Information & Annotating Probes
1. To view information for a particular histogram bar, place the mouse
pointer over that area of the histogram.
⇒ A pop-up tool tip displays the minimum and maximum values for the
bin and the number of probe sets from the corresponding column in
the bin (Figure 11.36).
2. To view sequence information, double-click a row in the pivot table.
⇒ The Description dialog box appears (Figure 11.37) and displays a brief
description of the gene, its sequence or the portion of the gene
sequence the probe is designed to interrogate.
Figure 11.37
Description dialog box
3. To view further gene information, select an Internet website from the
drop-down list, then click Information.
⇒ The default Internet browser starts and automatically opens the
selected website.
4. To enter an annotation, click Annotate.
⇒ The Annotate dialog box appears (Figure 11.38).
Figure 11.38
Annotate dialog box
5. Enter comments in the Annotation box, then click OK.
⇒ The annotation is added to the Description dialog box.
Adding Landmarks
One or more landmarks (Figure 11.40) may be added to the histogram to
identify where a user-specified probe falls in the distribution.
1. Right-click the histogram and select Add Landmark from the shortcut
menu.
⇒ The Landmarks dialog box appears (Figure 11.39).
Figure 11.39
Landmarks dialog box
2. Select one or more columns, then enter a probe name.
3. Click OK.
⇒ The histogram displays the landmark labeled with the column and
probe name (Figure 11.40).
Figure 11.40
Histogram with landmark for average difference value of probe set M95787_at in analysis N004AS
4. To hide the landmark(s), right-click the histogram and select Hide
Landmarks from the shortcut menu.
5. To display the hidden landmark(s), right-click the histogram and select
Show Landmarks from the shortcut menu.
6. To clear all landmarks, right-click the histogram and select Remove
Landmarks from the shortcut menu.
Magnifying the Histogram
1. Press and hold the SHIFT key while using the click-and-drag method to
draw a rectangle over the graph area of interest (Figure 11.41).
2. Release the mouse button.
⇒ The area selected by the rectangle is magnified (Figure 11.42).
Figure 11.41
Histogram, rectangle selects area to magnify
Figure 11.42
Magnified area of the histogram
3. To zoom out and restore the graph, right-click the histogram and select
Full Out Zoom from the shortcut menu.
Histogram Options
Preferences for the histogram display may be set in the Data Mining Options
dialog box (Figure 11.43). Newly selected options are applied immediately to
the existing graph and are saved for your subsequent sessions.
1. Click the Options button, then click the Histogram tab.
Alternatively, do either of the following:
■ Right-click the histogram and select Options from the shortcut
menu; or
■ Select View → Options from the menu bar, then click the
Histogram tab.
⇒ The Data Mining Options dialog box appears (Figure 11.43).
Figure 11.43
Data Mining Options dialog box, Histogram tab
Graph Options
Combined Histogram
The values from all selected pivot table columns that fall into a bin
are combined into one bar (Figure 11.44). If a single column was
selected for the histogram, the Combined Histogram and Separate
Histograms options are identical.
Separate Histograms
Each bar in the histogram represents one pivot table column
and is color-coded according to the legend at the right of the
histogram (Figure 11.45). Select the Separate Histograms option
to plot a separate frequency distribution for each column.
Figure 11.44
Histogram, combined histogram option
Figure 11.45
Histogram, separate histograms option
Bin Options
Range
Select this option to specify the range of values for
the histogram.
Fixed Bin Size
Select this option to define the range of data values
for each bin. Each bin is set to the user-specified Bin
Size.
The first bin begins at the low value set in the Range
option if a Range is specified; otherwise it begins at
the lowest data value. The histogram creates enough
bins to plot all of the data using the user-specified
Bin Size.
If a Range is specified, the number of bins = (high
Range value − low Range value) / Bin Size.
Variable Bin Size
Select this option to define the number of histogram
bins.
Number of Bins is the number of bins plotted.
First Bin Upper Limit defines the boundary value
between the first and second bin. The first bin
includes all values less than or equal to the first bin
upper limit.
The user-specified Range and Number of Bins
determine the size of the remaining bins, which
increases exponentially from bin to bin.
Use the Variable Bin Size and Range options to compare the
distribution of values from one or more analyses.
For example, set Number of Bins = 10, First Bin Upper Limit = 40
and Range = 0 to 10,000. The histogram plots 10 bins, each covering
a larger range of values than the one before it.
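The two bin-size options can be sketched in Python as below. This is an illustration under assumptions, not DMT's code: for Fixed Bin Size, a final partial bin is rounded up when the span is not an exact multiple of Bin Size; for Variable Bin Size, "increases exponentially" is modeled as each bin being wider than the previous one by a constant factor q, found by bisection so that the last bin ends exactly at the high Range value. Appendix A or DMT itself may define the growth differently. The function names are hypothetical.

```python
import math


def fixed_bin_edges(values, bin_size, value_range=None):
    """Bin edges for the Fixed Bin Size option.

    The first bin starts at the low Range value if value_range=(low, high)
    is given, otherwise at the lowest data value. A final partial bin is
    rounded up (an assumption of this sketch).
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    low, high = value_range if value_range is not None else (min(values), max(values))
    n_bins = max(1, math.ceil((high - low) / bin_size))
    return [low + i * bin_size for i in range(n_bins + 1)]


def variable_bin_edges(n_bins, first_upper, low, high, tol=1e-12):
    """Bin edges for the Variable Bin Size option (geometric-growth assumption).

    The first bin is [low, first_upper]. Each later bin is wider than the
    previous one by a constant factor q, chosen so the last bin ends at high.
    """
    first_width = first_upper - low
    target = (high - low) / first_width if first_width > 0 else 0.0
    if n_bins < 2 or first_width <= 0 or target <= n_bins:
        raise ValueError("need n_bins >= 2 and a Range wide enough for growing bins")

    def total_multiplier(q):
        # 1 + q + ... + q**(n_bins - 1): total span in units of first_width
        return sum(q ** k for k in range(n_bins))

    # total_multiplier(1) = n_bins < target, and total_multiplier(target) > target.
    lo_q, hi_q = 1.0, target
    while hi_q - lo_q > tol:
        mid = (lo_q + hi_q) / 2
        if total_multiplier(mid) < target:
            lo_q = mid
        else:
            hi_q = mid
    q = (lo_q + hi_q) / 2

    edges, width = [low], first_width
    for _ in range(n_bins - 1):
        edges.append(edges[-1] + width)
        width *= q
    edges.append(high)  # pin the last edge exactly to avoid rounding drift
    return edges
```

With the example settings above, `variable_bin_edges(10, 40, 0, 10000)` returns 11 edges (10 bins) starting at 0 and 40 and ending at 10,000, with strictly increasing bin widths.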
X-Axis Options
Ticks per label
Defines the number of graph markers or tick marks
on the x-axis between the numeric labels. The
numeric label shows the range for a bin. The
histogram displays a tick mark for each bin.
Note: Set the Ticks per label option to 1/2 or 1/4 the
Number of Bins in the Variable Bin Size option.
This displays enough labels to view the histogram
ranges without overloading the graph.
Color Options
The color of the histogram background, landmarks,
or bars may be changed. (See Changing Graph
Colors on page 202.)
Other Graphing Features
Enlarging the Graph Pane
1. Right-click the graph pane and select Expand Graph from the shortcut
menu. Alternatively, select View → Expand Graph from the menu bar.
2. Repeat step 1 to restore the graph pane to its original size.
Changing Graph Colors
1. Click the Options button, then click the graph tab of interest.
Alternatively, do one of the following, then click the graph tab of
interest:
■ Right-click the graph and select Options from the shortcut menu; or
■ Select View → Options from the menu bar.
⇒ The Data Mining Options dialog box appears and displays the
selected graph options (Figure 11.46).
Figure 11.46
Data Mining Options dialog box, Scatter Graph tab
2. To change the color of an item (for example, Selected Point Color in
Figure 11.46), click the associated color square in the Data Mining
Options dialog box.
⇒ The Color palette appears (Figure 11.47).
Figure 11.47
Color palette (expanded palette, right)
3. Click a new basic color in the palette, or click Define Custom Colors to
define a custom color.
⇒ The color palette expands to display the custom color field
(Figure 11.47).
4. To define a custom color, use the click-and-drag method to position the
cross hairs in the custom color field. In the luminosity scale to the right,
adjust the color brightness by moving the arrow up or down the scale.
⇒ The Color|Solid swatch displays the custom color.
5. When finished, click Add to Custom Colors to apply the color.
Color selections are saved on a per-user basis.
Copying and Clearing Graphs
1. To copy a graph to the system clipboard, right-click the graph and select
Copy Graph from the shortcut menu. Alternatively, select
Edit → Copy Graph from the menu bar.
2. To clear a graph from the graph pane, right-click the graph and select
Clear Graph from the shortcut menu.
3. To clear all graphs from the graph pane, select Edit → Clear Graphs
from the menu bar.
Printing Graphs
1. In the graph pane, click the graph tab you want to print.
2. Click the Print button in the toolbar.
⇒ The Print dialog box appears (Figure 11.48).
Figure 11.48
Print dialog box
3. Confirm that the Graph option is selected.
4. Click OK.
Chapter 12
Statistical Analyses
DMT offers several types of statistical analyses to help evaluate and
compare replicate data. Statistical operators can be applied to
numeric pivot table columns. The resulting data are displayed in the
pivot table and are available for graphing and further statistical
analysis.
Selecting an Operator
Open the Analysis Function dialog box to select one or more statistical
operators.
■ Select Analyze → Analysis Function from the menu bar.
⇒ The Analysis Function dialog box appears (Figure 12.1).
Figure 12.1
Analysis Function dialog box
Average, Median, Standard Deviation or Inter-Quartile Range
One or more of the following operators can be applied to user-specified
numeric columns in the pivot table. Each operator is computed for every
probe (pivot table row) across the selected columns, and the results are
placed in a new column:
Average
Computes the average of the values in the selected
pivot table column(s).
Median
Computes the median (50th percentile) of the values in
the selected pivot table column(s).
Standard Deviation
Calculates the standard deviation of the values in the
selected pivot table column(s).
Inter-Quartile Range
Computes the 75th and the 25th percentile values for
the selected pivot table column(s). The inter-quartile
range is the 75th percentile minus the 25th percentile.
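A minimal Python sketch of these four operators, applied to one probe's values across the selected columns, is shown below. It assumes the sample standard deviation (n − 1 denominator) and Python's "inclusive" quartile interpolation; DMT may use different conventions. The column-naming helper imitates the name-operator pattern shown in Figure 12.4 (for example, Tumor-Average), but the function names and the "Median" and "IQR" suffixes are hypothetical.

```python
import statistics


def row_statistics(row):
    """Average, median, standard deviation and IQR across one probe's columns.

    Assumes the sample standard deviation (n - 1 denominator) and Python's
    'inclusive' quartile interpolation; DMT's conventions may differ.
    """
    q1, _, q3 = statistics.quantiles(row, n=4, method="inclusive")
    return {
        "Average": statistics.mean(row),
        "Median": statistics.median(row),
        "Stdev": statistics.stdev(row),
        "IQR": q3 - q1,
    }


def add_statistic_columns(pivot, columns, name):
    """Build '<name>-<operator>' values (e.g. 'Tumor-Average') for each probe."""
    result = {}
    for probe, values in pivot.items():
        stats = row_statistics([values[c] for c in columns])
        result[probe] = {f"{name}-{op}": v for op, v in stats.items()}
    return result
```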
1. In the Analysis Function dialog box (Figure 12.2), select one or more of
the operators: Average, Median, Standard Deviation, or Inter-Quartile Range.
Figure 12.2
Analysis Function dialog box
2. Enter a name for the new column(s) of data that will be generated, then
click Next.
⇒ The column selection dialog box appears (Figure 12.3).
Figure 12.3
Column selection dialog box, average and standard deviation analysis
3. Select one or more pivot table columns for the operators selected in Step
1, then click Finish.
⇒ The pivot table (right side) displays the new column(s) of statistical
results (Figure 12.4).
The column header displays the user-specified name, followed by the
type of operator. For example, in Figure 12.4, the new column names are
Tumor-Average and Tumor-Stdev.
Figure 12.4
Pivot table displaying average and standard deviation results
Fold Change
The fold change (FC) operator compares user-specified pivot table columns
(base and comparison analysis) and computes the fold change for each probe
in the comparison. (See Appendix A for more information about the fold
change calculation.)
1. In the Analysis Function dialog box, select the Fold Change operator
(Figure 12.5).
Figure 12.5
Analysis Function dialog box
2. Enter a column name for the fold change results, then click Next.
⇒ The Column selection dialog box appears (Figure 12.6).
Figure 12.6
Column selection dialog box, fold change analysis
3. Select a base and a comparison column, then click Finish.
⇒ The pivot table (right side) displays the fold change results
(Figure 12.7).
Figure 12.7
Pivot table, fold change results
T-Test
The T-Test analyzes two groups of pivot table columns (control and
experiment) and determines the significance of the change between the means
of the two groups as well as the direction of the change.
It computes a p-value for each comparison. The p-value is the probability
that a difference in means at least as large as the observed one would occur
by chance if the true means of the two groups were equal. A small p-value
(for example, 0.01) means such a difference would be unlikely (only a one in
100 chance) under that assumption.
The T-Test assumes two samples with unequal variances and normally
distributed data. DMT uses an unpaired, one-sided T-Test and converts the
p-value to a two-sided p-value. It shows the direction of change in a
separate pivot table column.
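The test can be sketched in Python using only the standard library. This sketch assumes Welch's form of the unequal-variance t statistic with Welch–Satterthwaite degrees of freedom, and computes the two-sided p-value (twice the one-sided tail) from the Student t distribution through the regularized incomplete beta function, evaluated with the standard continued-fraction method. The direction labels "Increase" and "Decrease" are hypothetical; this guide only names the None call. DMT's actual implementation may differ.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def welch_t_test(control, experiment, p_cutoff=0.05):
    """Unpaired t-test assuming unequal variances.

    Returns (t, df, two_sided_p, direction).
    """
    n_c, n_e = len(control), len(experiment)
    if n_c < 2 or n_e < 2:
        raise ValueError("each group needs at least two columns")
    se_c = statistics.variance(control) / n_c
    se_e = statistics.variance(experiment) / n_e
    if se_c + se_e == 0:
        raise ValueError("both groups have zero variance")
    t = (statistics.fmean(experiment) - statistics.fmean(control)) / math.sqrt(se_c + se_e)
    # Welch-Satterthwaite degrees of freedom.
    df = (se_c + se_e) ** 2 / (se_c ** 2 / (n_c - 1) + se_e ** 2 / (n_e - 1))
    # Two-sided p = 2 * one-sided tail = I_{df/(df+t^2)}(df/2, 1/2).
    p = reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    if p > p_cutoff:
        direction = "None"
    else:
        direction = "Increase" if t > 0 else "Decrease"
    return t, df, p, direction
```

As in DMT, a p-value above the cutoff yields a direction of None regardless of the sign of t.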
1. In the Analysis Function dialog box, select the T-Test operator
(Figure 12.8).
Figure 12.8
Analysis Function dialog box
2. Enter a column name for the T-Test results.
3. Confirm the default P Cutoff, or enter a new value.
If the computed p-value for a call is greater than the P Cutoff, the
Change Direction call is None (no change).
4. Click Next.
⇒ The column selection dialog box appears (Figure 12.9).
Figure 12.9
Column selection dialog box, T-Test
5. Select two or more pivot columns for the Control and two or more pivot
columns for the Experiment, then click Finish.
⇒ The pivot table displays two columns of T-Test results: the
computed P Value and the Change Direction call (Figure 12.10).
Figure 12.10
Pivot table displaying T-Test results
Mann-Whitney Test
The Mann-Whitney test compares two groups of pivot table columns
(control and experiment) to determine the significance of change as well as
the direction of change. It computes a p-value for each comparison.
The Mann-Whitney test is a nonparametric method for comparing two
unpaired groups. Unlike the T-Test, it does not assume a particular
distribution of the data.
1. In the Analysis Function dialog box, select the Mann-Whitney operator
(Figure 12.11).
Figure 12.11
Analysis Function dialog box
2. Enter a column name for the Mann-Whitney test results.
3. Confirm the default P Cutoff, or enter a new value.
If the computed p-value for a call is greater than the P Cutoff, the
Change Direction call is None.
4. Click Next.
⇒ The column selection dialog box appears (Figure 12.12).
Figure 12.12
Column selection dialog box, Mann-Whitney test
5. Select two or more pivot columns for the Control and two or more pivot
columns for the Experiment, then click Finish.
⇒ The pivot table displays the Mann-Whitney test results (Figure 12.13).
Figure 12.13
Pivot table, Mann-Whitney results
Count & Percentage
The Count & Percentage operator is only available in GeneChip® data mode.
For each probe set in user-specified pivot table columns, it counts the
number and computes the percentage of:
■ Absolute or detection calls (P, M, or A)
■ Difference or change calls (I, MI, NC, MD, D), or
■ Calls within a user-specified numeric range
For the Count & Percentage operator, you can specify any combination of:
■ Absolute call, difference call and numeric range
■ Detection call, change call and numeric range
A probe set must meet all conditions to be counted.
1. In the Analysis Function dialog box, select the Count & Percentage
operator (Figure 12.14).
Figure 12.14
Analysis Function dialog box
2. Specify the conditions (absolute call, difference call, or numeric
thresholds) a probe set must meet to be counted.
If numeric thresholds are specified, DMT counts only the values within
the threshold limits. If both > and < threshold options are selected, they
are combined in AND (intersection) fashion.
The example in Figure 12.14 specifies a probe set must have an absolute
call = P, difference call = I and expression metric >200 to be counted.
3. Enter a column name for the Count & Percentage results.
4. Click Next.
⇒ The column selection dialog box appears (Figure 12.15).
Figure 12.15
Column selection dialog box, Count & Percentage operator
5. Select the pivot columns and parameter(s) for the count and percentage
analysis, then click Finish. (The Parameters box is only displayed if a
numeric threshold was specified in the Analysis Function dialog box.)
⇒ The pivot table displays the Count & Percentage results
(Figure 12.16).
For example, in Figure 12.16, the count for probe set AB002533_at is 16
because the probe set met all conditions (absolute call = P, difference
call = I, average difference > 200) in 16 of 16 columns, resulting in a
percentage of 100%.
Figure 12.16
Pivot table, Count & Percentage results
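The counting rule above (a column counts only if the probe set meets every specified condition in that column, with > and < thresholds combined by AND) can be sketched in Python. Each column's data for one probe set is represented as a dictionary with the hypothetical keys `abs_call`, `diff_call`, and `value`; these names, and the function names, are illustrative only.

```python
def meets_conditions(obs, abs_calls=None, diff_calls=None, gt=None, lt=None):
    """True if one column's data for a probe set satisfies every condition.

    Conditions left as None are not checked; gt and lt are combined with AND.
    """
    if abs_calls is not None and obs["abs_call"] not in abs_calls:
        return False
    if diff_calls is not None and obs["diff_call"] not in diff_calls:
        return False
    if gt is not None and not obs["value"] > gt:
        return False
    if lt is not None and not obs["value"] < lt:
        return False
    return True


def count_and_percentage(observations, **conditions):
    """Return (count, percentage) of columns that meet all conditions."""
    if not observations:
        raise ValueError("no columns selected")
    count = sum(1 for obs in observations if meets_conditions(obs, **conditions))
    return count, 100.0 * count / len(observations)
```

Reproducing the AB002533_at example would mean 16 column entries with P, I and a value above 200, giving a count of 16 and a percentage of 100.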
Chapter 13
Matrix Analysis
Matrix analysis compares two probe lists, determines the probes (probe sets
or spot probes) the lists have in common and computes an overlap or
non-overlap significance score for the two probe lists. The matrix provides
a spreadsheet framework for comparing probe lists and displays the
overlap or non-overlap significance score for each pair of probe lists in
the matrix.
See Appendix D for further information about the matrix algorithm.
Overview
Matrix analysis uses the binomial distribution to calculate the probability
that an overlap between two lists occurs by chance. (See Appendix D for
more information about the binomial distribution.)
The analysis compares two separate lists and calculates the significance of
the overlap between them. To illustrate how the significance is determined,
consider two independent sets, probe lists A and Q. Probe list A has n_a
members and probe list Q has n_q members. These sets were generated from
a total population of size t. (Note: the total population usually includes
additional members besides those in sets A and Q.)
The expected overlap between the lists based on random chance is important
in determining the overlap significance. The chance (or frequency, w) of
picking a member of set Q at random from the total population is w = n_q / t.
For example, if there are 10 members of Q in a total population of 100
members, then there is a ten percent chance of picking a member of Q. If we
make n_a random picks (the number of members in set A) from this
distribution, we would expect to pick a member of set Q ten percent of the
time. The expected overlap between Q and A is therefore n_a × w.
What we actually observe is that x members belong to both A and Q. How
close the observed overlap, x, is to the expected overlap, n_a × w,
determines the overlap significance. If these two values are close, then
there is a high probability that the overlap is due to random chance. The
algorithm uses the binomial distribution to determine this significance.
The observed overlap could be larger or smaller than the expected overlap. If
the observed overlap is larger than expected, then set A is over-represented.
If the observed overlap is smaller than expected, then set A is
under-represented.
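The reasoning above can be made concrete with a short Python sketch. It models the overlap as a binomial random variable X with n_a trials and success probability w = n_q / t, and reports the expected overlap n_a × w together with the upper and lower tail probabilities. This illustrates the binomial argument only; how DMT turns these probabilities into the significance score shown in the matrix is described in Appendix D and is not reproduced here. The helper for the default population size (unique probes across the row and column lists) mirrors the default described under Running a Matrix Analysis. All function names are hypothetical.

```python
import math


def overlap_binomial(x, n_a, n_q, t):
    """Binomial view of the overlap between probe lists A and Q.

    x: observed overlap; n_a, n_q: list sizes; t: population size.
    Returns (expected overlap, P(X >= x), P(X <= x)) for X ~ Binomial(n_a, n_q / t).
    """
    w = n_q / t
    pmf = [math.comb(n_a, k) * w ** k * (1 - w) ** (n_a - k) for k in range(n_a + 1)]
    expected = n_a * w
    p_at_least = sum(pmf[x:])      # small value suggests A is over-represented
    p_at_most = sum(pmf[: x + 1])  # small value suggests A is under-represented
    return expected, p_at_least, p_at_most


def default_population(row_probes, column_probes):
    """Default Population Size: number of unique probes across both lists."""
    return len(set(row_probes) | set(column_probes))
```

Using the numbers from the example (n_q = 10, t = 100) with n_a = 10, the expected overlap is 1; an observed overlap of 5 falls far in the upper tail.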
Population Size
The total population is an important parameter in calculating overlap
significance. This is the total population from which the lists were
generated. It is defined as the number of members in common between the
two independent classification schemes.
The total population for clustering sets is the number of probe sets used
when generating the clusters. For SOM clustering, this value is the total
number of probe sets contained in all of the clusters. For correlation
coefficient clustering, the population size is either the maximum number of
seeds used or the total number of probe sets in the pivot table, depending on
whether only the seed set or all probe sets were used to generate the final
clusters. (See Chapter 14 for a description of the clustering algorithms and
parameters.)
Matrix analysis initially sets the population size as the number of unique
probe sets in the row and column probe lists of the matrix. If you are using
only a subset of the classification lists, or the lists do not include all of the
members that were used to generate the classification, then the calculated
population size is too small. In this case, change the total population to the
total number of members used to generate the classification.
Running a Matrix Analysis
1. Select Analyze → Matrix from the menu bar.
⇒ The Matrix opens (Figure 13.1).
Figure 13.1
Matrix
2. Click Select Rows.
⇒ The Select dialog box appears (Figure 13.2).
Figure 13.2
Select Probe Sets dialog box
3. Select the probe lists you want to include in the matrix rows, then click
OK.
⇒ The matrix displays the probe list names in the row headers
(Figure 13.3).
Figure 13.3
Matrix, rows specified
4. Click Select Columns.
⇒ The Select dialog box appears (Figure 13.2).
5. Select the probe lists you want to include in the matrix columns, then
click OK.
⇒ The matrix displays the probe list names in the column headers
(Figure 13.4).
Figure 13.4
Matrix, probe lists selected for the rows and columns
6. Confirm the Population Size, or enter a new value.
The default population value is equal to the number of unique probes in
the row and column probe lists. (See Population Size on page 224 for
information on how to set this value.)
7. Click Calculate.
⇒ The algorithm computes the overlap (over-represented probe sets)
or non-overlap (under-represented probe sets) significance score for
each pair of probe lists in the matrix (Figure 13.5).
The significance score increases as the overlap (or the lack of overlap)
between two lists increases (see Appendix D).
To distinguish between overlap and non-overlap, the matrix highlights
overlap scores that exceed the overlap significance threshold in pink, and
non-overlap scores that exceed the non-overlap significance threshold in
yellow.
The threshold values in the Overlap and Non-overlap boxes can be
changed.
Figure 13.5
Matrix displaying overlap significance scores
8. Click Print to print the matrix.
9. Click Close when finished to close the matrix.
Chapter 14
Cluster Analysis
Cluster analysis helps identify gene expression patterns (profiles) in
the data and groups together probe sets or spot probes with similar
gene expression patterns.
DMT offers two clustering algorithms: self-organizing map (SOM)
and correlation coefficient clustering.
Self-Organizing Map (SOM) Algorithm
The self-organizing map (SOM) algorithm is designed to cluster GeneChip®
average difference data (shown in this chapter). However, any numeric
column in the pivot table may be selected for cluster analysis. (Appendix D
describes the SOM algorithm and its user-modifiable parameters.)
The algorithm treats the expression levels of n probe sets in k
experiments as n points in k-dimensional space. Initially, the algorithm
randomly places a grid of nodes (centroids) in the k-dimensional space.
It then iteratively adjusts the positions of the nodes to identify
clusters in the data.
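To make the iterative node adjustment concrete, here is a minimal, generic Kohonen-style SOM in Python. It is a sketch, not DMT's algorithm (Appendix D documents that): the random initialization inside the data's bounding box, the linear decay of the learning rate and grid neighborhood radius, and the parameter names `iterations`, `lr0` and `seed` are assumptions chosen for illustration. As the text notes for DMT, results depend on the random initialization, so a fixed `seed` is used here only to make runs repeatable.

```python
import math
import random


def _sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def train_som(points, rows, cols, iterations=2000, lr0=0.5, seed=0):
    """Train a generic rows x cols self-organizing map.

    points: list of k-length sequences (one per probe set, k experiments).
    Returns {(row, col): node_vector}.
    """
    rng = random.Random(seed)
    k = len(points[0])
    # Random initial node positions inside the data's bounding box.
    lo = [min(p[d] for p in points) for d in range(k)]
    hi = [max(p[d] for p in points) for d in range(k)]
    nodes = {(r, c): [rng.uniform(lo[d], hi[d]) for d in range(k)]
             for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    for it in range(iterations):
        frac = it / iterations
        lr = lr0 * (1 - frac)          # learning rate decays linearly
        radius = radius0 * (1 - frac)  # grid neighborhood shrinks linearly
        x = rng.choice(points)
        winner = min(nodes, key=lambda n: _sqdist(nodes[n], x))
        for (r, c), w in nodes.items():
            # Move the winning node and its grid neighbors toward the sample.
            if math.hypot(r - winner[0], c - winner[1]) <= radius:
                for d in range(k):
                    w[d] += lr * (x[d] - w[d])
    return nodes


def assign_clusters(points, nodes):
    """Assign each point index to its nearest node (its cluster)."""
    clusters = {n: [] for n in nodes}
    for i, p in enumerate(points):
        best = min(nodes, key=lambda n: _sqdist(nodes[n], p))
        clusters[best].append(i)
    return clusters
```

Because neighboring grid nodes are pulled toward the same samples early in training, nodes that are adjacent on the grid tend to end up with similar profiles, which is why similar clusters are mapped near one another.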
Running a SOM Cluster Analysis
Prior to cluster analysis, normalize GeneChip® signal data in Affymetrix®
Microarray Suite or DMT. Normalize spot probe intensity data in
Affymetrix® Jaguar™. (For more information, see Chapter 5.)
1. Select Analyze → SOM Clustering from the menu bar.
⇒ The Select Columns for Clustering dialog box appears (Figure 14.1).
Figure 14.1
Select Columns for Clustering dialog box
2. Select two or more pivot table columns for SOM clustering, then click
OK.
⇒ The SOM Clustering dialog box appears (Figure 14.2).
Figure 14.2
SOM Clustering dialog box
The section SOM Filters on page 238 describes the thresholds, row
variation and row normalization filters. Filtering the data is optional, but
recommended. See SOM Parameters on page 239 for a description of the
user-modifiable algorithm parameters.
3. To apply threshold filtering, confirm the Thresholds values, MinVal
and MaxVal, or enter new values, then click Add>.
⇒ The threshold filter is displayed in the box to the right (Figure 14.3).
4. To apply Row Variation filtering, confirm the row variation Max/Min
and Max-Min defaults or enter new values, then click Add>.
⇒ The row variation filter is displayed in the box to the right
(Figure 14.3).
5. Click Compute to display the number of probe sets (or spot probes)
remaining after the row variation filter is applied to the data.
The Compute button is a tool for quickly confirming the row variation
Max/Min and Max-Min parameters. The number of rows (probe sets)
that remain in the dataset after filtering is displayed as New Rows next
to the Compute button (Figure 14.3).
When you click Compute, any values entered in the Row Variation
edit boxes are also applied to the filters in the box on the upper right,
even when the Row Variation values do not appear in the filter box.
6. To apply Row Normalization, confirm the Mean and Variance
defaults, or enter new values, then click Add>.
⇒ The row normalization filter is displayed in the box to the right
(Figure 14.3).
Figure 14.3
SOM Clustering dialog box, data filtering
7. To change the order of a filter, highlight the filter, then click Down or
Up to move the filter to the desired position.
8. To delete a filter, highlight the filter, then click Del. To delete all filters,
click Del All.
9. Confirm the defaults for Parameters, or enter new values.
See SOM Parameters on page 239 for a description of the user-modifiable
algorithm parameters.
10. Click Run to filter the data and perform SOM cluster analysis.
⇒ The graph pane displays the results of the cluster analysis
(Figure 14.4).
Figure 14.4
SOM clusters
The rows and columns parameters generate the nodes that identify clusters.
For example, in Figure 14.4 the default rows and columns (6 x 3) generate 18
clusters (click the down arrow to scroll the cluster view).
The SOM algorithm maps clusters that have similar gene expression patterns
near one another. As a result, in Figure 14.4, the average gene expression
patterns in Cluster 1 and Cluster 2 show the greatest similarity and those in
Cluster 1 and Cluster 18 are the most dissimilar.
Each cluster plot displays the cluster number followed by the number of
cluster members (in parentheses). The middle (red) graph line represents the
average gene expression pattern for the cluster. The two outer (blue) graph
lines represent the standard deviation of expression (Figure 14.5). Cluster plot
axes are not scaled identically.
SOM cluster results may vary from run to run because of the inherent
nature of the algorithm (for example, the random initialization process).
Figure 14.5
SOM cluster plot (4 pivot columns selected for clustering)
11.
Click a cluster plot to view the members in the Probes box (Figure 14.4).
Saving a Probe List
Saving a Selected Cluster as a Probe List
1.
Click the cluster you want to save.
⇒ The cluster members are displayed in the Probes box of the Cluster
tab (Figure 14.4).
2.
Enter a Probe List Name.
3.
Click Save Selected.
⇒ The data tree displays the probe list name.
Saving All Clusters as a Probe List
1.
Click Save All.
⇒ The Save All Clusters dialog box appears (Figure 14.6).
Figure 14.6
Save All Clusters dialog box
2.
Enter a cluster root name and click Save All.
⇒ The data file tree displays the probe lists (Figure 14.7).
Each probe list is named using the cluster root name followed by the
cluster number.
Figure 14.7
Data file tree, Probe Lists directory
To quickly view data for the cluster members in a probe list, right-click
the probe list in the data tree, then select Highlight Pivot and
Graph from the shortcut menu. The pivot table displays only the rows
for the probe list (cluster members).
If the scatter, fold change and series line graphs were previously
plotted for the clustered columns, the scatter and fold change graphs
highlight the points from the probe list. The series line graph displays
only the probe list.
SOM Filters
The SOM filter values are user-modifiable. The default values are intended
for probe set average difference data.
Thresholds
The minimum and maximum thresholds are designed to exclude outlier data.
Data that exceed the maximum threshold value are changed to the maximum
threshold value. Data less than the minimum threshold value are changed to
the minimum threshold value.
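As a minimal sketch (Python is used here only for illustration), the threshold step amounts to clipping each value into the range between the minimum and maximum thresholds. The function name `clip_row` and the threshold values in the usage line are hypothetical, not DMT defaults.

```python
def clip_row(row, min_threshold, max_threshold):
    """Replace values below/above the thresholds with the threshold values."""
    return [min(max(x, min_threshold), max_threshold) for x in row]
```

For example, `clip_row([5, 50, 50000], 20, 20000)` returns `[20, 50, 20000]`: the low outlier is raised to the minimum threshold and the high outlier is lowered to the maximum threshold.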
Row Variation
The row variation filters are designed to exclude probe sets or spot probes
that do not significantly change expression level across the experiments.
DMT evaluates each probe set or spot probe across all selected columns and
includes it in the analysis if both of the following conditions are met:
1) maximum value / minimum value > 3 (default), and
2) maximum value - minimum value > 100 (default).
The ratio and difference thresholds (defaults 3 and 100) are user-modifiable.
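The two row variation conditions can be sketched as a single test per row. This is an illustration, not DMT code: the function name is hypothetical, and it assumes all values are positive (for example, already clipped to a positive minimum threshold) so that the max/min ratio is well defined.

```python
def passes_row_variation(row, min_ratio=3.0, min_diff=100.0):
    """True if a probe set varies enough across the selected columns.

    Assumes every value is positive, so max/min is a meaningful ratio.
    """
    hi, lo = max(row), min(row)
    return hi / lo > min_ratio and hi - lo > min_diff
```

A row such as `[50, 200, 400]` passes (ratio 8, difference 350), while `[1000, 1200, 1500]` fails the ratio test and `[10, 40, 60]` fails the difference test.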
Row Normalization
This normalizes the data to a mean of zero and a variance of one. Row
normalization helps the algorithm identify clusters based on the shape of
expression patterns rather than absolute expression levels.
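A minimal sketch of row normalization, assuming the variance is the population variance (divide by the number of columns); the manual does not say which form DMT uses. Rows with zero variance cannot be scaled, which is one reason the row variation filter removes near-constant rows first.

```python
from statistics import mean, pstdev


def normalize_row(row):
    """Rescale a row to mean 0 and (population) variance 1."""
    mu = mean(row)
    sd = pstdev(row)
    if sd == 0:
        raise ValueError("a constant row cannot be scaled to unit variance")
    return [(x - mu) / sd for x in row]
```

Because only the shape survives, `normalize_row([10, 20, 30])` and `normalize_row([100, 200, 300])` give the same result, which is why normalized rows cluster by pattern rather than by absolute level.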
SOM Parameters
See Appendix D for further description of the SOM algorithm.
Rows & Columns
Specifies the rows and columns of nodes that identify clusters in the data. The number of nodes (rows x columns) determines the number of clusters generated.
Epochs
Determines the number of iterations the algorithm runs:
Iterations = Epochs x Number of probe sets
Seeds
The number of times the algorithm runs through a set of iterations. The algorithm selects the result that minimizes the sum of the distances from the data points to the nodes.
Initialization
Initial placement of the nodes in k-dimensional space, where k is the number of columns selected for clustering. The Random Vectors method randomly places the nodes in k-dimensional space. The Random Datapoints method places the nodes on randomly selected data points.
Neighborhood
Defines a distance from the target node (the node closest to the point being considered). At each iteration, nodes in the neighborhood are moved toward the point being considered (updated).
Bubble neighborhood = a radial distance from the target node. All nodes in the bubble neighborhood are updated by the same amount. Nodes outside the bubble neighborhood are not updated.
In the Gaussian neighborhood, all nodes are updated. The distance a node moves is a function of its distance from the target node: the farther a node is from the target node, the less it moves.
Initial neighborhood size
Initial width of the bubble neighborhood (default = 5).
Final neighborhood size
Final width of the bubble neighborhood at the last iteration.
Initial learning rate
Initial learning rate, which controls how far a node is moved toward the point being considered.
Final learning rate
Final learning rate at the last iteration.
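The parameters above map onto a textbook self-organizing map. The sketch below is a generic SOM written to mirror those parameters, not DMT's implementation (Appendix D describes that). Several details are assumptions: the learning rate and neighborhood size decay linearly from their initial to final values, data points are presented in order, Random Vectors draws nodes inside the range of the data, and the default learning rates and final neighborhood size are hypothetical. Only the 6 x 3 grid and the initial neighborhood size of 5 come from this chapter.

```python
import math
import random


def train_som(data, rows=6, cols=3, epochs=10, init="datapoints",
              neighborhood="bubble", radius_init=5.0, radius_final=0.5,
              lr_init=0.5, lr_final=0.01, seed=0):
    """One SOM run over `data` (equal-length rows); returns the node vectors."""
    rng = random.Random(seed)
    k = len(data[0])  # dimensionality = number of clustered columns
    grid = [(r, c) for r in range(rows) for c in range(cols)]  # one node per cluster
    if init == "vectors":
        # Random Vectors: random positions inside the range of the data (assumed)
        lo = [min(row[j] for row in data) for j in range(k)]
        hi = [max(row[j] for row in data) for j in range(k)]
        nodes = [[rng.uniform(lo[j], hi[j]) for j in range(k)] for _ in grid]
    else:
        # Random Datapoints: start each node on a randomly selected data row
        nodes = [list(rng.choice(data)) for _ in grid]
    n_iter = epochs * len(data)  # Iterations = Epochs x Number of probe sets
    for t in range(n_iter):
        frac = t / max(n_iter - 1, 1)
        lr = lr_init + (lr_final - lr_init) * frac  # assumed linear decay
        radius = radius_init + (radius_final - radius_init) * frac
        x = data[t % len(data)]
        # Target node: the node closest to the point being considered
        target = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], x))
        for i, pos in enumerate(grid):
            d = math.dist(grid[target], pos)  # distance on the node grid
            if neighborhood == "bubble":
                h = 1.0 if d <= radius else 0.0  # same update inside the bubble
            else:
                sigma = max(radius, 1e-9)
                h = math.exp(-d * d / (2 * sigma * sigma))  # farther nodes move less
            if h > 0:
                nodes[i] = [w + lr * h * (xj - w) for w, xj in zip(nodes[i], x)]
    return nodes


def quantization_error(data, nodes):
    """Sum of distances from each data point to its closest node."""
    return sum(min(math.dist(x, n) for n in nodes) for x in data)


def best_of_seeds(data, n_seeds=3, **kw):
    """Run the SOM n_seeds times and keep the run with the smallest error."""
    runs = [train_som(data, seed=s, **kw) for s in range(n_seeds)]
    return min(runs, key=lambda nodes: quantization_error(data, nodes))


def assign(data, nodes):
    """Cluster index (closest node) for every data row."""
    return [min(range(len(nodes)), key=lambda i: math.dist(x, nodes[i])) for x in data]
```

Typical use is `nodes = best_of_seeds(rows, n_seeds=3)` followed by `assign(rows, nodes)`, where `rows` would be the filtered, row-normalized data; the number of distinct labels can never exceed rows x columns.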
Correlation Coefficient
Clustering Algorithm
The correlation coefficient clustering algorithm finds probe set patterns that
have similar shape. The process for finding clusters of similar probe set
patterns is accomplished in three steps:
■
Filtering - Removes patterns due mostly to noise.
■
Seeding - Defines the expression patterns of the clusters.
■
Clustering - Groups patterns which are close to the cluster shape.
First, the data set is filtered to remove probe sets with low or relatively
constant expression levels across the samples (low standard deviation). The
entire data set need not be included to obtain a diverse set of clusters. To the
contrary, including noisy data tends to make the discovery of unique
expression patterns more difficult. Filtering reduces the number of
expression patterns used in the following seeding step. It has been empirically
determined that 3,000 or fewer genes should be included in the seeding step.
Next, a nearest neighbor approach is used to calculate seeds with unique
patterns in the data set. Probe sets whose expression patterns correlate with
one another above the user-defined correlation coefficient (CC) threshold are
grouped to define a seed. The expression level of each gene in the seed is
normalized relative to its standard deviation, and the mean of the normalized
expression levels is calculated and defined as the seed pattern.
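The manual does not spell out the nearest neighbor procedure, so the grouping below is a simple greedy stand-in rather than DMT's method: each not-yet-used probe set collects every unused probe set whose Pearson correlation with it exceeds the threshold, and the group becomes a seed if it is large enough. The seed pattern follows the description above, under the assumption that "normalized relative to its standard deviation" means dividing each row by its own standard deviation. The defaults 0.98 and 3 are the dialog defaults quoted later in this chapter; the function names are hypothetical.

```python
from statistics import mean, pstdev


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length expression patterns."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)


def seed_pattern(group):
    """Divide each member row by its standard deviation, then average by column."""
    scaled = []
    for row in group:
        sd = pstdev(row)
        scaled.append([x / sd for x in row])
    return [mean(column) for column in zip(*scaled)]


def generate_seeds(rows, cc_threshold=0.98, min_per_seed=3):
    """Greedy grouping: an anchor row plus every unused row correlated above the threshold."""
    seeds, used = [], set()
    for i, anchor in enumerate(rows):
        if i in used:
            continue
        group = [j for j, row in enumerate(rows)
                 if j not in used and pearson(anchor, row) > cc_threshold]
        if len(group) >= min_per_seed:  # too-small groups do not become seeds
            used.update(group)
            seeds.append(seed_pattern([rows[j] for j in group]))
    return seeds
```

With three rising and three falling patterns as input, this sketch returns two seeds, one per shape.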
In the final step, the pattern of each gene is compared to the seed patterns.
Those patterns that closely match the seed pattern are assigned to the seed
cluster. Depending on the way the clustering parameters are defined, either
all genes or just those that survived the filtering step are assigned to seed
clusters. Genes may match more than one seed. Assignment to more than
one cluster is allowed, or assignment to only the cluster with the highest CC
may be forced.
Unlike the SOM clustering, the correlation coefficient algorithm does not
pre-define the number of clusters. The seeding operation determines the
final number of clusters.
The correlation coefficient clustering algorithm is designed to cluster
GeneChip® expression data such as signal or average difference. In general,
it is best to use normalized expression values. Normalization removes some
types of sample preparation artifacts, which can create spurious patterns that
mask the true patterns in the data. However, any column in the pivot table
may be selected for cluster analysis. See Appendix D for more information
about the algorithm.
Running the Correlation Coefficient Cluster
To run the correlation coefficient clustering, you must specify the data to
cluster and various parameters for filtering, seeding and final clustering.
1.
Select Analyze → Correlation Coefficient Clustering from the menu
bar.
⇒ The Select Columns for Clustering dialog box appears (Figure 14.8).
Figure 14.8
Select Columns for Clustering dialog box
2.
Select the samples for clustering.
3.
Click OK when finished.
⇒ The Correlation Coefficient Clustering dialog box appears
(Figure 14.9).
Figure 14.9
Correlation Coefficient Clustering dialog box
See Correlation Coefficient Clustering Options on page 244 for a
description of the Filter, Seed Patterns and Cluster options and settings.
4.
In GeneChip® data mode, confirm the default or enter a new value for
the Maximum number of probe sets to include in seeding.
The Filter options are only available in GeneChip data mode if the
absolute call or detection values have been retrieved from the
database.
5.
To generate seeds, choose the Generate Seeds option.
The Import Seed Patterns option is described later.
6.
Confirm the defaults or enter new values for the Correlation
coefficient threshold and Minimum number of probe sets per seed.
7.
Confirm the defaults for the Cluster options (Unique assignments to
one cluster and Cluster filtered probe sets only) or choose new
Cluster options.
8.
Click Run to start the cluster analysis.
⇒ The Cluster tab in the graph pane displays the clusters (Figure 14.10).
The pane displays the cluster number followed by the number of cluster
members (in parentheses).
The cluster plot axes are not scaled identically.
Figure 14.10
Correlation coefficient cluster plot
Correlation Coefficient Clustering Options
The parameters for the filtering, seed generation and clustering steps of the
clustering algorithm are specified in the Correlation Coefficient Clustering
dialog box (Figure 14.9). The following discusses the parameters for each step.
Filter
Many genes in a data set may not be expressed and therefore have low
expression values. Noise in these low values can produce spurious patterns,
which the filtering step removes. The detection call (Statistical Expression
algorithm) and absolute call (Empirical Expression algorithm) are used to
determine whether a probe set is expressed. An absent call (A) indicates the
gene is not expressed in the sample. Probe sets with absent calls may be
excluded from the seeding process.
To filter based on the expression call, choose the Exclude probe sets with
less than _% Present calls across all analyses when generating seeds
option. The filter slider sets the percentage of P (present) or M (marginal)
calls that are required for a given probe set to be included in the seeding step.
A higher filter percentage excludes more probe sets with low expression
values; a lower percentage includes more genes with low signal. The default
is 75%.
Depending on the experiment, the filtering percentage may be set either high
or low. For example, suppose an experiment looks at several different tissues
and only those probe sets expressed in a single tissue are of interest. In this
case, the filtering percentage must be lowered, tolerating the noise in the
other samples, to detect a rare gene that may be expressed in only one tissue.
Also in the filter step, specify the Maximum number of probe sets to include
in seeding. The algorithm ranks the genes according to the relative standard
deviation of their expression intensities across the samples: top-ranked probe
sets fluctuate the most, low-ranked probe sets the least. This value determines
the number of top-ranked genes included in seeding. The default is 1,000, but
this value may need to be as small as several hundred to obtain meaningful
clusters.
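A sketch of the two filter settings applied in sequence. Assumptions, labeled here and in the code: calls are the single letters P, M and A; relative standard deviation is taken as standard deviation divided by the mean, which requires positive values; and the input layout and function names are hypothetical. The defaults 75% and 1,000 come from the text above.

```python
from statistics import mean, pstdev


def present_fraction(calls):
    """Fraction of analyses with a Present (P) or Marginal (M) call."""
    return sum(c in ("P", "M") for c in calls) / len(calls)


def relative_sd(values):
    """Assumed definition: standard deviation / mean (values must be positive)."""
    return pstdev(values) / mean(values)


def select_for_seeding(probe_sets, min_present=0.75, max_probe_sets=1000):
    """probe_sets maps a name to (expression values, calls) across all analyses.

    Returns the names used for seeding, most variable first.
    """
    kept = {name: values for name, (values, calls) in probe_sets.items()
            if present_fraction(calls) >= min_present}
    ranked = sorted(kept, key=lambda name: relative_sd(kept[name]), reverse=True)
    return ranked[:max_probe_sets]
```

Raising `min_present` drops more low-expression probe sets before ranking, while lowering `max_probe_sets` keeps only the most variable survivors.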
Seed
Seeding is a pre-clustering process by which cluster patterns are first
determined. A seed is usually a small group of genes whose expression
patterns are very similar to each other. The seed’s expression pattern is
calculated from the average expression pattern of this small group.
Two separate parameters are used in the seeding step:
■
Correlation coefficient threshold
■
Minimum number of probe sets per seed
The correlation coefficient is a numerical measure of the relatedness of two
expression patterns: the covariance between the expression patterns of two
probe sets across a series of biological samples, divided by the product of
their standard deviations.
The value of the correlation coefficient ranges from -1 to +1, where +1
represents complete correspondence. The higher the threshold, the more
similar the probe sets must be to belong to the same seed. The default value
is 0.98, but it may need to be as large as 0.999 or as small as 0.8 to obtain
meaningful clusters.
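Assuming the coefficient is the standard (Pearson) correlation, which matches the covariance-based description and the -1 to +1 range, for expression patterns x and y measured in n samples with means x̄ and ȳ:

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r_{xy} \le 1
```

Because the means are subtracted and the result is divided by both spreads, r depends only on the shape of the two patterns, not on their absolute levels.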
Set the Minimum number of probe sets to specify the number of genes that
must be present in a seed for it to be used in clustering. A higher number is
more restrictive and reduces the number of allowed patterns. A lower
number allows rarer expression patterns to define seeds and then later
clusters. The default is 3.
If a file name is entered in the Save Seed Patterns box, the seed patterns are
saved as a text file.
Cluster
There are three parts to the final clustering step.
The correlation coefficient threshold is the same parameter as in the
seeding step. It specifies how closely a probe set pattern must match the
seed’s pattern in order to join the cluster. A lower number allows a less
stringent expression relationship between the probe sets which are permitted
to join the cluster. A higher number forces a more stringent relationship.
Generally, it is best to use a less stringent threshold than in the seeding step
in order to incorporate more unseeded probe sets into the cluster. The default
is 0.90.
If the Cluster Filtered Probe Sets Only option is chosen, only those probe
sets that passed the filtering steps are allowed to join the cluster. The choice
will depend on factors such as the quality of the data and whether a rare
expression pattern is being sought.
A probe set's correlation coefficient may exceed the threshold for two or
more seeds. If the Unique assignments to one cluster option is chosen, the
probe set is assigned to the cluster with the highest correlation coefficient.
If this option is not chosen, the probe set is assigned to every cluster whose
correlation coefficient exceeds the threshold.
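A sketch of the final clustering step, including a small Pearson correlation helper so the block stands alone. The `unique` flag mirrors the Unique assignments to one cluster option; the Cluster filtered probe sets only option corresponds to which rows the caller passes in (only filtered rows, or all rows). The function name and argument layout are hypothetical; 0.90 is the default cluster threshold given above.

```python
from statistics import mean


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length patterns."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)


def assign_to_clusters(rows, seeds, cc_threshold=0.90, unique=True):
    """Return one list of row indices per seed (cluster)."""
    clusters = [[] for _ in seeds]
    for i, row in enumerate(rows):
        scores = [(pearson(row, s), k) for k, s in enumerate(seeds)]
        hits = [(cc, k) for cc, k in scores if cc > cc_threshold]
        if not hits:
            continue  # the row joins no cluster
        if unique:
            clusters[max(hits)[1]].append(i)  # highest correlation wins
        else:
            for _, k in hits:  # join every cluster above the threshold
                clusters[k].append(i)
    return clusters
```

With two similar seeds, a row that correlates above 0.90 with both joins only the better-matching cluster when `unique=True`, and joins both clusters otherwise.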
Effect of Changing Algorithm Parameters
Table 14.1 describes how changing a parameter value affects seeding and
clustering.
Table 14.1
User-modifiable Correlation Coefficient algorithm parameters

Filter
  Description: Specifies the percentage of present and marginal detection
  or absolute calls that a probe set must have across all analyses in order
  to be considered for the seeding step, and optionally, the clustering step
  (default = 75%).
  Increase: Decreases the number of probe sets (with the highest relative
  standard deviation) used in the seeding process. An excessive number of
  probe sets in the seeding process generates large, less distinct clusters.
  Decrease: Increases the number of probe sets (with the highest relative
  standard deviation) used in the seeding process. Increases the number of
  seeds (representative expression profiles).

Maximum number of probe sets to include in seeding
  Description: The algorithm ranks the probe sets not excluded by the filter
  in order of highest standard deviation. Probe sets with the highest
  standard deviation are included in the seeding procedure until the
  Maximum number of probe sets to include in seeding is reached.
  Increase: Increases the number of probe sets (with the highest relative
  standard deviation) used in the seeding process. An excessive number of
  probe sets in the seeding process generates large, less distinct clusters.
  Decrease: Decreases the number of probe sets (with the highest relative
  standard deviation) used in the seeding process. Decreases the number of
  seeds (representative expression profile for a cluster).

Seed correlation coefficient threshold
  Description: Expression patterns of probe sets that pass the filter are
  compared to one another. If the correlation coefficient between two probe
  set profiles exceeds the threshold, they are included in the same seed.
  Increase: Increases the similarity required between the expression
  profiles of two probe sets in order to be included in the same seed. If the
  seed correlation coefficient threshold is excessively high, this prevents
  identification of any seeds.
  Decrease: Lowers the similarity required between the expression profiles
  of two probe sets in order to be included in the same seed. If the seed
  correlation coefficient threshold is too low, expression profiles merge and
  can result in a new profile that is unlike either merged profile.

Minimum number of probe sets per seed
  Description: Minimum number of probe sets (that exceed the seed
  correlation coefficient threshold) required to define a cluster and
  generate a seed.
  Increase: Decreases the number of clusters generated.
  Decrease: Increases the number of clusters generated.

Cluster correlation coefficient threshold
  Description: The expression profile of each probe set is compared to each
  seed. If the correlation coefficient exceeds the threshold, the probe set
  is assigned to the cluster.
  Increase: Decreases the number of probe sets in a cluster.
  Decrease: Increases the number of probe sets in a cluster.
Saving and Importing Seed Patterns
The seeding process described above is useful for finding interesting but
unknown patterns in the data set. When the pattern is known, the seeding
process can be omitted and the known patterns imported instead. An example
is the pattern of a gene that is expressed in one tissue type but in none of
the others.
To import the seed patterns:
1.
Choose the Import Seed Patterns option.
2.
Enter the name of the seeds data file (*.txt) that contains the patterns
(Figure 14.11). Alternatively, click the Browse button and select a *.txt
file from the Read Seeds Data dialog box that appears.
The *.txt file can be a file saved from a previous clustering run (see Saving
Seeds Data on page 249) or manually created (see Seed Pattern (*.txt)
Format on page 250).
Figure 14.11
Correlation Coefficient Clustering dialog box
Saving Seeds Data
The seeds generated by a cluster analysis may be saved in a seeds data file
(*.txt).
1.
In the Correlation Coefficient Clustering dialog box (Figure 14.12), select
the Generate Seeds option.
Figure 14.12
Correlation Coefficient Clustering dialog box
2.
Click the upper Browse button.
⇒ The Save Seeds Data dialog box appears (Figure 14.13).
Figure 14.13
Save Seeds Data dialog box
3.
Select a directory for the saved file.
4.
Enter a File Name for the seeds data file (*.txt), then click Save.
The seed patterns are saved when the clustering algorithm is executed.
Seed Pattern (*.txt) Format
In the seed pattern text file (Figure 14.14), the first row contains the column
headings. Each following row contains one pattern: the label of the pattern
followed by its expression values.
Figure 14.14 shows a representative pattern file. In this example, the first three
rows are patterns of individual probe sets. The last three rows are user-specified patterns.
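If you build a seed pattern file by hand, a sketch like the following can read and write the layout described above. The manual does not state the delimiter, so tab-separated columns are an assumption, and the column headings and pattern labels in the test data are hypothetical. Check the result against a file saved by DMT before importing it.

```python
import csv
import io


def read_seed_patterns(text, delimiter="\t"):
    """Parse a heading row, then one 'label, values...' row per pattern."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    patterns = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        label, *values = row
        patterns[label] = [float(v) for v in values]
    return header, patterns


def write_seed_patterns(header, patterns, delimiter="\t"):
    """Inverse of read_seed_patterns; returns the file contents as a string."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    for label, values in patterns.items():
        writer.writerow([label, *values])
    return buf.getvalue()
```

A user-specified pattern for a gene expressed in only the first tissue would be a row such as `T1_only`, `1`, `0`, `0`.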
Figure 14.14
Import seed text file
Saving a Probe List
Cluster members may be saved as a probe list.
1.
Click the cluster you want to save.
⇒ The cluster members are displayed in the Probes box of the
Clusters tab (Figure 14.15).
2.
Enter a name for the list in the Probe List Name box.
Figure 14.15
Correlation coefficient cluster plot
3.
Click Save Selected.
⇒ The data tree displays the probe list name.
To quickly view data for the cluster members in a probe list, right-click
the probe list in the data tree, then select Highlight Pivot and
Graphs from the shortcut menu. The pivot table displays only the
rows for the probe list (cluster members).
If the scatter, fold change and series line graphs were previously
plotted for the clustered columns, the scatter and fold change graphs
highlight the points from the probe list. The series line graph displays
only the probe list.
Chapter 15
DMT Tutorial
Introduction
This tutorial includes six lessons that demonstrate (in GeneChip® data
mode) how to use DMT.
■
Lesson 1: Identify highly expressed genes
■
Lesson 2: Calculate summary statistics of replicates
■
Lesson 3: Summarize qualitative data
■
Lesson 4: Evaluate difference between two tissues
■
Lesson 5: Use comparison ranking to evaluate difference call consistency
■
Lesson 6: Perform cluster analysis using the self organizing map (SOM)
algorithm
The tutorial lessons use the demonstration database DMT_3_Tutorial that is
provided on the Affymetrix Data Mining Tutorial and Demo Data CD
(P/N 610050 Rev. 2). The database includes absolute and comparison
analyses of tissues T1, T2 and T3.
LIMS users: The tutorial database name may be different from that
used in this manual. Please contact your Database Administrator for
the correct name.
There are six replicate absolute analyses of each tissue type (a total of 18
absolute analyses). For example, the replicates for tissue T1 are T1_r1,
T1_r2, ... T1_r6 (Figure 15.1). There are 36 comparison analyses that compare
tissue T1 and T2 replicates for use in Lesson 5.
The number of replicates needed in your own experiments will depend on
how much variability you expect to see in your system.
The signal intensity data were scaled to a target intensity (TGT) of 500 using
the All Probe Sets option in the Affymetrix® Microarray Suite software.
Figure 15.1
DMT_3_Tutorial database, 18 absolute analyses
Before we can start to analyze the data, we must first register and connect
the database to DMT.
Step 1: Restoring the MicroDB™ Database
Refer to the Affymetrix® MicroDB™ User’s Guide, the SQL Server manual,
or the Oracle® manual for instructions on how to restore the tutorial database
to a workstation or server.
Step 2: Starting DMT
Refer to Chapter 2 on page 10 for more information on installing and
registering DMT.
Press the Windows Start menu button, then select Programs →
Affymetrix → Data Mining Tool.
⇒ The DMT main window appears.
Step 3: Registering the Database
Tutorial Database on Windows NT® Workstation (MicroDB™ System)
1.
Select Edit → Register Database from the menu bar.
⇒ The Register Database dialog box appears (Figure 15.2).
Figure 15.2
Register Database dialog box, publish database on Windows NT workstation
2.
Select the DMT_3_Tutorial database from the Publish Database
drop-down list, then click Register.
⇒ The tutorial database is now available to DMT.
Tutorial Database on LIMS Server (Affymetrix® LIMS)
Select Edit → Register Database from the menu bar.
⇒ The Register Database dialog box appears (Figure 15.3).
Figure 15.3
Register Database dialog box
Oracle® Database
1.
To select another server, enter the server name or Oracle alias, then click
List Databases to display the publish databases for the server in the
Publish Database drop-down list.
2.
Select the DMT_3_Tutorial database from the Publish Database drop-down list, then click Register.
⇒ The tutorial database is available to DMT.
Step 4: Selecting the Tutorial Database
1.
Select Edit → Select Database from the menu bar.
2.
Select the DMT_3_Tutorial database.
⇒ The status bar at the bottom of the main window displays the name
of the current database.
If the status bar is not displayed, select View → Status Bar from the
menu bar.
Step 5: Opening the DMT Session
A DMT session must be opened to begin data analysis.
■
Select Data → New → GeneChip Mining from the menu bar.
⇒ The DMT session opens (Figure 15.4).
Figure 15.4
DMT session, DMT_3_Tutorial database selected
Lesson 1: Identifying
Highly Expressed Genes
Identifying genes that significantly change expression level can give insight
into the major functional and structural cell changes that occur between two
experimental conditions (for example, normal cells and cells treated with a
drug).
This lesson shows how to identify genes that are highly expressed in tissue
T1. It then examines the expression of these same genes in tissue T2 and T3.
We will use only one replicate of each tissue in this lesson.
Lesson 1 includes:
■
Step 1: Specifying a Filter
■
Step 2: Selecting Analyses for the Query
■
Step 3: Pivoting on Signal & Detection Call
■
Step 4: Querying and Pivoting the Data
■
Step 5: Sorting the Pivot Table by Signal
■
Step 6: Saving a Probe List
■
Step 7: Plotting the Series Line Graph
Step 1: Specifying a Filter
The filter is a useful tool for selecting transcripts that exceed a certain limit
or transcripts within a given expression range. For example, to find highly
expressed genes, we can specify (in the filter grid) genes that are called
Present with a Signal > 1000.
1.
Clear any entries in the filter grid. To do this, right-click the filter grid
and select Clear Query from the shortcut menu that appears.
2.
In Line 1 of the filter grid, double-click the Signal cell, then enter
>1000.
3.
Enter =’P’ in the Detection cell (Figure 15.5).
Figure 15.5
Filter grid
The query interrogates the absolute analyses selected from the data tree and
returns probe sets that have a Signal greater than 1000 and a Present (P)
Detection call.
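Conceptually, each line of the filter grid combines its cell conditions with AND. The sketch below applies the same two conditions to exported records; it is not how DMT runs the query, and the record layout and field names (`probe_set`, `signal`, `detection`) are hypothetical.

```python
def matches_filter(record, min_signal=1000.0, detection="P"):
    """True when the record meets both conditions: Signal > 1000 and Detection = 'P'."""
    return record["signal"] > min_signal and record["detection"] == detection


def run_query(records, **criteria):
    """Names of the probe sets whose records pass the filter."""
    return [r["probe_set"] for r in records if matches_filter(r, **criteria)]
```

A probe set with a signal of 1500 but an Absent call is excluded, just as one with a Present call but a signal of 900 is.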
To specify more complex queries, right-click a cell in the filter grid,
then select Show Query Builder from the shortcut menu that
appears. This opens the Build Filter dialog box for the selected cell.
The Build Filter dialog box enables you to enter complex limits in the
filter grid without prior knowledge of correct syntax for operators
such as BETWEEN and LIKE. You need only specify text or numbers
where appropriate.
Step 2: Selecting Analyses for the Query
In the data tree, select the absolute analyses: T1_r1, T2_r1 and T3_r1.
Step 3: Pivoting on Signal & Detection Call
1.
To select results for the pivot operation, click the Options toolbar
button.
⇒ The Data Mining Options dialog box appears (Figure 15.6).
2.
Click the Pivot tab.
⇒ The absolute and relative expression data available for the pivot
table are displayed (Figure 15.6).
Figure 15.6
Data Mining Options dialog box, Pivot tab
3.
From the list of Absolute Expression Data for the Statistical
Algorithm, select Signal and Detection (Figure 15.6).
4.
Clear the check mark from the Show order analyses dialog option.
When this option is chosen, the software prompts you to confirm the
order of the columns (analyses) in the pivot table prior to the pivot
operation.
5.
Click OK to close the Data Mining Options dialog box.
Step 4: Querying and Pivoting the Data
1.
Click the Pivot toolbar button.
⇒ The data are queried using the filter specified in step 1. The pivot
table displays the signal and detection call for each probe set
returned by the query (Figure 15.7).
You can reorder the pivot table columns using the click-and-drag
method.
Figure 15.7
Pivot table
Some fields in the pivot table are blank because the probe sets in
these analyses did not satisfy the filter criteria.
Step 5: Sorting the Pivot Table by Signal
In the pivot table, right-click the Signal column heading for T1_r1 and
select Sort Descending from the shortcut menu that appears.
⇒ The pivot table rows are sorted in descending order of the signal
values for T1_r1 (Figure 15.8).
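Sorting in descending order and then taking the first rows (as the next step does) is the same as a top-N selection. In this hypothetical sketch the pivot table is a dictionary from probe set name to column values, with `None` for blank cells; blank cells are simply skipped here, and the sketch does not reproduce where DMT places them when sorting.

```python
def top_by_signal(pivot, column, n=10):
    """Names of the n probe sets with the highest value in `column`."""
    filled = [(name, cols[column]) for name, cols in pivot.items()
              if cols.get(column) is not None]  # skip blank cells
    filled.sort(key=lambda item: item[1], reverse=True)  # Sort Descending
    return [name for name, _ in filled[:n]]
```

The default `n=10` matches the ten rows selected for the Highly Expressed probe list.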
Step 6: Saving a Probe List
1.
Select the ten pivot table rows with the highest signal values for T1_r1
(Figure 15.8).
Figure 15.8
Pivot table
2.
Right-click a highlighted cell and select Create Probe List from the
shortcut menu that appears.
⇒ The Save Probe List dialog box appears (Figure 15.9).
Figure 15.9
Save Probe List dialog box
3.
In the Name box, enter the probe list name Highly Expressed.
4.
Clear the check mark from the Show members after saving option.
5.
Click Save.
⇒ The probe list is saved and displayed in the data tree.
To view the probe list members, click the plus sign (+) next to the
probe list in the data tree.
Step 7: Plotting the Series Line Graph
Now that we have identified genes that are highly expressed in T1_r1 and
saved them in a probe list, we can plot the series line graph to examine the
expression levels of these genes in T2_r1 and T3_r1 as well.
1.
Right-click the filter grid and select Clear Query from the shortcut
menu that appears.
⇒ The criteria in the filter grid are cleared.
2.
Click the Pivot toolbar button.
Verify that analyses T1_r1, T2_r1 and T3_r1 remain selected in the data
tree before running the pivot operation.
3.
Right-click the Highly Expressed probe list in the data tree and select
Display Selected Probes from the shortcut menu that appears.
⇒ The pivot table displays only the members of the Highly Expressed
probe list (Figure 15.10).
Figure 15.10
Pivot table
4.
Click the Options toolbar button.
⇒ The Data Mining Options dialog box appears (Figure 15.11).
Figure 15.11
Data Mining Options dialog box, Series Graph tab
5.
Click the Series Graph tab and verify the Line Graph option is
selected.
6.
Click OK.
7.
Click the Series Graph toolbar button.
⇒ The Series Graph dialog box appears (Figure 15.12).
Figure 15.12
Series Graph dialog box
8.
Select all three columns (T1_r1-Signal, T2_r1-Signal and
T3_r1-Signal), then click OK.
⇒ The signal series line graph is plotted for the probe sets in the
Highly Expressed probe list (Figure 15.13).
If necessary, use the scroll bar at the bottom of the graph pane to view
the entire graph.
Figure 15.13
Series line graph displaying the Highly Expressed probe list
Lesson 1 Summary
We used filters (Detection = P and Signal > 1000) to query the database and
select transcripts in a given expression range. We then sorted the pivot table
by signal in descending order to quickly identify the genes returned by the
query that had the highest expression.
We saved probe sets (genes) of interest as a probe list. The probe list is a
useful way to organize probe sets of interest. In the data tree, the Display
Selected Probes function provided a convenient way to view pivot table
results and plot graphs for the probe list members only. You can use the
probe list to look at gene expression for list members across other
experiments.
For example, in this lesson we saved ten probe sets (with the highest signal
value in T1_r1) as a probe list, then used the Display Selected Probes
function to update the pivot table and plot the series line graph for analyses
T1_r1, T2_r1 and T3_r1.
Suggested Exercise
Repeat lesson 1, filtering for genes that are called present and have a signal
between 1000 and 2000. Generate a short probe list (five to ten members)
and plot the series line graph (select Columns for the X-Axis option) for the
probe list across three replicate analyses.
Lesson 2: Calculating
Averages of Replicates
The analysis of replicates allows us to measure the variability in a data set
and determine confidence values for these measurements. This enables us to
measure small, consistent changes even when the variability in a data set is
relatively high.
Small changes in gene expression can be biologically very important. Using
a larger number of replicates increases the probability that small changes are
statistically significant.
Lesson 2 shows how to compute the mean and standard deviation for the
members of the Highly Expressed probe list (generated in lesson 1) across
replicate analyses. This lesson includes:
■
Step 1: Specifying a Probe List for the Filter
■
Step 2: Selecting Analyses for the Query
■
Step 3: Pivoting on Signal
■
Step 4: Querying and Pivoting the Data
■
Step 5: Selecting the Average and Standard Deviation Operators
■
Step 6: Sorting the Pivot Table
■
Step 7: Displaying Probe Set Descriptions
Step 1: Specifying a Probe List for the Filter
1.
Clear any entries in the filter grid. To do this, right-click the filter grid
and select Clear Query from the shortcut menu that appears.
2.
In Line 1 of the filter grid, right-click the Probe Set Name column, then
select Probe List from the shortcut menu that appears.
⇒ The Open Probe List dialog box appears (Figure 15.14).
Figure 15.14
Open Probe List dialog box
3.
Select the Highly Expressed probe list (generated in lesson 1), then
click Open.
⇒ The selected probe list is placed in the Probe Set Name column of
the filter grid (Figure 15.15).
The probe list contains the probe sets we want to analyze. By loading
the list we limit our analysis to these probe sets only.
Figure 15.15
Filter grid
Step 2: Selecting Analyses for the Query
1.
In the data tree, select all replicate absolute analyses for tissue T1, T2,
and T3 (T1_r1 through T1_r6, T2_r1 through T2_r6, and T3_r1 through
T3_r6) (Figure 15.16).
Figure 15.16
Data tree, all absolute analyses (18) selected
Step 3: Pivoting on Signal
1.
Click the Options toolbar button.
⇒ The Data Mining Options dialog box appears (Figure 15.17).
2.
Click the Pivot tab.
⇒ The absolute and relative expression data available for the pivot
table are displayed (Figure 15.17).
Figure 15.17
Data Mining Options dialog box, Pivot tab
3.
From the list of Absolute Expression Data for the Statistical
Algorithm, select Signal. Verify that all other options are cleared.
4.
Click OK to close the Data Mining Options dialog box.
Step 4: Querying and Pivoting the Data
1.
Click the Pivot toolbar button.
⇒ The pivot table displays probe sets that are members of the Highly
Expressed probe list (Figure 15.18).
Figure 15.18
Pivot table
Average and Standard Deviation
The average and standard deviation statistics or the median and inter-quartile
range statistics can be used to summarize the expression level for each probe
set across a number of replicate analyses.
Select the average and standard deviation statistics if you assume the data
are normally distributed (Figure 15.19). The standard deviation estimates
how much the expression level varies from one replicate to the next.
Select the median and inter-quartile range statistics if you assume the data do
not have a normal distribution (Figure 15.20). The inter-quartile range is the
75th percentile minus the 25th percentile.
If you are not sure whether your data have a normal distribution,
calculate both the mean and median values. If the two values differ
substantially, the data probably do not have a normal distribution and
the median is likely to be the better summary.
Figure 15.19
Normal data distribution
Figure 15.20
Skewed data distribution
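The summary statistics described above can be sketched in Python for a single probe set. This is a minimal illustration, not DMT's implementation: it assumes the sample (n - 1) standard deviation and the default percentile method of Python's `statistics.quantiles`, which may not match DMT's calculation, and the 20% mean-versus-median cutoff in the hypothetical helper `looks_skewed` is an arbitrary example rather than a DMT setting. The signal values are made up.

```python
from statistics import mean, median, quantiles, stdev


def summarize_replicates(values):
    """Summarize one probe set's signal across replicate analyses."""
    q1, _, q3 = quantiles(values, n=4)  # 25th, 50th and 75th percentiles
    return {
        "average": mean(values),
        "stdev": stdev(values),  # sample (n - 1) standard deviation (assumption)
        "median": median(values),
        "iqr": q3 - q1,  # inter-quartile range: 75th minus 25th percentile
    }


def looks_skewed(values, rel_tol=0.2):
    """Hypothetical heuristic: flag data whose mean and median differ by more than rel_tol."""
    s = summarize_replicates(values)
    return abs(s["average"] - s["median"]) > rel_tol * abs(s["median"])


# Hypothetical signal values for six replicates of two probe sets
symmetric = [100, 110, 120, 130, 140, 150]
skewed = [100, 105, 110, 115, 120, 900]
print(summarize_replicates(symmetric))
print(looks_skewed(symmetric), looks_skewed(skewed))
```

With these made-up numbers, the symmetric replicates give the same mean and median (125), while the single outlier in `skewed` pulls the mean far above the median, which is the situation in which the median is the safer summary.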
Step 5: Selecting the Average and Standard Deviation Operators
1.
Select Analyze → Analysis Function from the menu bar.
⇒ The Analysis Function dialog box appears (Figure 15.21).
Figure 15.21
Analysis Function dialog box
2.
Enter T1 in the Column Name box.
3.
Select the Average and Standard Deviation operators (Figure 15.21),
then click Next.
⇒ The column selection dialog box displays the available pivot table
columns (Figure 15.22).
Figure 15.22
Analysis Function dialog box
4.
Select all replicate T1 signal columns (T1_r1-Signal, T1_r2-Signal,...
T1_r6-Signal), then click Finish.
⇒ The pivot table (far right) displays the new columns T1-Average
and T1-Stdev (Figure 15.23).
Use the horizontal scroll bar at the bottom of the results pane to view the
right side of the pivot table.
Figure 15.23
Pivot table, average and standard deviation for replicate T1 signal data
5.
Repeat items 1 through 4 of Step 5 for the replicate T2 signal
columns (enter T2 in the Column Name box of the Analysis
Function dialog box).
6.
Repeat items 1 through 4 of Step 5 for the replicate T3 signal
columns (enter T3 in the Column Name box of the Analysis
Function dialog box).
⇒ The pivot table (right side) displays six new columns:
■ T1-Average and T1-Stdev,
■ T2-Average and T2-Stdev, and
■ T3-Average and T3-Stdev (Figure 15.24).
Use the horizontal scroll bar at the bottom of the results pane to view the
right side of the pivot table.
Figure 15.24
Pivot table, average and standard deviation for replicate T1, T2 and T3 signal data
Step 6: Sorting the Pivot Table
We are interested in probe sets with large signal values. We can sort the pivot
table to help identify these probe sets.
1.
Select Edit → Sort from the menu bar.
⇒ The Sort dialog box appears (Figure 15.25).
Figure 15.25
Sort dialog box
2.
Select T1-Average from the top Sort By drop-down list and select the
Descending sort option.
3.
Click OK.
⇒ The pivot table is sorted by descending average T1-Signal value
(Figure 15.26).
Figure 15.26
Pivot table sorted by descending T1-Average
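Steps 5 and 6 can be mirrored in a short Python sketch, assuming the pivot table is held as a dictionary of rows keyed by probe set name. The probe set names, signal values and helper functions below are hypothetical, and using the sample standard deviation is an assumption about how the Stdev column is computed.

```python
from statistics import mean, stdev

# Hypothetical pivot table: probe set name -> {column name: signal}
T1_COLUMNS = [f"T1_r{i}-Signal" for i in range(1, 7)]
pivot = {
    "probe_A": dict(zip(T1_COLUMNS, [1200, 1100, 1300, 1250, 1150, 1200])),
    "probe_B": dict(zip(T1_COLUMNS, [300, 320, 280, 310, 290, 300])),
    "probe_C": dict(zip(T1_COLUMNS, [5000, 5200, 4800, 5100, 4900, 5000])),
}


def add_average_stdev(table, name, columns):
    """Add <name>-Average and <name>-Stdev columns computed across the given columns."""
    for row in table.values():
        values = [row[c] for c in columns]
        row[f"{name}-Average"] = mean(values)
        row[f"{name}-Stdev"] = stdev(values)  # sample standard deviation (assumption)


def sort_descending(table, column):
    """Return probe set names ordered by the given column, largest first."""
    return sorted(table, key=lambda probe: table[probe][column], reverse=True)


add_average_stdev(pivot, "T1", T1_COLUMNS)
print(sort_descending(pivot, "T1-Average"))
```

Calling `add_average_stdev` again with the T2 and T3 column names reproduces the repeats in items 5 and 6 of Step 5.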
Step 7: Displaying Probe Set Descriptions
1.
Select Query → Pivot Descriptions from the menu bar.
⇒ The pivot table displays a column of probe set descriptions
(Figure 15.27).
Figure 15.27
Pivot table, probe set descriptions displayed
Lesson 2 Summary
We used a probe list as a filter to focus on genes of interest across different
analyses. Here we included the Highly Expressed probe list (generated in
lesson 1) in the filter and queried all replicate analyses of tissue T1, T2 and
T3.
We computed the mean and standard deviation to summarize the replicate
signal data for tissues T1, T2 and T3 and to provide a confidence measure
for the data. We sorted the pivot table by average T1 signal to help us
identify probe sets with large signal values. We displayed probe set
descriptions in the pivot table for more information about the probe sets.
Suggested Exercise
Repeat lesson 2, computing the median and inter-quartile range for the
replicate analyses of tissue T1, T2 and T3.
Lesson 3: Summarizing
Qualitative Data
Some transcripts may be expressed at the limit of assay detection. The more
often a weakly expressed transcript is called present across multiple
analyses, the more confident we are that it is actually present. (Think of this
as a jury where each experiment is a juror that votes whether or not a
transcript is present.)
This lesson shows how to:
■ Use the Count & Percentage analysis to evaluate the consistency of
detection calls across all replicate data.
■ Identify the transcripts that are present in all replicates of tissue T1, T2
and T3.
■ Annotate the genes that are present and generate a corresponding probe
list representing potential genes of interest.
Lesson 3 includes:
■ Step 1: Pivoting on Detection Call
■ Step 2: Performing Count & Percentage Analysis
■ Step 3: Sorting the Pivot Table Results
■ Step 4: Saving a Probe List
■ Step 5: Annotating Probe List Members
Step 1: Pivoting on Detection Call
1.
Clear any entries in the filter grid. To do this, right-click the filter grid
and select Clear Query from the shortcut menu that appears.
2.
In the data tree, select all 18 absolute analyses for tissue T1, T2 and T3.
3.
Click the Options toolbar button.
⇒ The Data Mining Options dialog box appears (Figure 15.28).
4.
Click the Pivot tab.
⇒ The absolute and relative expression data available for the pivot
table are displayed (Figure 15.28).
Figure 15.28
Data Mining Options dialog box, Pivot tab
5.
From the list of Absolute Expression Data for the Statistical
Algorithm, select Detection. Verify that all other options are cleared.
6.
Click OK to close the Data Mining Options dialog box.
7.
Click the Pivot toolbar button.
⇒ The pivot table displays the detection call for each probe set in the
selected analyses (Figure 15.29).
Figure 15.29
Pivot table displaying detection calls
Step 2: Performing Count & Percentage Analysis
1.
Select Analyze → Analysis Function from the menu bar.
⇒ The Analysis Function dialog box appears (Figure 15.30).
Figure 15.30
Analysis Function dialog box
2.
Enter T1 Present in the Column Name box.
3.
Select the Count & Percentage operator, then select the P (present)
option.
4.
Click Next.
⇒ The column selection dialog box displays the available pivot table
columns (Figure 15.31).
Figure 15.31
Analysis Function dialog box
5.
Select the six replicates for tissue T1 (T1_r1, T1_r2,... T1_r6), then
click Finish.
⇒ This generates the columns T1 Present-Count and T1 Present-Percent in the pivot table (Figure 15.32).
6.
Repeat items 1 through 5 of Step 2 for the replicate T2 Detection
columns (enter T2 Present in the Column Name box of the Analysis
Function dialog box).
⇒ The pivot table (right side) displays two new columns: T2 Present-Count and T2 Present-Percent (Figure 15.32).
7.
Repeat items 1 through 5 of Step 2 for the replicate T3 Detection
columns (enter T3 Present in the Column Name box of the Analysis
Function dialog box).
⇒ The pivot table (right side) displays two new columns: T3 Present-Count and T3 Present-Percent (Figure 15.32).
Use the horizontal scroll bar at the bottom of the results pane to view the
right side of the pivot table.
Step 3: Sorting the Pivot Table Results
1.
In the pivot table, right-click the T1 Present-Count column heading
and select Sort Descending from the shortcut menu that appears.
Figure 15.32
Pivot table, count and percentage columns
For each probe set, the:
■ Count column displays the number of columns (analyses) in which
the detection call is Present.
■ Percent column shows the corresponding percentage of columns
(analyses) in which the probe set was called present.
For example, in Figure 15.32, probe set Z70759_at was called present in
all replicates of tissue T1, T2 and T3, or 100% of the analyses.
Sorting the T1 Present-Count column in descending order ranks the
probe sets so that those with the most consistent detection calls in tissue
T1 are displayed at the top of the pivot table.
Step 4: Saving a Probe List
Save all probe sets with T1 Present-Percent = 100% as a probe list called T1
Present 100%. (See lesson 1, step 6.)
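The Count & Percentage operator, the descending sort in Step 3 and the 100% selection in Step 4 amount to counting matching calls in each row. A minimal sketch, assuming detection calls are stored as the single letters P, M and A; the probe set names, calls and function name are hypothetical.

```python
def count_and_percentage(calls, target="P"):
    """Return (count, percentage) of calls equal to target, like the Count & Percentage operator."""
    count = sum(1 for call in calls if call == target)
    return count, 100.0 * count / len(calls)


# Hypothetical detection calls for the six T1 replicates (T1_r1 ... T1_r6)
t1_calls = {
    "probe_A": ["P", "P", "P", "P", "P", "P"],
    "probe_B": ["P", "A", "P", "M", "P", "P"],
    "probe_C": ["A", "A", "A", "P", "A", "A"],
}

results = {probe: count_and_percentage(calls) for probe, calls in t1_calls.items()}

# Rank by T1 Present-Count, most consistent first (Step 3)
ranked = sorted(results, key=lambda probe: results[probe][0], reverse=True)

# Probe sets called present in every replicate (Step 4 probe list)
present_100 = [probe for probe in ranked if results[probe][1] == 100.0]
print(ranked, present_100)
```

Passing `target="A"` instead counts absent calls, as in the suggested exercise for this lesson.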
Step 5: Annotating Probe List Members
1.
In the data tree, right-click the probe list T1 Present 100% and select
Display Selected Probes from the shortcut menu that appears.
⇒ The pivot table displays all the members of the T1 Present 100%
probe list.
2.
Select all pivot table rows, right-click a pivot table row, then select
Annotate Probes from the shortcut menu.
⇒ The Annotate dialog box appears (Figure 15.33).
Figure 15.33
Annotate dialog box
3.
Enter Tutorial in the Annotation Type box.
4.
In the Annotation box enter: T1: Present count = 6, Percent = 100%.
5.
Click OK.
⇒ The probe sets that were called present across all six T1 replicates
are annotated.
Lesson 3 Summary
We used the count and percentage analysis to summarize detection calls for
tissues T1, T2 and T3. By sorting the pivot table on the T1 Present-Count
column, we were able to identify the most consistent results. This makes it
easy to annotate all probe sets that are present in all analyses (or in a
user-specified percentage of analyses).
We saved the probe sets that were present in all six T1 replicates as a probe
list and annotated the members of the probe list. In future sessions we can
query the annotations (see Chapter 8, Annotations).
Suggested Exercise
Repeat lesson 3 using the count and percentage analysis to identify all genes
called absent in all replicates of tissue T1, T2 and T3.
Lesson 4: Evaluating
Difference Between Two
Tissues
The T-Test and Mann-Whitney test are statistical tests that enable you to
determine the direction and significance of change in a transcript's
expression level between two experimental conditions, using replicate
analyses of each condition. These tests are well suited to finding small,
consistent changes in expression levels. The use of replicates
helps distinguish real change from biological and experimental noise.
The T-Test assumes the expression levels for a given transcript are normally
distributed across experiments. The Mann-Whitney test is rank-based and
does not assume that the data are normally distributed.
DMT computes a p-value for each comparison. The p-value is the
probability of observing a difference in expression level at least as large as
the one measured if there were no real difference between the two conditions.
A small p-value (for example, 0.01) means that such a difference is unlikely
(only a one in 100 chance) to occur by chance alone. If the computed p-value
exceeds the p-value cutoff, the change call is no change (None).
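To make the p-value concrete, the following Python sketch computes a Mann-Whitney U statistic and an exact two-sided p-value by enumerating every way of splitting the pooled replicates into two groups of the original sizes. It is an illustration only: DMT may compute its p-value differently (for example, with a normal approximation), and the function names and signal values are hypothetical.

```python
from itertools import combinations


def mann_whitney_u(control, experiment):
    """U for the experiment group: pairs where experiment > control, ties count 0.5."""
    u = 0.0
    for e in experiment:
        for c in control:
            if e > c:
                u += 1.0
            elif e == c:
                u += 0.5
    return u


def exact_two_sided_p(control, experiment):
    """Fraction of all group splits whose U is at least as far from its mean as observed."""
    pooled = list(control) + list(experiment)
    n_c, n_e = len(control), len(experiment)
    center = n_c * n_e / 2.0  # expected U when there is no real difference
    observed = abs(mann_whitney_u(control, experiment) - center)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_e):  # 924 splits for 6 vs 6
        chosen = set(idx)
        exp_group = [pooled[i] for i in idx]
        ctl_group = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if abs(mann_whitney_u(ctl_group, exp_group) - center) >= observed:
            extreme += 1
        total += 1
    return extreme / total


t1 = [100, 110, 120, 130, 140, 150]  # hypothetical control signals
t2 = [200, 210, 220, 230, 240, 250]  # hypothetical experiment signals
print(exact_two_sided_p(t1, t2))  # every T2 value exceeds every T1 value
```

Here every T2 signal is larger than every T1 signal, so only 2 of the 924 possible splits are as extreme, giving p = 2/924 (about 0.002). With six replicates per tissue, that is the smallest two-sided p-value this exact test can produce.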
In this lesson, we use the Mann-Whitney test to compare the signal
replicates for tissues T1 and T2 and determine whether the signal data for
these two tissues show a statistically significant difference. The lesson
shows how to generate change calls for signal data so we can determine the
direction of change and associated p-values to estimate confidence.
Lesson 4 includes:
■ Step 1: Pivoting on Signal
■ Step 2: Performing a Mann-Whitney Test
■ Step 3: Annotating Probe Sets
■ Step 4: Saving a Probe List
Step 1: Pivoting on Signal
1.
Clear any entries in the filter grid. To do this, right-click the filter grid
and select Clear Query from the shortcut menu that appears.
2.
In the data tree, select all absolute analysis replicates for T1 and T2.
3.
Click the Options toolbar button.
⇒ The Data Mining Options dialog box appears (Figure 15.34).
4.
Click the Pivot tab.
⇒ The absolute and relative expression data available for the pivot
table are displayed (Figure 15.34).
Figure 15.34
Data Mining Options dialog box, Pivot tab
5.
From the list of Absolute Expression Data for the Statistical
Algorithm, select Signal. Verify that all other options are cleared.
6.
Click OK to close the Data Mining Options dialog box.
7.
Click the Pivot toolbar button.
⇒ The pivot table displays the signal for each probe set returned by the
query (Figure 15.35).
Figure 15.35
Pivot table displaying signal
Step 2: Performing a Mann-Whitney Test
1.
Select Analyze → Analysis Function from the menu bar.
⇒ The Analysis Function dialog box appears (Figure 15.36).
Figure 15.36
Analysis Function dialog box
2.
Enter T1vsT2 in the Column Name box.
3.
Select the Mann-Whitney test option.
4.
Click Next.
⇒ The column selection dialog box appears (Figure 15.37).
Figure 15.37
Analysis Function dialog box, select analyses for the Mann-Whitney test
5.
Select the six replicate T1 signal columns in the Control Columns box
(Figure 15.37).
6.
Select the six replicate T2 signal columns in the Experiment Columns
box (Figure 15.37).
The pivot table columns selected in the Control and Experiment
Columns lists define the two populations being compared.
7.
Click Finish.
⇒ The pivot table displays two columns of T1vsT2-Mann-Whitney
test results (Figure 15.38).
The pivot table also displays the computed p-value and the direction of
change (up, down, or none) for each probe set in the comparison. An Up
or Down change direction call is associated with a probe set if the
p-value < 0.05. If the p-value is > 0.05, the change direction call is None.
An Up call for a transcript indicates the signal is higher in the
Experiment group than the Control group. A Down call indicates the
signal is lower in the Experiment group compared to the Control group.
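The rule that turns a p-value into an Up, Down or None call can be written as a small function. A minimal sketch, assuming the direction is taken from the group medians and that a p-value of exactly 0.05 yields None; the text does not specify either detail, and the function name is hypothetical.

```python
from statistics import median


def change_direction(control, experiment, p_value, cutoff=0.05):
    """Return 'Up', 'Down' or 'None' for one probe set (Experiment relative to Control)."""
    if p_value >= cutoff:  # not significant (treating p == cutoff as None is an assumption)
        return "None"
    c, e = median(control), median(experiment)
    if e > c:
        return "Up"  # signal higher in the Experiment group
    if e < c:
        return "Down"  # signal lower in the Experiment group
    return "None"


print(change_direction([100, 110, 120], [200, 210, 220], p_value=0.002))
```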
8.
Right-click the P Value column header and select Sort Ascending from
the shortcut menu that appears.
⇒ The pivot table displays the p-values in ascending order
(Figure 15.38).
Figure 15.38
Pivot table
Step 3: Annotating Probe Sets
1.
In the pivot table, select probe sets with an Up call and p-value < 0.001.
2.
Right-click a selected row and select Annotate Probes from the
shortcut menu that appears.
⇒ The Annotate dialog box appears (Figure 15.39).
Figure 15.39
Annotate dialog box
3.
Enter or select Tutorial in the Annotation Type box.
4.
In the Annotation box, enter Signal higher in T2 than T1 with
p < 0.001, then click OK.
⇒ The probe sets that showed a higher expression level in T2
compared to T1 with significance of p-value < 0.001 are annotated.
Step 4: Saving a Probe List
Save all probe sets with an Up direction call as a probe list named
T2_T1_MW_T2UP. (See lesson 1, step 6.)
You can now inspect or further filter the probe list as in lessons 1 and 2.
Lesson 4 Summary
When replicate analyses are available, the Mann-Whitney test helps
determine whether differences in expression levels between two different
groups of samples are statistically significant.
The Mann-Whitney test generates change calls (Up, Down, None)
based on comparisons of one numeric metric (typically, signal).
Lesson 5 shows a more stringent comparison between two sets of
replicates using comparison analyses.
Suggested Exercise
Repeat lesson 4 and apply the T-Test to tissue T1 and T2 replicate signal
data.
Lesson 5: Evaluating
Change Call Consistency
Comparison ranking is a useful method for assessing the consistency of
change calls when comparing two data sets that include replicate analyses. It
is a ranking strategy that uses the change call from Microarray Suite analysis
to perform the ranking. The results are typically more conservative than a
standard Mann-Whitney or T-Test.
To comparison rank two data sets:
■ Generate all possible combinations of comparison analyses for the two
sets of replicate data in Affymetrix® Microarray Suite.
■ Pivot the change call result for all of the comparison analyses.
■ Run a count and percentage analysis of the change call data.
■ In the pivot table, sort the change call count and percentage columns in
descending order.
This arranges or ranks the probe sets with the highest count and percentage
of a call at the top of the pivot table. Those with the lowest count and
percentage of the call are displayed at the bottom of the table. In this format,
you can conveniently evaluate the consistency of the data and the
significance of a change call.
Lesson 5 shows how to comparison rank T1 and T2 change call data. The
tutorial database includes comparison analyses for all possible combinations
of the T1 and T2 replicates (36 total, generated in Affymetrix® Microarray
Suite, see Figure 15.40 and Table 15.1).
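The comparison ranking procedure can be sketched in Python: enumerate the 36 T1 versus T2 comparison analyses, then count how often each probe set receives a given change call and sort by that count. The change calls below are hypothetical (I for increase, NC for no change), and the dictionary stands in for the pivot table; none of these names come from DMT.

```python
from itertools import product

T1 = [f"T1_r{i}" for i in range(1, 7)]
T2 = [f"T2_r{i}" for i in range(1, 7)]
# Every pairing of a T1 replicate with a T2 replicate: 6 x 6 = 36 comparison analyses
COMPARISONS = [f"{a} v {b}" for a, b in product(T1, T2)]


def comparison_rank(change_calls, call="I"):
    """Rank probe sets by how many comparison analyses gave them the given change call."""
    ranked = []
    for probe, calls in change_calls.items():
        count = calls.count(call)
        ranked.append((probe, count, 100.0 * count / len(calls)))
    ranked.sort(key=lambda row: row[1], reverse=True)  # highest count first
    return ranked


# Hypothetical change calls, one per comparison analysis
calls = {
    "probe_C": ["NC"] * 36,
    "probe_A": ["I"] * 36,
    "probe_B": ["I"] * 30 + ["NC"] * 6,
}
for probe, count, percent in comparison_rank(calls):
    print(probe, count, round(percent, 1))
```

A probe set at 100% received the same call in every one of the 36 comparisons, which is the concordance selected in Step 4.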
This lesson includes:
■ Step 1: Clearing the Filter Grid & Selecting Comparison Analyses
■ Step 2: Pivoting on Change Call
■ Step 3: Comparison Ranking
■ Step 4: Annotating Probe Sets
■ Step 5: Saving a Probe List
Figure 15.40
DMT_2_Tutorial database, 36 comparison analyses of tissue T1 and T2 replicates
Table 15.1
Comparison analyses of T1 and T2 replicate data (generated in Affymetrix® Microarray Suite)
T1 Replicate   Comparison Analyses, by T2 Replicate Analysis
Analyses       T2_r1          T2_r2          T2_r3          T2_r4          T2_r5          T2_r6
T1_r1          T1_r1 v T2_r1  T1_r1 v T2_r2  T1_r1 v T2_r3  T1_r1 v T2_r4  T1_r1 v T2_r5  T1_r1 v T2_r6
T1_r2          T1_r2 v T2_r1  T1_r2 v T2_r2  T1_r2 v T2_r3  T1_r2 v T2_r4  T1_r2 v T2_r5  T1_r2 v T2_r6
T1_r3          T1_r3 v T2_r1  T1_r3 v T2_r2  T1_r3 v T2_r3  T1_r3 v T2_r4  T1_r3 v T2_r5  T1_r3 v T2_r6
T1_r4          T1_r4 v T2_r1  T1_r4 v T2_r2  T1_r4 v T2_r3  T1_r4 v T2_r4  T1_r4 v T2_r5  T1_r4 v T2_r6
T1_r5          T1_r5 v T2_r1  T1_r5 v T2_r2  T1_r5 v T2_r3  T1_r5 v T2_r4  T1_r5 v T2_r5  T1_r5 v T2_r6
T1_r6          T1_r6 v T2_r1  T1_r6 v T2_r2  T1_r6 v T2_r3  T1_r6 v T2_r4  T1_r6 v T2_r5  T1_r6 v T2_r6
Step 1: Clearing the Filter Grid & Selecting Comparison Analyses
1.
To clear the filter grid, right-click the grid and select Clear Query from
the shortcut menu that appears.
2.
In the data tree, select all of the comparison analyses for the T1 and T2
replicate data (36 total) (Figure 15.41). (Press and hold the CTRL key
while you click the analyses.)
Figure 15.41
Data tree, comparison analyses selected
Step 2: Pivoting on Change Call
1.
Click the Options toolbar button.
⇒ The Data Mining Options dialog box appears (Figure 15.42).
2.
Click the Pivot tab.
⇒ The absolute and relative expression data available for the pivot
table are displayed (Figure 15.42).
Figure 15.42
Data Mining Options dialog box, Pivot tab
3.
From the list of Relative Expression Data for the Statistical
Algorithm, select Change.
4.
Click OK to close the Data Mining Options dialog box.
5.
Click the Pivot toolbar button.
⇒ The pivot table displays the change call for each probe set in the
selected analyses (Figure 15.43).
Figure 15.43
Pivot table displaying change calls
Step 3: Comparison Ranking
1.
Select Analyze → Analysis Function from the menu bar.
⇒ The Analysis Function dialog box appears (Figure 15.44).
Figure 15.44
Analysis Function dialog box
2.
Enter Rank T1vsT2 in the Column Name box.
3.
Select the Count & Percentage analysis option, choose the I (increase)
change call option and click Next.
⇒ The column selection dialog box displays the pivot table columns
available for the Count & Percentage analysis (Figure 15.45).
Figure 15.45
Analysis Function dialog box
4.
Select all of the columns (comparison analyses) and click Finish.
⇒ The new pivot table columns: Rank T1vsT2-Count and Rank
T1vsT2-Percent are generated (Figure 15.46).
Figure 15.46
Pivot table
5.
Right-click the Rank T1vsT2-Count column header and select Sort
Descending from the shortcut menu that appears.
⇒ The probe sets with the highest count and percentage are arranged,
or ranked, at the top of the pivot table (Figure 15.46). Those with the
lowest count and percentage (least consistent data) are located at the
bottom of the table.
Step 4: Annotating Probe Sets
1.
Select the pivot table rows with Rank T1vsT2-Percent = 100%.
100% concordance is very high stringency or confidence. You can
select a lower percentage, depending on your requirements.
2.
Right-click a highlighted row and select Annotate Probes from the
shortcut menu that appears.
⇒ The Annotate dialog box appears (Figure 15.47).
Figure 15.47
Annotate dialog box
3.
Enter T1vT2: Increase with 100% concordance in the Annotation box.
4.
Enter or select Tutorial in the Annotation Type box.
5.
Click OK.
Step 5: Saving a Probe List
In the pivot table, select the probe sets with Rank T1vsT2-Percent = 100%
and save them as a probe list. (See lesson 1, step 6.)
Lesson 5 Summary
The comparison ranking method uses the count and percentage operator to
rank the increase or decrease change calls of comparison analyses between
two groups of replicate samples. The method enables you to assess the
consistency or concordance of change calls between the two groups.
In this lesson we identified the genes that show a concordant increase
change call across the T1 versus T2 comparison analyses. We annotated
these genes and saved them as a probe list.
Suggested Exercise
Perform a comparison ranking using count and percentage analysis on tissue
T1 and T2 Decrease and Marginal Decrease change calls.
Lesson 6: Self Organizing
Map (SOM) Cluster
Analysis
Cluster analysis groups probe sets with similar gene expression patterns. For
example, cluster analysis can help identify transcripts that are increased after
a treatment or over a period of time. Clustering can be applied to any
numeric output; however, the SOM algorithm is optimized for expression
signals, and the algorithm defaults are set accordingly.¹
This lesson demonstrates how to:
■ Compute the average signal values of tissue T1, T2 and T3
■ Apply SOM cluster analysis to the average signal values of T1, T2 and T3
(see Appendix D for more information about the SOM algorithm)
■ Save a cluster result as a probe list
Lesson 6 includes:
■ Step 1: Clearing the Filter Grid & Selecting Analyses
■ Step 2: Pivoting on Signal
■ Step 3: Computing Average Signal
■ Step 4: SOM Cluster Analysis
■ Step 5: Saving & Annotating a Probe List
1. Tamayo, P., Slonim, D., Mesirov, J., Zhu, Q., Kitareewan, S., Dmitrovsky, E., Lander, E.S.,
and Golub, T.R. (1999) Interpreting patterns of gene expression with self-organizing maps: Methods
and application to hematopoietic differentiation. Proc. Natl. Acad. Sci. USA 96:2907-2912.
Step 1: Clearing the Filter Grid & Selecting Analyses
1.
To clear the filter grid, right-click the grid and select Clear Query from
the shortcut menu that appears.
2.
In the data tree, select the absolute analysis replicates for tissue T1, T2
and T3 (18 analyses).
3.
Click the Options toolbar button.
⇒ The Data Mining Options dialog box appears (Figure 15.48).
4.
Click the Pivot tab.
⇒ The absolute and relative expression data available for the pivot
table are displayed (Figure 15.48).
Figure 15.48
Data Mining Options dialog box, Pivot tab
5.
From the list of Absolute Expression Data for the Statistical
Algorithm, select Signal. Verify that all other options are cleared.
6.
Click OK to close the Data Mining Options dialog box.
Step 2: Pivoting on Signal
1.
To pivot the data, click the Pivot toolbar button.
⇒ The pivot table displays the signal for each probe set in the selected
analyses (Figure 15.49).
Figure 15.49
Pivot table
Step 3: Computing Average Signal
1.
Select Analyze → Analysis Function from the menu bar.
⇒ The Analysis Function dialog box appears (Figure 15.50).
Figure 15.50
Analysis Function dialog box
2.
Enter T1 in the Column Name box.
3.
Select the Average operator, then click Next.
⇒ The column selection dialog box appears (Figure 15.51).
Figure 15.51
Analysis Function dialog box
4.
Select the six replicate T1 Signal columns (absolute analyses) in the
Analysis Function dialog box, then click Finish.
⇒ The new column T1-Average is added to the pivot table
(Figure 15.52).
Figure 15.52
Pivot table
5.
Repeat items 1 through 4 in Step 3 for the replicate T2 signal columns
(enter T2 in the Column Name box of the Analysis Function dialog
box) to generate the T2-Average column in the pivot table.
6.
Repeat items 1 through 4 in Step 3 for the replicate T3 signal columns
(enter T3 in the Column Name box of the Analysis Function dialog
box) to generate the T3-Average column in the pivot table.
Step 4: SOM Cluster Analysis
1.
Select Analyze → SOM Clustering from the menu bar.
⇒ The Select Columns for Clustering dialog box appears (Figure 15.53).
Figure 15.53
Select Columns for Clustering dialog box
2.
Select the T1-Average, T2-Average and T3-Average columns, then click
OK.
⇒ The SOM Clustering dialog box appears (Figure 15.54).
Figure 15.54
SOM Clustering dialog box
The SOM Clustering dialog box contains two sections: SOM Filtering (top)
and Parameters (bottom). The filter and parameter settings significantly
affect the analysis. Click Defaults to reset the algorithm parameters to the
default settings.
SOM Filtering
There are three types of SOM filters: thresholds, row variation and row
normalization. When you cluster on signal, the default values are appropriate
for most data sets, but the optimum values may differ depending on the
data set and the type of information you want to extract from your data.
Thresholds
The threshold settings define the minimum and maximum values for the data
set (signal in this example). Use them to limit the influence of outliers in
the data.
Figure 15.55 shows three expression profiles. In the raw data, the two outliers
prevent effective normalization. Normalization is much more effective after
filtering. However, filtering removes information from the data set, so filter
as little as possible to obtain the optimum results.
Figure 15.55
Example expression profiles showing effects of normalization and filtering
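One way to apply minimum and maximum thresholds is to clamp each value into the allowed range. This is a sketch under an assumption: the text does not say whether DMT clamps or discards out-of-range values, and the floor of 20 and ceiling of 20000 are illustrative numbers, not DMT defaults. The function name is hypothetical.

```python
def apply_thresholds(row, floor, ceiling):
    """Clamp each value of an expression profile into [floor, ceiling] (assumed behavior)."""
    if floor > ceiling:
        raise ValueError("floor must not exceed ceiling")
    return [min(max(value, floor), ceiling) for value in row]


profile = [5, 250, 90000]  # hypothetical signal values with two outliers
print(apply_thresholds(profile, floor=20, ceiling=20000))
```

Clamping keeps every probe set in the data set while limiting how far an outlier can stretch its profile, which matters for the normalization step discussed below.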
Row Variation
When comparing expression patterns between different biological
conditions, most genes do not change expression level significantly and are
uninformative. Uninformative genes left in the data set effectively form a
single, large cluster that can make it harder to cluster the expression
patterns that do change.
The row variation filters define the genes that are considered changed. The
Max/Min setting defines the minimum expression ratio value (maximum
expression level/minimum expression level) a probe set must have across all
experiments to be included in the cluster analysis.
The Max/Min setting is very useful at moderate to high expression levels,
but is subject to noise at low expression levels. For example, in Figure 15.56,
the max/min ratio (b) for the bottom profile is much higher than the max/min
ratio (a) for the top profile. We need a second parameter to filter for changes
at low expression levels.
Figure 15.56
Max/Min
The Max-Min setting can distinguish between changes at low and high
expression levels, and can be used to filter out noise. As Figure 15.57
shows, the max-min can be set to eliminate the bottom profile if desired.
Take particular care to filter out noisy, low-level expression, because
normalization amplifies this noise (Figure 15.57).
To see how many probe sets remain after filtering, click Compute in
the SOM Clustering dialog box.
The Max-Min value sets a threshold for the absolute numerical difference
between the largest and smallest clustering values. For example, if the
Max/Min is set at three, changes of 30/10 and 300/100 both pass the ratio
filter. Setting the Max-Min to 100 additionally eliminates the 30/10 change
(a difference of only 20), which is inherently noisy, while keeping the
300/100 change (a difference of 200).
Figure 15.57
Max-min filter setting
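The two row variation filters can be combined into one test per probe set. A minimal sketch, assuming a profile passes only when both its max/min ratio and its max-min difference reach the settings, with inclusive comparisons so that a ratio of exactly three passes a Max/Min of three, as in the 30/10 and 300/100 example. The function name is hypothetical.

```python
def passes_row_variation(row, min_fold, min_delta):
    """Keep a profile only if max/min >= min_fold and max - min >= min_delta."""
    high, low = max(row), min(row)
    if low <= 0:
        raise ValueError("apply a positive minimum threshold before this filter")
    return high / low >= min_fold and high - low >= min_delta


for profile in ([10, 30], [100, 300], [100, 150]):
    print(profile, passes_row_variation(profile, min_fold=3, min_delta=100))
```

The 10 to 30 profile has the required threefold ratio but fails the Max-Min test, the 100 to 300 profile passes both, and the 100 to 150 profile fails the ratio test.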
Row Normalization
Normalization is a technique that helps to answer the question: Which probe
sets have similar expression patterns? For example, we may be interested in
finding all genes that increase expression under certain experimental
conditions regardless of the actual level of increase.
Asking the question in this way allows us to find small as well as large
changes in expression levels. It also makes the technique less sensitive to
experimental variation in the absolute expression levels, such as difficulty
normalizing between controls and experiments.
Using filtering and normalization usually reduces the number of clusters in a
data set because we ignore the actual expression levels. For example,
Figure 15.58 shows three different transcripts that are expressed at very
different levels. Without normalization, the three probe sets may group into
three different clusters according to their absolute expression levels.
However, after normalization, it is clear that their relative expression levels
are the same and they should cluster together.
Figure 15.58
Raw and normalized expression profiles
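Row normalization can be illustrated by rescaling each profile to mean 0 and standard deviation 1. The text does not say which normalization DMT applies; this z-score scaling is an assumption, chosen because it removes the absolute level while keeping the shape of the profile. The three hypothetical profiles below differ only in level and become identical after normalization.

```python
from statistics import mean, stdev


def normalize_row(row):
    """Rescale a profile to mean 0 and standard deviation 1 (assumed normalization)."""
    m, s = mean(row), stdev(row)
    if s == 0:
        return [0.0 for _ in row]  # a flat profile has no shape to compare
    return [(value - m) / s for value in row]


profiles = {
    "low": [10, 20, 30],
    "mid": [100, 200, 300],
    "high": [1000, 2000, 3000],
}
for name, row in profiles.items():
    print(name, normalize_row(row))
```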
Order of Filtering
The Up and Down buttons in the SOM Clustering dialog box can be used to
change the order in which the filters are applied to the data. Be very careful
if you intend to change the default order. In particular, the row normalization
filter changes the data values that later filters operate on, so moving it can
significantly change which probe sets pass the other filters.
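A small example shows why the order matters. Assuming the normalization is the mean-0, standard-deviation-1 scaling (an assumption; the text does not name DMT's method), a Max-Min filter of 100 keeps the profile 100, 200, 300 when the filter runs first, but rejects the same profile after normalization, because the normalized values span only 2. The helper names are hypothetical.

```python
from statistics import mean, stdev


def normalize_row(row):
    """Rescale a profile to mean 0 and standard deviation 1 (assumed normalization)."""
    m, s = mean(row), stdev(row)
    return [(value - m) / s for value in row] if s else [0.0] * len(row)


def passes_max_minus_min(row, min_delta):
    """Max-Min filter: keep the profile if its range reaches min_delta."""
    return max(row) - min(row) >= min_delta


profile = [100, 200, 300]
filter_first = passes_max_minus_min(profile, 100)  # range 200: kept
normalize_first = passes_max_minus_min(normalize_row(profile), 100)  # range 2: dropped
print(filter_first, normalize_first)
```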
Parameters
The rows and columns parameters should be considered carefully. These
settings specify the grid of centroids or nodes that is applied to the data. In
general, try to keep the grid square or almost square to ensure good coverage
of the whole data set. For the same reason, it also helps, but is not
imperative, to make one of the settings (row or column) an odd number.
The exact number of clusters (rows x columns) you select depends on the
size and complexity of the data set, and the type of analysis you want to
perform. The default of 18 clusters (6 rows x 3 columns) is a good place to
start. If you find that the analysis generates empty clusters (clusters with
zero members), reduce the number of clusters.
Reducing the cluster number increases the variability of the shapes of curves
grouped together in a cluster. This is indicated by an increase in the distance
between the two (blue) error bars. Generating a small number of clusters
summarizes the data, but may obscure rare, interesting patterns.
Increasing the cluster number reduces the variability of the patterns that are
grouped together. This is indicated by a decrease in the distance between the
two (blue) error bars. If the cluster number is increased too much, the
algorithm generates clusters with no members and many clusters will look
the same. The optimum number of clusters for a particular data set displays
the narrowest possible error bars with the lowest number of empty clusters.
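You can spot empty clusters by counting how many profiles fall nearest to each node. The sketch below assumes Euclidean distance and fixed, already-trained node positions. The profiles and node coordinates are hypothetical.

```python
import math


def assign_to_nodes(points, nodes):
    """Return the member count of each node, assigning every point to its
    nearest node by Euclidean distance."""
    counts = [0] * len(nodes)
    for p in points:
        nearest = min(range(len(nodes)), key=lambda j: math.dist(p, nodes[j]))
        counts[nearest] += 1
    return counts


# Hypothetical normalized profiles (T1, T2, T3) and a 2 x 2 grid of nodes
# (4 clusters) in which two nodes attract no points.
points = [(-1.2, 0.0, 1.2), (-1.1, -0.2, 1.3), (1.2, 0.0, -1.2), (1.3, -0.1, -1.2)]
nodes = [(-1.0, 0.0, 1.0), (1.0, 0.0, -1.0), (0.0, 1.2, -1.2), (5.0, 5.0, 5.0)]

counts = assign_to_nodes(points, nodes)
empty = counts.count(0)
print("members per cluster:", counts, "empty clusters:", empty)
```

When `empty` is greater than zero, the text above suggests reducing the number of rows x columns and rerunning.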
The remaining parameters affect the functioning of the cluster program and
are intended for expert users. Do not change these parameters unless you
understand their function and the effect of changing them on your data.
3.
In the SOM Clustering dialog box, click the Add> button for the
Thresholds, Row Variation and Row Normalization variables.
⇒ The default values for these algorithm variables are displayed in the
box in the upper right corner (Figure 15.59).
Figure 15.59
SOM Clustering dialog box
4.
Enter 3 rows and 2 columns in the Parameters section.
Other values may be more appropriate for your data. These are
suggested values for this data set.
5.
Click Run.
⇒ The SOM algorithm generates 6 clusters (3 rows x 2 columns
specified in the SOM parameters) (Figure 15.60).
Your results may not be identical to the clusters in Figure 15.60. Run-to-run
cluster results may vary slightly because the nodes are randomly
initialized (see Appendix D).
Figure 15.60
SOM cluster results
To expand the cluster graph view, right-click the graph and select
Expand Graph from the shortcut menu that appears.
Figure 15.60 shows the results of clustering the mean signal values for three
tissues (six replicates each). T1 is the first point, T2 is the second point and
T3 is the third point.
Step 5: Saving & Annotating a Probe List
After genes of interest are identified, we can save them as a probe list.
1.
Click Cluster 3.
⇒ The probe sets in cluster 3 are displayed at the right in the Probes
box (Figure 15.60).
2.
Enter the name Cluster 3 in the Probe List Name box.
3.
Click Save Selected.
⇒ A probe list is generated that includes the probe sets in cluster 3 and
the probe list is displayed in the data tree.
4.
Annotate the probe list members (see Lesson 3, Step 5).
In future sessions the annotations may be queried and sorted (see
Chapter 8, Annotations).
Lesson 6 Summary
SOM cluster analysis identifies gene expression patterns in the data. The
threshold and row variation filters help focus the analysis on probe sets
whose expression changes meaningfully across the samples, rather than on
noisy, low-level values.
The cluster results display patterns of gene expression rather than absolute
expression levels because the Row Normalization filter normalizes the
signal data for each probe set to a mean of zero and a variance of one
(see Appendix D).
Adjusting the number of nodes or centroids (rows x columns) affects the
cluster number and the variability of expression patterns grouped together in
a cluster. The optimum number of clusters for a particular data set displays
the narrowest possible error bars with the lowest number of empty or similar
clusters.
We computed the average signal for the T1, T2 and T3 replicates. We
applied SOM cluster analysis to the average signal values. The SOM cluster
results organize the expression data into groups of genes with similar
expression patterns.
Appendix A
Filter Grid
This Appendix explains the column headings in the filter grid for both
GeneChip® data mode and spot data mode.
GeneChip Data Mode
The filter grid includes expression metrics generated by the Statistical
Expression algorithm (in Microarray Suite 5.0) or the Empirical Expression
algorithm (in Microarray Suite 4.0 or lower).
Statistical Expression Algorithm
Probe Set Name
Identifier for the probe set on a GeneChip® probe array
Signal
A measure of the abundance of a transcript.
Detection
The call that indicates whether the transcript was present
(P), absent (A), marginal (M), or no call (NC)
Detection p-value
p-value that indicates the significance of the detection call.
Stat Pairs
The number of probe pairs for a particular probe set on the
array.
Stat Pairs Used
= Pairs - Masked probe pairs - Saturated MM probe pairs
This is the number of pairs used by the Statistical
Expression algorithm to make the detection call in an
absolute analysis.
Signal Log Ratio
The change in expression level for a transcript between a
baseline and an experiment array. This change is expressed
as the log2 ratio.
Signal Log Ratio Low
The lower limit of the log ratio within a 95% confidence
interval.
Signal Log Ratio High
The upper limit of the log ratio within a 95% confidence
interval.
Change
The call that indicates the change in the transcript level
between a baseline and an experiment array.
Change p-value
p-value that indicates the significance of the change call.
Stat Common Pairs
The intersection of the probe pairs from the baseline and
experiment that are used by the Statistical Expression
algorithm to make the change call in a comparison analysis.
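The Signal Log Ratio is a log2 ratio, so raising 2 to its power recovers the experiment-to-baseline ratio. The same conversion applies to the Low and High confidence limits. The signed fold-change helper below expresses decreases as negative numbers. That sign convention is an assumption for illustration, and the values are hypothetical.

```python
def ratio_from_slr(slr):
    """Experiment/baseline ratio from a Signal Log Ratio (log2 scale)."""
    return 2.0 ** slr


def signed_fold_change(slr):
    """Assumed convention: ratio for increases, -(1/ratio) for decreases."""
    return 2.0 ** slr if slr >= 0 else -(2.0 ** -slr)


# Hypothetical Signal Log Ratio with its 95% confidence limits.
slr, slr_low, slr_high = 1.0, 0.6, 1.4
print(ratio_from_slr(slr), ratio_from_slr(slr_low), ratio_from_slr(slr_high))
print(signed_fold_change(-2.0))
```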
Empirical Expression Algorithm
Probe Set Name
Identifier for the probe set on a GeneChip® probe array
Positive
The number of probe pairs scored positive. A probe pair is
positive if:
PM - MM > Statistical Difference Threshold (SDT)
and
PM/MM > Statistical Ratio Threshold (SRT)
where PM = perfect match intensity and MM = mismatch
intensity.
The SDT is a function of the noise (Q) and is calculated as:
SDT = Q * SDTmultiplier
The SDTmultiplier and the SRT are user-modifiable
parameters (see Affymetrix® Microarray Suite User’s Guide).
The SDTmultiplier is set at 2.0 for the standard staining
protocol or 4.0 for the antibody amplification protocol.
(Refer to the Affymetrix Expression Analysis Technical
Manual.)
The default SRT value is 1.5. Note: Increasing the SDTmultiplier
and SRT increases analysis stringency; reducing these
thresholds decreases analysis stringency.
Negative
The number of probe pairs scored negative. A probe pair is
negative if:
MM - PM > SDT
and
MM/PM > SRT
Pairs
Number of probe pairs for a particular probe set on a
GeneChip® probe array.
Pairs Used
Number of probe pairs per probe set used in the analysis
(Empirical Expression algorithm). This may be the total
number of probes per probe set on the probe array or the
number of probe pairs in a pre-designated subset (for
example, probe pairs specified by a probe mask file and/or a
masked image). Pairs Used = total probe pairs per probe set –
(probe pairs masked in a mask file) – (probe pairs masked in
the image).
Pairs in Average
The number of probe pairs in a trimmed probe set that
excludes probe pairs with extremely intense or weak signal
from the analysis. If 8 or fewer probe pairs are used, Pairs in
Avg = Pairs Used (or the number of probe pairs per probe set
minus any that are masked).
Superscoring is performed if more than 8 probe pairs are
used. Superscoring is a process that excludes probe pairs
from calculation of the Avg Diff and Log Avg Ratio if they
are outside a given intensity range.
Microarray Suite software calculates the mean and standard
deviation of the intensity differences (PM – MM) for an
entire probe set (excluding the highest and lowest values).
Those values outside of a set number of standard deviations
(STP) are not included in the calculation of the Avg Diff or
Log Avg Ratio. The STP is a user-modifiable parameter with
a default value = 3.
Log Avg
(Log Avg Ratio)
Describes the hybridization performance of a probe set and is
determined by calculating the ratio of the PM/MM intensities
for each probe pair in a probe set, taking the logs of the
resulting values and averaging them for the probe set:
Log Avg = 10 x [Σ log (PM/MM)] / Pairs in Avg
Note: Log Avg = 0 indicates random cross hybridization. The
higher the Log Avg, the greater the confidence that the gene
transcript is present.
Average Difference
Serves as a relative indicator of the level of expression of a
transcript. It is used to determine the change in the
hybridization intensity of a given probe set between two
different experiments.
The Avg Diff is calculated by taking the difference between
the PM and MM of every probe pair (excluding the probe
pairs where PM – MM is outside the STP standard deviation
of the mean of PM-MM) in a probe set and averaging the
differences for the entire probe set.
Avg Diff = Σ (PM – MM) / Pairs in Avg
Note: Avg Diff cannot be used to compare the hybridization
intensity levels of two different probe sets on the same array.
Absolute Call
Each transcript in an absolute analysis has three possible
Absolute Call outcomes: Present (P), Absent (A), or
Marginal (M). The absolute call is derived from the
Pos/Neg, Positive Fraction and Log Avg absolute call
metrics. Each absolute call metric is weighted and entered
into a decision matrix to determine the status of the
transcript.
Increase (Inc)
Number of probe pairs that increased. A probe pair is
considered to increase if the intensity difference between the
PM and MM probe cells in the experimental sample is
significantly higher than in the baseline sample.
Two criteria must be met for a probe pair to show a
significant increase:
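$$(PM - MM)_{\mathrm{exp}} - (PM - MM)_{\mathrm{baseline}} > \text{Change Threshold (CT)}$$
and
$$\frac{(PM - MM)_{\mathrm{exp}} - (PM - MM)_{\mathrm{baseline}}}{\max\left[Q/2,\ \min\left(|PM - MM|_{\mathrm{exp}},\ |PM - MM|_{\mathrm{baseline}}\right)\right]} > \frac{\text{Percent Change Threshold}}{100}$$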
Affymetrix Microarray Suite computes the Change
Threshold (CT) using the Statistical Difference Threshold of
the experiment and baseline data. Alternatively, you can
specify a value for the CT multiplier, which is multiplied by
the noise of the baseline or experiment data (whichever is
greater) to define CT.
Percent Change Threshold is a user-specified value
(default = 80).
Decrease (Dec)
Number of probe pairs that decreased. A probe pair is
considered to decrease if the intensity difference between the
PM and MM probe cells in the experimental sample is
significantly lower than in the baseline sample. Two criteria
must be met for a probe pair to show a significant decrease:
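$$(PM - MM)_{\mathrm{baseline}} - (PM - MM)_{\mathrm{exp}} > \text{Change Threshold (CT)}$$
and
$$\frac{(PM - MM)_{\mathrm{baseline}} - (PM - MM)_{\mathrm{exp}}}{\max\left[Q/2,\ \min\left(|PM - MM|_{\mathrm{exp}},\ |PM - MM|_{\mathrm{baseline}}\right)\right]} > \frac{\text{Percent Change Threshold}}{100}$$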
Increase Ratio
For each transcript: # Increased probe pairs / # probe Pairs Used
Decrease Ratio
For each transcript: # Decreased probe pairs / # probe Pairs Used
Positive Change
# Positive probe pairs (experiment) – # Positive probe pairs (baseline)
Negative Change
# Negative probe pairs (experiment) – # Negative probe pairs (baseline)
Difference Positive – Difference Negative Ratio (DPos – DNeg Ratio)
(Positive Change – Negative Change) / # probe Pairs Used
Log Avg Ratio Change
The difference between the Log Avg Ratio of the baseline
and experimental probe array data (in a comparison analysis)
for each transcript. The Log Avg Ratios are recomputed for
each probe set based on probe pairs used in both the
baseline and experimental probe arrays (the recomputed
values are not displayed by DMT).
Log Avg Ratio Change = Log Avg (exp) – Log Avg (base)
The DPos – DNeg Ratio and Log Avg Ratio Change are
usually positive when a transcript changes from a very low to
a relatively high expression level and are typically negative
when the expression level changes from a high to a very low
or undetectable level. Both metrics may have values close to
zero if the transcript is present in both the baseline and
experimental samples despite an increase or decrease in the
level of the transcript.
Difference Call
Each transcript in a comparison analysis has five possible
Difference Call outcomes:
(1) Increase (I), (2) Marginally Increase (MI),
(3) Decrease (D), (4) Marginally Decrease (MD), and (5) No
Change (NC).
The difference call is derived from the comparison metrics:
Max [Increase/Total, Decrease/Total], Increase/Decrease
Ratio, Log Average Ratio Change, and Dpos – Dneg Ratio.
Each comparison metric is weighted and entered into a
decision matrix to determine the status of the transcript (see
Affymetrix Microarray Suite User’s Guide).
Average Difference Change
Avg Diff (experiment) – Avg Diff (baseline)
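Fold Change
Indicates the relative change in the expression levels
between the experiment and baseline targets. The Fold
Change (FC) for a transcript is a positive number when the
expression level in the experiment increases compared to the
baseline and is a negative number when the expression level
in the experiment declines. The Fold Change is calculated as:
$$FC = \frac{\text{Avg Diff Change}}{\max\left[\min\left(\text{Avg Diff}_{\mathrm{exp}},\ \text{Avg Diff}_{\mathrm{base}}\right),\ Q_M \times Q_C\right]} + \begin{cases} +1 & \text{if } \text{Avg Diff}_{\mathrm{exp}} > \text{Avg Diff}_{\mathrm{base}} \\ -1 & \text{if } \text{Avg Diff}_{\mathrm{exp}} < \text{Avg Diff}_{\mathrm{base}} \end{cases}$$
where $Q_C = \max(Q_{\mathrm{exp}}, Q_{\mathrm{base}})$, and $Q_M = 2.1$ for a 50 µm feature or $Q_M = 2.8$ for a 24 µm or
smaller feature.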
Microarray Suite recomputes the normalized or scaled Avg
Diff values in both the experimental and baseline data sets to
include only probe pairs used in both the baseline and
experiment arrays. This recomputation is not done in the
DMT calculation. If the noise (Q) of the experiment or
baseline is greater than the Avg Diff of the transcript
(baseline or experiment data), the Fold Change is calculated
over the noise and is an approximation (a tilde character (~)
precedes the approximated Fold Change value in the *.chp
file).
Sort Score
A ranking based on the Fold Change and the Avg Diff
Change. The higher the Fold Change and the Avg Diff
Change, the higher the Sort Score.
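The per-probe-pair rules and probe-set formulas above can be combined into a short sketch. It is an illustration under stated assumptions, not Microarray Suite or DMT code:

- Intensities are positive.
- Probe pairs are already trimmed to Pairs in Avg.
- Log Avg uses base-10 logs.
- Noise (Q) is positive.
- Q, the Change Threshold (CT), and all numbers are hypothetical inputs.

The Fold Change formula does not define the case where the two Avg Diff values are equal; in that case this sketch adds 0.

```python
import math


def sdt_from_noise(q, sdt_multiplier=2.0):
    """SDT = Q x SDTmultiplier (2.0 for the standard staining protocol)."""
    return q * sdt_multiplier


def score_pair(pm, mm, sdt, srt=1.5):
    """+1 if the probe pair is positive, -1 if negative, 0 otherwise."""
    if pm - mm > sdt and pm / mm > srt:
        return 1
    if mm - pm > sdt and mm / pm > srt:
        return -1
    return 0


def avg_diff(pairs):
    """Avg Diff = sum(PM - MM) / Pairs in Avg (pairs assumed already trimmed)."""
    return sum(pm - mm for pm, mm in pairs) / len(pairs)


def log_avg(pairs):
    """Log Avg = 10 x sum(log(PM/MM)) / Pairs in Avg (base-10 logs assumed)."""
    return 10 * sum(math.log10(pm / mm) for pm, mm in pairs) / len(pairs)


def pair_change(diff_exp, diff_base, ct, q, pct_threshold=80.0):
    """+1 increase, -1 decrease, 0 otherwise; diff_* are PM - MM values."""
    denom = max(q / 2, min(abs(diff_exp), abs(diff_base)))
    delta = diff_exp - diff_base
    if delta > ct and delta / denom > pct_threshold / 100:
        return 1
    if -delta > ct and -delta / denom > pct_threshold / 100:
        return -1
    return 0


def fold_change(ad_exp, ad_base, q_exp, q_base, qm=2.8):
    """Fold Change; qm = 2.8 for a 24 um or smaller feature, 2.1 for 50 um."""
    qc = max(q_exp, q_base)
    denom = max(min(ad_exp, ad_base), qm * qc)
    if ad_exp > ad_base:
        sign = 1
    elif ad_exp < ad_base:
        sign = -1
    else:
        sign = 0  # equal values: not defined by the formula; this sketch adds 0
    return (ad_exp - ad_base) / denom + sign


# Hypothetical (PM, MM) intensities for one probe set.
pairs = [(1200.0, 300.0), (900.0, 400.0), (500.0, 450.0), (300.0, 700.0)]
sdt = sdt_from_noise(q=50.0)  # 100.0
scores = [score_pair(pm, mm, sdt) for pm, mm in pairs]
print(scores, avg_diff(pairs), round(log_avg(pairs), 2))
print(pair_change(600.0, 100.0, ct=50.0, q=20.0))
print(fold_change(1000.0, 200.0, q_exp=20.0, q_base=30.0))
```

Note how the Fold Change denominator falls back to the noise term when both Avg Diff values are small. This corresponds to the approximated (~) values described above.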
Spot Data Mode
Spot
Identifier for the spotted probe.
Intensity
Background-subtracted spot intensity.
Standard deviation
(SD)
Standard deviation of the intensity signal.
Pixels
Number of pixels in the image data file (*.tif) used
to calculate the intensity for a spot.
Background
Background calculated for a spot.
Background SD
Standard deviation of the spot background.
Ratio
If channel 1 > channel 2, ratio = –(channel 1/channel 2);
otherwise, ratio = channel 2/channel 1.
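The spot-mode Ratio rule above can be written as a small function that follows the filter-grid definition. It assumes both channel intensities are positive. Background-subtracted values can be zero or negative, and this sketch does not handle that case.

```python
def spot_ratio(channel1, channel2):
    """Signed two-channel ratio: negative when channel 1 is brighter,
    positive (channel 2 / channel 1) otherwise."""
    if channel1 > channel2:
        return -channel1 / channel2
    return channel2 / channel1


print(spot_ratio(400.0, 100.0), spot_ratio(100.0, 300.0))
```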
Appendix B
Working with Windows & Tables
The windows and tables found in Affymetrix® Data Mining Tool can
be modified to suit the individual needs of the user or data. This
appendix explains the options available.
Query Windowpanes
Expanding a Windowpane
1.
To expand the results pane (or the graph pane), right-click the pane and
select Expand Results (or Expand Graph) from the shortcut menu.
Alternatively, select View → Expand Results (or View → Expand
Graph) from the main menu.
⇒ The results (or graph) pane is enlarged and the graph (or results)
pane is hidden.
2.
Repeat step 1 to return the pane to its original size.
Resizing a Windowpane
You may resize a windowpane using the click-and-drag method to move a
border.
1.
Place the mouse pointer over a border so that it changes from a single
arrow to a double arrow.
2.
Use the click-and-drag method to move a border in the horizontal or
vertical direction and resize the windowpane.
Clearing the Results or Graph Pane
Right-click the results (or graph) pane and select Clear Results (or
Clear Graph) from the shortcut menu.
Alternatively, select Edit → Clear Results (or Edit → Clear Graphs)
from the main menu.
⇒ All graphs are cleared and the graph pane is hidden.
Tables
Selecting the Entire Table
Click the upper left corner of a table.
⇒ All rows in the table are selected (Figure B.1).
Figure B.1
Query table
Selecting Rows
To select adjacent rows, press and hold the SHIFT key while you click
the first and last row in the selection.
To select non-adjacent rows, press and hold the CTRL key while you
click the rows.
Resizing Columns
1.
Place the mouse pointer over the border of a column header.
⇒ The mouse pointer changes from a single arrow to a double
arrow.
Figure B.2
Query table, adjusting width of Analysis Name column
2.
Use the click-and-drag method to adjust the column width.
Hiding Columns
1.
Right-click an analysis column header in the experiment information
table or a metric column header in the query or pivot table (Figure B.3).
Figure B.3
Pivot table, shortcut menu of column commands
2.
Select Hide Column from the shortcut menu.
⇒ The selected column is hidden.
3.
To show hidden columns, right-click an analysis column header in the
experiment information table or a metric column header in the query or
pivot table, then select Show All Columns from the shortcut menu
(Figure B.3).
Reordering Columns
1.
Click a column header and use the click-and-drag method to move the
column.
2.
In the pivot table, click a column header (analysis) and use the click-and-drag method to reorder the column and its subordinate results
columns.
3.
In the pivot table, click a subordinate column header (results), then use
the click-and-drag method to reorder the results column within the
analysis.
DMT retains the column order of the results table in saved queries
and as a user preference. If you open a previously saved query, DMT:
1) displays the results tables using the saved column order, and 2)
unhides any hidden columns of results data.
If you create a new query, DMT applies the column settings used in
the previous session.
Appendix C
Query Table Data
After running a query, results are presented in the Query Table. This
appendix defines the column headings and explains the information
found there, for both GeneChip Data Mode and Spot Data Mode.
GeneChip® Data Mode
Statistical Expression Algorithm Metrics
Probe Set Name
Identifier for the probe set on a GeneChip® probe array
Signal
A measure of the abundance of a transcript.
Detection
The call that indicates if the transcript was detected (P) or
undetected (A)
Detection p-value
p-value that indicates the significance of the detection call.
Stat Pairs
The number of probe pairs for a particular probe set on the
array.
Stat Pairs Used
= Pairs - Masked probe pairs - Saturated MM probe pairs
This is the number of pairs used by the Statistical
Expression algorithm to make the detection call in an
absolute analysis.
Signal Log Ratio
The change in expression level for a transcript between a
baseline and an experiment array. This change is expressed
as the log2 ratio.
Signal Log Ratio Low
The lower limit of the log ratio within a 95% confidence
interval.
Signal Log Ratio High
The upper limit of the log ratio within a 95% confidence
interval.
Change
The call that indicates the change in the transcript level
between a baseline and an experiment array.
Change p-value
p-value that indicates the significance of the change call.
Stat Common Pairs
The intersection of the probe pairs from the baseline and
experiment that are used by the Statistical Expression
algorithm to make the change call in a comparison
analysis.
Empirical Expression Algorithm Metrics
Analysis Name
Name of the experiment entered during experiment set up.
Probe Set Name
Identifier for the probe set on the array.
Positive
Number of probe pairs scored positive. A probe pair is called
positive if the intensity of the PM probe cell is significantly
greater than that of the corresponding MM probe cell.
To evaluate intensity, the Empirical Expression algorithm
calculates the ratio and difference associated with each probe
pair and compares these values to the Statistical Difference
Threshold (SDT) and the Statistical Ratio Threshold (SRT).
A probe pair is positive if: PM - MM > SDT and PM/MM >
SRT.
Negative
Number of probe pairs scored negative. A probe pair is called
negative if the intensity of the MM probe cell is significantly
greater than that of the corresponding PM probe cell.
To evaluate intensity, the expression algorithm calculates the
ratio and difference associated with each probe pair and
compares these values to the Statistical Difference Threshold
(SDT) and the Statistical Ratio Threshold (SRT). A probe
pair is negative if: MM - PM > SDT and MM/PM > SRT.
(See Affymetrix® Microarray Suite User’s Guide for further
information.)
Pairs
Number of probe pairs for a particular probe set on the probe
array.
Pairs Used
Number of probe pairs per probe set used in the analysis.
This may be the total number of probes per probe set on the
probe array or the number of probe pairs in a pre-designated
subset (for example, probe pairs specified by a probe mask
file or a masked image).
Pairs Used = total probe pairs per probe set - (probe pairs
masked in a mask file) - (probe pairs masked in the image)
Pairs in Avg
The number of probe pairs in a trimmed probe set that
excludes probe pairs with extremely intense or weak signal
from the analysis. If 8 or fewer probe pairs are used, Pairs in
Avg = Pairs Used (or the number of probe pairs per probe set
minus any that are masked). Superscoring is performed if
more than 8 probe pairs are used.
Superscoring is a process that excludes probe pairs from
calculation of the Avg Diff and Log Avg Ratio if they are
outside a given intensity range. Microarray Suite calculates
the mean and standard deviation of the intensity differences
(PM – MM) for an entire probe set (excluding the highest
and lowest values). Those values outside of a set number of
standard deviations (STP) are not included in the calculation
of the Avg Diff or Log Avg Ratio. The STP is a user-modifiable
parameter with a default value = 3.
Pos Fraction
Number of positive probe pairs divided by the number of
probe pairs used.
Log Avg
Describes the hybridization performance of a probe set and is
determined by calculating the ratio of the PM/MM intensities
for each probe pair in a probe set, taking the logs of the
resulting values, and averaging them for the probe set:
Log Avg = 10 x [Σ log (PM/MM)] / Pairs in Avg
Pos/Neg
Ratio of positive probe pairs to negative probe pairs in a
probe set (# Positive probe pairs/# Negative probe pairs).
Avg Diff
This parameter serves as a relative indicator of the level of
expression of a transcript. It is used to determine the
change in the hybridization intensity of a given probe set
between two different experiments.
Note: Avg Diff cannot be used to compare the hybridization
intensity levels of two different probe sets on the same array.
Avg Diff is calculated by taking the difference between the
PM and MM of every probe pair (excluding the probe pairs
where PM – MM is outside the STP standard deviation of the
mean of PM-MM) in a probe set and averaging the
differences for the entire probe set:
Avg Diff = Σ (PM – MM) / Pairs in Avg
Norm Avg Diff
Avg Diff x Normalization Factor
DMT computes the normalization factor (NF) using all probe
sets on the array in an analysis, then applies any specified
filters. All intensities in an analysis are multiplied by the NF.
Absolute Call
Each transcript in an absolute analysis has three possible
Absolute Call outcomes: Present (P), Absent (A), or
Marginal (M). The absolute call is derived from the Pos/Neg,
Positive Fraction, and Log Avg absolute call metrics. Each
absolute call metric is weighted and entered into a decision
matrix to determine the status of the transcript. (See
Affymetrix® Microarray Suite User’s Guide for further
information.)
Increase (Inc)
Number of probe pairs that increased. A probe pair is
considered to increase if the intensity difference between the
PM and MM probe cells in the experimental sample is
significantly higher than in the baseline sample.
Two criteria must be met for a probe pair to show a
significant increase:
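$$(PM - MM)_{\mathrm{exp}} - (PM - MM)_{\mathrm{baseline}} > \text{Change Threshold (CT)}$$
and
$$\frac{(PM - MM)_{\mathrm{exp}} - (PM - MM)_{\mathrm{baseline}}}{\max\left[Q/2,\ \min\left(|PM - MM|_{\mathrm{exp}},\ |PM - MM|_{\mathrm{baseline}}\right)\right]} > \frac{\text{Percent Change Threshold}}{100}$$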
Affymetrix Microarray Suite computes the Change
Threshold (CT) using the Statistical Difference Threshold of
the experiment and baseline data. Alternatively, you can
specify a value for the CT multiplier, which is multiplied by
the noise of the baseline or experiment data (whichever is
greater) to define CT.
Percent Change Threshold is a user-specified value
(default = 80).
Decrease (Dec)
A probe pair is considered to decrease if the intensity
difference between the PM and MM probe cells in the
experimental sample is significantly lower than in the
baseline sample. Two criteria must be met for a probe pair to
show a significant decrease:
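$$(PM - MM)_{\mathrm{baseline}} - (PM - MM)_{\mathrm{exp}} > \text{Change Threshold (CT)}$$
and
$$\frac{(PM - MM)_{\mathrm{baseline}} - (PM - MM)_{\mathrm{exp}}}{\max\left[Q/2,\ \min\left(|PM - MM|_{\mathrm{exp}},\ |PM - MM|_{\mathrm{baseline}}\right)\right]} > \frac{\text{Percent Change Threshold}}{100}$$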
(See Affymetrix® Microarray Suite User’s Guide for further
information.)
Inc Ratio
For each transcript:
# Increased probe pairs / # probe Pairs Used
Dec Ratio
For each transcript:
# Decreased probe pairs / # probe Pairs Used
Pos Change
# Positive probe pairs (experiment) – # Positive probe pairs (baseline)
Neg Change
# Negative probe pairs (experiment) – # Negative probe pairs (baseline)
Inc/Dec
For each transcript:
# Increased probe pairs / # Decreased probe pairs
Dpos-Dneg Ratio
(Positive Change – Negative Change) / # probe Pairs Used
Log Avg Ratio Change
The difference between the Log Avg Ratio of the baseline
and experimental probe array data (in a comparison analysis)
for each transcript. The Log Avg Ratios are recomputed for
each probe set based on probe pairs used in both the
baseline and experimental probe arrays (the recomputed
values are not displayed by DMT).
The Dpos – Dneg Ratio and Log Avg Ratio Change are
usually positive when a transcript changes from a very low to
a relatively high expression level and are typically negative
when the expression level changes from a high to a very low
or undetectable level. Both metrics may have values close to
zero if the transcript is present in both the baseline and
experimental samples despite an increase or decrease in the
level of the transcript.
Log Avg Ratio Change = Log Avg (exp) – Log Avg (base)
Diff Call
Each transcript in a comparison analysis has five possible
Difference Call outcomes: (1) Increase (I), (2) Marginally
Increase (MI), (3) Decrease (D), (4) Marginally Decrease
(MD), and (5) No Change (NC).
The difference call is derived from the comparison metrics:
Max [Increase/Total, Decrease/Total], Increase/Decrease
Ratio, Log Average Ratio Change, and Dpos – Dneg Ratio.
Each comparison metric is weighted and entered into a
decision matrix to determine the status of the transcript.
(See Affymetrix® Microarray Suite User’s Guide for further
information.)
Avg Diff Change
Serves as a relative indicator of the change in the expression
level of a transcript. It is used to determine the change in the
hybridization intensity of a given probe set between two
different experiments.
The Avg Diff Change is calculated as:
Avg Diff Change = Avg Diff (exp) – Avg Diff (baseline)
B=A (Baseline = Absent)
An asterisk (*) in this column indicates the transcript is
called absent (A) in the baseline.
Fold Change
The Fold Change indicates the relative change in the
expression levels between the experiment and baseline
targets. The Fold Change for a transcript is a positive number
when the expression level in the experiment increases
compared to the baseline and is a negative number when the
expression level in the experiment declines. The Fold
Change (FC) is calculated as:
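$$FC = \frac{\text{Avg Diff Change}}{\max\left[\min\left(\text{Avg Diff}_{\mathrm{exp}},\ \text{Avg Diff}_{\mathrm{base}}\right),\ Q_M \times Q_C\right]} + \begin{cases} +1 & \text{if } \text{Avg Diff}_{\mathrm{exp}} > \text{Avg Diff}_{\mathrm{base}} \\ -1 & \text{if } \text{Avg Diff}_{\mathrm{exp}} < \text{Avg Diff}_{\mathrm{base}} \end{cases}$$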
(See Affymetrix® Microarray Suite User’s Guide for further
information.)
Approx
If the noise (Q) of the experiment or baseline array is greater
than the Avg Diff of the transcript (the baseline or
experimental data), the Fold Change is calculated over the
noise and is an approximation (a tilde character (~) precedes
the approximated Fold Change value in the *.chp file).
Sort Score
The Sort Score is a ranking based on the Fold Change and the
Avg Diff Change. The higher the Fold Change and the Avg
Diff Change, the higher the Sort Score.
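Several query-table metrics are simple ratios of probe-pair counts. The sketch below computes them from hypothetical counts for one transcript. Pos Fraction and Pos/Neg are absolute-analysis metrics, and here they are computed for the experiment array. The sketch uses a single Pairs Used value for both arrays. It does not guard against division by zero, which occurs, for example, when no probe pairs are negative or decreased.

```python
def count_metrics(pos_exp, neg_exp, pos_base, neg_base, inc, dec, pairs_used):
    """Count-based query-table metrics for one transcript (hypothetical counts)."""
    pos_change = pos_exp - pos_base
    neg_change = neg_exp - neg_base
    return {
        "Pos Fraction": pos_exp / pairs_used,
        "Pos/Neg": pos_exp / neg_exp,
        "Inc Ratio": inc / pairs_used,
        "Dec Ratio": dec / pairs_used,
        "Inc/Dec": inc / dec,
        "Pos Change": pos_change,
        "Neg Change": neg_change,
        "Dpos-Dneg Ratio": (pos_change - neg_change) / pairs_used,
    }


m = count_metrics(pos_exp=15, neg_exp=3, pos_base=5, neg_base=6,
                  inc=12, dec=2, pairs_used=20)
for name, value in m.items():
    print(name, value)
```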
Spot Data Mode
Analysis name
Name of the experiment associated with an intensity
results file (*.spt)
Spot
Identifier for the spotted probe
Intensity
Background-subtracted intensity for the selected spot
Standard Deviation
Standard deviation of the spot intensity
Pixels
Number of pixels in the image data file (*.tif) used to
calculate the channel signal (intensity)
Background
Background calculated for the spot
Background SD
Standard deviation of the background for a spot
Ratio
Ratio of channel 1/channel 2 intensity data
Appendix D
DMT Algorithms
This appendix provides further information on the three algorithms
used in Affymetrix® Data Mining Tool: the SOM clustering algorithm,
Correlation Coefficient clustering algorithm and the Matrix
algorithm.
The SOM Algorithm
The self-organizing map (SOM) algorithm applies cluster analysis to
GeneChip® metric data to help identify gene expression patterns.
The algorithm considers the expression levels of n probe sets (or the
intensities of n probes) in k experiments as n points in k-dimensional space.
Initially, the algorithm randomly places a grid of nodes or centroids onto the
k-dimensional space. The rows and columns of nodes determine the number
of clusters identified by the algorithm (rows x columns = number of
clusters). Figure D.1 shows a 3 x 2 arrangement of nodes that can identify six
clusters of gene expression patterns.
Figure D.1
3 x 2 arrangement of nodes
The user specifies the rows and columns of nodes as well as the initial
placement of the nodes (initialization). The random vectors method
randomly places the nodes in k-dimensional space. The random datapoints
method places the nodes on randomly selected data points.
Next, the algorithm iteratively adjusts the node positions toward clusters of
points. At each iteration, it selects a data point (P) and moves (updates) the
node closest to P (the target node, Np) toward P. (The data points are
randomly ordered for selection and recycled as needed through the
iterations.) Other nodes may also move toward P, depending on their
distance from Np, the type of neighborhood selected (discussed in the
following section) and time (iteration).
The algorithm updates a node using the formula:
$$f_{i+1}(N) = f_i(N) + \alpha\big(d(N, N_P),\, i\big)\,\big(P - f_i(N)\big)$$
where:
N = the node being updated
P = the data point being considered
f_i(N) = the position of N at iteration i
Np = the target node (the node closest to P)
α = the learning rate, which determines how far N moves toward P at
iteration i; it is a function of:
• d(N, Np), the distance between N (the node being considered) and Np
(the target node) in the two-dimensional grid of nodes
• i, the iteration
P – f_i(N) = the difference between P and the position of N in k-dimensional space
T = maximum number of iterations
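The training loop described above can be sketched compactly. This is not DMT's implementation. It makes these assumptions:

- Distance in k-dimensional space is Euclidean.
- d(N, Np) is measured on the node grid.
- The neighborhood is a bubble neighborhood.
- Nodes are placed with the random datapoints initialization.
- The learning rate and neighborhood size follow the exponential decay schedules given later in this appendix.

The initial neighborhood size of 5 matches the stated default; the other parameter values are hypothetical.

```python
import math
import random


def train_som(data, rows, cols, epochs=20, lr_init=0.1, lr_final=0.005,
              radius_init=5.0, radius_final=0.5, seed=0):
    """Minimal SOM sketch with a bubble neighborhood.

    Requires at least rows x cols data points, because the random
    datapoints initialization places each node on a different data point.
    """
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    nodes = [list(p) for p in rng.sample(data, len(grid))]
    total = epochs * len(data)  # T = epochs x number of probe sets
    order = []
    for i in range(total):
        if not order:
            # Randomly reorder the data points and recycle them as needed.
            order = list(range(len(data)))
            rng.shuffle(order)
        p = data[order.pop()]
        lr = lr_init * (lr_final / lr_init) ** (i / total)
        radius = radius_init * (radius_final / radius_init) ** (i / total)
        # Target node Np: the node closest to P in k-dimensional space.
        target = min(range(len(nodes)), key=lambda j: math.dist(p, nodes[j]))
        for j, cell in enumerate(grid):
            # Bubble neighborhood: d(N, Np) is measured on the node grid, and
            # every node within the radius moves by the same fraction lr.
            if math.dist(cell, grid[target]) <= radius:
                nodes[j] = [f + lr * (x - f) for f, x in zip(nodes[j], p)]
    return nodes


def assign(data, nodes):
    """Index of the nearest node (cluster) for each data point."""
    return [min(range(len(nodes)), key=lambda j: math.dist(p, nodes[j]))
            for p in data]


# Hypothetical row-normalized profiles (T1, T2, T3): two rising, two falling.
profiles = [(-1.2, 0.0, 1.2), (-1.0, -0.2, 1.2), (1.2, 0.0, -1.2), (1.1, 0.1, -1.2)]
nodes = train_som(profiles, rows=1, cols=2)
print(assign(profiles, nodes))
```

With a neighborhood radius below 1, only the target node moves at each iteration, so the sketch reduces to simple competitive learning. With the default radius of 5 on a small grid, every node moves early in training, and the neighborhood then shrinks.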
Neighborhood
The neighborhood describes an area around the target node, Np. At each
iteration, Np and all nodes in the neighborhood move toward P, the point
being considered. There are two types of neighborhoods: bubble or
Gaussian.
Bubble Neighborhood
The bubble neighborhood specifies a radial distance from Np (default = 5).
At each iteration, all nodes in the bubble neighborhood are updated by the
same amount. Nodes outside the bubble neighborhood are not updated.
Neighborhood size is a user-modifiable parameter that specifies the width of
the bubble neighborhood. Neighborhood size decays with time (iterations)
as described by the following equation:
$$\text{neighborhood size}_i = \text{neighborhood size}_{\text{initial}} \cdot \left(\frac{\text{neighborhood size}_{\text{final}}}{\text{neighborhood size}_{\text{initial}}}\right)^{i/T}$$
where:
neighborhood size_i = width of the bubble neighborhood at iteration i
neighborhood size_initial = initial width of the bubble neighborhood at the
first iteration
neighborhood size_final = final width of the bubble neighborhood at the last
iteration
T = the maximum number of iterations (iterations = epochs x number of
probe sets (or probes))
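The decay of the bubble width and the selection of bubble members can be sketched in Python as follows. This is an illustration under assumptions, not DMT's implementation: nodes are numbered row by row on the rows x columns grid, grid distance is taken as Euclidean, and the final width of 1.0 is a hypothetical value (only the default initial radius of 5 comes from the text). The function names are hypothetical.

```python
import math

def neighborhood_size(i, T, size_initial, size_final):
    """Width at iteration i: size_initial * (size_final / size_initial) ** (i / T)."""
    return size_initial * (size_final / size_initial) ** (i / T)

def grid_distance(a, b, cols):
    """Distance between node indices a and b on the node grid. Nodes are numbered
    row by row; Euclidean grid distance is an assumption."""
    ra, ca = divmod(a, cols)
    rb, cb = divmod(b, cols)
    return math.hypot(ra - rb, ca - cb)

def bubble_members(target, rows, cols, radius):
    """Indices of all nodes within `radius` of the target node; each gets the same update."""
    return [j for j in range(rows * cols) if grid_distance(j, target, cols) <= radius]

T = 1000
start = neighborhood_size(0, T, 5.0, 1.0)   # width at the first iteration
end = neighborhood_size(T, T, 5.0, 1.0)     # width at the last iteration
members = bubble_members(0, 3, 2, 1.0)      # 3 x 2 grid, radius 1 around node 0
```

Because the width shrinks geometrically, early iterations move large groups of nodes together and late iterations refine individual nodes.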
Gaussian Neighborhood
In the Gaussian neighborhood, all nodes are updated at each iteration. The
distance a node moves is a function of its distance from the target node (Np).
The greater the distance between N and Np, the less N moves toward P.
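The manual does not give the Gaussian weighting function. A common choice in self-organizing map implementations, assumed here and not confirmed for DMT, scales the learning rate by exp(-d² / (2σ²)), where d is the grid distance from N to N_P and σ is the current neighborhood width. The Python sketch below uses a hypothetical function name.

```python
import math

def gaussian_alpha(d, learning_rate, width):
    """Assumed Gaussian neighborhood: the target node (d = 0) gets the full learning
    rate, and nodes farther away on the grid get exponentially smaller steps."""
    return learning_rate * math.exp(-(d * d) / (2.0 * width * width))

steps = [gaussian_alpha(d, 0.5, 2.0) for d in (0.0, 1.0, 2.0, 3.0)]
```

Unlike the bubble, no node's step is exactly zero, which matches the statement that all nodes are updated at each iteration.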
Learning Rate
The learning rate is a user-modifiable parameter that controls how far a node
moves toward P at each iteration: in the update formula, the node moves the
fraction α of the way from its current position to P. The learning rate decays
with time (iteration) as described by the following equation:
$$\text{learning rate}_i = \text{learning rate}_{\text{initial}} \cdot \left(\frac{\text{learning rate}_{\text{final}}}{\text{learning rate}_{\text{initial}}}\right)^{i/T}$$
where:
learning rate_i = learning rate at iteration i
learning rate_initial = initial learning rate at the first iteration
learning rate_final = final value of the learning rate at the last iteration
T = the maximum number of iterations (iterations = epochs x number of
probe sets (or probes))
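Combining the learning-rate decay with the bubble neighborhood gives one possible α(d(N, N_P), i) for the update formula. This Python sketch is illustrative only: the function names are hypothetical, and the parameter values (10 epochs, 500 probe sets, an initial learning rate of 0.5 decaying to 0.005, a width decaying from 5 to 1) are example values, not documented DMT defaults.

```python
def total_iterations(epochs, n_probe_sets):
    """T = epochs x number of probe sets (or probes)."""
    return epochs * n_probe_sets

def learning_rate(i, T, lr_initial, lr_final):
    """lr_i = lr_initial * (lr_final / lr_initial) ** (i / T)."""
    return lr_initial * (lr_final / lr_initial) ** (i / T)

def alpha_bubble(d, i, T, lr_initial, lr_final, size_initial, size_final):
    """alpha(d, i) for a bubble neighborhood: the current learning rate if the node
    lies within the current neighborhood width of N_P, otherwise 0 (no update)."""
    width = size_initial * (size_final / size_initial) ** (i / T)
    return learning_rate(i, T, lr_initial, lr_final) if d <= width else 0.0

T = total_iterations(10, 500)                  # 10 epochs over 500 probe sets
lr_mid = learning_rate(T // 2, T, 0.5, 0.005)  # halfway: geometric mean of 0.5 and 0.005
```

Halfway through training the learning rate equals the geometric mean of its initial and final values, a direct consequence of the i/T exponent.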
The Correlation Coefficient Clustering Algorithm
The correlation coefficient (ρ) between two probe set expression patterns (X
and Y) across all analyses is determined by the equation:
$$\rho(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\frac{1}{N}\sum_{i=1}^{N}(X_i - X_m)(Y_i - Y_m)}{\sigma_X \sigma_Y}$$
where:
Cov (X,Y) is the covariance between X and Y
σX = standard deviation of X, σY = standard deviation of Y
Xm = mean Avg Diff (or normalized Avg Diff) for probe set X across all
analyses
Ym = mean Avg Diff (or normalized Avg Diff) for probe set Y across all
analyses
Xi = Avg Diff (or normalized Avg Diff) for probe set X from analysis i
Yi = Avg Diff (or normalized Avg Diff) for probe set Y from analysis i
N = number of analyses
The covariance increases when (Xi - Xm) and (Yi - Ym) are both positive or
both negative, and decreases when (Xi - Xm) is positive and (Yi - Ym) is
negative, or vice versa. Each analysis is weighted equally. The order in
which the analyses are used to compute ρ(X,Y) does not matter, because
each value is compared to the mean and the resulting terms are summed.
The value of ρ(X,Y) ranges from -1 to +1: ρ = 1 indicates perfect positive
correlation, ρ = 0 indicates no correlation, and ρ = -1 indicates perfect
inverse correlation. The correlation coefficient clustering algorithm is
designed to identify positive correlations, not inverse (negative) correlations.
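The correlation coefficient defined above can be computed as in the following Python sketch. It is an illustration, not DMT's code: the function name and the Avg Diff values are hypothetical, and the standard deviations are assumed to be population values (dividing by N), which matches the 1/N covariance and keeps ρ within -1 to +1.

```python
import math

def correlation(x, y):
    """rho(X, Y) = Cov(X, Y) / (sigma_X * sigma_Y), using the 1/N covariance and
    population standard deviations (also 1/N)."""
    n = len(x)
    if n != len(y) or n == 0:
        raise ValueError("X and Y need the same, non-zero number of analyses")
    xm = sum(x) / n
    ym = sum(y) / n
    cov = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / n
    sx = math.sqrt(sum((xi - xm) ** 2 for xi in x) / n)
    sy = math.sqrt(sum((yi - ym) ** 2 for yi in y) / n)
    if sx == 0 or sy == 0:
        raise ValueError("a constant expression pattern has no defined correlation")
    return cov / (sx * sy)

# Hypothetical Avg Diff values for two probe sets across four analyses.
x = [120.0, 340.0, 560.0, 780.0]
y = [200.0, 410.0, 650.0, 830.0]
rho = correlation(x, y)
```

Reversing (or otherwise reordering) both series together leaves ρ unchanged, and a perfectly mirrored pattern gives ρ = -1, which the clustering algorithm does not treat as similar.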
The Matrix Algorithm
The matrix algorithm determines the significance of the overlap between two
lists (the overlap being the probe sets or spotted probes common to both lists). The matrix displays
the overlap significance value and highlights values that exceed the overlap
significance threshold (pink) or the non-overlap significance threshold
(yellow).
The algorithm uses the binomial distribution equation to calculate the
probability (p-value) that the observed overlap between two lists would
occur by random chance.
The matrix algorithm computes this p-value for each cell of the matrix (each
pair of lists):
$$P = \frac{n!}{(n - x)!\,x!} \cdot w^{x} \cdot (1 - w)^{n - x}$$
where:
P = probability that the observed overlap is due to random chance
n = number of probe sets (or spotted probes) in the first list (rows)
x = observed number of probe sets (or spotted probes) that overlap in
the two lists
w = frequency of probe sets (or spotted probes) in the second list,
w = b/t, where b = number of probe sets (or spotted probes) in the second
list and t = total population
The p-value may range from zero to one. A score of one indicates there is no
relationship (overlap) between the lists and that the observed distribution of
probe sets or spotted probes in the two lists is expected to occur due to
random chance. A score close to zero indicates the observed overlap
between the two lists is not expected to occur due to random chance.
The algorithm computes the overlap significance score from the p-value:
Overlap significance = -log P
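The p-value and overlap significance can be computed as in the Python sketch below, under two assumptions: the logarithm in -log P is taken as base 10 (the manual does not state the base), and the calculation is done in log space with `math.lgamma`, because for lists of thousands of probe sets n! and w^x overflow or underflow ordinary floating point. The function names and example list sizes are hypothetical.

```python
import math

def overlap_significance(n, x, b, t):
    """-log10 of P = n! / ((n - x)! x!) * w**x * (1 - w)**(n - x), with w = b / t."""
    if not 0 <= x <= n:
        raise ValueError("x must lie between 0 and n")
    w = b / t
    if not 0.0 < w < 1.0:
        raise ValueError("w = b / t must lie strictly between 0 and 1")
    log_p = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
             + x * math.log(w) + (n - x) * math.log(1.0 - w))
    return -log_p / math.log(10.0)

def overlap_p_value(n, x, b, t):
    """Recover P from its significance score."""
    return 10.0 ** -overlap_significance(n, x, b, t)

# 10 probe sets in the first list, 5 of them also in a second list that holds
# 50 of a total population of 100 (w = 0.5): P = 252 / 1024.
p = overlap_p_value(10, 5, 50, 100)
score = overlap_significance(10, 5, 50, 100)
```

Note that the formula gives the probability of exactly x overlapping members; a lower P, and so a higher score, means the observed overlap is less likely under random chance.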
As a result, higher values in the matrix indicate greater overlap or
non-overlap significance between two lists. The algorithm uses the following
rules to distinguish between these two possibilities, where wn is the
overlap expected by random chance:
■
If x > wn, there is greater overlap than expected by random chance.
■
If x < wn, there is less overlap than expected by random chance.
The matrix displays the overlap significance value. It highlights values that
exceed the overlap significance threshold (pink) or values that exceed the
non-overlap significance threshold value (yellow).
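The highlighting rule can be sketched as a small Python function. This is an illustration under assumptions: the function name and the threshold values in the usage are hypothetical, "exceed" is read as strictly greater than, and a cell whose overlap exactly equals the expected value wn is left unhighlighted.

```python
def highlight(n, x, b, t, significance, overlap_threshold, non_overlap_threshold):
    """Return 'pink' for significant overlap, 'yellow' for significant non-overlap,
    or None. wn = n * b / t is the overlap expected by random chance."""
    expected = n * b / t
    if x > expected and significance > overlap_threshold:
        return "pink"
    if x < expected and significance > non_overlap_threshold:
        return "yellow"
    return None

# 100 probe sets in the first list, second list of 200 in a population of 1000:
# the expected overlap wn is 20.
print(highlight(100, 30, 200, 1000, 5.0, 2.0, 2.0))
```

The same significance score can therefore mean either significant overlap or significant non-overlap; only the comparison of x with wn tells them apart.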
Appendix E
Toolbars & Shortcuts
You can display toolbars with text labels. To display the toolbar button
labels, select View → Toolbar → Text labels from the menu bar.
DMT Main Toolbar
Figure E.1 DMT main toolbar
Table E.1 DMT main toolbar button descriptions
Menu Command | Function
Data → Open | Displays the Open dialog box. Select and open a previously saved query from the Open dialog box.
Data → Save | Displays the Save dialog box so that a query may be named and saved.
Data → Print | Displays the Print dialog box.
Help → Contents | Displays DMT help contents.
Session Toolbar
Figure E.2 Session toolbar
Table E.2 Session Toolbar Button Descriptions
Menu Command | Function
Edit → Copy Cells | Copies the cells selected in a results table to the system clipboard.
Edit → Find in Results | Displays the Find Probe Set dialog box, which enables a text search of probe set or spotted probe names in the query or pivot table. The search includes probe or probe set descriptions when these are displayed in the pivot table.
Query → Experiment Information | Displays experiment information for the analyses selected in the data tree.
Query → Run Query | Executes the query for the analyses selected in the data tree and populates the query table.
Query → Pivot | Executes the query for the analyses selected in the data tree and populates the pivot table.
Annotations → Annotate Probe Sets | Displays the Annotate dialog box.
Annotations → Query Annotations | Runs the annotation query.
Graph → Scatter | Displays the Scatter Graph dialog box.
Graph → Fold Change | Displays the Fold Change Graph dialog box.
Graph → Series | Displays the Series Graph dialog box.
Graph → Histogram | Displays the Histogram dialog box.
Graph → Lasso Points | Changes the cursor to a drawing tool that can circle (lasso) points in the scatter graph.
View → Options | Displays the Data Mining Options dialog box.
View → Analysis Filters | Displays the Filter Analysis dialog box.
View → Data Tree | Displays or hides the data tree in the Query window.
View → Results Filters | Displays or hides the filter grid in the DMT session.
Shortcut Descriptions
Menu Bar Command | Shortcut Key
Data → Save | CTRL + S
Data → Print | CTRL + P
Edit → Copy Cells | CTRL + C
Edit → Copy Graph | CTRL + G
Edit → Find in Results | CTRL + F
Query → Run Query | CTRL + Q
Annotations → Annotate Probes | CTRL + A
Index
A
comparison
Affymetrix
LIMS 26
technical support 6
algorithm
correlation coefficient
clustering 240–250, 353
matrix 354
SOM clustering 231–240,
349–352
annotation query results 120
tables 110
analysis filters 30, 31
filter analysis dialog box
components 70
specifying 70–75
annotations
deleting 122
loading 116–117
query results 120
querying 118–120
array sets
creating 149–151
defined 149
deleting 153
editing 152
viewing 151
virtual set option 150
attributes
finding 76
average 210–211
C
cluster analysis 239
correlation coefficient
algorithm 45, 240–250
SOM algorithm 43, 231–240
cluster correlation coefficient
threshold 245, 246, 247
DMT session
components 61
data tree 27
filter grid 27
graph pane 27
results pane 27
correlation coefficient 353
correlation coefficient
algorithm 45, 240–250, 353
filtering 240–241, 244–245
modifiable parameters 247
saving seeds 249
seed 245, 248–250
seeding 240–241
configuration for Oracle 17
creating an Oracle 17
selecting 70–78
GeneChip® data 28
spot data 29
copying
alias
analyses
DMT display
query operators 66
count & percentage analysis
218–220
documentation
conventions used 4
E
epochs 239
experiment information table
33–35, 93–95
GeneChip® data 94
D
data tree 27
database 25
publish 25
registering 51
selecting 53
unregistering 52
database connections 50
LIMS 50
MicroDB 50
default directory 54–55
deleting
annotations 122
array set 153
probe list 144
query 90
descriptions 107
DMT
installing 9–16
main toolbar 359
main toolbar buttons 359
main window 50
overview 25
session toolbar 360
shortcuts 361
starting 49
exporting data
query results table 111
expression call search strings
68
F
filter
adding probes 109
filter (correlation coefficient
algorithm) 247
filter analysis dialog box
attribute section 72
components 73
find function 76
sample section 77
components 77
filter grid 27
adding probe lists 64
components 62
editing limits 65
entering limits 63
expression call search strings
68
GeneChip® data 63, 323
query builder 68
query operators 66
sort order 65
specifying 61–65
spot data 330
magnifying 176–178
plotting 173–176
selecting points 181
viewing probe information 179
histogram 42, 193–202
adding landmarks 196–
197
display options 199–202
magnifying 198
plotting 193–194
viewing bar information
195
printing 204
scatter graph 39, 158–171
display options 168–171
locating probes 163
magnifying 161–163
plotting 158–161
selecting points 166–168
viewing probe information 164–165
series graph 40, 185–193
display options 191–193
locating probes 188
plotting 186–187
viewing probe information 189
filtering (correlation coefficient
algorithm) 240–241
filters
analysis 30, 31
results 30, 31
selecting from sample
section 78
find function 106
find probe 106
templates and attributes 76
fold change 212–213
fold change graph 40, 171–184
display options 183–184
locating probes 178
magnifying 176–178
plotting 173–176
selecting points 181
viewing probe information
179
G
GeneChip® data
analysis filters 70–75
DMT display 28
experiment information table
94
expression call search strings
68
filter grid explained 323
new query 59
query table data explained
339
graph pane 27
enlarging 202
graphs
clearing 204
color options 202–204
copying 204
fold change 40
fold change graph 171–184
display options 183–184
locating probes 178
L
lasso points
fold change graph 181
learning rate 240, 352
limits
editing 65
entering 63
LIMS 26
database connections 50
lists
query operators 67
M
Mann-Whitney test 216–217
matrix 46, 225
matrix analysis 223–228
overlap significance 223
population size 224
running 225
median 210–211
MicroDB 26
database connections 50
N
neighborhood 240, 351
new query
H
GeneChip® data 59
spot data 59
histogram 42, 193–202
adding landmarks 196–197
display options 199–202
graph options 199
magnifying 198
plotting 193–194
viewing bar information 195
nodes 239, 349
normalization 79–81
after query or pivot 81
before query or pivot 80
intensity threshold 83
low and high percentage 83
options 81–83
target intensity 83
I
initialization (SOM algorithm)
239
installing DMT 9–16
inter-quartile range 210–211
O
open a saved query 89
Oracle
alias configuration 17
creating an alias 17
overlap significance 223
probe sets
p-value 354
P
pivot
normalizing data
after pivot 81
before pivot 80
pivot operation 101
pivot table 37, 97–105
annotating probes 108
including probe descriptions
102
options 104
selecting and viewing data
99–100
sorting columns 103
publish database 25
publishing applications 26
p-value overlap significance 354
results pane 27
results tables
annotating probes 108
copying 110
experiment information 93–95
exporting 111
find 106
gene information 107
pivot 97–105
query 96
text search 106
viewing descriptions 107
Q
query
annotations 118–120
builder 68–69
building 30, 59
deleting 90
normalizing data
after query 81
before query 80
open previously saved 89
operators 66
results 30
running 30, 79
save as 88
saving 87
selecting analyses 70–78
statements 66, 67
population size 224
printing
graphs 204
probe descriptions
pivot table 102
probe lists
adding to a filter 64
adding to results filter 137
combining 132, 142
creating 127
creating from
cluster analysis results
237
clustering results 251
query or pivot table 128
results filter 132
search array descriptions
131
deleting 144
editing members 140
highlighting members 138
input file 135
loading 134–137
managing 140
specifying input file 135
specifying members 134
using 137
viewing and editing 140
statistical analyses 43
tables
experiment information
33–35
pivot 37
query 35
viewing 33
maximum in seeding
(correlation coefficient
algorithm) 247
minimum in seeding
(correlation coefficient
algorithm) 247
query table 35, 96
GeneChip® data 339
sort order 65
spot data 346
R
ranges
query operators 66
registering a database 51
results
analyzing 43
cluster analysis 43
matrix analysis 46
pane
clearing 112
expanding 111
row normalization (SOM
algorithm) 239
row variation filters (SOM
algorithm) 239
S
saving
cluster member probe list 251
probe list 237
query 87
scatter graph 39, 158–171
absolute call combinations
170
display options 168–171
locating probes 163
magnifying 161–163
plotting 158–161
point options 169
selecting points 166–168
viewing probe information
164–165
search strings 68
absolute call 68
difference call 68
seed (correlation coefficient
algorithm) 245
importing 248
maximum probe sets 247
minimum probe sets 247
saving 249
threshold 247
status bar
viewing 53
T
tables
annotating probes 108
copying 110
experiment information 93–
95
exporting data 111
find 106
gene information 107
modification options 333
modifying layout 334
pivot 97–105
sorting columns 103
query 96
text search 106
viewing descriptions 107
working with 334
seed (SOM algorithm) 239
seeding (correlation coefficient
algorithm) 240–241
selecting
analyses 70–78
database 53
series graph 40, 185–193
display options 191–193
formats 192
locating probes 188
plotting 186–187
viewing probe information
189
SOM algorithm 43, 231–240,
349–352
row variation filters 239
threshold filters 238
user-modifiable parameters
239, 351–352
spot data
DMT display 29
new query 59
technical support 5
templates
finding 76
text searches 106
threshold filters (SOM
algorithm) 238
toolbar
buttons 359
DMT session 360
DMT session buttons 360
main 359
spot data mode
filter grid explained 330
query table data explained
346
standard deviation 210–211
starting
DMT 49
statistical analyses
average 210–211
count & percentage 218–220
fold change 212–213
inter-quartile range 210–211
Mann-Whitney test 216–217
median 210–211
standard deviation 210–211
T-Test 214–215
tutorial lessons 255
U
unregistering a database 52
V
viewing descriptions 107
W
windows
main window tasks 50
modification options 333
view status bar 53