Introduction to DNA Microarrays
DNA Microarrays and DNA chips resources on the web
Swiss Institute of Bioinformatics
Institut Suisse de Bioinformatique
LF-2001.11
INTRODUCTION
Microarray analysis is a new technology that allows scientists
to simultaneously detect thousands of genes in a small sample
and to analyze the expression of those genes.
Microarrays are simply ordered sets of DNA molecules of
known sequence. Usually rectangular, they can consist
of a few hundred to hundreds of thousands of sets. Each
individual sequence goes on the array at a precisely defined
location.
Potential application domains
- Identification of complex genetic diseases
- Drug discovery and toxicology studies
- Mutation/polymorphism detection (SNPs)
- Pathogen analysis
- Differing expression of genes over time, between tissues, and between disease states
- Preventive medicine
- Specific genotype (population) targeted drugs
- More targeted drug treatments – AIDS
- Genetic testing and privacy
The technique
Based on already known methods, such as fluorescence and
hybridization. High throughput miniaturized method.
Its main purpose is to compare gene transcription levels in two or
more different kinds of cells.
- Microarrays
- DNA chips
- SAGE
- Beads (liquid chip)
…
The challenge
The big revolution here is in the "micro" term. New slides will
contain a survey of the human genome on a 2 cm² chip! The use
of this large-scale method tends to create phenomenal amounts
of data, which then have to be analyzed, processed and stored.
As the technique is quite new, analyzing the data is still a
problem, and nothing is standardized yet. A few databases and
on-line repositories are coming out, and the future standard will
probably be chosen among them.
This is a job for… Bioinformatics!
General overview
Making the chip
- Experiment design, sequence selection, collection maintenance, PCR, spotting, printing, synthesis
Probe hybridization
- Probe purification, labelling, hybridization, washing
Scanning and image treatment
- Fluorescence correction, find spots, background
Analysing the data
- Filtering, normalisation
- Clustering (hierarchical, centroid, SPC)
Representation, storage
- Graphics, databases, web public resources
[Bracket on slide: wet lab]
THE EXPERIMENT : making the chip
1- Designing the chip : choosing genes of interest for
the experiment and/or selecting the samples
- Selection of sequences that represent the investigated genes.
- Finding sequences, usually in the EST database.
- Problems : sequencing errors, alternative splicing, chimeric
sequences, contamination…
THE EXPERIMENT : making the chip
2- Spotting the sequences on the substrate
- Substrate : usually glass, but also nylon membranes, plastic,
ceramic…
- Sequences : cDNA (500-5000 nucleotides, DNA chips),
oligonucleotides (20~80-mer oligos, oligo chips), genomic DNA
(~50,000 bases)
- Printing methods : microspotting, ink-jetting (for DNA chips) or in situ printing, photolithography (for oligos, Affymetrix method)
THE EXPERIMENT : making the chip
Microspotting and ink-jetting
THE EXPERIMENT : making the chip
The microspotting is done by a robot called an “arrayer”.
THE EXPERIMENT : making the chip
Oligo-spotting (Affymetrix method)
THE EXPERIMENT : hybridization
Sample preparation
- Extracting DNA (for genomic studies) or mRNA (for gene expression studies) from the two or more samples to compare.
- Making cDNAs from the extracts, and labeling them with different fluorochromes to allow direct comparison (Cy-3, Cy-5, DIG…).
- Some techniques use radiolabeling.
THE EXPERIMENT : hybridization
Probes are overlaid on the chip, put in a
hybridization chamber, and then washed.
THE EXPERIMENT : generating data
Chip scanning
- Fluorescence measurements are made with a scanning laser
fluorescence microscope that scans the slide, illuminating
each DNA spot and measuring fluorescence for each dye
separately. It creates one red and one green image.
- The two images are then superimposed to give a visual
representation of the RNA ratio between the two samples.
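The superimposition step can be sketched in a few lines. This is only an illustration under simple assumptions: the two scans are same-sized grayscale images stored as lists of rows with values from 0 to 255, and the function name `superimpose` and the pixel values are made up for the example.

```python
def superimpose(red, green):
    """Combine two same-sized grayscale images (lists of rows, values 0-255)
    into one RGB image: red channel from one dye, green channel from the other."""
    if len(red) != len(green) or any(len(r) != len(g) for r, g in zip(red, green)):
        raise ValueError("images must have the same shape")
    return [[(r, g, 0) for r, g in zip(row_r, row_g)]
            for row_r, row_g in zip(red, green)]

# Hypothetical 2x2 scans, one value per spot.
red_img = [[200, 10], [120, 0]]
green_img = [[10, 200], [120, 0]]
composite = superimpose(red_img, green_img)
```

In the composite, a spot that is bright in only one channel appears red or green, a spot that is bright in both appears yellow, and a spot with no signal stays black.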
THE EXPERIMENT : generating data
1- Samples
2- Extracting mRNA
3- Labeling
4- Hybridizing
5- Scanning
6- Visualizing
Examples of images
Affymetrix chip
Stanford array
THE EXPERIMENT : generating data
Image analysis
- These fluorescence measurements are then used to determine the
ratio, and in turn the relative abundance, of the sequence of each
specific gene in the two mRNA or DNA samples.
- This analysis is performed by software such as “Scanalyze”,
available at http://rana.lbl.gov/EisenSoftware.htm,
or “Spotfinder” from TIGR.
- The files created can then be submitted to further analysis.
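As a rough illustration of how a ratio can be derived from the two channels, the sketch below subtracts a local background from each channel and takes the log2 of the red/green ratio. It is not the Scanalyze or Spotfinder algorithm; the function name, the `floor` clamp and the intensity values are assumptions made for the example.

```python
import math

def log2_ratio(red_fg, red_bg, green_fg, green_bg, floor=1.0):
    """Background-subtract each channel, clamp at `floor` to avoid
    non-positive values, and return log2(red / green)."""
    red = max(red_fg - red_bg, floor)
    green = max(green_fg - green_bg, floor)
    return math.log2(red / green)

# Hypothetical spots: (red foreground, red background, green foreground, green background)
spots = {
    "geneA": (1100, 100, 350, 100),  # 4x more red signal
    "geneB": (300, 100, 900, 100),   # 4x more green signal
    "geneC": (600, 100, 600, 100),   # equal signal
}
ratios = {name: log2_ratio(*values) for name, values in spots.items()}
```

Working on a log scale makes a fourfold increase (+2) and a fourfold decrease (-2) symmetric around zero, which is convenient for the clustering steps that follow.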
THE EXPERIMENT : making sense of the data
Although the visual image of a microarray panel is alluring, its information content, per se, is still not human readable.
How can one visualize, organize and explore the meaning of information consisting of several million measurements of expression of thousands of genes under thousands of conditions?
THE EXPERIMENT : making sense of the data
Data mining depends on the questions which are asked. The
most frequent question is to find sets of genes that have
correlated expression profiles (belonging to the same biological
process and/or co-regulated), or to divide conditions into groups
with similar gene expression profiles (for example, dividing drugs
according to their effect on gene expression).
The method used to answer these questions is called
CLUSTERING.
Clustering data
• Input: N data points, Xi, i=1,2,…,N (the color ratios measured with
Scanalyze, for example) in a D-dimensional space. N and D will be
either genes and conditions for gene clustering, or conditions and
genes for condition clustering.
• Goal: Find “natural” groups or clusters.
• Note: depending on the method, the number of clusters will be fixed
from the beginning (centroid clustering) or determined after the
analysis (hierarchical clustering).
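The two readings of N and D correspond to the same matrix and its transpose. A minimal sketch, with made-up gene names, condition names and log ratios:

```python
# Rows are genes, columns are conditions (log ratios); all values are invented.
genes = ["g1", "g2", "g3"]
conditions = ["t0", "t1"]
matrix = [
    [0.1, 1.2],
    [-0.8, 0.3],
    [2.0, -1.5],
]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

# Gene clustering: N = number of genes, D = number of conditions.
gene_points = matrix
# Condition clustering: N = number of conditions, D = number of genes.
condition_points = transpose(matrix)
```

Any clustering routine that takes a list of N points can then be applied unchanged to either `gene_points` or `condition_points`.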
Clustering data
Before clustering, a few steps to “clean the data” are necessary
(normalization, filtering)
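What "cleaning" means varies between tools. The sketch below shows one plausible version: drop genes with too many missing values or too little variation, then subtract each column's median. The thresholds `min_present` and `min_sd`, the function names and the data are assumptions for illustration, and the code assumes every column keeps at least one measured value.

```python
import statistics

def filter_genes(matrix, min_present=0.8, min_sd=0.5):
    """Keep rows with enough measured values (None = missing) and enough variation."""
    kept = []
    for row in matrix:
        present = [v for v in row if v is not None]
        if len(present) < 2 or len(present) / len(row) < min_present:
            continue
        if statistics.stdev(present) >= min_sd:
            kept.append(row)
    return kept

def median_center_columns(matrix):
    """Subtract each column's median so that every experiment is centered on zero."""
    n_cols = len(matrix[0])
    medians = [statistics.median(row[j] for row in matrix if row[j] is not None)
               for j in range(n_cols)]
    return [[None if v is None else v - medians[j] for j, v in enumerate(row)]
            for row in matrix]

data = [
    [1.0, -1.0, 2.0, 0.0],    # varies enough: kept
    [0.1, 0.0, 0.1, 0.0],     # almost flat: dropped
    [None, None, 1.5, -1.5],  # half of the values missing: dropped
    [3.0, 1.0, 0.0, 2.0],     # varies enough: kept
]
filtered = filter_genes(data)
centered = median_center_columns(filtered)
```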
Clustering methods (examples) :
1- Agglomerative Hierarchical
2- Centroids: K-means or SOM
3- Super-Paramagnetic Clustering
For a good introduction to different clustering techniques, read
the article by Gavin Sherlock, “Analysis of large-scale gene
expression data”, Current Opinion in Immunology 2000, 12:201-205 (pdf)
http://www.isrec.isb-sib.ch/~vpraz/chips/Sherlock.pdf
Agglomerative Hierarchical Clustering
[Figure: five numbered data points and the corresponding dendrogram; axis: distance between joined clusters]
The dendrogram induces a linear ordering of
the data points
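To see how a linear ordering falls out of a dendrogram, the tree can be written as nested pairs and read from left to right. The tree below is a hypothetical example over five points, not a reconstruction of the figure.

```python
def leaf_order(tree):
    """Return the leaves of a nested-tuple dendrogram from left to right."""
    if not isinstance(tree, tuple):
        return [tree]
    left, right = tree
    return leaf_order(left) + leaf_order(right)

# Hypothetical dendrogram: 1 and 3 join first, 2 and 4 join, then 5 joins (2, 4).
tree = ((1, 3), ((2, 4), 5))
order = leaf_order(tree)
```

The ordering is not unique: swapping the two branches at any of the n-1 internal nodes gives another ordering that is equally consistent with the same tree.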
Agglomerative Hierarchical Clustering
Before doing such a clustering, one has to define two things:
1- The similarity measure between two genes (or experiments)
- Centered correlation
- Uncentered correlation
- Absolute correlation
- Euclidean
2- The distance measure between the new cluster and the others
- Single Linkage: distance between closest pair.
- Complete Linkage: distance between farthest pair.
- Average Linkage: average distance between all pairs (some tools use the distance between cluster centers).
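The two choices can be written down compactly. This is a sketch rather than the code of any particular tool: function names are illustrative, flat profiles (zero norm) are not handled, and average linkage is computed here as the mean of all pairwise distances, which is one common definition; using the distance between cluster centers instead is a variant.

```python
import math

def uncentered_correlation(x, y):
    """Cosine of the angle between two profiles (no mean subtraction)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm

def centered_correlation(x, y):
    """Pearson correlation: each profile is mean-centered first."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return uncentered_correlation([a - mx for a in x], [b - my for b in y])

def absolute_correlation(x, y):
    """Treats correlated and anti-correlated profiles as equally similar."""
    return abs(centered_correlation(x, y))

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def linkage_distance(cluster_a, cluster_b, dist=euclidean, method="single"):
    """Distance between two clusters, given a point-to-point distance."""
    pairs = [dist(p, q) for p in cluster_a for q in cluster_b]
    if method == "single":
        return min(pairs)
    if method == "complete":
        return max(pairs)
    if method == "average":
        return sum(pairs) / len(pairs)
    raise ValueError(f"unknown linkage method: {method}")

a = [(0.0, 0.0), (1.0, 0.0)]
b = [(3.0, 0.0), (5.0, 0.0)]
```

For the two clusters `a` and `b`, the pairwise distances are 3, 5, 2 and 4, so single linkage gives 2, complete linkage 5 and average linkage 3.5. To use a correlation as a distance, a common convention is 1 minus the correlation.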
Centroid methods - K-means
• Start with random positions for the K centroids.
• Assign points to centroids.
• Move centroids to the center of their assigned points.
• Iterate until centroids are stable.
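The four steps above can be sketched as follows. This is a minimal version under stated assumptions: the random start picks K of the data points as initial centroids, distance is squared Euclidean, and "stable" is detected when no point changes cluster. The names and the sample points are illustrative.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Random start: use K distinct data points as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assignment = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                          for p in points]
        if new_assignment == assignment:
            break  # no point moved, so the centroids are stable
        assignment = new_assignment
        # Move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

On these six points the two obvious groups end up in separate clusters. On real expression data the result can depend on the random start, so running the procedure several times with different seeds is a reasonable precaution.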
[Figure: K-means illustrated at iterations 0, 1 and 3]
Self-organizing Maps
- Choose a number of partitions.
- Assign a random reference vector to each partition.
- Pick a gene randomly and assign it to its most similar reference vector.
- Adjust that reference vector so that it is more similar to the chosen gene.
- Adjust the other reference vectors.
- Repeat thousands of times until partitions are stable.
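A one-dimensional self-organizing map following these steps might look like the sketch below. Several details are assumptions that the steps leave open: similarity is squared Euclidean distance, the learning rate and neighborhood width shrink linearly over time, and the other reference vectors are adjusted with a Gaussian weight that depends on their distance from the winner along the map.

```python
import math
import random

def nearest(refs, gene):
    """Index of the reference vector closest to `gene` (squared Euclidean distance)."""
    return min(range(len(refs)),
               key=lambda i: sum((r - g) ** 2 for r, g in zip(refs[i], gene)))

def train_som(genes, n_partitions, n_iter=2000, seed=0):
    rng = random.Random(seed)
    dim = len(genes[0])
    lo = min(min(g) for g in genes)
    hi = max(max(g) for g in genes)
    # One random reference vector per partition, drawn inside the data range.
    refs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_partitions)]
    for t in range(n_iter):
        frac = 1.0 - t / n_iter                       # goes from 1 towards 0
        rate = 0.5 * frac                             # learning rate decays
        sigma = max(0.01, (n_partitions / 2) * frac)  # neighborhood width decays
        gene = rng.choice(genes)                      # pick a gene at random
        winner = nearest(refs, gene)                  # most similar reference vector
        for i, ref in enumerate(refs):
            # The winner moves most; its neighbors on the map move less.
            influence = math.exp(-((i - winner) ** 2) / (2 * sigma ** 2))
            for d in range(dim):
                ref[d] += rate * influence * (gene[d] - ref[d])
    return refs

profiles = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0)]
refs = train_som(profiles, n_partitions=2)
partitions = [nearest(refs, g) for g in profiles]
```

After training, each gene is assigned to the partition whose reference vector is nearest, as in the last line.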
[Figure: a self-organizing map.]
Super-Paramagnetic Clustering (SPC)
M. Blatt, S. Wiseman and E. Domany (1996) Neural Computation

The idea behind SPC is based on the physical
properties of dilute magnets.

Calculating correlation between magnet orientations
at different temperatures (T).
[Figure: magnet orientations at T=Low and at T=High]
The algorithm simulates the magnets' behavior over a range of
temperatures and calculates their correlation.
The temperature (T) controls the resolution.
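In formula form, the idea can be sketched with a Potts-type model. The notation below is an assumption chosen for illustration (q spin states, coupling J_ij, correlation G_ij) and should be checked against the cited paper before being relied on.

```latex
% Each data point i carries a spin s_i \in \{1, \dots, q\}.
% Neighboring points interact; the coupling J_{ij} decreases with their distance d_{ij}.
\mathcal{H}(S) = \sum_{\langle i,j \rangle} J_{ij} \, \bigl( 1 - \delta_{s_i, s_j} \bigr)

% Spin-spin correlation at temperature T (a thermal average, estimated by simulation):
G_{ij}(T) = \bigl\langle \delta_{s_i, s_j} \bigr\rangle_T

% Points i and j are placed in the same cluster when G_{ij}(T) exceeds a chosen threshold.
```

At low T nearly all spins align and the data form one cluster, at high T the spins are independent and every point stands alone, and in between, groups of strongly coupled points stay aligned, which is where the clusters appear.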
[Figure: magnet orientations at T=Intermediate]
Clustering data
Available clustering tools
•M. Eisen’s programs for clustering and display of results
(Cluster, TreeView)
–Predefined set of normalizations and filtering
–Agglomerative, K-means, 1D SOM
•Matlab
–Agglomerative, public m-files.
•Dedicated software packages (SPC)
•Web sites: e.g. http://ep.ebi.ac.uk/EP/EPCLUST/
•Statistical programs (SPSS, SAS, S-plus)
•And much more…
Clustering data
The final data representation is then a big matrix with rows being the genes and columns representing the different experiments. To keep the image coherent with the scan output, the ratio numbers calculated by Scanalyze are transformed back into color spots on a green-red scale.
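The conversion from ratio to color can be sketched as below. The conventions are assumptions for the example: ratios are on a log2 scale, positive values are drawn in red and negative values in green, zero is black, and the color saturates at a `saturation` value chosen here as 3.0.

```python
def ratio_to_rgb(log_ratio, saturation=3.0):
    """Map a log2 ratio to (R, G, B): positive -> red, negative -> green, zero -> black."""
    level = min(abs(log_ratio) / saturation, 1.0)
    intensity = round(255 * level)
    if log_ratio > 0:
        return (intensity, 0, 0)
    if log_ratio < 0:
        return (0, intensity, 0)
    return (0, 0, 0)

# Hypothetical matrix: rows are genes, columns are experiments.
log_ratios = [
    [3.0, 0.0],
    [-5.0, 0.0],
]
image = [[ratio_to_rgb(v) for v in row] for row in log_ratios]
```

Each cell of `image` is then one colored spot of the clustered display, with brighter colors for larger changes.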
Clustering data
Another way to represent these data is a graph showing each gene’s expression variation across the different experiments.
Expression variation of nine genes along the 19 experiments from Iyer et al. (Fibroblast response to serum stimulation)
Web resources : data analysis tools
Expression Profiler: Online clustering and analysis tools (EBI)
GenEx: Database, repository and analysis tools (NCGR)
MAExplorer: MicroArray Explorer for data mining Gene Expression, free download
ArrayDB: Downloadable tools, short online demo
MAXD: Downloadable data warehouse and visualisation for expression data
Jexpress: Java tools for gene expression data analysis, free download
Eisen Lab: Michael Eisen's suite for image quantitation and data analysis (Scanalyze, Cluster, TreeView). Downloadable.
Web resources : public databases
SMD: The Stanford Microarray Database
Chip DB: Searchable database on gene expression (MIT)
ExpressDB: Public queries of E. coli and yeast data
GEO: Gene expression data repository and online resource (NCBI)
RAD: RNA Abundance Database
Expression Connection: Saccharomyces Genome Database expression data retrieval
EpoDB: Expression information retrieval for one gene at a time
yMGV: Public queries of yeast data
Web resources : public databases
AMAD: Downloadable web driven database system
ArrayExpress: Public data deposition and public queries (EBI)
maxdSQL: Downloadable data warehouse and visualization environment
GXD: Mouse expression data storage and integration
GeNet: Distribution and visualization of gene expression data from any organism
Web resources : public databases
Drosophila microarray project: Drosophila Metamorphosis Time Course Database
Samson Lab: Yeast Transcriptional Profiling Experiments
SageMap: NCBI SAGE data and analysis tools
NCI60 cancer project: Supplement to Ross et al. (Nat Genet., 2000).
Serum-response: Supplement to Iyer et al. (1999) Science 283:83-87
Breast cancer: Supplement to Perou et al. Nature 406:747-752 (2000)
Cancer Molecular Pharmacology: Integration of large databases on gene expression and molecular pharmacology.
Web resources : general information
Leung’s: Links page & software info
Davison’s: DNA Microarray Methodology - Flash Animation
gene-chips: Overview of the technique, papers…
Chips & microassays: General information
SMD guide: Stanford's links page, very complete
Introduction: Online introduction to microarrays (EBI)
Brown Lab Guide: Microarrays protocols and arrayer construction.