Introduction to Microarray Analysis and Technology
Dave Lin - November 5, 2001
Overview
—Why Biologists care about Genomics
—Why statisticians/computer scientists may care about genomics
•Preprocessing issues
•Sources of variability in constructing microarrays
•Postprocessing issues
•Analysis of data
What makes one cell different from another?
liver vs. brain
Cancerous vs. non-cancerous
Treatment vs. control
Old Days
100,000 genes in mammalian genome
each cell expresses 15,000 of these genes
each gene is expressed at a different level
estimated total of 100,000 copies of mRNA/cell
1-5 copies/cell - “rare” - ~30% of all genes
10-200 copies/cell - “moderate”
200 copies/cell and up - “abundant”
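The abundance classes above can be written as a small lookup. This is only a sketch: the slide leaves 5-10 copies/cell unassigned and lists 200 in two classes, so the boundary handling below (and the "not detected" label and the gene names) are assumptions made for illustration.

```python
def abundance_class(copies_per_cell):
    """Label an mRNA abundance with the slide's three classes.

    Assumptions (not from the slide): 5-10 copies/cell is treated as
    "rare", exactly 200 copies/cell as "abundant", and anything below
    one copy as "not detected".
    """
    if copies_per_cell < 1:
        return "not detected"
    if copies_per_cell < 10:
        return "rare"
    if copies_per_cell < 200:
        return "moderate"
    return "abundant"


# Hypothetical copy numbers, for illustration only.
example = {"geneA": 3, "geneB": 50, "geneC": 1200}
classes = {gene: abundance_class(n) for gene, n in example.items()}

# Using the slide's own figures: 100,000 mRNA copies spread over
# 15,000 expressed genes is under 7 copies per gene on average.
mean_copies_per_gene = 100_000 / 15_000
```

The low average is one way to see why so many genes fall in the "rare" class, and why sensitivity matters for any method that tries to measure them.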
Cells can be defined by:
Complement of Genes (which genes are expressed)
How much of each gene is expressed (quantity)
What makes one cell different from another?
Try and find genes that are differentially expressed
Study the function of these genes
Find which genes interact with your favorite gene
Extremely time-consuming.
Huge amounts of effort expended to find individual genes that may differ between two conditions.
Genomics: an almost useless term, since it covers many different concepts and applications.
Microarrays
-massively parallel analysis of gene expression
-screen an entire genome at once
-find not only individual genes that differ, but groups of genes that differ.
-find relative expression level differences
-how quantitative can they be?
Microarrays
Based on old technique
many flavors - the majority are of two essential varieties
cDNA Arrays
printing on glass slides
miniaturization, throughput
fluorescence based detection
Affymetrix Arrays
in situ synthesis of oligonucleotides
will not consider Affymetrix arrays further.
THE PROCESS
Building the Chip:
MASSIVE PCR
PCR PURIFICATION and PREPARATION
PREPARING SLIDES
PRINTING
POST PROCESSING
Preparing RNA:
CELL CULTURE AND HARVEST
RNA ISOLATION
cDNA PRODUCTION
Hybing the Chip:
PROBE LABELING
ARRAY HYBRIDIZATION
DATA ANALYSIS
Department of Statistics, University of California, Berkeley, and
Division of Genetics and Bioinformatics, The Walter and Eliza Hall Institute of Medical Research,
Building the Chip:
MASSIVE PCR
Full yeast genome
= 6,500 reactions
PREPARING SLIDES
Polylysine coating for adhering
PCR products to glass slides
PCR PURIFICATION
and PREPARATION
IPA precipitation + EtOH
washes + 384-well format
PRINTING
The arrayer: high precision spotting device
capable of printing 10,000 products in 14 hrs,
with a plate change every 25 mins
POST PROCESSING
Chemically converting the positive
polylysine surface to prevent nonspecific hybridization
Fabrication of “Spotted Arrays”
Arrayed Library (Normalized/Subtracted) -> 20,000 PCR reactions -> 20,000 Precipitations -> 20,000 resuspensions -> Consolidate for printing -> Spot on Glass Slides
Printing Approaches
Non - Contact
• Piezoelectric dispenser
• Syringe-solenoid ink-jet dispenser
Contact (using rigid pin tools, similar to filter array)
• Tweezer
• Split pin
• Micro spotting pin
Micro Spotting pin
Microarray Gridder
Practical Problems
— Surface chemistry: an uneven surface may lead to high background.
— Dipping the pin into a large volume -> pre-printing to drain off excess sample.
— Spot variation can be due to mechanical differences between pins. Pins can become clogged during the printing process.
— Spot size and density depend on surface and solution properties.
— Pins need good washing between samples to prevent sample carryover.
Hybing the Chip:
PROBE LABELING
Two RNA samples are labelled with Cy3 or Cy5 monofunctional dyes via a chemical coupling to AA-dUTP. Samples are purified using a PCR cleanup kit.
ARRAY HYBRIDIZATION
Cy3 and Cy5 RNA samples are simultaneously hybridized to the chip. Hybs are performed for 5-12 hours and then chips are washed.
DATA ANALYSIS
Ratio measurements are determined via quantification of 532 nm and 635 nm emission values. Data are uploaded to the appropriate database where statistical and other analyses can then be performed.
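As a rough sketch of what a "ratio measurement" looks like per spot: the 635 nm channel corresponds to Cy5 and the 532 nm channel to Cy3, and the ratio is usually taken on a log2 scale. The M and A names below follow a common convention in the microarray literature rather than anything on these slides, and the intensities are made up.

```python
import math


def log_ratio(cy5, cy3):
    """M = log2(Cy5 / Cy3): the log fold difference between the samples."""
    return math.log2(cy5 / cy3)


def mean_log_intensity(cy5, cy3):
    """A = 0.5 * log2(Cy5 * Cy3): the overall brightness of the spot."""
    return 0.5 * math.log2(cy5 * cy3)


# Hypothetical background-corrected intensities as (Cy5, Cy3) pairs.
spots = [(1200.0, 300.0), (500.0, 500.0), (150.0, 600.0)]

ma_values = [(log_ratio(r, g), mean_log_intensity(r, g)) for r, g in spots]
# M is +2 for a 4-fold increase, 0 for no change, -2 for a 4-fold decrease.
```

Working on the log scale makes increases and decreases symmetric around zero, which is convenient for the normalization steps discussed later.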
Labeling of RNAs with Cy3 or Cy5
Two general methods
-Dye conjugated nucleotide
-Amino-allyl indirect labeling
Direct labeling of RNA
[Figure: cDNA synthesis from RNA in the presence of Cy5-dUTP or Cy3-dUTP, giving dye-labeled cDNA]
Indirect labeling of RNA
[Figure: cDNA synthesis from RNA with a modified nucleotide, followed by Cy3 addition]
Dye effect issues
Direct method
Unequal incorporation of Cy5 vs. Cy3
Very poor overall incorporation of direct-conjugated nucleotide, so more starting RNA is needed for labeling.
Indirect method
Presumably less bias in initial incorporation of activated nucleotide, but it is not clear whether more or less dye is added
Both Methods
Cy3 fluoresces more brightly than Cy5
Labeling is highly sequence dependent
Micrograph of a portion of hybridization probe from a yeast microarray (after hybridization).
Layout of the cDNA Microarrays
— Sequence verified, normalized mouse cDNAs
— 19,200 spots in two print groups of 9,600 each
– 4 x 4 grids per print group, each with 25 x 24 spots
– Controls on the first 2 rows of each grid.
[Figure: slide layout showing print groups pg1 and pg2]
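The layout numbers can be checked, and turned into a spot locator, with a few lines of arithmetic. This is a sketch under stated assumptions: the slide does not say whether "25 x 24" means rows by columns, nor in what order spots are numbered, so the row-major ordering and the constant and function names below are hypothetical.

```python
PRINT_GROUPS = 2
GRID_ROWS, GRID_COLS = 4, 4
SPOT_ROWS, SPOT_COLS = 25, 24  # assumed to mean 25 rows x 24 columns

SPOTS_PER_GRID = SPOT_ROWS * SPOT_COLS                     # 600
SPOTS_PER_GROUP = GRID_ROWS * GRID_COLS * SPOTS_PER_GRID   # 9,600
TOTAL_SPOTS = PRINT_GROUPS * SPOTS_PER_GROUP               # 19,200


def locate(index):
    """Map a 0-based spot index to
    (print_group, grid_row, grid_col, spot_row, spot_col).

    Assumes row-major numbering at every level, which the slide does
    not specify.
    """
    if not 0 <= index < TOTAL_SPOTS:
        raise ValueError("spot index out of range")
    group, rest = divmod(index, SPOTS_PER_GROUP)
    grid, spot = divmod(rest, SPOTS_PER_GRID)
    grid_row, grid_col = divmod(grid, GRID_COLS)
    spot_row, spot_col = divmod(spot, SPOT_COLS)
    return group, grid_row, grid_col, spot_row, spot_col


def is_control(index):
    """Controls sit on the first 2 rows of each grid."""
    return locate(index)[3] < 2


control_count = sum(is_control(i) for i in range(TOTAL_SPOTS))
```

Knowing which grid a spot belongs to matters later, because each grid is printed by one pin and pin-specific effects are corrected grid by grid.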
Practical Problems 1
• Comet Tails
• Likely caused by insufficiently rapid immersion of the slides in the succinic anhydride blocking solution.
Practical Problems 2
Practical Problems 3
High Background
• 2 likely causes:
– Insufficient blocking.
– Precipitation of the labeled probe.
Weak Signals
Practical Problems 4
Spot overlap:
Likely cause: too much rehydration during post processing.
Practical Problems 5
Dust
Pin-specific printing differences
Normalization - lowess
• Global lowess
• Assumption: changes roughly symmetric at all intensities.
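A minimal sketch of the idea, assuming each spot has a log ratio M and a mean log intensity A (names not used on the slide): fit a smooth curve of M against A and subtract it, so that the log ratios are centred on zero at every intensity. The code below is a simplified single-pass local linear fit with tricube weights, not full lowess (it omits the robustness iterations), and it is written for clarity rather than speed.

```python
import math


def tricube(u):
    """Tricube kernel: weight 1 at u = 0, falling to 0 at |u| >= 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def lowess_fit(x, y, frac=0.5):
    """Fit a locally weighted straight line around each x and return
    the fitted y there (one pass, no robustness iterations)."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # neighbours per local fit
    fitted = []
    for x0 in x:
        # Bandwidth: distance to the k-th nearest point (floor avoids /0).
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        w = [tricube((xi - x0) / h) for xi in x]
        d = [xi - x0 for xi in x]  # centre on x0: the intercept is the fit
        sw = sum(w)
        swd = sum(wi * di for wi, di in zip(w, d))
        swdd = sum(wi * di * di for wi, di in zip(w, d))
        swy = sum(wi * yi for wi, yi in zip(w, y))
        swdy = sum(wi * di * yi for wi, di, yi in zip(w, d, y))
        denom = sw * swdd - swd * swd
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)  # degenerate window: weighted mean
        else:
            fitted.append((swdd * swy - swd * swdy) / denom)
    return fitted


def lowess_normalize(m_values, a_values, frac=0.5):
    """Subtract the intensity-dependent trend in M from every spot."""
    trend = lowess_fit(a_values, m_values, frac)
    return [m - t for m, t in zip(m_values, trend)]


# Hypothetical slide where the only signal is a dye bias linear in A.
a_values = [6.0 + 0.5 * i for i in range(20)]
m_values = [0.5 - 0.1 * a for a in a_values]
normalized = lowess_normalize(m_values, a_values)
```

In this toy example the trend is pure dye bias, so the normalized values come out at essentially zero. The symmetry assumption is what justifies treating the fitted curve as bias: if most genes changed in one direction at some intensity, the curve would absorb real signal.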
Normalization - print-tip-group
Assumption: For every print group, changes roughly symmetric at all intensities.
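The grouping step can be sketched as follows: split the spots by the pin that printed them, normalize each group on its own, and put the values back in their original order. To keep the example short, the default normalizer here is plain median centring, which is a simpler stand-in and not the intensity-dependent fit the slide has in mind. Any function taking (M values, A values) and returning adjusted M values, such as a lowess-based one, could be passed in instead. All names and data are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_center(m_values, a_values):
    """Stand-in normalizer: subtract the group's median M (ignores A)."""
    centre = median(m_values)
    return [m - centre for m in m_values]


def normalize_by_print_tip(m_values, a_values, tip_groups,
                           normalizer=median_center):
    """Apply `normalizer` separately within each print-tip group."""
    members = defaultdict(list)
    for i, tip in enumerate(tip_groups):
        members[tip].append(i)

    result = [0.0] * len(m_values)
    for indices in members.values():
        adjusted = normalizer([m_values[i] for i in indices],
                              [a_values[i] for i in indices])
        for i, value in zip(indices, adjusted):
            result[i] = value
    return result


# Hypothetical data: pin 1 prints spots that read high, pin 2 low.
m_values = [0.5, 0.7, 0.6, -0.3, -0.1, -0.2]
a_values = [8.0, 9.0, 10.0, 8.0, 9.0, 10.0]
tip_groups = [1, 1, 1, 2, 2, 2]

normalized = normalize_by_print_tip(m_values, a_values, tip_groups)
# Both groups end up centred on zero: roughly [-0.1, 0.1, 0, -0.1, 0.1, 0].
```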
Pre-processing Issues
-Definition of what a real signal is
what is a spot, and how to determine what should be included in the analysis?
-How to determine background
local (surrounding spot) vs. global (across slide)
-How to correct for dye effect
-How to correct for spatial effect
e.g. print-tip, others
-How to correct for differences between slides
e.g. scale normalization
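Two of the questions above reduce to simple arithmetic and can be sketched directly: local vs. global background subtraction, and scale normalization between slides. The choices below are assumptions for illustration, not the lecture's method: the global background is taken as the slide-wide median, corrected values are floored at 1 so logs can be taken, and slides are rescaled so their median absolute deviations (MADs) match the geometric mean across slides.

```python
import math
from statistics import median


def background_correct(foreground, background, mode="local", floor=1.0):
    """Subtract background from each spot's foreground intensity.

    local:  each spot uses the background measured around it.
    global: every spot uses the slide-wide median background.
    """
    if mode == "local":
        levels = list(background)
    elif mode == "global":
        levels = [median(background)] * len(foreground)
    else:
        raise ValueError("mode must be 'local' or 'global'")
    return [max(f - b, floor) for f, b in zip(foreground, levels)]


def mad(values):
    """Median absolute deviation from the median."""
    centre = median(values)
    return median([abs(v - centre) for v in values])


def scale_normalize(slides):
    """Rescale each slide's log ratios to a common spread.

    Assumes every slide has a MAD greater than zero.
    """
    mads = [mad(m) for m in slides]
    target = math.exp(sum(math.log(s) for s in mads) / len(mads))
    return [[v * target / s for v in m] for m, s in zip(slides, mads)]


# Hypothetical intensities for three spots.
foreground = [500.0, 800.0, 300.0]
background = [100.0, 120.0, 400.0]
local = background_correct(foreground, background, mode="local")
global_ = background_correct(foreground, background, mode="global")

# Hypothetical log ratios from two slides with different spreads.
scaled = scale_normalize([[-1.0, 0.0, 1.0], [-4.0, 0.0, 4.0]])
```

The third spot shows why the choice matters: its local background is higher than its foreground, so local correction floors it, while global correction leaves a positive signal.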
Experimental Design Issues
What is the best means of performing the experiment to obtain the desired answer?
Biologists’ and statisticians’ assumptions differ.
Biologist viewpoint
make everything exactly the same so that differences will stand out
Statistician viewpoint
make everything as random as possible so that real trends will stand out
Most biologists will ask - what are the differences between two samples?
-implicit questions associated with microarrays
What is the best way to determine this?
e.g. Design; replicates; conditions.
How do I obtain the most reliable results?
e.g. measurements, normalization
How do I determine what a significant difference is?
Do I care about “subtle” changes, or just the extremes?
How is information best extracted?
Is correlation useful? What type of clustering?
How is information combined?
How do you model the interactions of 1000s of genes?
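One simple way to make the "significant difference" question concrete, offered as a sketch and not as the lecture's answer, is a per-gene one-sample t-statistic on replicate log ratios: the mean change divided by its standard error. It shows why replicates matter and why a "subtle" but consistent change can outrank a large but erratic one. The gene labels and values are made up.

```python
import math
from statistics import mean, stdev


def t_statistic(log_ratios):
    """One-sample t-statistic testing whether the mean log ratio is 0."""
    n = len(log_ratios)
    if n < 2:
        raise ValueError("need at least two replicates")
    centre = mean(log_ratios)
    spread = stdev(log_ratios)
    if spread == 0:
        # Identical replicates: no variance estimate to divide by.
        return math.copysign(math.inf, centre) if centre else 0.0
    return centre / (spread / math.sqrt(n))


# Hypothetical log2 ratios from four replicate slides.
replicates = {
    "noisy": [3.0, -1.0, 2.5, -0.5],    # large on average, inconsistent
    "subtle": [0.4, 0.5, 0.45, 0.45],   # small but reproducible
}
scores = {gene: t_statistic(values) for gene, values in replicates.items()}
```

With thousands of genes tested at once, a cutoff on such a statistic also needs some correction for multiple testing, which this sketch does not attempt.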
Design: Two Ways to Do the Comparisons
Advantages of Our Design
—Lower variability
—Increased precision
—Increase in measurement of expression -> increased precision