Genomic Applications
Hutchinson-Gilford Progeria
-premature aging
-lifespan = 13.4 years
-retarded growth
-midface hypoplasia
-micrognathia
-alopecia
-low adiposity
-osteodysplasia
-premature, severe atherosclerosis
-death due to MI
De Sandre-Giovannoli, Science Express, 17 April 2003
Lamin A mutations in HGS
Exons 11 and 12 code the Lamin A tail (not lamin c)
Red is coiled-coil and blue is globular domains
1824C>T is silent at the amino acid level (G608G) but was not found in 300 controls
1824C>T creates a cryptic splice donor site at 1819, giving a 50 aa deletion
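The codon arithmetic behind G608G can be checked with a short sketch. It assumes position 1824 is counted from the first base of the coding sequence, and it infers the reference codon as GGC only because glycine codons are GGN and the changed base is a C in the third position. The function names are illustrative, not from the source.

```python
# Minimal sketch: locate a coding-sequence position within its codon and
# confirm that a third-position C>T change in a glycine codon is silent.

# Glycine codons in the standard genetic code (GGN).
GLYCINE_CODONS = {"GGA", "GGC", "GGG", "GGT"}


def codon_position(cds_pos):
    """Return (codon number, position within codon) for a 1-based CDS position."""
    codon = (cds_pos - 1) // 3 + 1
    offset = (cds_pos - 1) % 3 + 1
    return codon, offset


def is_glycine(codon):
    """True if the DNA codon encodes glycine."""
    return codon.upper() in GLYCINE_CODONS


codon, offset = codon_position(1824)
print(codon, offset)                          # 608 3

# Assumed reference codon GGC; the C>T change gives GGT.
print(is_glycine("GGC"), is_glycine("GGT"))   # True True

# A 50 amino acid deletion corresponds to this many nucleotides of mRNA.
print(50 * 3)                                 # 150
```

The point of the slide is that the protein sequence at codon 608 is unchanged; the damage comes from the new splice donor, not from the codon itself.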
Best guess
Most diseases are probably interactions
between polygenic heritable events, and
environmental pressures leading to somatic
epigenetic changes.
Translation: diseases are complicated.
Gene by Environment Interaction
Predisposition (DNA): FAP, MSH, BRCA, LDLr
Event: hydrocarbons, radiation, estrogens, low fiber
Disease: colon CA, colon CA, breast CA, atherosclerosis
Microarrays-the big net.
Ideal disease-hunter: genomic scale protein quantitation and
sequencing.
Imperfect solution A: genomic scale detection of mRNA level.
Problem: little information on protein level
Imperfect solution B: genome-wide SNP/haplotype.
Problem: statistical limits on patient populations
Common compromise: microarray profiling of mRNA transcripts
(transcript profiling) to identify target areas. Target genes
are then followed up by proteomics and SNPs.
Array flavors
DNA detection (SNP, genotyping, etc.)
• short oligonucleotides to detect mismatches
RNA detection (transcript profiling)
• Plasmid
• Inserts
• Long oligonucleotides (60 mers)
• Short oligonucleotides (20 mers)
Hybridization-basic elements
• Hybridization = Annealing - Melting
• CRUCIAL: non-covalent, hydrogen bonds
-->equilibrium rules, binding is statistical
• Best hybridization occurs with:
• long sequences (no hyb when nt<4)
• high salt concentration (hybrids melt in water)
• low temperatures (hybrids melt with heat)
• G and C (3 H) bind better than A and T (2 H)
• self-complementarity is low (high GC is bad)
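The base-pairing rules in the list above can be made concrete with a small sketch. It counts Watson-Crick hydrogen bonds (2 per A-T pair, 3 per G-C pair, as stated on the slide) and flags a probe that is its own reverse complement. The helper names are hypothetical, and hydrogen-bond count is only a rough proxy for duplex stability.

```python
# Minimal sketch of the base-pairing arithmetic; not a thermodynamic model.

H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}      # hydrogen bonds per base pair
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def hydrogen_bonds(seq):
    """Total hydrogen bonds when seq is fully paired with its complement."""
    return sum(H_BONDS[base] for base in seq.upper())


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def is_self_complementary(seq):
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


print(hydrogen_bonds("ATGC"))             # 10
print(reverse_complement("ATGC"))         # GCAT
print(is_self_complementary("GAATTC"))    # True
```

A self-complementary probe can pair with copies of itself instead of the target, which is why it is listed as something to keep low.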
Base-pairing (the stuff of life)
[Figure: A-T and G-C base pairs. Lewin, Genes VII, page 8.]
Tm-a good thing.
Tm is a measure of the stability of DS-DNA under a given set of
conditions. Stability, and therefore Tm, is affected by:
Strand length - the longer the strand, the higher the Tm
Base Composition - higher the GC content, the higher the Tm.
Ionic Strength - as the ionic strength increases, so does Tm.
Double helical DNA is stabilised by cations.
Divalent cations (e.g. Mg2+) are more effective than
monovalent cations (Na+ or K+).
Organic Solvents - formamide for instance lowers the Tm by
weakening the hydrophobic interactions.
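The four factors above (length, GC content, ionic strength, formamide) appear together in two widely quoted rules of thumb, sketched here. Neither formula comes from the slides: the Wallace rule applies to short oligos, and the salt-adjusted form is an empirical approximation whose constants vary between references. The 0.65 °C per percent formamide coefficient is an assumption, and divalent cations are not modeled.

```python
import math


def wallace_tm(seq):
    """Wallace rule for short oligos: 2 C per A/T plus 4 C per G/C."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def salt_adjusted_tm(seq, na_molar=0.05, formamide_pct=0.0):
    """Empirical Tm estimate in C (approximate; constants are assumptions).

    Tm rises with length, GC content and monovalent salt concentration,
    and falls as formamide is added.
    """
    s = seq.upper()
    n = len(s)
    gc_pct = 100.0 * (s.count("G") + s.count("C")) / n
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_pct
            - 675.0 / n
            - 0.65 * formamide_pct)


probe = "ATGC" * 5                      # hypothetical 20-mer, 50% GC
print(wallace_tm("ATGC"))               # 12
print(salt_adjusted_tm(probe) < salt_adjusted_tm(probe, na_molar=1.0))        # True
print(salt_adjusted_tm(probe, formamide_pct=50) < salt_adjusted_tm(probe))    # True
```

The absolute numbers should be treated as rough estimates; the useful part is the direction of each effect, which matches the list above.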
Melting Curves-Tm measured
[Figure: melting curves with Tm marked.]
PCR primer design: www.oligo.net
Array Choice Factors
Expression profiling:
Sequence known? --> Oligo arrays: high confidence, immediate ID
Not known? --> cDNA arrays: clone drift/cross hyb, sequence clones
Sample selection
-isolate the purest phenotypic examples of test and control
-laser capture microdissection (LCM)
-always control for treatment and manipulation
-people are the most meaningful, but least controllable
-animals are highly controllable, but less meaningful
-cell systems (in vitro) are controlled, but meaningful?
-small amounts of RNA can be amplified
-while purifying cells is good, the processing is bad.
-The quality of the results is directly proportional to the
quality of the samples that are chosen.
Laser Capture Microdissection
The importance of purity
Human colon cancer
Blue are normal cells
Red are tumor cells
Assessing sample quality
Amount: > 5 µg total RNA or 500 ng of poly A+
Basic: O.D. 260/280 ratio > 2.1
nucleic acids absorb at 260 nm, protein at 280 nm;
thus, increasing protein impurity reduces the ratio
Better: agarose gel electrophoresis, EtBr stained
if total RNA, 28S = 2 x 18S ribosomal (Lab-on-chip)
or
Q-PCR of a low and a high abundance gene, against a standard
Best: test chip
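The "Amount" and "Basic" checks above reduce to simple arithmetic on two absorbance readings, sketched below. The conversion of 1 A260 unit to about 40 µg/mL of RNA is the standard spectrophotometric factor and is not stated on the slide. The thresholds default to the slide's values (> 5 µg total RNA, ratio > 2.1) and are parameters because acceptance criteria differ between labs. The function and argument names are illustrative.

```python
# Minimal sketch of a spectrophotometric RNA check; thresholds are adjustable.

RNA_UG_PER_ML_PER_A260 = 40.0   # standard conversion factor for RNA (assumption)


def rna_conc_ug_per_ml(a260, dilution=1.0):
    """RNA concentration of the undiluted sample in ug/mL."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution


def assess_rna(a260, a280, volume_ul, dilution=1.0,
               min_ratio=2.1, min_total_ug=5.0):
    """Return the 260/280 ratio, total yield, and pass/fail flags."""
    total_ug = rna_conc_ug_per_ml(a260, dilution) * volume_ul / 1000.0
    ratio = a260 / a280
    return {
        "ratio": round(ratio, 2),
        "total_ug": round(total_ug, 2),
        "enough": total_ug > min_total_ug,
        "pure": ratio > min_ratio,
    }


# Hypothetical reading: 1:50 dilution, 20 uL of stock.
result = assess_rna(a260=0.5, a280=0.23, volume_ul=20, dilution=50)
print(result["ratio"], result["total_ug"])   # 2.17 20.0
print(result["enough"], result["pure"])      # True True
```

A passing ratio says nothing about RNA integrity, which is why the slide ranks the gel, Q-PCR and test chip above it.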
GeneChip® Probe Arrays
[Figure: a GeneChip Probe Array (1.28 cm, >1 million probes). Each 11 µm hybridized probe cell holds millions of copies of a specific oligonucleotide probe, bound by single stranded, labeled RNA target.]
Image of Hybridized Probe Array
George Washington Genomics Core Facility
Synthesis of Ordered Oligonucleotide Arrays
[Figure: light-directed synthesis. Light shone through a mask deprotects selected positions on the substrate, a nucleotide (T, then C) couples at those positions, and the cycle is repeated with a new mask.]
GeneChip® Expression Array Design
[Figure: multiple oligo probes are chosen along the gene sequence, 5´ to 3´. Each position has a probe designed to be a Perfect Match and a probe designed to be a Mismatch.]
Procedures for Target Preparation
Cells
--> Poly (A)+ RNA
--> cDNA
--> IVT (Biotin-UTP, Biotin-CTP)
--> Labeled transcript
--> Fragment (heat, Mg2+)
--> Labeled fragments
--> Hybridize (16 hours)
--> Wash & Stain
--> Scan
Stain: Streptavidin-Phycoerythrin (SAPE), a laser-stimulated fluorescent stain
Analysis of expression level from probe sets
A single, contiguous probe set for the rat β-actin gene.
Each pixel is quantitated and integrated for each
oligo feature (range 0-25,000)
Perfect Match (PM)
Mismatch (MM) Control
PM - MM = difference score
All significant difference scores are averaged to
create “average difference” = expression level of
the gene.
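The PM - MM calculation can be sketched in a few lines. The slide does not say how a difference score is judged "significant", so the cutoff here (absolute difference above a caller-supplied `min_diff`) is an assumption for illustration only, not the vendor's algorithm.

```python
# Minimal sketch of an "average difference" for one probe set.


def average_difference(pm, mm, min_diff=0.0):
    """Average the PM - MM scores whose magnitude exceeds min_diff.

    pm, mm: intensities for the perfect match and mismatch features,
    in the same probe order. Returns 0.0 if no pair passes the cutoff.
    """
    if len(pm) != len(mm):
        raise ValueError("PM and MM lists must have the same length")
    diffs = [p - m for p, m in zip(pm, mm)]
    kept = [d for d in diffs if abs(d) > min_diff]
    if not kept:
        return 0.0
    return sum(kept) / len(kept)


# Hypothetical intensities for a three-probe set.
pm = [100, 200, 300]
mm = [50, 100, 290]
print(average_difference(pm, mm, min_diff=20))   # 75.0
```

In this example the third probe pair (difference of 10) is treated as noise and left out of the average.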
Affymetrix® Instrument System
Platform for GeneChip® Probe Arrays
• Integrated
• Exportable
• Easy to use
• Versatile
GeneChip analysis of human atherosclerosis
Dissect normal media from atherosclerotic lesion
Prepare highly purified RNA
O.D. 260/280 = 2.0
Reverse transcribe w/poly dT + T7 = cDNA
Transcribe with T7 + biotin UTP = cRNA
Purify probe/hybridize to chip
Wash and detect with avidin/PE + antibody amplification
Read fluorescent label
And deconvolve genes
Basic Bioinformatics-Scatterplot
[Figure: log-scale scatterplot of raw signal, sample E145 P22-N vs. sample E145 P4-N; both axes run from 0.01 to 10000.]
Transcript profiling of aged rat aorta.
Affymetrix GeneChip analysis of 10 aortas @ 20 mo. vs. 3 mo.
mRNAs Decreased in the Aged Aorta
Exp. 1 Signal  Exp. 1 Change  Exp. 2 Signal  Exp. 2 Change  Description
12884          -3.6           14901          -4.9           Egr-1 (3 probe sets)
9440           -5.7           25730          -3.5           collagen alpha1 type I (3 probe sets)
342            -3.3           330            -3.9           flavin-containing monooxygenase 1 (FMO-1)
7540           -2.8           5989           -3.3           cyclooxygenase isoform COX-2
1781           -3.4           2053           -3.5           leucine zipper protein mRNA
3090           -8.2           2291           -3.1           heat shock protein 70 (3 probe sets)
1718           -3.1           3836           -3.0           DNA polymerase alpha
6897           -2.3           4514           -2.7           phosphoenolpyruvate carboxykinase (GTP)
3552           -2.9           3284           -2.2           retinol-binding protein (RBP)
9063           -2.1           6580           -2.0           C4 complement protein
1372           -6.5           6282           -1.9           DnaJ-like protein (RDJ1)
8604           -2.2           10106          -2.2           plasminogen activator inhibitor-1 (PAI-1)
3845           -4.5           13019          -1.7           RCO4-1 gene for cytochrome c oxidase subunit IV
1708           -11.1          11044          -1.6           lipoprotein lipase
6593           -2.1           12816          -1.5           RTK40 homolog
3139           -2.1           7395           -1.5           ribonucleoprotein F
AND 18 ESTs
FAQs: How many replicates?
[Figure: Number of Genes Called Differentially Expressed (greater than 2-fold; y-axis 0 to 4500) as a Function of Number of Replicates (x-axis 1 to 7).]
Simple fold changes
• Crude, insensitive--but effective
Criteria:
Present
1.5-fold
up/down
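The two criteria above can be written as a simple filter, sketched below. It uses the signed fold convention seen in the tables of this deck (a negative value means a decrease, so -3.0 is threefold lower). How the "Present" call is produced is not described on the slide, so it is passed in as a plain boolean, and the function names are hypothetical.

```python
# Minimal sketch of a present-call plus fold-change filter.


def signed_fold(test, control):
    """Signed fold change: +x for an x-fold increase, -x for an x-fold decrease."""
    if test <= 0 or control <= 0:
        raise ValueError("signals must be positive")
    if test >= control:
        return test / control
    return -control / test


def passes_filter(test, control, present, threshold=1.5):
    """True if the gene is called present and changes by at least threshold-fold."""
    return present and abs(signed_fold(test, control)) >= threshold


print(signed_fold(300, 100))                   # 3.0
print(signed_fold(100, 300))                   # -3.0
print(passes_filter(150, 100, present=True))   # True
print(passes_filter(300, 100, present=False))  # False
```

Requiring a present call first keeps genes near the noise floor from producing large but meaningless ratios.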
Hierarchical clustering
Statistical testing and ontology
Gene Abbrev.  Fold   Lists  Description

Apoptosis
BAD           1.4    *      BCL2-antagonist of cell death
BCL2L1        6.6    *      BCL2-like 1 (BCL-XL)
CCND1         1.9    ***    cyclin D1, PRAD1
MDM2          2.2    *      Mdm2, p53 binding protein
PRSS25        ?      ?      serine protease 25-Omi/HtrA2
TNFRSF6       1.2    ***    TNF receptor superfamily, 6, fas, CD95
VDAC2         1.8    ***    voltage-dependent anion channel 2

Growth factors/regulators
FGF5          2.3    **     fibroblast growth factor 5
HDGF          -1.2   ***    hepatoma-derived growth factor (high-mobility group protein 1-like)
IGFBP3        -1.6   **     insulin-like growth factor binding protein 3 (2 sets)
IGFBP4        -1.5   *      insulin-like growth factor binding protein 4
LRP1          -1.7   **     LRP1, TGF-β Type V receptor
LTBP2         -1.3   ***    latent transforming growth factor beta binding protein 2
SMURF2        1.8    ***    SMAD-specific ubiquitin ligase
VEGFB         -1.5   ***    vascular endothelial growth factor B

Signalling
FKBP9         -2.4   *      FK506 binding protein 9, 63 kDa
JAK1          -1.2   *      Janus associated kinase 1
MAP3K12       -1.7   ***    mitogen-activated protein kinase kinase kinase 12
MAP3K4        -1.4   **     mitogen-activated protein kinase kinase kinase 4
PPIH          3.4    *      peptidyl prolyl isomerase H (cyclophilin H)
STAT1         1.4    **     signal transducer and activator of transcription 1
STAT3         -1.4   *      signal transducer and activator of transcription 3
STAT6         -1.3   ***    signal transducer and activator of transcription 6

Cell Cycle
CCND1         1.9    ***    cyclin D1, PRAD1 (3 sets)
CCNI          -1.6   ***    cyclin I
CDK11         -1.6   ***    cyclin-dependent kinase (CDC2-like) 11
CUL1          -1.3   **     cullin 1-cyclin D1 degrading
JUN           1.4    ***    v-Jun homolog
MDM2          2.2    *      Mdm2, p53 binding protein
PDGFRB        -2.1   **     platelet-derived growth factor receptor, beta

Chromatin remodeling
CBFA2T1       -1.6   *      core-binding factor, cyclin D-related
CHD3          -1.5   ***    chromodomain helicase DNA binding protein 3
HDAC4         -1.5   *      histone deacetylase 4
HIST1H2BN     1.8    **     histone 1, both H2bn and H2bd
HIST1H2AL     1.7    **     histone 1, H2al
MYST1         -1.5   ***    MYST histone acetyltransferase 1
POLB          1.8    ***    polymerase (DNA directed), beta

Cholesterol/Fatty acid/Membranes
ATP8B1        2.3    ***    Potential phospholipid-transporting ATPase
FADS1         -1.4   **     fatty acid desaturase 1
LRP1          -1.7   **     low density lipoprotein-related protein 1
PLTP          -1.4   ***    phospholipid transfer protein
SRD5A1        1.9    **     steroid-5-alpha-reductase, alpha 1

Extracellular Matrix
COL1A2        -1.3   ***    collagen, type I, alpha 2 (2 sets)
COL6A1        -1.6   ***    collagen, type VI, alpha 1
FBN1          -1.3   ***    fibrillin 1 (Marfan syndrome)
FN1           -1.3   **     fibronectin 1 (2 sets)
LAMB2         -1.4   ***    laminin, beta 2 (laminin S)
LAMA2         -1.6   ***    laminin, alpha 2 (merosin)
RECK          ?      ***    reversion-inducing cys-rich w/Kazal (MMP9 regulator)
TIMP1         -1.5   **     tissue inhibitor of metalloproteinase 1 (2 sets)

Mitochondrial/Metabolic
AHCYL1        -1.5   ***    S-adenosylhomocysteine hydrolase-like 1 (3 sets)
ATP5J         1.2    ***    ATP synthase, H+ transporting, mitochondrial F0 complex, subunit F6
ETFA          1.3    ***    electron-transfer-flavoprotein, alpha polypeptide (glutaric aciduria II)
HCCS          1.4    ***    holocytochrome c synthase (cytochrome c heme-lyase)
TOMM34        1.4    ***    translocase of outer mitochondrial membrane 34

Stress/oxidant/antioxidant
DNAJA2        1.3    **     DnaJ (Hsp40) homolog, subfamily A, member 2
DNAJB4        2.7    ***    DnaJ (Hsp40) homolog B4 (2 sets), HLJ1
PSMF1         1.4    *      proteasome (prosome, macropain) inhibitor subunit 1 (PI31)
PSMB6         1.4    *      proteasome (prosome, macropain) subunit, beta type, 6
PTMA          -1.5   **     prothymosin, alpha (gene sequence 28)
SOD3          -1.6   **     superoxide dismutase 3, extracellular

Transcription factors
BLZF1         1.8    ***    basic leucine zipper nuclear factor 1 (JEM-1)
CEBPD         -1.4   **     CCAAT/enhancer binding protein (C/EBP), delta
JUN           1.4    ***    v-Jun homolog
MSC           -1.5   ***    musculin (lamin C homolog, repressor)
ZNF24         -1.3   ***    zinc finger protein 24 (KOX 17)
ZNF42         1.4    *      zinc finger protein 42 (myeloid-specific retinoic acid-responsive)
ZNF337        -1.3   *      zinc finger protein 337
Pathways of genetic information
Expression of Egr-1 mRNA in human lesions.
[Figure: Egr-1 and RhoA mRNA in lesion (L) and media (M) tissue from patients E213 and E217 at 5 and 65 minutes; one lane is labeled "H 20".]
Egr-1 mRNA and protein in lesions vs normal cells.
[Figure: A) Egr-1 mRNA in media vs. lesion (y-axis 0 to 20) for patients E197 and E196. B) Western blot of Egr-1 and actin in media (M) and lesion (L) from patients E197, E221, E240 and E243.]
Expression screening by GeneChip
• each oligo sequence (20 mer) is synthesized
as an 11 µm square (feature)
• each feature contains > 1 million copies of the oligo
• scanner resolution is about 2 µm (pixel)
• each gene is quantitated by 11 oligos and
compared to an equal # of mismatched controls
• 44,000 genes are evaluated with 11 matching oligos
and 11 mismatched oligos = about 1 x 10^6 features/chip
• features are photolithographically synthesized
onto a 2 x 2 cm glass substrate
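A quick arithmetic check of the numbers in this list, using only the gene count, probes per gene, feature size and scanner resolution quoted above. The product of the gene and probe counts comes to just under one million features, and each feature is covered by roughly 30 scanner pixels. The function names are illustrative.

```python
# Minimal sketch: feature count and pixels per feature from the slide's figures.


def feature_count(genes, probes_per_gene):
    """Total features when every perfect match probe has one mismatch partner."""
    return genes * probes_per_gene * 2


def pixels_per_feature(feature_um, pixel_um):
    """Approximate number of scanner pixels covering one square feature."""
    return (feature_um / pixel_um) ** 2


print(feature_count(44_000, 11))      # 968000
print(pixels_per_feature(11, 2))      # 30.25
```

Having tens of pixels per feature is what allows the intensity of each feature to be integrated rather than read from a single point.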
GeneChip® Array Advantages – Specificity
[Figure: oligo arrays (24 µm features) give a detection pattern across the probe set for Gene “on” vs. Gene “off”; cDNA arrays (~150 µm) give a single spot.]
Limitations to all microarrays.
- dynamic range of gene expression:
very difficult to simultaneously detect low and high
abundance genes accurately
- each gene has multiple splice variants
2 splice variants may have opposite effects (e.g. trk)
arrays can be designed for splicing, but complexity increases 5X
- translational efficiency is a regulated process:
mRNA level does not necessarily correlate with protein level
- proteins are modified post-translationally
glycosylation, phosphorylation, etc.
- pathogens might have little ‘genomic’ effect
CardioChip
in silico workup
Lipoprotein genes/variants
Atherosclerosis markers
Restenosis markers
Coagulation factors
Stress markers
Inflammatory markers
Infectious agents
Heart failure predictors