DNA, Gene, and Genome
Translating Machinery for Genetic Information
Transcription factors
mRNA levels
Automated DNA Sequencing
Data Increase (from NCBI web site)
Partial Display of Human Draft Sequence (Nature, 2001)
Human Genome Map at NCBI
60-70 kDa protein interacting with prostate cancer suppressor
MGALRPTLLPPSLPLLLLLMLGMGCWAREVLVPEGPLYRVAGTAVSISCNVTGY
EGPAQQNFEWFLYRPEAPDTALGIVSTKDTQFSYAVFKSRVVAGEVQVQRLQGD
AVVLKIARLQAQDQGIYECTPSTDTRYLGSYSGKVELRVLPDVLQVSAAPPGPR
GRQAPTSPPRMTVHEGQELALGCLARTSTQKHTHLAVSFGRSVPEAPVGRSTLQ
EVVGIRSDLAVEAGAPYAERLAAGELRLGKEGTDRYRMVVGGAQAGDAGTYH
CTAAEWIQDPDGSWAQIAEKRAVLAHVDVQTLSSQLAVTVGPGERRIGPGEPLE
LLCNVSGALPPAGRHAAYSVGWEMAPAGAPGPGRLVAQLDTEGVGSLGPGYE
GRHIAMEKVASRTYRLRLEAARPGDAGTYRCLAKAYVRGSGTRLREAASARSR
PLPVHVREEGVVLEAVAWLAGGTVYRGETASLLCNISVRGGPPGLRLAASWWV
ERPEDGELSSVPAQLVGGVGQDGVAELGVRPGGGPVSVELVGPRSHRLRLHSL
GPEDEGVYHCAPSAWVQHADYSWYQAGSARSGPVTVYPYMHALDTLFVPLL
VGTGVALVTGATVLGTITCCFMKRLRKR
Molecular biology databases
• Sequence databases
– Annotated
– Low-annotation
– Specialized
• Structural databases
• Motif databases
• Genome databases
• Proteome databases
• RNA expression
• Literature
• Populations
• Mutations
• Polymorphisms
• Organisms
• Pathways
Mutations/polymorphisms
Promoters
ESTs
Tissues and cells
DNA motifs
RNA expression
DNA sequences
Molecular Phylogeny
Substrates
Transcription Factors
Metabolic pathways
Genome maps
Protein sequences
Protein structures
Gene Family
Protein motifs
Databases formats
• Relational databases
– GDB, GSDB, MGD etc.
– Vendors: Sybase, Oracle etc.
• Flat file databases
– GenBank, SWISS-PROT etc.
• Object-oriented databases
– ACeDB, AtDB etc.
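To make the flat-file idea concrete, here is a minimal Python sketch that reads one GenBank-style record. The record text is a made-up toy example and the function name `parse_flat_record` is hypothetical; real GenBank entries carry many more fields (features, references, and so on) that this sketch ignores.

```python
def parse_flat_record(text):
    """Parse one GenBank-style flat-file record into a dict (toy sketch)."""
    record = {"sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):        # "//" terminates a record
            break
        if line.startswith("ORIGIN"):    # sequence lines follow ORIGIN
            in_sequence = True
            continue
        if in_sequence:
            # Sequence lines look like "        1 aatggtaccg atgacctgga";
            # keep only the letters, dropping position numbers and spaces.
            record["sequence"] += "".join(ch for ch in line if ch.isalpha())
        elif line[:1].isalpha():
            # Top-level keyword line: keyword first, free text after it.
            key, _, value = line.partition(" ")
            record[key] = value.strip()
    return record


# Made-up record for illustration only (not a real database entry).
TOY_RECORD = """\
LOCUS       TOY0001   31 bp    DNA
DEFINITION  Made-up example record.
ACCESSION   TOY0001
ORIGIN
        1 aatggtaccg atgacctgga gcttggttcg a
//
"""

rec = parse_flat_record(TOY_RECORD)
print(rec["ACCESSION"], len(rec["sequence"]))
```

The whole record is plain text with keywords at the start of each line, which is why flat files are easy to read by eye but need a parser before they can be queried the way a relational database can.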
Molecular biology data types
Organisms
Mouse chromosome X
from the Mouse Genome Informatics project
http://www.informatics.jax.org/
Genome maps
Molecular biology data types
Organisms
Genome maps
DNA sequences
RNA sequences
...AATGGTACCGATGACCTGGAGCTTGGTTCGA...
Molecular biology data types
Organisms
Genome maps
DNA sequences
RNA sequences
Protein sequences
...TRLRPLLALLALWPPPPARAFVNQHLCGSHLVEA...
Molecular biology data types
Organisms
Genome maps
DNA sequences
RNA sequences
Protein sequences
Protein structures
PDB entry 1CIS
P. Osmark, P. Sorensen, F.M. Poulsen
RNA structures
Molecular biology data types
Organisms
Genome maps
DNA motifs
RNA expression
DNA sequences
RNA sequences
Protein sequences
Protein structures
Protein motifs
RNA structures
DNA microarrays measure variations in RNA levels
The full yeast genome on a chip
Red dots: genes whose RNA level increased
Green dots: genes whose RNA level decreased
DeRisi et al., Science 278:680
http://cmgm.Stanford.EDU/pbrown/
Substrates for High Throughput Arrays
• Nylon Membrane: single label (P33)
• GeneChip: single label (biotin, streptavidin)
• Glass Slides: dual label (Cy3, Cy5)
GeneChip® Probe Arrays
GeneChip Probe Array (1.28 cm)
Hybridized Probe Cell (24 µm)
Single stranded, labeled RNA target
Oligonucleotide probe
Millions of copies of a specific oligonucleotide probe
>200,000 different complementary probes
Image of Hybridized Probe Array
GeneChip® Expression Array Design
Gene sequence (5´ to 3´)
Multiple oligo probes
Probes designed to be Perfect Match
Probes designed to be Mismatch
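The perfect match / mismatch pairing can be sketched in a few lines of Python. The slide does not say how a mismatch probe is built; the sketch assumes the commonly described convention that the mismatch probe equals the perfect-match probe with its central base swapped for its complement (position 13 of a 25-mer). The function names, the tiling step, and the target sequence are hypothetical, and strand orientation is ignored for simplicity.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(perfect_match):
    """Return the mismatch partner: central base replaced by its complement."""
    mid = len(perfect_match) // 2   # index 12 for a 25-mer (the 13th base)
    return (perfect_match[:mid]
            + COMPLEMENT[perfect_match[mid]]
            + perfect_match[mid + 1:])


def probe_pairs(sequence, probe_len=25, step=10):
    """Tile perfect-match probes along a sequence, pairing each with its mismatch."""
    pairs = []
    for start in range(0, len(sequence) - probe_len + 1, step):
        pm = sequence[start:start + probe_len]
        pairs.append((pm, mismatch_probe(pm)))
    return pairs


# Made-up target sequence, for illustration only.
TOY_TARGET = "AATGGTACCGATGACCTGGAGCTTGGTTCGAATGGTACCGATGAC"

pairs = probe_pairs(TOY_TARGET)
for pm, mm in pairs:
    print(pm, mm)
```

Because the two probes differ at a single base, signal on the mismatch probe gives an estimate of non-specific hybridization for that probe pair.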
Procedures for Target Preparation
1. Cells
2. Poly (A)+ / Total RNA
3. cDNA
4. IVT (Biotin-UTP, Biotin-CTP): labeled transcript
5. Fragment (heat, Mg2+): labeled fragments
6. Hybridize (16 hours)
7. Wash & Stain
8. Scan
Microarray Technology
Printing Arrays on 50 slides
NSF Soybean Functional Genomics
Steve Clough / Vodkin Lab
Ratio of expression of genes from two sources
Cells from condition A: total RNA or mRNA, cDNA, Label Dye 1
Cells from condition B: total RNA or mRNA, cDNA, Label Dye 2
Mix
Spot readout: equal, over, under
NSF / U of Illinois Microarray Workshop - Steve Clough / Vodkin Lab
GSI Lumonics
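The equal / over / under readout of a two-dye spot can be expressed as a log ratio of the two channel intensities. The following Python sketch is only an illustration: the two-fold cutoff, the function name `classify_spot`, and the choice to report condition A relative to condition B are all assumptions, not values taken from the slides.

```python
import math


def classify_spot(dye1, dye2, fold=2.0):
    """Classify one spot from its two channel intensities.

    dye1: intensity for condition A, dye2: intensity for condition B.
    Returns ("over" | "under" | "equal", log2 ratio of A to B).
    """
    log_ratio = math.log2(dye1 / dye2)
    threshold = math.log2(fold)   # assumed cutoff, 2-fold by default
    if log_ratio >= threshold:
        return "over", log_ratio
    if log_ratio <= -threshold:
        return "under", log_ratio
    return "equal", log_ratio


# Hypothetical intensities for three spots.
for a, b in [(400.0, 100.0), (100.0, 400.0), (100.0, 110.0)]:
    print(a, b, classify_spot(a, b))
```

Working on the log scale makes a two-fold increase and a two-fold decrease symmetric around zero, which a raw ratio does not.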
Cattle and Soy Controls
Beta Actin
PKG
HPRT
Beta 2 microglobulin
Rubisco
AB binding protein
Major latex protein homologue (MSG)
Array of cattle and soy spiking controls. 50 µg of cattle brain total RNA was labeled with Cy3 (green). 1 µl each of in vitro transcribed soy Rubisco (5 ng), AB binding protein (0.5 ng) and MSG (0.05 ng) were labeled with Cy5. The two labeled samples were cohybridized on superamine slides (Telechem, Inc.). To the right of each set of spots are five negative controls (water).
Fetal Spleen-Cy3
Adult Spleen-Cy5
Labeled spots: IgM, MYLK, IgM heavy chain, COL1A2
GenePix Image Analysis Software
Placenta vs. Brain – 3800 Cattle Placenta Array
cy3
cy5
GeneFilter Comparison Report
GeneFilter 1 Name: O2#1 8-20-99adjfinal
GeneFilter 2 Name: N2#1finaladj
INTENSITIES (RAW and NORMALIZED)
ORF NAME  GENE NAME  CHRM  F  G  R  C  RAW GF1  RAW GF2  NORM GF1   NORM GF2   DIFFERENCE  RATIO
YAL001C   TFC3       1     1  A  1  2  12.03    7.38     403.83     209.79     194.04      1.92
YBL080C   PET112     2     1  A  1  3  53.21    35.62    1,786.11   1,013.13   772.98      1.76
YBR154C   RPB5       2     1  A  1  4  79.26    78.51    2,660.73   2,232.86   427.87      1.19
YCL044C   -          3     1  A  1  5  53.22    44.66    1,786.53   1,270.12   516.41      1.41
YDL020C   SON1       4     1  A  1  6  23.80    20.34    799.06     578.42     220.64      1.38
YDL211C   -          4     1  A  1  7  17.31    35.34    581.00     1,005.18   -424.18     -1.73
YDR155C   CPH1       4     1  A  1  8  349.78   401.84   11,741.98  11,428.10  313.88      1.03
YDR346C   -          4     1  A  1  9  64.97    65.88    2,180.87   1,873.67   307.21      1.16
YAL010C   MDM10      1     1  A  2  2  13.73    9.61     461.03     273.36     187.67      1.69
YBL088C   TEL1       2     1  A  2  3  8.50     7.74     285.38     220.01     65.37       1.30
YBR162C   -          2     1  A  2  4  226.84   293.83   7,614.82   8,356.39   -741.57     -1.10
YCL052C   PBN1       3     1  A  2  5  41.28    34.79    1,385.79   989.41     396.38      1.40
YDL028C   MPS1       4     1  A  2  6  7.95     6.24     266.99     177.34     89.65       1.51
YDL219W   -          4     1  A  2  7  16.08    11.33    539.93     322.20     217.74      1.68
YDR163W   -          4     1  A  2  8  19.13    14.19    642.17     403.56     238.61      1.59
YDR354W   TRP4       4     1  A  2  9  62.24    40.74    2,089.48   1,158.64   930.84      1.80
YAL018C   -          1     1  A  3  2  10.72    8.81     359.75     250.60     109.15      1.44
YBL096C   -          2     1  A  3  3  10.91    8.98     366.40     255.40     111.00      1.43
YBR169C   SSE2       2     1  A  3  4  17.33    27.81    581.80     790.84     -209.05     -1.36
YCL060C   -          3     1  A  3  5  17.99    24.75    603.96     703.75     -99.79      -1.17
YDL036C   -          4     1  A  3  6  14.22    8.86     477.39     251.94     225.44      1.89
YDL227C   HO         4     1  A  3  7  25.61    31.52    859.71     896.46     -36.75      -1.04
YDR171W   HSP42      4     1  A  3  8  102.08   98.37    3,426.83   2,797.58   629.25      1.22
YDR362C   -          4     1  A  3  9  16.32    12.95    547.96     368.39     179.57      1.49
YAL026C   DRS2       1     1  A  4  2  11.32    7.97     379.85     226.53     153.33      1.68
YBL102W   SFT2       2     1  A  4  3  55.88    63.74    1,875.82   1,812.81   63.02       1.03
YBR177C   -          2     1  A  4  4  63.31    29.03    2,125.20   825.60     1,299.60    2.57
YCL068C   -          3     1  A  4  5  8.33     4.47     279.51     127.16     152.35      2.20
YDL044C   MTF2       4     1  A  4  6  11.73    6.96     393.88     198.07     195.81      1.99
YDL235C   YPD1       4     1  A  4  7  38.71    30.20    1,299.33   858.83     440.50      1.51
YDR179C   -          4     1  A  4  8  12.77    11.05    428.60     314.12     114.48      1.36
YDR370C   -          4     1  A  4  9  16.70    15.30    560.62     435.13     125.49      1.29
YAL034C   FUN19      1     1  A  5  2  20.89    24.21    701.32     688.59     12.73       1.02
YBL111C   -          2     1  A  5  3  22.38    13.67    751.39     388.69     362.70      1.93
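The DIFFERENCE and RATIO values in the report appear to follow a simple rule: the difference is normalized GF1 minus normalized GF2, and the ratio is the larger normalized value divided by the smaller, written as negative when GF2 is the larger one. The Python sketch below reproduces two rows under that assumption; the function name `compare` is hypothetical and the rule is inferred from the numbers, not stated in the report.

```python
def compare(gf1, gf2):
    """Return (difference, signed fold ratio) for two normalized intensities."""
    difference = gf1 - gf2
    if gf1 >= gf2:
        ratio = gf1 / gf2        # GF1 higher: positive fold change
    else:
        ratio = -(gf2 / gf1)     # GF2 higher: negative fold change
    return round(difference, 2), round(ratio, 2)


# Normalized intensities copied from the report rows.
print("YAL001C", compare(403.83, 209.79))
print("YDL211C", compare(581.00, 1005.18))
```

A signed fold ratio of this kind never takes values between -1 and 1, so it is convenient for reading but awkward for statistics; log ratios are usually preferred for further analysis.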
Microarray Data Process
1. Experimental Design
2. Image Analysis – raw data
3. Normalization – “clean” data
4. Data Filtering – informative data
5. Model building
6. Data Mining (clustering, pattern recognition, etc.)
7. Validation
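Steps 3 and 4 of the process can be illustrated with a minimal Python sketch. It assumes one common choice for each step: global median centering of log10 ratios as the normalization, and an absolute cutoff on the normalized log ratio as the filter. The cutoff of 0.3, the gene names, and the intensities are hypothetical, and other normalization schemes exist.

```python
import math
from statistics import median


def normalize_and_filter(raw_a, raw_b, cutoff=0.3):
    """Median-center log10 ratios, then keep genes with |log ratio| > cutoff.

    raw_a, raw_b: dicts mapping gene name to raw intensity in each sample.
    Returns (normalized log ratios, informative subset).
    """
    log_ratios = {g: math.log10(raw_a[g] / raw_b[g]) for g in raw_a}
    center = median(log_ratios.values())      # global shift between samples
    normalized = {g: r - center for g, r in log_ratios.items()}
    informative = {g: r for g, r in normalized.items() if abs(r) > cutoff}
    return normalized, informative


# Hypothetical raw data: sample A is uniformly about 2x brighter than B.
raw_a = {"g1": 200.0, "g2": 400.0, "g3": 2000.0, "g4": 100.0, "g5": 200.0}
raw_b = {"g1": 100.0, "g2": 200.0, "g3": 100.0, "g4": 500.0, "g5": 100.0}

normalized, informative = normalize_and_filter(raw_a, raw_b)
print(sorted(informative))
```

In this toy data the uniform two-fold brightness difference would by itself push every gene past the cutoff; after median centering only the genes that change relative to the bulk remain.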
Scatterplot of Normalized Data (Fetal vs. Adult; cutoffs at <-0.3 and >0.3)
Complexity Levels of Microarray Experiments:
1. Compare genes in a control situation versus a treatment situation
• Example: Is the level of expression (up-regulated or down-regulated) significantly different in the two situations? (drug design application)
• Methods: t-test, Bayesian approach
2. Find multiple genes that share common functionalities
• Example: Which related genes depend on one another?
• Methods: Clustering (hierarchical, k-means, self-organizing maps, neural network, support vector machines)
3. Infer the underlying gene and protein networks that are responsible for the patterns and functional pathways observed
• Example: How are genes regulated at the system level?
• Directions: mining regulatory regions, modeling regulatory networks on a global scale
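For the first complexity level, a per-gene t-test compares replicate measurements under control and treatment. The Python sketch below computes Welch's t statistic and its approximate degrees of freedom using only the standard library. It is one possible variant of the t-test (the slides do not say which one is meant), the replicate values are hypothetical, and turning the statistic into a p-value would need a t distribution from a statistics package.

```python
import math
from statistics import mean, variance


def welch_t(control, treated):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(control), len(treated)
    v1 = variance(control) / n1   # squared standard error, control
    v2 = variance(treated) / n2   # squared standard error, treated
    t = (mean(treated) - mean(control)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical log expression values for one gene, three replicates each.
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]

t_stat, dof = welch_t(control, treated)
print(round(t_stat, 3), round(dof, 1))
```

When thousands of genes are tested this way, some will look significant by chance alone, so the per-gene results normally need a multiple-testing correction.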
Comparing data from two experiments.
Clustering to extract genes which tightly co-express.
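One simple way to pull out tightly co-expressed genes is to link any two genes whose expression profiles are highly correlated and report the connected groups (single-linkage grouping at a correlation cutoff). The Python sketch below does this with a hand-written Pearson correlation. The cutoff of 0.9, the gene names, and the profiles are hypothetical, and this is not necessarily the method used for the figures in these slides.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_groups(profiles, min_r=0.9):
    """Group genes linked by pairwise correlation >= min_r."""
    genes = list(profiles)
    group_of = {g: i for i, g in enumerate(genes)}   # each gene starts alone
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if pearson(profiles[g], profiles[h]) >= min_r:
                old, new = group_of[h], group_of[g]
                for k in genes:                       # merge the two groups
                    if group_of[k] == old:
                        group_of[k] = new
    groups = {}
    for g in genes:
        groups.setdefault(group_of[g], []).append(g)
    return sorted(groups.values())


# Hypothetical expression profiles across four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.0, 4.0, 2.2],
}

groups = coexpression_groups(profiles)
print(groups)
```

Correlation compares the shape of the profiles rather than their absolute level, so geneA and geneB group together even though geneB is roughly twice as strongly expressed.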
Statistical filters used: the genes present (Presence Call in Affymetrix) in drug treated, ANOVA p<0.02 between groups.
Red indicates increased expression, and green is decreased expression (Log(fold change)).
Genesight 3 (Biodiscovery Software, www.biodiscovery.com)
NO DRUG, 1nM Drug, 1 mM Drug
Statistical filters used: the genes present (Presence Call in Affymetrix) in absence of drug, ANOVA p<0.02 between groups.
NO DRUG, 1nM Drug, 1 mM Drug
Self Organizing Maps
Molecular Classification of Cancer
Gene Expression Profile of Aging and Its Retardation by Caloric Restriction
Cheol-Koo Lee, Roger G. Klopp, Richard Weindruch, Tomas A. Prolla
Data Mining Methods
Classification, Regression (Predictive Modeling)
Clustering (Segmentation)
Association Discovery (Summarization)
Change and deviation detection
Dependency Modeling
Information Visualization