Gene Search by use of MySQL
• Background – myself
• NsGene – DTU satellite
• Parkinson Disease (Affymetrix GeneChip)
• Analysis of fetal brain tissue
• Gene Discovery
• Use of MySQL
• Search for a new Growth Factor family
Background
• Thomas Nordahl Petersen
• Chemist, Ph.D. in protein crystallography, University of Copenhagen
• Computational Scientist, SBI-AT
• Prediction of protein structure, secondary structure, fold recognition, homology modelling
• Bioinformatics - Gene discovery, NsGene
Develop novel cell- and gene-based products for the
treatment of neurological diseases.
ECT Products
ECT for Parkinson’s Disease
• Growth of cells in a capsule matrix
• The therapeutic protein is released directly in the relevant brain area
• Safe delivery across the blood-brain barrier
• Michael J. Fox foundation granted US $3 million
to support a clinical “proof-of-concept” (May 2004)
Factor Products
• Identification of novel genes by
use of bioinformatics
• Neublastin (GDNF family –
potent neuroprotective effects)
• Scanning the human genome or assembled protein sets
for different features of interest
Two case studies
• Parkinson related gene(s)
•Affymetrix GeneChip experiments
• Novel factor products
• Scanning an assembled protein set
• A new growth factor family
Parkinson Disease
Degenerative central nervous system (CNS) disorder
Parkinson Disease
Loss of dopamine producing brain cells
Parkinson’s Disease
• Dopamine from Substantia
nigra activates neurons in
Striatum/Basal ganglia
• Important for initiation of
movement
Cure for Parkinson’s Disease ?
Parkinson's disease may be cured provided that new dopamine-producing
cells replace the dead ones.
Dopamine-producing brain cells from aborted foetuses have been
transplanted into the brains of Parkinson's patients and in some cases
cured the disease. Brain tissue from approx. 6 foetuses was needed.
Major ethical problems!
Search for a protein drug is the only valid option
Parkinson Disease
Dopamine producing cells
• Dopaminergic neurons can be found in the ventral
part of the mesencephalon (VM) from approximately 6
weeks
• No dopaminergic neurons can be found in the
neighbouring dorsal part (DM).
• Dopaminergic differentiation by use of GeneChips to
compare the expression profiles of VM and DM
Fetal brain tissue
[Figure: midbrain mesencephalon. Vm: + dopamine producing cells; Dm: - dopamine producing cells]
• Aborted foetus brain tissue – Karolinska hospital
• Foetuses of age 6-10 weeks, 2 cases
[Figure: midbrain mesencephalon. Dm: - dopamine producing cells; Vm: + dopamine producing cells]
Dopamine producing cells at the interface ?
Isolate the two samples (Vm/Dm)
RNA purification + amplification
Affymetrix GeneChip analysis
GenePublisher
(program by Steen Knudsen)
• Scale, normalize the Affymetrix GeneChip experiments
A1      A2      A2      B1      B2      B2      P-value
319     315     314     44      48      38      1.26e-07
314     334     327     443     434     444     6.55e-05
1980    1974    1973    1801    1785    1763    6.77e-05
123     123     126     87      88      93      8.01e-05
103     101     104     77      78      73      0.000112
107     107     111     79      77      82      0.000124
128     123     117     189     184     196     0.000142
179     179     186     145     147     149     0.000191
78      77      79      86      87      87      0.000202
96      90      93      136     129     138     0.000215
Volcano plot
[Figure: volcano plot. Axes: P-value and log2 fold change]
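The two volcano plot axes can be computed directly from the replicate intensities. The sketch below is illustrative only: it uses three rows from the GenePublisher values listed above, and it assumes that the fold change is taken as the mean of the B replicates relative to the mean of the A replicates (the slides do not state the direction, nor which group is Vm or Dm). The function name `volcano_point` is hypothetical.

```python
import math

# Three rows copied from the GenePublisher values above:
# (A replicates, B replicates, P-value).
rows = [
    ((319, 315, 314), (44, 48, 38), 1.26e-07),
    ((314, 334, 327), (443, 434, 444), 6.55e-05),
    ((1980, 1974, 1973), (1801, 1785, 1763), 6.77e-05),
]


def volcano_point(a, b, p_value):
    """Return (log2 fold change of B relative to A, -log10 of the P-value).

    Assumption: fold change = mean(B) / mean(A); the slides do not say
    which direction GenePublisher reports.
    """
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return math.log2(mean_b / mean_a), -math.log10(p_value)


points = [volcano_point(a, b, p) for a, b, p in rows]
for x, y in points:
    print(f"log2FC={x:+.2f}  -log10(P)={y:.2f}")
```

Plotting -log10(P) rather than the raw P-value is a common convention, so that the most significant genes end up at the top of the plot.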
Assigning Affymetrix GeneChip probes
to a protein sequence
~20,000 probes on each of the A/B Affymetrix chips. The
probes are normally not part of a protein sequence.
[Diagram: Affymetrix probe -> Blast -> Unigene sequence (cDNA, 5’ to 3’) -> Blast -> IPI protein sequence (inferred)]
Internal database
Signal Peptide prediction
Conclusion – so far
• The most up-regulated genes include several ‘known’
genes like dopamine transporter (good positive control)
• The most interesting genes are the ‘unknowns’ that
were up-regulated in Vm. Further analysis is ongoing.
A new growth factor family
• Criteria
• ‘Unknown’ family of protein sequences
• Growth factor like (Cys-Cys, SigP)
• Data source
• Assembled protein set/genomic data
• Search criteria are dynamic
• Use of MySQL
MySQL – a relational database system (queried with SQL)
• Data are stored in tables as a ’black box’
• Data physically separated from user
• Language is easy to read and understand
• Complex search queries
• Combine data in different tables/databases
• Result can be obtained in seconds
• Search criteria can be changed
Parsing Blast files
(Preparing data for MySQL)
# Qname        Dname       Mlen  Alen  Qlen  %a_id  %q_id  e-value  Qfrom  Qto  Dlen  Dfrom  Dto
IPI00000001.1  STAU_HUMAN   577   577   577  100.0  100.0  0.0          1  577   577      1   577
IPI00000005.1  RASN_HUMAN   189   189   189  100.0  100.0  e-106        1  189   189      1   189
IPI00000006.1  RASH_HUMAN   189   189   189  100.0  100.0  e-106        1  189   189      1   189
IPI00000009.1  RASK_HUMAN   189   189   189  100.0  100.0  e-106        1  189   189      1   189
IPI00000010.1  RASL_HUMAN   188   188   188  100.0  100.0  e-105        1  188   188      1   188
IPI00000012.3  ZNT1_MOUSE    86   261   240   33.0   35.8  1e-32        1  230   503    248   500
IPI00000013.1  CSL2_HUMAN   334   334   334  100.0  100.0  0.0          1  334   334      1   334
IPI00000015.2  SFR4_HUMAN   494   494   494  100.0  100.0  0.0          1  494   494      1   494
IPI00000016.1  LMA3_MOUSE   114   145   145   78.6   78.6  9e-62        1  145  3333   1521  1665
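A minimal sketch of how one line of the parsed Blast summary above could be turned into a record before loading it into MySQL. The function name `parse_hit` and the dictionary keys are hypothetical. The reading of `% a_id` as Mlen/Alen and `% q_id` as Mlen/Qlen is an assumption; it is consistent with the numbers in the ZNT1_MOUSE row. The e-value is kept as text because Blast prints values such as `e-106`.

```python
FIELDS = ["Qname", "Dname", "Mlen", "Alen", "Qlen", "a_id", "q_id",
          "evalue", "Qfrom", "Qto", "Dlen", "Dfrom", "Dto"]
INT_FIELDS = ("Mlen", "Alen", "Qlen", "Qfrom", "Qto", "Dlen", "Dfrom", "Dto")


def parse_hit(line):
    """Split one whitespace-separated summary line into a typed dict."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    hit = dict(zip(FIELDS, values))
    for name in INT_FIELDS:
        hit[name] = int(hit[name])
    for name in ("a_id", "q_id"):
        hit[name] = float(hit[name])
    return hit  # 'evalue' stays a string, e.g. 'e-106'


hit = parse_hit(
    "IPI00000012.3 ZNT1_MOUSE 86 261 240 33.0 35.8 1e-32 1 230 503 248 500"
)

# Consistency check under the assumption stated above.
recomputed_a_id = round(100 * hit["Mlen"] / hit["Alen"], 1)
recomputed_q_id = round(100 * hit["Mlen"] / hit["Qlen"], 1)
print(recomputed_a_id, recomputed_q_id)
```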
Storing data from blast alignments
Field            Type
query_db         enum('hs_2_18','hs_2_23','affym','mm_1_11','affym_mouse')
query_acc        varchar(20)
target_db        enum('swissp','mm_1_11','sid','sid_mouse')
target_acc       varchar(20)
align_len        smallint(6)
match_len        smallint(6)
query_len        smallint(6)
perc_align_len   float(5,1)
perc_query_len   float(5,1)
minus_ln_e       float(6,2)
query_from       smallint(6)
query_to         smallint(6)
target_from      smallint(6)
target_to        smallint(6)
target_len       int(11)
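A hedged sketch of how an alignment could be stored with the fields listed above. Several things here are assumptions rather than the author's setup: SQLite from the Python standard library stands in for MySQL (so the enum columns become plain TEXT), `minus_ln_e` is taken to mean the negative natural logarithm of the Blast e-value, an e-value of 0.0 is capped at 999, and the `query_db` label of the example row is chosen arbitrarily. The table name `blastdb` is taken from the MySQL example slide.

```python
import math
import sqlite3


def minus_ln_e(evalue_text, cap=999.0):
    """Convert a Blast e-value string to -ln(e), capped at `cap` (assumed)."""
    # Blast prints very small e-values as 'e-106'; prefix '1' for float().
    if evalue_text.startswith("e"):
        evalue_text = "1" + evalue_text
    e = float(evalue_text)
    if e <= 0.0:
        return cap
    return min(cap, round(-math.log(e), 2))


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE blastdb (
        query_db TEXT, query_acc TEXT, target_db TEXT, target_acc TEXT,
        align_len INTEGER, match_len INTEGER, query_len INTEGER,
        perc_align_len REAL, perc_query_len REAL, minus_ln_e REAL,
        query_from INTEGER, query_to INTEGER,
        target_from INTEGER, target_to INTEGER, target_len INTEGER
    )""")

# The ZNT1_MOUSE alignment from the parsed Blast summary;
# 'hs_2_23' as query_db is an assumption.
row = ("hs_2_23", "IPI00000012", "swissp", "ZNT1_MOUSE",
       261, 86, 240, 33.0, 35.8, minus_ln_e("1e-32"),
       1, 230, 248, 500, 503)
conn.execute("INSERT INTO blastdb VALUES (" + ",".join("?" * 15) + ")", row)

stored = conn.execute("SELECT target_acc, minus_ln_e FROM blastdb").fetchone()
print(stored)
```

Storing a log-transformed e-value keeps very small values within the range of a `float(6,2)` column and lets a query threshold on it with a simple comparison.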
MySQL example
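-- Human (hs_2_23) proteins that have a mouse (mm_1_11) hit, no SwissProt hit,
-- and a predicted signal peptide cleavage site between positions 15 and 45.
-- Note: a missing hit is compared as the literal string 'NULL', not SQL NULL.
SELECT
a.query_db, a.query_acc,
a.target_db, a.target_acc,
a.perc_align_len, a.minus_ln_e,
b.target_db, b.target_acc,
c.cleavage_site
FROM
blastdb AS a, blastdb AS b, signalp AS c
WHERE
a.query_db='hs_2_23' AND a.target_db = 'mm_1_11' AND
a.target_acc != 'NULL' AND b.target_db='swissp' AND
a.query_acc=b.query_acc AND b.target_acc='NULL' AND
c.query_db='hs_2_23' AND c.query_acc = a.query_acc AND
c.cleavage_site >= 15 AND c.cleavage_site<=45;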
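The query can be tried on a toy data set. This is a sketch under stated assumptions, not the NsGene database: SQLite from the Python standard library stands in for MySQL, the tables hold only the columns the query touches, and the entry `IPI_KNOWN` with its mouse hit `IPI_MOUSE` is a made-up counter-example that has a SwissProt hit. The self-join is written with explicit JOIN syntax but applies the same conditions, including the comparison against the literal string 'NULL'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blastdb (query_db TEXT, query_acc TEXT, target_db TEXT,
                          target_acc TEXT, perc_align_len REAL,
                          minus_ln_e REAL);
    CREATE TABLE signalp (query_db TEXT, query_acc TEXT,
                          cleavage_site INTEGER);
""")
conn.executemany("INSERT INTO blastdb VALUES (?,?,?,?,?,?)", [
    # Entry taken from the output slide: mouse hit, no SwissProt hit.
    ("hs_2_23", "IPI00000111", "mm_1_11", "IPI00223686", 48.6, 999.0),
    ("hs_2_23", "IPI00000111", "swissp", "NULL", None, None),
    # Hypothetical counter-example: has a SwissProt hit, so it is excluded.
    ("hs_2_23", "IPI_KNOWN", "mm_1_11", "IPI_MOUSE", 90.0, 999.0),
    ("hs_2_23", "IPI_KNOWN", "swissp", "RASN_HUMAN", 100.0, 244.07),
])
conn.executemany("INSERT INTO signalp VALUES (?,?,?)", [
    ("hs_2_23", "IPI00000111", 35),
    ("hs_2_23", "IPI_KNOWN", 20),
])

query = """
    SELECT a.query_acc, a.target_acc, c.cleavage_site
    FROM blastdb AS a
    JOIN blastdb AS b ON a.query_acc = b.query_acc
    JOIN signalp AS c ON c.query_acc = a.query_acc
                     AND c.query_db = a.query_db
    WHERE a.query_db = 'hs_2_23'
      AND a.target_db = 'mm_1_11' AND a.target_acc != 'NULL'
      AND b.target_db = 'swissp'  AND b.target_acc = 'NULL'
      AND c.cleavage_site BETWEEN 15 AND 45
"""
hits = conn.execute(query).fetchall()
print(hits)
```

Joining `blastdb` to itself is what lets one row per query sequence carry both the mouse hit (alias `a`) and the absence of a SwissProt hit (alias `b`).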
Output from MySQL
query_db  query_acc    target_db  target_acc   perc_align_len  minus_ln_e  target_db  target_acc  cleavage_site
hs_2_23   IPI00000111  mm_1_11    IPI00223686  48.6            999.00      swissp     NULL        35
hs_2_23   IPI00000183  mm_1_11    IPI00108107  74.0            999.00      swissp     NULL        26
hs_2_23   IPI00000381  mm_1_11    IPI00128682  78.5            206.13      swissp     NULL        21
hs_2_23   IPI00001001  mm_1_11    IPI00221700  91.7            173.39      swissp     NULL        45
hs_2_23   IPI00001443  mm_1_11    IPI00221913  60.0            17.73       swissp     NULL        30
hs_2_23   IPI00001578  mm_1_11    IPI00122466  88.8            207.93      swissp     NULL        38
hs_2_23   IPI00001719  mm_1_11    IPI00120961  83.1            52.27       swissp     NULL        44
hs_2_23   IPI00001952  mm_1_11    IPI00225921  76.0            999.00      swissp     NULL        44
hs_2_23   IPI00002173  mm_1_11    IPI00112960  85.4            999.00      swissp     NULL        42
Clustering of protein sequences
• 47306 sequences clustered with Tribe-mcl into 13130 clusters
• Store in MySQL
• 1) Cluster size
• 2) Cys-Cys
[Figure: example aligned fragments ACPGICSKSCCPF / LTPALCSRTCCPY, with the values 230, 16 and 2 (3)]
Conserved Cys-Cys
• Many growth factor families
have their own specific Cys pattern, e.g. the TGF-β family.
• Transforming growth factor-β is a
multifunctional peptide that
controls proliferation,
differentiation and other functions
in many cell types.
• Search for Cys-pattern without
any a priori knowledge
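One way to look for a Cys pattern without a predefined motif is to reduce each sequence to the spacing between consecutive cysteines and compare those spacings within a cluster. The sketch below only illustrates that idea; it is not the method used for the search, and the function names `cys_spacing` and `cys_pattern` are hypothetical. The two test fragments are the ones shown on the clustering slide.

```python
def cys_spacing(sequence):
    """Return the number of residues between consecutive cysteines."""
    positions = [i for i, residue in enumerate(sequence.upper())
                 if residue == "C"]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


def cys_pattern(sequence):
    """Render the cysteine spacing as a PROSITE-like string."""
    if "C" not in sequence.upper():
        return ""
    parts = ["C"]
    for gap in cys_spacing(sequence):
        if gap:
            parts.append(f"x({gap})")
        parts.append("C")
    return "-".join(parts)


for fragment in ("ACPGICSKSCCPF", "LTPALCSRTCCPY"):
    print(fragment, cys_spacing(fragment), cys_pattern(fragment))
```

The two fragments share the trailing C-x(3)-C-C spacing, which is the kind of conserved feature a cluster-wide comparison would pick up.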
Search criteria
• Family cluster size > 1
• No SwissProt homologues
• Cys count > 4
• Signal Peptide
• Mouse homologue/orthologue
• 48 Families
• Manual inspection of alignments (- isoforms)
• Upload remaining sequences to internal database
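The automatic part of the search criteria above amounts to a conjunction of simple filters. A minimal sketch follows, assuming each family has already been summarised into one record; the `Family` class, its field names and the four example families are all hypothetical, and treating "Cys count" as a single number per family is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Family:
    name: str
    size: int                      # number of sequences in the cluster
    has_swissprot_homologue: bool
    cys_count: int                 # assumed: one cysteine count per family
    has_signal_peptide: bool
    has_mouse_homologue: bool


def passes(family):
    """Apply the criteria from the slide to one family record."""
    return (family.size > 1
            and not family.has_swissprot_homologue
            and family.cys_count > 4
            and family.has_signal_peptide
            and family.has_mouse_homologue)


# Hypothetical records, one failing each of three different criteria.
families = [
    Family("fam_a", 5, False, 10, True, True),   # passes every criterion
    Family("fam_b", 1, False, 6, True, True),    # singleton cluster
    Family("fam_c", 3, True, 8, True, True),     # known SwissProt homologue
    Family("fam_d", 2, False, 4, True, True),    # Cys count not above 4
]
candidates = [f.name for f in families if passes(f)]
print(candidates)
```

Keeping each criterion as a separate field makes it cheap to change a threshold and rerun the filter, which matches the point that the search criteria are dynamic.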
Internal database
Outcome from Gene Search
• Family including 5 sequences
• Predicted as growth factors/hormones
• Some of them expressed in the brain
[Figure: cys10A, cys10B, cys10C, cys10E, cys10D]
• Family including 2 sequences
• Labwork at NsGene
[Figure: gel lanes, panel 1: 100 bp ladder, Thymus, Thyroid gland, Trachea, Uterus, Colon, Small Intestine, Spinal Cord, Fetal Liver, Fetal brain, Pancreas, Neurosphere ctrl, dH2O, 100 bp ladder]
[Figure: gel lanes, panel 2: 100 bp ladder, Universal ref, Whole brain, Heart, Kidney, Liver, Lung, Placenta, Prostate, Salivary gland, Skeletal muscle, Spleen, Testis, 100 bp ladder]
Tissue-specific expression
Patent applications
• Existing patent claims ?
• Product claim
• Medical use
• Further functional analysis
• Knock-out mouse experiments
• Clinical tests
Gene discovery