Contest Beta Test of Bioinformatics Toolbox in Matlab
Hidden Markov Model for profile analysis of GPCR sequences
ShannChing Chen
GPCR Database
Class A Rhodopsin like
Class B Secretin like
Class C Metabotropic glutamate / pheromone
Class D Fungal pheromone
Class E cAMP receptors (Dictyostelium)
Frizzled/Smoothened family
PFAM Database
HMM Profile of GPCR Class C
Test Data Set
Class A Rhodopsin like
Class B Secretin like
Class C Metabotropic glutamate / pheromone
Number of sequences per class:
          GPCR   PFAM
Class A   1122     64
Class B    217     37
Class C    131     31
Commands
gethmmprof        % get PFAM HMM profile
gethmmalignment   % get sequences used in the PFAM HMM profile
hmmprofestimate   % estimate parameters of a profile HMM
getgenpept        % retrieve sequence information from the NCBI GenPept database
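As a rough illustration of how the retrieval commands above fit together with scoring, the following sketch fetches a PFAM profile and its seed sequences, then scores one sequence with hmmprofalign. It assumes the Bioinformatics Toolbox is installed and that the PFAM service is reachable from your MATLAB release. The key '7tm_3' (PFAM's GPCR class C family) is an assumption; the slides do not name the key that was used. Re-estimation with hmmprofestimate is not shown.

```matlab
% Sketch only: needs the Bioinformatics Toolbox and network access to PFAM.
% '7tm_3' is an assumed PFAM key for GPCR class C; it is not given in the slides.
pfamKey = '7tm_3';

% Retrieve the PFAM HMM profile for the family.
model = gethmmprof(pfamKey);

% Retrieve the seed sequences used to build that profile
% (a structure array with Header and Sequence fields).
seedSeqs = gethmmalignment(pfamKey, 'type', 'seed');

% Seed sequences come from an alignment, so strip gap characters
% before scoring (keep letters only).
rawSeq = upper(regexprep(seedSeqs(1).Sequence, '[^A-Za-z]', ''));

% Align the sequence to the profile; the first output is the score.
score = hmmprofalign(model, rawSeq);
fprintf('%s scores %f\n', seedSeqs(1).Header, score);
```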
Some Problems
Some sequences cannot be retrieved by getgenpept(ACCESSNUM), for example the GPCRDB/SWISSPROT accession Q93564:
>> getgenpept('Q93564')
??? Error using ==> /usr/local/matlab6p5/toolbox/bioinfo/bioinfo/private/getncbidata
Can not interpret NCBI url data.
Error in ==> /usr/local/matlab6p5/toolbox/bioinfo/bioinfo/getgenpept.m
On line 33 ==> gbout = getncbidata(accessnum,'database','protein',varargin{:});
URL requested by getncbidata:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&db=protein&term=Q93564&dopt=GenBank&mode=file
Suggestion: provide a getswissprot(ACCESSNUM) function for SWISS-PROT accession numbers
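Until such a function exists, one way to keep a batch retrieval from stopping at the first failure is to wrap each getgenpept call in try/catch and record the accession numbers that could not be fetched. This is a minimal sketch, not part of the toolbox. The variable names are hypothetical and the accession list contains only the one example from the slide.

```matlab
% Hypothetical batch loop: collect accession numbers getgenpept cannot retrieve.
accessions = {'Q93564'};   % example from the slide; extend with your own list
entries = {};              % successfully retrieved records
failed  = {};              % accession numbers that raised an error

for k = 1:numel(accessions)
    try
        % Network call to NCBI; errors for entries it cannot interpret.
        entries{end+1} = getgenpept(accessions{k}); %#ok<AGROW>
    catch
        failed{end+1} = accessions{k}; %#ok<AGROW>
    end
end

fprintf('%d retrieved, %d failed\n', numel(entries), numel(failed));
```

The failed list can then be passed to whatever SWISS-PROT retrieval route is available.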
Some sequences are missing in the tests
          GPCR   PFAM     MY   Intersection   Union
Class A   1122     64   1112             60    1116
Class B    217     37     88             25     100
Class C    131     31     34             12      53
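The Intersection and Union columns are consistent with ordinary set operations on the PFAM and MY name lists, which can be sketched with MATLAB's intersect and union. The name lists below are abbreviated, illustrative stand-ins. Which names belong to the MY set is not listed in the slides.

```matlab
% Illustrative, abbreviated name lists; the real Class C sets hold 31 and 34 names.
pfamNames = {'MGR1_RAT/592-845', 'CASR_HUMAN/612-867', 'GBR1_HUMAN/593-867'};
myNames   = {'CASR_HUMAN', 'GBR1_HUMAN', 'MGR8_RAT', 'BOSS_DROME'};

% PFAM names carry a residue range after '/', so strip it before comparing.
pfamNames = regexprep(pfamNames, '/.*$', '');

common = intersect(pfamNames, myNames);   % names in both sets
either = union(pfamNames, myNames);       % names in at least one set

% Inclusion-exclusion: size of union = |PFAM| + |MY| - size of intersection.
assert(numel(either) == numel(pfamNames) + numel(myNames) - numel(common));

% The same identity holds for the counts in the table.
assert(64 + 1112 - 60 == 1116);   % Class A
assert(37 +   88 - 25 ==  100);   % Class B
assert(31 +   34 - 12 ==   53);   % Class C
```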
Class C
Sequence                      Pfam         Union            My  Intersection
Q93564/579-850          392.064735    347.395797    264.384048    214.767834
MGR1_RAT/592-845        430.324765    490.156635    507.207180    366.073191
MGR5_RAT/578-831        454.253026    488.499817    504.205146    382.278028
MGR4_HUMAN/587-852      462.158205    534.799902    569.530580    436.156992
MGR7_RAT/590-855        456.809927    537.349099    576.424828    436.974014
MGR6_RAT/579-844        441.637205    508.217074    539.677731    415.271713
MGR2_HUMAN/567-824      474.183794    499.470152    514.069902    406.403925
MGR3_RAT/576-833        474.823802    503.952638    523.120933    403.944131
MGR_DROME/626-881       463.670698    434.844043    447.045332    383.043601
MGR1_CAEEL/681-934      431.536823    385.918002    390.459708    346.138481
O73636/604-859          509.878222    449.879078    232.315669    160.114672
O93552/570-825          499.175160    441.637844    207.251644    146.385214
O93553/585-840          535.782160    477.453567    252.858455    174.934421
O73638/590-844          505.409106    449.807293    253.060067    180.272999
O73639/613-868          498.097160    441.382544    238.116480    164.883093
O73640/609-867          490.355389    430.260647    225.572317    154.554065
O73637/600-856          479.058648    416.663951    230.852568    160.374462
CASR_HUMAN/612-867      477.146160    495.730963    435.172173    261.112868
O35268/595-850          493.412160    418.848716    190.366365    140.977472
O35266/504-757          498.585902    425.435595    192.551769    147.385798
O35189/588-839          484.897441    410.564569    177.347350    137.381278
O35267/404-655          478.584441    404.729878    178.483848    131.359061
O35265/275-530          481.888222    406.476280    180.506104    139.005235
O35269/510-762          476.080804    407.558095    181.848329    138.513683
O70409/593-848          485.920160    399.185334    201.261584    144.217952
O70410/620-874          419.333919    331.677518    172.271933    115.508341
Q20073/1026-1271        283.524022    146.811483     41.312498     29.265243
O75205/55-302           250.699080    103.702354     -1.632265     10.306800
O96954/137-418          313.661421    165.987772     75.331302     78.768892
GBR2_RAT/481-753        352.997624    279.991531    303.315709    241.765650
GBR1_HUMAN/593-867      310.464308    286.723440    324.437816    222.652110
BOSS_DROME              -42.948487    109.350882    116.872645    -26.158734
BOSS_DROVI              -50.399883    117.725196    134.150758    -15.832355
CASR_BOVIN              469.962081    493.745459    434.163431    256.921606
CASR_MOUSE              472.270365    494.962668    435.198261    257.240773
CASR_RAT                471.901365    494.982860    435.450371    257.396804
GBR1_MOUSE              307.748882    288.446423    329.215963    223.455430
GBR1_RAT                265.090364    260.119139    300.878203    187.018309
GBR2_HUMAN              355.852179    281.269532    305.503043    242.184040
MGR1_HUMAN              428.469326    489.348371    505.671282    363.345798
MGR1_MOUSE              426.393159    486.225029    503.275573    362.141585
MGR2_RAT                477.043532    503.908054    518.100194    407.338969
MGR3_HUMAN              465.240190    500.331401    519.156127    396.403286
MGR3_MOUSE              472.056956    501.604222    520.899501    401.373438
MGR4_RAT                462.028808    534.771011    568.913938    434.143238
MGR5_HUMAN              451.654868    485.449344    500.737734    378.856552
MGR6_HUMAN              429.971744    505.674784    539.636493    406.840972
MGR7_HUMAN              454.131180    534.670351    573.746080    434.295267
MGR8_HUMAN              458.613275    556.385737    596.067761    432.862807
MGR8_MOUSE              460.825275    559.364026    599.715567    435.011221
MGR8_RAT                460.940275    560.014303    600.531703    435.748855
Q90WL6                  450.195335    476.755926    417.802864    243.027962
Q90ZF3                  430.562323    490.551502    506.874414    364.893734
Class C (ClassA and ClassB test sequences)
Sequence                      Pfam         Union
OAJ1_HUMAN/52-300      -186.612063   -112.162229
OL15_MOUSE/41-290      -178.345641   -119.201224
OLF6_RAT/44-293        -195.415970   -101.239046
OLF1_CHICK/41-290      -182.367313   -110.731995
GU27_RAT/22-271        -187.321125   -101.499059
MRGF_RAT/61-291        -198.150421   -117.267656
OPSB_HUMAN/51-303      -187.612752   -101.159913
OPS3_DROME/75-338      -211.224282   -133.483526
OPSD_LOLFO/51-315      -205.895209   -117.262574
OPS1_DROME/67-329      -209.269114   -111.675883
V2R_HUMAN/54-325       -203.950419   -126.516322
FSHR_BOVIN/379-626     -181.742988   -102.965039
TRFR_HUMAN/42-320      -168.667767   -108.577338
NTR1_HUMAN/80-364      -209.554125   -132.159324
NY1R_HUMAN/57-320      -193.652723   -116.647730
GLP2_RAT/175-443       -205.422264   -125.760343
GLR_HUMAN/138-407      -180.718411    -97.289324
GIPR_HUMAN/134-399     -203.089354   -120.706433
GLP1_RAT/141-409       -198.857712   -112.053221
CALR_PIG/146-415       -191.393060   -107.696115
CALR_RAT/145-435       -198.771887   -119.516172
CGRR_HUMAN/138-391     -193.500975   -107.340185
DIHR_ACHDO/130-393     -181.314215   -102.871218
DIHR_MANSE/83-351      -192.033661   -111.663686
CRF2_XENLA/115-368     -167.516292   -116.302817
CRF1_RAT/116-370       -176.533708   -106.044739
YQ44_CAEEL/267-539     -210.951110   -111.353016
YOW3_CAEEL/219-477     -177.623461   -102.753965
O95966/400-659         -159.442058    -95.656561
Q10922/771-1017        -148.153491    -91.036382
Other Problems/Suggestions
hmmprofalign cannot align sequences that contain ambiguous amino acid codes (e.g. X, B, Z…)
Preprocessing is needed before aligning such a sequence with the profile.
Suggestion: modify ‘hmmprofalign’ to make it more flexible
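One possible form of that preprocessing is sketched below, using the base MATLAB function regexprep. Dropping the ambiguous residues is an assumption made for illustration; the slides do not say which preprocessing was applied, and removing residues shifts the positions of everything downstream. The sequence is a toy string, not one from the data set.

```matlab
% Toy sequence containing ambiguous codes (not taken from the data set).
seq = 'MKXLBSVZA';

% Assumption: remove X, B and Z before aligning. Substituting a concrete
% residue instead would preserve sequence positions.
cleanSeq = regexprep(seq, '[XBZ]', '');

% cleanSeq is now 'MKLSVA' and contains only unambiguous residue codes.
disp(cleanSeq)
```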
In Class C, align sequence GBR1_HUMAN with the PFAM HMM profile:
'GBR1_HUMAN/593-867'   ---SVSVLSSLG-[…]VPKMRRLITRGEWQ
'GBR1_HUMAN/1-961'     LFISVSVLSSLG-[…]VPKMRRLITRGEWQ
Summary
Performed Hidden Markov Model profile analysis of GPCR sequences in Class A, Class B and Class C
Suggestions about ‘getswissprot’ and ‘hmmprofalign’
It is not clear how to determine the representative sets used to train the HMM profile, although the HMM profile works pretty well