An Analysis of “Coronavirus 3CLpro proteinase cleavage sites: Possible relevance to SARS virus pathology”
Connie Wu

Article Resources
BMC Bioinformatics 2004, 5:72
Published on Jun 6, 2004
Article URL: http://www.biomedcentral.com/14712105/5/72
NetCorona URL: http://www.cbs.dtu.dk/services/NetCorona

Outline
SARS outbreak in 2003
Introduction to SARS virus
Experimental database used
Pattern Recognition Method
Neural Network Method
Biological Significance of NetCorona

SARS Outbreak in 2003
A Chinese man was found to have caught the infectious respiratory disease in Hong Kong, the first case to emerge from the general population since July 2003.
SARS infected more than 8,000 people in close to 30 nations and killed more than 750.

SARS Virus
Belongs to the family of human coronaviruses, which normally cause mild cold symptoms in humans.
The proteolytic cleavage of host proteins by viral proteinases is found in the pathology of other virus families such as picornaviruses.
Virus proliferation can be arrested using specific proteinase inhibitors.

Experimental Database
Seven full-length coronavirus genomes were retrieved from the GenBank database.
Each sequence contained eleven 3CLpro proteinase cleavage sites, giving a total of 77 identifiable sites.
The main 3CL sites (P1) in the polyproteins were identified using alignment without gaps.
P1 = N-terminal to cleavage site
P1’ = C-terminal to cleavage site

Consensus Pattern Recognition
Glutamine (Q) at position P1 and a strong preference for leucine (L) at position P2 are found at coronavirus proteinase cleavage sites.
‘LQ’ consensus pattern prediction: 60/77 true positives (78%), plus 196 additional false positives from random occurrences of this pair of amino acids.
‘LQ[S/A]’ consensus pattern prediction: 48/77 true positives (62%), plus 36 additional false positives.

Limitations of Pattern Recognition
Simple consensus pattern recognition (e.g. ‘LQ’): low specificity, high sensitivity.
Sophisticated consensus pattern recognition (e.g. ‘LQ[S/A]’): high specificity, low sensitivity.

Neural Network
Input: a sequence window of 9 amino acids centered on the glutamine in the P1 position.
Output: a score between 0 and 1 assigned to every glutamine that is present.
Score > 0.8 = most likely to be cleaved
Score 0.5 ~ 0.8 = possibly cleaved
Score < 0.5 = likely not cleaved
67/77 true positives (87.0%)
1358/1372 true negatives (99.0%)

Neural Network
Three-layered neural network
Two hidden neurons

Neural Network Training
Training was done with three-fold cross-validation, and Matthews correlation coefficients were calculated by summing the values over all combinations of training and test sets.
The average of the scores from all three networks arising from the three-fold cross-validation was used for prediction.

Neural Network on Host Cell Proteins
Cystic fibrosis transmembrane conductance regulator (CFTR), an ATP-dependent chloride channel, is predicted to be cleaved at Gln762 with a high score of 0.842.
Transcription factor OCT-1 is predicted to be cleaved at Gln62 by the 3CLpro proteinase with a high confidence score of 0.874.

Limitation of NetCorona
High specificity, low sensitivity.
Not accurate in predicting sites with relatively low cleavage efficiency in vivo.
High-scoring cleavage sites that are inaccessible to the proteinase need to be disregarded.

Significance of NetCorona
Can be employed by researchers who suspect a possible viral proteinase cleavage.
Useful when working on coronavirus function.
May facilitate proteinase inhibitor drug discovery.
Possible future strategy for drug development.
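
The consensus-pattern search, the 9-residue window and the score thresholds described in the slides can be illustrated with a short Python sketch. This is not the NetCorona implementation: the function names, the toy sequence and the 'X' padding at the sequence ends are assumptions made for illustration only, and no neural network is included. Precision (true positives divided by all predicted sites) is not reported on the slides; it is computed here from the slide counts only to show how the two patterns trade sensitivity against false positives.

```python
import re

# Consensus patterns from the slides: L at P2, Q at P1, optionally S or A at P1'.
LQ = re.compile(r"LQ")
LQ_SA = re.compile(r"LQ[SA]")


def consensus_sites(sequence, pattern):
    """Return the 1-based position of the P1 glutamine for every pattern match."""
    # m.start() is the 0-based index of L (P2); Q (P1) is the next residue.
    return [m.start() + 2 for m in pattern.finditer(sequence)]


def glutamine_windows(sequence, size=9, pad="X"):
    """Return (1-based position, window) for every Q, with Q at the window centre.

    Padding the sequence ends with 'X' is an assumption of this sketch.
    """
    half = size // 2
    padded = pad * half + sequence + pad * half
    return [(i + 1, padded[i:i + size])
            for i, residue in enumerate(sequence) if residue == "Q"]


def classify_score(score):
    """Map a score in [0, 1] to the three categories given on the slides."""
    if score > 0.8:
        return "most likely cleaved"
    if score >= 0.5:
        return "possibly cleaved"
    return "likely not cleaved"


def rate(hits, total):
    """Fraction of hits, e.g. sensitivity = true positives / known sites."""
    return hits / total


if __name__ == "__main__":
    toy = "AVLQSGFRKTSLQAMMLQG"  # made-up sequence, not taken from the article
    print("LQ sites:     ", consensus_sites(toy, LQ))
    print("LQ[S/A] sites:", consensus_sites(toy, LQ_SA))
    print("windows:      ", glutamine_windows(toy))

    # Counts quoted on the slides.
    print("LQ sensitivity:     ", round(rate(60, 77), 3))
    print("LQ precision:       ", round(rate(60, 60 + 196), 3))
    print("LQ[S/A] sensitivity:", round(rate(48, 77), 3))
    print("LQ[S/A] precision:  ", round(rate(48, 48 + 36), 3))
    print("NN sensitivity:     ", round(rate(67, 77), 3))
    print("NN specificity:     ", round(rate(1358, 1372), 3))

    # Host-protein scores quoted on the slides.
    print("CFTR Gln762:", classify_score(0.842))
    print("OCT-1 Gln62:", classify_score(0.874))
```

In this sketch a score of exactly 0.8 falls into the "possibly cleaved" band, following the slide's "> 0.8" wording for the top category.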