VOL. 3, NO. 6, July 2012 ISSN 2079-8407
Journal of Emerging Trends in Computing and Information Sciences
©2009-2012 CIS Journal. All rights reserved.
http://www.cisjournal.org

Improving Virus C type 4 Interferon using Bioinformatics Techniques

1 Attalah Hashad, 2 Khaled Kamal, 3 Ahmed Fahmy, 4 Amr Badr
1, 2, 3 Arab Academy for Science and Technology, Faculty of Engineering, Computer Engineering Department.
4 Cairo University, Faculty of Computers & Information, Computer Science Department

ABSTRACT
Hepatitis C virus genotype 4 is the predominant genotype found throughout the Middle East and parts of Africa, with high population prevalence in Egypt. Because of the worldwide effort to find a treatment for this potentially fatal disease, much research and many trials have been carried out. It has been reported that virus C itself carries a self-destructive gene [1] which, if activated by a specific instruction to the mRNA carried by the virus, produces interferon. Interferon is an antiviral protein; if it is produced from virus C itself, it becomes specific to that virus. Using a variety of bioinformatics tools, we design algorithms that improve the chance of finding the gene that directs the mRNA to produce the virus C specific interferon. Tools for RNA-to-protein synthesis, gene prediction, protein classification and gene classification were constructed and tested toward this goal. As a result of these trials, several matches were found with varying percentages, which at least supports the feasibility of this interferon/RNA analysis.

Keywords: Virus C type 4, Interferon, RNA, DNA, Hidden Markov Models

1. INTRODUCTION
Hepatitis C is an infection of the hepatocytes by virus C. It is the most dangerous of the hepatitis-causing viruses, owing to its relatively common route of transmission, which in 90% of cases is blood transfusion [2]. This is particularly apparent in Egypt because of the lack of control over safe blood transfusion measures. In addition, no vaccine for hepatitis C is available at the moment, which adds to its danger and fatality. Infection with hepatitis C may become chronic in 50% of cases, may progress to hepatocellular carcinoma, or may leave the infected person a carrier (no clearance of the virus, with normal liver function).

It has been found that one of the host defense mechanisms against several viruses, including hepatitis C, is a protein called interferon. Human interferons are cytokines (immune proteins) produced in response to stimulation by a viral infection [1]. They inhibit virus replication by inhibiting the translation of mRNA into protein. They are species specific, that is, active in the species in which they are produced, but they are not specific to a particular virus. Purified interferons are therefore prepared by recombinant DNA technology.

The main idea of this paper is to enhance the production of an interferon specific to virus C type 4, the type encountered in Egypt. Because of its specificity, such an interferon should allow more accurate treatment of the virus: the interferon produced in humans is not virus C specific [3], whereas the one proposed in this paper would be. This is to be done by using the gene structure responsible for interferon protein production on the virus C RNA itself, which makes the product virus specific. In this paper we use bioinformatics techniques such as profile hidden Markov models, through several purpose-built tools, to try to find the gene responsible for the interferon production.

2. PROFILE HIDDEN MARKOV MODELS

2.1 Introduction
In this paper we used the profile HMM technique to develop all the tools used in our algorithm. Profile analysis has long been a useful tool for finding and aligning distantly related sequences and for identifying known sequence domains in new sequences [4]. Basically, a profile is a description of the consensus of a multiple sequence alignment. It uses a position-specific scoring system to capture information about the degree of conservation at the various positions of the multiple alignment. This makes it a much more sensitive and specific method for database searching than pairwise methods.

Hidden Markov modeling, a technique that has been used for years in speech recognition, is now being applied to many types of problems in molecular sequence analysis. In particular, this technique can produce profiles that improve on traditionally constructed profiles. Profile hidden Markov models (HMMs) have several advantages over standard profiles. Profile HMMs have a formal probabilistic basis and a consistent theory behind gap and insertion scores, in contrast to standard profile methods, which use heuristic methods [5]. HMMs apply a statistical method to estimate the true frequency of a residue at a given position in the alignment from its observed frequency, while standard profiles use the observed frequency itself to assign the score for that residue. This means that a profile HMM derived from only 10 to 20 aligned sequences can be of equivalent quality to a standard profile created from 40 to 50 aligned sequences. In general, producing good profile HMMs requires less skill and manual intervention than producing good standard profiles.
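The difference between using the observed frequency directly and estimating the underlying frequency can be illustrated with a small sketch. The paper does not say which estimator its tools use, so the add-one (Laplace) pseudocount below is only an assumed stand-in for the statistical estimate; the function name `column_frequencies` and the four-sequence alignment are hypothetical and invented for illustration.

```python
from collections import Counter

DNA_ALPHABET = "ACGT"


def column_frequencies(alignment, column, alphabet=DNA_ALPHABET, pseudocount=0.0):
    """Estimate residue probabilities for one alignment column.

    With pseudocount=0 this is the raw observed frequency, as a standard
    profile would use. A positive pseudocount gives a smoothed estimate
    (an assumed, simple stand-in for the estimator of a profile HMM).
    Characters outside `alphabet`, such as the gap '-', are ignored.
    """
    counts = Counter(seq[column] for seq in alignment if seq[column] in alphabet)
    total = sum(counts.values()) + pseudocount * len(alphabet)
    if total == 0:
        raise ValueError("column has no residues and no pseudocount")
    return {res: (counts[res] + pseudocount) / total for res in alphabet}


# Hypothetical alignment of four short sequences (not data from the paper).
alignment = ["ACGT", "ACGA", "ACTT", "AAGT"]

observed = column_frequencies(alignment, 0)                 # raw frequencies
smoothed = column_frequencies(alignment, 0, pseudocount=1)  # add-one smoothing
```

In the first column every sequence has A, so the raw estimate gives the other three bases probability zero, which would make any new sequence with a different base there impossible under the model. The smoothed estimate keeps a small probability for unseen bases, which is one reason fewer aligned sequences may suffice.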
2.2 Profile Hidden Markov Models Architecture
A profile HMM is a linear state machine consisting of a series of nodes, each of which corresponds roughly to a position (column) in the alignment from which it was built. If we ignore gaps, the correspondence is exact: the profile HMM has a node for each column in the alignment, and each node can exist in one state, a match state, as shown in Figure 1. (The word "match" here implies that there is a position in the model for every position in the sequence to be aligned to the model.)

Fig 1: PHMM States

A profile HMM has several types of probabilities associated with it. One type is the transition probability, the probability of moving from one state to another. In a simple ungapped model, the probability of a transition from one match state to the next match state is 1.0 and the path through the model is strictly linear, moving from the match state of node n to the match state of node n+1. There are also emission probabilities associated with each match state, based on the probability of a given residue occurring at that position in the alignment. For example, for a fairly well-conserved column in a protein alignment, the emission probability for the most common amino acid may be 0.81, while for each of the other 19 amino acids it may be 0.01. If you follow a path through the model to generate a sequence consistent with the model, the probability of any sequence that is generated depends on the transition and emission probabilities at each node.

In order to model real sequences, we also need to consider the possibility that gaps occur when a model is aligned to a sequence. Two types of gaps may arise. The first type occurs when the sequence contains a region that is not present in the model (an insertion in the sequence). The second type occurs when there is a region in the model that is not present in the sequence (a deletion in the sequence).
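Before gaps are taken into account, the ungapped case described above can be sketched in a few lines. This is a minimal illustration, not the tool built for the paper: the function name `ungapped_log_probability` is hypothetical, and choosing leucine (L) as the conserved residue is an assumption made only so that the 0.81 / 0.01 example has a concrete residue to refer to.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def ungapped_log_probability(sequence, emissions):
    """Log-probability of `sequence` under an ungapped profile model.

    `emissions` holds one dict per match state, mapping residue to its
    emission probability. Match-to-match transitions have probability 1.0
    in the ungapped case, so only the emission terms contribute.
    """
    if len(sequence) != len(emissions):
        raise ValueError("an ungapped model needs one residue per match state")
    total = 0.0
    for residue, column in zip(sequence, emissions):
        p = column.get(residue, 0.0)
        if p == 0.0:
            return float("-inf")  # residue impossible at this position
        total += math.log(p)
    return total


# Well-conserved column from the text: 0.81 for the most common residue
# (assumed here to be L) and 0.01 for each of the other 19.
conserved = {aa: 0.01 for aa in AMINO_ACIDS}
conserved["L"] = 0.81

model = [conserved, conserved]  # two match states, both conserved on L

p_consensus = math.exp(ungapped_log_probability("LL", model))  # 0.81 * 0.81
p_variant = math.exp(ungapped_log_probability("LA", model))    # 0.81 * 0.01
```

Working in log space avoids numerical underflow when the model has hundreds of nodes, which is why the sketch sums logarithms instead of multiplying probabilities.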
To handle insertions and deletions, each node in the profile HMM must have three states: a match state, an insert state, and a delete state. The model also needs more types of transition probabilities: match->match, match->insert, match->delete, insert->match, etc., as shown in Figure 2.

Fig 2: PHMM Architecture

Aligning a sequence to a profile HMM is done by a dynamic programming algorithm that finds the most probable path that the sequence may take through the model, using the transition and emission probabilities to score each possible path [4][5]. In general, if the sequence is equivalent to the consensus of the original alignment, the path through the model passes from match state to match state in a linear fashion. If the sequence contains a deletion relative to the consensus, the path passes through one or more delete states before moving to the next match state; if the sequence contains an insertion relative to the consensus, the path passes through an insert state between two match states.

Profile HMMs can be aligned to a sequence either globally (the whole profile HMM aligns to the sequence) or locally (only part of the profile HMM need be aligned with the sequence). The alignment type is actually part of the model, so you must specify whether the model is to be global or local at the time the model is built, not at the time the model is used.

3. PROTEINS AND GENES CLASSIFICATION
Predicting a protein's function from its amino acid sequence, and a gene's function from its nucleotide sequence, is one of the most important tasks in bioinformatics. The traditional procedure of searching databases for related sequences and inferring the function from the best matches has several shortcomings and pitfalls.
Alternatively, the sequence under study can be scrutinized for the occurrence of particular sequence signatures that can be associated with certain protein or gene functions. Useful sequence signatures include not only short motifs, such as protein modification sites or specific binding motifs, but also larger protein regions, such as homology domains. A number of fundamentally different bioinformatics data structures can be used to store information about sequence signatures, making them available for the purpose of protein classification.

Profile hidden Markov models (HMMs) are statistical representations of protein and gene families derived from patterns of sequence conservation in multiple alignments, and they have been used with considerable success in identifying remote homologues. These conservation patterns arise from fold-specific signals, shared across multiple families, and function-specific signals unique to each family. The availability of sequences pre-classified according to their function permits the use of negative training sequences to improve the specificity of the HMM, both by optimizing the threshold cutoff and by modifying emission probabilities to minimize the influence of fold-specific signals.

A protocol to generate family-specific HMMs first constructs a profile HMM from an alignment of the family's sequences and then uses this model to identify sequences belonging to other classes that score above the default threshold (false positives). Ten-fold cross validation is used to optimize the discrimination threshold score for the model. Fast multiple alignment methods make it practical to use profile alignments to align the true and false positive sequences, and the resulting alignments are used to modify the emission probabilities in the original model.

4. DNA-RNA-PROTEIN SYNTHESIS

4.1 DNAs
DNA is a linear polymer composed of four different building blocks, the nucleotides. The genetic information carried by chromosomes resides in the sequence of the nucleotides in the polymer. Each nucleotide is composed of three parts: (1) a nitrogenous base, either a purine (adenine (A) or guanine (G)) or a pyrimidine (cytosine (C) or thymine (T)); (2) a sugar, deoxyribose; and (3) a phosphate group.

The actual work of translating the information into a medium that can be used directly by the cell is done by ribonucleic acid (RNA), as shown in Figure 3. RNA has three functions: (a) it serves as the messenger that tells the cell (the ribosomes) what protein to make (messenger RNA; mRNA); (b) it serves as part of the structure of the ribosome, the protein/RNA complex that synthesizes proteins according to the information presented by the mRNA (ribosomal RNA; rRNA); and (c) it brings amino acids (the constituents of proteins) to the ribosome when a specific amino acid "is called for" by the information on the mRNA to be put into the protein that is being synthesized; this RNA is called transfer RNA (tRNA).

4.2 RNAs
The messenger RNA (mRNA) serves as an intermediate between DNA and protein. Parts of the DNA are "transcribed" into transcripts (single-stranded RNA molecules) that are processed to mRNA. Transcription starts at a specific site on the DNA called a promoter. Each gene or operon has its own promoter(s). Transcription ends at a terminator sequence on the DNA. The processed transcript is the mRNA, and the information in the mRNA can be "translated" into a protein of specific sequence. In prokaryotes, however, introns are rare, so the transcript generally does not need to be processed and can serve as mRNA right away.
Ribosomal RNAs (rRNAs) are essential components of an important part of the protein synthesis machinery: the ribosomes. Each eukaryotic ribosome contains one molecule of each of four rRNA types (prokaryotic ribosomes contain three). In prokaryotes, ribosomes bind to the mRNA close to the translation start site.

Transfer RNA (tRNA) carries amino acids to the ribosomes, enabling the ribosomes to add each amino acid to the protein being synthesized as an elongating chain of amino acid residues, using the information on the mRNA to "know" which amino acid should be added next. For each kind of amino acid there is a specific tRNA that recognizes the amino acid, transports it to the protein being synthesized, and attaches it to the protein once the information on the mRNA calls for it.

All tRNAs have the same general shape, somewhat resembling a clover leaf. Parts of the molecule fold back in characteristic loops, which are held in shape by nucleotide pairing between different areas of the molecule. Two parts of the tRNA are of particular importance: the aminoacyl attachment site and the anticodon. The aminoacyl attachment site is the site at which the amino acid is attached to the tRNA molecule. Each type of tRNA specifically binds only one type of amino acid. The anticodon (three bases) of the tRNA base-pairs with the appropriate mRNA codon at the mRNA-ribosome complex. This temporarily binds the tRNA to the mRNA, allowing the amino acid carried by the tRNA to be incorporated into the polypeptide in its proper place. Thus, the sequence of the codon (three bases) in the mRNA dictates the amino acid to be put in the protein at a specific site. The "dictionary" of codons coding for amino acids is called the genetic code.
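The codon "dictionary" can be made concrete with a short sketch that reads a sequence three bases at a time and looks each codon up in the standard genetic code. This is an illustration only and not the RNA, Protein synthesis tool developed for the paper; the names `CODON_TABLE`, `translate` and `FRAGMENT` are hypothetical. The input is the 219-nucleotide virus fragment reported in Section 7.3, and the output can be compared with the protein reported there.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(sequence, to_stop=True):
    """Translate a DNA or RNA sequence in reading frame 0.

    U is treated as T, and trailing bases that do not fill a codon are
    dropped. With to_stop=True translation ends at the first stop codon.
    Ambiguous bases (e.g. N) are not handled and raise KeyError.
    """
    seq = sequence.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


# Virus fragment as printed in Section 7.3 of this paper.
FRAGMENT = (
    "TGCTGTTCGATGTCATACTCGTGGACTGGGGCGCTTGTAACACCTTGCGCGGCTGA"
    "AGAATCAAAGCTGCCAATTAGCCCCCTGAGCAATTCACTTTTGCGCCATCACAATA"
    "TGGTGTATGCCACGACCACCCGTTCTGCTGTGACACGGCAGAAGAAGGTGACCTTC"
    "GACCGCCTGCAGGTGGTGGACAGTACCTACAATGAAGTGCTTAAGGAGATA"
)

protein = translate(FRAGMENT)
```

Translating the fragment this way yields a 73-residue sequence, one amino acid for each of the 219 / 3 = 73 codons, since the fragment contains no in-frame stop codon.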
4.3 Protein Synthesis
With DNA and the various RNAs introduced, the stage is set for protein synthesis. The basic reaction of protein synthesis is the controlled formation of a peptide bond between two amino acids. This reaction is repeated many times, as each amino acid in turn is added to the growing polypeptide. Protein synthesis starts when the mRNA binds to a small ribosomal subunit near an AUG sequence in the mRNA. The AUG codon is called the start codon, since it codes for the first amino acid of the protein (a methionine). The AUG codon base-pairs with the anticodon of the tRNA carrying methionine. A large ribosomal subunit then binds to the complex, and the reactions of protein synthesis itself can begin.

The aminoacyl-tRNA to be called for next is determined by the next codon (the next three bases) on the mRNA, as shown in Figure 3. Each amino acid is coded for by one or more (up to six) codons. It would be more straightforward to have each amino acid coded for by only one codon, but nature appears to have chosen a more complex route. The reason is in part that there are 20 different amino acids and 4x4x4=64 different combinations possible in a codon. When the ribosome reaches one of the three stop codons, for which there is no matching tRNA, the ribosome falls off and the synthesized protein is released [6].

Fig 3: DNA-RNA-protein Synthesis

5. HEPATITIS C VIRUS LIFE CYCLE
The hepatitis C virus (HCV) belongs to the Flaviviridae family and is the only member of the Hepacivirus genus. HCV infection is a major cause of chronic hepatitis, liver cirrhosis, and hepatocellular carcinoma (HCC) worldwide. Therapeutic options are improving but are still limited, and a protective vaccine is not available to date.
Moreover, many patients do not qualify for or do not tolerate standard therapy. Therefore, more effective and better tolerated therapeutic strategies are urgently needed. The development of such strategies depends on a detailed understanding of the molecular virology of HCV infection. The investigation of the HCV life cycle and pathogenesis has been complicated by the lack of efficient cell culture systems and small animal models. Nevertheless, as shown in Figure 4, the virus must complete the following basic steps to carry out its life cycle [7][8].

Fig 4: HCV life Cycle

Step (a) The virus locates and attaches itself to a liver cell. Hepatitis C uses particular proteins present on its protective lipid coat to attach to a receptor site (a recognizable structure on the surface of the liver cell). The virus's protein core penetrates the plasma membrane and enters the cell. To accomplish this, hepatitis C uses its protective lipid (fatty) coat, merging it with the cell's outer membrane (the coat is in fact composed of a fragment of another liver cell's plasma membrane). Once the lipid coat has fused to the plasma membrane, the membrane engulfs the virus, and the viral core is inside the cell.

Step (b) The protein coat dissolves to release the viral RNA in the cell. This may be accomplished during penetration of the cell membrane (the coat is broken open when it is released into the cytoplasm), or special enzymes present in liver cells may be used to dissolve the casing.

Step (c) The viral RNA then co-opts the cell's ribosomes and begins the production of materials necessary for viral reproduction. Because hepatitis C stores its information in a "sense" strand of RNA, the viral RNA itself can be read directly by the host cell's ribosomes, functioning like the normal mRNA present in the cell.
As it begins producing the materials coded in its RNA, the virus probably also shuts down most of the normal functions of the cell, conserving the cell's energy for the production of viral material. It occasionally appears, however, that hepatitis C stimulates the cell to reproduce (presumably to create more cells that can produce viruses), which may be why hepatitis C is often associated with liver cancer. The viral RNA first directs the synthesis of the RNA transcriptase it will need for reproduction.

Step (d) Once there is adequate RNA transcriptase, the viral RNA creates an antisense version (the paired opposite) of itself as a template for the creation of new viral RNA. The viral RNA is now copied hundreds or thousands of times, making the genetic material for new viruses. Some of this new RNA will contain mutations. Viral RNA then directs the production of protein-based capsomeres (the building blocks of the virus's protective protein coat). Ribosomes create the proteins and release them for use.

Step (e) The completed capsomeres assemble around the new viral RNA into new viral particles. The capsomeres are shaped to attract each other and fit together in a certain way. When enough capsomeres are brought together, they self-assemble to form a spherical shell, called a capsid, that fully encapsulates the virus's RNA. The completed particle is called a nucleocapsid.

Step (f) The newly formed viruses travel to the inner side of the plasma membrane and attach to it, creating a bud. The plasma membrane encircles the virus and then releases it, providing the virus with its protective lipid coat, which it will later use to attach to another liver cell. This process of budding and release of new viruses continues for hours at the cell surface until the cell dies from exhaustion.
Each surviving virus, that is, each one not destroyed by the immune system or other environmental factors, can produce hundreds or thousands of offspring. Over time, this endless cycle of reproduction results in significant damage to the liver, as millions upon millions of cells are destroyed by viral reproduction or by the immune system's attacks on infected cells.

6. THE PROPOSED ALGORITHM

6.1 The Algorithm Goal
The goal of the proposed algorithm is to enhance the production of an interferon that is specific to virus C, especially type 4, the type encountered in Egypt. The proposal aims to help treat virus C more accurately, as a result of the specificity of this interferon. The interferon produced in humans is not virus C specific [3]; recombinant DNA technology has succeeded in producing interferon, but only the human type. The interferon that would be produced by the proposed algorithm would be virus specific, not a broad-spectrum antiviral like the human one.

6.2 The Proposed Algorithm Theory
This algorithm is based on a finding reported in many studies of the virus C gene structure. After detailed exploration of the virus C RNA, evidence has been found for the existence of a certain gene on the RNA. It is assumed that, if this gene is activated, it could stimulate the synthesis of an interferon against virus C itself [1][2][3], an interferon acting as an antiviral specifically against virus C. The gene has not actually been found; many trials are in progress to locate it, but not for the virus C type 4 subtypes that this paper is concerned with. The idea of the proposed work rests on the assumption that this gene exists, and it entails using the gene that results in production of the interferon.
This can be achieved by introducing many interferons, whether extracted from hepatitis C type 4 patients or produced by recombinant DNA technology, into an algorithm. The protein of each interferon is analyzed and, following the pathway in reverse, its gene sequence is obtained. Assuming that the sequence of this gene is similar to one found on the virus RNA, the latter is extracted. Presuming that it is responsible for the interferon production that is sought, this sequence is then used to obtain the recombinant interferon.

6.3 The Proposed Algorithm Architecture
The following steps explain in detail the stages we go through, from the introduction of the interferon to the production of the recombinant interferon.

Step 1: An interferon protein active against virus C type 4, the type specific to Egyptian cases, is extracted from a randomly chosen hepatitis C patient in Egypt.

Step 2: The interferon is introduced into the classification tool, developed to confirm that it belongs to the specific family of interferons that act in humans, as shown in Figure 5.

Step 3: After its family has been verified, the interferon is introduced into the second tool, which is responsible for discovering the gene structure of the protein introduced. In this case the gene structure of the interferon is discovered.

Step 4: The discovered gene structure is classified by the third tool to confirm the family of this gene, as shown in Figure 5.

Step 5: The gene sequence is predicted by the fourth tool by comparison with the original virus RNA. When an approximately similar sequence is found on the original virus RNA, it is extracted and reinserted into the gene classification tool to re-verify its family.

Step 6: The predicted similar gene sequence from the original virus RNA is inserted into a synthesizing tool to synthesize the specific anti-virus C type 4 interferon protein.
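The comparison in Step 5 can be sketched as a sliding-window search. The paper does not describe how its prediction tool counts matches, so the ungapped, position-by-position identity count below is an assumption, chosen because the results in Section 7.3 report, for each interferon, a size, a number of matches, a start position, and a percentage equal to 100 x matches / size. The function name `best_ungapped_match` and the toy sequences are hypothetical.

```python
def best_ungapped_match(query, target):
    """Slide `query` along `target` and return the best ungapped window.

    Returns (start, matches, percent): `start` is the 1-based position in
    `target` where the best window begins, `matches` is the number of
    identical positions in that window, and `percent` is
    100 * matches / len(query). Ties keep the earliest window.
    """
    if not query or len(query) > len(target):
        raise ValueError("query must be non-empty and no longer than target")
    best_start, best_matches = 1, -1
    for offset in range(len(target) - len(query) + 1):
        window = target[offset:offset + len(query)]
        matches = sum(q == t for q, t in zip(query, window))
        if matches > best_matches:
            best_start, best_matches = offset + 1, matches
    return best_start, best_matches, 100.0 * best_matches / len(query)


# Toy sequences invented for illustration; not data from the paper.
virus_rna = "AAAACGTTGCAAA"
gene = "ACGTAGCA"

start, matches, percent = best_ungapped_match(gene, virus_rna)
```

With these toy inputs the best window starts at position 4 with 7 of 8 positions identical (87.5 %). This brute-force scan costs time proportional to the product of the two lengths, which is manageable for a genome of about 9,400 nucleotides; a production tool would more likely use a gapped alignment or a profile HMM search as described in Section 2.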
Fig 5: Algorithm Architecture (blocks: Protein Classification; Protein, RNA synthesis; Genes Classification; Genes Prediction against the Virus C type 4 RNA sequence; RNA, Protein synthesis; output: Recombinant Virus C Type 4 interferon)

7. RESULTS AND STATISTICS

7.1 Hepatitis C Virus Sequence
The hepatitis C virus sequence used, specifically genotype 4, has locus NC_009825, 9355 bp, RNA, linear, VRL, 18-JUN-2008, and accession NC_009825. The version is NC_009825.1, GI: 157781208. It was acquired from Genome Project: 20933 using the keywords HCV poly protein. The taxonomic lineage of the hepatitis C virus genotype 4 sequence is Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae; Hepacivirus. The accredited references are [9], [10], [11], revised by [11], bearing in mind that the sequence is incomplete on both ends.

7.2 Interferon Sequences
Fifty interferons were fetched from the NCBI databases [12], and all were classified as Virus C Type 4 Interferon Family using our classification tool, the NCBI classification tool [12] and CLC Main Workbench 4.1.1 [13].

7.3 Prediction Result
In the table below, Size is the interferon size, Matches is the number of matches, Start is the position from which matching starts, and Percent is the matching percent.

No. Size  Matches  Start  Percent   Interferon Description
1   564   348      733    61.702 %  gi|74095774|emb|CAE45642.2| interferon [Oncorhynchus mykiss]
2   603   380      6820   63.018 %  gi|37693458|dbj|BAC99048.1| interferon [Danio rerio]
3   636   388      3946   61.006 %  gi|29125840|emb|CAD67779.1| interferon [Tetraodon nigroviridis]
4   567   358      7485   63.139 %  gi|28475251|emb|CAD67752.1| interferon [Danio rerio]
5   7470  3940     883    52.744 %  gi|28475279|emb|CAD67762.1| interferon [Tetraodon nigroviridis]
6   567   358      4672   63.139 %  gi|28475255|emb|CAD67754.1| interferon [Danio rerio]
7   567   354      4195   62.434 %  gi|28475253|emb|CAD67753.1| interferon [Danio rerio]
8   975   606      6589   62.154 %  gi|585316|sp|P01571.2|IFN17_HUMAN Interferon alpha-17 precursor (Interferon alpha-I') (LeIF I) (Interferon alpha-T) (Interferon alpha-88)
9   609   380      7636   62.397 %  gi|84029375|sp|P05014.2|IFNA4_HUMAN Interferon alpha-4 precursor (Interferon alpha-4B) (Interferon alpha-M1) (Interferon alpha-76)
10  1335  776      7401   58.127 %  gi|417188|sp|P32881.1|IFNA8_HUMAN Interferon alpha-8 precursor (Interferon alpha-B2) (Interferon alpha-B) (LeIF B)
11  537   330      6118   61.453 %  gi|124453|sp|P01566.1|IFN10_HUMAN Interferon alpha-10 precursor (Interferon alpha-C) (LeIF C) (Interferon alpha-6L)
12  567   358      7485   63.139 %  gi|159164644|pdb|2HYM|B Chain B, Nmr Based Docking Model Of The Complex Between The Human Type I Interferon Receptor And Human Interferon Alpha-2
13  537   338      6730   62.942 %  gi|118138012|pdb|2HYM|A Chain A, Nmr Based Docking Model Of The Complex Between The Human Type I Interferon Receptor And Human Interferon Alpha-2
14  558   354      5291   63.441 %  gi|157940259|tpe|CAO03088.1| TPA: type I interferon 4 [Xenopus tropicalis]
15  567   348      5302   61.376 %  gi|157940241|tpe|CAM33515.1| TPA: type I interferon 4 [Monodelphis domestica]
16  567   364      2591   64.198 %  gi|157940225|tpe|CAM33453.1| TPA: type I interferon 4 [Ornithorhynchus anatinus]
17  567   350      7412   61.728 %  gi|124449|sp|P01563.1|IFNA2_HUMAN Interferon alpha-2 precursor (Interferon alpha-A) (LeIF A)
18  549   336      6299   61.202 %  gi|6754294|ref|NP_034634.1| interferon, alpha 4 precursor [Mus musculus]
19  585   374      1710   63.932 %  gi|34811411|pdb|1N6V|A Chain A, Average Structure Of The Interferon-Binding Ectodomain Of The Human Type I Interferon Receptor
20  585   374      8294   63.932 %  gi|20178289|sp|P01568.2|IFN21_HUMAN Interferon alpha-21 precursor (Interferon alpha-F) (LeIF F)
21  567   356      7412   62.787 %  gi|10835103|ref|NP_066546.1| interferon, alpha 4 [Homo sapiens]
22  1227  736      1191   59.984 %  gi|7305519|ref|NP_038702.1| interferon regulatory factor 4 [Mus musculus]
23  1350  776      941    57.481 %  gi|124455|sp|P01562.1|IFNA1_HUMAN Interferon alpha-1/13 precursor (Interferon alpha-D) (LeIF D)
24  1047  658      7741   62.846 %  gi|159164194|pdb|2DLL|A Chain A, Solution Structure Of The Irf Domain Of Human Interferon Regulator Factors 4
25  567   366      4195   64.550 %  gi|18375650|ref|NP_542416.1| protein tyrosine phosphatase, non-receptor type 13 isoform 4 [Homo sapiens]
26  546   348      954    63.736 %  gi|124463|sp|P05015.1|IFN16_HUMAN Interferon alpha-16 precursor (Interferon alpha-WA)
27  567   370      7600   65.256 %  gi|400061|sp|P05000.2|IFNW1_HUMAN Interferon omega-1 precursor (Interferon alpha-II-1)
28  495   314      802    63.434 %  gi|766|emb|CAA46506.1| trophoblast type I interferon gene [Bos taurus]
29  558   356      7509   63.799 %  gi|29468974|gb|AAO64456.1| interferon alpha 4 precursor [Mus musculus]
30  567   374      5896   65.961 %  gi|27451546|gb|AAO14969.1| type I interferon [Macropus eugenii]
31  558   356      1909   63.799 %  gi|74035856|emb|CAE46918.2| type I interferon [Oncorhynchus mykiss]
32  219   162      7524   73.973 %  gi|10720037|sp|Q61190.1|I10R2_MOUSE Interleukin-10 receptor beta chain precursor (IL-10R-B) (IL-10R2) (Cytokine receptor class-II member 4) (Cytokine receptor family 2 member 4) (CRF2-4) (CDw210b antigen)
33  549   352      5080   64.117 %  gi|124498|sp|P07352.1|IFNW1_BOVIN Interferon omega-1 precursor (Interferon alpha-II-1) (IFN-omega-c1)
34  585   364      8468   62.222 %  gi|1272477|gb|AAC50779.1| lymphocyte specific interferon regulatory factor/interferon regulatory factor 4
35  567   358      7598   63.139 %  gi|157820959|ref|NP_001100137.1| interferon, alpha 4 [Rattus norvegicus]
36  528   334      4564   63.258 %  gi|122890705|emb|CAM14943.1| interferon alpha family, gene 4 [Mus musculus]
37  567   366      1119   64.550 %  gi|56757647|sp|Q08334.2|I10R2_HUMAN Interleukin-10 receptor beta chain precursor (IL-10R-B) (IL-10R2) (Cytokine receptor class-II member 4) (Cytokine receptor family 2 member 4) (CRF2-4) (CDw210b antigen)
38  558   344      8024   61.649 %  gi|53680584|gb|AAU89488.1| interferon alpha 4 [Marmota monax]
39  567   352      1317   62.081 %  gi|114326424|ref|NP_001041624.1| interferon omega 4 [Felis catus]
40  528   344      4564   65.152 %  gi|12830735|gb|AAK08199.1|AF320332_1 interferon regulatory factor 4 deltaE6 [Gallus gallus]
41  558   348      8478   62.366 %  gi|12830733|gb|AAK08198.1|AF320331_1 interferon regulatory factor 4 [Gallus gallus]
42  603   382      5941   63.350 %  gi|88810133|gb|ABD52365.1| interferon alpha 4 [Ailuropoda melanoleuca]
43  1350  774      3196   57.333 %  gi|109732682|gb|AAI16218.1| Interferon alpha 4 [Mus musculus]
44  636   388      761    61.006 %  gi|111601291|gb|AAI19352.1| Interferon alpha 4 [Mus musculus]
45  555   346      8566   62.342 %  gi|111600996|gb|AAI19350.1| Interferon alpha 4 [Mus musculus]
46  561   348      4755   62.032 %  gi|109731770|gb|AAI13641.1| Interferon, alpha 4 [Homo sapiens]
47  363   238      8679   65.565 %  gi|109731105|gb|AAI13643.1| Interferon, alpha 4 [Homo sapiens]
48  588   366      7359   62.245 %  gi|63148866|gb|AAY34555.1| interferon alpha 4 [Marmota himalayana]
49  477   306      7870   64.151 %  gi|49902173|gb|AAH74965.1| Interferon, alpha 4 [Homo sapiens]
50  558   348      7952   63.366 %  gi|49901646|gb|AAH74966.1| Interferon, alpha 4 [Homo sapiens]

Interferon number 32 has the maximum match percentage: 162 matches, with matching starting from nucleotide number 7524, an interferon size of 219, and a prediction percent of 73.973 %.

The part of the virus sequence that best matches interferon number 32 is:

TGCTGTTCGATGTCATACTCGTGGACTGGGGCGCTTGTAACACCTTGCGCGGCTGA
AGAATCAAAGCTGCCAATTAGCCCCCTGAGCAATTCACTTTTGCGCCATCACAATA
TGGTGTATGCCACGACCACCCGTTCTGCTGTGACACGGCAGAAGAAGGTGACCTTC
GACCGCCTGCAGGTGGTGGACAGTACCTACAATGAAGTGCTTAAGGAGATA

The output recombinant interferon, obtained by passing the virus sequence part with the highest prediction percentage to the RNA, Protein synthesis tool and to CLC Main Workbench 4.1.1 [13], is:

CCSMSYSWTGALVTPCAAEESKLPISPLSNSLLRHHNMVYATTTRSAVTRQKKVTFDR
LQVVDSTYNEVLKEI

8. CONCLUSION
Using the constructed algorithms with the help of bioinformatics tools, we set out to locate the gene sequence on the RNA of virus C that is most similar to one of the interferon sequences input in the trials. In one case we reached a similarity of 73.973 %, a promising result that justifies trying the proposed idea in practice.

REFERENCES
[1] Gladwin, M. and Trattler, B., Clinical Microbiology, McGraw-Hill, International edition, 1997.
[2] Hmaied, F.; Legrand-Abravanel, F.; Nicot, F.; Garrigues, N.; Chapuy-Regaud, S.; Dubois, M.; Njouom, R.; Izopet, J. and Pasquier, C., "Full-length genome sequences of hepatitis C virus subtype 4f", J. Gen.
Virol., 88(11): 2985-2990, November 1, 2007.
[3] Abdel-Hamid, M.; El-Daly, M.; Molnegren, V.; El-Kafrawy, S.; Abdel-Latif, S.; Esmat, G.; Strickland, T.; Loffredo, C.; Albert, J. and Widell, A., "Genetic diversity in hepatitis C virus in Egypt and possible association with hepatocellular carcinoma", J. Gen. Virol., 88(5): 1526-1531, May 1, 2007.
[4] Eddy, S.R., Profile hidden Markov models, Bioinformatics, 14: 755-763, 1998.
[5] Durbin, R.; Eddy, S.; Krogh, A. and Mitchison, G., Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, Cambridge, UK, 1998.
[6] Alberts, Bruce. Molecular Biology of the Cell. New York: Garland Science. pp. 760. ISBN 0-8153-3218-1, 2002.
[7] Koutsoudakis, G.; Kaul, A.; Steinmann, E.; Kallis, S.; Lohmann, V.; Pietschmann, T. and Bartenschlager, R., "Characterization of the early steps of hepatitis C virus infection by using luciferase reporter viruses", J. Virol., 80: 5308-5320, 2006.
[8] Chang, K.S.; Jiang, J.; Cai, Z. and Luo, G., "Human apolipoprotein E is required for infectivity and production of hepatitis C virus in cell culture", J. Virol., 81: 13783-13793, 2007.
[9] Chamberlain, R.W.; Adams, N.; Saeed, A.A.; Simmonds, P. and Elliott, R.M., "Complete nucleotide sequence of a type 4 hepatitis C virus variant, the predominant genotype in the Middle East", J. Gen. Virol., 78: 1341-1347, 1997.
[10] CONSRTM NCBI Genome Project, National Centre for Biotechnology Information, NIH, Bethesda, MD 20894, USA, 05-SEP-2007.
[11] Chamberlain, R.W., "Complete nucleotide sequence of a type 4 hepatitis C virus", Institute of Virology, University of Glasgow, Church Street, Glasgow, G11 5JR, UK, 03-JUN-1997.
[12] National Centre for Biotechnology Information, www.ncbi.com.
[13] Bjarne Knudsen; Thomas Knudsen; Mikael Flensborg; Henrik Sandmann; Michael Heltzen; Alex Andersen; Mikkel Dickenson; Jakob Bardram; Peter J. Steffensen; Søren Mønsted; Torben Lauritzen; Roald Forsberg; Agnes Thanbichler; Jannick D. Bendtsen; Lasse Görlitz; Jane Rasmussen; David Tordrup; Morten Værum; Mikkel Nygaard Ravn; Christian Hachenberg; Esben Fisker; Patrick Dekker and Jacob Schultz, CLC Main Workbench 4.1.1, www.clcbio.com.
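
The figures reported for interferon number 32 can be cross-checked with a short script. The following is a minimal Python sketch and not the authors' tool. It assumes that the matching percent is the number of matches divided by the interferon size, and that the recombinant interferon is the frame-1 translation of the best-matching virus part under the standard genetic code. The names `CODON_TABLE`, `translate` and `matching_percent` are illustrative and do not come from the paper.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions (assumption: the standard table applies).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Virus part with the most matches against interferon number 32,
# copied from the results section.
VIRUS_PART = (
    "TGCTGTTCGATGTCATACTCGTGGACTGGGGCGCTTGTAACACCTTGCGCGGCTGA"
    "AGAATCAAAGCTGCCAATTAGCCCCCTGAGCAATTCACTTTTGCGCCATCACAATA"
    "TGGTGTATGCCACGACCACCCGTTCTGCTGTGACACGGCAGAAGAAGGTGACCTTC"
    "GACCGCCTGCAGGTGGTGGACAGTACCTACAATGAAGTGCTTAAGGAGATA"
)


def translate(dna):
    """Translate a DNA string in reading frame 1.

    A trailing partial codon, if any, is ignored.
    """
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def matching_percent(matches, size):
    """Matching percent, assumed here to be matches / size * 100."""
    return round(100.0 * matches / size, 3)


protein = translate(VIRUS_PART)
print(len(VIRUS_PART), len(protein))
print(protein)
print(matching_percent(162, 219))
```

Under these assumptions the virus part is 219 nucleotides long, its translation reproduces the 73-residue sequence given in the results, and 162 matches out of 219 gives 73.973 %.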