VOL. 3, NO. 6, July 2012
ISSN 2079-8407
Journal of Emerging Trends in Computing and Information Sciences
©2009-2012 CIS Journal. All rights reserved.
http://www.cisjournal.org
Improving Virus C type 4 Interferon using Bioinformatics Techniques
1 Attalah Hashad, 2 Khaled Kamal, 3 Ahmed Fahmy, 4 Amr Badr
1, 2, 3 Arab Academy for Science and Technology, Faculty of Engineering, Computer Engineering Department
4 Cairo University, Faculty of Computers & Information, Computer Science Department
1 [email protected], 2 [email protected], 3 [email protected], 4 [email protected]
ABSTRACT
Hepatitis C virus genotype 4 is the predominant genotype throughout the Middle East and parts of Africa, with high population
prevalence in Egypt. In the worldwide effort to find a treatment for this potentially fatal disease, many studies and trials have
been carried out. It has been reported that virus C itself carries a self-destructive gene [1] which, if activated by a specific
instruction to the mRNA of the virus, forms interferon. Interferon is an antiviral; if produced from virus C itself, it becomes
specific to that virus. Using a variety of bioinformatics tools, we design algorithms to improve the chance of finding the gene
that directs the mRNA to produce the virus C specific interferon. Tools for RNA-to-protein synthesis, gene prediction, protein
classification and gene classification have been constructed and tried toward this goal. As a result of these trials, several
matches were found with varying percentages, which at least acknowledges the possibility of this interferon/RNA analysis.
Keywords: Virus C type 4, Interferon, RNA, DNA, Hidden Markov Models
1. INTRODUCTION
Hepatitis C is an infection of the hepatocytes by
virus C. It is the most dangerous of the hepatitis-causing
viruses. This is due to its relatively common method of
transmission, which in 90% of cases is blood
transfusion [2]. This is especially apparent in Egypt, owing to
the lack of control over safe blood transfusion measures. In
addition, no vaccine against hepatitis C is available at the
moment, which adds to its danger and fatality. Infection by
hepatitis C may become chronic in 50% of cases, may
progress to hepatocellular carcinoma, or may turn the
infected person into a carrier (no clearance of the virus, with
normal liver function).
It has been found that one of the host defense
mechanisms against several viruses, including hepatitis C,
is a protein called interferon. Human
interferons are cytokines (immune proteins) produced in
response to stimulation by virus infection [1]. They inhibit
virus replication by inhibiting the translation of mRNA into
protein. They are species specific, that is, active in the species
in which they are produced, but not specific to a particular
virus. Purified interferons are prepared by recombinant DNA
technology.
The main idea of the paper is to enhance the
production of an interferon specific to virus C type 4, the
type we deal with in Egypt. Because of this specificity, such
an interferon would help treat the virus more accurately: the
interferon produced in humans is not virus C specific [3],
while the one proposed in this paper would be virus
specific. This is to be done by using the gene
structure responsible for interferon protein production on the
virus C RNA itself. In this paper we use bioinformatics
techniques such as profile hidden Markov models, through
multiple tools constructed to try to find the gene
responsible for the interferon production.
2. PROFILE HIDDEN MARKOV MODELS
2.1 Introduction
In this paper we used the profile HMM technique to
develop all the tools used in our algorithm. Profile analysis has
long been a useful tool in finding and aligning distantly
related sequences and in identifying known sequence
domains in new sequences [4]. Basically, a profile is a
description of the consensus of a multiple sequence
alignment. It uses a position-specific scoring system to
capture information about the degree of conservation at
various positions in the multiple alignment. This makes it a
much more sensitive and specific method for database
searching than pairwise methods.
Hidden Markov modeling, a technique that has
been used for years in speech recognition, is now being
applied to many types of problems in molecular sequence
analysis. In particular, this technique can produce profiles
that are an improvement over traditionally constructed
profiles. Profile hidden Markov models (HMMs) have
several advantages over standard profiles.
Profile HMMs have a formal probabilistic basis and
have a consistent theory behind gap and insertion scores, in
contrast to standard profile methods which use heuristic
methods [5]. HMMs apply a statistical method to estimate
the true frequency of a residue at a given position in the
alignment from its observed frequency while standard
profiles use the observed frequency itself to assign the score
for that residue. This means that a profile HMM derived
from only 10 to 20 aligned sequences can be of equivalent
quality to a standard profile created from 40 to 50 aligned
sequences. In general, producing good profile HMMs
requires less skill and manual intervention than producing
good standard profiles.
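To illustrate the difference between an observed frequency and an estimated one, the following minimal sketch smooths the residue counts of a single alignment column with add-one pseudocounts. This is an assumed stand-in chosen for brevity: the function name, the RNA alphabet and the pseudocount value are illustrative, and profile HMM software generally uses more elaborate priors than this.

```python
from collections import Counter


def estimate_frequencies(column, alphabet="ACGU", pseudocount=1.0):
    """Estimate residue frequencies for one alignment column.

    Every residue in the alphabet receives `pseudocount` extra
    observations, so residues never seen in a small alignment still
    get a non-zero estimated frequency.
    """
    counts = Counter(column)
    total = len(column) + pseudocount * len(alphabet)
    return {r: (counts[r] + pseudocount) / total for r in alphabet}


# A column observed in four aligned sequences: A, A, A, C.
column = "AAAC"
observed = {r: column.count(r) / len(column) for r in "ACGU"}
estimated = estimate_frequencies(column)
print(observed)   # G and U have observed frequency 0
print(estimated)  # G and U receive a small non-zero estimate
```

With only four sequences, the observed frequency of G and U is zero, which would forbid them outright; the smoothed estimate keeps them possible but unlikely.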
2.2 Profile Hidden Markov Models Architecture
A profile HMM is a linear state machine consisting of
a series of nodes, each of which corresponds roughly to a
position (column) in the alignment from which it was built.
If we ignore gaps, the correspondence is exact -- the profile
HMM has a node for each column in the alignment, and
each node can exist in one state, a match state, as shown in
figure 1. (The word "match" here implies that there is a
position in the model for every position in the sequence to be
aligned to the model.)
Fig 1: PHMM States
A profile HMM has several types of probabilities
associated with it. One type is the transition probability: the probability of transitioning from one state to another. In
a simple ungapped model, the probability of a transition
from one match state to the next match state is 1.0 and the
path through the model is strictly linear, moving from the
match state of node n to the match state of node n+1.
There are also emission probabilities associated
with each match state, based on the probability of a given
residue existing at that position in the alignment. For
example, for a fairly well-conserved column in a protein
alignment, the emission probability for the most common
amino acid may be 0.81, while for each of the other 19
amino acids it may be 0.01. If you follow a path through the
model to generate a sequence consistent with the model, the
probability of any sequence that is generated depends on the
transition and emission probabilities at each node.
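As a small worked example of the ungapped case, the sketch below builds a three-node model in which every match-to-match transition has probability 1.0 and each column uses the 0.81 / 0.01 emission values quoted above. The consensus "MKV" and the helper names are assumptions made for illustration only.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def conserved_column(consensus, p_consensus=0.81, p_other=0.01):
    """Emission table for a well-conserved column: 0.81 + 19 * 0.01 = 1.0."""
    return {aa: (p_consensus if aa == consensus else p_other) for aa in AMINO_ACIDS}


def sequence_probability(seq, profile):
    """Probability of `seq` under an ungapped profile.

    All transitions are match -> match with probability 1.0, so the
    result is simply the product of the emission probabilities.
    """
    if len(seq) != len(profile):
        raise ValueError("an ungapped model needs one residue per node")
    return math.prod(column[residue] for residue, column in zip(seq, profile))


# Hypothetical three-column profile whose consensus is "MKV".
profile = [conserved_column(c) for c in "MKV"]
print(sequence_probability("MKV", profile))  # 0.81 ** 3
print(sequence_probability("MKA", profile))  # 0.81 * 0.81 * 0.01
```

A single mismatch at a conserved column lowers the probability by a factor of 81, which is what makes the position-specific scoring sensitive.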
In order to model real sequences, we also need to
consider the possibility that gaps might occur when a model
is aligned to a sequence. Two types of gaps may arise. The
first type occurs when the sequence contains a region that is
not present in the model (an insertion in the sequence). The
second type occurs when there is a region in the model that
is not present in the sequence (a deletion in the sequence).
To handle these cases, each node in the profile HMM must
now have three states: the match state, an insert state, and a
delete state. The model also needs more types of transition
probabilities: match->match, match->insert, match->delete,
insert->match, etc., as shown in figure 2.
Fig 2: PHMM Architecture
Aligning a sequence to a profile HMM is done by a
dynamic programming algorithm that finds the most
probable path that the sequence may take through the model,
using the transition and emission probabilities to score each
possible path [4][5].
In general, if the sequence is equivalent to the
consensus of the original alignment, the path through the
model will pass from match state to match state in a linear
fashion. If the sequence contains a deletion relative to the
consensus, the path passes through one or more delete states
before transitioning to the next match state; if the sequence
contains an insertion relative to the consensus, the path
passes through an insert state between two match states.
Profile HMMs can be aligned to a sequence either
globally (the whole profile HMM aligns to the sequence) or
locally (only part of the profile HMM need be aligned with
the sequence). The alignment type is actually part of the
model, so you must specify whether the model is to be
global or local at the time the model is built, not at the time
the model is used.
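The dynamic programming alignment described above can be sketched as a small Viterbi routine over match (M), insert (I) and delete (D) states. This is a simplified illustration under stated assumptions, not the implementation used by the authors: transition probabilities are shared by all nodes, insert emissions are uniform, the begin state is treated as match state 0, no explicit end transition is scored, insert<->delete transitions are disallowed, and only global alignment is handled. All numbers in the toy model are invented.

```python
import math

NEG_INF = float("-inf")


def _log(p):
    """Natural log that maps probability 0 to -infinity instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi_profile(seq, match_emissions, insert_emission, trans):
    """Return (log probability, state path) of the best global alignment.

    match_emissions: one {residue: probability} dict per node.
    insert_emission: a single {residue: probability} dict for all inserts.
    trans: {(from_state, to_state): probability}, shared by all nodes.
    """
    n_nodes, n_res = len(match_emissions), len(seq)
    log_t = {pair: _log(p) for pair, p in trans.items()}
    score = {s: [[NEG_INF] * (n_res + 1) for _ in range(n_nodes + 1)] for s in "MID"}
    back = {s: [[None] * (n_res + 1) for _ in range(n_nodes + 1)] for s in "MID"}
    score["M"][0][0] = 0.0  # the begin state is treated as match state 0

    def best(prev_states, j, i, to_state):
        return max((score[p][j][i] + log_t[(p, to_state)], p) for p in prev_states)

    for j in range(n_nodes + 1):
        for i in range(n_res + 1):
            if j > 0 and i > 0:  # match: consumes one node and one residue
                s, p = best("MID", j - 1, i - 1, "M")
                score["M"][j][i] = s + _log(match_emissions[j - 1].get(seq[i - 1], 0.0))
                back["M"][j][i] = p
            if i > 0:  # insert: consumes one residue, stays at the same node
                s, p = best("MI", j, i - 1, "I")
                score["I"][j][i] = s + _log(insert_emission.get(seq[i - 1], 0.0))
                back["I"][j][i] = p
            if j > 0:  # delete: consumes one node, emits nothing
                s, p = best("MD", j - 1, i, "D")
                score["D"][j][i] = s
                back["D"][j][i] = p

    state = max("MID", key=lambda s: score[s][n_nodes][n_res])
    total = score[state][n_nodes][n_res]
    if total == NEG_INF:
        return NEG_INF, ""
    path, j, i = [], n_nodes, n_res
    while (j, i) != (0, 0):  # follow back-pointers to the begin state
        path.append(state)
        prev = back[state][j][i]
        if state == "M":
            j, i = j - 1, i - 1
        elif state == "I":
            i -= 1
        else:
            j -= 1
        state = prev
    return total, "".join(reversed(path))


# Toy model with consensus "ACG"; all numbers are illustrative.
ALPHABET = "ACGU"
match_emissions = [{r: (0.85 if r == c else 0.05) for r in ALPHABET} for c in "ACG"]
insert_emission = {r: 0.25 for r in ALPHABET}
trans = {("M", "M"): 0.9, ("M", "I"): 0.05, ("M", "D"): 0.05,
         ("I", "M"): 0.5, ("I", "I"): 0.5, ("D", "M"): 0.5, ("D", "D"): 0.5}

for s in ("ACG", "AG", "ACUG"):
    print(s, viterbi_profile(s, match_emissions, insert_emission, trans)[1])
```

In this toy model the consensus sequence follows match states only, a sequence missing the middle residue passes through one delete state, and a sequence with an extra residue passes through one insert state, mirroring the three cases described in the text.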
3. PROTEINS AND GENES CLASSIFICATION
The prediction of a protein's function from its
amino acid sequence, and of a gene's function from its
nucleotide sequence, is one of the most important tasks in
bioinformatics. The traditional procedure of searching
databases for related sequences and inferring the function
from the best matches has several shortcomings and pitfalls.
Alternatively, the sequence under study can be scrutinized
for the occurrence of particular sequence signatures that can
be associated with certain protein or gene functions.
Useful sequence signatures include not only short motifs,
such as protein modification sites or specific binding motifs,
but also larger protein regions, such as homology
domains. There exist a number of fundamentally different
bioinformatics data structures that can be used to store
information about sequence signatures, thus making them
available for the purpose of protein classification.
Profile Hidden Markov Models (HMM) are
statistical representations of protein and gene families
derived from patterns of sequence conservation in multiple
alignments and have been used in identifying remote
homologues with considerable success. These conservation
patterns arise from fold specific signals, shared across
multiple families, and function specific signals unique to the
families. The availability of sequences pre-classified
according to their function permits the use of negative
training sequences to improve the specificity of the HMM,
both by optimizing the threshold cutoff and by modifying
emission probabilities to minimize the influence of fold-specific signals. A protocol to generate family specific
HMMs is described that first constructs a profile HMM from
an alignment of the family's sequences and then uses this
model to identify sequences belonging to other classes that
score above the default threshold (false positives). Ten-fold
cross validation is used to optimize the discrimination
threshold score for the model. The advent of fast multiple
alignment methods enables the use of the profile alignments
to align the true and false positive sequences, and the
resulting alignments are used to modify the emission
probabilities in the original model.
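The threshold optimization step can be illustrated with a minimal sketch that, for one set of scored sequences, picks the cutoff giving the fewest misclassifications. The scores and the function name are hypothetical; in the protocol described above this selection would be repeated inside each fold of the ten-fold cross validation.

```python
def best_threshold(positive_scores, negative_scores):
    """Pick the score cutoff that misclassifies the fewest sequences.

    A sequence is called a family member when its score is >= cutoff.
    Candidate cutoffs are the observed scores themselves; on ties the
    lowest candidate is returned.
    """
    candidates = sorted(set(positive_scores) | set(negative_scores))

    def errors(cutoff):
        false_negatives = sum(s < cutoff for s in positive_scores)
        false_positives = sum(s >= cutoff for s in negative_scores)
        return false_negatives + false_positives

    return min(candidates, key=errors)


# Hypothetical profile HMM scores for family members and non-members.
family_scores = [12.0, 15.5, 11.0]
other_scores = [3.2, 8.9, 10.5]
print(best_threshold(family_scores, other_scores))
```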
4. DNA-RNA-PROTEIN SYNTHESIS
4.1 DNAs
DNA is a linear polymer that is composed of four
different building blocks, the nucleotides. It is in the
sequence of the nucleotides in the polymers where the
genetic information carried by chromosomes is located.
Each nucleotide is composed of three parts: (1) a
nitrogenous base, either a purine (adenine (A) or guanine
(G)) or a pyrimidine (cytosine (C) or thymine (T)); (2) a
sugar, deoxyribose; and (3) a phosphate group.
The actual work of translating the information into
a medium that can be used directly by the cell is done by
RNA (ribonucleic acid), as shown in figure 3. RNA has
three functions: (a) it serves as the messenger that tells the
cell (the ribosomes) what protein to make (messenger RNA;
mRNA); (b) it serves as part of the structure of the ribosome,
the protein/RNA complex that synthesizes proteins
according to the information presented by the mRNA
(ribosomal RNA; rRNA); and (c) it brings amino
acids (the constituents of proteins) to the ribosome when
a specific amino acid "is called for" by the information on
the mRNA to be put into the protein that is being
synthesized; this RNA is called transfer RNA (tRNA).
4.2 RNAs
The messenger RNA (mRNA) serves as an
intermediate between DNA and protein. Parts of the DNA
are "transcribed" into transcripts (single-stranded RNA
molecules) that are processed to mRNA. In prokaryotes the
transcript generally does not need to be processed, and can
serve as mRNA right away. Transcription starts at a specific
site on the DNA called a promoter. Each gene or operon has
its own promoter(s). Transcription ends at a terminator
sequence on the DNA. The processed transcript is the
mRNA, and the information in the mRNA can then be
"translated" into a protein of specific sequence. In
prokaryotes, introns are rare and mRNA generally does not
get processed before translation.
Ribosomal RNAs (rRNAs) are essential
components of an important part of the protein synthesis
machinery: the ribosomes. Each eukaryotic ribosome contains one
molecule of each of four rRNA types (a prokaryotic ribosome
contains three). In prokaryotes,
ribosomes bind to the mRNA close to the translation start
site.
Transfer RNA (tRNA) carries amino acids to the
ribosomes, to enable the ribosomes to put this amino acid on
the protein that is being synthesized as an elongating chain
of amino acid residues, using the information on the mRNA
to "know" which amino acid should be put on next. For each
kind of amino acid, there is a specific tRNA that will
recognize the amino acid and transport it to the protein that
is being synthesized, and tag it on to the protein once the
information on the mRNA calls for it.
All tRNAs have the same general shape,
resembling a clover leaf. Parts of the molecule fold back in
characteristic loops, which are held in shape by nucleotide pairing between different areas of the molecule. There are
two parts of the tRNA that are of particular importance: the
aminoacyl attachment site and the anticodon. The aminoacyl
attachment site is the site at which the amino acid is attached
to the tRNA molecule. Each type of tRNA specifically binds
only one type of amino acid. The anticodon (three bases) of
the tRNA base-pairs with the appropriate mRNA codon at
the mRNA-ribosome complex. This temporarily binds the
tRNA to the mRNA, allowing the amino acid carried by the
tRNA to be incorporated into the polypeptide in its proper
place. Thus, the sequence of the codon (three bases) in the
mRNA dictates the amino acid to be put in the protein at a
specific site. The "dictionary" of codons coding for amino
acids is called the genetic code.
4.3 Protein Synthesis
After having discussed DNA and the various
RNAs, the stage has been set for protein synthesis. The basic
reaction of protein synthesis is the controlled formation of a
peptide bond between two amino acids. This reaction is
repeated many times, as each amino acid in turn is added to
the growing polypeptide. Protein synthesis starts when the
mRNA binds to a small ribosomal subunit near an AUG
sequence in the mRNA. The AUG codon is called the start
codon, since it codes for the first amino acid (a methionine)
of the protein to be made. The AUG codon base-pairs with
the anticodon of the tRNA carrying methionine. A large
ribosomal subunit binds to the complex, and the reactions of
protein synthesis itself can begin. The aminoacyl-tRNA to
be called for next is determined by the next codon (the next
three bases) on the mRNA, as shown in figure 3. Each
amino acid is coded for by one or more (up to six) codons.
It would be more straightforward to have each
amino acid coded for by only one codon, but nature appears
to have chosen a more complex route. The reason for this, in
part, is that there are 20 different amino acids and 4x4x4=64
different combinations possible in a codon. When the
ribosome reaches one of the three codons for which there is
no matching tRNA, the ribosome falls off and the
synthesized protein is released [6].
Fig 3: DNA-RNA-protein Synthesis
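The path from DNA to protein described in this section can be sketched in a few lines. The code below is an illustrative sketch, not one of the tools built for this paper: it assumes the input is the coding (sense) strand containing only A, C, G and T, uses the standard genetic code, starts at the first AUG and stops at the first stop codon. The function names and the example sequence are made up.

```python
BASES = "UCAG"
# Standard genetic code, listed in UCAG order for each codon position;
# "*" marks the three stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand):
    """mRNA has the sequence of the DNA coding strand with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start < 0:
        return ""  # no start codon, nothing is synthesized
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: the ribosome releases the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("GGATGGCTAAATGACC")
print(mrna)             # GGAUGGCUAAAUGACC
print(translate(mrna))  # MAK
```

Here the codons read are AUG (methionine), GCU (alanine), AAA (lysine) and UGA (stop), so the short peptide is MAK.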
5. HEPATITIS C VIRUS LIFE CYCLE
The hepatitis C virus (HCV) belongs to the Flaviviridae
family and is the only member of the Hepacivirus
genus. HCV infection is a major cause of chronic hepatitis,
liver cirrhosis, and hepatocellular carcinoma (HCC)
worldwide. Therapeutic options are improving but are still
limited, and a protective vaccine is not available to date.
Moreover, many patients do not qualify for or do not tolerate
standard therapy. Therefore, more effective and better
tolerated therapeutic strategies are urgently needed. The
development of such strategies depends on a detailed
understanding of the molecular virology of HCV infection.
The investigation of the HCV life cycle and pathogenesis
has been complicated by the lack of efficient cell culture
systems and small animal models. However, as shown in
figure 4, the virus must complete the following basic steps to
carry out its life cycle [7] [8]:
Fig 4: HCV life Cycle
Step (a) The virus locates and attaches itself to a
liver cell. Hepatitis C uses particular proteins present on its
protective lipid coat to attach to a receptor site (a
recognizable structure on the surface of the liver cell). The
virus's protein core penetrates the plasma membrane and
enters the cell. To accomplish this, hepatitis C utilizes its
protective lipid (fatty) coat, merging it with the
cell's outer membrane (the coat is in fact composed of a
fragment of another liver cell's plasma membrane). Once the
lipid coat has successfully fused to the plasma membrane,
the membrane engulfs the virus, and the viral core is inside
the cell.
Step (b) The protein coat dissolves to release the
viral RNA in the cell. This may be accomplished during
penetration of the cell membrane (it is broken open when it
is released into the cytoplasm), or special enzymes present in
liver cells may be used to dissolve the casing.
Step (c) The viral RNA then co-opts the cell's
ribosomes, and begins the production of materials necessary
for viral reproduction. Because hepatitis C stores its
information in a "sense" strand of RNA, the viral RNA itself
can be directly read by the host cell's ribosomes, functioning
like the normal mRNA present in the cell. As it begins
producing the materials coded in its RNA, the virus also
probably shuts down most of the normal functions of the
cell, conserving its energy for the production of viral
material, although it occasionally appears that hepatitis C
will stimulate the cell to reproduce (presumably to create
more cells that can produce viruses), which is why hepatitis
C is often associated with liver cancer. The viral RNA first
synthesizes the RNA transcriptase it will need for
reproduction.
Step (d) Once there is adequate RNA transcriptase,
the viral RNA creates an antisense version (the paired
opposite) of itself as a template for the creation of new viral
RNA. The viral RNA is now copied hundreds or thousands
of times, making the genetic material for new viruses. Some
of this new RNA will contain mutations. Viral RNA then
directs the production of protein-based capsomeres (the
building blocks for the virus's protective protein coat).
Ribosomes create the proteins and release them for use.
Step (e) The completed capsomeres assemble
around the new viral RNA into new viral particles. The
capsomeres are designed to attract each other and fit together
in a certain way. When enough capsomeres are brought
together, they self-assemble to form a spherical shell, called
a capsid, that fully encapsulates the virus's RNA. The
completed particle is called a nucleocapsid.
Step (f) The newly formed viruses travel to the
inside portion of the plasma membrane and attach to it,
creating a bud. The plasma membrane encircles the virus
and then releases it - providing the virus with its protective
lipid coat, which it will later use to attach to another liver
cell. This process of budding and release of new viruses
continues for hours at the cell surface until the cell dies from
exhaustion. Each surviving virus - those which are not
destroyed by the immune system or other environmental
factors - can produce hundreds or thousands of offspring.
Over time, this endless cycle of reproduction results in
significant damage to the liver, as millions upon millions of
cells are destroyed by viral reproduction or by the immune
system's attacks on infected cells.
6. THE PROPOSED ALGORITHM
6.1 The Algorithm Goal
The goal of the proposed algorithm is to enhance
the production of an interferon that would be specific to
virus C, especially type 4, which we deal with in Egypt. The
proposal aims to help treat virus C more accurately as a
result of the specificity of this interferon. The interferon
produced in humans is not virus C specific [3]; thus, while
research with recombinant DNA technology has
achieved its production, it has done so for the human type.
The interferon that would hopefully be produced by this
proposed algorithm will be virus specific, not a broad-spectrum
antiviral like that in humans.
6.2 The Proposed Algorithm Theory
This algorithm is based upon a finding derived from
many studies of the virus C gene structure. After
detailed exploration of the virus C RNA, evidence has been
found of the existence of a certain gene on the RNA. It is
assumed that if this gene is activated, it could stimulate
synthesis of an interferon against virus C itself [1][2][3],
acting as an antiviral specific to virus C.
Although the gene has not actually been found, many
trials are in progress to locate it, but not for the
virus C type 4 subtypes that this paper is concerned with.
The idea of this proposed work rests on the assumption
that the gene exists.
This idea entails the use of the gene that results
in production of the interferon. This can be achieved by
introducing many interferons, whether extracted from
hepatitis C type 4 patients or produced by recombinant
DNA technology, into an algorithm. The protein of
each interferon is analyzed and, in a reverse pathway, its gene
sequence is derived. Assuming that this gene sequence
is similar to one found on the virus RNA, the latter
is extracted. Presuming that it is responsible for the
interferon production that is sought, the sequence is
then used to find the recombinant interferon.
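The "reverse pathway" from protein to gene sequence is not unique, because most amino acids are coded for by several codons. The sketch below, which assumes the standard genetic code, counts how many RNA sequences could code for a given protein and produces one candidate by always taking the first codon in table order. That codon choice is an arbitrary illustration; the text does not state how the paper's tool resolves the ambiguity.

```python
import math
from collections import defaultdict

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Invert the standard genetic code: amino acid -> list of its codons.
_codons = defaultdict(list)
for i, a in enumerate(BASES):
    for j, b in enumerate(BASES):
        for k, c in enumerate(BASES):
            _codons[AMINO[16 * i + 4 * j + k]].append(a + b + c)
CODONS_FOR = dict(_codons)


def count_candidate_genes(protein):
    """Number of distinct RNA sequences that translate to `protein`."""
    return math.prod(len(CODONS_FOR[aa]) for aa in protein)


def back_translate(protein):
    """One candidate RNA sequence, using the first listed codon for each residue."""
    return "".join(CODONS_FOR[aa][0] for aa in protein)


print(back_translate("ML"))          # AUGUUA
print(count_candidate_genes("ML"))   # 6, since leucine has six codons
```

Because the number of candidates grows multiplicatively with protein length, a practical tool has to compare against the target RNA with some tolerance for mismatches rather than enumerate every candidate.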
6.3 The Proposed Algorithm Architecture
The following steps explain in detail the different
levels we go through, from the introduction of the
interferon to reaching the recombinant interferon.
Step 1: Extract the interferon protein from a randomly
chosen Hepatitis C patient in Egypt; this interferon acts
against virus C type 4, which is specific to infected Egyptian cases.
Step 2: The interferon is introduced into the
classification tool developed to ensure that it belongs to the
specific family of interferon that acts on humans, as shown in
figure 5.
Step 3: After verification of its family, the
interferon is introduced into the second tool, which is
responsible for discovering the gene structure of the protein
introduced. In this case the gene structure of the interferon is
discovered.
Step 4: The discovered gene structure is classified
with the third tool to confirm the family of this
gene, as shown in figure 5.
Step 5: The gene sequence is predicted
using the fourth tool by comparison with the original virus
RNA. When an approximately similar sequence is found on
the original virus RNA, it is extracted and reinserted into the
gene classification tool to re-verify its family.
Step 6: The predicted similar gene sequence of
the original virus RNA is inserted into a synthesizing tool to
synthesize the specified anti-virus C type 4 interferon
proteins.
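The comparison in Step 5 can be sketched as an ungapped sliding-window scan: the candidate gene sequence is laid over every position of the virus RNA and the position with the most identical nucleotides is kept. This is an assumed reading of the fourth tool, based on the "matches", "start" and "percent" figures reported in the results; the function name and the short test sequences are invented for illustration.

```python
def best_window(virus_rna, gene):
    """Find the ungapped window of `virus_rna` most similar to `gene`.

    Returns (start, matches, percent), where start is the 1-based
    nucleotide position of the window and percent is
    100 * matches / len(gene). On ties the first window wins.
    """
    size = len(gene)
    if size == 0 or size > len(virus_rna):
        raise ValueError("gene must be non-empty and no longer than the RNA")
    best_start, best_matches = 0, -1
    for start in range(len(virus_rna) - size + 1):
        window = virus_rna[start:start + size]
        matches = sum(a == b for a, b in zip(window, gene))
        if matches > best_matches:
            best_start, best_matches = start, matches
    return best_start + 1, best_matches, 100.0 * best_matches / size


# Made-up sequences: the gene differs from the RNA window in one base.
start, matches, percent = best_window("CCCCAUGGCUCCCC", "AUGGCA")
print(start, matches, round(percent, 3))  # 5 5 83.333
```

The scan costs one comparison per nucleotide per window, which is easily affordable for a viral genome of roughly nine thousand bases.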
Fig 5: Algorithm Architecture
[Figure: block diagram linking the Virus C Type 4 interferon, Protein Classification, Gene Structure, Genes Classification (Gene Family), Genes Prediction against the Virus C type 4 RNA sequence structure, the approximate gene sequence structure, RNA and Protein synthesis, and the Recombinant Virus C Type 4 interferon.]
7. RESULTS AND STATISTICS
7.1 Hepatitis C Virus Sequence
The Hepatitis C virus genotype 4 sequence has
locus NC_009825 (9355 bp, RNA, linear, VRL,
18-JUN-2008) and accession NC_009825. The version is
NC_009825.1 GI: 157781208. It was acquired from
Genome Project: 20933 using the keywords "HCV polyprotein".
The taxonomic lineage of the source organism is Viruses;
ssRNA positive-strand viruses, no DNA stage; Flaviviridae;
Hepacivirus. The accredited references are [9], [10], [11],
revised by [11], bearing in mind that the sequence is
incomplete on both ends.
7.2 Interferon Sequences
Fifty interferons were fetched from the
NCBI databases [12], and all were classified as belonging to the
Virus C Type 4 interferon family using the classification tool, the NCBI
classification tool [12] and CLC Main Workbench 4.1.1
[13].
7.3 Prediction Result
No
Interferon Description
Interfe.
Size
No Of
Matches
1
gi|74095774|emb|CAE45642.2| interferon [Oncorhynchus
mykiss]
gi|37693458|dbj|BAC99048.1| interferon [Danio rerio]
|29125840|emb|CAD67779.1| interferon [Tetraodon
nigroviridis]
gi|28475251|emb|CAD67752.1| interferon [Danio rerio]
gi|28475279|emb|CAD67762.1| interferon [Tetraodon
nigroviridis]
gi|28475255|emb|CAD67754.1| interferon [Danio rerio]
gi|28475253|emb|CAD67753.1| interferon [Danio rerio]
gi|585316|sp|P01571.2|IFN17_HUMAN Interferon alpha-17
precursor (Interferon alpha-I') (LeIF I) (Interferon alpha-T)
(Interferon alpha-88)
gi|84029375|sp|P05014.2|IFNA4_HUMAN Interferon alpha4 precursor (Interferon alpha-4B) (Interferon alpha-M1)
(Interferon alpha-76)
gi|417188|sp|P32881.1|IFNA8_HUMAN Interferon alpha-8
precursor (Interferon alpha-B2) (Interferon alpha-B) (LeIF
B)
gi|124453|sp|P01566.1|IFN10_HUMAN Interferon alpha-10
precursor (Interferon alpha-C) (LeIF C) (Interferon alpha6L)
gi|159164644|pdb|2HYM|B Chain B, Nmr Based Docking
Model Of The Complex Between The Human Type I
Interferon Receptor And Human Interferon Alpha-2
gi|118138012|pdb|2HYM|A Chain A, Nmr Based Docking
Model Of The Complex Between The Human Type I
Interferon Receptor And Human Interferon Alpha-2
gi|157940259|tpe|CAO03088.1| TPA: type I interferon 4
[Xenopus tropicalis]
gi|157940241|tpe|CAM33515.1| TPA: type I interferon 4
[Monodelphis domestica]
gi|157940225|tpe|CAM33453.1| TPA: type I interferon 4
[Ornithorhynchus anatinus]
gi|124449|sp|P01563.1|IFNA2_HUMAN Interferon alpha-2
precursor (Interferon alpha-A) (LeIF A)
Interferon Description
564
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
No
Matching
Percent
348
Start
Matching
From
733
603
636
380
388
6820
3946
63.018 %
61.006 %
567
7470
358
3940
7485
883
63.139 %
52.744 %
567
567
975
358
354
606
4672
4195
6589
63.139 %
62.434 %
62.154 %
609
380
7636
62.397 %
1335
776
7401
58.127 %
537
330
6118
61.453 %
567
358
7485
63.139 %
537
338
6730
62.942 %
558
354
5291
63.441 %
567
348
5302
61.376 %
567
364
2591
64.198 %
567
350
7412
61.728 %
Interf.
Size
No Of
Matches
Start
Matching
61.702 %
Matching
Percent
892
VOL. 3, NO. 6, July 2012
ISSN 2079-8407
Journal of Emerging Trends in Computing and Information Sciences
©2009-2012 CIS Journal. All rights reserved.
http://www.cisjournal.org
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
gi|6754294|ref|NP_034634.1| interferon, alpha 4 precursor
[Mus musculus]
gi|34811411|pdb|1N6V|A Chain A, Average Structure Of The Interferon-Binding Ectodomain Of The Human Type I Interferon Receptor
gi|20178289|sp|P01568.2|IFN21_HUMAN Interferon alpha-21 precursor (Interferon alpha-F) (LeIF F)
gi|10835103|ref|NP_066546.1| interferon, alpha 4 [Homo sapiens]
gi|7305519|ref|NP_038702.1| interferon regulatory factor 4 [Mus musculus]
gi|124455|sp|P01562.1|IFNA1_HUMAN Interferon alpha-1/13 precursor (Interferon alpha-D) (LeIF D)
gi|159164194|pdb|2DLL|A Chain A, Solution Structure Of The Irf Domain Of Human Interferon Regulator Factors 4
gi|18375650|ref|NP_542416.1| protein tyrosine phosphatase, non-receptor type 13 isoform 4 [Homo sapiens]
gi|124463|sp|P05015.1|IFN16_HUMAN Interferon alpha-16 precursor (Interferon alpha-WA)
gi|400061|sp|P05000.2|IFNW1_HUMAN Interferon omega-1 precursor (Interferon alpha-II-1)
gi|766|emb|CAA46506.1| trophoblast type I interferon gene [Bos taurus]
gi|29468974|gb|AAO64456.1| interferon alpha 4 precursor [Mus musculus]
gi|27451546|gb|AAO14969.1| type I interferon [Macropus eugenii]
gi|74035856|emb|CAE46918.2| type I interferon [Oncorhynchus mykiss]
gi|10720037|sp|Q61190.1|I10R2_MOUSE Interleukin-10 receptor beta chain precursor (IL-10R-B) (IL-10R2) (Cytokine receptor class-II member 4) (Cytokine receptor family 2 member 4) (CRF2-4) (CDw210b antigen)
gi|124498|sp|P07352.1|IFNW1_BOVIN Interferon omega-1 precursor (Interferon alpha-II-1) (IFN-omega-c1)
gi|1272477|gb|AAC50779.1| lymphocyte specific interferon regulatory factor/interferon regulatory factor 4
gi|157820959|ref|NP_001100137.1| interferon, alpha 4 [Rattus norvegicus]
gi|122890705|emb|CAM14943.1| interferon alpha family, gene 4 [Mus musculus]
gi|56757647|sp|Q08334.2|I10R2_HUMAN Interleukin-10 receptor beta chain precursor (IL-10R-B) (IL-10R2) (Cytokine receptor class-II member 4) (Cytokine receptor family 2 member 4) (CRF2-4) (CDw210b antigen)
gi|53680584|gb|AAU89488.1| interferon alpha 4 [Marmota monax]
gi|114326424|ref|NP_001041624.1| interferon omega 4 [Felis catus]
gi|12830735|gb|AAK08199.1|AF320332_1 interferon regulatory factor 4 deltaE6 [Gallus gallus]
gi|12830733|gb|AAK08198.1|AF320331_1 interferon regulatory factor 4 [Gallus gallus]
gi|88810133|gb|ABD52365.1| interferon alpha 4 [Ailuropoda melanoleuca]
No   Interf. Size   No Of Matches   Start Matching From   Matching Percent
18   549            336             6299                  61.202 %
19   585            374             1710                  63.932 %
20   585            374             8294                  63.932 %
21   567            356             7412                  62.787 %
22   1227           736             1191                  59.984 %
23   1350           776             941                   57.481 %
24   1047           658             7741                  62.846 %
25   567            366             4195                  64.550 %
26   546            348             954                   63.736 %
27   567            370             7600                  65.256 %
28   495            314             802                   63.434 %
29   558            356             7509                  63.799 %
30   567            374             5896                  65.961 %
31   558            356             1909                  63.799 %
32   219            162             7524                  73.973 %
33   549            352             5080                  64.117 %
34   585            364             8468                  62.222 %
35   567            358             7598                  63.139 %
36   528            334             4564                  63.258 %
37   567            366             1119                  64.550 %
38   558            344             8024                  61.649 %
39   567            352             1317                  62.081 %
40   528            344             4564                  65.152 %
41   558            348             8478                  62.366 %
42   603            382             5941                  63.350 %
Interferon Description
gi|109732682|gb|AAI16218.1| Interferon alpha 4 [Mus musculus]
gi|111601291|gb|AAI19352.1| Interferon alpha 4 [Mus musculus]
gi|111600996|gb|AAI19350.1| Interferon alpha 4 [Mus musculus]
gi|109731770|gb|AAI13641.1| Interferon, alpha 4 [Homo sapiens]
gi|109731105|gb|AAI13643.1| Interferon, alpha 4 [Homo sapiens]
gi|63148866|gb|AAY34555.1| interferon alpha 4 [Marmota himalayana]
gi|49902173|gb|AAH74965.1| Interferon, alpha 4 [Homo sapiens]
gi|49901646|gb|AAH74966.1| Interferon, alpha 4 [Homo sapiens]

No   Interf. Size   No Of Matches   Start Matching From   Matching Percent
43   1350           774             3196                  57.333 %
44   636            388             761                   61.006 %
45   555            346             8566                  62.342 %
46   561            348             4755                  62.032 %
47   363            238             8679                  65.565 %
48   588            366             7359                  62.245 %
49   477            306             7870                  64.151 %
50   558            348             7952                  63.366 %

Interferon number 32 has the maximum match percentage. It has 162 matches for an interferon size of 219, the matching starts from nucleotide number 7524, and the prediction percentage is 73.973 %.
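The quantities reported in the table (number of matches, start position, and matching percent) can be illustrated with a minimal sketch. It assumes an ungapped sliding-window comparison, which is an assumption made for illustration and not necessarily the algorithm used in this work. The function name `best_ungapped_match` and the toy sequences are hypothetical. The percent is taken as the number of matches divided by the interferon size, which is consistent with the reported 162 / 219 = 73.973 %.

```python
def best_ungapped_match(target: str, query: str) -> tuple[int, int, float]:
    """Slide `query` along `target` without gaps and return
    (matches, start, percent) for the window with the most identical positions.

    Assumptions (not taken from the paper): the start position is 1-based,
    ties keep the first window found, and the percent is matches / len(query).
    """
    if not query or len(query) > len(target):
        raise ValueError("query must be non-empty and no longer than target")

    best_matches, best_start = -1, 0
    for start in range(len(target) - len(query) + 1):
        window = target[start:start + len(query)]
        # Count positions where the window and the query carry the same symbol.
        matches = sum(a == b for a, b in zip(window, query))
        if matches > best_matches:
            best_matches, best_start = matches, start

    percent = round(100.0 * best_matches / len(query), 3)
    return best_matches, best_start + 1, percent


# Toy example with made-up sequences (not data from the paper).
print(best_ungapped_match("AAACGTACGTTT", "CGTTCG"))

# Percent for the best row of the table: 162 matches over a size of 219.
print(round(100.0 * 162 / 219, 3))
```

In this sketch the longer sequence plays the role of the virus genome and the shorter one the role of an interferon sequence. A real comparison would also have to decide how to handle gaps and ambiguous symbols.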
The virus part that has the highest number of matches with interferon number 32:
TGCTGTTCGATGTCATACTCGTGGACTGGGGCGCTTGTAACACCTTGCGCGGCTGA
AGAATCAAAGCTGCCAATTAGCCCCCTGAGCAATTCACTTTTGCGCCATCACAATA
TGGTGTATGCCACGACCACCCGTTCTGCTGTGACACGGCAGAAGAAGGTGACCTTC
GACCGCCTGCAGGTGGTGGACAGTACCTACAATGAAGTGCTTAAGGAGATA
The output recombinant interferon, obtained after submitting the virus DNA part with the highest prediction percentage to the RNA, Protein Synthesis Tool and CLC Main Workbench 4.1.1 [13]:
CCSMSYSWTGALVTPCAAEESKLPISPLSNSLLRHHNMVYATTTRSAVTRQKKVTFDRLQVVDSTYNEVLKEI
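The translation step can be checked independently of any particular tool. The sketch below assumes the standard genetic code and the first reading frame. The function name `translate` is hypothetical, and this is not the tool used by the authors. Under these assumptions, the 219-nucleotide virus part listed above translates to the 73-residue sequence reported as the output.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a nucleotide string in frame 1 using the standard genetic code.

    U is treated as T, trailing bases that do not fill a codon are ignored,
    and any symbol outside A, C, G, T raises KeyError.
    """
    seq = dna.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# The virus part with the most matches, copied from the text above.
VIRUS_PART = (
    "TGCTGTTCGATGTCATACTCGTGGACTGGGGCGCTTGTAACACCTTGCGCGGCTGA"
    "AGAATCAAAGCTGCCAATTAGCCCCCTGAGCAATTCACTTTTGCGCCATCACAATA"
    "TGGTGTATGCCACGACCACCCGTTCTGCTGTGACACGGCAGAAGAAGGTGACCTTC"
    "GACCGCCTGCAGGTGGTGGACAGTACCTACAATGAAGTGCTTAAGGAGATA"
)

protein = translate(VIRUS_PART)
print(len(VIRUS_PART), len(protein))
print(protein)
```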
8. CONCLUSION
Using the constructed algorithms together with bioinformatics tools, we set out to locate the gene sequence on the RNA of virus C that is most similar to one of the interferon sequences used as input in the trials. In one trial we reached a similarity of 73.973 %, a promising result that justifies testing the proposed idea in practice.
REFERENCES
[1] Gladwin, M. and Trattler, B., Clinical Microbiology, McGraw-Hill, International edition, 1997.
[2] Hmaied, F.; Legrand-Abravanel, F.; Nicot, F.; Garrigues, N.; Chapuy-Regaud, S.; Dubois, M.; Njouom, R.; Izopet, J. and Pasquier, C., "Full-length genome sequences of hepatitis C virus subtype 4f", J. Gen. Virol., 88(11): 2985-2990, November 1, 2007.
[3] Abdel-Hamid, M.; El-Daly, M.; Molnegren, V.; El-Kafrawy, S.; Abdel-Latif, S.; Esmat, G.; Strickland, T.; Loffredo, C.; Albert, J. and Widell, A., "Genetic diversity in hepatitis C virus in Egypt and possible association with hepatocellular carcinoma", J. Gen. Virol., 88(5): 1526-1531, May 1, 2007.
[4] Eddy, S.R., Profile hidden Markov models, Bioinformatics, 14: 755-763, 1998.
[5] Durbin, R.; Eddy, S.; Krogh, A. and Mitchison, G., Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, Cambridge, UK, 1998.
[6] Alberts, Bruce. Molecular biology of the cell. New York: Garland Science. pp. 760. ISBN 0-8153-3218-1, 2002.
[7] Koutsoudakis, G.; Kaul, A.; Steinmann, E.; Kallis, S.; Lohmann, V.; Pietschmann, T. and Bartenschlager, R., "Characterization of the early steps of hepatitis C virus infection by using luciferase reporter viruses", J. Virol., 80: 5308-5320, 2006.
[8] Chang, K.S.; Jiang, J.; Cai, Z. and Luo, G., "Human apolipoprotein E is required for infectivity and production of hepatitis C virus in cell culture", J. Virol., 81: 13783-13793, 2007.
[9] Chamberlain, R.W.; Adams, N.; Saeed, A.A.; Simmonds, P. and Elliott, R.M., "Complete nucleotide sequence of a type 4 hepatitis C virus variant, the predominant genotype in the Middle East", J. Gen. Virol., 78: 1341-1347, 1997.
[10] CONSRTM NCBI Genome Project, National Centre for Biotechnology Information, NIH, Bethesda, MD 20894, USA, 05-SEP-2007.
[11] Chamberlain, R.W., "Complete nucleotide sequence of a type 4 hepatitis C virus", Institute of Virology, University of Glasgow, Church Street, Glasgow, G11 5JR, UK, 03-JUN-1997.
[12] National Centre for Biotechnology Information, www.ncbi.com.
[13] Bjarne Knudsen; Thomas Knudsen; Mikael Flensborg; Henrik Sandmann; Michael Heltzen; Alex Andersen; Mikkel Dickenson; Jakob Bardram; Peter J. Steffensen; Søren Mønsted; Torben Lauritzen; Roald Forsberg; Agnes Thanbichler; Jannick D. Bendtsen; Lasse Görlitz; Jane Rasmussen; David Tordrup; Morten Værum; Mikkel Nygaard Ravn; Christian Hachenberg; Esben Fisker; Patrick Dekker and Jacob Schultz, CLC Main Workbench 4.1.1, www.clcbio.com.