CHAPTER 5
CARBON CONTENT: LOW LARGE HYDROPHOBIC
RESIDUES (LHRs) IN AMINO ACID PATTERNS AND
INSIGHTS INTO PROTEIN FOLDING AND FUNCTION
5.1 INTRODUCTION
The genetic code maps the relationship between a triplet base sequence on
RNA and an amino acid, which in turn corresponds to a protein associated with a
required function in organisms. Proteins are considered the major functional units
in all living organisms. They arise from mRNAs and, after post-translational
modification, acquire a defined structure with specific functions (Hatfield and
Roth, 2007; Angov, 2011; Ikehara and Niihara, 2007). All proteins are constructed
from linear sequences of smaller molecules, namely amino acids. Proteins also fold
into particular three dimensional structures, which confer their specific
biological functionality. The ultimate mechanism that cells use to ensure the
quality of intracellular proteins is the selective destruction of misfolded or
damaged polypeptides (Goldberg, 2003; Schrader et al., 2009; Gsponer et al., 2008).
However, the linear amino acid sequence of a given protein does not completely
specify the three dimensional structure for most, if not all, proteins. In addition, a
protein’s three dimensional structure is not fixed; individual atoms have a
specific role in allowing proteins to move and flex in the constrained ways
required for them to function properly. A widely accepted principle is that
protein evolution is mainly determined by a limited set of criteria such as
folding and activity. Knowledge of how proteomic amino acid composition has
changed over time is important for constructing realistic models of protein
evolution and for increasing our understanding of the molecular evolution of
protein functions (Ikehara and Niihara, 2007; Brooks et al., 2004).
In biological macromolecules, the lowest level of biological organization
is that of atoms. However, individual amino acids, short peptides and longer
proteins can vary greatly in their content of specific individual atoms such as
carbon (Bragg and Wagner, 2007; Bragg and Wagner, 2009; Li et al., 2009;
White and Jacobs, 1993). Furthermore, carbon content and its distribution along
protein sequences are vital for understanding the biology of individual sequences
(Rajasekaran et al., 2011). Applying this strategy to protein sequences may help
to prevent diseases caused by misfolded proteins, as carbon has been reported to
be responsible for disorder in proteins and to be involved in protein evolution
(Rajasekaran et al., 2011). Keeping these facts in mind, and considering that
carbon is the main architectural component of proteins as well as the only atom
that contributes towards the hydrophobic interactions in proteins, the purpose of
the current research was to focus on the carbon content in peptide sequences.
5.2 METHODOLOGY
All available amino acid patterns in a given protein were searched and
retrieved from NCBI [cited 2008 Feb 17], as given in Table 5.1B. The number
of carbon atoms and the total number of atoms in a given length were calculated
using a program written in the ‘C’ language. The program reads protein sequences
in FASTA format and assigns atomic details to each amino acid, as given in Table
5.2. Amino acid patterns of various protein sequences, varying in length from 7
to 16 amino acids, were analysed. The pattern search was set up as:
minlen=8; maxlen=17; lengap=1; (L=minlen-lengap) amino acids at both (N-/C-)
terminals of a protein (where L is the length of an amino acid pattern). With the
carbon content fixed at, e.g., 50 carbon atoms, a pattern meeting this value was
assigned as a positive amino acid pattern; otherwise it was assigned as a
negative amino acid pattern. Patterns corresponding to the terminal residues of a
protein sequence were generated as reported previously (Kumar et al., 2008). They
added (L-1)/2 dummy residues "X" at both terminals of a protein (where L is
the length of the pattern). As an example, for a window size of 17 amino acids, they
added 8 "X" before the N-terminal and 8 "X" after the C-terminal, in
order to create M patterns from a sequence of length M (Kaur and Raghava,
2003a, 2003b). Finally, various stretches of sequence were obtained, each
consisting of an amino acid pattern with the number of carbon atoms fixed at 50,
following the procedure given in Figure 5.1. The amino acid patterns of various
protein sequences, varying in length from 7 to 16 amino acids, are listed in
Table 5.1A.
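The padding and positive/negative assignment described above can be sketched as follows. This is a minimal illustration in Python, not the original ‘C’ program: the function names, the toy sequence and the choice to give the dummy residue "X" zero carbon atoms are assumptions made here. The carbon counts are the standard values for the twenty amino acids.

```python
# Carbon atoms per amino acid (standard values; backbone plus side chain).
CARBON = {
    "G": 2, "A": 3, "S": 3, "C": 3, "T": 4, "D": 4, "N": 4,
    "P": 5, "V": 5, "E": 5, "Q": 5, "M": 5, "H": 6, "I": 6,
    "L": 6, "K": 6, "R": 6, "F": 9, "Y": 9, "W": 11,
    "X": 0,  # dummy padding residue; assumed here to carry no atoms
}


def pad_sequence(seq, window):
    """Add (window - 1) // 2 dummy 'X' residues at both terminals."""
    pad = "X" * ((window - 1) // 2)
    return pad + seq + pad


def windows(seq, window):
    """Yield every contiguous pattern of the given length."""
    for start in range(len(seq) - window + 1):
        yield seq[start:start + window]


def carbon_count(pattern):
    """Total number of carbon atoms in a pattern."""
    return sum(CARBON[res] for res in pattern)


def classify(pattern, target=50):
    """'positive' if the pattern has exactly `target` carbons, else 'negative'."""
    return "positive" if carbon_count(pattern) == target else "negative"


# Hypothetical toy sequence containing the 7-residue pattern RFLRRRW.
toy = "GGRFLRRRWGG"
hits = [w for w in windows(toy, 7) if classify(w) == "positive"]
print(hits)
```

With an odd window such as 17, padding a sequence of length M this way yields exactly M windows; with an even window the same formula yields M - 1, so the even-length case would need a convention the text does not spell out.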
Table 5.1B: Identification of peptide sequences at NCBI using ‘BLAST’ analysis and the
patterns described in Table 5.1A.

Organism                     ID number          No. of amino acids
Homo sapiens                 NP_073737.1        704
Pan troglodytes              XP_001138269.1     1198
Saccharomyces cerevisiae     NP_010795.1        1142
Saccharomyces cerevisiae     NP_009385.2        1356
Kluyveromyces lactis         CAD43214.1         219
Mus musculus                 NP_001153078.1     944
Schizosaccharomyces pombe    NP_592787.1        591
Homo sapiens                 NP_001193858.1     420
Bos taurus                   NP_001193508.1     317
Homo sapiens                 AAY21823.1         387
Table 5.2 Average number of C, S, N, O and H atoms per amino acid (aa) in any peptide
unit with a given number of aa residues (one aa has about 16 atoms)

Atom                        C      S      N      O      H
Average atoms per aa        4.9    0.3    1.35   1.46   8
1. Read the amino acids in a protein sequence and convert them into an atomic
   sequence.
2. Split the atomic sequence into residue windows of equal size.
3. Identify the number of carbon atoms in each residue window.
4. Group the residue windows based on their number of carbon atoms.
5. Divide the number of windows in each carbon category (% of C) by the total
   number of residue windows to get the unit value.
6. Repeat the above steps for all protein sequences.
7. Plot % of C versus residues.
Figure 5.1 Flow chart of the steps used to compute the percentage of carbon atoms in a
given length of amino acids
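The steps of Figure 5.1 can be sketched in Python as below. Several details are assumptions made for illustration, since the figure does not fix them: windows are taken as overlapping (sliding by one residue), the percentage of carbon is computed over the atoms of the residues only (no terminal water), percentages are grouped by rounding to the nearest integer, and the final plotting step is left out. The names `percent_carbon` and `carbon_distribution` are hypothetical. The residue compositions are the standard formulas (amino acid minus one water).

```python
from collections import Counter

# Residue compositions (amino acid minus one water), as (C, H, N, O, S).
RESIDUE_ATOMS = {
    "G": (2, 3, 1, 1, 0), "A": (3, 5, 1, 1, 0), "S": (3, 5, 1, 2, 0),
    "C": (3, 5, 1, 1, 1), "T": (4, 7, 1, 2, 0), "D": (4, 5, 1, 3, 0),
    "N": (4, 6, 2, 2, 0), "P": (5, 7, 1, 1, 0), "V": (5, 9, 1, 1, 0),
    "E": (5, 7, 1, 3, 0), "Q": (5, 8, 2, 2, 0), "M": (5, 9, 1, 1, 1),
    "H": (6, 7, 3, 1, 0), "I": (6, 11, 1, 1, 0), "L": (6, 11, 1, 1, 0),
    "K": (6, 12, 2, 1, 0), "R": (6, 12, 4, 1, 0), "F": (9, 9, 1, 1, 0),
    "Y": (9, 9, 1, 2, 0), "W": (11, 10, 2, 1, 0),
}


def percent_carbon(window):
    """Carbon atoms as a percentage of all atoms in the window's residues."""
    carbon = sum(RESIDUE_ATOMS[res][0] for res in window)
    total = sum(sum(RESIDUE_ATOMS[res]) for res in window)
    return 100.0 * carbon / total


def carbon_distribution(sequences, window=16):
    """Fraction of windows falling into each (rounded) % of C category."""
    counts = Counter()
    n_windows = 0
    for seq in sequences:                              # repeat for all sequences
        for start in range(len(seq) - window + 1):     # split into windows
            pct = round(percent_carbon(seq[start:start + window]))
            counts[pct] += 1                           # group by carbon category
            n_windows += 1
    if n_windows == 0:
        return {}
    # Divide each category by the total number of windows.
    return {pct: c / n_windows for pct, c in sorted(counts.items())}
```

The returned mapping from % of C to fraction of windows is what would be plotted in the last step of the figure.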
5.3 RESULTS AND DISCUSSION
The purpose of this research was to focus on the carbon content in proteins,
because the largest variations in protein sequences have been observed for carbon.
The idea behind this task was very simple: to visualize the protein molecule on its
actual basis, i.e. at the atomic level. The basic units of proteins are carbon (C),
sulphur (S), nitrogen (N), oxygen (O) and hydrogen (H). The arrangement of
these atoms along the protein sequences was analysed. The question was then
raised: what is the average number of C, S, N, O and H atoms in a given number of
amino acid residues? The results show that, for any given number of residues, the
average number of C, S, N, O and H atoms is 4.9, 0.3, 1.35, 1.45 and 8 per amino
acid unit, respectively (one amino acid has about 16 atoms).
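A per-residue average of this kind can be computed for any peptide as in the following sketch. The function names are hypothetical and the code does not reproduce the original program. It uses the standard residue formulas (amino acid minus one water) and adds one water (3 atoms) when counting the atoms of a free peptide.

```python
ELEMENTS = ("C", "H", "N", "O", "S")

# Residue compositions (amino acid minus one water), ordered as ELEMENTS.
RESIDUE_ATOMS = {
    "G": (2, 3, 1, 1, 0), "A": (3, 5, 1, 1, 0), "S": (3, 5, 1, 2, 0),
    "C": (3, 5, 1, 1, 1), "T": (4, 7, 1, 2, 0), "D": (4, 5, 1, 3, 0),
    "N": (4, 6, 2, 2, 0), "P": (5, 7, 1, 1, 0), "V": (5, 9, 1, 1, 0),
    "E": (5, 7, 1, 3, 0), "Q": (5, 8, 2, 2, 0), "M": (5, 9, 1, 1, 1),
    "H": (6, 7, 3, 1, 0), "I": (6, 11, 1, 1, 0), "L": (6, 11, 1, 1, 0),
    "K": (6, 12, 2, 1, 0), "R": (6, 12, 4, 1, 0), "F": (9, 9, 1, 1, 0),
    "Y": (9, 9, 1, 2, 0), "W": (11, 10, 2, 1, 0),
}


def element_totals(peptide):
    """Sum the atoms of each element over all residues of the peptide."""
    totals = dict.fromkeys(ELEMENTS, 0)
    for res in peptide:
        for element, count in zip(ELEMENTS, RESIDUE_ATOMS[res]):
            totals[element] += count
    return totals


def total_atoms(peptide):
    """All atoms of the free peptide: residues plus one terminal water."""
    return sum(element_totals(peptide).values()) + 3


def average_per_residue(peptide):
    """Average number of atoms of each element per amino acid residue."""
    totals = element_totals(peptide)
    return {element: totals[element] / len(peptide) for element in ELEMENTS}


print(total_atoms("RFLRRRW"))          # 7-residue pattern from Table 5.1A
print(average_per_residue("RFLRRRW"))
```

Counted this way, the first two patterns of Table 5.1A (RFLRRRW and RLHIKFKE) give 158 and 159 atoms, which agrees with the totals listed there.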
In the present study it was observed that peptide patterns with the same
number of carbon atoms (e.g. # C = 50) can have different lengths (e.g.
# aa = 7-16), and therefore a different number of total amino acid residues; an
increased number of amino acids goes along with an increased number of total
atoms in these peptides. The present study also suggests that, in amino acid
patterns of length 11-16, the number of LHRs (F, I, L, M and V) decreases. At the
same time, SHRs (G, A, P, W and C) are found in these patterns, and with the
addition of these residues the length of the amino acid patterns increases. A few
amino acid patterns are listed in Table 5.1A. As a consequence, the length of the
amino acid sequence increases and, in turn, the protein length also increases.
Although proteins in various species have different lengths but the same intended
function, their atomic nature remains undisturbed in alignment.
In biological macromolecules, the lowest level of biological organization
is that of atoms. However, individual amino acids, short peptides and longer
proteins can vary greatly in their contents of specific individual atoms such as
carbon (White and Jacobs, 1993; Schwartz et al., 2001; Bragg and Wagner, 2007;
Bragg and Wagner, 2009; Li et al., 2009).
The significance of the carbon atom may help to accurately predict amino
acid patterns, which are vital for understanding protein stability and function,
as these patterns prefer to fold into a compact 3D structure. In this way,
problems of specific protein-protein and protein-DNA interactions,
protein-small-molecule interactions and the evolutionary understanding of
protein sequences for sequence alignment may be addressed.
Applying this strategy to protein sequences may help to prevent
diseases caused by misfolded sequences. It is therefore an efficient and flexible
approach for both molecular biologists and computational scientists to analyze
the function of proteins, and bioinformatics can play a major role in this
task. Previously, it has been shown that short amino acid residue patterns (e.g.
pentapeptides) can be a useful tool for predicting sequence features such as
secondary structure (Figureau et al., 2003). Furthermore, efforts at local structure
prediction have been made with sequence segments nine amino acids in length,
using profiles based on structurally aligned regions (Yang and Wang,
2003). This research work provides a potential new avenue for representing protein
sequences at the atomic level, for example by carbon, in order to predict amino acid
patterns, peptide sequences, protein structures and protein functions, which may
help to understand the origin of diseases caused by misfolded proteins such as
Alzheimer's disease (AD) and Parkinson's disease (PD).
5.3.1 The problem of protein folding
Particular sequences of amino acids usually fold into characteristic
three dimensional structures. More than 50 years of work have been devoted to
cracking the code which governs protein folding. Computational
scientists refer to problems of this kind as the "protein folding problem", and it
remains one of the greatest challenges in structural biology. Although researchers
have formulated some general rules, these cannot be applied in practical cases
because of limitations in their formulation, so rough guesses have to be made.
Moreover, predicting the protein shape and the position of every atom with a
molecule-based system has not been carried out in the past, which prompted us to
undertake an extensive investigation of amino acid patterns in proteins.
Table 5.1A In peptides, amino acid patterns with the same number of carbon atoms
(here: 50) can have various amino acid (aa) lengths, with a different total number of
atoms and a different average number of carbon atoms per residue.

Amino acid (aa)      # Carbon   # aa (length    # Total atoms    # Carbon per aa
pattern                         of residues)    for the given    (carbon / length
                                                aa pattern       of residues)
RFLRRRW              50         7               158              7.14
RLHIKFKE             50         8               159              6.25
FNGLEKLLR            50         9               161              5.55
IQNPSMLLEP           50         10              163              5.00
AQAQREAAAEY          50         11              163              4.54
SGRYISAAPGAE         50         12              162              4.16
EDSMGGTSGGLYS        50         13              164              3.84
EPGEEGPTAGSVGG       50         14              165              3.57
GMGGHGYGGAGDASS      50         15              162              3.33
GAAGGCGVAGAGADGY     50         16              163              3.12

In aa patterns of length 11-16, LHRs (FILMV) are reduced and SHRs (AGPWC) are
increased.
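The columns of Table 5.1A, together with the LHR and SHR content of each pattern, can be recomputed with a short Python sketch such as the one below. The function name `describe` and the output layout are assumptions made for illustration. The carbon counts are the standard values for the twenty amino acids, and the LHR and SHR sets follow the definitions used in this chapter.

```python
# Carbon atoms per amino acid (standard values; backbone plus side chain).
CARBON = {
    "G": 2, "A": 3, "S": 3, "C": 3, "T": 4, "D": 4, "N": 4,
    "P": 5, "V": 5, "E": 5, "Q": 5, "M": 5, "H": 6, "I": 6,
    "L": 6, "K": 6, "R": 6, "F": 9, "Y": 9, "W": 11,
}
LHR = set("FILMV")   # large hydrophobic residues, as defined in this chapter
SHR = set("GAPWC")   # small hydrophobic residues, as defined in this chapter

# The ten patterns listed in Table 5.1A.
PATTERNS = [
    "RFLRRRW", "RLHIKFKE", "FNGLEKLLR", "IQNPSMLLEP", "AQAQREAAAEY",
    "SGRYISAAPGAE", "EDSMGGTSGGLYS", "EPGEEGPTAGSVGG",
    "GMGGHGYGGAGDASS", "GAAGGCGVAGAGADGY",
]


def describe(pattern):
    """Length, carbon count, carbon per residue and LHR/SHR counts."""
    carbon = sum(CARBON[res] for res in pattern)
    return {
        "length": len(pattern),
        "carbon": carbon,
        "carbon_per_residue": carbon / len(pattern),
        "lhr": sum(res in LHR for res in pattern),
        "shr": sum(res in SHR for res in pattern),
    }


for pattern in PATTERNS:
    d = describe(pattern)
    print(f"{pattern:<18}{d['length']:>3}{d['carbon']:>4}"
          f"{d['carbon_per_residue']:>6.2f}{d['lhr']:>3}{d['shr']:>3}")
```

Every pattern sums to 50 carbon atoms. The carbon-per-residue values printed here are rounded to two decimals (for example 5.56 for 50/9), whereas the values in the table appear to be truncated (5.55).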
Many scientists believe that, if the structures of proteins could be deciphered
from their sequences, it would be easy to understand and predict the
functions of proteins, particularly with respect to folding/misfolding and
functioning/malfunctioning. This knowledge could later help to improve
the treatment of diseases related to protein misfolding. The availability
of complete protein sequences has opened new dimensions of research, such as
studying the structures of all proteins from a single organism and comparing them
across many different species. The ultimate dream of structural biologists
around the globe is to determine protein folding from genetic sequences,
thereby analysing not only the three-dimensional structure but also some aspects
of the functions of all proteins, which is what has been attempted here.
5.4 CONCLUSION
Amino acid patterns of different lengths (7-16) in protein sequences can
have the same number of carbon atoms but a different number of total amino acid
residues. In amino acid patterns of length 11-16, the number of LHRs
(F, I, L, M and V) decreases. At the same time, SHRs (G, A, P, W and C) are
found in these patterns, and with the addition of these residues the length of
the amino acid patterns increases. As a consequence, the length of the amino acid
sequence increases and, in turn, the protein length also increases.