CHAPTER 4
DISTRIBUTION OF CARBON, SULPHUR, NITROGEN,
OXYGEN AND HYDROGEN CONTENT ANALYSIS IN
PROTEIN SEQUENCES OF DIFFERENT SPECIES
4.1 INTRODUCTION
Proteins are large organic compounds made of amino acids arranged in a
linear chain. The side chains of these amino acids differ chemically from one
another and can be classified broadly into two groups, hydrophobic and
hydrophilic. It is the atomic make-up of the side chain that distinguishes one
amino acid from another. The atoms involved are carbon, nitrogen, oxygen, sulphur
and hydrogen. Living organisms encounter various growth conditions in their
environment, raising the question of whether ecological fluctuations could alter
biological macromolecules.
Recently, significant correlations between atomic compositions and metabolic
functions were found in sulfur- and carbon-assimilatory enzymes, which appear
depleted in sulfur and carbon, respectively, in both the bacterium Escherichia coli
and the eukaryote Saccharomyces cerevisiae, thus providing new insights into the
molecular evolution of protein atomic composition (Baudouin et al., 2001).
Besides, proteins that assimilate particular elements were found to avoid using amino
acids containing the element, which indicates that the metabolic constraints of amino
acids may influence the evolution of proteins. For instance, carbon and nitrogen
contents in amino acid side chains are negatively correlated with protein abundance.
An amino acid with many carbon atoms in its side chain generally
requires more energy for its synthesis and seems to be avoided, either for
economy in building blocks or for economy in energy; consequently,
highly abundant proteins preferentially use amino acids that are cheap in terms
of energy. However, the carbon content is still negatively correlated with protein
abundance after controlling for the energetic cost of the amino acids, whereas the
negative correlation between protein abundance and energetic cost disappears after
controlling for carbon content, indicating that building blocks seem to be more
limiting than energetic factors. Therefore, the amino acid sequences of highly abundant proteins
have to compromise between optimization for their biological functions and reducing
the consumption of limiting resources. Accordingly, low contents of carbon and
nitrogen in highly abundant proteins provide further evidence of a selection for the
economy of atomic composition. By contrast, the amino acid sequences of weakly
expressed proteins are more likely to be optimized for their biological functions (Li et
al., 2009).
Although biologists have discussed proteins at the residue level, a study at the
level of the smallest unit of these systems, the atom, may allow a better
comparison of proteins than a macroscopic view. Only five kinds of atoms,
Carbon (C), Sulphur (S), Nitrogen (N), Oxygen (O) and Hydrogen (H), constitute
entire proteins. What is the probable number of C, S, N, O and H atoms in a stretch
of given length (number of amino acids)? An understanding of how these atoms are
arranged along protein sequences is needed. To study these atoms in proteins, the
researcher has set up a systematic analysis of how these basic elements are
distributed in proteins.
4.2 METHODOLOGY
4.2.1 Explanation and analysis
The protein sequences were taken from the public NCBI database [cited 2008
Feb 17], as listed in Table 4.1, and probability analyses were carried out. The idea
behind this task is simple: to view the molecule at its actual basis, the atom
level. The basic units of proteins are Carbon (C), Sulphur (S), Nitrogen (N), Oxygen
(O) and Hydrogen (H). The arrangement of these atoms along the protein sequences
was examined in stretches of a given length (number of amino acids). This was done
by counting the total number of each kind of atom present in every stretch and
grouping the stretches by that count. The count shared by the largest group of
stretches was taken as the probable one. The stretch lengths studied were 10 to 55
amino acids.
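The grouping step described above amounts to building a histogram of per-stretch atom counts and taking its mode. The following is a minimal sketch in C, not the original program: the function name `most_probable_count`, the bound `MAX_COUNT` and the tie-breaking rule (smaller count wins) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COUNT 1024  /* assumed upper bound on atoms counted per stretch */

/* Return the atom count shared by the largest group of stretches
 * (the "probable" count). Ties resolve to the smaller count.
 * Returns -1 if there are no stretches. */
int most_probable_count(const int *counts, size_t n)
{
    size_t group_size[MAX_COUNT + 1] = {0};
    int best = -1;

    /* Group the stretches by their atom count. */
    for (size_t i = 0; i < n; i++) {
        assert(counts[i] >= 0 && counts[i] <= MAX_COUNT);
        group_size[counts[i]]++;
    }
    /* Pick the count whose group is largest. */
    for (int c = 0; c <= MAX_COUNT; c++) {
        if (group_size[c] > 0 &&
            (best < 0 || group_size[c] > group_size[best]))
            best = c;
    }
    return best;
}
```

Given the carbon counts of seven stretches, 48, 50, 49, 50, 51, 50 and 49, the function returns 50, because three stretches share that count.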
The average number of carbon, sulphur, nitrogen, oxygen and hydrogen atoms in a
given number of residues was computed using programs written in the C language.
The principle behind this calculation is that proteins prefer to have, on average,
4.9 carbon, 0.3 sulphur, 1.35 nitrogen, 1.45 oxygen and 8 hydrogen atoms per amino
acid (one amino acid has about 16 atoms), as given in Table 4.2. The program
reads a protein sequence and converts it into an array of atoms. For example, a
protein of 100 amino acids gives an array of about 1600 atoms. The array of atoms
is then subjected to a carbon, sulphur, nitrogen, oxygen and hydrogen distribution
study. That is, for a window length of, e.g., 100 atoms, the program determines how
many carbon, sulphur, nitrogen, oxygen and hydrogen atoms fall within each window.
Moving the window along the array one atom at a time gives about 1500 windows in
this example. The windows are then grouped by the number of carbon, sulphur,
nitrogen, oxygen and hydrogen atoms they contain. The size of each group is then
divided by the total number of windows to give unit values, as outlined in
Figure 4.1. This is repeated for all sequences of the given organism. Similar
calculations on individual protein sequences are expected to give the same results,
as described in Chapters 5 and 6.
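The conversion from a residue sequence to an array of atoms can be sketched as follows. This is an illustrative sketch, not the original program. It assumes residue formulae (the free amino acid minus one water), and it assumes that the atoms of each residue are written in the order C, H, N, O, S, since the chapter does not state the ordering used. The names `composition`, `RESIDUES`, `lookup` and `atoms_from_sequence` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Atom counts (C, H, N, O, S) per amino acid residue, i.e. the free
 * amino acid minus one water (assumed convention). */
struct composition { char code; int c, h, n, o, s; };

static const struct composition RESIDUES[] = {
    {'A', 3, 5, 1, 1, 0},  {'R', 6, 12, 4, 1, 0}, {'N', 4, 6, 2, 2, 0},
    {'D', 4, 5, 1, 3, 0},  {'C', 3, 5, 1, 1, 1},  {'E', 5, 7, 1, 3, 0},
    {'Q', 5, 8, 2, 2, 0},  {'G', 2, 3, 1, 1, 0},  {'H', 6, 7, 3, 1, 0},
    {'I', 6, 11, 1, 1, 0}, {'L', 6, 11, 1, 1, 0}, {'K', 6, 12, 2, 1, 0},
    {'M', 5, 9, 1, 1, 1},  {'F', 9, 9, 1, 1, 0},  {'P', 5, 7, 1, 1, 0},
    {'S', 3, 5, 1, 2, 0},  {'T', 4, 7, 1, 2, 0},  {'W', 11, 10, 2, 1, 0},
    {'Y', 9, 9, 1, 2, 0},  {'V', 5, 9, 1, 1, 0},
};

/* Find the composition for a one-letter code, or NULL if unknown. */
static const struct composition *lookup(char code)
{
    for (size_t i = 0; i < sizeof RESIDUES / sizeof RESIDUES[0]; i++)
        if (RESIDUES[i].code == code)
            return &RESIDUES[i];
    return NULL;
}

/* Expand a one-letter sequence into a heap-allocated, NUL-terminated
 * array of element symbols. Unknown letters are skipped.
 * The caller frees the result. Returns NULL if allocation fails. */
char *atoms_from_sequence(const char *seq, size_t *n_atoms)
{
    assert(seq != NULL && n_atoms != NULL);
    /* Trp is the largest residue in the table: 11+10+2+1 = 24 atoms. */
    char *atoms = malloc(strlen(seq) * 24 + 1);
    size_t k = 0;

    if (atoms == NULL)
        return NULL;
    for (; *seq; seq++) {
        const struct composition *r = lookup(*seq);
        if (r == NULL)
            continue;
        const int counts[5] = {r->c, r->h, r->n, r->o, r->s};
        for (int e = 0; e < 5; e++)
            for (int j = 0; j < counts[e]; j++)
                atoms[k++] = "CHNOS"[e];
    }
    atoms[k] = '\0';
    *n_atoms = k;
    return atoms;
}
```

For the dipeptide "GA" this gives 17 atoms (7 for the glycine residue and 10 for the alanine residue). Small residues fall below the figure of about 16 atoms per amino acid and large ones such as tryptophan (24 atoms) lie above it.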
Table 4.1 Total number of protein sequences taken for study in each species

S.No.  Organism                              Number of protein sequences
1      Homo sapiens (Hs)                     27960
2      Bos taurus (Bt)                       35907
3      Canis familiaris (Cf)                 33651
4      Danio rerio (Dr)                      29663
5      Arabidopsis thaliana (At)             30480
6      Schizosaccharomyces pombe (Sp)        5034
7      Saccharomyces cerevisiae (Sc)         5844
8      Kluyveromyces lactis (Kl)             5327
9      Caenorhabditis elegans (Ce)           22717
10     Apis mellifera (Am)                   6298
11     Gallus gallus (Gg)                    18031
12     Pan troglodytes (Pt)                  21737
13     Strongylocentrotus purpuratus (Spu)   20989
14     Rattus norvegicus (Rn)                22722
15     Mus musculus (Mm)                     26650
16     Plasmodium falciparum (Pf)            5267
17     Drosophila melanogaster (Dm)          18941
18     Anopheles gambiae (Ag)                13840
19     Tribolium castaneum (Tc)              9766
20     Influenza virus (Iv)                  35142
Figure 4.1 Steps used to compute the average number of carbon, hydrogen, nitrogen,
oxygen and sulfur atoms for a given length:
1. Read the amino acid sequence and convert it into an atomic sequence.
2. Split the atomic sequence into windows of equal size (lengths 10-55).
3. Identify the number of carbon, hydrogen, nitrogen, oxygen and sulfur atoms in
   each window.
4. Group the windows by the number of carbon, hydrogen, nitrogen, oxygen and
   sulfur atoms.
5. Divide the number of windows (frequency) in each category of carbon, hydrogen,
   nitrogen, oxygen and sulfur amount (% of C, % of H, % of N, % of O and % of S)
   by the total number of windows to get the unit value.
6. Repeat the above steps for all proteins.
7. Plot % of C, % of H, % of N, % of O and % of S.
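The windowing, grouping and normalising steps of Figure 4.1 can be sketched in C as below. This is a hedged illustration, not the original program. It assumes that the atomic sequence is a string of element symbols and that the window slides one atom at a time, which is consistent with the example in Section 4.2.1 (an array of 1600 atoms and a window of 100 give 1600 - 100 + 1 = 1501, i.e. about 1500, windows). The function name `window_distribution` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Slide a window of `window` atoms along `atoms` one position at a time,
 * count the occurrences of `element` in each window, and store in freq[k]
 * the fraction of windows that contain exactly k such atoms.
 * freq must have room for window + 1 entries.
 * Returns the number of windows examined (0 if the array is too short). */
size_t window_distribution(const char *atoms, size_t window,
                           char element, double *freq)
{
    size_t n = strlen(atoms);

    assert(window > 0);
    for (size_t k = 0; k <= window; k++)
        freq[k] = 0.0;
    if (n < window)
        return 0;

    size_t n_windows = n - window + 1;
    size_t count = 0;

    /* Count the element in the first window. */
    for (size_t i = 0; i < window; i++)
        count += (atoms[i] == element);
    freq[count] += 1.0;

    /* Update the count incrementally for each later window. */
    for (size_t i = window; i < n; i++) {
        count += (atoms[i] == element);
        count -= (atoms[i - window] == element);
        freq[count] += 1.0;
    }

    /* Divide each group size by the total number of windows. */
    for (size_t k = 0; k <= window; k++)
        freq[k] /= (double)n_windows;
    return n_windows;
}
```

For the toy atomic sequence "CCHHH" with a window of 2 and element 'C', the four windows contain 2, 1, 0 and 0 carbon atoms, so the unit values are 0.5 for zero carbons and 0.25 each for one and two carbons. Calling the function once per element and per window length would give the curves that are plotted in the final step.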
4.3 RESULTS AND DISCUSSION
The nature of the amino acid residues was studied in order to uncover the
information buried in the protein sequences. The results showed that, for any
length, the average numbers of C, S, N, O and H atoms were 4.9, 0.3, 1.35,
1.45 and 8 per unit length, respectively (Table 4.2).
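As a consistency check on these averages, they add up to the figure of about 16 atoms per amino acid assumed in the methodology, and each can be converted into a fraction of all atoms. The percentages below are derived here from the reported averages and are not quoted from the original tables.

```latex
\[
4.9 + 0.3 + 1.35 + 1.45 + 8 = 16.0
\]
\[
\frac{4.9}{16} \approx 30.6\%\ (\mathrm{C}), \quad
\frac{0.3}{16} \approx 1.9\%\ (\mathrm{S}), \quad
\frac{1.35}{16} \approx 8.4\%\ (\mathrm{N}), \quad
\frac{1.45}{16} \approx 9.1\%\ (\mathrm{O}), \quad
\frac{8}{16} = 50\%\ (\mathrm{H})
\]
```

The hydrogen fraction of 50% matches the value quoted for hydrogen in the conclusion of this chapter.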
The variation of frequency with the number of carbon atoms is shown
graphically (Figure 4.2). For each length, the frequency first increases with the
number of carbon atoms, attains a maximum at the probable number of carbon
atoms, and then decreases again as the number of carbon atoms increases further.
The probable number of carbon atoms depends only on the length considered.
The probable value decreases with increasing length. The frequency obtained was
independent of the nature of the stretch considered.
The variation of frequency with the number of hydrogen atoms is shown
graphically (Figure 4.3). For each length, the frequency first increases with the
number of hydrogen atoms, attains a maximum at the probable number of hydrogen
atoms, and then decreases again as the number of hydrogen atoms increases further.
The probable number of hydrogen atoms depends only on the length considered.
The probable value decreases with increasing length. The frequency obtained was
independent of the nature of the stretch considered.
The distribution curves of nitrogen and oxygen atoms were drawn for several
lengths and are shown graphically (Figures 4.4-4.5). All the curves resembled the
familiar Maxwellian velocity distribution curve, first rising to a maximum and
then decreasing to zero at a well-defined number of N and O atoms.
This upper limit varies with the frequency of the N and O atoms. A few N and O
counts have a high frequency, but the majority are grouped around the mean value;
the frequency was about 0.16 at a length of 10 and about 0.06 at a length of 21.
Thus the probable frequency of N and O atoms changed with the length considered.
Nitrogen and oxygen atoms were balanced in the amino acids of the protein
sequence, i.e., hydrophilicity was maintained: when N decreases, O increases
accordingly. Both contribute to hydrophilicity.
The distribution curves of the sulphur (S) atom were drawn for several lengths
and are shown graphically (Figure 4.6). All the curves first rise to a maximum and
then decrease at a well-defined number of atoms. This upper limit varies with the
frequency of the atom. A few counts have a high frequency, but the majority are
grouped around the mean value of 1.
The carbon atom was chosen for study in this calculation because it is the only
atom involved in hydrophobicity (Figure 4.7). Hydrogen showed a similar pattern
(Figure 4.8). Nitrogen and oxygen atoms were balanced in the amino acids, i.e.,
hydrophilicity was maintained: when N decreases, O increases accordingly
(Figure 4.9). Both contribute to hydrophilicity. Sulphur made only a very small
contribution to the calculation of hydrophobicity (Figure 4.10).
With respect to the number of amino acids, the carbon atom distribution was
normal in shape but varied from species to species (Figure 4.7). To overcome this
variation among species, the calculation was carried out at the atomic level. This
can give a universal representation of proteins for comparison (Figure 4.7). The
average carbon content over all the protein sequences was calculated.
S. cerevisiae, K. lactis, S. pombe, T. cruzi and C. elegans had average carbon
values of 4.95, 4.94, 4.94, 4.94 and 4.94, respectively. These species possess a
higher degree of order in their protein sequences. By contrast, species such as
human, chimpanzee, cow, dog, Gallus gallus, Dr, At, Spu, Rn, Mm, Dm, Ag and Iv
had average carbon values of 4.81, 4.75, 4.83, 4.83, 4.79, 4.84, 4.85, 4.84, 4.84,
4.84, 4.83, 4.86 and 4.83, respectively. The chimpanzee and Gallus gallus have a
lower degree of order than human. Again, these species possess a lesser degree of
order than S. cerevisiae, K. lactis, S. pombe, T. cruzi and C. elegans.
Interestingly, the rat and mouse both had an average carbon value of 4.84. This
suggests that the alteration in protein length may be due to food habit and
environmental factors. Overall, the length of proteins increased in all species
during evolution. This is pronounced in sexually reproducing organisms because
mixing of DNA takes place during reproduction. Apart from sexual reproduction, food
habit also contributes to this alteration, i.e., the increase in the number of
amino acids in protein sequences. P. falciparum has an average carbon value of 5.20
in its overall structure, and its functional sites contain more carbon.
This indicates that the higher value is to be compared with the lower values
found across the various stages of the malaria parasite life cycle, which reaches
the human host through the mosquito vector (Ag). The protein is highly stable at
that number of atoms per residue. Individual amino acids and whole proteins can
vary greatly in their carbon content, and carbon is the only element that
contributes to the hydrophobic interaction, a dominant force that helps to
maintain protein stability.
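The per-species carbon averages discussed above could be obtained along the following lines. This is a minimal sketch under stated assumptions, not the original program: it pools all residues of all sequences before averaging (the chapter does not say whether averages were pooled or taken per sequence), it uses the carbon counts of the twenty standard residues, and the names `carbon_atoms` and `mean_carbon_per_residue` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Carbon atoms in one residue (one-letter code); 0 for unknown letters. */
static int carbon_atoms(char aa)
{
    switch (aa) {
    case 'G': return 2;
    case 'A': case 'C': case 'S': return 3;
    case 'N': case 'D': case 'T': return 4;
    case 'E': case 'Q': case 'M': case 'P': case 'V': return 5;
    case 'R': case 'H': case 'I': case 'L': case 'K': return 6;
    case 'F': case 'Y': return 9;
    case 'W': return 11;
    default: return 0;
    }
}

/* Mean number of carbon atoms per residue, pooled over all sequences.
 * Unknown letters are ignored. Returns 0.0 if nothing was counted. */
double mean_carbon_per_residue(const char *const *seqs, size_t n_seqs)
{
    unsigned long carbon = 0, residues = 0;

    assert(seqs != NULL || n_seqs == 0);
    for (size_t i = 0; i < n_seqs; i++) {
        for (const char *p = seqs[i]; *p; p++) {
            int c = carbon_atoms(*p);
            if (c > 0) {
                carbon += (unsigned long)c;
                residues++;
            }
        }
    }
    return residues ? (double)carbon / (double)residues : 0.0;
}
```

For the two toy sequences "GA" and "VWX", the known residues G, A, V and W carry 2 + 3 + 5 + 11 = 21 carbon atoms, so the pooled mean is 21 / 4 = 5.25 carbon atoms per residue.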
Table 4.2 Average number of C, S, N, O and H atoms per amino acid in any peptide
unit with a given number of amino acid residues (one amino acid has about 16 atoms)

Atom   Average number of atoms per amino acid
C      4.9
S      0.3
N      1.35
O      1.45
H      8
Figure 4.2 Frequency against number of carbon atoms at different lengths in H. sapiens proteins
Figure 4.3 Frequency against number of hydrogen atoms at different lengths in H. sapiens proteins
Figure 4.4 Frequency against number of nitrogen atoms at different lengths in H. sapiens proteins
Figure 4.5 Frequency against number of oxygen atoms at different lengths in H. sapiens proteins
Figure 4.6 Frequency against number of sulphur atoms at different lengths in H. sapiens proteins
Figure 4.7 Average carbon content distribution of all species
Figure 4.8 Average hydrogen content distribution of all species
Figure 4.9 Average nitrogen and oxygen content distribution of all species
Figure 4.10 Average sulphur content distribution of all species
4.4 CONCLUSION
The carbon atom was chosen for study in this calculation because it is the only
atom involved in hydrophobicity. Hydrogen showed a similar pattern, at 50%.
Nitrogen and oxygen atoms were balanced in the amino acids, i.e., hydrophilicity is
maintained: when N decreases, O increases accordingly. Both contribute to
hydrophilicity. Sulphur makes only a very small contribution to the calculation of
hydrophobicity. The protein is highly stable at that number of atoms per residue.
Individual amino acids and whole proteins can vary greatly in their carbon
content, and carbon is the only element that contributes to the hydrophobic
interaction, a dominant force that helps to maintain protein stability.
There was a decrease in carbon (C) atoms per amino acid in human compared
with all other species except Gg and Pt. The oxygen (O) and nitrogen (N) atoms
decreased proportionally. Sulphur (S) increased, although its amount is far less
than that of the other atoms. Hydrogen (H) showed no significant change other than
a broadening of the distribution curve. All these observations imply that the
proteins of any species must have definite ratios of these atoms to remain stable
(hydrotropism) and functional. It was observed that properties reflecting
hydrophobicity correlated strongly with the stability of buried mutations, and that
there was a direct relationship between the property values and the number of
carbon atoms. These carbons are added to proteins largely by large hydrophobic
residues (LHRs) such as F, I, L, M and V. This reduction in the number of atoms
per residue in H. sapiens agrees with the earlier observations made in Chapters 2
and 3 that the length of H. sapiens proteins increases.
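The share of large hydrophobic residues in a sequence, which the conclusion links to carbon content, can be measured with a few lines of C. This is an illustrative sketch only: the function name `lhr_fraction` is hypothetical, and the residue set is limited to the five examples named above (F, I, L, M and V).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fraction of residues in `seq` that are large hydrophobic residues
 * (F, I, L, M, V). Returns 0.0 for an empty sequence. */
double lhr_fraction(const char *seq)
{
    size_t total = 0, lhr = 0;

    assert(seq != NULL);
    for (; *seq; seq++) {
        total++;
        if (strchr("FILMV", *seq) != NULL)
            lhr++;
    }
    return total ? (double)lhr / (double)total : 0.0;
}
```

For the toy sequence "FAIG" the function returns 0.5, since two of the four residues (F and I) belong to the set.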