CHAPTER 4
DISTRIBUTION OF CARBON, SULPHUR, NITROGEN, OXYGEN AND HYDROGEN CONTENT ANALYSIS IN PROTEIN SEQUENCES OF DIFFERENT SPECIES

4.1 INTRODUCTION

Proteins are large organic compounds made of amino acids arranged in a linear chain. The side chains of these amino acids differ chemically from one another and can be classified broadly into two groups, hydrophobic and hydrophilic. It is the atomic composition of the side chain that makes each amino acid different. The atoms involved are carbon, nitrogen, oxygen, sulphur and hydrogen. Living organisms encounter varied growth conditions in their environment, raising the question of whether ecological fluctuations could alter biological macromolecules. Significant correlations between atomic composition and metabolic function have been found in sulfur- and carbon-assimilatory enzymes, which appear depleted in sulfur and carbon, respectively, in both the bacterium Escherichia coli and the eukaryote Saccharomyces cerevisiae, providing new insight into the molecular evolution of protein atomic composition (Baudouin-Cornu et al., 2001). In other words, proteins that assimilate a particular element tend to avoid amino acids containing that element, which indicates that metabolic constraints on amino acids may influence the evolution of proteins. For instance, the carbon and nitrogen contents of amino acid side chains are negatively correlated with protein abundance. An amino acid with many carbon atoms in its side chain generally requires more energy to synthesize and seems to be avoided, either for economy in building blocks or for economy in energy; consequently, highly abundant proteins preferentially use amino acids that are cheap in terms of energy. However, the carbon content is still negatively correlated with protein abundance after controlling for the energetic cost of the amino acids.
The negative correlation between protein abundance and energetic cost, by contrast, disappears after controlling for carbon content, indicating that building blocks are a tighter constraint than energy. The amino acid sequences of highly abundant proteins therefore have to compromise between optimization for biological function and reduced consumption of limiting resources. Accordingly, the low carbon and nitrogen contents of highly abundant proteins provide further evidence of selection for economy of atomic composition. The amino acid sequences of weakly expressed proteins, on the other hand, are more likely to be optimized for their biological functions (Li et al., 2009).

Although biologists have discussed proteins at the residue level, a study at the level of the smallest unit of these systems, the atom, can give better results when comparing proteins than a macroscopic view would suggest. Only five kinds of atom, carbon (C), sulphur (S), nitrogen (N), oxygen (O) and hydrogen (H), make up entire proteins. What is the probable number of C, S, N, O and H atoms in a stretch of given length (number of amino acids)? An understanding of how these atoms are distributed along protein sequences is needed at this point. To study these atoms in proteins, the researcher set up a systematic analysis of how these basic elements are distributed in proteins.

4.2 METHODOLOGY

4.2.1 Explanation and analysis

The protein sequences were taken from the public NCBI database [cited 2008 Feb 17], as listed in Table 4.1, and probability analyses were carried out on them. The idea behind the task is simple: visualize the molecule as it actually is, at the atomic level. The basic units of proteins are carbon (C), sulphur (S), nitrogen (N), oxygen (O) and hydrogen (H). The arrangement of these atoms along the protein sequences was examined for stretches of a given length (number of amino acids).
This is achieved by counting the total number of each of these atoms present in the given stretch and grouping the stretches by that count. The group containing the largest number of stretches with the same atom count was taken as the probable one. The stretch lengths studied were 10 to 55 amino acids. The average number of carbon, sulphur, nitrogen, oxygen and hydrogen atoms in a given number of residues was computed using programs written in the C language. The principle behind this calculation is that proteins prefer to have about 4.9 carbon, 0.3 sulphur, 1.35 nitrogen, 1.45 oxygen and 8 hydrogen atoms per unit (one amino acid has about 16 atoms), as given in Table 4.2. The program reads a protein sequence and converts it into an array of atoms. For example, a protein of 100 amino acids will have about 1600 atoms in the array. The array of atoms is then subjected to a carbon, sulphur, nitrogen, oxygen and hydrogen distribution study. That is, taking for example a window length of 100 atoms, the program determines how many carbon, sulphur, nitrogen, oxygen and hydrogen atoms fall within each window. For the 1600-atom array this gives about 1500 overlapping windows. These windows are then grouped by the number of carbon, sulphur, nitrogen, oxygen and hydrogen atoms they contain. The number of windows in each group is then divided by the total number of windows to get unit values, as outlined in Figure 4.1. This is repeated for all sequences of the given organism. Similar calculations on individual protein sequences are expected to give the same results, as described in Chapters 5 and 6.

Table 4.1 Total number of protein sequences taken for study in each species

S.No.  Organism                              Number of protein sequences
1      Homo sapiens (Hs)                     27960
2      Bos taurus (Bt)                       35907
3      Canis familiaris (Cf)                 33651
4      Danio rerio (Dr)                      29663
5      Arabidopsis thaliana (At)             30480
6      Schizosaccharomyces pombe (Sp)        5034
7      Saccharomyces cerevisiae (Sc)         5844
8      Kluyveromyces lactis (Kl)             5327
9      Caenorhabditis elegans (Ce)           22717
10     Apis mellifera (Am)                   6298
11     Gallus gallus (Gg)                    18031
12     Pan troglodytes (Pt)                  21737
13     Strongylocentrotus purpuratus (Spu)   20989
14     Rattus norvegicus (Rn)                22722
15     Mus musculus (Mm)                     26650
16     Plasmodium falciparum (Pf)            5267
17     Drosophila melanogaster (Dm)          18941
18     Anopheles gambiae (Ag)                13840
19     Tribolium castaneum (Tc)              9766
20     Influenza virus (Iv)                  35142

1. Read the amino acid sequence and convert it into an atomic sequence.
2. Split the atomic sequence into windows of equal size for each length (10-55).
3. Identify the number of carbon, hydrogen, nitrogen, oxygen and sulfur atoms in each window.
4. Group the windows by their number of carbon, hydrogen, nitrogen, oxygen and sulfur atoms.
5. Divide the number of windows (frequency) in each category of carbon, hydrogen, nitrogen, oxygen and sulfur (% of C, % of H, % of N, % of O and % of S) by the total number of windows to get unit values.
6. Repeat the above steps for all proteins.
7. Plot % of C, % of H, % of N, % of O and % of S.

Figure 4.1 Procedure for computing the average number of carbon, hydrogen, nitrogen, oxygen and sulfur atoms for a given length

4.3 RESULTS AND DISCUSSION

The nature of the amino acid residues was studied in order to uncover the information buried in the protein sequences. The results showed that, for any length, the average numbers of C, S, N, O and H atoms were 4.9, 0.3, 1.35, 1.45 and 8 per unit length respectively (Table 4.2). The variation of frequency with the number of carbon atoms is shown graphically in Figure 4.2. For a given length, the frequency first increases with increasing number of carbon atoms.
The frequency attains a maximum at the probable number of carbon atoms and then decreases again as the number of carbon atoms increases further. The probable number of carbon atoms depends only on the length. The frequency at the probable number decreases with increasing length. The frequency obtained was independent of the nature of the stretch.

The variation of frequency with the number of hydrogen atoms is shown graphically in Figure 4.3. For a given length, the frequency first increases with increasing number of hydrogen atoms. It attains a maximum at the probable number of hydrogen atoms and then decreases again as the number of hydrogen atoms increases further. The probable number of hydrogen atoms depends only on the length. The frequency at the probable number decreases with increasing length. The frequency obtained was independent of the nature of the stretch.

Distribution curves for nitrogen and oxygen atoms were drawn for several lengths and are shown graphically (Figures 4.4-4.5). All the curves resemble the familiar Maxwellian velocity distribution curves, first rising to a maximum and then decreasing to zero at a well-defined number of N and O atoms. This upper limit varies with the frequency of N and O atoms. A few N and O counts have a high frequency, but the majority are grouped around the mean value: the probable frequency was about 0.16 at a length of 10 and about 0.06 at a length of 21. Thus the probable frequency of N and O atoms changes with the length.
Nitrogen and oxygen atoms were balanced in the amino acids of the protein sequence, i.e., hydrophilicity was maintained: when N decreases, O increases accordingly. Together they make up the hydrophilic contribution. Distribution curves for sulphur (S) atoms were drawn for several lengths and are shown graphically (Figure 4.6). All the curves first rise to a maximum and then decrease to a well-defined number of atoms. This upper limit varies with the frequency of the atom. A few counts have a high frequency, but the majority are grouped around the mean value of 1.

The carbon atom was chosen for study in this calculation because it is the only atom involved in hydrophobicity (Figure 4.7). Hydrogen shows a similar pattern (Figure 4.8). Nitrogen and oxygen atoms were balanced in the amino acids, i.e., hydrophilicity was maintained: when N decreases, O increases accordingly (Figure 4.9). Together they make up the hydrophilic contribution. Sulphur makes a very small contribution to the calculation of hydrophobicity (Figure 4.10). As far as the carbon atom distribution is concerned, the distribution over the number of amino acids was normal, but it varies from species to species (Figure 4.7). To overcome this variation among species, the calculation was carried out at the atomic level. This can give a universal representation of proteins for comparison, as shown in Figure 4.7. The averages of carbon over all the protein sequences were calculated. S. cerevisiae, K. lactis, S. pombe, T. cruzi and C. elegans had carbon averages of 4.95, 4.94, 4.94, 4.94 and 4.94 respectively. These species possess a higher degree of order in their protein sequences. By comparison, human, chimpanzee, cow, dog, Gallus gallus, Dr, At, Spu, Rn, Mm, Dm, Ag and Iv have carbon averages of 4.81, 4.75, 4.83, 4.83, 4.79, 4.84, 4.85, 4.84, 4.84, 4.84, 4.83, 4.86 and 4.83 respectively.
The chimpanzee and Gallus gallus have a lower degree of order than human. All of these species in turn possess a lesser degree of order than S. cerevisiae, K. lactis, S. pombe, T. cruzi and C. elegans. Interestingly, rat and mouse have the same carbon average, 4.84. This suggests that the alteration in protein length may be due to food habit and environmental factors. Overall, the length of proteins has increased in all species during evolution. This is most pronounced in sexually reproducing organisms, because mixing of DNA takes place during reproduction. Apart from sexual reproduction, food habit also contributes to this alteration, i.e., the increase in the number of amino acids in protein sequences. P. falciparum has a carbon average of 5.20 in its overall structure, and its functional sites contain a larger amount of carbon. This indicates that the higher value was comparable to the lower percentage in various stages of the malaria parasite life cycle in the human host through the mosquito vector (Ag). The protein is highly stable at that number of atoms per residue. Specifically, individual amino acids and whole proteins can vary greatly in their carbon content. This matters because carbon is the only element that contributes to the hydrophobic interaction, a dominant force that helps maintain protein stability.
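The window-grouping procedure of Section 4.2 and the per-residue averages quoted above can be illustrated with a short sketch. The original programs were written in C and are not reproduced in this chapter; the Python below is only an assumed re-expression of the described steps. The residue formulas (free amino acid minus one water), the order of atoms inside each residue, the one-atom sliding step and all function names are assumptions made for illustration.

```python
from collections import Counter

ELEMENTS = "CHNOS"

# (C, H, N, O, S) atom counts per amino acid residue, i.e. the free amino
# acid minus one water. Assumption: the source does not state whether it
# used residue or free amino acid formulas.
RESIDUE_ATOMS = {
    "A": (3, 5, 1, 1, 0),  "R": (6, 12, 4, 1, 0), "N": (4, 6, 2, 2, 0),
    "D": (4, 5, 1, 3, 0),  "C": (3, 5, 1, 1, 1),  "Q": (5, 8, 2, 2, 0),
    "E": (5, 7, 1, 3, 0),  "G": (2, 3, 1, 1, 0),  "H": (6, 7, 3, 1, 0),
    "I": (6, 11, 1, 1, 0), "L": (6, 11, 1, 1, 0), "K": (6, 12, 2, 1, 0),
    "M": (5, 9, 1, 1, 1),  "F": (9, 9, 1, 1, 0),  "P": (5, 7, 1, 1, 0),
    "S": (3, 5, 1, 2, 0),  "T": (4, 7, 1, 2, 0),  "W": (11, 10, 2, 1, 0),
    "Y": (9, 9, 1, 2, 0),  "V": (5, 9, 1, 1, 0),
}


def to_atom_sequence(protein):
    """Expand a one-letter protein sequence into a string of element symbols.

    Assumption: atoms of one residue are written grouped by element
    (all C, then H, N, O, S); the source does not specify an order.
    """
    parts = []
    for residue in protein:
        for element, n in zip(ELEMENTS, RESIDUE_ATOMS[residue]):
            parts.append(element * n)
    return "".join(parts)


def window_distribution(atom_seq, window, element):
    """Group all sliding windows by their count of `element`.

    Returns {count: fraction of windows with that count}.
    """
    n_windows = len(atom_seq) - window + 1
    if n_windows <= 0:
        return {}
    count = atom_seq[:window].count(element)
    groups = Counter([count])
    for start in range(1, n_windows):
        if atom_seq[start - 1] == element:           # atom leaving the window
            count -= 1
        if atom_seq[start + window - 1] == element:  # atom entering the window
            count += 1
        groups[count] += 1
    return {k: v / n_windows for k, v in sorted(groups.items())}


def most_probable(distribution):
    """Count whose group holds the largest share of windows."""
    return max(distribution, key=distribution.get)


def atoms_per_residue(protein):
    """Average number of C, H, N, O and S atoms per residue."""
    totals = [0] * len(ELEMENTS)
    for residue in protein:
        for i, n in enumerate(RESIDUE_ATOMS[residue]):
            totals[i] += n
    return {e: t / len(protein) for e, t in zip(ELEMENTS, totals)}


if __name__ == "__main__":
    atoms = to_atom_sequence("GA")          # 17 atoms
    dist = window_distribution(atoms, 7, "C")
    print(dist, most_probable(dist))
    print(atoms_per_residue("GAVL"))
```

For the toy peptide GAVL this gives 4.0 carbon, 7.0 hydrogen, 1.0 nitrogen, 1.0 oxygen and 0.0 sulphur atoms per residue. Running `atoms_per_residue` over every sequence of a species and averaging the carbon values would give a figure of the kind quoted above, although the exact numbers depend on the assumptions listed.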
Table 4.2 Average number of C, S, N, O and H atoms per unit in a peptide with a given number of amino acid residues (one amino acid has about 16 atoms)

Atom                                        C     S     N      O      H
Average number of atoms per amino acid      4.9   0.3   1.35   1.45   8

Figure 4.2 Distribution of the number of carbon atoms for given lengths in H. sapiens proteins
Figure 4.3 Distribution of the number of hydrogen atoms for given lengths in H. sapiens proteins
Figure 4.4 Distribution of the number of nitrogen atoms for given lengths in H. sapiens proteins
Figure 4.5 Distribution of the number of oxygen atoms for given lengths in H. sapiens proteins
Figure 4.6 Distribution of the number of sulphur atoms for given lengths in H. sapiens proteins
Figure 4.7 Average carbon content distribution of all species
Figure 4.8 Average hydrogen content distribution of all species
Figure 4.9 Average nitrogen and oxygen content distribution of all species
Figure 4.10 Average sulphur content distribution of all species

4.4 CONCLUSION

The carbon atom was chosen for study in this calculation because it is the only atom involved in hydrophobicity. Hydrogen, at 50%, showed a similar pattern. Nitrogen and oxygen atoms were balanced in the amino acids, i.e., hydrophilicity is maintained: when N decreases, O increases accordingly. Together they make up the hydrophilic contribution. Sulphur makes a very small contribution to the calculation of hydrophobicity. The protein is highly stable at that number of atoms per residue. Specifically, individual amino acids and whole proteins can vary greatly in their carbon content. This matters because carbon is the only element that contributes to the hydrophobic interaction, a dominant force that helps maintain protein stability. There was a decrease in carbon (C) atoms per amino acid in human compared with all species except Gg and Pt. The oxygen (O) and nitrogen (N) atoms decreased proportionally as well.
Sulphur (S) has increased, though its amount is far smaller than that of the other atoms. Hydrogen does not show any significant change other than a broadening of the distribution curve. All these observations imply that the proteins of any species must have definite ratios of these atoms to maintain stability (hydrophobicity) and function. It was observed that properties reflecting hydrophobicity correlate strongly with the stability of buried mutations, and that there is a direct relationship between the property values and the number of carbon atoms. These carbons are added to proteins largely by large hydrophobic residues (LHRs) such as F, I, L, M and V. This reduction in the number of atoms per residue in H. sapiens agrees with the earlier observations made in Chapters 2 and 3 that the length of H. sapiens proteins increases.
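The statement that carbon is contributed largely by the large hydrophobic residues can be checked with simple arithmetic on the carbon counts of the 20 standard amino acids. The sketch below is illustrative only and is not the analysis used in this chapter; the function names `mean_carbon` and `lhr_carbon_share` are hypothetical.

```python
# Carbon atoms per amino acid (the same for the free acid and the residue,
# since forming a peptide bond removes only water).
CARBONS = {
    "A": 3, "R": 6, "N": 4, "D": 4, "C": 3, "Q": 5, "E": 5, "G": 2,
    "H": 6, "I": 6, "L": 6, "K": 6, "M": 5, "F": 9, "P": 5, "S": 3,
    "T": 4, "W": 11, "Y": 9, "V": 5,
}

# Large hydrophobic residues (LHRs) as listed in the conclusion.
LHRS = frozenset("FILMV")


def mean_carbon(residues):
    """Mean number of carbon atoms over a collection of one-letter codes."""
    residues = list(residues)
    return sum(CARBONS[r] for r in residues) / len(residues)


def lhr_carbon_share(protein):
    """Fraction of a sequence's carbon atoms that sit in LHRs."""
    total = sum(CARBONS[r] for r in protein)
    from_lhr = sum(CARBONS[r] for r in protein if r in LHRS)
    return from_lhr / total


if __name__ == "__main__":
    print(mean_carbon(LHRS))                 # mean over F, I, L, M, V
    print(mean_carbon(set(CARBONS) - LHRS))  # mean over the other fifteen
    print(lhr_carbon_share("GAVL"))          # toy peptide
```

With these counts the five LHRs average 6.2 carbon atoms each, against about 5.07 for the other fifteen amino acids, and in the toy peptide GAVL the two LHRs (V and L) carry 11 of the 16 carbon atoms.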