CHAPTER 5

CARBON CONTENT: LOW LARGE HYDROPHOBIC RESIDUES (LHRs) IN AMINO ACID PATTERNS AND INSIGHTS INTO PROTEIN FOLDING AND FUNCTION

5.1 INTRODUCTION

The genetic code maps the relationship between a triplet base sequence on RNA and an amino acid, which in turn forms part of a protein associated with a required function in the organism. Proteins are considered the major functional units in all living organisms. They arise from mRNAs and, through post-translational modification, acquire a defined structure with specific functions (Hatfield and Roth, 2007; Angov, 2011; Ikehara and Niihara, 2007). All proteins are constructed from linear sequences of smaller molecules, namely amino acids. Proteins also fold into particular three-dimensional structures, which give them their specific biological functionality. The ultimate mechanism that cells use to ensure the quality of intracellular proteins is the selective destruction of misfolded or damaged polypeptides (Goldberg, 2003; Schrader et al., 2009; Gsponer et al., 2008).

However, the linear amino acid sequence of a given protein does not completely specify the three-dimensional structure for most, if not all, proteins. In addition, a protein's three-dimensional structure is not fixed; individual atoms play a significant and specific role that allows proteins to move and flex in constrained ways, which is required for them to function properly. A widely accepted principle is that protein evolution is mainly determined by a limited set of criteria such as folding and activity. Knowledge of how proteomic amino acid composition has changed over time is important for constructing realistic models of protein evolution and for increasing our understanding of the molecular evolution of protein functions (Ikehara and Niihara, 2007; Brooks et al., 2004).

In biological macromolecules, the lowest level of biological organization is that of atoms.
However, individual amino acids, short peptides and longer proteins can vary greatly in their content of specific atoms such as carbon (Bragg and Wagner, 2007; Bragg and Wagner, 2009; Li et al., 2009; White and Jacobs, 1993). Furthermore, carbon content and its distribution along protein sequences are vital for understanding the biology of individual sequences (Rajasekaran et al., 2011). Applying this strategy to protein sequences may help prevent diseases caused by misfolded proteins, as carbon has been reported to be responsible for disorders in proteins (Rajasekaran et al., 2011) and to be involved in protein evolution (Rajasekaran et al., 2011). Keeping these facts in mind, and considering that carbon is the main architectural component of proteins as well as the only atom that contributes to hydrophobic interactions in proteins, the purpose of the current research was to focus on the carbon content of peptide sequences.

5.2 METHODOLOGY

Protein sequences containing the amino acid patterns were searched and retrieved from NCBI [cited 2008 Feb 17], as given in Table 5.1B. The number of carbon atoms and the total number of atoms in a given length of sequence were calculated using a program written in the C language. The program reads protein sequences in FASTA format and assigns atomic details to each amino acid, as given in Table 5.2. Amino acid patterns varying in length from 7 to 16 amino acids were analyzed in the various protein sequences. The pattern search was designed as follows: minlen=8; maxlen=17; lengap=1; (L=minlen-lengap) amino acids at both (N-/C-) terminals of a protein (where L is the length of an amino acid pattern). Given a fixed carbon value (e.g. 50 carbon atoms), a pattern matching that value was assigned as a positive amino acid pattern; otherwise it was assigned as a negative amino acid pattern. Patterns corresponding to the terminal residues of a protein sequence were generated as reported previously (Kumar et al., 2008).
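The counting step described above can be illustrated with a minimal sketch. It is written in Python rather than C, it is not the program used in this study, and the function names (`read_fasta`, `carbon_count`, `positive_patterns`) are hypothetical. It assumes the standard carbon count of each of the 20 amino acids and a plain sliding window without terminal padding; a window is reported as positive when it contains exactly the target number of carbon atoms.

```python
# Carbon atoms per amino acid (standard values for the 20 common residues).
# Peptide-bond formation releases only water, so the carbon count of a
# peptide is the plain sum over its residues.
CARBON = {
    "A": 3, "R": 6, "N": 4, "D": 4, "C": 3, "Q": 5, "E": 5, "G": 2,
    "H": 6, "I": 6, "L": 6, "K": 6, "M": 5, "F": 9, "P": 5, "S": 3,
    "T": 4, "W": 11, "Y": 9, "V": 5,
}


def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dict."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def carbon_count(peptide):
    """Total number of carbon atoms in a peptide given as one-letter codes."""
    return sum(CARBON[aa] for aa in peptide)


def positive_patterns(sequence, target=50, min_len=7, max_len=16):
    """Return (start, pattern) for each window with exactly `target` carbons.

    Windows containing letters outside the 20 standard residues are skipped
    (an assumption of this sketch, not a rule taken from the study).
    """
    hits = []
    for length in range(min_len, max_len + 1):
        for start in range(len(sequence) - length + 1):
            window = sequence[start:start + length]
            if all(aa in CARBON for aa in window) and carbon_count(window) == target:
                hits.append((start, window))
    return hits


# Hypothetical one-record FASTA input built from a pattern in Table 5.1A.
demo = read_fasta(">demo (hypothetical record)\nRFLRRRW\n")
for name, seq in demo.items():
    print(name, carbon_count(seq), positive_patterns(seq))
```

Windows that do not reach the target value would correspond to negative patterns in the terminology used above.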
In the previously reported approach, (L-1)/2 dummy "X" residues are added at both terminals of a protein (where L is the length of the pattern). For example, for a window size of 17 amino acids, 8 "X" are added before the N-terminal and 8 "X" after the C-terminal, in order to create M patterns from a sequence of length M (Kaur and Raghava, 2003a, 2003b). Finally, various stretches of sequence were obtained, each consisting of an amino acid pattern with the number of carbon atoms fixed at 50, following the procedure given in Figure 5.1. The amino acid patterns found in the various protein sequences, varying in length from 7 to 16 amino acids, are listed in Table 5.1A.

Table 5.1B Identification of peptide sequences at NCBI using 'BLAST' analysis and the patterns described in Table 5.1A.

Organism                     ID number         No. of amino acids
Homo sapiens                 NP_073737.1       704
Pan troglodytes              XP_001138269.1    1198
Saccharomyces cerevisiae     NP_010795.1       1142
Saccharomyces cerevisiae     NP_009385.2       1356
Kluyveromyces lactis         CAD43214.1        219
Mus musculus                 NP_001153078.1    944
Schizosaccharomyces pombe    NP_592787.1       591
Homo sapiens                 NP_001193858.1    420
Bos taurus                   NP_001193508.1    317
Homo sapiens                 AAY21823.1        387

Table 5.2 Average number of C, S, N, O and H atoms in any peptide unit with a given number of aa residues, provided that one aa has about 16 atoms.

Atom                                                       C      S      N      O      H
Average number of atoms per aa for a given length of aa    4.9    0.3    1.35   1.46   8

1. Read the amino acids in the protein sequence and convert them into an atomic sequence.
2. Split the atomic sequence into residue windows of equal size.
3. Identify the number of carbon atoms in each residue window.
4. Group the residue windows based on their number of carbon atoms.
5. Divide the number of windows in each carbon category (% of C) by the total number of residue windows to get the unit value.
6. Repeat the above steps for all protein sequences.
7. Plot % of C versus residues.

Figure 5.1 Procedure for computing the percentage of carbon atoms in a given length of amino acids.

5.3 RESULTS AND DISCUSSION

The purpose of this research was to focus on the carbon content of proteins, because the largest variations in protein sequences have been observed for carbon. The idea behind this task was very simple: to visualize the protein molecule on its actual basis, i.e. at the atomic level. The basic units of proteins are carbon (C), sulphur (S), nitrogen (N), oxygen (O) and hydrogen (H). The arrangement of these atoms along the protein sequences was examined. The question was then raised: what is the average number of C, S, N, O and H atoms in a given number of amino acid residues? The results showed that, for any given number of residues, the average number of C, S, N, O and H atoms was 4.9, 0.3, 1.35, 1.45 and 8 per unit, respectively (one amino acid has about 16 atoms).

In the present study it was observed that peptide patterns with the same number of carbon atoms (e.g. # C = 50) can have different lengths (e.g. # aa = 7-16); an increased number of amino acids goes along with an increased number of total atoms in these peptides. Amino acid patterns of different lengths (7-16) thus have the same number of carbon atoms but a different number of total amino acid residues. The present study also suggests that, in amino acid patterns of length 11-16, the LHRs (F, I, L, M and V) decrease. At the same time, SHRs (G, A, P, W and C) appear in the patterns, and with the addition of these residues the length of the amino acid patterns increases. A few such amino acid patterns are listed in Table 5.1A. As a consequence, the length of the amino acid sequence increases and, in turn, so does the protein length. Although proteins from various species differ in length while serving the same intended function, their atomic nature remains undisturbed.

In biological macromolecules, the lowest level of biological organization is that of atoms.
However, individual amino acids, short peptides and longer proteins can vary greatly in their content of specific atoms such as carbon (White and Jacobs, 1993; Schwartz et al., 2001; Bragg and Wagner, 2007; Bragg and Wagner, 2009; Li et al., 2009). Knowledge of the significance of the carbon atom may help to accurately predict amino acid patterns, which are vital for understanding protein stability and function, as these patterns prefer a folded state with a compact 3D structure. In this way, the problems of specific protein-protein and protein-DNA interactions, protein-small-molecule interactions, and the evolutionary understanding of protein sequences for sequence alignment can be addressed. Applying this strategy to protein sequences may help prevent diseases caused by misfolded sequences. It is therefore an efficient and flexible approach to analyzing the function of all proteins, for both molecular biologists and computational scientists, and bioinformatics can play a major role in this task.

Previously, it has been shown that short amino acid residue patterns (e.g. pentapeptides) can be a useful tool for predicting sequence features such as secondary structure (Figureau et al., 2003). Furthermore, efforts at local structure prediction have been made with sequence segments nine amino acids in length, using profiles based on structurally aligned regions (Yang and Wang, 2003). This research work provides a potential new avenue for representing protein sequences at the atomic level, for example in terms of carbon, in order to predict amino acid patterns, peptide sequences, protein structures and protein functions. This may help to understand the origin of diseases caused by misfolded proteins, such as Alzheimer's disease (AD) and Parkinson's disease (PD).

5.3.1 The problem of protein folding

Particular sequences of amino acids usually fold into characteristic three-dimensional structures. More than 50 years of work have been devoted to cracking the code that governs protein folding.
Computational scientists refer to problems of this kind as the "protein folding problem", and it remains one of the greatest challenges in structural biology. Although researchers have formulated some general rules, these cannot be applied in practical cases owing to limitations in their formulation, so rough guesses still have to be made. Moreover, prediction of a protein's shape and the position of every atom by a molecule-based system has not been carried out in the past, which prompted us to undertake an extensive investigation of amino acid patterns in proteins.

Table 5.1A In peptides, amino acid patterns with the same number of carbon atoms (here: 50) can have various amino acid (aa) lengths, with a different total number of atoms and a different average number of carbon atoms per amino acid.

Amino acid (aa) pattern    # Carbon    # aa (length)    # Total atoms    # Carbon per aa
RFLRRRW                    50          7                158              7.14
RLHIKFKE                   50          8                159              6.25
FNGLEKLLR                  50          9                161              5.55
IQNPSMLLEP                 50          10               163              5.00
AQAQREAAAEY                50          11               163              4.54
SGRYISAAPGAE               50          12               162              4.16
EDSMGGTSGGLYS              50          13               164              3.84
EPGEEGPTAGSVGG             50          14               165              3.57
GMGGHGYGGAGDASS            50          15               162              3.33
GAAGGCGVAGAGADGY           50          16               163              3.12

Note: in aa patterns of length 11-16, LHRs (F, I, L, M, V) are reduced and SHRs (A, G, P, W, C) are increased.

Many scientists believed that, if the structures of proteins could be deciphered from their sequences, it would be easy to understand and predict the functions of proteins, particularly with respect to folding/misfolding and functioning/malfunctioning. Later, this knowledge could help to improve the treatment of diseases related to protein misfolding. The availability of complete protein sequences has opened new dimensions of research, such as studying the structures of all proteins from a single organism and comparing them across many different species.
The ultimate dream of structural biologists around the globe is to determine protein folding from genetic sequences, and thereby to analyze not only the three-dimensional structure but also some aspects of the functions of all proteins, which is what has been attempted here.

5.4 CONCLUSION

Amino acid patterns of different lengths (7-16 residues) can have the same number of carbon atoms but a different number of total amino acid residues. In amino acid patterns of length 11-16, the LHRs (F, I, L, M and V) decrease. At the same time, SHRs (G, A, P, W and C) appear in the patterns, and with the addition of these residues the length of the amino acid patterns increases. As a consequence, the length of the amino acid sequence increases and, in turn, so does the protein length.
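The quantities in Table 5.1A and the LHR/SHR trend summarized above can be re-derived with a short script. The following Python sketch is illustrative only and is not the program used in this study; the `summarize` function and its field names are hypothetical. It assumes the standard elemental formulas of the free, neutral amino acids and counts the total atoms of a peptide by removing one water molecule (3 atoms) per peptide bond.

```python
# Elemental composition (C, H, N, O, S) of the free, neutral amino acids.
FORMULA = {
    "A": (3, 7, 1, 2, 0),  "R": (6, 14, 4, 2, 0), "N": (4, 8, 2, 3, 0),
    "D": (4, 7, 1, 4, 0),  "C": (3, 7, 1, 2, 1),  "Q": (5, 10, 2, 3, 0),
    "E": (5, 9, 1, 4, 0),  "G": (2, 5, 1, 2, 0),  "H": (6, 9, 3, 2, 0),
    "I": (6, 13, 1, 2, 0), "L": (6, 13, 1, 2, 0), "K": (6, 14, 2, 2, 0),
    "M": (5, 11, 1, 2, 1), "F": (9, 11, 1, 2, 0), "P": (5, 9, 1, 2, 0),
    "S": (3, 7, 1, 3, 0),  "T": (4, 9, 1, 3, 0),  "W": (11, 12, 2, 2, 0),
    "Y": (9, 11, 1, 3, 0), "V": (5, 11, 1, 2, 0),
}
LHR = set("FILMV")  # residue group labelled LHR in this chapter
SHR = set("GAPWC")  # residue group labelled SHR in this chapter


def summarize(pattern):
    """Return length, carbon count, total atoms and LHR/SHR counts."""
    n = len(pattern)
    carbon = sum(FORMULA[aa][0] for aa in pattern)
    # Assumption: each of the n-1 peptide bonds releases one water (3 atoms).
    total_atoms = sum(sum(FORMULA[aa]) for aa in pattern) - 3 * (n - 1)
    return {
        "length": n,
        "carbon": carbon,
        "total_atoms": total_atoms,
        "carbon_per_aa": carbon / n,
        "lhr": sum(aa in LHR for aa in pattern),
        "shr": sum(aa in SHR for aa in pattern),
    }


# Patterns as listed in Table 5.1A.
PATTERNS = [
    "RFLRRRW", "RLHIKFKE", "FNGLEKLLR", "IQNPSMLLEP", "AQAQREAAAEY",
    "SGRYISAAPGAE", "EDSMGGTSGGLYS", "EPGEEGPTAGSVGG",
    "GMGGHGYGGAGDASS", "GAAGGCGVAGAGADGY",
]

for p in PATTERNS:
    s = summarize(p)
    print(f"{p:<18}{s['length']:>3}{s['carbon']:>4}{s['total_atoms']:>5}"
          f"{s['carbon_per_aa']:>7.3f}{s['lhr']:>3}{s['shr']:>3}")
```

Under these assumptions the carbon count and total-atom columns come out as listed in Table 5.1A. The carbon-per-residue values in the table appear to be truncated rather than rounded to two decimals (e.g. 50/9 is listed as 5.55), so the script prints the ratio to three decimals instead of reproducing that formatting.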