EVOLUTION OF GLOBINS

Evolution of globins
Evolution of visual pigments and related molecules

Evolution of gene clusters
• Many genes occur as multigene families (e.g., actin, tubulin, globins, Hox)
  – Inference is that they evolved from a common ancestor
  – Families can be
    • Clustered – nearby on chromosomes (α-globins, HoxA)
    • Dispersed – on various chromosomes (actin, tubulin)
    • Both – related clusters on different chromosomes (α,β-globins, HoxA,B,C,D)
  – Members of clusters may show stage- or tissue-specific expression
    • Implies means for coregulation as well as individual regulation

Evolution of gene clusters
• Multigene families (contd)
  – Gene number tends to increase with evolutionary complexity
    • Globin genes increase in number from primitive fish to humans
  – Clusters evolve by duplication and divergence
• History of gene families can be traced by comparing sequences
  – Molecular clock model holds that rate of change within a group is relatively constant
    • Not totally accurate – check rat genome sequence paper
  – Distance between related sequences combined with clock leads to inference about when duplication took place

Classic phylogenetic studies of sequence conservation: the globins
The globins are the best studied family in terms of sequence conservation, partly because they were one of the first families for which multiple members were sequenced, and partly because some of the earliest protein structures (in fact, the earliest) solved were globins. The classic papers of Perutz, Kendrew and Watson were the first to correlate sequence conservation with aspects of protein structure and function. They drew their conclusion based on only a few aligned sequences. Later globin studies, such as that of Bashford, Chothia and Lesk, expanded the analyses of globin sequence conservation to include hundreds of sequences.
Perutz, Kendrew & Watson, J Mol Biol 13, 669 (1965)
Bashford, Chothia & Lesk, J Mol Biol 196, 199 (1987)

[Figure: Scapharca inaequivalvis oxygenated hemoglobin]

Conservation of functional residues
There were only 2 perfectly conserved residues among the 8 known globin structures at the time of the Bashford et al. study. These are residues critical in binding of heme and/or interaction with heme-bound oxygen. It will often be found that the best conserved residues in related proteins are those involved in critical aspects of the general function.
[Figure labels: Phe 43, heme, His 87]
Residues involved in more specific aspects of function may or may not be conserved, depending upon the relationship between the proteins under consideration. For example, residues involved in substrate specificity for serine proteases may be conserved among orthologs, such as the chymotrypsins, but not between paralogs, such as chymotrypsins and trypsins.

Conservation at buried positions
• Core residues, which are usually hydrophobic, often tolerate conservative substitutions, i.e. to other hydrophobics
• Overall core volume is well-conserved (Lim & Ptitsyn, 1970), though individual core positions tolerate variation in volume
• This reflects what we know about packing and the effects of core mutations on stability; thus sequence conservation is partly related to maintaining a stable structure
[Figure: portion of alignment of prokaryotic and eukaryotic globins, with human hemoglobin beta chain structure. Labels: Y140, H156, buried. Color key: yellow = small neutral/polar, green = hydrophobic, red/pink = polar/acidic, blue = basic]

Conservation at solvent-exposed positions
• Solvent-exposed (surface) positions are mutable and usually tolerate mutation to many residue types, including hydrophobics. Bashford et al., however, noted that for globins at least, some surface positions do not tolerate large hydrophobics. Since polar-to-hydrophobic mutations on protein surfaces do not reduce stability, this conservation could reflect constraints on solubility.
Indeed, it is clear that the overall polar character of the surface is conserved for soluble, globular proteins, even though a certain number of hydrophobics may be tolerated.
[Figure: human hemoglobin beta chain, examples of surface residues. Labels: Y140, H156. Color key: yellow = small neutral/polar, green = hydrophobic, red/pink = polar/acidic, blue = basic]

Conservation of loops and turns
• "Spacer" regions between secondary structures, such as loops and turns, are often hypermutable and vary not only in sequence but in length, tolerating insertion and deletion events. (Insertions and deletions are much less often found within secondary structure elements. Why?)
[Figure: part of alignment of animal hemoglobin α and β chains; human α chain]
Are the α and β chains related to each other by paralogy or orthology?

Sequence identity and homology: poor coverage
The two proteins have the same fold, and both bind heme and oxygen in the same place: good independent structural/functional evidence for homology. Yet alignments of their sequences reveal only 24% identity. There are also many examples of related globins and other proteins with much lower identity than this.
[Figure: 1MBO and 1HBB, hemoglobin and myoglobin]
Any reasonable sequence identity criterion, whether it is a flat percent cutoff or a length-dependent cutoff, will give incomplete coverage; in other words, it will fail to identify many distant but true relationships.

Evolutionary analysis: one step into the a priori prediction
[Figure: alignment of Seq1 to Seq11 against a consensus, marking synonymous and non-synonymous changes]
Consensus: AAT GGC TCT TTT GAA AAA ...   N G F F N K .
Seq2:      AAC GGA TGT TTC GAG AAA ...   N G C F E K .
           AAT GGC TGT TTT GAA AAA ...   N G C F N K .
[Figure: number of individuals vs. number of mutations. Labels: neutrally fixed, positive selection, purifying selection]
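The distinction between synonymous and non-synonymous changes can be made concrete with a short script. The following is a minimal sketch, assuming the standard genetic code and in-frame, space-separated codons; the function name `classify_changes` is hypothetical and chosen only for illustration. It compares the Consensus and Seq2 codons printed on the slide. Under the standard code TCT encodes Ser and GAA encodes Glu, so the translation computed here differs from the one-letter translation printed with the consensus at those two positions.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_changes(ref, alt):
    """Compare two in-frame coding sequences codon by codon.

    Returns one tuple per differing codon:
    (codon position, ref codon, alt codon, ref amino acid, alt amino acid, kind).
    """
    results = []
    for pos, (c1, c2) in enumerate(zip(ref.split(), alt.split()), start=1):
        if c1 == c2:
            continue  # identical codons carry no substitution
        aa1, aa2 = CODON_TABLE[c1], CODON_TABLE[c2]
        kind = "synonymous" if aa1 == aa2 else "non-synonymous"
        results.append((pos, c1, c2, aa1, aa2, kind))
    return results


# Codons as printed on the slide.
consensus = "AAT GGC TCT TTT GAA AAA"
seq2 = "AAC GGA TGT TTC GAG AAA"

changes = classify_changes(consensus, seq2)
for row in changes:
    print(row)
```

With these inputs, only the third codon (TCT vs. TGT) changes the encoded amino acid; the other four differences are silent. A full d_N/d_S estimate would additionally need counts of synonymous and non-synonymous sites and a correction for multiple hits, which this sketch does not attempt.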
Neutral evolution vs selection
[Diagram: non-synonymous nucleotide substitution changes → amino acid replacements → protein function or structure]
Neutral Theory of molecular evolution
[Figure: biological fitness (W) vs. amino acid changes. Labels: purifying selection, neutrality, positive selection]

Measuring the strength of selection
[Equations: non-synonymous rate $d_N$, defined from $N$ and $n$; synonymous rate $d_S$, defined from $S$ and $s$]
$d_N/d_S = 1$   Neutrality
$d_N/d_S < 1$   Purifying selection
$d_N/d_S > 1$   Positive selection

Two ways of testing the functional importance of peptide regions
• Experimental (Functional Biologists): serial deletions and random directed mutagenesis
• Predictive (Evolutionary Biologists): evolutionary and structural analysis
Consensus: AAT GGC TCT TTT GAA AAA ...   N G F F N K .
Seq2:      AAC GGA TGT TTC GAG AAA ...   N G C F E K .

Methods to detect adaptive evolution using DNA divergence data
[Figure: flow chart with panels A and B, starting from a multiple alignment (Sq1: ...ATGGGCGTC..., Sq2: ...ATGGACGTA..., Sq3: ...ATGGGAGAG..., Sq4: ...ATGAGCGTC...) and its tree. Labels: maximum-likelihood models; Kimura-based models; models to detect adaptive evolution at single codon sites; parsimony method to detect selection at single sites; models to detect adaptive evolution at specific lineages of the tree; sliding-window based methods]

Different levels of protein's function and evolution
• Intra-molecular co-evolution
• Inter-protein/gene co-evolution
• Co-evolution/interaction between two different biological systems
Tully and Fares (2006) Evol. Bioinf.

Covariation analysis
Substitution patterns at different positions in a sequence alignment are not necessarily independent. This is sometimes referred to as covariation or correlated evolution.
name   sequence
A      YADLGRIKS
B      YSDLGSEKE
C      IDDFGEIAA
D      IDDFGVIGT

For example, in the mini multiple alignment shown above, the identity of the residue at the 4th position is correlated to the identity of the residue at the 1st position.
A statistical perturbation analysis can be used to characterize this covariation. An alignment of related sequences is "perturbed" by only considering sequences in which, for example, the first position is Y. The effect of this perturbation on the residue distribution observed at other positions is then measured. If the distribution changes significantly, covariation between sequence changes at the first site and other sites in the alignment is inferred.

Covariation and hydrophobic core packing
The hydrophobic core residues in related proteins tend to be covariant due to constraints on core packing. One sees compensatory volume changes at different positions. Davidson and coworkers found that for 266 aligned SH3 domain sequences, the strongest covariation was observed for a cluster of central hydrophobic residues. For example, substitution of a smaller residue (Ala→Gly) at 39 was strongly correlated to substitution of a larger residue (Ile→Phe) at 50.
[Figure: hydrophobic core of SH3 domains, with most frequently covarying residues shown in yellow]
S.M. Larson, A.A. DiNardo and A.R. Davidson, J Mol Biol 303, 433 (2000)

Some recent studies (Suel et al.) have suggested a connection between covarying clusters of residues and transduction of signals between distant sites in proteins. For example, G-protein coupled receptors bind a ligand on one side of a membrane, and then transduce that signal to the other side through conformational change. Suel et al. showed that the main clusters of covarying residues tended to connect the ligand and G-protein binding sites.
[Figure labels: ligand, covarying networks (brown), membrane, G-protein binding sites]
Suel et al.
Nat Struct Biol 2003

A novel method to detect co-evolution in protein-coding genes (Fares and Travers, Genetics 2006)
[Figure: sequence alignment made of repeated blocks of AAMWCGPCPNDEE, CAMCCGMCMNDEE, CAMDCGACANDEE and AAMMCGCCCNDEE, with tree and 3D structure. Labels: Clade 1 > 75%, Clade 2 > 75%, Sequence alignment, Tree, 3D]

$(\theta_{ek})_{ij} = B_{ek} \times \dfrac{1}{t_{ij}}$

$\bar{\theta}_A = \dfrac{1}{T} \sum_{S=1}^{T} (\theta_{ek})_S$   (and likewise $\bar{\theta}_B$)

$\hat{D}_{ek} = \left[ (\theta_{ek})_{ij} - \bar{\theta}_A \right]^2$   (and likewise with $\bar{\theta}_B$)

$\bar{D}_A = \dfrac{1}{T} \sum_{S=1}^{T} \left[ (\theta_{ek})_S - \bar{\theta}_A \right]^2$   (and likewise $\bar{D}_B$)

$\rho_{AB} = \dfrac{\sum_{S=1}^{T} \left[ (\hat{D}_{ek})_S - \bar{D}_A \right] \left[ (\hat{D}_{ek})_S - \bar{D}_B \right]}{\sqrt{\sum_{S=1}^{T} \left[ (\hat{D}_{ek})_S - \bar{D}_A \right]^2 \, \sum_{S=1}^{T} \left[ (\hat{D}_{ek})_S - \bar{D}_B \right]^2}}$

Testing the significance of the correlation coefficient
[Equations: resampling with 1000 replicates, a Z score for the correlation coefficient, and a P > 0.95 threshold]

Molecular co-evolution analyses: CAPS (Fares and McNally, Bioinformatics 2006)
• Collate results from 're-sampling' and 'real' data and sort by correlation value
• Calculate probabilities of R-values applying the step-down permutational correction, e.g. P(0.55) = (i + 1)/N
• Identify groups of co-evolving pairs with P > 0.95

Re-sampling (sorted)     Real
1 = 0.1                  1 = 0.55
2 = 0.15                 2 = 0.98
3 = 0.35
...
i = 0.40
i+1 = 0.55
...
N-1 = 0.98
N = 0.99

[Figure: Flow of information in CAPS]

Comparative analysis of sensitivities
[Figure: panels plotting sensitivity / true positives (0 to 100) against distance (0 to 1) for CAPS, MICK, Dependency and lnLCorr; summary panels showing mean sensitivity by divergence (distance 0.1, 0.5, 1) and by number of sequences (10, 20, 30) for the same four methods]

Three-dimensional spheres to detect protein-protein interfaces
• Co-evolving amino acid sites
• Spheres of 4 Å radius
• Highly conserved sites at overlapping areas
• Co-evolving amino acids share properties of hydrophobicity and molecular weight
• Protein-protein interfaces could be predicted with greater accuracy
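
Returning to the statistical perturbation analysis described under "Covariation analysis", the idea can be sketched in a few lines using the four-sequence mini alignment (A to D) from that slide. This is only an illustration under stated assumptions: the function names are hypothetical, and the shift between the full and perturbed residue distributions is measured with the total variation distance, which is a choice made here and not necessarily the statistic used in the cited studies.

```python
from collections import Counter

# Mini alignment from the covariation slide.
alignment = {
    "A": "YADLGRIKS",
    "B": "YSDLGSEKE",
    "C": "IDDFGEIAA",
    "D": "IDDFGVIGT",
}


def column_freqs(seqs, pos):
    """Residue frequencies at a 0-based alignment column."""
    counts = Counter(s[pos] for s in seqs)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def perturbation_shift(seqs, perturb_pos, perturb_res, target_pos):
    """Shift in the residue distribution at target_pos when the alignment
    is restricted to sequences carrying perturb_res at perturb_pos.

    Measured as total variation distance (0 = unchanged, 1 = disjoint).
    """
    subset = [s for s in seqs if s[perturb_pos] == perturb_res]
    if not subset:
        raise ValueError("no sequence carries the perturbing residue")
    full = column_freqs(seqs, target_pos)
    sub = column_freqs(subset, target_pos)
    residues = set(full) | set(sub)
    return 0.5 * sum(abs(full.get(r, 0.0) - sub.get(r, 0.0)) for r in residues)


seqs = list(alignment.values())
# Perturb on "first position is Y" and scan positions 2 to 9 (1-based keys).
shifts = {pos + 1: perturbation_shift(seqs, 0, "Y", pos) for pos in range(1, 9)}
for pos, shift in shifts.items():
    print(pos, round(shift, 2))
```

Invariant columns such as position 3 (all D) show no shift, while position 4 shifts because the Y-containing sequences all carry L there. With only four sequences almost any variable column shifts under perturbation, so a real analysis needs a large alignment and a significance test before covariation is inferred.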