Comparative Genomics: Overview
Shrish Tiwari
CCMB, Hyderabad

Introduction
• Sequences of 340 species available (274 bacterial, 25 archaeal and 41 eukaryotic)
• An additional 848 prokaryotic and 560 eukaryotic genome projects are ongoing
• Comparison of genomes can provide insights into functional regions as well as genome dynamics

Sequence Comparison
• Let us look at a simple example

A A T T G A - A T C G C C A
A - A T C A C A G - G A T C
5 matches, 6 mismatches, 3 indels

A A T T G A - A T C G C - C A
A A T - C A C A - G G A T C -
7 matches, 3 mismatches, 5 indels

Sequence Comparison
• Requirements for sequence comparison:
– A scoring scheme or scoring matrix
– A search algorithm to identify the optimal alignment
• Scoring matrices available: PAM, BLOSUM
• Search algorithm used: dynamic programming

Applications
• Tracing our origins and history
• Assessing the diversity of a species
• Finding virulence genes
• Designing primers for novel species
• Identifying disease-causing mutations
• Predicting mutations in viral genomes and designing vaccines

Comparative Genomics
• Of distantly related species: look for similarities/conserved regions to infer functional regions of the genome; example: mouse and man
• Of closely related species: look for differences, identify subtle mutations that make one species different from the other, understand how genomes evolve; examples: chimp and man, virulent E. coli and benign E. coli

Comparative Genomics
• Comparison of the 73 kbp region of human β-globin with the mouse and chimp genomes shows:
1) small stretches covering the first two exons and the intervening intron matching at ~73% identity between human and mouse
2) almost the complete 73 kbp region matching at ~97% identity between human and chimp

How different are we?
• Physical similarity is striking

How different are we?
• Socially, we have similar behaviour, including cooperation, warfare, politics and even bribery

Ape the toolmaker

Chimp Genome: Statistics
• Sequence from a single captive-born male chimpanzee of the West African subspecies Pan troglodytes verus, obtained using a whole-genome shotgun approach
• The genome was assembled with the PCAP and ARACHNE programs
• PCAP is a de novo assembly method; the ARACHNE assembly uses human genome build 34 to facilitate and confirm contig linking and has greater continuity

Chimp Genome: Statistics
• 3.6-fold redundancy for autosomes and 1.8-fold for sex chromosomes; covers 94% of the chimp genome, with >98% of the sequence in high-quality bases (quality score >40, error rate <10^-4)
• 50% of the sequence (N50) lies in contigs of length >15.7 kbp and supercontigs of length >8.6 Mbp

Chimp Genome Sequence
• Chimp genomes are polymorphic within and between subspecies
• 1.66 million high-quality SNPs identified, of which 1.01 million are heterozygous in the primary donor
• Diversity rate is 8 x 10^-4 among West African chimps (roughly the same as human diversity) and 17.6 x 10^-4 among Central African chimps

Genome Comparison
• Genome comparisons can help to reveal the molecular basis of these traits as well as the evolutionary mechanisms that have moulded our species
• Reciprocal nucleotide-level alignment of the chimp and human genomes covers ~2.4 Gbp of high-quality sequence

Genome Comparison
• An observed difference nearly always reflects a single event in time, not multiple independent changes over time
• Most differences reflect random drift and hold extensive information about mutational processes
• A minority of functionally important changes underlie our phenotypic differences

Segmental Duplication
• Has had a larger impact (~2.7%) in altering the genomic landscape than single nucleotide substitutions (~1.2%)
• Segmental duplications are responsible for the emergence of new genes and the adaptation of humans to their environment
• Human genome is particularly enriched in genes resulting from recent duplications

Segmental Duplication
• 33% of human duplications (>94% identity) are not duplicated in chimpanzee
• An estimated duplication rate of 4-5 Mbp per million years
• These have resulted in differences in gene expression, disease-causing duplications and changes in the genomic landscape in general

Segmental Duplication
• Chimp-only duplications: in a cross-species comparison, 11 out of 17 were found only in chimp and not in man or other great apes, whereas 6 were also found in gorilla
• De novo duplications followed by deletion of older duplications are the most likely scenario for the excess of segmental duplications observed in human-ape genomes

Gene Evolution
• 13,454 pairs of human and chimp genes with unambiguous 1:1 orthology were used
• Rate of evolution of a gene is assessed using the non-synonymous substitution rate KA

Gene Evolution
• The background rate is estimated as the synonymous substitution rate Ks
• KA/Ks is a measure of evolutionary constraint on a gene
• KA/Ks > 1 implies adaptive or positive selection, under the assumption that synonymous changes are neutral

Gene Evolution
• KA/Ks = 0.23 for the human-chimpanzee lineage => ~77% of amino acid substitutions are removed by natural selection
• CpG and non-CpG substitutions at synonymous sites show lower divergence than in introns (~50% and ~30% lower, respectively), implying evolutionary constraint on synonymous substitutions

Gene Evolution
• 585 genes of the 13,454 human-chimp orthologues have KA/KI > 1
• Given the low divergence between the human and chimp genomes, the KA/KI statistic has a large variance
• Simulations show that KA/KI > 1 would be expected to occur by chance in 263 cases, if purifying selection acts non-uniformly on genes

Gene Evolution
• The extreme outliers are:
– glycophorin C, mediates a P. falciparum invasion pathway in human erythrocytes
– granulysin, mediates antimicrobial activity against intracellular pathogens
– protamines & semenogelins, involved in reproduction
– Mas-related gene family, involved in nociception

Conclusions
• Mean rate of single nucleotide changes is 1.23%, with <1.06% corresponding to fixed divergence
• Regional variation in substitution rates is similar in hominid and murid genomes, except at subtelomeric regions
• ~25% of changes occur at CpG sites, at rates that are similar in male and female germ lines
• Indels are fewer in number but result in ~1.5% of euchromatic sequence being lineage-specific

Conclusions
• SINEs have been more active in human, while chimp has acquired two new retroviral elements
• Orthologous proteins typically differ by 2 amino acids, with ~29% being identical
• Amino acid altering changes are more frequent in hominids than in murids, but close to the frequency seen among human polymorphisms
• Substitution rate at silent sites is lower than at intronic sites => purifying selection

Is Y going extinct?
• X and Y chromosomes evolved from an autosomal pair in an ancient mammal nearly 300 million years ago
• Most Y genes lie in the X-degenerate regions
• X-degenerate region of Y does not recombine, which may lead to rapid gene loss
• Rate of gene loss estimated at 5 genes every million years

Is Y going extinct?
• Assuming gene loss occurs randomly and that human and chimp separated nearly 6 million years ago, many chimp Y genes are expected to have no functional orthologues in human
• Orthologues of all human X-degenerate genes and pseudogenes were searched for
• Chimpanzee orthologues of 16 genes and 11 pseudogenes were identified

Is Y going extinct?
• All 11 chimp orthologues of the human pseudogenes were pseudogenes in the chimp as well, with the majority of inactivating mutations shared
• This indicates that these genes were inactivated before the human-chimp split, i.e. none of them was lost in the human lineage in the last 6 million years
• GenScan and BLAST analysis of the chimp X-degenerate Y transcripts revealed that none were chimp-specific

Is Y going extinct?
• Divergence of X-degenerate exons was compared with that of introns, for genes as well as pseudogenes
• The divergence was found to be lower in exons than in introns for genes, but the same or higher for pseudogenes
• These results suggest that purifying selection has been more effective during human evolution than previously assumed
J.F. Hughes et al. (2005) Nature 437, 101-104

Summary
• While we can learn a lot from a comparison of the human and chimp genomes, they are too much alike to give meaningful answers to many questions, e.g. a DNA sequence found in humans but missing in chimps: was it added in humans or lost in chimps?
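One common way to approach the "added or lost?" question above is to look at a third, more distantly related species (an outgroup) and take its state as the likely ancestral one. The sketch below illustrates that parsimony reasoning only; the function name `polarise_difference` and its return strings are hypothetical, and it assumes a single change occurred and that the outgroup has kept the ancestral state.

```python
from typing import Optional


def polarise_difference(in_human: bool, in_chimp: bool,
                        in_outgroup: Optional[bool]) -> str:
    """Classify a presence/absence difference using one outgroup.

    Assumption (labelled loudly): the outgroup state is the ancestral
    state, and only one gain or loss happened (simple parsimony).
    """
    if in_human == in_chimp:
        return "no difference between human and chimp"
    if in_outgroup is None:
        # Without a third species the direction cannot be decided.
        return "unresolved: no outgroup data"
    if in_outgroup:
        # Ancestral state is "present": the species lacking it lost it.
        return "lost in chimp" if in_human else "lost in human"
    # Ancestral state is "absent": the species carrying it gained it.
    return "gained in human" if in_human else "gained in chimp"


# A sequence present in human, absent in chimp, present in the outgroup
print(polarise_difference(True, False, True))
# The same difference, but with no outgroup sequence available
print(polarise_difference(True, False, None))
```

With only human and chimp data the function cannot give a direction, which is the limitation the slide describes.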
Summary
• A difference found could be significant or just a variant within one species
• Sequences of other primates will be needed to establish the uniqueness of changes seen in human and chimp
• Genomes of primates like the orangutan and rhesus macaque are expected soon

Origin of Clothing
• Humans are infested with head lice and body lice
• Head louse lives and feeds on the scalp
• Body louse lives in clothing and feeds on the body
• Chimp louse used as outgroup

Origin of Clothing
• 2 sequences from mtDNA (ND4 and CYTB) and 2 from nuclear DNA (EF-1 and RPII) from 40 lice (26 head lice and 14 body lice) from 12 different geographic regions were used for analysis, along with one chimpanzee louse
• Trees built using ND4 and CYTB are nearly identical

Origin of Clothing
• Results:
– Greater diversity seen in African lice than in non-African lice => African origin for body lice
– Body louse originated ~72,000 years ago (assuming human and chimp lice diverged ~5.5 million years ago)
– Demographic expansion of body lice correlates with the spread of modern humans out of Africa

Origin of Clothing
• Results indicate a recent origin of clothing, ~72,000 years ago
R. Kittler, M. Kayser and M. Stoneking (2003) "Molecular evolution of Pediculus humanus and the origin of clothing" Current Biology 13, 1414-1417

Conclusions
• Genomes of human and model organisms were sequenced in order to understand ourselves at the molecular level
• Comparative genomics studies have revealed interesting features of genome evolution so far
• This is just the tip of the iceberg!!
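The body-louse date in the Origin of Clothing slides depends on a calibration point (the assumed human-louse/chimp-louse split ~5.5 million years ago). A minimal sketch of how such a calibration can be turned into a date under a strict molecular clock is given below. It is a simplification for illustration, not the method used in the cited paper; the function name `clock_date` and the distance values are hypothetical.

```python
def clock_date(d_target: float, d_calib: float, t_calib_years: float) -> float:
    """Date a split under a strict molecular clock.

    Assumption: sequence divergence accumulates linearly with time, so
    age_target / age_calibration == d_target / d_calib.
    """
    if d_calib <= 0:
        raise ValueError("calibration distance must be positive")
    return t_calib_years * d_target / d_calib


# Hypothetical distances (substitutions per site), NOT values from the study:
d_head_vs_body = 0.002     # divergence between the lineages being dated
d_human_vs_chimp = 0.2     # divergence across the calibration split
estimate = clock_date(d_head_vs_body, d_human_vs_chimp, t_calib_years=5.5e6)
print(f"estimated age: {estimate:.0f} years")
```

The estimate scales directly with the calibration date, so a different assumed date for the human/chimp louse split shifts the result proportionally.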