Download comp - Imtech - Institute of Microbial Technology
Basics of Comparative Genomics
Dr G. P. S. Raghava

AIM: To understand the biology of organisms
Importance: More than 100 genomes sequenced, more than 250 in progress
Definition: Comparison of the set of proteins of one genome with that of another genome, plus comparison of gene location, gene order and gene regulation
Application
– Visualization of information on the genome
– Genome annotation (prediction of genes, repeats, regulatory regions)
– Evolutionary information (gene loss, duplication, horizontal gene transfer, ancestor)
– Essential genes for cell survival
– Classification of genes based on function
Tools and Databases

What is comparative genomics?
Analyzing and comparing genetic material from different species to study evolution, gene function, and inherited disease
Understand the uniqueness between different species

Why Comparative Genomics?
It tells us what is common and what is unique between different species at the genome level.
Genome comparison may be the surest and most reliable way to identify genes and predict their functions and interactions.

What is compared?
Gene location
Gene structure
– Exon number
– Exon lengths
– Intron lengths
– Sequence similarity
Gene characteristics
– Splice sites
– Codon usage
– Conserved synteny

Few facts from genome comparison
High degree of conservation of microbial proteins (~70% ancestral conserved region)
Proteins related to ENERGY processes are generally found in all genomes
Proteins related to COMMUNICATION represent the most distinctive function in each genome
INFORMATION-related proteins have complex behaviour
High frequency (~10%) of non-orthologous gene displacement

Few Terminologies
Homology: Homology is the relationship of any two characters (such as two proteins that have similar sequences) that have descended, usually through divergence, from a common ancestral character.
Homologues are thus components or characters (such as genes/proteins with similar sequences) that can be attributed to a common ancestor of the two organisms during evolution. Homologues can be orthologues, paralogues or xenologues.
Orthologues are homologues that have evolved from a common ancestral gene by speciation. They usually have similar functions.
Paralogues are homologues that are related or produced by duplication within a genome followed by subsequent divergence. They often have different functions.
Xenologues are homologues that are related by an interspecies (horizontal) transfer of the genetic material for one of the homologues. The functions of xenologues are quite often similar.

Analogues
Analogues are non-homologous genes/proteins that have descended convergently from unrelated ancestors. They have similar functions although they are unrelated in either sequence or structure.

Frequently used terms
Homology
– Orthologous: evolved from a common ancestral gene by speciation; usually have similar functions
– Paralogous: produced by duplication of a gene within a genome; usually have different functions
– Xenologous: related by an interspecies (horizontal gene) transfer of the genetic material; have similar functions
Analogous: not evolved from the same ancestor
Similarity: sequence similarity
Percent identity

Visualising Genome Information

Genome Annotation
The process of adding biological information and predictions to a sequenced genome framework

All-against-all Self-comparison
How?
– Make a database of the proteome
– Use each protein as a query in a similarity search against the database (BLAST, WU-BLAST or FASTA)
– Generate a matrix of alignment scores (P or E value); a conservative cutoff E value: 10e-6
Why?
– Number of gene families: this comparison distinguishes unique proteins from proteins that have arisen from gene duplication, and also reveals the number of gene families.
– Paralogs: significantly matched pairs of protein sequences may be paralogs.
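The all-against-all self-comparison above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the lecture: the hit format `(query, subject, e_value)`, the function names `gene_families` and `summarize`, and the single-linkage grouping (any significant match joins two proteins into one family) are all hypothetical choices. The default cutoff of `1e-6` is one reading of the cutoff quoted above and should be adjusted to the threshold actually intended.

```python
from collections import defaultdict


def gene_families(proteins, hits, e_cutoff=1e-6):
    """Group proteins into candidate families from all-against-all hits.

    proteins: iterable of protein identifiers in the proteome.
    hits: iterable of (query, subject, e_value) tuples (assumed format).
    e_cutoff: significance cutoff; the default is an assumption.
    """
    parent = {p: p for p in proteins}

    def find(x):
        # Walk up to the family representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, e_value in hits:
        # Tolerate identifiers that appear only in the hit list.
        parent.setdefault(query, query)
        parent.setdefault(subject, subject)
        # Ignore self-hits and matches that are not significant.
        if query != subject and e_value <= e_cutoff:
            parent[find(query)] = find(subject)

    families = defaultdict(set)
    for p in parent:
        families[find(p)].add(p)
    return list(families.values())


def summarize(families):
    """Separate unique proteins from multi-member (possibly paralogous) families."""
    unique = sorted(next(iter(f)) for f in families if len(f) == 1)
    multi = [f for f in families if len(f) > 1]
    return {"unique_proteins": unique, "family_count": len(multi)}


# Toy data, invented purely for illustration.
example_proteins = ["p1", "p2", "p3", "p4", "p5"]
example_hits = [
    ("p1", "p1", 0.0),    # self-hit, ignored
    ("p1", "p2", 1e-30),  # significant
    ("p2", "p3", 1e-12),  # significant, links p3 to p1 via p2
    ("p4", "p5", 0.5),    # not significant
]
print(summarize(gene_families(example_proteins, example_hits)))
```

Single-linkage grouping is deliberately permissive: two proteins can land in the same family through a chain of matches even if they do not match each other directly, so members of a family are candidate paralogs rather than confirmed ones.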
Between-Proteome Comparisons: Why?
To identify orthologs, gene families, and domains
Orthologs (proteins that share a common ancestry and function):
– A pair of proteins in two organisms that align along most of their lengths with a highly significant alignment score.
– These proteins perform the core biological functions shared by the two organisms.
– Two matched sequences (X in A, Y in B) may not be orthologs (Y and Z are paralogs in B, X and Z are orthologs)
– Identify true orthologs: (a) highest-scoring match (best hit); (b) E value < 0.01; (c) > 60% alignment over both proteins

Between-Proteome Comparisons: How?
1. Choose a yeast protein and perform a database similarity search of the worm proteome (WU-BLAST): a yeast-versus-worm search
2. Group the worm seqs that match the yeast query seq with a highly significant P value (10^-10 to 10^-100); also include the yeast query seq in the group
3. From the group made in 2, choose a worm seq and make a search of the yeast proteome, using the same P limit
4. Add any matching yeast seq to the group made in 2
5. Repeat 3 & 4 for all initially matched seqs in the group
6. Repeat 1-5 for every yeast protein
7. As 1-6, perform a comparable worm-versus-yeast search
8. Coalesce the groups of related seqs and remove any redundancies so that every sequence is represented only once
9. Eliminate any matched pairs in which less than 80% of each seq is in the alignment

Figure 1. Regions of the human and mouse homologous genes: coding exons (white), noncoding exons (gray), introns (dark gray), and intergenic regions (black). Corresponding strong (white) and weak (gray) alignment regions of GLASS are shown connected with arrows. Dark lines connecting the alignment regions denote very weak or no alignment. The predicted coding regions of ROSETTA in human, and the corresponding regions in mouse, are shown (white) between the genes and the alignment regions.
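The three criteria for true orthologs listed under "Between-Proteome Comparisons: Why?" (best hit, E value < 0.01, more than 60% alignment over both proteins) can be expressed as a small filter. The sketch below is hedged: the `Hit` record, its field names, and the function `candidate_orthologs` are hypothetical, and "alignment over both proteins" is interpreted here as the fraction of each sequence covered by the alignment, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One search hit between proteomes A and B (assumed record layout)."""
    query: str          # protein in proteome A
    subject: str        # protein in proteome B
    score: float        # alignment score, higher is better
    e_value: float
    query_cov: float    # fraction of the query inside the alignment (0..1)
    subject_cov: float  # fraction of the subject inside the alignment (0..1)


def candidate_orthologs(hits, max_e=0.01, min_cov=0.60):
    """Return {query: subject} for hits passing criteria (a), (b) and (c)."""
    # (a) keep only the highest-scoring match for each query protein.
    best = {}
    for h in hits:
        current = best.get(h.query)
        if current is None or h.score > current.score:
            best[h.query] = h
    # (b) E value below the limit; (c) coverage above the limit on both proteins.
    return {
        q: h.subject
        for q, h in best.items()
        if h.e_value < max_e and h.query_cov > min_cov and h.subject_cov > min_cov
    }


# Toy data, invented for illustration: X matches both Y and Z in proteome B.
example_hits = [
    Hit("X", "Y", 250.0, 1e-40, 0.90, 0.85),
    Hit("X", "Z", 300.0, 1e-50, 0.95, 0.90),  # best hit for X
    Hit("W", "V", 80.0, 0.5, 0.90, 0.90),     # fails the E-value limit
    Hit("U", "T", 200.0, 1e-20, 0.40, 0.90),  # fails the coverage limit
]
print(candidate_orthologs(example_hits))
```

One design choice worth noting: the best hit is selected first and then tested, so a query whose best hit fails the E-value or coverage test gets no ortholog at all, rather than falling back to its second-best match. Whether that matches the intended procedure is not stated in the slides.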
Target Validation
Target validation involves taking steps to prove that a DNA, RNA, or protein molecule is directly involved in a disease process and is therefore a suitable target for development of a new therapeutic compound. Genes that do not belong to an established family are critical to many disease processes and also need to be validated as potential drug targets.