Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
BB30055: Genes and genomes Genomes - Dr. MV Hejmadi ([email protected]) BB30055: Genomes - MVH 3 broad areas (A) Genomes (B) Applications genome projects (C) Genome evolution Why sequence the genome? 3 main reasons • description of sequence of every gene valuable. Includes regulatory regions which help in understanding not only the molecular activities of the cell but also ways in which they are controlled. • identify & characterise important inheritable disease genes or bacterial genes (for industrial use) • Role of intergenic sequences e.g. satellites, intronic regions etc History of Human Genome Project (HGP) 1953 – DNA structure (Watson & Crick) 1972 – Recombinant DNA (Paul Berg) 1977 – DNA sequencing (Maxam, Gilbert and Sanger) 1985 – PCR technology (Kary Mullis) 1986 – automated sequencing (Leroy Hood & Lloyd Smith 1988 – IHGSC established (NIH, DOE) Watson leads 1990 – IHGSC scaled up, BLAST published (Lipman+Myers) 1992 – Watson quits, Venter sets up TIGR 1993 – F Collins heads IHGSC, Sanger Centre (Sulston) 1995 – cDNA microarray 1998 – Celera genomics (J Craig Venter) 2001 – Working draft of human genome sequence published 2003 – Finished sequence announced Human Genome Project (HGP) Goal: Obtain the entire DNA sequence of human genome Players: (A) International Human Genome Sequence Consortium (IHGSC) - public funding, free access to all, started earlier - used mapping overlapping clones method (B) Celera Genomics – private funding, pay to view - started in 1998 - used whole genome shotgun strategy Whose genome is it anyway? (A) International Human Genome Sequence Consortium (IHGSC) - composite from several different people generated from 10-20 primary samples taken from numerous anonymous donors across racial and ethnic groups (B) Celera Genomics – 5 different donors (one of whom was J Craig Venter himself !!!) sequencing genomes Mapping phase Sequencing phase Strategies for sequencing the human genome IHGSC Celera Result…. ~30 - 40,000 protein-coding genes estimated based on known genes and predictions definite genes possible genes IHGSC 24,500 5000 Celera 26,383 12,000 Other genomes sequenced 1997 4,200 genes 2002 36,000 genes 1998 19,099 genes 2002 38,000 genes Sept 2003 18,473 human orthologs Science (26 Sep 2003)Vol301(5641)pp1854-1855 Genomics: World's smallest genome • the smallest genome known is the DNA of a 'nucleomorph' of Bigelowiella natans, a singlecelled algae of the group known as chlorarachniophytes. • 373,000 base pairs and a mere 331 genes • The nucleomorph is an evolutionary vestige that was originally the nucleus of a eukaryotic cell. The eukaryotic cell swallowed a cyanobacterium to acquire a photosynthetic 'plastid' organelle, and that cell was in turn engulfed by another cell to produce B. natans as we know it. Now, most of the nucleomorph's genome is concerned with its own maintenance, and just 17 of its genes still exert any control over the plastid. Its small size suggests it is heading for evolutionary oblivion. Proc. Natl Acad. Sci. USA 103, 9566–9571 (2006) by G McFadden, University of Melbourne, Australia Organisation of human genome Nuclear genome (3.2 Gbp) Mitochondrial genome 24 types of chromosomes Y- 51Mb and chr1 -279Mbp http://www.ncbi.nlm.nih.gov/Genomes/ General organisation of human genome Basic structure of a gene Fig. 21.11 Polypeptide-coding regions Gene organisation Rare bicistronic transcription units E.g. UBA52 transcription generates ubiquitin and a ribosomal protein S27a Non polypeptide–coding: RNA encoding Class of RNA Example types Ribosomal RNA 16,23,18,28S Function Ribosomal subunits Transfer RNA 22 mitochondrial 49 cytoplasmic mRNA binding Small nuclear U1,U2,U4,U5 etc RNA(snRNA) RNA splicing Small nucleolar U3,U8 etc RNA (snoRNA) rRNA modification and processing microRNA >200 types (miRNA) Regulatory RNA XIST RNA Inactivation of X chromosome Imprinting H19 RNA associated RNA Antisense RNA >1500 types Telomerase RNA Genomic imprinting Suppression of gene expression Telomere formation General organisation of human genome Pseudogenes () non functional copies of an active gene. May be either a) Nonprocessed pseudogenes May contain exons, introns & promoters but are inactive due to inappropriate termination codons Arise by gene duplication events usually in gene clusters (e.g. a and b–globin gene clusters) Pseudogenes in globin gene cluster Gene fragments or truncated genes Gene fragments: small segments of a gene (e.g. single exon from a multiexon gene) Truncated genes: Short components of functional genes (e.g. 5’ or 3’ end) Thought to arise due to unequal crossover or exchange b) processed pseudogenes Thought to arise by genomic insertion of a cDNA as a result of retrotransposition Contributes to overall repetitive elements (<1%) processed pseudogenes - General organisation of human genome Unique or low copy number sequences Non –coding, non repetitive and single copy sequences of no known function or significance General organisation of human genome Repetitive elements Main classes based on origin Tandem repeats Interspersed repeats Segmental duplications References 1) Chapter 9 pp 265-268 HMG 3 by Strachan and Read 2) Chapter 10: pp 339-348 Genetics from genes to genomes by Hartwell et al (2/e) 3) Nature (2001) 409: pp 879-891 http://www.ncbi.nlm.nih.gov/Genomes/