Genomics: The Technology behind the Human Genome Project
Shu-Ping Lin, Ph.D.
Institute of Biomedical Engineering
Website: http://web.nchu.edu.tw/pweb/users/splin/

O.J. Simpson capital murder case, 1/95-9/95
Odds of blood in Ford Bronco not being R. Goldman's: 6.5 billion to 1
Odds of blood on socks in bedroom not being N. Brown-Simpson's: 8.5 billion to 1
Odds of blood on glove not being from R. Goldman, N. Brown-Simpson, and O.J. Simpson: 21.5 billion to 1
Number of people on planet Earth: 6.1 billion
Odds of being struck by lightning in the U.S.: 2.8 million to 1
Odds of winning the Illinois Big Game lottery: 76 million to 1
Odds of getting killed driving to the gas station to buy a lottery ticket: 4.5 million to 1
Odds of seeing 3 albino deer at the same time: 85 million to 1
Odds of having quintuplets: 85 million to 1
Odds of being struck by a meteorite: 10 trillion to 1

DNA Technology and Genome
Genome: the collection of DNA molecules that carries the hereditary information of an organism.
Genomics: the study of the sequence, content, and history of the genome.
Random mutations in the DNA sequence and DNA duplication play a significant role in the evolution of genomes.
Sequence similarity: used to compare genomes and to construct a tree of life indicating kinship relationships among species.

Technology of Genome Sequencing
1. Restriction enzymes are used to make recombinant DNA: enzymatic techniques cut DNA into pieces and combine DNA from different sources (recombinant DNA)
2. Separating DNA fragments according to size (electrophoresis)
3. Making copies of DNA using the cell's own machinery for DNA replication (cloning)
4.
Polymerase chain reaction (PCR): amplifying DNA directly in vitro
Causing genomic perturbations by gene mutation, insertion, and deletion

Tree of Life
Constructing a timeline (history) and kinship relationships for millions of life forms on Earth.
Darwin's book On the Origin of Species contains an evolutionary tree describing the hierarchical structure that relates species to their most recent common ancestors.
More recently, gene-sequence similarity has been used as a criterion to uncover hierarchical kinship relations between organisms; the resulting tree divides organisms into 3 domains: bacteria, archaea, and eukaryota. Such genes have similar, but often not identical, nucleotide sequences in different organisms. Family trees of organisms that are based on gene similarity are called gene trees.
This tree is based on sequence-similarity analysis of the gene for 16S ribosomal RNA, which specifies a component of the machinery that translates the nucleotide sequence of a gene into protein. RNA can function as a template for both DNA (reverse transcription) and protein (translation), and it has an enzymatic role in fundamental cellular activities.
Universal ancestor cells form the trunk, and their descendants form the branches and leaves (present-day life forms). Working back to the ancestor from sequences of DNA letters is similar to how linguists reconstruct the words of ancient mother tongues.
Tree construction based on molecular similarity depends on the molecule or molecule cluster that is compared.
In animals, genes are passed vertically from parent to child. However, gene-sequence similarities suggest that both archaea and eukaryotes have acquired metabolic genes from bacteria through lateral gene transfer, also called horizontal transfer.
Figure: tree of life (Woese et al., 1990). Labels: all life, viruses, bacteria, archaea, eukaryotes, protists, fungi, green plants, animals, invertebrates, vertebrates, fish, amphibians, reptiles, birds, mammals, monotremata, marsupials, primates.
Archaea have no nuclei or other organelles; bacteria and archaea together form the prokaryotes.
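Gene trees start from a measure of how similar the same gene is across organisms. The Python sketch below shows that first step under simple assumptions: the sequences are already aligned to equal length, and dissimilarity is the proportion of differing sites (the p-distance). The taxon names and sequences are hypothetical toy data, not real 16S rRNA sequences, and this is not the method used by Woese et al.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every unordered pair of named, pre-aligned sequences."""
    return {
        (n1, n2): p_distance(seqs[n1], seqs[n2])
        for n1, n2 in combinations(sorted(seqs), 2)
    }


def closest_pair(seqs: dict[str, str]) -> tuple[str, str]:
    """The pair of taxa with the smallest p-distance (the most similar gene copies)."""
    d = distance_matrix(seqs)
    return min(d, key=d.get)


# Hypothetical toy alignment (illustrative only, not real data).
toy = {
    "taxonA": "GTAATCGATC",
    "taxonB": "GTAATCGTTC",
    "taxonC": "GTCCTAGTAA",
}

if __name__ == "__main__":
    for pair, dist in distance_matrix(toy).items():
        print(pair, round(dist, 2))
    print("closest:", closest_pair(toy))
```

With this toy data, taxonA and taxonB differ at a single site and come out as the closest pair. A full gene tree would repeatedly merge the closest pair and recompute distances (for example with UPGMA or neighbor joining), which this sketch leaves out.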
Archaea
Archaea include microbial species growing at 95℃ in the highly acidic conditions found in hot sulfur beds. The extreme resistance of archaeal enzymes to heat and acid makes them highly attractive to biotechnology companies for potential industrial use.

Bacteria
Ubiquitous single-celled organisms (millions everywhere).
Their membranes are typically made of material different from that of eukaryotic membranes.
Have no nuclei or other organelles.
Almost all they do is make more bacteria.
Include disease-causing germs and symbiotic organisms.
Escherichia coli (E. coli) is a bacterium that lives in human intestines and is required for normal digestion; it is well studied and easy to grow.

Viruses
Obligatory parasites: they rely on the biochemical machinery of their host cell to survive and reproduce.
Consist of just a small amount of genetic material surrounded by a protein coat; a small virus can have as few as 5000 elements (nucleotides) in its genetic material.
Actively studied because of their simplicity and their role in human disease.

Eukaryotes
Eukaryotes have cells that contain:
DNA organized in nuclei: a nucleus is a specialized area of the cell that holds the genetic material
Other organelles (specialized cellular areas): mitochondria, where respiration takes place, and chloroplasts (in plants), which capture energy from sunlight
Cytoskeleton: genes for cytoskeletal proteins such as actin, myosin, and microtubules
All multicellular organisms (e.g., people, mosquitoes, maple trees) are eukaryotes, as are many single-celled organisms (e.g.
yeasts).

Living Parts
Tissues, cells, compartments, and organelles
Groups of cells specializing in a particular function are tissues, and their cells are said to be differentiated. Once differentiated, a cell cannot change from one type to another.
Yet all cells of an organism have exactly the same genome (DNA sequence). Differences come from differences in gene expression, that is, whether the product a gene codes for is produced at all and how much of it is produced.

Mutation
Mutations arise from imperfections in replication, repair, and quality-control processes, and from external environmental factors such as radiation and chemical insult. A mutation is any change in the base sequence of a gene or a noncoding DNA segment.
Point mutation: a change affecting a single nucleotide in a gene.
Some point mutations cause a corresponding change in the amino-acid sequence of the protein the gene produces, e.g., GGC (complementary codon CCG, proline, P) to CGC (GCG, alanine, A).
Others do not affect the amino-acid sequence (silent mutations), e.g., TGC (ACG) to TGT (ACA), both of which correspond to threonine (T).
Point mutations are found much more frequently in noncoding regions of the genome, because gene mutations leading to nonfunctional proteins do not survive the forces of evolution.
Rearrangement mutation: affects a large region of DNA through insertions of additional material, shifts in the order of codons, or deletions of genes. Rearrangements within the sequence of a gene may prevent gene expression or result in a gene product that the cell does not recognize; mutated genes that survive may contribute to the diversity of species.

Sequence Similarity
The sequence shared among the pairs with the highest S value is the predicted ancestor sequence, GTAATCG; the 2 sequences with S = 13 are said to be homologous to the ancestor sequence.
Global alignment of pairs of genes (or proteins): alignment throughout their lengths, in which every base is aligned with another base or with a gap.
Local alignment: does not need to align all the bases in all sequences; it provides information on sequence motifs of proteins found at the sites of interaction with other proteins.
Similarity index: +3 for each matching
base, -5 for each mismatch, and -6 for each gap opening
Homework: S = -13, S = -21, S = ??
Using +1 for each matching base, -1 for each mismatch, and -5 for each gap opening, score the alignment
GTAACTGCTAGA__
GTAC__GC__GTCG

Probability of Sequence Similarity
Nucleotide sequences of length r:
Nucleotide sequences of length r, random sequence of length m:

E-value means expectation value. The E-value is the measure most commonly used for estimating sequence similarity: how many times is a match at least as good expected to happen by chance? This estimate is based on the similarity measure.
If a match is highly unexpected, it probably results from something other than chance, and common origin is the most likely explanation. This is how homology is inferred.
A low E-value means a good hit:
E-value 1 = bad
E-value 10^-3 = borderline
E-value 10^-4 = good
E-value 10^-10 = very good
E-values lower than 10^-4 indicate possible homology; E-values higher than 10^-4 require extra evidence to support homology.

Comparison of Amino Acids of Proteins
Comparing the amino-acid sequences of proteins also yields information about their origin and evolution.
The most prevalent replacements occur between amino acids with similar side chains, such as substitutions within the groups (G, A), (A, S), (S, T), and (I, V, L).
Replacements involving chemically similar amino acids are observed more often than expected for random mutations.
Tryptophan is not chemically similar to any of the other 19 amino acids found in proteins, and its presence is largely conserved during evolution.

Defining Homolog, Ortholog and Paralog
Homolog: a gene related to a second gene by descent from a common ancestral DNA sequence. The term "homolog" may apply to the relationship between genes separated by speciation (orthologs) or between genes originating via gene duplication (paralogs).
Ortholog: orthologs are genes in different species that have evolved from a common ancestral gene via speciation. Orthologs often (but certainly not always) retain the same function(s) in the course of evolution.
Thus, functions may be lost or gained when comparing a pair of orthologs.
Paralog: paralogs are genes produced via gene duplication within a genome. Paralogs typically evolve new functions or else eventually become pseudogenes.
Figure: http://lh6.ggpht.com/_Z6TlOmziVoM/SS69ycCKcCI/AAAAAAAAGUA/QtGd2QwekE/s720/%5BUNSET%5D.png

Things You Share with
Mouse: http://www.youtube.com/watch?v=VhgSReb4RRY&NR=1
Zebra fish: http://www.youtube.com/watch?v=DF5CG_p1TCw
Fruit fly: http://www.youtube.com/watch?v=mw5SPcEc5Q
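The similarity index described under Sequence Similarity (+3 per match, -5 per mismatch, -6 per gap opening) scores an alignment that has already been made. A minimal Python sketch follows. It assumes equal-length aligned strings with `_` marking a gap, as in the homework alignment. It also assumes the gap-opening penalty is charged once per run of consecutive gaps, with no extra cost for extending a gap. The slides do not say how longer gaps are scored, so that rule is an assumption, and the function name `similarity_score` is hypothetical.

```python
def similarity_score(top: str, bottom: str, match: int = 3, mismatch: int = -5,
                     gap_open: int = -6, gap: str = "_") -> int:
    """Score two pre-aligned, equal-length sequences with a similarity index.

    Assumption (not from the slides): gap_open is charged once for each run
    of consecutive gap characters in a row; extending a gap costs nothing.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    in_gap_top = in_gap_bottom = False
    for a, b in zip(top, bottom):
        if a == gap or b == gap:
            # Charge the opening penalty only at the first column of a new gap run.
            if a == gap and not in_gap_top:
                score += gap_open
            if b == gap and not in_gap_bottom:
                score += gap_open
        elif a == b:
            score += match
        else:
            score += mismatch
        # Remember whether each row is currently inside a gap run.
        in_gap_top = a == gap
        in_gap_bottom = b == gap
    return score


if __name__ == "__main__":
    # Default weights follow the slide's +3 / -5 / -6 scheme.
    print(similarity_score("GTA_C", "GTACC"))
    # Homework weights (+1 / -1 / -5) under the same gap assumption.
    print(similarity_score("GTAACTGCTAGA__", "GTAC__GC__GTCG",
                           match=1, mismatch=-1, gap_open=-5))
```

Passing `match=1, mismatch=-1, gap_open=-5` applies the homework's weights, but the resulting S depends on the gap rule: a scheme that charges -5 for every gap position, rather than once per gap run, would give a different score.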