Biological preliminaries

Genome: entire complement of genetic material carried by an individual.
Transcriptome: entire set of transcribed sequences produced by the genome.
Proteome: entire set of proteins encoded by the genome.

One letter nucleotide codes

Based on Nomenclature Committee of the International Union of Biochemistry (NC-IUB). Molecular Biology and Evolution 3:99-108 (1986).

Name                 Bases               Code
Guanine              G                   G
Adenine              A                   A
Thymine              T                   T
Cytosine             C                   C
Purine               G or A              R
Pyrimidine           T or C              Y
Amino                A or C              M
Keto                 G or T              K
Strong (3 H bonds)   G or C              S
Weak (2 H bonds)     A or T              W
Not G                A or C or T         H
Not A                G or T or C         B
Not T                G or C or A         V
Not C                G or A or T         D
Any                  G or C or T or A    N
Unknown              ?                   X

One letter amino acid codes

Alanine        A    Arginine       R    Asparagine     N
Aspartic acid  D    Cysteine       C    Glutamic acid  E
Glutamine      Q    Glycine        G    Histidine      H
Isoleucine     I    Leucine        L    Lysine         K
Methionine     M    Phenylalanine  F    Proline        P
Serine         S    Threonine      T    Tryptophan     W
Tyrosine       Y    Valine         V    Unknown        X

Biological preliminaries

Similarity: resemblance between two characters.
Homology: two traits are homologous if they are derived (with or without modifications) from a common ancestor.
[Figure: tree with character states A and B, labeled "homologs"]
Homoplasy: independent origin of similar characters between species.
[Figure: tree with character states A and B, labeled "homoplasy"]
Plesiomorphy: primitive or ancestral character state.
Apomorphy: derived state representing an evolutionary novelty.
Symplesiomorphy: primitive state shared by several taxa.
Synapomorphy: derived character state shared by several taxa.
Autapomorphy: derived character state unique to a taxon.

Mutations: a mutation is an error in replication of the nucleotide sequence. It may encompass one or more nucleotides and, in complicated situations, may involve disjoint nucleotides. Mutations can be caused by internal errors of metabolism or by external agents such as radiation.
Substitutions: differences between two sequences that were originally caused by mutations and have since been acted on by selection.
Replacements: observed differences between amino acid sequences.

Transitions and transversions

Purines: A, G
Pyrimidines: C, T
Transition: a change purine <=> purine or pyrimidine <=> pyrimidine.
Transversion: a change purine <=> pyrimidine.
Of the 12 possible directed base changes, 8 are transversions and 4 are transitions.

Genomics

Beginnings

The first protein sequenced was bovine insulin in 1956. This was basically done by a series of tricks: each individual amino acid was determined by a separate and different experiment.

The first direct attempts to sequence an RNA molecule were by Holley and co-workers in 1965 (R.W. Holley et al., 1965, Science 147:1462-1465). The technique they used was very labor intensive, and it took them approximately one year to determine the 77 nucleotides that make up the alanine transfer RNA of yeast.

Genome Sequencing

1956  First protein sequence (bovine insulin)
1965  Holley et al. Science 147: 1462-1465 (yeast alanine tRNA)
1977  Maxam and Gilbert. PNAS 74: 560-564
1977  Sanger et al. PNAS 74: 5463-5467
1995  First sequenced genome: Haemophilus influenzae
1997  Escherichia coli and Saccharomyces cerevisiae
1998  Caenorhabditis elegans genome
2000  Drosophila melanogaster genome
2001  Human genome sequence
2005  454 introduces next generation sequencing (NGS)

Maxam & Gilbert sequencing

Principle
  - Radioactively label DNA fragments at their 5' end using alkaline phosphatase / polynucleotide kinase
  - Separate the fragments according to their size using gel electrophoresis
  - Follow with autoradiography to visualize the fragments
Method: four different treatments
  - Dimethylsulfate followed by heat treatment (G)
  - + mild acid (A+G)
  - Hydrazine (C+T)
  - Hydrazine + 2M NaCl (C)
Limitation
  - Requires a cloning step (amplification and labeling)

[Figure: schematic gel with lanes G, A+G, T+C and C, running from the −ve electrode (top) to the +ve electrode (bottom), and the inferred DNA sequence. The 32P-labeled fragments, from longest to shortest, are:
  32P-CTTCAGTACGTCG
  32P-CTTCAGTACGTC
  32P-CTTCAGTACGT
  32P-CTTCAGTACG
  32P-CTTCAGTAC
  32P-CTTCAGTA
  32P-CTTCAGT
  32P-CTTCAG
  32P-CTTCA
  32P-CTTC
  32P-CTT
  32P-CT
  32P-C]

Sanger sequencing

Principle
  - Use the properties of DNA replication
  - Therefore requires the use of primers
  - Aim: amplification of fragments of different sizes by stopping DNA replication with 2',3'-dideoxyribonucleotide triphosphates
Method
  - Four individual reactions are performed, one with each of the radioactively labeled 2',3'-dideoxyribonucleotide triphosphates
  - Gel electrophoresis followed by autoradiography

[Figures: chemical structures of the sugar-phosphate backbone illustrating DNA replication and the incorporation of a dideoxynucleotide triphosphate]

[Figure: schematic gel with one lane per dideoxynucleotide (G, A, T, C), running from the −ve electrode (top) to the +ve electrode (bottom), and the inferred DNA sequence. The 32P-labeled fragments form the same ladder as in the Maxam & Gilbert figure, from 32P-CTTCAGTACGTCG down to 32P-C.]

Autoradiogram courtesy of Dr. Rahat Zaheer

[Figures: sequencing traces]
  - Example of a good trace
  - Example of a poorer quality trace
  - Example of a bad trace
  - Example of the beginning of a trace
  - Example of the middle of a trace
  - Example near the useful end of a trace
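
The way a sequence is read off the gel in the Sanger sequencing slides can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions that are not in the slides: the gel is idealized (exactly one band per fragment length, no missing or ambiguous bands), fragment lengths are counted from the labeled end, and the function names `simulate_sanger_lanes` and `read_ladder` are hypothetical. The example sequence is the one shown in the gel schematic.

```python
def simulate_sanger_lanes(synthesized):
    """Idealized model of the four ddNTP reactions.

    For each lane (G, A, T, C), return the set of fragment lengths that
    terminate with that dideoxynucleotide. Assumption: every position
    yields exactly one terminated fragment.
    """
    lanes = {base: set() for base in "GATC"}
    for length, base in enumerate(synthesized, start=1):
        lanes[base].add(length)
    return lanes


def read_ladder(lanes):
    """Infer the sequence by reading bands from shortest to longest fragment.

    A band of length n in lane X means that position n of the
    synthesized strand is X.
    """
    band_to_base = {}
    for base, lengths in lanes.items():
        for length in lengths:
            if length in band_to_base:
                raise ValueError(f"two lanes show a band at length {length}")
            band_to_base[length] = base

    longest = max(band_to_base, default=0)
    missing = [n for n in range(1, longest + 1) if n not in band_to_base]
    if missing:
        raise ValueError(f"no band observed at lengths {missing}")

    # The shortest fragments migrate furthest towards the +ve electrode,
    # so the gel is read from the bottom (length 1) upwards.
    return "".join(band_to_base[n] for n in range(1, longest + 1))


if __name__ == "__main__":
    lanes = simulate_sanger_lanes("CTTCAGTACGTCG")
    print(sorted(lanes["G"]))   # lengths of fragments ending in ddG
    print(read_ladder(lanes))   # sequence read from the idealized gel
```

Real autoradiograms and traces are noisier than this model, which is why the quality of a trace matters near its beginning and near its useful end.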