CS691K Bioinformatics
Kulp Lecture Notes #0
Molecular & Cell Biology
Fall 2005
[email protected]

Logistics
• Syllabus distributed
  – Class taught in 3 stages by faculty in CS, math/stats, and microbio
  – Grades will be based on up to six homework assignments
  – Office hours are on the syllabus. All faculty are readily available by email. We are happy to discuss the class with you personally.
  – Not all notes will be available online; you should attend all lectures and take good notes
• Diverse group of students
• Emphasis will be on understanding methods and practical use of existing bioinformatics tools
• Why are you here? What is your background? What are you hoping to get out of this class? Please sign the email sheet!
• Homework will involve the use of the unix ED-LAB computers. There will be a special meeting on WEDNESDAY, SEPTEMBER 14 for novice unix users.

What is Bioinformatics
• Computational Biology: The use of algorithmic, mathematical, and statistical methods to analyze genome sequences (i.e. DNA, RNA, protein) and derived data (e.g. expression, NMR, etc.)
• Informatics: The software and data management methodologies for storing, retrieving, and integrating such data
• Data Mining / In-silico Biology: Hypothesis generation and testing from genome data sets

Topics
• Detecting similar sequences (homology)
  – Pairwise and multiple sequence alignment
  – Protein function/structure prediction
• Sequence pattern modeling and recognition
  – Motif discovery
  – Gene finding
• Analyzing high-dimension data
  – Function prediction, target discovery, etc. from gene expression
• Constructing trees
  – Phylogenetics
• Informatics and integration
  – Genome biology

The Cell
• Prokaryotes are unicellular with minimal compartments (bacteria, archaea)
• Eukaryotes are often multicellular with differentiation, have many organelles including the nucleus, and typically can reproduce sexually. They include all higher organisms: mammals, birds, fish, invertebrates, mushrooms, plants, and yeast.
• Tens of trillions of cells (roughly 10^13 to 10^14) in a human.

The Cell
• The cell is composed of and makes thousands of proteins, e.g.
  – The cell membrane is made of a layer of lipids and proteins.
  – There are special proteins embedded in the membrane as channels and pumps
  – And the cell makes (synthesizes) proteins
• “DNA makes RNA, RNA makes proteins, and proteins make us!” F. Crick
• The cell is a chemical catalytic machine
• Networks:
  – One type of network is the metabolic network, describing catalytic reactions for the consumption or synthesis of products necessary for life. Many of these are fairly well understood. (e.g. photosynthesis)
  – Another type of network is the signaling network, where information is conveyed about the environment. These are partially understood. (e.g. protein kinases are involved in cell differentiation and cell death)
• From KEGG (http://www.genome.ad.jp/kegg/pathway.html)

The Cell - Genetic Information
• There is a third major type of network: genetic information processing. We will focus on these networks.
• To understand this:
  – We describe the nature of DNA
  – Tangentially mention homology and conservation
  – Then discuss the process of translation

DNA Structure - Eukaryotic Chromosome
• DNA - a string of nucleotides (Adenine, Guanine, Cytosine, and Thymine)
• Regular, long, stable, oriented, double-stranded, helical structure
• Humans: 23 pairs of chromosomes. Total ~3B “bases” (x2)
• DNA resides in the nucleus in eukaryotes

DNA Structure
• Always: chemical pairing of A-T and C-G. Thus, strands are complementary.
• Two chains run in opposite directions, each read 5’ to 3’:
    5’ → 3’
    3’ ← 5’

Prokaryotic Chromosomes
• Prokaryotes (and mitochondria) typically have one circular chromosome
• Figure: the E. coli genome, with orange and yellow bars indicating the positions of the genes on the two strands.

RNA
RNA is a similar molecule composed of 4 nucleotides (A, C, G, and U)
• Single-stranded.
• Can base-pair with DNA (synthesis)
• Can self-base-pair and fold

DNA Replication
• We won’t be discussing the details of DNA replication. There are 2 processes:
  – Mitosis for normal cell duplication
  – Meiosis for gametes for sexual reproduction: single, recombined chromosomes
• In both processes, DNA is copied by separating double-stranded DNA (dsDNA) into single strands (ssDNA) at origins of replication and synthesizing a complementary copy from the template.
  – 50 bp/sec * 15K origins = 750K bp/sec, so ~3B bases take ~4,000 sec, i.e. ~1 hr to replicate the human genome
• Problem:
  – How does DNA polymerase find the origins? Are there sequence patterns?

The Tree of Life
• Single common ancestral genome!

DNA Conservation and Variation
• Mutations occur in DNA due to environmental effects (e.g. radiation) and random mistakes during synthesis. Usually just single nucleotides are changed, sometimes there are large rearrangements.
• Those changes occurring in somatic (non-sex) cells cause local damage, usually cell death, but can cause cancer. (Search for the common mutations that cause different types of cancers.)
• Those changes occurring in gametes can be inherited and, if favorable, can become “fixed”
• Variation in non-functional (junk) DNA tends to “drift”, whereas functional DNA (e.g. containing genes) tends to remain “conserved”.
• Problems:
  – Given a set of sequences from different organisms:
    • Identify and align sequences from a common ancestor (homologous)
    • What are the important (conserved) parts?
    • What was the evolutionary history? (Reconstruct the “tree”)
  – Given a model organism (e.g. mouse, yeast, fruit fly, etc.), find the orthologous locus in human

Examples of Sequence Conservation
• A segment from the RNA needed for protein synthesis, a fundamental process in all life forms. It is conserved across all 3 major branches of the tree of life.
• A multiple alignment of homologous protein sequences. Colors indicate different classes of amino acids. Dots are insertions/deletions.
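The replication-time estimate on the DNA Replication slide can be checked with a few lines of arithmetic. This is only a minimal sketch: the function name and its parameters are illustrative, and it assumes every origin is active at the same time and copies an equal share of the genome at the quoted rate. The ~3 billion base genome size is taken from the chromosome slide.

```python
def replication_time_seconds(genome_bp, bp_per_sec_per_origin, n_origins):
    """Time to copy genome_bp bases if n_origins all work in parallel.

    Assumption (not from the notes): all origins start together and
    each synthesizes at the same constant rate.
    """
    total_rate = bp_per_sec_per_origin * n_origins  # bases copied per second overall
    return genome_bp / total_rate


# Figures quoted in the notes: ~3B bases, 50 bp/sec, 15K origins.
seconds = replication_time_seconds(
    genome_bp=3_000_000_000,
    bp_per_sec_per_origin=50,
    n_origins=15_000,
)
hours = seconds / 3600
print(f"{seconds:.0f} s = {hours:.2f} h")  # prints "4000 s = 1.11 h"
```

With a single origin the same function gives 60,000,000 seconds, roughly two years, which is why many origins are needed.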
DNA contains “GENES”
• Genes are hereditary units of DNA
  – We now know that, for the most part, genes are regions that “code” for proteins
• Proteins are derived from DNA according to the “central dogma”: DNA => RNA => Protein
  – As in DNA replication, the DNA is opened into two single strands.
  – Using one ssDNA strand as a template, a complementary copy of RNA is synthesized for a small region of the genome (1,000-100,000 nt)
  – The RNA is processed and transported (more about that in later lectures)
  – Each triplet of RNA nucleotides (codon) is translated to one of 20 amino acids, creating a polypeptide chain, which folds into a protein
• Problems:
  – How does the cell know where to find a gene? (Sequence patterns?)
  – How does RNA transcription know when to stop? (Patterns?)
  – How is RNA edited?

“Central Dogma” - DNA - RNA - Protein
(Figure ©1998 by Alberts, Bray, Johnson, Lewis, Raff, Roberts, Walter)

Codon Translation
• Each triplet translates to exactly one amino acid (or a stop signal). For example, CUU is Leucine.
• There are 4*4*4=64 possible codons that translate into 20 amino acids, so most amino acids are encoded by more than one codon
• This translation table is fixed for almost all life

Cell Differentiation
• Eukaryotes have many different cell types (skin, muscle, neurons, etc.) that each play a different role.
• To accomplish the cell’s role, different genes must be activated
• Problems:
  – How are genes activated? What regulatory patterns are in the DNA?
  – What genes control other genes? What network associations among genes can be found?
  – What genes are “differentially expressed”?

Cell Differentiation - Differential Expression
• Interleukin 1 alpha expressed in different cell types

Protein Sequence, Structure, Function
• Lastly, given a protein sequence, what is the 3-D structure and function?
• The most common approach is to exploit conservation (see earlier)
• Problem:
  – Find similar proteins to my query protein. Maybe I can assign structure or function to my new query protein, if structure or function is already known for a homologous protein.
    (Sequence similarity searching, protein family modeling)

Protein Structure

Further Reading
• Many online intros to genome biology
  – E.g. http://www.ncbi.nlm.nih.gov/About/primer/
• Any molecular biology text
  – E.g. Molecular Biology of the Cell by Alberts et al. or Genomes by Brown.
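As a closing illustration of the central dogma and Codon Translation slides, the sketch below transcribes a DNA template strand into RNA and translates the RNA codon by codon. It is a hedged example, not part of the lecture: the names CODON_TABLE, transcribe, and translate are illustrative, the input sequence is made up, and RNA processing is ignored. The table is the standard genetic code written in the usual compact form, with "*" marking stop codons.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def transcribe(template_dna):
    """Return the RNA (5'->3') complementary to a DNA template given 5'->3'.

    The strands are antiparallel, so the template is read in reverse,
    pairing A-U, T-A, C-G, and G-C.
    """
    pairs = {"A": "U", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(template_dna.upper()))


def translate(rna):
    """Translate RNA from its first base, one codon at a time, until a stop."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the polypeptide
            break
        protein.append(amino_acid)
    return "".join(protein)


rna = transcribe("TTAAAGCAT")   # hypothetical template strand
print(rna)                      # AUGCUUUAA
print(translate(rna))           # ML (Met, Leu), then a stop codon
print(CODON_TABLE["CUU"])       # L, i.e. Leucine, as on the slide
```

The table has 64 entries covering 20 amino acids plus stop, which makes the redundancy of the code easy to inspect, e.g. by listing all codons that map to "L".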