* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download CSE 181 Project guidelines - Computer Science and Engineering
Restriction enzyme wikipedia , lookup
Epitranscriptome wikipedia , lookup
Promoter (genetics) wikipedia , lookup
SNP genotyping wikipedia , lookup
Two-hybrid screening wikipedia , lookup
Genetic code wikipedia , lookup
Genetic engineering wikipedia , lookup
Real-time polymerase chain reaction wikipedia , lookup
Transcriptional regulation wikipedia , lookup
Biochemistry wikipedia , lookup
Bisulfite sequencing wikipedia , lookup
Gel electrophoresis of nucleic acids wikipedia , lookup
Genomic library wikipedia , lookup
Endogenous retrovirus wikipedia , lookup
Transformation (genetics) wikipedia , lookup
Silencer (genetics) wikipedia , lookup
Community fingerprinting wikipedia , lookup
Gene expression wikipedia , lookup
Molecular cloning wikipedia , lookup
DNA supercoil wikipedia , lookup
Non-coding DNA wikipedia , lookup
Biosynthesis wikipedia , lookup
Vectors in gene therapy wikipedia , lookup
Point mutation wikipedia , lookup
Nucleic acid analogue wikipedia , lookup
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Molecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen, Mike Daly, Hoa Dinh, Erinn Hama, Robert Hinman, Julio Ng, Michael Sneddon, Hoa Troung, Jerry Wang, Che Fung Yung An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Outline: • • • • • • • What Is Life Made of? What Is Genetic Material? What Do Genes Do? What Molecules Code for Genes? What Is the Structure of DNA? What Carries Information between DNA and Proteins How Are Proteins Made? An Introduction to Bioinformatics Algorithms Outline Cont. • How Can We Analyze DNA • • • • Copying DNA Cutting and Pasting DNA DNA Sequencing Probing DNA www.bioalgorithms.info An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Section1: What is Life made of? An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Cells • Fundamental working units of every living system. • Every organism is composed of one of two radically different types of cells: • prokaryotic cells • eukaryotic cells. • Prokaryotes and eukaryotes are descended from the same primitive cell. • All extant prokaryotic and eukaryotic cells are the result of a total of 3.5 billion years of evolution. An Introduction to Bioinformatics Algorithms www.bioalgorithms.info 2 types of cells: Prokaryotes v.s. Eukaryotes An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Prokaryotes and Eukaryotes •According to the most recent evidence, there are three main branches to the tree of life. •Prokaryotes include Archaea (“ancient ones”) and bacteria. •Eukaryotes are kingdom Eukarya and includes plants, animals, fungi and certain algae. An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Prokaryotes and Eukaryotes, continued Prokaryotes Eukaryotes Single cell Single or multi cell No nucleus Nucleus No organelles Organelles One piece of circular DNA Chromosomes No mRNA post Exon/Intron splicing transcriptional modification An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Some Terminology • Genome: An organism’s genetic material • a bacteria contains about 600,000 DNA base pairs • human and mouse genomes have some 3 billion • consists of one of more chromosomes • Gene: A discrete unit of hereditary information located on the chromosomes and consisting of DNA bases (or nucleotides). It is a basic physical and functional unit of heredity, and encodes instructions on how to make proteins. • Genotype: The genetic makeup of an organism • Phenotype: The physically expressed traits of an organism • Nucleic acid: Biological molecules (RNA and DNA) that allow organisms to reproduce An Introduction to Bioinformatics Algorithms www.bioalgorithms.info All life depends on 3 critical molecules • DNAs • Hold information on how cell works. Made of 4 types of nucleotides. • RNAs • Act to transfer short pieces of information to different parts of cell • Provide templates to synthesize into protein • May be involved in the regulation of gene expression • Made of 4 types of nucleotides • Proteins • Make up the cellular structure • large, complex molecules made up of 20 types of smaller subunits called amino acids. • Form enzymes that send signals to other cells and regulate gene activity • Form body’s major components (e.g., hair, skin, etc.) An Introduction to Bioinformatics Algorithms www.bioalgorithms.info DNA: The Code of Life • The structure and the four genomic letters code for all living organisms • Adenine, Guanine, Thymine, and Cytosine which pair A-T and C-G on complementary strands. An Introduction to Bioinformatics Algorithms www.bioalgorithms.info DNA, continued • DNA has a double helix structure which composed of • sugar molecule • phosphate group • and a base (A,C,G,T) • DNA always reads from 5’ end to 3’ end for transcription replication 5’ ATTTAGGCC 3’ 3’ TAAATCCGG 5’ An Introduction to Bioinformatics Algorithms The Purines www.bioalgorithms.info The Pyrimidines An Introduction to Bioinformatics Algorithms www.bioalgorithms.info DNA, RNA, and the Flow of Information Replication Transcription Translation An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Cell Information: Instruction book of life • DNA, RNA, and Proteins are examples of strings written in either the four-letter nucleotide of DNA and RNA (A C G T/U) • or the twenty-letter amino acid of proteins. Each amino acid is coded by 3 nucleotides called codon (Leu, Arg, Met, etc.) An Introduction to Bioinformatics Algorithms What is genetic material? • Mendel’s experiments • Pea plant experiments • Mutations in DNA • Good, Bad, Silent • Chromosomes • Linked Genes www.bioalgorithms.info An Introduction to Bioinformatics Algorithms www.bioalgorithms.info The Pea Plant Experiments • Mendel discovered that genes were passed on to offspring by both parents in two forms: dominant and recessive. • The dominant form would be the phenotypic characteristic of the offspring An Introduction to Bioinformatics Algorithms www.bioalgorithms.info DNA: The building blocks of genetic material • DNA was later discovered to be the molecule that makes up the inherited genetic material. • DNA provides a code, consisting of 4 letters, for all cellular function. An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Mutation • The DNA can be thought of as a sequence of the nucleotides: C,A,G, or T. • What happens to genes when the DNA sequence is mutated? An Introduction to Bioinformatics Algorithms www.bioalgorithms.info The Good, the Bad, and the Silent • Mutations can serve the organism in three ways: • The Good : A mutation can cause a trait that enhances the organism’s function: Mutation in the sickle cell gene provides resistance to malaria. • The Bad : A mutation can cause a trait that is harmful, sometimes fatal to the organism: Huntington’s disease, a symptom of a gene mutation, is a degenerative disease of the nervous system. • The Silent: A mutation can simply cause no difference in the function of the organism. Campbell, Biology, 5th edition, p. 255 An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Genes are Organized into Chromosomes • What are chromosomes? It is a threadlike structure found in the nucleus of the cell which is made from a long strand of DNA. Different organisms have a different number of chromosomes in their cells. • Human genome has 24 distinct chromosomes. An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Chromosomes Organism Number of base pairs Number of chromosomes --------------------------------------------------------------------------------------------------Prokayotic Escherichia coli (bacterium) 4x106 1 Eukaryotic Saccharomyces cerevisiae(yeast) 1.35x107 Drosophila melanogaster(insect) 1.65x108 Homo sapiens(human) 2.9x109 Zea mays(corn) 5.0x109 17 4 24 10 An Introduction to Bioinformatics Algorithms What Do Genes Do? • Design of life (gene->protein) • protein synthesis • Central dogma of molecular biology www.bioalgorithms.info An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Structure of a Gene (in Eukaryotes) • Regulatory regions: up to 50 kb upstream of +1 site • Exons: protein coding and untranslated regions (UTR) 1 to 178 exons per gene (mean 8.8) 8 bp to 17 kb per exon (mean 145 bp) • Introns: splice acceptor and donor sites, junk DNA average 1 kb – 50 kb per intron • Gene size: Largest – 2.4 Mb (Dystrophin). Mean – 27 kb. An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Proteins: Workhorses of the Cell • 20 different amino acids • different chemical properties cause the protein chains to fold up into specific three-dimensional structures that define their particular functions in the cell. • Proteins do all essential work for the cell • • • • build cellular structures digest nutrients execute metabolic functions Mediate information flow within a cell and among cellular communities. • Proteins work together with other proteins or nucleic acids as "molecular machines" • structures that fit together and function in highly specific, lockand-key ways. An Introduction to Bioinformatics Algorithms www.bioalgorithms.info What carries information between DNA to Proteins? • RNA is similar to DNA chemically. It is usually only a single strand. T(hymine) is replaced by U(racil) • Some forms of RNA can form secondary structures by “pairing up” with itself. This may have impact on its properties. DNA and RNA can pair with each other. tRNA linear and 3D view: http://www.cgl.ucsf.edu/home/glasfeld/tutorial/trna/trna.gif An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Central Dogma Revisited (Eukaryotes) Transcription Splicing Nucleus hnRNA mRNA Spliceosome DNA protein Translation Ribosome in Cytoplasm An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Terminology for Splicing • Exon: A portion of the gene that appears in both the primary and the mature mRNA transcripts. • Intron: A portion of the gene that is transcribed but excised prior to translation. An Introduction to Bioinformatics Algorithms Splicing www.bioalgorithms.info An Introduction to Bioinformatics Algorithms Splicing • Sometimes alternative splicing can create different valid proteins. • A typical Eukaryotic gene has 4-20 introns. Locating them by analytical means is not easy. www.bioalgorithms.info An Introduction to Bioinformatics Algorithms www.bioalgorithms.info RNA Protein: Translation • Ribosomes and transfer-RNAs (tRNA) run along the length of the newly synthesized mRNA, decoding one codon at a time to build a growing chain of amino acids (“peptide”) • The tRNAs have anti-codons, which complementarily match the codons of mRNA to know what amino acids get added next An Introduction to Bioinformatics Algorithms Translation • The process of going from RNA to polypeptide. • Three bases of RNA (called a codon) correspond to one amino acid based on a fixed table. • Always starts with Methionine and ends with a stop codon www.bioalgorithms.info An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Translation, continued • Catalyzed by Ribosome • Using two different sites, the Ribosome continually binds tRNA, joins the amino acids together and moves to the next location along the mRNA • ~10 codons/second, but multiple translations can occur simultaneously http://wong.scripps.edu/PIX/ribosome.jpg An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Proteins • Complex organic molecules made up of amino acid subunits. • 20* different kinds of amino acids. Each has a 1 letter and 3 letter abbreviations. • The protein adopts a 3D structure specific to its amino acid arrangement and function • Proteins are often enzymes that catalyze reactions. • Also called “poly-peptides” *Some other amino acids exist but not in humans. An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Protein Folding • • • • Proteins are not linear structures, though they are built that way. Proteins tend to fold into the lowest free energy conformation. Proteins begin to fold while the peptide is still being translated. The amino acids have very different chemical properties; they interact with each other after the protein is built • This causes the protein to start folding and adopting its functional structure • Proteins may fold in reaction to some ions, and several separate chains of peptides may join together through their hydrophobic and hydrophilic amino acids to form a polymer An Introduction to Bioinformatics Algorithms Protein Folding (cont’d) • The structure that a protein adopts is vital to its chemistry. • Its structure determines which of its amino acids are exposed and carry out the protein’s function. • Its structure also determines what substrates it can react with. www.bioalgorithms.info An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Copying DNA - Polymerase Chain Reaction (PCR) • PCR is used to massively replicate DNA sequences. • How it works: • • • • Separate the two strands with low heat Add some bases, primer sequences, and DNA Polymerase • Creates double stranded DNA from a single strand. • Primer sequences create a seed from which double stranded DNA grows. Now you have two copies. Repeat. Amount of DNA grows exponentially. • 1→2→4→8→16→32→64→128→256… An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Cutting DNA • Restriction Enzymes cut DNA • Only cut at special sequences • DNA contains thousands of these sites. • Applying different restriction enzymes creates fragments of varying size. Restriction Enzyme “A” Cutting Sites Restriction Enzyme “B” Cutting Sites “A” and “B” fragments overlap Restriction Enzyme “A” & Restriction Enzyme “B” Cutting Sites An Introduction to Bioinformatics Algorithms Pasting DNA • Two pieces of DNA can be fused together by adding chemical bonds • Hybridization – complementary base-pairing • Ligation – fixing bonds within single strands www.bioalgorithms.info An Introduction to Bioinformatics Algorithms Cloning DNA • DNA Cloning • Insert the fragment into the genome of a living organism and watch it multiply. • Once you have enough, remove the organism, keep the DNA. • Use Polymerase Chain Reaction (PCR) Vector DNA www.bioalgorithms.info An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Reading (Sequencing) DNA • Electrophoresis • • • • • Reading is done mostly by using this technique. This is based on separation of molecules by their sizes (and in 2D gel by size and charge). DNA or RNA molecules are charged in aqueous solution and move to a definite direction by the action of an electric field. The DNA molecules are either labeled with radioisotopes or tagged with fluorescent dyes. In the latter, a laser beam can trace the dyes and send information to a computer. Given a DNA molecule, it is then possible to obtain all fragments from it that end in either A, or T, or G, or C and these can be sorted in a gel experiment. This (Sanger technique) usually produces reads of lengths between 500 bps and 1000 bps. • Another route to sequencing is direct sequencing using gene chips or NGS technologies, which have much higher throughputs but produce shorter reads (30 bps – 500 bps). An Introduction to Bioinformatics Algorithms www.bioalgorithms.info 10 An Introduction to Bioinformatics Algorithms www.bioalgorithms.info 10 An Introduction to Bioinformatics Algorithms Assembling Genome • Sequence each random fragment and put them back together • Not as easy as it sounds • SCS Problem (Shortest Common Superstring) • Some of the fragments will overlap • Fit overlapping sequences together to get the shortest possible sequence that includes all fragment sequences www.bioalgorithms.info An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Assembling Genome • DNA fragments contain sequencing errors • Two complementary strands of DNA • Need to take into account both directions of DNA • Repeat problem • 50% of human DNA is just repeats • If you have repeating DNA, how do you know where it goes? Hint: Repeats are usually different due to mutations. You could probably figure it out if you know the mutation rates between repeats and sequencing error rates. An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Probing DNA • DNA probes • • • • Oligonucleotide: single-stranded DNA of 20-30 nucleotides long Oligonucleotides are used to find complementary DNA segments. Made by working backwards: AA sequencemRNA cDNA. Made with automated DNA synthesizers and tagged with a radioactive isotope. 60 An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Creating a Hybridization Reaction 1. 2. Hybridization is binding two genetic sequences. The binding occurs because of the hydrogen bonds [pink] between base pairs. When using hybridization, DNA must first be denatured, usually by using heat or chemicals. T C A G T TAGGC T G T C G CT A T ATCCGACAATGACGCC http://www.biology.washington.edu/fingerprint/radi.html 61 An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Creating a Hybridization Reaction Cont. 3. 4. Once DNA has been denatured, a singlestranded radioactive probe [light blue] can be used to see if the denatured DNA contains a sequence complementary to probe. Sequences of varying homology may stick to the DNA even if the fit is not perfect. ACTGC ACTGC ATCCGACAATGACGCC Great Homology ACTGC ATCCGACAATGACGCC Less Homology ACTCC ATCCGACAATGACGCC ACCCC ATCCGACAATGACGCC Low Homology http://www.biology.washington.edu/fingerprint/radi.html 62 An Introduction to Bioinformatics Algorithms www.bioalgorithms.info DNA (Micro) Arrays --Technical Foundation • An array works by exploiting the ability of a given mRNA molecule to hybridize to the DNA template. • Using an array containing many DNA samples (corresponding to different genes) in an experiment, the expression levels of hundreds or thousands genes within a cell is obtained by measuring the amount of mRNA bound to each site on the array. • With the aid of a computer, the amount of mRNA bound to the spots on the microarray is “precisely” measured, generating a profile of gene expression in the cell. • Microarrays suffer from high noise and are being quickly replaced by NGS methods (RNA-Seq). http://www.ncbi.nih.gov/About/primer/microarrays.html 64 An Introduction to Bioinformatics Algorithms www.bioalgorithms.info An experiment on a microarray In this schematic: GREEN represents Control RNA RED represents Case RNA YELLOW represents a combination of Control and Case RNA BLACK represents areas where neither the Control nor Case RNA Each color in an array represents either healthy (control) or diseased (case) tissue. The location and intensity of a color tell us whether the gene, or mutation, is present in the control and/or case RNA. http://www.ncbi.nih.gov/About/primer/microarrays.html An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Sources Cited • • • • • • • • • Daniel Sam, “Greedy Algorithm” presentation. Glenn Tesler, “Genome Rearrangements in Mammalian Evolution: Lessons from Human and Mouse Genomes” presentation. Ernst Mayr, “What evolution is”. Neil C. Jones, Pavel A. Pevzner, “An Introduction to Bioinformatics Algorithms”. Alberts, Bruce, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, Peter Walter. Molecular Biology of the Cell. New York: Garland Science. 2002. Mount, Ellis, Barbara A. List. Milestones in Science & Technology. Phoenix: The Oryx Press. 1994. Voet, Donald, Judith Voet, Charlotte Pratt. Fundamentals of Biochemistry. New Jersey: John Wiley & Sons, Inc. 2002. Campbell, Neil. Biology, Third Edition. The Benjamin/Cummings Publishing Company, Inc., 1993. Snustad, Peter and Simmons, Michael. Principles of Genetics. John Wiley & Sons, Inc, 2003.