* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Chapter 2 - rci.rutgers.edu
Public health genomics wikipedia , lookup
Oncogenomics wikipedia , lookup
DNA polymerase wikipedia , lookup
Mitochondrial DNA wikipedia , lookup
Epigenetics of neurodegenerative diseases wikipedia , lookup
Metagenomics wikipedia , lookup
Transposable element wikipedia , lookup
DNA damage theory of aging wikipedia , lookup
Gel electrophoresis of nucleic acids wikipedia , lookup
Zinc finger nuclease wikipedia , lookup
United Kingdom National DNA Database wikipedia , lookup
Bisulfite sequencing wikipedia , lookup
Minimal genome wikipedia , lookup
Genealogical DNA test wikipedia , lookup
SNP genotyping wikipedia , lookup
Epigenetics of human development wikipedia , lookup
Gene expression profiling wikipedia , lookup
Cancer epigenetics wikipedia , lookup
DNA vaccination wikipedia , lookup
Genome (book) wikipedia , lookup
Genetic engineering wikipedia , lookup
Genomic library wikipedia , lookup
Human genome wikipedia , lookup
Epigenomics wikipedia , lookup
Molecular cloning wikipedia , lookup
DNA supercoil wikipedia , lookup
Nutriepigenomics wikipedia , lookup
Nucleic acid double helix wikipedia , lookup
Cell-free fetal DNA wikipedia , lookup
No-SCAR (Scarless Cas9 Assisted Recombineering) Genome Editing wikipedia , lookup
Genome evolution wikipedia , lookup
Microsatellite wikipedia , lookup
Extrachromosomal DNA wikipedia , lookup
Nucleic acid analogue wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
Cre-Lox recombination wikipedia , lookup
Primary transcript wikipedia , lookup
Designer baby wikipedia , lookup
Vectors in gene therapy wikipedia , lookup
Non-coding DNA wikipedia , lookup
Point mutation wikipedia , lookup
Deoxyribozyme wikipedia , lookup
Genome editing wikipedia , lookup
Microevolution wikipedia , lookup
History of genetic engineering wikipedia , lookup
Therapeutic gene modulation wikipedia , lookup
Genomics basics Genes - In old times: Hereditary mechanism that carried information from parent to child. - Family members tend to exhibit similar traits: skin color, diabetes,…. - Monarch butterflies take 4 generations to complete a migration cycle. - Not a perfect copying process. (eye color). - Genes: discrete hereditary units: provide a clear and unambiguous set of instructions for producing some property of the organism. Genome - The complete set of genes in an organism, master blueprint. - Contains all the hereditary instructions for building, operating, and maintaining the organism, and for passing life in like form on to the next generation of that organism. - Twentieth century: innovative research work and path-breaking discoveries over the first half of the twentieth century gave genes a chemical (molecular) existence and culminated in the pivotal realization that genes are made of deoxyribonucleic acid (DNA, for short). DNA (Deoxyribonucleic Acid) - Two long strands wound tightly around each other in a spiral structure. - double helix: A twisted ladder: sides are made of sugar and phosphate : rungs are made of bases. - Nucleotides: Each strand of the DNA molecule is a linear arrangement of repeating similar units. - Nucleotide = a sugar (deoxyribose), a phosphate group, and a nitrogenous base. - Bases : Adenine, Thymine, Guanine, Cytosine (A, T, G, C). - Watson-Crick base pairing rules: The bases on one strand are paired with the bases on the other strand according to the Complementary base pairing rules - Adenine pairs with Thymine, Guanine pairs Cytosine. - Base pairs: they form the coplanar rungs of the ladder. Weak hydrogen bond. - DNA is chemically inert and is a stable carrier of genetic information. DNA replication: - Complementary sequencing => Easy to self-replicate. Information passed on from generation to generation. (i) DNA molecule unwinds, allowing the strands to separate. (ii) Each strand directs the synthesis of a brand new complementary strand, (iii) Two descendant DNA molecules: each new strand is an exact copy of the old one. DNA sequence: Mechanism that specifies the exact genetic instructions required to create the traits of a particular organism. Genes: Specific contiguous subsequence of the DNA sequence. A gene A-T-G-C sequence is the code required for constructing a protein. Proteins: Giant complex molecules made of chains of amino acids: - The building blocks and the workhorses of life - Proteins also regulate most of life’s day-to-day functions. - DNA replication process is mediated by enzymes, proteins whose job is to catalyze biochemical reactions. Gene expression - Cells are the fundamental units of all living organisms, both structurally and functionally. - Nucleus, It contains the organism’s DNA. Enclosed by the nuclear membrane, -Cytoplasm. plasma membrane encloses the cell. - Inside cytoplasm: Molecules that act as channels and pumps controlling in/out of the cell. -Gene expression: The set of protein-coding instructions in the DNA sequence of a gene resembles a computer program. (compiled and executed in order for anything to happen). - A gene must be expressed in order for anything to happen. - A gene expresses by transferring its coded information into proteins that dwell in the cytoplasm, a process called gene expression. - A central dogma of molecular biology: DNA → mRNA transcription → protein translation Messenger ribonucleic acid (mRNA) resembles a single strand of DNA. - In mRNA uracil (U) replaces thymine. - When a gene is expressed, the DNA double helix splits open along its length. - Transcription: One strand of the open helix remains inactive, while the other strand acts as a template against which a complementary strand of mRNA forms. - The mRNA strand then separates from the DNA strand and transports out of the nucleus, across the nuclear membrane, and into the cellular cytoplasm. - There it serves as the template for protein synthesis, with consecutive (nonoverlapping) triplets of bases (called codons) acting as a code to specify the particular amino acids that make up an individual protein. - Translation: The sequence of bases along the mRNA is thus converted into a string of amino acids that constitutes the protein molecule for which it codes. - Each possible triplet of mRNA bases codes for a specific amino acid, one of the twenty amino acids that make up proteins. - GCC codes for alanine, CAC for histidine, AUC for isoleucine, GAG for glutamic acid - There are 43=64 possible triplets, but only twenty possible amino acids. - This means that there is room for redundancy: for example, GCU, GCC, GCA, and GCG, all code for alanine → Safeguard against small errors. - Start codon, AUG, is the triplet of mRNA bases that signals the initiation of a sequence that is to be translated. - Stop codon is a triplet of mRNA bases, either UGA, UAG, or UAA, - Open reading frame (ORF, for short). All sequence information of coding interest lies in ORFs (but not every ORF codes for a gene). - Expressed sequence tags (ESTs) a unique short subsequence (only a few hundred base pairs long), generated from the DNA sequence of a gene, that acts as a “tag” or “marker” for the gene. - Not all cells express the same genes, which is why different cells do different things. - Within the same cell, different genes will be expressed at different times, at different levels, in response to different stimuli. - Few exceptions: Housekeeping genes, maintain basic cell functions. Hybridization assays - Hybrid DNA: Single-stranded DNA molecules of like sequence have a tendency to bind together two DNA strands or one DNA strand and one mRNA strand. - Hybridization assays. A probe consisting of a homogenous sample of singlestranded DNA molecules, whose sequence is known, is prepared and labeled with a reporter chemical, usually a radioactive or fluorescent substance. An immobilized target, usually a heterogeneous mixture of single-stranded DNA molecules of unknown composition is challenged by the probe. As the probe will hybridize only to sequences complementary to its sequence, DNA sequences in the target that are complementary to the probe DNA sequence can be identified by the presence of reporter molecules. Blotting techniques. - Southern blotting: the target DNA is separated by electrophoresis (see below) and transferred onto a filter, where it is exposed to the probe. - Northern blotting: is a variant in which the target is mRNA instead of DNA. As mRNA is the intermediary molecule in gene expression, Northern blotting provides a means of studying the expression patterns of specific genes. DNA microarrays can be regarded as a massively parallel version of Northern blotting. - In situ hybridization: denatured DNA (DNA in which the two strands are unwound and separated) is kept in place in the cell and is then challenged with mRNA or DNA extracted from another source and labeled with a reporter chemical, usually a fluorescent substance. By retaining the DNA in the cell, the specific chromosome containing the DNA sequence of interest can be identified by observing, under a microscope, the location of the fluorescence. Other relevant assays - Electrophoresis: use an electric field to separate large molecules, such as DNA and proteins, from a mixture of similar molecules. - Cloning: produce multiple, exact copies of a single gene or other segment of DNA to obtain enough material for further study. clone libraries. - Polymerase chain reaction (PCR) procedure for amplifying virtually any fragment of DNA. (i) Denaturing: Two strands of DNA are unwound and separated by heating (ii) Annealing: primers - short strands of single-stranded DNA that match the sequences at either end of the target DNA, are bound to their complementary bases on the now single-stranded DNA. (iii) Polymerase: an enzyme whose job is to copy genetic material. Starting from the primer, the polymerase reads a template strand and matches it with free complementary bases. This produces two descendant DNA strands. - Cycling through these three steps generates many copies of the target DNA. - The amount of DNA grows exponentially as it doubles with every cycle. - Reverse transcription: is a procedure for reversing, in a laboratory, the process of transcription. It is accomplished by isolating mRNA and using it as a template to synthesize a complementary DNA (cDNA, for short) strand, so-called because its sequence is complementary to the original mRNA sequence. This process utilizes the enzyme reverse transcriptase. The resultant single-stranded cDNA molecule is considerably shorter than the parent DNA sequence, as it will have only its coding exon sequences; the noncoding intron sequences would have been excised during the formation of the original mRNA. - The process of translation cannot be reversed. - Reverse transcriptase polymerase chain reaction (RT-PCR) The cDNA generated by reverse transcription is amplified by PCR. - RT-PCR can be used to detect and quantify target mRNA sequences, and thereby to provide information regarding gene expression. The Human Genome - DNA in the human genome is made up of roughly three billion base pairs. - 46 molecules, threadlike cellular structure called a chromosome. - Chromosomes come in pairs (except sex): father + mother - Chromosomes range in length from about 50 million bp to 250 million bp. Each chromosome contains many genes. In total, the human genome is estimated to contain somewhere around 35,000-40,000 genes. - Genes vary widely in length, from a few hundred bp to several thousand bp. - Only a tiny percentage of human DNA includes exons. - Introns: sequences that have no coding function - In between genes are other noncoding regions whose functions remain largely obscure. - Every single human being has almost the exact same genome. (99.9% identical) - Allele: Each alternate version of a gene Genes vary slightly from person to person. Every person carries two alleles of each gene. - Homozygous(person): When both alleles are the same for that gene. - Heterozygous. Only one of the alleles (dominant allele) may be expressed. Example: Blood type. A, B, AB or O. The ABO gene controls the blood group has three alleles, which are designated as A, B and O. Alleles A and B, which code for proteins A and B respectively, are co-dominant. A = AA or AO B = BB or BO AB = A and B O = O and O. Genome variations and their consequences - mutations and polymorphisms: alterations in a DNA sequence - substitution: one base being replaced by another - deletion: - insertion: - inversion: small subsequence of bases being removed and then reinserted in the opposite direction - translocation: a small subsequence of bases being removed and then reinserted in a different place. - Genome variations may be inherited or acquired. - An inherited variation could be passed on to the next generation of that organism. - Acquired variations are (i) Mutations that occur spontaneously during DNA replication caused by an external environmental factor such as exposure to a toxic substance. (ii) Hereditary gene mutation: Mutation that affects a reproductive cell. Present in less than 1 percent of people. (iii) Polymorphism: present in at least 1% Pop. - wild-type allele: common and properly functioning version of a gene - mutant allele: version with a mutation Robustness of the genetic code - Many genome variations do not produce any noticeable effects: (i) outside the genome’s coding regions. (ii) redundancies in the genetic code (iii) cell repairing mechanisms for certain types of damaged DNA. - A small percentage of genome variations do produce noticeable effects, some deleterious, some beneficial. This is the genetic basis of biological diversity and the evolutionary process. - Polymorphisms that produce noticeable effects → natural selection process -Not so simple: People with blood type O: peptic ulcers and cholera yet the trait did not die out Mutations in the p53 gene, which, in its wild-type, codes for a protein that suppresses abnormal cell proliferation, Mutations in the p53 gene → associated with cancer. Single nucleotide polymorphisms (SNPs) - Example: Blood type: difference between A and O: O = A with G base deleted - Example: The apolipoprotein E gene (ApoE, for short) and Alzheimer’s disease. - ApoE has three alleles (called E2, E3, E4), differs from any other by a SNP(2). - Have E4 allele => a greater risk of Alzheimer’s - Have E2 allele => lesser risk of developing Alzheimer’s. - SNPs are not randomly scattered along a chromosome. - Occur in groups, called haplotypes that inherit together. Genotype-phenotype relationships - Association between ApoE and Alzheimer’s disease. - Genotype refers to the genetic makeup of an individual. - Phenotype refers to the outward characteristics of the individual. Genomics - Structural genomics. Establish genome sequences for organisms, particularly humans. - Functional genomics, which, as its name implies, is the study of the functions of genes. - Genes do not function in isolation. Instead, genes (and proteins) operate collectively in pathways, as coordinated sequences of genetic and molecular activities. - Pathways underlie all cellular processes. Constant interplay between genes, proteins and external factors - Microarrays: simultaneous study of large numbers of molecular entities. Snapshot - Web page with biological pathway examples: http://emp.mcs.anl.gov/ - Questions in functional genomics : Which genes are expressed in which tissues? How is the expression of a gene affected by extracellular influences? Which genes are expressed during the development of an organism? How does gene expression change during development and differentiation? What is the effect of misregulated expression of a gene? What patterns of gene expression cause a disease or lead to disease progression? What patterns of gene expression influence response to treatment? The role of genomics in medical research and pharmaceutical research Genotype-phenotype relationships Risk of diseases How it affect drug response Hope for a more personalized medicine. Proteins: Bioinformatics activities. Creation and maintenance of databases. GenBank (a database that contains public DNA and protein sequence data), SWISS-PROT (a protein sequence database) PDB (a database of three dimensional biological macromolecular structure data). Analysis of sequence information. specialized tools (e.g., BLAST) search, view and analyze the data in these databases. Prediction of three-dimensional structure. Expression analysis. using statistical and data mining tools Modeling dynamic life processes. develop ways of putting together the information gathered from all the diverse areas of research in order to understand fundamental life processes. Supplementary reading Clark, D. and L. Russell (1997). Molecular Biology Made Simple and Fun, Vienna, IL: Cache River Press. Alberts, B., D. Bray, J. Lewis, M. Raff, K. Roberts and J. Watson (1994). Molecular Biology of the Cell, New York, NY: Addison-Wesley. Strachan, T. and A.P. Read (1999). Human Molecular Genetics, second edition, New York, NY: John Wiley. Vingron, M. (2001). Bioinformatics needs to adopt statistical thinking. Bioinformatics, 17, 389-390. Gonick, L. and M. Wheelis (1991). A Cartoon Guide to Genetics. New York, NY: Harper Collins.