Nucleic Acids
DNA and RNA are nucleic acids: long, thread-like polymers made up of a linear array of monomers called nucleotides.
All nucleotides contain three components:
1. A nitrogenous heterocyclic base
2. A pentose sugar
3. A phosphate residue

Chemical Structure of DNA vs RNA
Ribonucleotides have a 2’-OH.
Deoxyribonucleotides have a 2’-H.
Bases are classified as pyrimidines or purines.
[Figure: Structure of nucleotide bases]

The nucleus contains the cell’s DNA (genome).
RNA is synthesized in the nucleus and exported to the cytoplasm.
[Figure: nucleus and cytoplasm; DNA is copied by replication, transcribed into RNA (mRNA), and the mRNA is translated into proteins]

Deoxyribonucleotides found in DNA: dA, dG, dT, dC
Nucleotides are linked by phosphodiester bonds.
DNA is double stranded.
Bases form a specific hydrogen-bond pattern.

Properties of a DNA double helix
• The strands of DNA are antiparallel.
• The strands are complementary.
• Hydrogen bonds hold the paired bases together.
• Base-stacking interactions stabilize the helix.
• There are about 10 base pairs per turn.
[Figure: DNA is a double helix]

Transcription of a DNA molecule results in an mRNA molecule that is single-stranded.
[Figure: RNase P M1 RNA]
RNA molecules do not have a regular structure like DNA. Structures of RNA molecules are complex and unique, with features such as hairpins, bulges, and internal loops.
RNA molecules can base pair with complementary DNA or RNA sequences: G pairs with C, A pairs with U, and G pairs with U.

Nucleic Acids in Acid and Base
The glycosidic bond of DNA and RNA is hydrolyzed by acids.
Order of stability: dA, dG < rA, rG < dC, dT < rC, rU
• dA, dG: hydrolyzed in boiling 0.1 M hydrochloric acid in 30 min
• rA, rG: hydrolyzed in boiling 1 M hydrochloric acid in 60 min
• rC, rU: hydrolyzed in boiling 12 M perchloric acid in 60 min
DNA is quite stable under basic conditions. RNA is readily hydrolyzed under alkaline (basic) conditions.

Methylation of Nucleotide Bases
Certain nucleotide bases in DNA molecules are methylated in reactions catalyzed by enzymes.
Adenine and cytosine are methylated more often than guanine and thymine.
Methylation is confined to specific regions of DNA and aids in biological processes.
E. coli methylates its DNA to distinguish it from the DNA of foreign invaders.
In eukaryotic cells about 5% of cytidines are methylated, producing 5-methylcytidine.

Spontaneous Alterations in Nucleic Acids
In a human cell, DNA undergoes spontaneous alterations in structure (mutations).
As a cell ages, the number of mutations increases, making it more likely that the cell’s normal processes will be altered.
There is a link between spontaneous mutation, aging, and carcinogenesis.
[Figure: Depurination]

Why does DNA contain thymine and not uracil?
Hypothesis: Cytosine can spontaneously deaminate to uracil. If DNA normally contained uracil, the uracils would be base-paired with adenine during replication, and deaminated cytosines would also be base-paired with adenine, with no way to tell the two apart. This would decrease the number of G-C base pairs over time and increase the number of A-U base pairs. Eventually all the G-C base pairs could be lost, and the genetic code would not exist as we know it.

Ultraviolet light is damaging to DNA
Ultraviolet (UV) radiation (wavelengths of 200 to 400 nm) is a significant portion of the solar spectrum.
Upon exposure to UV radiation, two adjacent pyrimidine bases can dimerize. This happens most often between two adjacent thymines. Two products often form:
• cyclobutane thymine dimer
• 6-4 photoproduct

Nucleic Acids
Where are they found in nature? What do they look like?
Genomes
Source of DNA        Size (nucleotides or bp)   Type
Escherichia coli     4,639,221                  Closed-circular double-stranded DNA
Bacillus subtilis    4,200,000                  Closed-circular double-stranded DNA
F plasmid            95,000                     Closed-circular double-stranded DNA
λ phage              48,500                     Linear double-stranded DNA
T7 phage             40,000                     Linear double-stranded DNA
M13 phage            6,400                      Closed-circular single-stranded DNA
MS2 phage            3,600                      Linear single-stranded RNA
Human (diploid)      6,000,000,000              Linear double-stranded DNA
Fruit fly            270,000,000                Linear double-stranded DNA
HIV                  9,700                      Linear single-stranded RNA

DNA molecules are packaged in the cell as structures called chromosomes.
Bacteria have a single chromosome. Eukaryotes have multiple chromosomes.
A single chromosome can contain thousands of genes, many of which encode proteins.
All of an organism’s chromosomes make up the genome.
Humans have 46 chromosomes. The (haploid) human genome has about 3 billion base pairs.

The Human Genome
http://www.ncbi.nlm.nih.gov/genome/guide/human/

How is DNA packaged into a cell?
E. coli has a single double-stranded DNA molecule as its genome. There are 4,639,221 base pairs in the E. coli genome. The DNA is 1.7 mm long, 850 times the length of an E. coli cell.
[Figure: plasmid]
Large DNA molecules are compacted in a cell by supercoiling.
[Figure: relaxed vs supercoiled DNA]
DNA in eukaryotic cells is packaged into nucleosomes, which contain proteins called histones.
[Figure: DNA wrapped around a histone core (side view)]
Nucleosomes are packaged to form 30 nm fibers.
Compaction of 30 nm fibers uses nuclear scaffolds.

In eukaryotes, genes contain exons (coding regions) and introns (non-coding regions). Prokaryotic genes do not contain introns.

Telomeres
Telomeres are sequences at the ends of eukaryotic chromosomes that help stabilize the chromosome.
Telomeres are repeats of the following sequence:
5’-(TxGy)n
3’-(AxCy)n
where x and y are each 1 to 4 and n is the number of repeats.
The TG strand is longer:
5’-TTTGGTTTGGTTTGGTTTGGTTTGGTTTGG…
3’-AAACCAAACCAAACC…
Telomeres can be >10,000 nucleotides long in mammals.
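The telomere repeat rule above, (TxGy)n with x and y each between 1 and 4, can be written as a regular expression. This is only an illustrative sketch: it assumes the TG-rich strand is supplied 5’ to 3’ as plain letters, and the function name `is_telomeric` is hypothetical.

```python
import re

# (TxGy)n: one or more units of 1-4 T's followed by 1-4 G's.
# fullmatch() requires the entire strand to consist of such units.
TELOMERE_PATTERN = re.compile(r"(?:T{1,4}G{1,4})+")


def is_telomeric(strand: str) -> bool:
    """Return True if a 5'->3' TG-rich strand is built only from TxGy units."""
    return TELOMERE_PATTERN.fullmatch(strand.upper()) is not None


print(is_telomeric("TTTGGTTTGGTTTGGTTTGGTTTGGTTTGG"))  # True
print(is_telomeric("AAACCAAACCAAACC"))  # False: the complementary AC strand
```

Because the pattern is anchored by `fullmatch`, a single base that breaks the TxGy rule anywhere in the strand causes the check to fail.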
The ends of the chromosome are replicated by the enzyme telomerase.

Telomeres and aging
There appears to be a relationship between the length of the telomeres at the ends of chromosomes and the age of an individual: the older you are, the shorter your telomeres.
Germ-line cells (reproductive cells) contain telomerase activity.
Most non-germ-line cells (somatic cells) have little or no telomerase activity.
We are born with a certain telomere length. As we age, our telomeres get shorter.
Is our life span predetermined by the length of our telomeres?

Internet Resources: Nucleic Acids
National Center for Biotechnology Information (NCBI)
National Library of Medicine (NLM)
National Institutes of Health (NIH)
http://www.ncbi.nlm.nih.gov/

GenBank
GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences (Nucleic Acids Research, 2011 Jan;39(Database issue):D32-7). As of April 2011, there were approximately 126,551,501,141 bases in 135,440,924 sequence records in the traditional GenBank divisions and 191,401,393,188 bases in 62,715,288 sequence records in the WGS division.

BLAST Search
What is BLAST?
BLAST® (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA. The scores assigned in a BLAST search have a well-defined statistical interpretation, making real matches easier to distinguish from random background hits. BLAST uses an algorithm that seeks local rather than global alignments and is therefore able to detect relationships among sequences that share only isolated regions of similarity.
The core of NCBI's BLAST services is BLAST 2.0, also known as "Gapped BLAST". This service is designed to take protein and nucleic acid sequences and compare them against a selection of NCBI databases.
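BLAST's central idea, scoring local rather than global alignments, can be illustrated with the exact Smith-Waterman local-alignment recurrence. This is not BLAST's own algorithm: BLAST uses a faster word-seeding heuristic to approximate such alignments. The scoring values here (match +2, mismatch -1, gap -2) are arbitrary assumptions chosen for illustration, and the function name is hypothetical.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score between sequences a and b.

    The scores are illustrative assumptions, not BLAST's scoring scheme.
    """
    # prev[j] holds the best score of an alignment ending at a[i-2], b[j-1].
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap          # gap inserted in b
            left = curr[j - 1] + gap    # gap inserted in a
            # Clamping at 0 is what makes the alignment local:
            # a poorly matching prefix is discarded instead of penalized.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


# A shared 6-base island (GAATTC) scores 12 even though the flanks differ.
print(local_alignment_score("TTTTGAATTCTTTT", "CCCGAATTCCCC"))
```

A global alignment of the same two strings would be dragged down by the unrelated T and C flanks; the local score reports only the shared island.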
Because it emphasizes regions of local alignment rather than the global alignments commonly seen in multiple sequence alignment programs, BLAST is more than a tool to view sequences aligned with each other or to calculate percent homology: it is a program to locate regions of sequence similarity with a view to comparing structure and function.

The BLAST search pages allow you to select from several different programs, listed below.
Program    Description
blastp     Compares an amino acid query sequence against a protein sequence database.
blastn     Compares a nucleotide query sequence against a nucleotide sequence database.
blastx     Compares a nucleotide query sequence translated in all reading frames against a protein sequence database. You could use this option to find potential translation products of an unknown nucleotide sequence.
tblastn    Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
tblastx    Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database. Note that tblastx cannot be used with the nr database on the BLAST web page because it is computationally intensive.

Nucleotide Databases
Database          Description
nr                All non-redundant GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or HTGS sequences).
month             All new or revised GenBank+EMBL+DDBJ+PDB sequences released in the last 30 days.
dbest             Non-redundant database of GenBank+EMBL+DDBJ EST divisions.
dbsts             Non-redundant database of GenBank+EMBL+DDBJ STS divisions.
mouse ests        Non-redundant database of GenBank+EMBL+DDBJ EST divisions, limited to the organism mouse.
human ests        Non-redundant database of GenBank+EMBL+DDBJ EST divisions, limited to the organism human.
other ests        Non-redundant database of GenBank+EMBL+DDBJ EST divisions, all organisms except mouse and human.
yeast             Yeast (Saccharomyces cerevisiae) genomic nucleotide sequences. Not a collection of all yeast nucleotide sequences, but the sequence fragments from the complete yeast genome.
E. coli           E. coli (Escherichia coli) genomic nucleotide sequences.
pdb               Sequences derived from the 3-dimensional structures of proteins.
kabat [kabatnuc]  Kabat's database of sequences of immunological interest.
patents           Nucleotide sequences derived from the Patent division of GenBank.
vector            Vector subset of GenBank(R), NCBI.
mito              Database of mitochondrial sequences.
alu               Select Alu repeats from REPBASE, suitable for masking Alu repeats from query sequences. It is available at
epd               Eukaryotic Promoter Database, ISREC in Epalinges s/Lausanne (Switzerland).
gss               Genome Survey Sequence; includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences.
htgs              High Throughput Genomic Sequences.

Nucleic Acid Sequence
What does it encode?
CGTGATGAACGGCTTCGAGCGATACGAGGGAGTGCGTCACTGCCGCTATGTGGACGAGTTGCAG
ATCGTCCAGAATGCGCCATGGACTCTGTCCGATGAATTCATCGCCGACAACAAAATCGACTTTGT
GGCCCACGACGACATTCCGTATGTAACCGATGGCATGGACGACATCTATGCTCCTCTCAAGGCGC
GCGGCATGTTTGTGGCCACGGAGCGCACTGAGGGTGTGTCCACCTCGGACATCGTAGCCCGGAT
CGTCAAGGATTACGATCTGTATGTGCGTCGTAATCTGGCCAGAGGCTATTCGGCCAAGGAACTCA
ATGTGTCGTTCCTGTCCGAGAAGAAGTTCCGGCTGCAGAACAA

Problem
• As an employee of the Environmental Protection Agency (EPA), you are charged with maintaining safe public swimming in lakes.
• In a sample isolated from a lake used for swimming and boating, you discover the following nucleic acid molecule, which you believe is part of a larger gene sequence. You suspect the organism from which the gene came may be harmful to the public.
• 5’-CATCCAGGGAATCACCAAGCCCGCCATTCGCCGTCTGGCTCGCCG-3’
• Determine whether you should shut down public access to the lake or whether the lake is safe.
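blastx, described above, translates a nucleotide query in all six reading frames before comparing it with protein sequences. That translation step can be sketched as follows, assuming the standard genetic code and a query containing only A, C, G, and T. The helper names are hypothetical, and the code performs no database search. Running it on the Problem sequence lists the candidate peptides a protein search would start from.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; '*' marks a stop, 'X' an unrecognized codon."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict[str, str]:
    """Translations of frames +1..+3 (given strand) and -1..-3 (other strand)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


query = "CATCCAGGGAATCACCAAGCCCGCCATTCGCCGTCTGGCTCGCCG"  # Problem sequence
for frame, peptide in six_frames(query).items():
    print(frame, peptide)
```

Frames that are interrupted by stop codons (`*`) are less likely to be part of a real coding region, which is one way to narrow down which translation to search with.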
Problem #2
• In the middle of the swimming season you re-test the lake to make sure it is safe for human use.
• In a sample isolated from the lake you discover the following nucleic acid molecule, which you again believe is part of a larger gene sequence. You wonder whether the organism from which the gene came may be harmful to the public.
• 5’-GTCGAAGCGCCACTCGAAGGAGAAGGACACGCTCGGGGGCATCAC-3’
• As before, determine whether you should shut down public access to the lake or whether the lake is safe.

Regulatory Proteins
DNA sequences recognized by regulatory proteins are often inverted repeats of a short DNA sequence. These repeats form a palindrome with two-fold symmetry about a central axis. DNA-binding proteins are often dimeric, with two identical protein subunits. Each subunit binds to one half of the palindrome.
5’-TACGGTACTGTGCTCGAGCACTGCTGTACT-3’
3’-ATGCCATGACACGAGCTCGTGACGACATGA-5’
[Figure: central axis of two-fold symmetry]

Protein-DNA interaction
Proteins often bind to specific sequences of DNA.
Example: the restriction enzyme EcoRI binds to the DNA sequence
5’-GAATTC-3’
3’-CTTAAG-5’

Restriction Fragment Length Polymorphism (RFLP)
A variation in the sizes of DNA fragments seen after cutting with restriction enzymes.
Restriction enzymes cut DNA at specific sites. For example, the EcoRI restriction enzyme cuts DNA wherever it finds the sequence GAATTC:
DNA before cutting by EcoRI:
5’-AATCTAGGGAATTCACAGCGATGCGAATTCGCAATTA-3’
3’-TTAGATCCCTTAAGTGTCGCTACGCTTAAGCGTTAAT-5’
DNA after cutting by EcoRI:
5’-AATCTAGGG AATTCACAGCGATGCG AATTCGCAATTA-3’
3’-TTAGATCCCTTAA GTGTCGCTACGCTTAA GCGTTAAT-5’
In this example, EcoRI has cut the 37-base-pair DNA molecule into three smaller fragments. If another person has slightly different DNA, EcoRI may cut the DNA into pieces of different lengths. (For example, if the second GAATTC is GAATTT, EcoRI will cut this other person's DNA in only one place, producing two smaller fragments.)
The words "fragment length polymorphism" mean "DNA pieces of different lengths."
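The two ideas above, palindromic recognition sites and EcoRI cutting, can be sketched in a few lines. This is a simplified illustration that assumes uppercase A/C/G/T input and models only the top strand; the function names are hypothetical. It reproduces the fragment lengths of the EcoRI example and of the single-cut variant.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """A site has two-fold symmetry if it equals its own reverse complement."""
    return site == reverse_complement(site)


def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut the top strand `cut_offset` bases into every occurrence of `site`.

    EcoRI cuts G^AATTC, so the default offset is 1. Only the top strand is
    returned; the bottom strand is cut at the symmetric position, which
    leaves 4-base 5' overhangs (AATT).
    """
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


person_1 = "AATCTAGGGAATTCACAGCGATGCGAATTCGCAATTA"
person_2 = "AATCTAGGGAATTCACAGCGATGCGAATTTGCAATTA"  # second site is now GAATTT

print(is_palindromic("GAATTC"))            # True
print([len(f) for f in digest(person_1)])  # [9, 16, 12]
print([len(f) for f in digest(person_2)])  # [9, 28]
```

The different fragment-length lists for the two sequences are exactly the "fragment length polymorphism" that a gel would reveal as bands at different positions.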
RFLPs are a quick way to compare two pieces of DNA without having to determine the entire DNA sequence.
[Figure: IS6110 fingerprints of M. tuberculosis]

DNA Profiling
Each person has a unique set of fingerprints. Similarly, no two individuals (other than identical twins) share the same genetic makeup. This genetic makeup, the hereditary blueprint imparted to us by our parents, is stored in the chemical deoxyribonucleic acid (DNA), the basic molecule of life. Examination of DNA from individuals other than identical twins has shown that variations exist and that a specific DNA pattern, or profile, can be associated with an individual. These DNA profiles have revolutionized criminal investigations and have become powerful tools in the identification of individuals in criminal and paternity cases.
The first widespread use of DNA tests involved RFLP (restriction fragment length polymorphism) analysis, a test designed to detect variations in the DNA of different individuals. In the RFLP method, DNA is isolated from a biological specimen (e.g., blood, semen, vaginal swabs) and cut by an enzyme into restriction fragments. The DNA fragments are separated by size into discrete bands in a gel (gel electrophoresis), transferred onto a membrane, and identified using probes (known DNA sequences that are "tagged" with a chemical tracer). The resulting DNA profile is visualized by exposing the membrane to a piece of x-ray film, which allows the scientist to determine which specific fragments the probe identified among the thousands in a sample of human DNA. A "match" is made when similar DNA profiles are observed in an evidentiary sample and in the suspect’s DNA. A determination is then made of the probability that a person selected at random from a given population would match the evidence sample as well as the suspect does. The entire analysis may require 6 to 10 weeks to complete.
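The band comparison behind a "match" can be sketched as follows. The band sizes and the 2% sizing tolerance are invented for illustration, and the function names are hypothetical; real forensic laboratories use validated matching rules and population statistics that this sketch does not attempt to model.

```python
def bands_match(a: float, b: float, tolerance: float = 0.02) -> bool:
    """Two band sizes (bp) match if they differ by at most `tolerance`
    times their mean (an assumed 2% window for gel sizing error)."""
    return abs(a - b) <= tolerance * (a + b) / 2


def profiles_match(evidence: list[float], suspect: list[float],
                   tolerance: float = 0.02) -> bool:
    """True if every band in each profile has a matching band in the other."""
    def covered(xs: list[float], ys: list[float]) -> bool:
        return all(any(bands_match(x, y, tolerance) for y in ys) for x in xs)
    return covered(evidence, suspect) and covered(suspect, evidence)


# Hypothetical band sizes (bp) detected by one probe.
evidence = [1520.0, 2980.0]
suspect_1 = [1500.0, 3010.0]
suspect_2 = [1500.0, 4100.0]
print(profiles_match(evidence, suspect_1))  # True
print(profiles_match(evidence, suspect_2))  # False
```

The check runs in both directions so that an extra band in either profile prevents a match.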
Restriction fragment length polymorphism (RFLP): a technique, also known as DNA fingerprinting, that allows familial relationships to be established by comparing the characteristic polymorphic patterns obtained when certain regions of genomic DNA are amplified (typically by PCR) and cut with certain restriction enzymes. In principle, an individual can be identified unambiguously by RFLP (hence the use of RFLP in forensic analysis of blood, hair, or semen). Similarly, if a polymorphism can be identified close to the locus of a genetic defect, it provides a valuable marker for tracing the inheritance of the defect.
The matching process for identifying DNA profile patterns that either "exclude" or "include" a person as the parent of a child is shown in the figure below. In this instance man 1 is excluded from paternity and man 2 is included as a possible father of the child.
[Figure: Parentage testing]

RNA and DNA Viruses
Viruses
• Disease-causing agents that can multiply only in cells.
• Viruses are DNA or RNA enclosed by a protective coat that enables them to move from one cell to another.
• Virus-infected cells often break open (lyse), allowing viruses access to nearby cells.
• A protein shell (capsid) surrounds the nucleic acid of most viruses. In many viruses the protein capsid is further enclosed by a lipid bilayer membrane that contains proteins.

Viral capsid
The capsids of some viruses, all shown at the same scale: (A) tomato bushy stunt virus; (B) poliovirus; (C) simian virus 40 (SV40); (D) satellite tobacco necrosis virus.
[Figure: Acquisition of a viral envelope]

The Coats of Viruses
• Bacteriophage T4, a large DNA-containing virus that infects E. coli.
• Potato virus X, a filamentous plant virus that contains an RNA genome.
• Adenovirus, a DNA-containing virus that can infect human cells. The protein capsid forms the outer surface of this virus.
• Influenza virus, a large RNA-containing animal virus whose protein capsid is enclosed in a lipid envelope with protruding spikes of viral glycoprotein.

Several types of viral genomes
The smallest viruses contain only a few genes and can have an RNA or a DNA genome; the largest viruses contain hundreds of genes and have a double-stranded DNA genome.

T4 bacteriophage chromosome
This schematic shows the positions of the more than 30 genes involved in T4 DNA replication. The genome of bacteriophage T4 consists of 169,000 nucleotide pairs and encodes about 300 different proteins.

The life cycle of the Semliki Forest virus
The virus parasitizes the host cell for most of its biosyntheses.

The life cycle of a retrovirus
• The retrovirus genome consists of an RNA molecule of about 8500 nucleotides; two such molecules are packaged into each viral particle.
• The enzyme reverse transcriptase first makes a DNA copy of the viral RNA molecule and then a second DNA strand, generating a double-stranded DNA copy of the RNA genome.
• The integration of this DNA double helix into the host chromosome, catalyzed by the viral enzyme integrase, is required for the synthesis of new viral RNA molecules by the host-cell RNA polymerase.
[Figure: reverse transcription of RNA into DNA; messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA)]

The AIDS Virus Is a Retrovirus
• In 1982 physicians first became aware of a new sexually transmitted disease that was associated with an unusual form of cancer (Kaposi's sarcoma) and a variety of unusual infections. Because both of these problems reflect a severe deficiency in the immune system, specifically in helper T lymphocytes, the disease was named acquired immune deficiency syndrome (AIDS). By culturing lymphocytes from patients with an early stage of the disease, a retrovirus was isolated that is now known to be the causative agent of AIDS.
• The retrovirus, called human immunodeficiency virus (HIV), enters helper T lymphocytes by first binding to a functionally important plasma membrane protein called CD4. Two features of HIV make it especially deadly. First, it eventually kills the helper T cells that it infects rather than living in symbiosis with them, as most other retroviruses do, and helper T cells are vitally important in defending us against infection. Second, the provirus tends to persist in a latent state in the chromosomes of an infected cell without producing virus until it is activated by an unknown rare event; this ability to hide greatly complicates any attempt to treat the infection with antiviral drugs.
• Much current research on AIDS is aimed at understanding the life cycle of HIV. The complete nucleotide sequence of the viral RNA, which encodes nine genes, has been determined. This has made it possible to identify and study each of the proteins that it encodes. The three-dimensional structure of its reverse transcriptase is being used to help design new drugs that inhibit the enzyme.

A map of the HIV genome
The HIV genome is about 9000 nucleotides long and contains nine genes. Three of the genes (green) are common to all retroviruses: gag encodes capsid proteins, env encodes envelope proteins, and pol encodes both the reverse transcriptase and the integrase proteins. The HIV genome contains six small genes (in red) in addition to the three (in green) that are normally required for the retrovirus life cycle.

Steps of the HIV life cycle
1. Attachment
   • CD4-gp120 interaction
   • gp120-chemokine receptor interaction
2. Viral fusion/uncoating
3. Reverse transcription
4. RNase H degradation
5. Second-strand synthesis
6. Migration to nucleus
7. Integration
8. Latency
9. Early transcription
10. Late transcription
11. RNA processing
12. Protein synthesis
13. Protein glycosylation
14. Assembly of virion
15. Viral budding
16. Virion maturation

HIV binds to the CD4 receptor on the host cell.
CD4 is present on the surface of many lymphocytes, which are a critical part of the body's immune system. A coreceptor, CXCR4 and/or CCR5, is needed for HIV to enter the cell.
The HIV envelope fuses with the host cell membrane.
The viral capsid and its contents enter the host cell.
The HIV RNA genome and the enzyme reverse transcriptase are released into the host cell.
Reverse transcriptase makes a DNA copy of the HIV RNA genome. First, a single-stranded DNA is made.
Reverse transcriptase then makes a double-stranded DNA copy of the HIV genome.
The enzyme integrase inserts the double-stranded DNA copy of the genome into the host cell genome in the nucleus.
mRNA encoding HIV proteins is produced.
The mRNA is translated to produce HIV-encoded polypeptides, including HIV protease.
HIV protease cleaves the polypeptides to make functional HIV proteins.
A new HIV particle is assembled at the cell surface and buds off.
The HIV particle leaves to infect other cells.
[Figure: Human immunodeficiency virus (HIV) leaving an infected T lymphocyte]

Preventing and treating AIDS: Vaccines?
Modern vaccines for viral infections often consist of one or more coat proteins of the virus that are not themselves infectious but elicit an immune response from the person receiving the vaccine.
HIV reverse transcriptase has an error rate of about one nucleotide per 2000. This means that the amino acid sequences of the HIV coat proteins are constantly changing, so a vaccine raised against one version of the coat may not recognize the next.

Preventing and treating AIDS: Drugs? What should we target?
Anti-HIV chemotherapy
Antiretroviral agents currently available (generic name/trade name):
• zidovudine/Retrovir (AZT, ZDV)
• didanosine/Videx, Videx EC (ddI)
• zalcitabine/HIVID (ddC)
• stavudine/Zerit (d4T)
• lamivudine/Epivir (3TC)
• abacavir/Ziagen (ABC)
• nevirapine/Viramune (NVP)
• delavirdine/Rescriptor (DLV)
• efavirenz/Sustiva (EFV)
• tenofovir DF/Viread (TDF)
• indinavir/Crixivan
• ritonavir/Norvir
• saquinavir/Invirase, Fortovase
• nelfinavir/Viracept
• amprenavir/Agenerase
• lopinavir + ritonavir/Kaletra
• FUZEON (enfuvirtide, ENF or T-20)
• Anti-PDI antibodies
• T22 ([Tyr-5,12, Lys-7]polyphemusin II)

Pharmacogenomics
The study of how an individual's genetic inheritance affects the body's response to drugs.
• Holds the promise that drugs might one day be tailor-made for individuals and adapted to each person's own genetic makeup.
• Combines traditional pharmaceutical sciences such as biochemistry with annotated knowledge of genes, proteins, and single nucleotide polymorphisms.

Wouldn’t it be wonderful…
Wouldn't it be wonderful if you knew exactly what measures you could take to stave off, or even prevent, the onset of disease? Wouldn't it be a relief to know that you are not allergic to the drugs your doctor just prescribed? Wouldn't it be a comfort to know that the treatment regimen you are undergoing has a good chance of success because it was designed just for you? With the recent harvest of millions of single nucleotide polymorphisms (SNPs), biomedical researchers now believe that such exciting medical advances are not that far away.

Sanger dideoxy DNA sequencing
Sanger dideoxy sequencing uses a DNA polymerase to determine the sequence of DNA.
Sanger dideoxy sequencing incorporates dideoxynucleotides, which lack the 3’-OH needed to form the next phosphodiester bond and therefore prevent further synthesis of the DNA strand.

Automated DNA Sequencing
Automated DNA sequencing uses a mixture of unlabeled deoxynucleotides and dideoxynucleotides labeled with fluorescent dyes (a different color for each base).
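The chain-termination logic above can be simulated: each fragment ends at the first dideoxynucleotide incorporated, so sorting the fragments by length and reading each one's labeled terminal base gives the sequence of the new strand. This toy model assumes termination occurs at every position and ignores primer length, gel resolution, and dye chemistry; the function names are hypothetical.

```python
def complement(base: str) -> str:
    """Watson-Crick partner of a DNA base."""
    return {"A": "T", "T": "A", "G": "C", "C": "G"}[base]


def sanger_fragments(template: str) -> list[tuple[int, str]]:
    """Return (length, terminal ddNTP) for every chain-terminated product.

    The template is written 3' to 5', so the new strand (5' to 3') is its
    base-by-base complement; a ddNTP can stop synthesis at any position.
    """
    return [(i + 1, complement(base)) for i, base in enumerate(template)]


def read_ladder(fragments: list[tuple[int, str]]) -> str:
    """Order fragments shortest first, as they migrate, and read their dyes."""
    return "".join(ddntp for _, ddntp in sorted(fragments))


fragments = sanger_fragments("TACGGTCA")  # template, written 3' to 5'
mixed = list(reversed(fragments))         # the reaction yields an unordered mixture
print(read_ladder(mixed))                 # ATGCCAGT
```

Sorting by length plays the role of the gel: the position of each fragment reveals where the chain stopped, and the dye reveals which base stopped it.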
A computer then determines the identity of the labeled nucleotide as each DNA fragment migrates through a polyacrylamide gel.

What are SNPs and How are They Found?
A single nucleotide polymorphism, or SNP (pronounced "snip"), is a small genetic change, or variation, that can occur within a person's DNA sequence. A single-base change found in at least 1% of an ethnically diverse population is defined as a SNP. An example of a SNP is the alteration of the DNA segment AAGGTTA to ATGGTTA. Because only about 3 to 5 percent of a person's DNA sequence codes for the production of proteins, most SNPs are found outside of "coding sequences." SNPs found within a coding sequence are of particular interest to researchers because they are more likely to alter the biological function of a protein. Recent advances in technology, coupled with the unique ability of these genetic variations to facilitate gene identification, have produced a flurry of SNP discovery and detection. Although many SNPs do not produce physical changes in people, scientists believe that other SNPs may predispose a person to disease and even influence their response to a drug regimen.

Needles in a Haystack
Finding single nucleotide changes in the human genome seems like a daunting prospect, but over the last 20 years advances in DNA sequencing and recombinant DNA technology have made it possible. Selected regions of a DNA sequence obtained from multiple individuals who share a common trait are compared.
Many common human diseases are caused by genetic variation within genes, and some are influenced by complex interactions among multiple genes. Therefore, a person may have a genetic predisposition, or the potential to develop a disease based on genes and hereditary factors. Genetic factors may also determine the severity or progression of disease.
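The AAGGTTA to ATGGTTA example above amounts to a position-by-position comparison of aligned sequences from different individuals. A minimal sketch, assuming the sequences are already aligned and contain no insertions or deletions; the function name is hypothetical. Real SNP discovery also requires the variant to reach the 1% population-frequency threshold, which this function does not check.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes both sequences are aligned to the same length (no indels).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, r, s)
            for i, (r, s) in enumerate(zip(reference, sample))
            if r != s]


print(find_snps("AAGGTTA", "ATGGTTA"))  # [(2, 'A', 'T')]
```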
Since we do not yet know all of the factors involved in these intricate pathways, researchers have found it difficult to develop screening tests for most diseases and disorders. By studying stretches of DNA that have been found to harbor a SNP associated with a disease trait, researchers may begin to reveal relevant genes associated with a disease.

Why are SNPs important to pharmaceutical companies?
• Estimate: most commonly used drugs are effective in only 30% to 60% of patients with the same disease.
• In addition, a subset of these patients may suffer side effects. Adverse drug reactions have been reported to be among the top five leading causes of death in the United States, with an economic impact of up to $100 billion annually.
• Severe adverse effects have led to the withdrawal of the blockbuster drugs Rezulin, Seldane, Redux, and Pondimin.
• Bringing a new drug to market is estimated to cost as much as $500 million.
• Being able to predict a population’s response to a drug would be invaluable to the pharmaceutical industry.

SNPs and Drug Interactions
[Figure: a drug's path through the body: absorption in the breast (drug in breast tissue), transport in the blood (drug in bloodstream, transporter), metabolism in the liver (drug becomes inactive or toxic), and excretion in the kidney]

SNP Profiles and Response to Drug Therapy
[Figure: individual SNP profiles (A to E) of breast cancer patients are sorted into those who respond to standard drug treatment and those who do not]

Cancer is many diseases
"We already know that if we sample tumor tissues from 100 different women, those tissues would have a molecular makeup that would break up into different categories. In essence, those patients [each] have a different disease, but we just happen to be calling it the same thing--breast cancer," Conway says. "We think we're going to subdivide diseases.
Once we get people with the right disease diagnosis, the disease definitions are going to change from 'You have breast cancer' to 'You have molecular profile A, B, C, or D.' The treatments of those diseases are going to be different."

SNPs and Cancer
[Figure: SNPs may be the solution; SNP profiles A, B, C, D]

What Is Variation in the Genome?
Common sequence variations:
• Polymorphisms
• Deletions
• Insertions
• Chromosome translocations
Variations causing latent changes: variations in DNA whose effects appear many years later.

Age-dependent Frequency in SNPs
Sequenom's scientists are interested in changes in the frequency of SNPs as the population ages. "We take advantage of the fact that most human diseases are late-onset. Age is a major risk factor," Cantor says. "If young people are carrying a harmful variation, they're still well, whereas an old person carrying that same variation has a very high chance that he's been made sick or killed by it. You make the prediction that variations that are harmful to health should decline in frequency as a function of age in the healthy population."
One percent of genes appear to show an age-dependent frequency in SNPs, Cantor says. He suspects that only 200 to 400 genes will be involved in disorders that affect a major population. Finding these genes in healthy people, however, gives no indication of what the diseases actually are. "After we find them in the healthy population, we have to go back and look at biochemically stratified populations or clinically stratified populations," he explains. "The advantage is that instead of having to do all the genes with these tricky populations, we only have to do 200 to 400. We can pay a lot more attention to the details."
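Cantor's prediction, that harmful variants should become rarer with age among healthy people, can be checked by comparing allele frequencies between age groups. The genotype counts below are invented for illustration and the function name is hypothetical; a real study would also apply a statistical test for the difference, which this sketch omits.

```python
def allele_frequency(genotypes: dict[str, int], allele: str) -> float:
    """Frequency of `allele` from counts of two-letter diploid genotypes,
    e.g. {"AA": 50, "AG": 40, "GG": 10}."""
    total_alleles = 2 * sum(genotypes.values())
    copies = sum(genotype.count(allele) * n for genotype, n in genotypes.items())
    return copies / total_alleles


# Hypothetical genotype counts at one SNP in healthy volunteers.
young = {"AA": 60, "AG": 32, "GG": 8}
old = {"AA": 78, "AG": 20, "GG": 2}

f_young = allele_frequency(young, "G")
f_old = allele_frequency(old, "G")
print(f"G allele: {f_young:.2f} (young) vs {f_old:.2f} (old)")
```

Each heterozygote contributes one copy of the allele and each homozygote two, which is why the counts are weighted by `genotype.count(allele)`.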
Laboratory Experiment
Isolation of genomic DNA from the bacterium Escherichia coli (E. coli)
E. coli has a single double-stranded DNA molecule as its genome. There are 4,639,221 base pairs in the E. coli genome.

Promega Wizard® Genomic DNA Purification Kit
The Wizard Genomic DNA Purification Kit is designed for isolation of DNA from white blood cells, tissue culture cells and animal tissue, plant tissue, yeast, and Gram-positive and Gram-negative bacteria.
1. Lyse (break open) the cells and the nuclei. An RNase digestion step may be included at this time. Depending on the DNA isolation method used, RNA may be co-purified with genomic DNA. Spectrophotometric measurements do not differentiate between DNA and RNA, so RNA contamination can lead to overestimation of DNA concentration. Treatment with RNase A will remove contaminating RNA; this can either be incorporated into the purification procedure or performed after the DNA has been purified.
2. Remove the cellular proteins by a salt-precipitation step, which precipitates the proteins but leaves the high-molecular-weight genomic DNA in solution.
3. Concentrate the genomic DNA and desalt it by isopropanol precipitation.

Laboratory Experiment
1. Determination of the molar absorptivity of adenosine 5’-monophosphate (AMP)
2. Determination of the concentration of AMP in an aqueous solution
3. Determination of the concentration of DNA in a purified E. coli genomic DNA solution
4. Determination of the concentration of DNA in oligonucleotide solutions
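Experiments 1 to 4 all rest on the Beer-Lambert law, A = εbc. A minimal sketch, assuming a 1 cm path length: it estimates ε for AMP as the least-squares slope of A260 against concentration (a line through the origin), then inverts the law for an unknown. The standard concentrations and absorbance readings are hypothetical, as are the function names. For DNA it uses the commonly cited conversion factors of 50 µg/mL per A260 unit for double-stranded DNA and about 33 µg/mL for single-stranded DNA; oligonucleotide values depend on sequence, so check the factors against your lab manual.

```python
def fit_molar_absorptivity(conc_molar: list[float], absorbance: list[float],
                           path_cm: float = 1.0) -> float:
    """Least-squares slope of A vs c through the origin, divided by b.

    Returns epsilon in M^-1 cm^-1 (Beer-Lambert: A = epsilon * b * c).
    """
    sxy = sum(c * a for c, a in zip(conc_molar, absorbance))
    sxx = sum(c * c for c in conc_molar)
    return sxy / sxx / path_cm


def concentration_molar(absorbance: float, epsilon: float,
                        path_cm: float = 1.0) -> float:
    """Invert Beer-Lambert: c = A / (epsilon * b)."""
    return absorbance / (epsilon * path_cm)


# Commonly cited A260 conversion factors (ug/mL per absorbance unit, 1 cm path).
DSDNA_UG_PER_ML = 50.0
SSDNA_UG_PER_ML = 33.0  # approximate; varies with oligonucleotide sequence


def dna_ug_per_ml(a260: float, factor: float, dilution: float = 1.0) -> float:
    """DNA concentration of the undiluted stock from a diluted A260 reading."""
    return a260 * factor * dilution


# Hypothetical AMP standards (mol/L) and their A260 readings.
standards = [2e-5, 4e-5, 6e-5, 8e-5]
readings = [0.31, 0.62, 0.92, 1.23]
eps = fit_molar_absorptivity(standards, readings)
print(f"epsilon ~ {eps:.0f} M^-1 cm^-1")
print(f"unknown AMP: {concentration_molar(0.50, eps) * 1e6:.1f} uM")
print(f"genomic DNA: {dna_ug_per_ml(0.25, DSDNA_UG_PER_ML, dilution=10):.0f} ug/mL")
```

Fitting a slope through several standards, rather than using a single reading, averages out pipetting and instrument error; forcing the line through the origin assumes the spectrophotometer was blanked against the buffer.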