Nucleic Acids and Chromatin
John O. Thomas

Objectives:
- Use the principles of nucleic acid biology to be able to select the most appropriate diagnostic test for your patient and interpret the test results in light of the limitations of the test.
- Understand how proteins can interact strongly with specific nucleotide sequences and be able to apply these general principles to an understanding of gene expression.
- Understand how chromosomal structure and chromatin structure affect gene expression.
- Understand how modification of chromatin structure can lead to epigenetic inheritance.

Supplementary materials: These can be found on the Molecular Basis of Medicine web site.

I. Nucleic acid structure
   A. Chemical structure and nomenclature of the nucleotides
      1. DNA and RNA are polymers of nucleotides (polynucleotides). Nucleotides contain a base, a sugar and a phosphate.
         a. The base is either a purine (A and G) or a pyrimidine (T and C for DNA, or U and C for RNA). In many cases the bases contain chemical modifications that may affect their function. Some of these are discussed below and in later lectures.
         b. The sugar is ribose in RNA or 2'-deoxyribose in DNA. The carbons of the sugar are numbered with primes (1' to 5'). The base is connected to the 1' carbon of the sugar through an N-glycosidic linkage.
         c. A phosphate is joined, through a phosphoester bond, to the 5' position of the sugar.
         d. The 2'-OH of RNA can, like the hydroxyl group of serine, function as a catalytic center. Two important consequences are 1) that RNA is much more susceptible to hydrolysis than DNA and 2) that some RNAs catalyze biologically important reactions.
      2. Nucleotides are joined together by phosphodiester bonds.
         a. Usually the bonds are between the 5' and 3' positions. Thus, polynucleotides have a polarity, and one can refer to a 5' to 3' direction or a 3' to 5' direction. For linear (as opposed to circular) polynucleotides, one can also refer to 3' and 5' ends.
         b. Other phosphodiester linkages are also possible. For example, a 2'-5' phosphodiester is formed as an intermediate in RNA splicing, and the cap structures of eukaryotic mRNA and snRNA contain nucleotides linked 5'-5'. These will be discussed in the lectures on transcription.
      3. The length of a polynucleotide is measured as the number of bases or base pairs (b or bp). DNAs and RNAs may contain thousands to millions of bases or base pairs, in which case their sizes are expressed in kilobases or megabases (kb or Mb).
      4. Nucleotide sequences can be written in several ways. Often the nucleotides are represented by single letters (A, C, G, T or U) denoting the bases, and p for phosphates. Unless otherwise indicated, the sequence is written from 5' to 3' (left to right): pppApCpGpT (5' triphosphate, 3'-OH); pApCpGpT (5' phosphate, 3'-OH); ApCpGpTp (5'-OH, 3' phosphate). Usually the internal phosphodiesters are not written, as for example: pACGT. When writing the sequence of a double-stranded DNA, usually only one of the two strands is written, with the understanding that the second strand has the complementary sequence. Unless otherwise noted, the sequence is written in the 5' to 3' direction: ACGT refers to the sequence
            5' ACGT 3'
            3' TGCA 5'

II. Some examples of nucleic acids and their relative sizes (for general information; not to be learned)

   DNA                        Form       Approx. base pairs   Approx. length
   E. coli chromosomal DNA    circular   4 x 10^6             1.5 mm
   E. coli plasmid            circular   1-200 x 10^3         0.3-70 µm
   Human genome               linear     3 x 10^9
   Human chromosome           linear     50-250 x 10^6        1.7-8.5 cm
   Human mitochondrial DNA    circular   20 x 10^3            7 µm
   Adenovirus DNA             linear     36 x 10^3            12 µm

   RNA                                   Approx. size (kb)
   Messenger RNA (mRNA)                  ~2
   Primary mRNA transcript (pre-mRNA)    0.2-30
   Ribosomal RNA (5S, 5.8S, 18S, 28S)    0.12-5.1
   Transfer RNA                          0.08-0.1
   Small nuclear RNA                     0.1-0.2
   Poliovirus RNA (genome)               7.44
   HIV RNA                               9.7

III. DNA-based diagnostics
   A. Hybridization
      1. The two strands of a DNA double helix can be separated by heating a solution containing the DNA. The transition from double-stranded to single-stranded DNA occurs over a temperature range of a few degrees. The midpoint of this transition is the melting temperature, abbreviated Tm.
      2. The melting temperature, Tm, depends largely on the number of hydrogen bonds that hold the two DNA strands together.
         a. A double helix formed between two strands that are not perfectly complementary in sequence has a lower Tm than a double helix formed between perfectly complementary strands.
         b. DNA with a high G:C content has a higher Tm than DNA with a high A:T content (a G:C pair is held together by three hydrogen bonds, an A:T pair by two).
         c. Adding reagents that disrupt hydrogen bonds (such as urea or formamide) to the DNA solution lowers the Tm.
         d. Lowering the ionic strength of the DNA solution lowers the Tm. Counterions shield the repulsion between the negatively charged phosphates of the two strands; at low ionic strength there are fewer counterions, so the repulsion is greater and the duplex is less stable.
         e. Extremes of pH disrupt the hydrogen bonds and hence convert double-stranded DNA to single strands.
      3. When a solution of denatured DNA is cooled, complementary bases form base pairs.
         a. If the solution is cooled very slowly, the original duplex structure will reform, since it is the thermodynamically most stable state.
         b. If the solution is cooled rapidly, the original duplex does not reform: base pairing between short complementary sequences on the same strand takes place before the complementary strands have a chance to find each other.
         [Figure: Denaturation and renaturation of DNA by heating. Percent double helix (0-100%) is plotted against temperature (about 80-90 °C), with curves for heating followed by slow cooling and for rapid cooling. The Tm is the temperature at which half of the nucleotides are base-paired.]
      4. Hybrid DNA molecules can be formed by denaturing DNA (the target DNA) and then renaturing it in the presence of a competing single-stranded DNA.
         a. Typically the target DNA is a chromosome or a PCR product (see below).
         b. Typically the competing DNA is a synthetic oligonucleotide with a sequence that is complementary to a specific region of the target DNA. The oligonucleotides are usually about 20 nucleotides long. Based on probability, an oligonucleotide of this length is likely to hybridize to just one location within the 3 x 10^9 base pairs of the human genome (there are 4^20, about 1.1 x 10^12, possible 20-nucleotide sequences).
   B. Fluorescence in situ hybridization (FISH) is used to detect anomalies in the number or structure of chromosomes. It is widely used in prenatal diagnosis and in the diagnosis of cancers. With this technique, individual chromosomes can be identified and the position(s) of particular DNA sequences on a chromosome can be observed.
      1. Procedure:
         a. Obtain a probe DNA that is homologous to the chromosomal region of interest and label it with a fluorescent dye. Fluorescent DNA probes that are homologous to specific regions of the chromosomes are commercially available (e.g. see www.vysis.com). FISH probes are typically quite large, which limits resolution to large genetic changes; point mutations cannot be detected.
         b. Mount chromosomes (or nuclei) on a microscope slide.
         c. Denature the DNA (it remains attached to the slide).
         d. Renature the DNA in the presence of the fluorescently labeled probe DNA.
         e. Observe the slide by fluorescence microscopy. Probes that are bound to the chromosomal DNA are observed as colored regions.
      2. Applications (these will be discussed in greater detail in the cytogenetics lectures):
         a. Prenatal diagnosis of disorders such as trisomy 21 (Down syndrome). FISH is especially useful when a prompt diagnosis is important.
         b. Diagnosing the presence of deletions, insertions or rearrangements. To be visible by FISH, these abnormalities must be large, on the order of thousands of base pairs.
         c. Diagnosis of chromosomal abnormalities that are common in some types of cancer. This is used in diagnosis and to monitor the progress of chemotherapy.
   C. The polymerase chain reaction (PCR) is an elegant method for amplifying a defined region of DNA (or RNA). Although it is simple, it is a very powerful tool that is used widely in diagnostics. Central to PCR is DNA polymerase, the enzyme that synthesizes DNA. PCR depends on the fact that DNA polymerases can only elongate a preexisting polynucleotide or oligonucleotide chain; they cannot initiate polymerization. An oligonucleotide that serves as the starting point for elongation is referred to as a primer. (The biological functions of the DNA polymerases will be discussed in detail in the lectures on DNA synthesis and repair.)
      Steps in the PCR procedure:
      1. Set up:
         Step 1. Purchase two synthetic oligonucleotide primers with sequences such that:
            - the primers will hybridize to sequences that flank the region to be amplified;
            - the primers will hybridize with opposite strands.
            Oligonucleotides of any desired sequence can be synthesized automatically by machines. A primer is typically about 20 nucleotides long, so that its sequence is likely to be present at only one location in the entire genome.
         Step 2. Mix the DNA to be amplified with a large molar excess of the primers, the four deoxynucleoside triphosphates, heat-stable DNA polymerase and buffer.
      2. Steps of an automated process. The following steps are repeated many times. For n cycles, a 2^n-fold amplification of the DNA will result (in the ideal case); 30 cycles will produce about a billionfold amplification (2^30 ≈ 1.07 x 10^9).
         Step 1. The DNA strands are separated by heating.
         Step 2. The solution is cooled to allow the primers to hybridize with the DNA.
         Step 3. The heat-stable DNA polymerase synthesizes complementary strands by extending the primers.
      3. Steps in the automated process (schematic):

         Cycle 1, step 1: unwind by heating
            5'....______________________________....3'
               +
            3'....______________________________....5'

         Cycle 1, step 2: hybridize primers by cooling
            5'....______________________________....3'
                                         primerB5'
               +
                  5'primerA
            3'....______________________________....5'

         Cycle 1, step 3: polymerase extends primers
            5'....______________________________....3'
            3'....__________________________primerB5'
               +
                  5'primerA_____________________....3'
            3'....______________________________....5'

         Cycle 2, step 1: again unwind by heating
            5'....______________________________....3'
            3'....______________________________....5'
                  5'primerA_____________________....3'
            3'....__________________________primerB5'

         After the nth step 3, the major product is the region between the two primers:
                  5'primerA__________________3'
                  3'__________________primerB5'

   D. Electrophoresis is commonly used to separate DNAs and RNAs on the basis of their size and/or shape. Nucleic acids ranging from mononucleotides to entire chromosomes can be analyzed by electrophoresis.
      1. The electrophoretic mobility depends on:
         a. the size of the nucleic acid;
         b. its conformation (single-stranded, double-stranded, circular, supercoiled);
         c. the conditions of electrophoresis (e.g. the porosity of the medium: polyacrylamide is used for oligonucleotides and DNAs up to about 500 bp, and agarose is used for larger DNAs. Specialized electrophoretic methods can be used to separate very large molecules such as chromosomes).
      2. Example: detection of the cystic fibrosis ΔF508 allele in heterozygous or homozygous patients.
   E. Blotting. By combining electrophoresis with hybridization, it is possible to identify one particular DNA in a complex mixture of DNAs.
      1. DNA in the electrophoresis gel is transferred to the surface of a paper-like substrate to which it binds. The paper is easy to handle and gives oligonucleotides and other chemicals easy access to the DNA. An oligonucleotide that is complementary to a sequence of interest is then added and hybridized (denaturation and renaturation) with the targeted DNA sequence. The oligonucleotide, referred to as a probe, is long enough (typically 18-20 nucleotides) that its sequence is likely to be present at only one location in the entire genome. The probe is labeled (with radioactivity or a colored tag) so that it can be detected.
      2. If DNA is the molecule that has been electrophoresed and transferred to the paper, the blotting and hybridization procedure is known as a Southern blot.
      3. If RNA is electrophoresed and transferred to the paper, the procedure is known as a northern blot.
      4. When proteins are separated by electrophoresis, transferred to paper, and detected by reaction with a specific antibody, the procedure is known as a western blot (discussed in the Proteomics lectures).
   F. Allele-specific oligonucleotides (ASOs) are used in conjunction with PCR to specifically detect a particular allele.
      1. An ASO is an oligonucleotide, typically about 18 nucleotides long, with a sequence that is complementary to the DNA sequence of the allele to be detected.
         a. ASOs are usually used in pairs, one complementary to the normal allele and the other complementary to the variant allele that one wishes to detect. ASOs of any sequence can be synthesized and labeled with radioactivity or a chemical tag so that they can be detected.
         b. A region surrounding and including the mutation to be analyzed is amplified by PCR to provide sufficient DNA for the test.
         c. Each allele is detected by hybridizing the PCR-amplified DNA with the ASO for the allele of interest under conditions such that the ASO binds only to a perfectly complementary sequence, not to a sequence with a mismatched base pair (high-stringency hybridization).
      2. Example: detection of a cystic fibrosis point mutation (also see the example in the "courseware" section of the course web site).
      3. A serious drawback to the use of ASOs is that each ASO detects only one allele. The suspected disorder will be missed if it is due to a mutation different from the specific mutations that are examined.
         a. For diseases due to genes that have one or only a few alleles in the population, ASO testing can provide a powerful screening method (for cystic fibrosis screening, the American College of Medical Genetics recommends a panel of 25 ASOs corresponding to the 25 most common mutations).
         b. For diseases due to genes that have many alleles (such as familial hypercholesterolemia), screening by ASO testing is not practical.
   G. DNA arrays are currently used primarily for analyzing patterns of gene expression. For example, the amount of each of thousands of specific mRNAs made by a cancer cell can be compared to the amount made by a normal cell.
      1. DNA arrays contain thousands of DNA sequences mounted on a substrate (such as a microscope slide or silicon chip). The DNA sequences can be in the form of:
         a. small dots of cDNA clones of known genes attached to a microscope slide;
         b. oligonucleotides that are synthesized directly on a silicon matrix (about 300,000 sequences on a 1.28 x 1.28 cm array).
      2. How DNA arrays are used for analysis of gene expression is illustrated in the following figure of a DNA array containing ten DNAs; in practice, a DNA array would contain thousands of DNAs. mRNA is isolated from a normal cell type and converted to fluorescently labeled cDNA (several enzymatic steps). mRNA from the cell type to be compared to normal is isolated and converted to cDNA of a contrasting fluorescent color. The two cDNAs are mixed together and hybridized to the DNAs on the matrix. Each spot on the matrix corresponds to a particular mRNA. The resulting fluorescent color of the spot reflects the relative concentrations of that mRNA in the normal cell versus the variant cell.
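Item 2 describes the array readout qualitatively. As a minimal sketch of how a spot's color could be turned into a call, the Python below compares the two channel intensities on a log2 scale. It assumes, as in the accompanying figure, that normal-cell cDNA is labeled red and tumor cDNA green, and that intensities are already background-corrected; the two-fold cutoff, the function name `classify_spot` and the intensity values are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical threshold: call a spot differentially expressed when one
# channel is at least two-fold brighter than the other (|log2 ratio| >= 1).
LOG2_CUTOFF = 1.0


def classify_spot(red: float, green: float, cutoff: float = LOG2_CUTOFF) -> str:
    """Return the spot color for one array feature.

    Assumes red = signal from normal-cell cDNA and green = signal from
    tumor cDNA, both background-corrected, so a positive log2(green / red)
    means the mRNA is more abundant in the tumor.
    """
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    log_ratio = math.log2(green / red)
    if log_ratio >= cutoff:
        return "green"   # overexpressed in the tumor
    if log_ratio <= -cutoff:
        return "red"     # underexpressed in the tumor
    return "yellow"      # about equally expressed


# Made-up (red, green) intensities for three spots, for illustration only.
spots = {"gene_A": (1200.0, 300.0), "gene_B": (800.0, 850.0), "gene_C": (150.0, 900.0)}
for gene, (red, green) in spots.items():
    print(gene, classify_spot(red, green))
```

Using a log ratio treats over- and underexpression symmetrically (a four-fold increase gives +2, a four-fold decrease gives -2). A real two-color array analysis would typically also normalize the two channels against each other before making calls; that step is left out of this sketch.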
   [Figure: Comparison of gene expression in a tumor biopsy using a DNA array. mRNA from normal cells and from tumor cells is converted to red cDNA and green cDNA, respectively; the two cDNAs are mixed and hybridized with the DNA array.]
   Results:
      Red spots: genes that are underexpressed in the tumor.
      Yellow spots (most of them): equally expressed genes.
      Green spots: genes that are overexpressed in the tumor.

IV. Nucleic acid-protein interactions
A large number of proteins interact with nucleic acids. These interactions are essential for the proper expression of the information encoded by DNA and mRNA, and for the functions of other RNAs such as rRNA, tRNA and snRNA.
   A. Some proteins bind to DNA and RNA with little sequence specificity. Proteins such as the histones and viral nucleic acid packaging proteins function to condense or package DNA. Proteins such as the single-stranded DNA binding proteins involved in DNA synthesis and in recombination also interact with little sequence specificity.
   B. Some proteins recognize DNA or RNA sequences with a high degree of specificity. Examples are proteins that control the expression of genes by binding to specific DNA sequences. As an example of specificity, the E. coli lac repressor protein binds to a 28 bp DNA sequence that must be distinguished from the other four million base pairs of E. coli DNA. It does this by binding about 4 million times more tightly to its target sequence than to any other region of the DNA. The dissociation constant of the complex is about 10^-13 M.
   C. The structures of many DNA-protein complexes have been determined by x-ray crystallography, and several general patterns for how proteins interact with nucleic acids have emerged from these studies. You may view and manipulate 3-D models of some of these structures in the tutorial in the "courseware" section of the course web site.
      1. Ionic interactions with the phosphate backbone add stability to the DNA-protein complex but do not confer specificity.
      2. Specificity in DNA-protein interactions is usually achieved by recognizing combinations of sequence-specific atoms present in the major groove. The minor groove is too narrow to allow most proteins to recognize base-specific features. Some proteins, however, can widen the minor groove and bind to it by bending the DNA. You should be able to identify the major groove and the minor groove in the above picture and explain why the two grooves are different sizes (observe the positions of the N-glycosidic bonds in the A:T and C:G pairs shown in the figure below). You should also be able to identify the phosphates, sugars and bases.
      3. When DNA is viewed from the major groove, each of the four base pairs offers a different combination of hydrogen bond acceptors, hydrogen bond donors, and van der Waals interactions. Proteins interact with these sequence-specific features when they bind to specific DNA sequences. Similar principles apply to proteins that bind in the minor groove.
         [Figure: A:T and G:C base pairs, each drawn with the major groove above and the minor groove below. Arrangement of potential H-bond donor (↑) and acceptor (↓) groups and van der Waals interactions (VdW) in the major groove of A:T and G:C base pairs.]
         You should identify the N-glycosidic linkages in the above pictures and note the relative positions of the sugars with respect to the major and minor grooves.
      4. Amino acid side chains can form hydrogen bonds with nucleotides. In a typical DNA-binding protein, several amino acid side chains are spatially oriented so that many hydrogen bonds form with bases. The following are a few examples of how amino acid side chains can interact specifically with bases by binding to sites in the major groove.
         [Figure: Examples of how proteins can recognize specific DNA sequences by binding to groups that extend into the major groove. Side chains shown: asparagine, arginine and glutamate; bases shown: deoxyguanosine and deoxyadenosine.]
      5. Several common motifs have been found in proteins that interact with DNA. Examples in the form of 3-dimensional molecular models that can be manipulated are presented at our web site. Examples are also shown in most biochemistry texts.
         a. In a "helix-turn-helix" protein, the amino acid side chains that interact with DNA are located on an alpha helix (the recognition helix) that fits into the major groove of DNA. The recognition helix is held in position by interactions with a second helix, connected to it by a short stretch of relatively unstructured peptide that forms the "turn" in the name helix-turn-helix.
         b. Proteins containing "zinc fingers" are another important class of DNA-binding proteins. Cysteines coordinated with zinc atoms orient a recognition helix so that it fits into the major groove of the DNA. The "nuclear receptor" class of transcription factors are zinc finger proteins. The nuclear receptors include the glucocorticoid receptor, estrogen receptor, vitamin D receptor, thyroid hormone receptor, and several other receptors that will play prominent roles in this course.
         c. The "leucine zipper" motif is found in several important transcription factors involved in growth control. These proteins contain a long helical section. Part of the helix fits into the major groove of the target DNA, and part of the helix forms a dimerization domain in which every seventh amino acid is a leucine. The dimerization domain looks like a zipper, hence the name leucine zipper.

V. Eukaryotic chromosome structure
The human genome includes both nuclear chromosomes, which are large (50-250 Mb) linear DNAs, and mitochondrial DNA, which is much smaller (about 20 kb) and is circular.The following topics refer to the nuclear chromosomes. A. Each chromosome contains specialized structures required for their replication and segregation during mitosis and meiosis. 9 1. The centromeres are regions where sister chromatids associate during mitosis. They are involved in mitotic spindle formation and are required for the proper segregation of the chromosomes to daughter cells. 2. Telomeres are located at the ends of chromosomes and are required for the completion of DNA synthesis. 3. Each chromosome must contain at least one origin of replication; most chromosomes contain multiple origins. B. Most (about 80 - 90%) of the DNA in the genome is present in introns and in the noncoding regions between genes. Although most of this DNA has no known function, it contains sequences that are very useful as genetic markers. 1. One particularly useful class of sequences is known as Simple Sequence Repeats or SSRs (also referred to microsatellites or Short Tandem Repeat Polymorphisms STRs). a. A SSR consists of a repeating short DNA sequence (most often a di- tri- or tetranucleotide repeat). For example (CG)n. b. SSRs occur frequently (there are tens of thousands of them in the human genome) and are distributed rather uniformly over the entire genome. c. A repeat such as (CG)n) may occur multiple times throughout the genome. One specific instance of the repeat can be identified by the unique DNA sequences that flank it. d. SSRs are polymorphic. That is, if a population is analyzed for the length of a particular SSR, a number of different repeat lengths will be observed. e. The length of a SSR is an inheritable trait. Normally, a person has two copies of a particular SSR: one from the person's mother and one from the father. The two copies may be the same length or they may be different lengths. f. 
A specific SSR can be isolated from an individual’s DNA by PCR amplification with primers that are complementary to the DNA sequences that flank the SSR. Electrophoresis of the PCR product will reveal the length(s) of the SSR that the individual possesses. 2. Microsatellite instability and cancer. Microsatellite is a term commonly used in tumor biology to describe an SSR (the term microsatellite is derived from experimental observations made in the early days of chromatin research). . In some types of cancer, the lengths of microsatellites throughout the genome differ from those found in the patient’s normal tissue. The altered microsatellite lengths can be attributed to improperly repaired DNA in one or a few cells that then undergo clonal expansion to form the tumor. The degree of microsatellite instability (the number of microsatellites with altered lengths) in a tumor may be of importance in diagnosis and for determining the best course of treatment (Hampel et al N Engl J Med. 2005 352(18):1851-60). Microsatellite instability is a commonly seen feature of nonpolyposis colorectal cancer, where it is usually attributed to the loss of the DNA repair enzyme hMLH1, which will be discussed in the lectures on DNA replication and repair. 3. Another type of polymorphism that is becoming increasing popular as a genetic tool is the single nucleotide polymorphism (SNP - pronounced "snip"). The polymorphisms that are observed (usually by sequencing) are differences between individuals in the nucleotides found at particular locations in the geneome. SNPs occur much more 10 frequently in the genome than SSRs. On average there is about one SNP per kb. VI. Chromatin A. In the nucleus, DNA is found associated with a large number of proteins to form chromatin. 
The packaging of DNA into chromatin serves two main functions: 1) the physical packaging of the chromosome into an ordered and untangled structure that can be replicated and segregated to daughter cells, and 2) the regulation of gene transcription. B. The nucleosome is the fundamental packaging unit of chromatin. 1. Histones H2A, H2B, H3 and H4 (two of each) form an octamer 2. 146bp of DNA is wrapped, in two turns, around this histone octamer. 3. There are about 50bp of DNA between nucleosomes (amount is variable). Nucleosome: 146 bp of DNA wrapped 1.75 turns around an octamer of histones The histone N-terminal tails are toward the surfaces of the nucleosome. Modifications of the histone tails affect nucleosome and chromatin structure. 4. The amino terminal regions of the histones are positively charged, and are located toward the surfaces of the nucleosome. They likely play important roles in maintaining the nucleosome structure and in directing the interactions between nucleosomes that are responsible for higher-order chromatin structures. These Nterminal tails are subject to a number of modifications including acetylation of lysines, methylation of lysines and arginines and phosphorylation of serines. These modifications can result in profound effects on chromatin structure and the expression of the packaged genes. 5. In humans, adjacent nucleosomes are separated by about 50 bp of DNA. When viewed by electron microscopy under partially denaturing conditions the nucleosomes appear as 10nm thick filament resembling beads on a string. 6. In some regions, the position of the nucleosomes on the DNA can be critical since they may mask important DNA sequences. The nucleosomes can be moved by "chromatin remodeling enzymes". These are multi-subunit complexes that are highly regulated and require the hydrolysis of ATP. C. In the nucleus, chromatin is separated into regions of highly condensed and less condensed chromatin that can be distinguished microscopically. 
The highly condensed regions are referred to as heterochromatin; the less condensed regions as euchromatin. Functionally, DNA sequences that are condensed into heterochromatin are, for the most part, not transcribed into RNA. 1. In heterochromatin, the nucleosomes are further condensed into a 30nm thick fiber. The structure of the 30nm fiber is not currently clear. One model suggests that it consists of a helix of nucleosomes . 11 10 nm fiber 30 nm fiber 2. The compact higher order packing of nucleosomes in heterochromatin is associated with transcriptional inactivity, and is an important mechanism for regulating the expression of genes. A number of factors are involved in the condensation of chromatin into the 30nm fiber including histone modification, and DNA methylation and the binding of histone H1. 3. Regions of a chromosome that are highly condensed are separated from regions that are less highly condensed by short DNA sequences referred to as insulators. Insulator function is mediated by specific insulator binding proteins (Gaszner & Felsenfeld Nat Rev Genet. 2006 7(9):703-13.) D. Higher order chromatin structures 1. In the nucleus, chromatin is organized into large loops of about 20-100kb. These loops may function to delineate regions of more or less highly condensed chromatin structure, and hence regions that are more or less available for transcription into RNA. 2. During mitosis, each chromosome is extremely condensed, with the loops of chromatin being further compacted into a structure organized around a protein scaffold to form the mitotic chromosomes that can be viewed by light microscopy. VII. Chromatin structure is critically important for the regulation of gene expression. A. Three broad classes of chromatin structure can be distinguished according to differences in ability to be transcribed, differences in structure, and differences in histological appearance. Chromatin structures are formed during stem cell differentiation. 1. 
Repressed chromatin is chromatin that will never be transcribed in a particular cell line. It tends to be packaged into a compact chromatin structure, is methylated at many CpG sequences, contains histone H3 that is methylated on specific lysines including lysine 27. It is seen in the nucleus as heterochromatin. 2. Potentially active chromatin is not transcribed, but may be transcribed in the future in a particular cell line. Transcriptional potential can be transferred to daughter cells. Nucleosomes of potentially active chromatin tends to be under methylated at CpG sequences, and contain histone H3 that is methylated on several specific lysines including lysine 4. 3. Active chromatin is being actively transcribed. The chromatin structure of the promoter regions is loosely folded, and histones in the promoter regions are modified by acetylation and other modifications. B. Chromatin structure is determined by many effectors. Three particularly important ones are histone methylation, histone acetylation, and DNA methylation at CpG dinucleotides. Other factors are also associated with transcriptional activity including other modifications of histones, and the binding of a number of nonhistone proteins. 1. Histone methylation a. Several lysine and argenines residues in the amino-terminal tails of histones can be modified by adding one, two or three methyl groups. The methylation of specific lysines provide signals for determining chromatin conformation. b. One example of the role of histone H3 methylation is chromatin formation and gene expression during differentiation of embryonic stem cells. 12 i. In embryonioc stem cells, many genes contain a di- or tri-methyl group on histone H3 lysine 4 (a H3K4 mark) and/or on histone H3 lysine 27 (a H3K27 mark). ii. Genes that are condensed into chromatin containing a H3K27 mark tend to be transcriptionally repressed. iii. During differentiation of embryonic stem cells, some genes lose the H3K27 marks but retain the H3K4 marks. 
These genes tend to be expressed in the differentiated cell. Other genes lose the H3K4 marks but retain the H3K27 marks. These genes tend to be repressed in the differentiated cell.
2. Histone acetylation is a key mechanism for regulating transcription.
a. Lysines near the N-termini of the core histones (H2A, H2B, H3, and H4) are subject to acetylation. Histone acetylation results in a less condensed chromatin structure, while deacetylation favors condensation into the 30-nm fiber. Histone acetylation is usually required for active transcription.
b. The cell contains many different histone acetyltransferases (HATs) and histone deacetylases (HDACs). These enzymes work together with other proteins (such as transcription factors and 5-methyl-C binding proteins) that target their activities to specific regions of the chromosome in response to specific cellular and developmental conditions. HATs and HDACs will be discussed in detail in the gene expression lectures.
3. DNA methylation at CpG dinucleotides
a. DNA is methylated at the 5 position of C in some CpG sequences. In humans, CpG methylation is important for X-inactivation, imprinting, and determining the pattern of gene expression during cellular differentiation.
[Figure: structures of deoxycytidine, deoxy-5-methylcytidine, and thymidine]
The structures of deoxycytidine and deoxy-5-methylcytidine. Note that 5-methylcytidine can be converted to thymidine by deamination (a spontaneous, non-enzymatic reaction). This is an important mutagenic event. The deamination leaves a T opposite a G. If this mismatch is not repaired before replication, it becomes a C→T transition at the CpG site.
b. Highly CpG-methylated DNA promotes the formation of heterochromatin, and unmethylated DNA promotes the formation of euchromatin. One mechanism that couples CpG methylation to gene expression works as follows. A complex containing a 5-methyl-C binding protein and a histone deacetylase binds to 5-methyl-C and deacetylates neighboring histones.
Deacetylation of these histones condenses the chromatin near the methylated CpG DNA, and the condensed chromatin blocks transcription. Other modifications to nucleosomes near methylated DNA, such as histone methylation, are also likely to help determine the local chromatin structure and hence gene activity.
c. The dinucleotide CpG and its methylation are not randomly distributed throughout the genome.
i. Intergenic regions contain far less CpG than expected by chance, and that CpG is usually methylated.
ii. The 5’ ends of genes are often enriched in the dinucleotide CpG. These regions are referred to as "CpG islands". The methylation state of a CpG island correlates with gene activity: genes with highly methylated CpG islands are silenced.
iii. In differentiated cells, the methylation pattern is an important determinant of the set of genes that the cell type can express. Undifferentiated cells (such as embryonic stem cells) have a low level of CpG methylation.
[Figure: DNA and histone modifications in determining chromatin structure and gene regulation]
Untranscribed genes, repressed chromatin: methylated DNA, H3K27 mark, deacetylated histones
Untranscribed genes, potentially active chromatin: unmethylated DNA, H3K4 mark, deacetylated histones
Transcribed genes, noncompact chromatin: acetylated histones
d. The pattern of DNA methylation can be transferred from a cell to its daughter cells. Because the methylation pattern is an important determinant of a cell's phenotype, this is an example of epigenetic inheritance.
[Figure: transmission of a CpG methylation pattern through DNA replication; Cm = 5-methyl-C, top strand / bottom strand]
Parent duplex: ...CmG... / ...GCm... (both strands methylated)
After DNA replication: ...CmG... / ...GC... + ...CG... / ...GCm... (two hemimethylated duplexes)
After CpG-specific maintenance methylation: ...CmG... / ...GCm... + ...CmG... / ...GCm... (both duplexes fully methylated)
e. During gametogenesis the methylation pattern is erased.
i. Some genes are remethylated during gametogenesis (e.g., in imprinting).
ii. Some genes are remethylated during early embryogenesis (e.g., in X-inactivation).
iii. Some genes are remethylated during cellular differentiation.
f. Aberrant DNA methylation and cancer
i. Cancer cells isolated from many types of tumors often lack one or more critical proteins that limit cell growth (the functions of many of these proteins will be discussed later in this and other courses). One mechanism that can lead to the loss of these proteins is hypermethylation of the promoter sequence. Tumor cells with gene inactivation due to promoter hypermethylation are said to have the "CpG island methylator phenotype" (CIMP), and the tumors are referred to as CIMP-positive.
ii. One example is nonpolyposis colorectal cancer in which the DNA mismatch repair protein hMLH1 is missing from the tumor cells (this protein will be discussed in detail in the DNA replication and repair lectures). In some of these cancers, the loss of hMLH1 is due to hypermethylation of the hMLH1 promoter.
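The mutagenic deamination of 5-methylcytidine described under DNA methylation (3.a) can be traced with a toy model. Deaminating a 5-methyl-C on one strand leaves a T opposite a G. If replication happens before repair, one daughter duplex inherits a C·G → T·A transition and the other is unchanged. This is a minimal sketch that assumes a simplified duplex of two equal-length strings, with the bottom strand written 3′→5′ so that each character pairs with the top-strand base at the same index. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def deaminate_methyl_c(top: str, methylated: set[int]) -> str:
    """Toy deamination: every 5-methyl-C (given by index) on the top strand becomes T."""
    return "".join("T" if i in methylated and base == "C" else base
                   for i, base in enumerate(top))


def find_mismatches(top: str, bottom: str) -> list[int]:
    """Positions where the paired bases are not Watson-Crick complements."""
    return [i for i, (t, b) in enumerate(zip(top, bottom)) if COMPLEMENT[t] != b]


def replicate_without_repair(top: str, bottom: str):
    """Semiconservative replication: each parental strand templates a new partner."""
    new_bottom = "".join(COMPLEMENT[b] for b in top)
    new_top = "".join(COMPLEMENT[b] for b in bottom)
    return (top, new_bottom), (new_top, bottom)


top, bottom = "ACGT", "TGCA"            # bottom written 3'->5', aligned with top
mutated = deaminate_methyl_c(top, {1})  # assume the C of the CpG is methylated
print(mutated, find_mismatches(mutated, bottom))  # ATGT [1]
print(replicate_without_repair(mutated, bottom))
# (('ATGT', 'TACA'), ('ACGT', 'TGCA'))
```

The first daughter carries TpG where the parent had CpG. This is why methylated CpG sites are mutational hot spots.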
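The CpG depletion of intergenic DNA and the CpG islands at 5’ gene ends (3.c) are usually quantified with the observed/expected CpG ratio. This ratio is the number of CpG dinucleotides times the sequence length, divided by the number of C times the number of G. Roughly, it is near 1 when CpG occurs as often as the base composition predicts and well below 1 for CpG-depleted DNA. The sketch below computes that ratio. The island thresholds in `looks_like_cpg_island` (at least 200 bp, GC content above 50%, ratio above 0.6) are a commonly used convention assumed here, not something these notes specify, and all function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are C or G."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("C") + seq.count("G")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: CpG count * length / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every CpG on this strand.
    cpg = seq.count("CG")
    return cpg * len(seq) / (c * g)


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Assumed, commonly used thresholds; not taken from these notes."""
    return (len(seq) >= min_len
            and gc_content(seq) > min_gc
            and cpg_obs_exp(seq) > min_ratio)


print(cpg_obs_exp("CG" * 100))             # 2.0
print(looks_like_cpg_island("CAGG" * 50))  # False
```

The second example is GC-rich (75% C + G) but contains no CpG at all. The ratio test, not GC content alone, distinguishes a CpG island from CpG-depleted DNA.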
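Epigenetic inheritance of the methylation pattern (3.d) depends on maintenance methylation acting only at hemimethylated CpG sites. The sketch below models that rule under simplifying assumptions. A duplex is represented by its top-strand sequence plus two sets of methylated C positions: index i for the C of a top-strand CpG, and index i + 1 for the bottom-strand C that pairs with the top-strand G. Replication is semiconservative, and new strands start unmethylated. `Duplex`, `replicate`, and `maintenance_methylate` are hypothetical names, not an established model or library.

```python
from dataclasses import dataclass, field


def cpg_sites(top: str) -> list[int]:
    """Indices i where the top strand reads C at i and G at i + 1."""
    top = top.upper()
    return [i for i in range(len(top) - 1) if top[i] == "C" and top[i + 1] == "G"]


@dataclass
class Duplex:
    top: str                                            # top strand, 5'->3'
    top_meth: set[int] = field(default_factory=set)     # methylated C on top strand (index i)
    bottom_meth: set[int] = field(default_factory=set)  # methylated C on bottom strand (index i + 1)


def replicate(parent: Duplex) -> tuple[Duplex, Duplex]:
    """Each daughter keeps one parental strand; the newly made strand is unmethylated."""
    d1 = Duplex(parent.top, set(parent.top_meth), set())
    d2 = Duplex(parent.top, set(), set(parent.bottom_meth))
    return d1, d2


def maintenance_methylate(d: Duplex) -> None:
    """Methylate the new strand only where the CpG is hemimethylated."""
    for i in cpg_sites(d.top):
        if i in d.top_meth and (i + 1) not in d.bottom_meth:
            d.bottom_meth.add(i + 1)
        elif (i + 1) in d.bottom_meth and i not in d.top_meth:
            d.top_meth.add(i)


# CpGs at 1, 5, and 9; only the site at 1 is methylated on both strands.
parent = Duplex("ACGTTCGAACG", top_meth={1}, bottom_meth={2})
for daughter in replicate(parent):
    maintenance_methylate(daughter)
    print(sorted(daughter.top_meth), sorted(daughter.bottom_meth))  # [1] [2]
```

Fully unmethylated CpGs (here at 5 and 9) are left alone. For that reason the rule copies the parent's pattern rather than methylating every CpG.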