A primer on the structure and function of genes

DNA is very nearly the universal genetic material

Hereditary information of all life on earth is chemically encoded in molecules of NUCLEIC ACID. Nucleic acids are linear polymers made up of monomers called NUCLEOTIDES (see figure below). Nucleotides are composed of three subunits: (i) a nitrogenous base; (ii) a pentose sugar; and (iii) a phosphate group (see figure below). Because each nucleotide is identified by its nitrogenous base, nucleotides are commonly referred to as “BASES”. Two major forms of nucleic acids serve as genetic material: DNA and RNA. DNA is very nearly the universal form of genetic material; the exceptions are a number of viruses that use RNA. Note that not all viruses use RNA; some use DNA as their genetic material. DNA differs from RNA in that its pentose is 2’-deoxyribose (i.e., there is no hydroxyl group at the 2’ position), whereas the pentose in RNA is ribose.

[Figure: a nucleotide, showing the phosphate group (P), the pentose sugar, and the nitrogenous base.]

Nucleic acids contain 4 types of bases. For DNA these bases are adenine (A), guanine (G), cytosine (C), and thymine (T). In RNA, uracil (U) is found instead of thymine (T). Nucleic acid chains are joined to one another by hydrogen bonds between specific pairs of bases; hence the length of DNA is often expressed as a number of “base pairs”. G pairs with C by means of three hydrogen bonds. In DNA, A pairs with T by two hydrogen bonds; in RNA, A pairs with U, also by two hydrogen bonds. Two chains or “strands” of DNA bonded together in this way form a double helix. Such DNA is called “double-stranded” (dsDNA).

[Figure: hydrogen bonds between guanine and cytosine (three) and between adenine and thymine (two).]

The backbone of each polynucleotide chain in nucleic acids is polarized (directional); in fact, the chains bonded together are ANTI-PARALLEL.
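The pairing rules above can be written as small lookup tables. The following Python sketch is illustrative only; the table and function names are hypothetical and not taken from any library. It finds the pairing base for each position of a strand, counts the hydrogen bonds holding a fully paired duplex together (three per G-C pair, two per A-T or A-U pair), and, because the strands are anti-parallel, reverses the complement so the partner strand reads in the same direction as the original.

```python
# Pairing partners described in the text (standard pairs only).
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Hydrogen bonds per base pair: G-C pairs form three, A-T and A-U pairs form two.
H_BONDS = {"G": 3, "C": 3, "A": 2, "T": 2, "U": 2}


def complement(strand: str, pairs: dict[str, str] = DNA_PAIRS) -> str:
    """Return the pairing base for each position, aligned with `strand`.

    Because the strands are anti-parallel, this aligned partner runs in the
    opposite chemical direction from `strand`.
    """
    return "".join(pairs[base] for base in strand.upper())


def reverse_complement(strand: str, pairs: dict[str, str] = DNA_PAIRS) -> str:
    """Return the partner strand read in the same direction as `strand` (e.g. 5'->3')."""
    return complement(strand, pairs)[::-1]


def hydrogen_bonds(strand: str) -> int:
    """Count hydrogen bonds between `strand` and its fully paired partner."""
    return sum(H_BONDS[base] for base in strand.upper())


if __name__ == "__main__":
    seq = "ATTCAGTAA"
    print(complement(seq))          # TAAGTCATT
    print(reverse_complement(seq))  # TTACTGAAT
    print(hydrogen_bonds(seq))      # 20: two G-C pairs x 3 + seven A-T pairs x 2
```

Passing `RNA_PAIRS` instead of the default applies the RNA rule, e.g. `complement("AUGC", RNA_PAIRS)` gives `"UACG"`.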
The sugar-phosphate links in the backbone are directional, with the 5’ position of one pentose ring connected to the 3’ position of the next pentose ring (see figure below). The two polynucleotide chains are bonded in anti-parallel directions. Note that some viruses have genomes that consist of only a single strand (ss); there are examples of this for both RNA and DNA.

Note: an important convention is to write out DNA in the 5’ to 3’ direction!

[Figure: two anti-parallel strands; one runs from its 5’ end to its 3’ end, and its partner runs from its 3’ end to its 5’ end.]

5’-ATTCAGTAA-3’ is NOT the same as 3’-ATTCAGTAA-5’

Some additional comments about RNA are warranted. RNA is commonly found in nature in both single- and double-stranded forms. Regions of RNA molecules, although found in the form of single polynucleotide chains, often pair up with other regions of the same chain, forming secondary structures. Also, base pairing between G and U is possible in RNA, whereas G and T do not form a standard base pair in DNA.

The structure and function of genes

In the broad sense a GENE is defined as the genetic element, transmitted from parent to offspring during the process of reproduction, that influences hereditary traits. It has been more than a century since the essential characteristics of a gene were defined by Mendel (1865). For much of that time there was no mechanistic explanation of how a gene actually functioned. It wasn’t until 1941 that Beadle and Tatum clearly showed that a genetic mutant resulted in a defective enzyme. Their findings became formalized as the one-gene, one-enzyme hypothesis. Over time it became clear that some enzymes are built from the products of more than one gene, which are assembled into a functioning enzyme; accordingly, the hypothesis was changed to one-gene, one-polypeptide. The definition later grew into one that included both the coding DNA sequence and the adjacent segments necessary for the use of that coding sequence.
For example, Benjamin Lewin defines the gene in his textbook “Genes V” as follows:

GENE: the segment of DNA involved in producing a polypeptide chain; it includes regions preceding and following the coding region (leader and trailer) as well as intervening sequences (introns) between individual coding segments (exons). [Note this is also the definition of a CISTRON.]

This is essentially the modern view of the gene in molecular biology. Although somewhat expanded from the original one-gene, one-enzyme hypothesis, it is still a deterministic view of the gene as a discrete element of DNA, as it suggests that most, if not all, of the information required to obtain the functional protein is contained in the local DNA sequence. A consequence of this view was the expectation that most functional and structural diversity arose via local changes in the DNA sequence of genes. The deterministic view of the gene was not only popular, but productive; without it we could not have identified the genetic basis of many diseases. In fact, this view of the gene was one of the motivating factors behind the huge effort and expense of the Human Genome Project (HGP). It was envisioned that knowing the sequences of genes, and then discovering genetic variation in all human genes, would lead to the discovery of the genetic basis of many more diseases as well as provide clues to treatment and cures. A highly unexpected result of the HGP was the discovery of only about 30,000 genes; far more had been believed to be required to encode all the information necessary to build a human being (the consensus opinion had been around 100,000 human genes). In the simple terms of the absolute number of genes, humans seemed not much more complex than fruit flies and roundworms (Drosophila has about 13,000 genes and Caenorhabditis about 19,000 genes), and about the same as the mouse (Mus has about 30,000 genes).
Among other things, the HGP highlighted the deficiencies of our classic view of the function of a gene and of how to define it. The HGP and other genome projects have revealed that many genes encode more than one protein. It was already well known that the products of genes could be modified at different stages during the process of producing the mature gene product; hence not all the information required to obtain the final gene product is encoded in the “gene”. Moreover, in light of the very low gene number of the human genome, it is now thought that most of the functional and structural differences between humans, chimpanzees, and even mice arose through evolutionary changes at the level of gene regulation (Clark et al. 2003). If we want to define a gene by what it does, we have to alter our way of thinking about its form in the genome.

What is a gene?
1. a unit of inheritance
2. a location on a chromosome
3. a sequence of base pairs
4. a determinant of phenotype

They are all correct. Let’s reconsider the broad sense definition of a GENE: the genetic element, transmitted from parent to offspring during the process of reproduction, that influences hereditary traits. What happens if we use such a definition, and some of the information required to achieve the final function of a protein (e.g., the propensity for a particular disease) is not encoded in the segment of DNA that encodes the protein? We can no longer assume a purely deterministic view of a gene. Although there is much interest in improving the definition of the gene, little progress has been made; we simply need more information about how the tremendous complexity and plasticity of gene expression is regulated. Recognizing that one “gene” could give rise to any number of products, with different functions, Venter et al. (2001) proposed defining a gene as a “transcriptional unit”; this does not seem to resolve the important issues. Nevertheless, in order to move forward we need one or more operational definitions.
Bearing in mind the many limitations of our definitions, we will divide genes into three broad categories: (i) protein coding genes; (ii) regulatory genes; and (iii) RNA encoding genes.

1. Protein coding genes. Genes of this type easily fit the above definition, in that they are transcribed into a messenger RNA (mRNA) that is used as a template for making a polypeptide. These genes are sometimes called STRUCTURAL GENES. Even here we can see the problems with defining a gene as a segment of DNA involved in producing a polypeptide chain, as the organization of such segments differs among eukaryotes, prokaryotes and viruses. Prokaryotic protein coding sequences are COLINEAR with the polypeptide; the sequence of nucleotides, read three at a time as codons, corresponds exactly to the sequence of amino acids in the polypeptide. Often several protein coding genes are regulated and expressed as a single unit; this is called an OPERON (see figure below). The mRNA for these adjacent coding sequences is synthesized in one piece. The operon includes regulatory sequence elements, physically located in the same region as the coding sequences, that mediate transcription of those sequences. Operons tend to be composed of genes whose functions are related. For example, it is very common for all the enzymes of a metabolic pathway to be organized into a cluster of coding sequences that are co-ordinately regulated.

[Figure: the lac operon. A regulatory gene (i) with its own promoter (Pi) lies upstream of the lac promoter (Plac), the operator, and the structural genes z, y and a.]
z = Structural gene for β-galactosidase
y = Structural gene for β-galactoside permease
a = Structural gene for β-galactoside transacetylase

Promoter: A region of DNA extending 150-300 bp upstream from the transcription start site that contains binding sites for RNA polymerase and a number of proteins that regulate the rate of transcription of the adjacent gene.
Operator: a region of DNA, near the point where transcription of the coding sequences of bacterial structural genes begins, that controls the expression of those genes via interaction with a repressor.

Eukaryotic protein coding genes differ in many ways from prokaryotic ones, the most striking difference being the presence of introns. INTRONS are regions of DNA within a protein-coding gene that do not code for amino acids; they are initially copied into the RNA, but are cut out before the final RNA transcript is produced. Some eukaryotic genes do not possess introns (e.g., histone genes) while others can have dozens. The size of the introns can be highly variable as well. The figure below presents an example of a eukaryotic protein coding gene.

[Figure: a eukaryotic protein coding gene, with upstream regulatory signals, the RNA start site, three exons (Exon 1, Exon 2, Exon 3) separated by introns, and a poly-A addition site; positions run from -220 to +2400.]

2. Regulatory signal genes. These are elements or motifs of DNA that are not transcribed, and serve as signals to regulate the processing of the DNA molecule. The prominent types of such genes are:
1. Replicator signals: These signal the initiation or termination of DNA replication. Such sites often function as binding sites for specific molecules that initiate or suppress the DNA replication process.
2. Telomeres: These are repeats of specific DNA sequences found at the ends of eukaryotic chromosomes. Because eukaryotic chromosomes are linear, having two ends, they must be “capped” so that these ends are stable. Telomeres are crucial to the life of the cell, as they function as the cap. In humans, telomeres can exist in an array of up to 2000 repeat units. Arrays of telomeres shrink in size with each round of chromosome replication, so their length imposes a finite life span on a cell.
3. Segregator signals: These determine the specific sites at which the segregation machinery of the cell attaches to the chromosomes during mitosis and meiosis.
4. Recombination signals: These are sequence elements that provide a recognition site for a recombination enzyme.

Our understanding of the diversity and evolution of regulatory genes is far less advanced than that of the other types of genes. However, the HGP illustrated the importance of gene regulation in the origin and evolution of complexity. Remember that most protein coding genes are shared by humans, chimpanzees and mice, and that divergence in the regulation of these genes is believed to be responsible for much of the difference in complexity of these organisms. Regulatory sequences offer a tremendous source of variation and of opportunities for the evolution of organismal complexity. Because regulatory genes are modular, complexity can arise from COMBINATORIAL EVOLUTION, in which case there is much less need for rare beneficial mutations.

Let’s look at an example. Consider 50 genes, each with 2 possible ways of alternatively splicing the exons; this gives us 50 × 2 = 100 possible gene products. Now suppose that mixing and matching the regulatory elements allows any 10 of these gene products to be expressed at the same time. The number of unique sets of 10 different gene products is about 1.7 × 10^13. Even if only an extremely small fraction of these gene expression patterns alter the phenotype (say 0.000001), we still have an immense number of possibilities (>17 million) to work with, all without any mutations in the protein coding sequences.

Example of combinatorial possibilities: Let’s take a look at a familiar example. Say you have a deck of 52 cards and are about to play a game of poker. You wonder how many different 5-card hands are possible. We will use the notation C(n,r) for the number of combinations of n things taken r at a time. So in this case we have n = 52 things taken r = 5 at a time.
C(n,r) = n! / ((n-r)! × r!)
C(52,5) = 52! / (47! × 5!)
C(52,5) = 2,598,960
Now in our example of gene combinations we have n = (50 × 2) = 100 gene products, taken r = 10 at a time.
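Both counts can be checked numerically before working through the hand calculation. The Python sketch below is a minimal illustration with a hypothetical function name: it evaluates C(n,r) = n! / ((n-r)! × r!) exactly using integer factorials, for the poker example and for the n = 100, r = 10 gene example. The fraction 0.000001 is the illustrative value from the text, not a measured quantity.

```python
from math import factorial


def combinations(n: int, r: int) -> int:
    """Number of ways to choose r of n items: C(n, r) = n! / ((n - r)! * r!)."""
    return factorial(n) // (factorial(n - r) * factorial(r))


# Poker: 5-card hands from a 52-card deck.
poker_hands = combinations(52, 5)

# Gene example: 50 genes x 2 splice forms = 100 products, 10 expressed at a time.
gene_sets = combinations(50 * 2, 10)

# Illustrative fraction of expression patterns assumed to alter the phenotype.
fraction_altering_phenotype = 0.000001

print(f"C(52,5)   = {poker_hands:,}")  # 2,598,960
print(f"C(100,10) = {gene_sets:,} (about {gene_sets:.4e})")
print(f"Phenotype-altering sets: about {gene_sets * fraction_altering_phenotype:,.0f}")
```

Integer factorials keep the result exact rather than rounding through floating point; `math.comb(n, r)` (Python 3.8 and later) returns the same value directly.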
C(n,r) = n! / ((n-r)! × r!)
C(100,10) = 100! / (90! × 10!)
C(100,10) = 1.7310 × 10^13

Combinatorial gene expression is well studied in the context of cell differentiation. Below is a diagram that illustrates how combinations of regulatory proteins can be used to determine the development of different cell types. In this example, differential expression of three different regulatory proteins (1, 2 and 3) leads to eight different cell types (each protein is either expressed or not, giving 2 × 2 × 2 = 8 combinations). Figure obtained from the National Health Museum (Access Excellence): http://www.accessexcellence.org

The difference in the phenotypes of these cells is due to differences in the patterns of gene expression. Imagine that, rather than relying on point mutations in proteins, we can alter the phenotype of a cell by mixing and matching the regulatory elements that control the pattern of gene expression. Mutation is an extremely slow process, but with combinatorial evolution change can be achieved much more quickly via the much faster process of recombination. The evolutionary dynamics of regulatory genes, and in particular combinatorial evolution, warrant serious attention.

3. RNA encoding genes. In contrast to protein coding genes, whose mRNA is only an intermediate, the final product of an RNA gene is the transcribed RNA itself. RNA molecules specified by such genes fold into complex structures that associate with proteins to form a sort of “chemical machine”. The three most prominent types of such RNA molecules are:
1. Transfer RNA (tRNA): Amino acids have no affinity of their own for the mRNA; hence the tRNA molecule is used as an adaptor molecule. tRNAs function to position a specific amino acid within the translation complex so that it can be added to the growing polypeptide chain.
2. Ribosomal RNA (rRNA): Ribosomal RNA combines with proteins to form the ribosome, which is the site of protein synthesis within the cell. This type of RNA makes up the vast majority of all RNA in the cell, about 95%.
3. Small nuclear RNA (snRNA): These are responsible for the processing of the mRNA molecule in the nucleus. They associate with protein molecules to form an RNA splicing complex that removes introns from mRNA. They are also important in the maintenance of the telomeres, or chromosomal ends. snRNAs are unique to eukaryotes. snRNAs are always associated with proteins in complexes called small nuclear ribonucleoproteins (snRNPs, or “snurps”).

Other types of RNA genes include small nucleolar RNA (snoRNA), microRNA, guide RNA (gRNA), and signal recognition particle RNA.

Some general features of RNA genes:
• In general, RNA-specifying genes do not contain introns and are largely similar in structure among prokaryotes and eukaryotes. There are some exceptions, e.g., ciliates, slime molds, and certain bacteria, where RNA genes contain introns that are spliced out in order to obtain a functional RNA molecule.
• Sequence elements that regulate the expression of RNA genes are sometimes found within the DNA sequence of the gene itself. Examples include the eukaryotic tRNA genes.
• Many RNA molecules are modified by incorporation of standard and non-standard nucleotides after the process of transcription is complete. Standard nucleotides can also be modified into non-standard ones.
• As we have seen in the diagrams above, folding of RNA molecules means that some sites have evolved to form base pairs with other sites within the same RNA molecule. This is called RNA SECONDARY STRUCTURE.
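Secondary structure can be illustrated with a simple check of whether two regions of one RNA chain could pair to form a double-stranded stem. The Python sketch below is a deliberate simplification with hypothetical names. It allows the standard pairs plus the G-U pair, matches the two regions anti-parallel (the first base of one region with the last base of the other), and ignores real-world constraints such as minimum loop size and pairing stability.

```python
# Pairs allowed in RNA stems: A-U, G-C, and the G-U pair.
RNA_BASE_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a: str, b: str) -> bool:
    """Return True if bases a and b can pair in RNA."""
    return (a, b) in RNA_BASE_PAIRS


def forms_stem(rna: str, start: int, end: int, length: int) -> bool:
    """Check whether rna[start:start+length] can pair with the region ending at index `end`.

    The two arms pair anti-parallel: position start + k is matched with end - k.
    Loop size and stability are not modelled (simplifying assumption).
    """
    rna = rna.upper()
    if start < 0 or end >= len(rna) or length < 1:
        return False
    if start + length - 1 >= end - length + 1:
        return False  # the two arms would overlap
    return all(can_pair(rna[start + k], rna[end - k]) for k in range(length))


if __name__ == "__main__":
    print(forms_stem("GGGAAAUCCC", 0, 9, 3))  # True: three G-C pairs around an AAAU loop
    print(forms_stem("GGAAAAUU", 0, 7, 2))    # True: two G-U pairs
    print(forms_stem("GGGAAAAAAA", 0, 9, 3))  # False: G cannot pair with A
```

Removing the two G-U entries from the set would make the second example fail, which is one way to see how much the G-U pair adds to the folding possibilities of RNA.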