Human Molecular Genetics
Institute of Medical Genetics

Outline of this chapter
- Definition
- Structure
- Organization

Gene
Molecular definition: a DNA sequence encoding a protein.
What are the problems with this definition?

Gene definition caveats
- Some genomes are RNA instead of DNA.
- Some gene products are RNA (tRNA, rRNA, and others) instead of protein.
- Some nucleic acid sequences that do not encode gene products (noncoding regions) are necessary for production of the gene product (RNA or protein).

Gene
- A gene is a segment of DNA encoding information that leads to a functional product (RNA or polypeptide chain).
- The most important feature of a gene is that it must code for a functional product.
- Early estimates put the human genome at 30,000 to 35,000 genes; current annotations list roughly 20,000 protein-coding genes.

Hybridization of mRNA and DNA
- Eukaryotic genes are split genes.
- A gene includes a coding region and noncoding regions.

A "Simple" Eukaryotic Gene
[Figure: gene layout, 5' to 3': promoter/control region, transcription start site, 5' untranslated region, exon 1, intron 1, exon 2, intron 2, exon 3, 3' untranslated region, terminator sequence. RNA transcript, 5' to 3': exon 1, intron 1, exon 2, intron 2, exon 3.]

Gene Structure
- Exons
- Introns
- Splicing junction
- Regulatory sequences: promoter/proximal control elements, enhancer/silencer, terminator

Exons
A segment of a gene that is decoded to give an mRNA product or a mature RNA product. Individual exons may contain coding DNA or noncoding DNA (untranslated sequences, UTS).

Coding region
The nucleotides (open reading frame) encoding the amino acid sequence of a protein.

Introns
Noncoding DNA that separates neighboring exons in a gene. During gene expression introns, like exons, are transcribed into RNA, but the transcribed intron sequences are subsequently removed by RNA splicing and are not present in mRNA.

Splice junction (exon/intron boundary)
- Splice donor site: the junction between the end of an exon and the start of the downstream intron, which begins with the dinucleotide GT.
- Splice acceptor site: the junction between the end of an intron, which terminates in the dinucleotide AG, and the start of the next exon.
- Branch site: the third conserved intronic sequence known to be functionally important in splicing.
- Consensus sequences are conserved throughout eukaryotes.
- Conservation of sequence is expected, since the sequences are recognized by base pairing with the RNA component of snRNPs.
[Figure: secondary structure model of human U1 snRNP, showing the region that recognizes the pre-mRNA.]

Regulatory Sequences
- 5' untranscribed region: signals for initiation and control of transcription
  - Promoter/proximal elements
  - Enhancer/silencer: an enhancer stimulates transcription; a silencer inhibits transcription
- 3' untranscribed region: signals for termination of transcription

Regulatory Sequences: Promoter/Proximal Elements
- Occur within ~200 bp of the start site
- Contain up to ~20 bp
- Cell-type specific

Basal Promoter Analysis
Element    Sequence    Position   Factor
GC box     GCCACACCC   -90        SP1
CAAT box   GGCCAATC    -75        CTF/NF1
TATA box   ATATAA      -30        TBP
(+1 marks the transcription start site.)

Promoter-Proximal Elements
- TATA box: most common; found in highly transcribed genes; 25-35 base pairs upstream of the start site
- Initiator: at the start site
- GC boxes (CpG islands): "housekeeping" genes (transcribed at a low rate); within ~100 base pairs of the start site

TATA box
- ~25 bp upstream of +1
- The only promoter element that is relatively fixed in position relative to the start point
- Tends to be surrounded by GC-rich sequences
- Single-base substitutions in the TATA box act as strong promoter-down mutations
- Some promoters do not contain a TATA box

Initiator
Instead of a TATA box, some eukaryotic genes contain an alternative promoter element called an initiator. The initiator sequence is highly degenerate.
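To make "degenerate" concrete, here is a minimal Python sketch that searches a DNA string for the commonly quoted initiator consensus 5' Y Y A N T/A Y Y Y 3' (A at +1). The one-letter code W for the T/A position, the function names, and the test sequence are assumptions introduced for illustration; they are not part of the lecture.

```python
import re

# Ambiguity codes for the consensus. "W" (A or T) is an assumption used
# here to stand in for the "T/A" position of the consensus.
CODES = {"A": "A", "C": "C", "G": "G", "T": "T",
         "Y": "[CT]", "W": "[AT]", "N": "[ACGT]"}

INR_CONSENSUS = "YYANWYYY"  # the A is the +1 position


def consensus_to_regex(consensus):
    """Translate a degenerate consensus into a regular-expression string."""
    return "".join(CODES[base] for base in consensus)


def find_initiators(dna, consensus=INR_CONSENSUS):
    """Return 0-based indices of the +1 'A' for every consensus match."""
    pattern = consensus_to_regex(consensus)
    offset = consensus.index("A")
    # The lookahead reports overlapping matches as well.
    return [m.start() + offset
            for m in re.finditer("(?=%s)" % pattern, dna.upper())]


def count_matching_sequences(consensus=INR_CONSENSUS):
    """Number of distinct sequences that satisfy the consensus."""
    choices = {"Y": 2, "W": 2, "N": 4}
    total = 1
    for base in consensus:
        total *= choices.get(base, 1)
    return total


print(find_initiators("GGCTACTCTTGG"))   # hypothetical test sequence
print(count_matching_sequences())
```

Because only the A is fixed, many different 8-mers satisfy the pattern, which is why a match on its own is weak evidence of a real start site.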
Initiator consensus (A at position +1): 5' Y Y A N T/A Y Y Y 3'
Y = pyrimidine (C or T); N = any nucleotide

CpG island
- Genes coding for intermediary metabolism are transcribed at low rates and do not contain a TATA box or initiator.
- Most genes of this type contain a CG-rich stretch of 20-50 nt within ~100 bp upstream of the start site region.
- A transcription factor called SP1 recognizes these CG-rich regions.
- This arrangement gives multiple alternative mRNA start sites.
[Figure: CpG island within ~100 bp of the mRNA start region, with multiple 5' start sites.]

Enhancers
- Can be located several kb from the promoter
- Can be present in either orientation relative to the promoter
- Contain elements that bind inducible factors
- Usually ~100-200 bp long, containing multiple 8- to 20-bp control elements
- Targets for tissue-specific and/or temporal regulation
[Figure: enhancer at a variable distance from the promoter, in either orientation, upstream or downstream of the gene.]

Termination
- RNA polymerase meets the terminator.
- Terminator sequence: AAUAAA (in eukaryotes, the polyadenylation signal)
- RNA polymerase releases from the DNA.
- Prokaryotes: release at the termination signal
- Eukaryotes: release 10-35 base pairs after the termination signal

Termination
Different mechanisms of termination:
- Prokaryotes
  - rho-independent termination: formation of a hairpin structure
  - rho-dependent termination: an external protein disrupts transcription
- Eukaryotes: cleavage of the RNA by an external protein
[Figure: rho-independent terminator.]

Distribution
- Different density of genes along a chromosome
- Different density of genes between chromosomes

(exon-intron-exon)n structure of various genes
Gene           Total (bp)   Exons (bp)
histone        400          400
β-globin       1,660        990
HGPRT (HPRT)   42,830       1,263
factor VIII    ~186,000     ~9,000

Genes
- Protein-coding genes
- RNA genes: rRNA, tRNA, snRNA, snoRNA, ...

"Average" gene organization
Single, unique genes consisting of exons interrupted by introns only.

Other gene organizations
Dispersed gene segments brought together by genome reorganization in specialized cells.
Example: the gene for the T-cell receptor β protein in T cells.

Light Chain Gene Families
Germ line gene organization
- Lambda light chain genes; n = 30
- Kappa light chain genes; n = 300
[Figure: lambda locus: V segments V1, V2 ... Vn, each with its own L and P elements, followed by four J-C pairs (J1-C1 to J4-C4) with E elements. Kappa locus: V segments V1, V2 ... Vn, each with its own L and P elements, followed by J1 to J5, a single C gene, and an E element.]

Light Chain Gene Families
Gene rearrangement and expression
[Figure: germ-line DNA -> DNA rearrangement (V-J joining) -> transcription -> primary transcript RNA (L V J C) -> RNA processing -> mRNA (L V J C) -> translation -> protein -> transport to ER -> mature protein (V C).]

Heavy Chain Gene Family
Germ line gene organization
- Heavy chain genes; Vn = 1000, Dn = 15
- Introns separate the exons coding for the H chain domains (CH1, H, CH2, CH3, CH4).
[Figure: germ-line heavy chain locus: V segments V1, V2 ... Vn (each with L and P elements), D segments D1, D2, D3 ... Dn, J segments J1 to J5, an E element, and a series of C genes.]

Heavy Chain Gene Family
Gene rearrangement and expression
[Figure: germ-line DNA -> DJ rearrangement (for example D2 joined to J4) -> VDJ rearrangement (for example V2 joined to D2-J4) -> transcription -> primary transcript RNA.]

Other gene organizations
Overlapping genes
  DNA:     G T T T A T G G T A
  Gene 1:          A T G  G T A   (met val)
  Gene 2:  G T T  T A T  G G T    (val tyr gly)

Other gene organizations
Genes within genes
- It is not uncommon for short genes to be located inside an intron of another gene.
- Intron 26 of the NF1 gene contains three internal genes.

Other gene organizations
Gene families: functionally similar or identical genes repeated on the same or different chromosomes
- Example 1: genes for histones and ribosomal RNA (rRNA)
- Example 2: the globin families

Gene families defined by conserved amino acid motifs
- DEAD box
- WD repeat families

Clustered gene families
Family                Copies   Cluster size or number
Growth hormone        5        67 kb
α-globin              7        50 kb
Hox genes             38       four clusters
Olfactory receptors   1000     25 large clusters

Interspersed gene families
Family                   Copies
Pax                      9
Actin                    >20
Alu elements (repeats)   1.1 million
LINE elements (L1)       200,000-500,000

Pseudogenes
- Nonfunctional copies of genes
- Formed by duplication of an ancestral gene, or by reverse transcription (and integration)
- Not expressed because of mutations that produce a stop codon (nonsense or frameshift) or prevent mRNA processing, or because of a lack of regulatory sequences
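The stop-codon point above can be illustrated with a short Python sketch that scans a coding sequence in frame and reports whether a stop codon appears before the final codon. The function names and the three example sequences are hypothetical, made up for illustration; real pseudogene annotation relies on far more evidence than this single check.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds):
    """Return the 0-based codon index of the first in-frame stop, or None."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def has_premature_stop(cds):
    """True if an in-frame stop codon occurs before the final codon."""
    n_codons = len(cds) // 3
    stop = first_stop_codon(cds)
    return stop is not None and stop < n_codons - 1


# Hypothetical sequences, not taken from any real gene.
functional = "ATGGCTGACCAATAA"   # ATG GCT GAC CAA TAA
nonsense = "ATGGCTGACTAATAA"     # CAA -> TAA point mutation
frameshift = "ATGTGCTGACCAATAA"  # one inserted T shifts the reading frame

for name, seq in [("functional", functional),
                  ("nonsense", nonsense),
                  ("frameshift", frameshift)]:
    print(name, has_premature_stop(seq))
```

In the frameshift example the inserted base moves a TGA triplet into frame, which shows how an insertion or deletion can truncate the product just as a nonsense mutation does.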