Understanding the Human Genome: Lessons from the ENCODE project
Introduction to Bioinformatics and Genomics 2013
University of Brawijaya, 4th December 2013
Austen Ganley, INMS

Glossary
• Genome
• Genes
• DNA/RNA
• Protein
• Cell
• Transcription
• Chromatin
• Histones
• Nucleosomes
• Non-coding RNA
• Sequencing
• Microarray
• Transcription start site
• Active/open
• Inactive/repression
[Figure: gene structure, labelled with promoter, transcriptional start site, exon, intron and transcriptional terminator]

Introduction
• Individual scientists worked together
• Aim was to understand 1% of the human genome (2007), and 100% (2012)
• Looked at:
  – Transcription
  – Chromatin/transcription factors
  – Replication
  – Evolution

Genes
• Now estimated to be about 21,000 protein-coding genes (taking about 3% of the whole genome)
• In addition, there are about 9,000 microRNAs, and about 10,000 long non-coding RNAs

Transcription
• Transcription was measured by two different methods:
  – Whole genome microarrays
  – RNA-sequencing
• They found at least 62% of the whole genome is transcribed (remember, genes only account for about 3% of the whole genome)
[Figure: Detecting transcription using tiled microarrays]

Transcriptional start sites
• Goal is to identify the transcription start sites
• Not easy to do!
• Use a technique called CAGE (Cap Analysis Gene Expression)

CAGE
• Makes use of the 5’ CAP on mRNA
• First, mRNA is reverse-transcribed, to form cDNA (RNA-DNA hybrid)
• Then, biotin is attached to the 5’ CAP, and the cDNA is fragmented
• The biotin fragments are isolated (representing the 5’ end of mRNA), and these fragments are sequenced
• About 60,000 transcription start sites found
• Only half of these match known genes
• What do the other ones do? May explain the high level of transcription
• The transcription start sites are often far upstream of the gene start, and can overlap genes

Overlapping Genes
[Figure: Transcriptional start sites from the DONSON gene]
• An overlapping gene, starting far upstream
• The DONSON gene is a known gene
• However, some transcripts start in the ATP5O gene, and include some ATP5O exons
• Two genes are skipped over

Chromatin: histones and nucleosomes
• Nucleosomes are formed from DNA that is packaged around histones
• Histones are a set of proteins that usually associate as an octamer
Image sources: www.mun.ca/biochem/courses/3107/Topics/supercoiling.html, www.palaeos.com/Eukarya/Eukarya.Origins.5.html

DNase I hypersensitive sites (DHS)
Image sources: Hebbes Lab, University of Portsmouth, UK; Gilbert, Developmental Biology, Sinauer
• DNase I preferentially digests nucleosome-depleted regions (DNase I hypersensitive sites)
• These are associated with gene transcription
• Chromatin is digested with DNase I: only digests nucleosome-free regions
• The remaining DNA is isolated, and put on a microarray or sequenced
• Find the open, active regions of the genome

DNase I hypersensitive sites
• In total, about 3 million DNase I hypersensitive sites in the genome, covering about 15% (versus about 40,000 genes covering about 4%)
• Transcriptional start sites are regions of DNase I hypersensitivity, as expected
• Most DNase I hypersensitive sites are not associated with a transcriptional start site, though
[Figure: genome tracks showing transcription start sites, genes, transcribed regions and DNase I hypersensitive regions]

Histone Modification Effects
• Modifications occur on the histone tails
• They alter the strength of DNA-histone binding, and influence the binding of other proteins to the DNA
• Thus they can activate or silence gene expression

The “Histone Code”
• The combination of histone modifications determines a gene’s transcriptional status (the histone code)
• Some modifications are associated with active gene expression
  – H3K4me2
  – H3K4me3
  – H3ac
  – H4ac
• Some with repression
  – H3K27me3
  – H3K4me1
Image source: www.nature.com/nrm/index.html

ChIP (Chromatin immunoprecipitation)
• Method to find where your protein of interest binds
• You cross-link the sample, and fragment the DNA into pieces
• Immunoprecipitate using an antibody to your protein of interest
• Reverse the cross-links, and isolate the DNA
• To find where in the genome the protein was bound:
  – Hybridise the DNA to a microarray (ChIP-chip) OR sequence it (ChIP-seq)
Image source: www.rndsystems.com/product_detail_objectname_exactachip_assayprinciple.aspx

Histone modification profiles
• They found that histone modifications associated with active transcription were found around transcription start sites
• They found that histone modifications associated with gene repression were depleted around transcription start sites
• This is as expected
• Around DNase I hypersensitive sites not near transcription start sites, they found almost the opposite pattern
[Figure: Enrichment of active histone marks and depletion of inactive histone marks at a transcription start site]
[Figure: Enrichment of inactive histone marks but little enrichment of active histone marks at a DNase I hypersensitive site]

Histone modification profiles
• They also found other patterns
• Combining all the results (plus results for transcription factor binding), they say that the human genome is divided into seven different types of chromatin states
• Which state it is depends on what combination of histone modifications/transcription factor binding there is

The seven chromatin states
[Figure legend: Promoter (red), Enhancer (yellow), Gene body (green), Inactive region (grey)]

Grand Summary
Transcription:
• a lot of non-coding transcription (~60% of the genome transcribed), much more than needed just to transcribe all the genes
Transcription start sites:
• Twice as many transcription start sites as traditional “genes”
• transcripts span large regions, even between genes
DNase I hypersensitive sites:
• more than just at transcription start sites
• two types: those found at TSS, and those found at other regions
• these have different chromatin profiles
ENCODE Overview:
• genome can be generalised into seven different states
• the function of some of these states is known, e.g. promoter
• the function of others is not known, but may explain the high level of transcription and open chromatin structure
Chromatin states:
• The genome can be divided into seven different types
• these are determined by the combination of histone modifications and transcription factor binding that occur
Histone modifications:
• active marks correlate with TSS/DHS
• distal DHS have a different histone modification profile
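
Worked example (not from the original slides): the split between DNase I hypersensitive sites at transcription start sites and distal ones comes down to an interval-overlap test. The Python sketch below is a minimal illustration under stated assumptions: all features are on a single chromosome, each DHS is a closed (start, end) interval, and each TSS is a single coordinate. The function name split_dhs_by_tss and the toy coordinates are invented for this example and are not ENCODE data or an ENCODE tool.

```python
from bisect import bisect_left


def split_dhs_by_tss(dhs_intervals, tss_positions):
    """Split DHS intervals into those containing a TSS and distal ones.

    dhs_intervals: list of (start, end) tuples, closed intervals (assumed).
    tss_positions: list of integer coordinates on the same chromosome (assumed).
    """
    tss_sorted = sorted(tss_positions)
    at_tss, distal = [], []
    for start, end in dhs_intervals:
        # Find the first TSS located at or after the DHS start.
        i = bisect_left(tss_sorted, start)
        # The DHS contains a TSS if that TSS also lies at or before the DHS end.
        if i < len(tss_sorted) and tss_sorted[i] <= end:
            at_tss.append((start, end))
        else:
            distal.append((start, end))
    return at_tss, distal


# Toy coordinates, invented purely for illustration.
dhs = [(100, 200), (500, 650), (900, 950), (1500, 1600)]
tss = [150, 920]

at_tss, distal = split_dhs_by_tss(dhs, tss)
print("DHS at a TSS:", at_tss)
print("Distal DHS:  ", distal)
```

With real data, the two resulting lists would be the groups whose histone modification profiles are then compared; a per-chromosome dictionary of sorted TSS positions would be one way to extend the sketch to a whole genome.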