Introduction to Bioinformatics and Genomics 2013
University of Brawijaya
4th December 2013
Austen Ganley, INMS
Understanding the Human Genome: Lessons from the ENCODE project
Glossary
• Genome
• Genes
• DNA/RNA
• Protein
• Cell
• Transcription
• Chromatin
• Histones
• Nucleosomes
• Non-coding RNA
• Sequencing
• Microarray
• Transcription start site
• Active/open
• Inactive/repression
[Figure: gene structure diagram labelling the promoter, transcriptional start site, exons, introns, and transcriptional terminator]
Introduction
• Individual scientists worked together
• Aim was to understand 1% of the human genome (2007), and 100% (2012)
• Looked at:
– Transcription
– Chromatin/transcription factors
– Replication
– Evolution
Genes
• Now estimated to be about 21,000 protein-coding genes (taking about 3% of the whole genome)
• In addition, there are about 9,000 microRNAs, and about 10,000 long non-coding RNAs
[Figure: Detecting transcription using tiled microarrays]
Transcription
• Transcription was measured by two different methods:
– Whole genome microarrays
– RNA-sequencing
• They found at least 62% of the whole genome is transcribed (remember, genes only account for about 3% of the whole genome)
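A figure such as "62% of the genome is transcribed" is a coverage fraction: overlapping transcripts are merged so that no base is counted twice, and the covered length is divided by the genome length. The sketch below shows that arithmetic on a toy 1,000 bp "genome". The function names and the toy intervals are assumptions made for illustration; this is not the ENCODE pipeline.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(intervals, genome_length):
    """Fraction of the genome covered by at least one interval."""
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / genome_length


# Toy data (hypothetical): a 1,000 bp genome with three transcripts and one gene.
GENOME_LENGTH = 1000
transcripts = [(0, 200), (150, 400), (500, 720)]
genes = [(100, 130)]

transcribed = covered_fraction(transcripts, GENOME_LENGTH)  # 620 / 1000
genic = covered_fraction(genes, GENOME_LENGTH)              # 30 / 1000
print(f"transcribed: {transcribed:.0%}, genes: {genic:.0%}")
```

The toy numbers were chosen to mirror the slide (62% transcribed versus 3% in genes); real coordinates would come from the array or RNA-seq data.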
Transcriptional start sites
• Goal is to identify the transcription start
sites
• Not easy to do!
• Use a technique called CAGE (Cap Analysis of Gene Expression)
CAGE
• Makes use of the 5′ cap on mRNA
• First, mRNA is reverse-transcribed to form cDNA (an RNA-DNA hybrid)
• Then, biotin is attached to the 5′ cap, and the cDNA is fragmented
• The biotin-tagged fragments are isolated (representing the 5′ end of the mRNA), and these fragments are sequenced
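Once the CAGE fragments have been sequenced and mapped, each read marks the 5′ end of a transcript, so positions where many reads pile up are candidate transcription start sites. The sketch below groups nearby 5′ ends on one strand into clusters. The function name, the 20 bp gap, and the minimum of 3 tags are illustrative assumptions, not the thresholds used by ENCODE.

```python
from collections import Counter


def call_tss_clusters(tag_starts, max_gap=20, min_tags=3):
    """Cluster CAGE tag 5' ends (single strand, single chromosome).

    Returns a list of (cluster_start, cluster_end, tag_count, dominant_position),
    keeping only clusters supported by at least `min_tags` tags.
    """
    counts = Counter(tag_starts)
    clusters, current = [], []
    for pos in sorted(counts):
        # Start a new cluster when the gap to the previous position is too large.
        if current and pos - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)

    called = []
    for cluster in clusters:
        total = sum(counts[p] for p in cluster)
        if total >= min_tags:
            # Dominant position: most tags; ties go to the leftmost position.
            dominant = max(cluster, key=lambda p: (counts[p], -p))
            called.append((cluster[0], cluster[-1], total, dominant))
    return called


# Toy 5' end positions (hypothetical).
tags = [100, 100, 101, 103, 100, 500, 900, 902, 902, 905]
clusters = call_tss_clusters(tags)
print(clusters)  # [(100, 103, 5, 100), (900, 905, 4, 902)]
```

The lone tag at position 500 is discarded as unsupported, which is the kind of filtering that makes start-site calling "not easy to do".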
• About 60,000 transcription start sites found
• Only half of these match known genes
• What do the other ones do? May explain high level of transcription
• The transcription start sites are often far upstream of the gene start, and can overlap genes
Overlapping Genes
Transcriptional start sites from the DONSON gene
• An overlapping gene, starting far upstream
• The DONSON gene is a known gene
• However, some transcripts start in the ATP5O gene, and include some ATP5O exons
• Two intervening genes are skipped
Chromatin: histones and nucleosomes
• Nucleosomes are formed from DNA that is packaged around histones
• Histones are a set of proteins that usually associate as an octamer
www.mun.ca/biochem/courses/3107/Topics/supercoiling.html
www.palaeos.com/Eukarya/Eukarya.Origins.5.html
DNase I hypersensitive sites (DHS)
Hebbes Lab, University of Portsmouth, UK
Gilbert, Developmental Biology, Sinauer
• DNase I preferentially digests nucleosome-depleted regions (DNase I hypersensitive sites)
• These are associated with gene transcription
• Chromatin is digested with DNase I, which only digests nucleosome-free regions
• The remaining DNA is isolated, and put on a microarray or sequenced
• This finds the open, active regions of the genome
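After sequencing, hypersensitive sites show up as stretches where DNase I cuts are unusually dense. The sketch below scans a per-position cut count with a sliding window and merges the windows that pass a threshold. The function name, window size, and threshold are assumptions chosen for the toy data, not a published peak caller.

```python
def find_hypersensitive_windows(cut_counts, window=5, threshold=10):
    """Return merged half-open (start, end) regions whose windowed cut count
    is at least `threshold`.

    cut_counts: list of DNase I cut counts, one per position (or per bin).
    """
    regions = []
    n = len(cut_counts)
    window_sum = sum(cut_counts[:window])
    for start in range(n - window + 1):
        if start > 0:
            # Slide the window one step: add the new position, drop the old one.
            window_sum += cut_counts[start + window - 1] - cut_counts[start - 1]
        if window_sum >= threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend the previous region
            else:
                regions.append((start, end))
    return regions


# Toy signal (hypothetical): a burst of cuts at positions 10-12.
cuts = [0] * 10 + [5, 5, 5] + [0] * 10
print(find_hypersensitive_windows(cuts))  # [(7, 16)]
```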
DNase I hypersensitive sites
• In total, about 3 million DNase I hypersensitive sites in the genome, covering about 15% (versus about 40,000 genes covering about 4%)
• Transcriptional start sites are regions of DNase I hypersensitivity, as expected
• Most DNase I hypersensitive sites are not associated with a transcriptional start site, though
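Whether a hypersensitive site is "associated with" a start site comes down to a distance test against the nearest TSS. The sketch below labels each site as proximal or distal. The 2,500 bp cut-off, the function name, and the coordinates are assumptions for illustration only.

```python
import bisect


def classify_dhs(dhs_sites, tss_positions, max_distance=2500):
    """Split DNase I hypersensitive sites into TSS-proximal and distal sets.

    dhs_sites: list of half-open (start, end) intervals on one chromosome.
    tss_positions: list of TSS coordinates on the same chromosome.
    """
    tss_sorted = sorted(tss_positions)
    result = {"proximal": [], "distal": []}
    for start, end in dhs_sites:
        midpoint = (start + end) // 2
        # Only the TSSs immediately left and right of the midpoint can be nearest.
        i = bisect.bisect_left(tss_sorted, midpoint)
        neighbours = tss_sorted[max(i - 1, 0):i + 1]
        nearest = min((abs(midpoint - t) for t in neighbours), default=None)
        is_proximal = nearest is not None and nearest <= max_distance
        result["proximal" if is_proximal else "distal"].append((start, end))
    return result


# Toy coordinates (hypothetical).
tss = [10000, 50000]
dhs = [(9800, 10200), (30000, 30400), (52000, 52300)]
classified = classify_dhs(dhs, tss)
print(classified)
# {'proximal': [(9800, 10200), (52000, 52300)], 'distal': [(30000, 30400)]}
```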
[Figure: genome diagram showing genes, transcription start sites, transcribed regions, and DNase I hypersensitive regions]
Histone Modification Effects
• Modifications occur on the histone tails
• They alter the strength of DNA-histone binding, and influence the binding of other proteins to the DNA
• Thus they can activate or silence gene expression
The “Histone Code”
• The combination of histone modifications determines a gene’s transcriptional status (the histone code)
• Some modifications are associated with active gene expression
– H3K4me2
– H3K4me3
– H3ac
– H4ac
• Some with repression
– H3K27me3
– H3K4me1
www.nature.com/nrm/index.html
ChIP (Chromatin immunoprecipitation)
• Method to find where your protein of interest binds
• You cross-link the sample, and fragment the DNA into pieces
• Immunoprecipitate using an antibody to your protein of interest
• Reverse the cross-links, and isolate the DNA
• To find where in the genome the protein was bound:
– Hybridise the DNA to a microarray (ChIP-chip) OR sequence it (ChIP-seq)
www.rndsystems.com/product_detail_objectname_exactachip_assayprinciple.aspx
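For ChIP-seq, "where the protein was bound" is usually read off as enrichment of ChIP reads over a control (input) sample in each genomic bin. The sketch below computes a library-size-normalised ratio per bin. The function name, the pseudocount of 1, and the read counts are assumptions for illustration; real peak callers use more careful statistics.

```python
def fold_enrichment(chip_counts, input_counts, pseudocount=1.0):
    """Per-bin ChIP/input ratio, with each library scaled by its total reads.

    The pseudocount avoids division by zero in bins with no input reads.
    """
    if len(chip_counts) != len(input_counts):
        raise ValueError("ChIP and input must cover the same bins")
    chip_total = sum(chip_counts)
    input_total = sum(input_counts)
    if chip_total == 0 or input_total == 0:
        raise ValueError("both libraries need at least one read")

    ratios = []
    for chip, control in zip(chip_counts, input_counts):
        chip_fraction = (chip + pseudocount) / chip_total
        input_fraction = (control + pseudocount) / input_total
        ratios.append(chip_fraction / input_fraction)
    return ratios


# Toy read counts in four bins (hypothetical).
chip_reads = [5, 5, 85, 5]
input_reads = [25, 25, 25, 25]
enrichment = fold_enrichment(chip_reads, input_reads)
best_bin = max(range(len(enrichment)), key=enrichment.__getitem__)
print(best_bin, round(enrichment[best_bin], 2))  # 2 3.31
```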
Histone modification profiles
• They found that histone modifications associated with active transcription were enriched around transcription start sites
• They found that histone modifications associated with gene repression were depleted around transcription start sites
• This is as expected
• Around DNase I hypersensitive sites not near transcription start sites, they found almost the opposite pattern
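Enrichment or depletion "around" a set of sites is typically shown as an aggregate profile: the signal for one histone mark is averaged across all sites at each offset from the anchor. The sketch below does this for a per-position signal. The function name, flank size, and toy signal are assumptions for illustration.

```python
def average_profile(signal, anchors, flank=2):
    """Mean signal at offsets -flank..+flank around each anchor position.

    Anchors too close to either end of the signal are skipped.
    """
    usable = [a for a in anchors if a - flank >= 0 and a + flank < len(signal)]
    if not usable:
        raise ValueError("no anchor has a full flanking window")
    profile = []
    for offset in range(-flank, flank + 1):
        profile.append(sum(signal[a + offset] for a in usable) / len(usable))
    return profile


# Toy signal for one mark (hypothetical), peaking at two start sites.
mark_signal = [0, 0, 1, 4, 1, 0, 0, 0, 1, 4, 1, 0, 0]
tss_anchors = [3, 9]
profile = average_profile(mark_signal, tss_anchors)
print(profile)  # [0.0, 1.0, 4.0, 1.0, 0.0]
```

Running the same function with distal DNase I hypersensitive sites as anchors would give the second profile to compare against.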
[Figure: Enrichment of active histone marks and depletion of inactive histone marks at a transcription start site]
[Figure: Enrichment of inactive histone marks but little enrichment of active histone marks at a DNase I hypersensitive site]
Histone modification profiles
• They also found other patterns
• Combining all the results (plus results for transcription factor binding), they say that the human genome is divided into seven different types of chromatin states
• Which state a region is in depends on what combination of histone modifications/transcription factor binding is present
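The idea that a region's state follows from the combination of marks present can be sketched as a segmentation: consecutive genome bins carrying the same combination are joined into one segment. This is a deliberately simple stand-in written for illustration; it is not the statistical method ENCODE used to derive the seven states, and the function name and bin contents are assumptions.

```python
from itertools import groupby


def segment_by_mark_combination(bins):
    """Join consecutive bins that carry the same set of marks.

    bins: list of sets of mark names, one per consecutive genome bin.
    Returns a list of (first_bin, end_bin_exclusive, frozenset_of_marks).
    """
    segments = []
    index = 0
    for combination, run in groupby(frozenset(b) for b in bins):
        length = sum(1 for _ in run)
        segments.append((index, index + length, combination))
        index += length
    return segments


# Toy bins (hypothetical): two bins with active marks, one empty, two repressed.
genome_bins = [
    {"H3K4me3", "H3ac"},
    {"H3ac", "H3K4me3"},
    set(),
    {"H3K27me3"},
    {"H3K27me3"},
]
segments = segment_by_mark_combination(genome_bins)
for first, end, marks in segments:
    print(first, end, sorted(marks))
```

Each distinct combination then plays the role of a candidate "state"; a real analysis would group similar combinations so that only a handful of states remain.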
The seven chromatin states
[Figure: the seven chromatin states, including Promoter (red), Enhancer (yellow), Gene body (green), and Inactive region (grey)]
Grand Summary: ENCODE
Transcription:
• a lot of non-coding transcription (~60% of the genome transcribed), much more than needed just to transcribe all the genes
Transcription start sites:
• Twice as many transcription start sites as traditional “genes”
• transcripts span large regions, even between genes
DNase I hypersensitive sites:
• more than just at transcription start sites
• two types: those found at TSS, and those found at other regions
• these have different chromatin profiles
Histone modifications:
• active marks correlate with TSS/DHS
• distal DHS have a different histone modification profile
Chromatin states:
• The genome can be divided into seven different types
• these are determined by the combination of histone modifications and transcription factor binding that occur
Overview:
• genome can be generalised into seven different states
• the function of some of these states is known, e.g. promoter
• the function of others is not known, but may explain the high level of transcription and open chromatin structure