Download The genetic code of gene regulatory elements

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

X-inactivation wikipedia , lookup

Protein moonlighting wikipedia , lookup

Genomic imprinting wikipedia , lookup

List of types of proteins wikipedia , lookup

Non-coding DNA wikipedia , lookup

Molecular evolution wikipedia , lookup

Community fingerprinting wikipedia , lookup

Vectors in gene therapy wikipedia , lookup

Gene expression wikipedia , lookup

Gene therapy of the human retina wikipedia , lookup

RNA-Seq wikipedia , lookup

Promoter (genetics) wikipedia , lookup

Gene therapy wikipedia , lookup

Gene expression profiling wikipedia , lookup

Genome evolution wikipedia , lookup

Gene nomenclature wikipedia , lookup

Transcriptional regulation wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Gene regulatory network wikipedia , lookup

Silencer (genetics) wikipedia , lookup

Gene desert wikipedia , lookup

Transcript
The genetic code of gene regulatory elements
Ivan Ovcharenko
Computational Biology Branch
National Center for Biotechnology Information
National Institutes of Health
October 23, 2008
Outline
• Gene deserts and distant gene regulation
• Genetic encryption of gene regulation
• Heart regulatory code
• Regulation of regulators:
how transcription factors regulate themselves
The Genome Sequence: The Ultimate Code of Life
3 billion letters
~ 45% is “junk” (repetitive elements)
~ 3% is coding for proteins
gene regulatory elements (REs) reside
SOMEWHERE in the rest ~50%
Hirschprung disease is associated with a noncoding SNP
RET
Comparative Sequence Analysis
Biologically functional regions in the genome tend to stay conserved throughout the evolution.
Therefore, by aligning homologous sequences from different, but related species we can identify
Evolutionary Conserved Regions (ECRs) with a putative functional importance
1880th
1920th
1950th
2000th
In vivo validation of a regulatory element
E1
IL4
IL13
RAD50
IL5
E
1
E1 genome
deletion
wild type
•
•
Pg/ml
wt
wt
wt
•
deletion
deletion
0
0
IL4
IL13
Loots et al., Science, 2001
deletion
•
0
IL5
Regulatory elements (REs) orchestrate temporal
and spatial expression of genes
Genetic deletion of E1 element decreased the
expression of IL4, IL13, and IL5 3-fold
REs were identified for a handful of genes in the
human genome
Knock out of a single candidate RE can take up to
2 years…
Сomparative genomics to predict regulatory elements
E1
•
•
•
Functionally important elements in genomes mutate at slower rates than the
neutrally evolving background
– 1% of sequence is conserved between humans and fish
– 75% of genes are conserved between humans and fish
RE E1 is highly conserved in human and mouse genomes.
Comparative genomics can be utilized to prioritize functional elements
Gene Deserts in the Human Genome
Gene deserts = 25% of the human genome sequence
Human chromosome 13
Gene deserts
~500 gene deserts in the human genome
~50% of the HSA13 consists of gene deserts
Gene deserts are NOT enriched in repetitive
elements
DACH gene deserts on chromosome 13
1.3Mb
Over 1,000 human/mouse ncECRs
430 kb
0.9Mb
Phylogenetic conservation of DACH gene deserts
Nobrega et al., Science, 2004
ncECR
Prom
Reporter gene
Dichotomy in the evolutionary conservation of gene deserts
~200 stable
gene deserts
- DACH
- OTX2
- SOX2
-vs-
- Deletion does not lead to a phenotype
~300 variable
gene deserts
Function of genes flanking gene deserts
stable gene deserts
- transcription
- regulation of transcription
- regulation of metabolism
- development
etc.
variable gene deserts
Distant gene regulation
1. Do stable gene deserts harbor distant regulatory elements?
gene
RE
2. Does the distance between a RE and a gene matter?
gene
RE
gene
RE
gene
OR
RE
Chromosomal stability of gene deserts
gene
RE
Distant REs are distance-independent
gene
RE
gene
RE
gene
OR
RE
Gene deserts :: Summary
25% of the human genome consists of gene deserts
There are 2 classes of gene deserts – stable and variable gene deserts
Stable gene deserts are evolutionarily protected from chromosome rearrangements
indicating presence of distant regulatory elements
Experimental validation of deeply conserved sequences in some stable gene deserts
confirms their enhancer activity
Gene regulation based on distant regulatory elements does not probably depend on
the distance between a regulatory element and the gene it regulates
Outline
• Gene deserts and distant gene regulation
• Genetic encryption of gene regulation
• Heart regulatory code
• Regulation of regulators:
how transcription factors regulate themselves
Patterns of transcription factor binding sites
define biological functions of REs
•
•
•
Protein A
Transcription factor (TF) proteins bind to very
short (6 to 10 nucleotides) binding sites (TFBS)
Combinatorial binding of multiple TFs to a RE
defines a specific pattern of gene expression
Correlating patterns of TFBS in REs with
biological function will decode gene regulation
Protein B
Protein C
GENE
DNA
TFBS
TFBS
TFBS
aCTGACT
CTGACTgaaaa
gaaaaCTGATATTG
CTGATATTGacagt
acagtTTGTTGTTG
TTGTTGTTGttaa
ttaa
REGULATORY ELEMENT (RE)
Limbs
regulatory
module A
Reporter Gene
CNS
regulatory
module B
Reporter Gene
Computational identification of TFBS
Protein A
Protein B
Protein C
GENE
TFBS
TFBS
TFBS
aCTGACT
CTGACTgaaaa
gaaaaCTGATATTG
CTGATATTGacagt
acagtTTGTTGTTG
TTGTTGTTGttaa
ttaa
REGULATORY ELEMENT (RE)
-- Transcription factor bindings sites are very short (~ 6-10 bp)
-- Computational predictions of transcription factor binding sites are overwhelmed with
false positives
Deciphering the genetic code of enhancers
ncECR
ncECR
ncECR
GENE
ncECR
ncECR
Enhancer Identification (EI) method
GNF Expression Atlas 2
15k human genes
79 human tissues
Tissue A
300 highly expressed genes
Selecting candidate
enhancers based on
noncoding conservation
Mapping conserved
TFBS in candidate
enhancers
15K genes
3,000 lowly expressed genes
Legend
Predicting tissue-specific
enhancers
Assigning tissue
weights to TFs
Highly expressed gene
Lowly expressed gene
Promoter
3+ top noncoding ECR
X
No prediction
TFBS
TFBS signatures in
candidate enhancers
defining tissue-specific
gene expression
X
X
TF1
-vs-
TF2
X XX
…
TFN
1.4
tissue1
0.5
tissue2
…
tissueM
-1.2
ina t rac h
l ga ea
ng
lion
s ki
t hy n
mu
pa
ncr s
ad
ren
ea
s
bo a l gla
ne
nd
ma
rro
pa
ncr
ut e w
ea
r
t ic u s
isle
t on ts
f et gu e
al l
iv e
s ke
r
let
l
un
al
g
mu
s
pr o c le
st a
te
he
ad
a
ip o rt
cy t
k id e
ne
ca
rdi
t hy y
ac
my r oid
br o
oc
n
c
BM
yt e
hia
s
-CD l e
p
ith
l
71
i
v
er
+ e elia
l
arl
y e c ells
r
wh yt hro
ole
id
B M bloo
-CD d
PB
B
34
- BD M-C
+
t es
CA D 33
+ m t is
4+
de
ye
n
s m tritic loid
oo
th c ells
mu
s cl
PB
e
p
-C
D1 lac en
t
P B 9+
B a
-C
D8 ce lls
+
s p Tce ll
ina
s
wh l cor
d
ole
f et brain
al b
rai
n
t rig
em
Tissue-specificity noncoding signatures
100%
90%
Overexpressed
Neutral
80%
70%
60%
50%
40%
30%
20%
10%
0%
Known skeletal muscle enhancer
DACH brain/CNS enhancers
EI Performance
Precision
(the fraction of predicted tissue-specific enhancers that are correct):
liver
heart
79 human tissues
Associating TFs with tissue specificities
Database of enhancers in the human genome
~4k heart candidate enhancers
EI
Optimization
~8k cancer candidate enhancers
http://www.dcode.org/EI
Gene regulatory code :: Summary
Combination of comparative genomics, gene expression data, and TFBS clustering
provides a tool to define the sequence code of tissue-specific enhancers
Over 7,000 candidate tissue-specific enhancers had been predicted in the human
genome
Enhancers for several tissues (including heart and liver) were predicted with high
precision
Prediction of TFs associated with the regulation of tissue-specific expression
provides the means to move from microarray expression data directly to the
identification of active TFs