Download network - bioinf leipzig

Networks in Biology
Gene Regulatory Networks (GRNs)
Dr. Katja Nowick
www.nowick-lab.info
Networks in Biology
Networks in cells (molecular networks):
• Metabolic Networks
• Gene regulatory networks
• Protein-Protein-Interaction networks
Networks between cells:
• Neural networks
• Immune system
Networks in ecosystems:
• Food networks
• Cooperation/Symbiosis
Social networks:
• Friendships
• Epidemiology
Identity of the nodes (vertices)
and meaning of the links (edges)
depends on the studied network
Characteristics of biological networks
• Node degree distribution follows a power law
• Small world characteristics
• Hierarchical and modular organization
• Overrepresentation of certain network motifs
• Preferential attachment
• Are dynamic
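Preferential attachment can be illustrated with a small simulation. This is a minimal sketch, not part of the lecture: the function names `grow_network` and `degree_counts`, the network size, and the rule that each new node attaches to exactly one existing node are assumptions made for illustration.

```python
import random
from collections import Counter

def grow_network(n_nodes, seed=0):
    """Grow an undirected network by preferential attachment.

    Each new node attaches to one existing node, chosen with probability
    proportional to that node's current degree.
    """
    rng = random.Random(seed)
    edges = [(0, 1)]        # start with two connected nodes
    endpoints = [0, 1]      # every node appears once per edge it takes part in
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)  # degree-proportional choice
        edges.append((new, target))
        endpoints.extend([new, target])
    return edges

def degree_counts(edges):
    """Return a Counter mapping node -> degree."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

edges = grow_network(1000)
deg = degree_counts(edges)
histogram = Counter(deg.values())  # degree -> number of nodes with that degree
```

In such a simulation most nodes typically end up with degree 1 while a few early nodes become hubs, which is the heavy-tailed pattern the bullet points describe.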
Typical parameters analyzed in a network
• Node degree (hubiness)
• Neighborhood
• Centralization
• Clustering coefficient
• Centrality (Betweenness Centrality, Closeness Centrality)
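Some of these parameters can be computed directly from an adjacency list. The following is a minimal sketch on a hypothetical five-node undirected network; the node names and helper functions are assumptions for illustration, and closeness is computed only over nodes reachable from the start node.

```python
from collections import deque

# Hypothetical undirected toy network as an adjacency dict.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}

def degree(g, node):
    """Number of direct neighbors of a node."""
    return len(g[node])

def clustering_coefficient(g, node):
    """Fraction of pairs of neighbors that are themselves connected."""
    nbrs = list(g[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in g[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def closeness_centrality(g, node):
    """(Reachable nodes - 1) divided by the sum of shortest-path distances."""
    dist = {node: 0}
    queue = deque([node])
    while queue:                      # breadth-first search
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0
```

Here node A has the highest degree (3) and the highest closeness, so it would be called the hub of this toy network.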
Why are cells different from each other?
Examples of Gene Regulation Networks
• Stem cell differentiation regulation
Nodes: Genes, including transcription factors (TFs)
Links: Interactions: who regulates expression of whom
Directional or bidirectional
Activating or repressing
Feed-back and other loops
MacArthur et al., PLoS ONE 3: e3086 (2008)
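A GRN with directed, signed links can be stored as a mapping from (regulator, target) pairs to a sign. This is a minimal sketch; the gene names and helper functions are placeholders and are not taken from the cited paper.

```python
# Hypothetical toy GRN: (regulator, target) -> +1 (activation) or -1 (repression).
edges = {
    ("TF1", "TF2"): +1,
    ("TF2", "TF1"): -1,    # with the edge above: a two-node feedback loop
    ("TF2", "TF2"): +1,    # autoregulation (self-loop)
    ("TF1", "GeneA"): +1,
    ("TF2", "GeneB"): -1,
}

def autoregulated(edges):
    """Genes that regulate their own expression."""
    return sorted(src for (src, dst) in edges if src == dst)

def mutual_feedback(edges):
    """Pairs of distinct genes that regulate each other (bidirectional links)."""
    pairs = set()
    for (src, dst) in edges:
        if src != dst and (dst, src) in edges:
            pairs.add(tuple(sorted((src, dst))))
    return sorted(pairs)

def loop_sign(edges, a, b):
    """Sign of the two-node loop a -> b -> a: the product of the edge signs."""
    return edges[(a, b)] * edges[(b, a)]
```

A loop sign of -1 marks a negative feedback loop (one activating and one repressing link); +1 marks a positive one.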
Examples of Gene Regulation Networks
• TF network of E. coli
About 20% of all interactions in E. coli
Here nodes are operons (genes on the same mRNA)
Links: TF X regulates operon Y
Examples of Gene Regulation Networks
• TF network of Drosophila embryonic development
Transcription + translation (gene expression)
! TFs are also proteins → some generated proteins regulate new genes → network
Examples of Gene Regulation Networks
• Stem cell differentiation regulation
1. Nodes: Genes, including transcription factors (TFs)
Links: Interactions: who regulates expression of whom
Directional or bidirectional
Activating or repressing
Feed-back and other loops
MacArthur et al., PLoS ONE 3: e3086 (2008)
TFs regulate expression of other genes
[Figure: a TF binds the promoter of a gene]
Many TFs have to come together to start/stop transcription of a target
Transcription factors (TFs)
~1500 TFs in the human genome
[Figure: TF families, including Tubby, Structural, AF-4, Dwarfin, ZNF, AP-2, Paired Box, bHLH, TEA, bZip, GCM, HOX, T-Box, Trp cluster, β-Scaffold, NHR, FOX, Pocket domain, E2F, Jumonji, Bromodomain, RFX, Heat shock, Methyl-CpG-binding, Other]
Modified after Messina et al., 2004
Some TFs bind DNA as dimers
Many TFs have to come together to start/stop transcription of a target
bHLH: basic helix loop helix TFs
bZip: basic leucine zipper TFs
NR: nuclear receptors
Homo-dimers or hetero-dimers → added complexity
Environmental signals trigger the GRN
- Activators -
Environmental signals trigger the GRN
- Repressors -
TFs are often hubs in the GRNs
• TFs and their target genes
[Figure: TFs as hubs connected to their target genes]
TF binding to DNA
[Figure: TF bound to the promoter of a gene]
TF Binding sites (TFBS): short sequence motifs, degenerate
Enhancers are sites on the DNA that are bound by activators; the DNA then loops,
bringing the enhancer close to a specific promoter and the initiation complex. Enhancers are much more
common in eukaryotes than in prokaryotes, where only a few examples are known to date.
Silencers are DNA regions that, when bound by particular transcription
factors, can silence expression of the gene.
TFs recognize specific sites/motifs in DNA
• TFs bind short sequence motifs
• Motifs are degenerate
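Degenerate motifs are often written with IUPAC ambiguity codes (for example N = any base, R = A or G). The following is a minimal sketch of matching such a consensus against a sequence; the function name `find_sites` and the example sequence are assumptions, and only the forward strand is searched.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_sites(consensus, sequence):
    """Return 0-based start positions where the degenerate consensus matches."""
    pattern = "".join(IUPAC[c] for c in consensus.upper())
    # The lookahead keeps overlapping matches.
    return [m.start() for m in re.finditer("(?=" + pattern + ")", sequence.upper())]

# CANNTG is the E-box consensus recognized by many bHLH TFs.
sites = find_sites("CANNTG", "TTCACGTGAACAGCTGTT")
```

Both CACGTG and CAGCTG in the made-up sequence match the same consensus, which is what "degenerate" means in practice: one motif, several concrete sequences.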
TFs interact to regulate their targets
• TFs cooperate to regulate their targets
[Figure: several TFs bound together at the promoter of a gene]
TFs interact to regulate their targets
• Co-occurrence of TF binding sites in the genome
ENCODE 2012
Complex TF interactions
• Summary
• TFs bind as monomers, homo-dimers, or hetero-dimers
• Multiple TFs (~7-10) cooperate to regulate gene expression
• TFs regulate the expression of other TFs
• Feedback loops, autoregulation …
• → It makes sense to represent this complexity in a network
TFs: what is known and what not
Not only TFs regulate gene expression
Gene regulatory factors (GRFs*):
• General TFs: RNA polymerase II transcription initiation complex
• Specific TFs: activate or repress expression of particular genes
• Cofactors: bridge between specific and general TFs; activate or repress
• Chromatin remodelers: make DNA accessible or inaccessible
• miRNAs: bind to mRNAs to degrade them
*GRF = Gene Regulatory Factor
Epigenetic control of gene expression
• Chromatin remodeler
Examples of epigenetic/histone modifications
Temporal changes of the epigenome
Interactions between TFs and histone
modifications
• Histone modifications influence chromatin states
• Chromatin states influence binding of TFs
• TFs interact with enzymes that modify histones
miRNAs
• = small non-coding RNA molecule (ca. 22 nucleotides)
• > 1000 miRNAs in the human genome
A primary miRNA (pri-miRNA) transcript is encoded in the cell's DNA and
transcribed in the nucleus, processed by the enzyme Drosha, and exported into the
cytoplasm, where it is further processed by Dicer. After strand separation, the
mature miRNA represses protein production either by blocking translation or
by causing transcript degradation.
Interactions between TFs, miRNAs, other
ncRNAs, and histone modifications
• Neurogenesis
Interactions between TFs, miRNAs, other
ncRNAs, and histone modifications
• TFs bind as monomers, homo-dimers, or hetero-dimers
• Multiple TFs (~7-10) cooperate to regulate gene expression
• TFs regulate the expression of other TFs
• Feedback loops, autoregulation …
• → Network
• Add epigenetic modifications
~375 million interactions
• Add ncRNAs
• → Even more complex networks
~5000 ncRNAs
Why are tissues different from each other?
Cell states are defined by gene expression
How is a gene activated or repressed (at a certain time and location)?
→ So let’s talk about the links now
Examples of Gene Regulation Networks
• Stem cell differentiation regulation
Nodes: Genes, including transcription factors (TFs)
2. Links: Interactions: who regulates expression of whom
Directional or bidirectional
Activating or repressing
Feed-back and other loops
MacArthur et al., PLoS ONE 3: e3086 (2008)
Cell states are defined by gene expression
How is a gene activated or repressed (at a certain time and location)?
→ Goal: discover which gene is regulated by which TF
How do we get the information for the links?
Network construction based on literature
• Manual
• Semi-automated (e.g. preBIND)
• Natural Language Processing (NLP) (e.g. PathwayStudio)
preBIND
Donaldson I, et al. BMC Bioinformatics. 4:11 (2003)
Is the network encoded in the DNA?
TFs bind to specific motifs
→ It should be possible to predict TF target genes by reading the DNA
http://fasta.bioch.virginia.edu/cshl/
Experimental approaches
• Expensive
• Time consuming
• For one research group only feasible for a few TFs
A collection of TFBS can be found in databases: JASPAR, TRANSFAC
Motif databases
• JASPAR: http://jaspar.genereg.net/
• TRANSFAC: http://www.gene-regulation.com/pub/databases.html
How good is a motif?
To score a single site s for match to a motif W, we use Pr(s |W )
How good is a motif?
• Scoring motif matches
• Pr (s | W) is the key idea.
However, this probability is usually corrected for the background nucleotide composition.
Consider a genome that is very A/T rich:
Pr(A) = 0.45, Pr(T) = 0.45, Pr (C) = 0.05, Pr(G) = 0.05
We saw that Pr (ACACGTT | W) = 0.048
In fact Pr (ACATGTT | W) = 0.048 too.
• Compute the probability of each site under the above “background model”:
Pr (ACACGTT ) = 0.45x0.05x0.45x0.05x0.05x0.45x0.45 =0.0000051.
So Pr (ACACGTT | W) = 0.048 is 9364 times Pr (ACACGTT)
Similarly, Pr (ACATGTT) is 0.0000461.
So Pr (ACATGTT | W) = 0.048 is 1040 times Pr (ACATGTT)
• Pr (ACACGTT | W) is 9364 times Pr (ACACGTT)
Pr (ACATGTT | W) is 1040 times Pr (ACATGTT)
In other words, if we compare how well “W explains the site”
to how well “random background explains it”, then ACACGTT stands out.
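The arithmetic above can be reproduced in a few lines. This is a minimal sketch: the background frequencies and the value Pr(s | W) = 0.048 are taken from the text, while the function name `background_probability` is an assumption for illustration.

```python
from math import prod

# Background nucleotide frequencies of the A/T-rich genome from the text.
background = {"A": 0.45, "T": 0.45, "C": 0.05, "G": 0.05}

def background_probability(site, freqs):
    """Probability of a site if every position is drawn independently."""
    return prod(freqs[base] for base in site)

# Pr(s | W) = 0.048 for both sites, as stated in the text.
pr_given_motif = 0.048

ratios = {}
for site in ("ACACGTT", "ACATGTT"):
    pr_bg = background_probability(site, background)
    ratios[site] = pr_given_motif / pr_bg
    print(f"{site}: Pr(s) = {pr_bg:.7f}, ratio = {ratios[site]:.0f}")
```

The site with the two rare C bases is far less likely under the background, so the same motif probability is much stronger evidence for it.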
How good is a motif?
• The Log Likelihood Ratio (LLR) score
Given a motif W, background nucleotide frequencies Wb, and a site s,
LLR score of s = log( Pr(s | W) / Pr(s | Wb) )
Good scores > 0.
Bad scores < 0.
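The LLR score can be sketched for a motif given as a position weight matrix. The matrix `W`, the uniform background `Wb`, and the choice of log base 2 below are assumptions made for illustration; the slides do not specify them.

```python
from math import log2

# Hypothetical position weight matrix W: one dict of base probabilities per position.
W = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
# Background model Wb: uniform frequencies (an assumption for this example).
Wb = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

def llr_score(site, motif, background):
    """log2( Pr(site | motif) / Pr(site | background) ), summed over positions."""
    if len(site) != len(motif):
        raise ValueError("site and motif must have the same length")
    return sum(log2(column[base] / background[base])
               for base, column in zip(site, motif))

good = llr_score("ACGT", W, Wb)   # matches the motif at every position
bad = llr_score("TTTT", W, Wb)    # matches only the last position
```

Summing per-position log ratios is equivalent to taking the log of the ratio of the two products, assuming positions are independent.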
Finding the TF target gene
• So, what to do with the motif now?
Find motif matches in DNA
Typically people designate the gene closest to the motif as TF’s target
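The nearest-gene rule can be sketched as follows. The gene names, coordinates, and match positions are made up, and the function `nearest_gene` is an illustrative helper, not an established tool.

```python
# Hypothetical gene start coordinates (transcription start sites) on one chromosome.
genes = {"geneA": 1000, "geneB": 5000, "geneC": 9000}

def nearest_gene(position, gene_starts):
    """Return the gene whose start coordinate is closest to the motif match."""
    return min(gene_starts, key=lambda g: abs(gene_starts[g] - position))

# Hypothetical motif match positions, e.g. from a motif scan of the chromosome.
matches = [1200, 4100, 7400]
targets = {pos: nearest_gene(pos, genes) for pos in matches}
```

This rule ignores strand, distance cut-offs, and long-range enhancers, so the assignments should be read as candidates only.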
Motif discovery
We assumed that we have an experimental characterization of a TF's binding specificity (the motif)
What if we don’t?
We can try computational motif discovery
Motif discovery – Option 1
Try to find the motif given the promoter regions of
the five genes G1, G2, … G5
Motif discovery – Option 2
Motif discovery – some algorithms
Idea: Find a motif with many (significantly more than
expected by chance) matches in the given sequences
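The idea can be sketched by counting k-mers across the promoters and comparing with a chance expectation. This is a deliberately simple illustration, not one of the published algorithms: the promoter sequences are made up, the background is assumed uniform, and positions are treated as independent.

```python
from collections import Counter

def kmer_presence(sequences, k):
    """Count in how many sequences each k-mer occurs at least once."""
    counts = Counter()
    for seq in sequences:
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts

def expected_presence(sequences, k):
    """Expected number of sequences containing a given k-mer under a uniform
    background, treating positions as independent (a rough approximation)."""
    p = 0.25 ** k
    return sum(1 - (1 - p) ** (len(seq) - k + 1) for seq in sequences)

def enriched_kmers(sequences, k, min_ratio=3.0):
    """k-mers found in many more sequences than expected by chance."""
    expected = expected_presence(sequences, k)
    observed = kmer_presence(sequences, k)
    hits = [(kmer, n) for kmer, n in observed.items() if n / expected >= min_ratio]
    return sorted(hits, key=lambda item: (-item[1], item[0]))

# Hypothetical promoter sequences of five co-regulated genes G1..G5;
# the 6-mer CACGTG has been planted in each of them.
promoters = [
    "TTGACACGTGAT",
    "CACGTGTTAGGC",
    "AGGTCACGTGCA",
    "GATTACACGTGT",
    "CCACGTGAATCG",
]
best = enriched_kmers(promoters, 6)[0]
```

Exact k-mer counting does not handle degenerate motifs; real tools model the motif as a matrix and use a more careful statistical test.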
Motif discovery – some tools
Is the network encoded in the DNA?
TFs bind to specific motifs
It should be possible to predict TF target genes by reading the DNA
Is this really so simple?
• For most TFs the binding site is not known
• Since TFBS are degenerate, it is hard to predict
how efficiently the TF really binds
• How far away can the binding site be from
the promoter?
• Multiple TFs might compete for the same
binding site
• Is the nearest gene really the target gene?
• Does the binding event have an effect at all?
• …
Does the TF binding really have an effect?
Problem: TFs bind at many places
But is a gene indeed regulated by the binding event?
Combine motif finding experiments with experiments changing the TF expression
(perturbation experiments)
• Chromatin immuno-precipitation (ChIP)-Seq
• Overexpression or knock-down of TFs in cell lines,
followed by RNA-Seq
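Combining the two kinds of evidence can be sketched as a simple classification of bound genes by their response to the perturbation. The gene names, fold changes, and the threshold of 1.0 are made-up values for illustration.

```python
# Hypothetical data for one TF; the gene names are placeholders.
chip_bound = {"geneA", "geneB", "geneC", "geneD"}   # genes near ChIP-Seq peaks
# log2 fold changes after knock-down of the TF, from RNA-Seq (made-up values).
knockdown_log2fc = {"geneA": -1.8, "geneB": 0.1, "geneC": 1.5, "geneE": -2.0}

def classify_targets(bound, log2fc, threshold=1.0):
    """Split bound genes by their response to TF knock-down."""
    activated, repressed, no_effect = set(), set(), set()
    for gene in bound:
        change = log2fc.get(gene, 0.0)
        if change <= -threshold:
            activated.add(gene)    # expression drops without the TF
        elif change >= threshold:
            repressed.add(gene)    # expression rises without the TF
        else:
            no_effect.add(gene)    # bound, but no measurable effect
    return activated, repressed, no_effect

activated, repressed, no_effect = classify_targets(chip_bound, knockdown_log2fc)
```

In this toy example geneE changes strongly but is not bound, so it is left out as a possible indirect effect.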
Inferring networks from perturbations
Sachs et al. Science. 2005 308:523-9
Reverse engineering the topology of regulatory molecular biological networks can be done
through the analysis of a set of perturbations. Picture: reverse engineering of the hierarchy of a
cell signaling network using multiple perturbations and a statistical method called Bayesian
network inference.
Inferring Networks from Time Series
Microarrays
Zou M, Conzen SD. Bioinformatics. 2005 21(1):71-9.
Regulatory interactions can also be inferred directly from data = reverse engineering of
biological pathways/networks from data. In the example above, time-series expression data is
used to infer a directed and signed graph based on delayed correlations.
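A delayed correlation can be sketched as the Pearson correlation between a regulator at time t and a candidate target at time t + lag. This is a minimal illustration with made-up expression values, not the method of the cited paper.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def delayed_correlation(regulator, target, lag):
    """Correlate the regulator at time t with the target at time t + lag."""
    if lag == 0:
        return pearson(regulator, target)
    return pearson(regulator[:-lag], target[lag:])

# Made-up time series: the target follows the regulator one step later,
# with opposite sign (as expected for a repressor).
tf = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
target = [9.0, 8.0, 6.0, 7.0, 4.0, 5.0]

r_lag0 = delayed_correlation(tf, target, 0)
r_lag1 = delayed_correlation(tf, target, 1)
```

The lag gives the direction of the edge (the regulator changes first) and the sign of the correlation gives activation or repression; here the lag-1 correlation is strongly negative, suggesting a repressing link from the TF to the target.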
Why are tissues different from each other?
• Summary
GRNs are hierarchical
[Figure: hierarchy of three layers. Top layer: kernels, initial TFs. Core layer. Bottom layer: differentiation batteries, terminal TFs.]
GRNs are hierarchical - Yeast
[Figure: hierarchy of three layers. Top layer: kernels, initial TFs. Core layer. Bottom layer: differentiation batteries, terminal TFs.]
Yeast regulatory network of 13385
regulatory interactions among 4503
genes, which includes 158 TFs and
4369 target genes.
The model, based on experimental evidence in yeast, organizes TFs into three distinct layers: the top,
core, and bottom layers.
TFs within a layer are highly interconnected and share similar properties.
TFs of the different layers regulate distinct sets of target genes.
The three layers are also connected by a central skeleton, a feed-forward structure that utilizes the TFs of the top layer to
regulate TFs of the core layer, and TFs of the core layer to regulate TFs of the bottom layer.
The core layer is characterized by the highest number of TFs and hubs and is important for signal propagation for the
regulation of almost all targets.
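One simple way to assign TFs to such layers is to look only at TF-to-TF regulation. The rule below is an assumption made for illustration and is not the procedure of the yeast study: a TF not regulated by any other TF is "top", a TF regulating no other TF is "bottom", and the rest are "core". The TF names and edges are made up.

```python
# Hypothetical TF -> TF regulatory edges (regulator, target).
tf_edges = [("T1", "C1"), ("T1", "C2"), ("C1", "C2"), ("C1", "B1"), ("C2", "B2")]

def assign_layers(edges):
    """Top: TFs not regulated by other TFs; bottom: TFs regulating no other TF;
    core: TFs that are both regulated and regulating. Self-loops are ignored."""
    regulators = {src for src, dst in edges if src != dst}
    regulated = {dst for src, dst in edges if src != dst}
    layers = {}
    for tf in regulators | regulated:
        if tf not in regulated:
            layers[tf] = "top"
        elif tf not in regulators:
            layers[tf] = "bottom"
        else:
            layers[tf] = "core"
    return layers

layers = assign_layers(tf_edges)
```

In this toy network the edges T1 → C1 → B1 form the kind of feed-forward skeleton from top to core to bottom that the text describes.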
GRNs are hierarchical - Development
[Figure: hierarchy of three layers, e.g. Drosophila development. Top layer: kernels, initial TFs. Core layer. Bottom layer: differentiation batteries, terminal TFs.]
Developmental biologists have proposed a concept that concentrates on the temporal order of events in developmental
pathways.
In this system modules are classified as kernels, plug-ins, input-output switches and differentiation batteries.
Each module can be thought of as fulfilling one specific function.
Kernels are the initial modules of the network that impact most other parts of the network. They are, for instance, involved
in the initiation of the development of certain body parts.
Differentiation batteries may play a role in terminal steps of the differentiation of body parts and generally do not affect
other parts of the network.
GRNs are hierarchical – cell fate
[Figure: hierarchy of three layers. Top layer: kernels, initial TFs. Core layer. Bottom layer: differentiation batteries, terminal TFs.]
Terminal selector TFs (acting either alone or in synergistic combination) activate downstream
target genes directly via terminal selector motifs and also autoregulate their own expression via
those motifs.
Hobert O PNAS 2008;105:20067-20071
Autoregulated expression of a terminal selector is critical to maintain the differentiated
features of the cell.
Downstream targets of terminal selectors (X) define differentiated properties of a neuron, such
as neurotransmitter receptors, ion channels, adhesion proteins, etc. Targets may also include TFs
that regulate specific “subroutines.”
TFs that are induced by terminal selectors may also cooperate with terminal selector proteins in
a feed-forward loop configuration to jointly control specific terminal genes.