Download Systems biology/network biology for complex diseases

Transcript
Systems biology/network biology for complex diseases
Valborg Gudmundsdottir
Helle Krogh Pedersen
21.05.2014

Program
•  What are networks – what are biological networks?
•  The concept of networks
   –  Basic graph theory and network properties
•  Case story of network biology applied to diabetes research
•  Introduction to Cytoscape
•  Cytoscape tutorial
   –  Integrate GWAS, gene expression and protein-protein interaction data

What is Systems Biology?
"Systems biology is a biology-based inter-disciplinary field of study that focuses on complex interactions within biological systems, using a more holistic perspective (holism instead of the more traditional reductionism) approach to biological and biomedical research. … One of the outreaching aims of systems biology is to model and discover emergent properties." [Wikipedia]
Emergent properties: the whole is more than the sum of its parts
Holistic perspective
Figure source: http://dmsc.unicz.it/schede.php?id=125

Why we need networks
Molecules of life do not function in isolation …
… but interact in complex networks.
Figure source: http://dmsc.unicz.it/schede.php?id=125

NETWORKS ARE ALL AROUND US…

Network = nodes + edges
•  Nodes (vertices) are the objects in the network
•  Edges are the links/interactions in the network
[Figure: two nodes joined by an edge]

A COMMON LANGUAGE!
[Figure: example networks drawn with people (Mary, Peter, Albert; edge labels: friend, brothers, co-worker), with actors and movies (Actor 1-4, Movie 1-3), and with proteins (Protein 1, 2, 5, 9). Nodes = 4, Edges = 4.]
Slide by Barabási

Network example: London's metro
Nodes: Metro stations
Edges: Subway lines between stations

Network example: European highways
Nodes: Cities
Edges: Highway roads

Network example: Facebook
Nodes: People
Edges: (Facebook) friendship
Source: https://www.facebook.com/notes/facebook-engineering/visualizing-friendships/469716398919

Network example: Food web
Nodes: Animals/plants
Edges: 'I'm your food'
Source: http://erincourtneytheresa.wikispaces.com/Ecosystems+Chapter

Network example: Metabolic networks
Nodes: Metabolites
Edges: Enzyme-catalysed reactions which transform one metabolite into another

Network example: Protein-protein interaction
Nodes: Proteins
Edges: Physical/functional interactions

Interaction networks in molecular biology
•  Protein-protein interactions
•  Gene regulation
   –  e.g. DNA-protein interactions
•  Genetic interactions
•  Metabolic reactions
•  Co-expression interactions
•  Text mining interactions
•  Association networks
•  Signaling
NETWORK TERMINOLOGY

Un-directed vs directed graphs
Un-directed: protein-protein interaction, Facebook friends
Directed: protein-DNA interaction, email correspondence, metabolic networks, food webs

Un-weighted vs edge-weighted graphs
Un-weighted: Facebook friends
Edge-weighted: protein-protein interaction (confidence score), phone networks, metabolic networks (flux)

Un-weighted vs node-weighted graphs
Un-weighted
Node-weighted: integration of node attributes

Other graphs
Self-interaction: protein interactions (with homo-dimers)
Multigraphs (undirected): social networks, a network integrating PPI and PDI
Multigraph/multilayer graphs
Figure source: http://dmsc.unicz.it/schede.php?id=125

Bipartite graphs (two node sets, U and V)
Diseasome (diseases, genes), Cuisine (ingredients, recipes), Hollywood actor (actors, movies)

Example of bipartite network: DISEASOME
[Figure labels: DISEASOME, PHENOME, GENOME, Disease network, Gene network]
Goh, Cusick, Valle, Childs, Vidal & Barabási, PNAS (2007)
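The step from a bipartite diseasome to a disease network and a gene network can be sketched in a few lines of Python. This is only an illustration: the disease and gene names are hypothetical placeholders, not data from Goh et al., and the helper names `project` and `invert` are made up for this sketch. Two diseases are linked when they share a gene, and two genes are linked when they share a disease.

```python
from itertools import combinations

# Hypothetical disease -> gene associations (placeholders, not real data).
DISEASE_GENES = {
    "disease_A": {"gene_1", "gene_2"},
    "disease_B": {"gene_2", "gene_3"},
    "disease_C": {"gene_4"},
}


def project(bipartite):
    """Link two left-side nodes when they share at least one right-side node.

    Returns {(u, v): number of shared right-side nodes}.
    """
    edges = {}
    for u, v in combinations(sorted(bipartite), 2):
        shared = bipartite[u] & bipartite[v]
        if shared:
            edges[(u, v)] = len(shared)
    return edges


def invert(bipartite):
    """Swap the two node sets: gene -> set of diseases."""
    inverted = {}
    for left, rights in bipartite.items():
        for right in rights:
            inverted.setdefault(right, set()).add(left)
    return inverted


disease_network = project(DISEASE_GENES)       # diseases sharing a gene
gene_network = project(invert(DISEASE_GENES))  # genes sharing a disease
```

In this toy input, disease_C shares no gene with the others, so it stays isolated in the disease network.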
NETWORK PROPERTIES

The father of Graph Theory is considered to be Leonhard Euler (1707-1783)
The bridges of Königsberg
Problem: Find a walk through the city that crosses each bridge once and only once.
Euler proved, using a graph representation, that the problem has no solution. He proved further that a solution requires every vertex to have even degree, or exactly two vertices of odd degree if the walk may end away from where it started
(degree of a vertex: number of edges (i.e. bridges) touching the vertex (land mass)).
Nonparametric Bayesian Models for Complex Networks, DTU Informatik, Danmarks Tekniske Universitet, 15.08.2013
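Euler's degree argument can be checked with a short Python sketch. The land-mass labels A to D are chosen here for illustration (A is the island), and the bridge list is the usual seven-bridge layout. The function only counts odd-degree vertices and assumes the graph is connected.

```python
from collections import Counter

# The seven bridges as a multigraph; labels A-D are illustrative.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def vertex_degrees(edges):
    """Count how many edge ends (bridges) touch each vertex (land mass)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def euler_walk_possible(edges):
    """True if a walk using every edge exactly once can exist.

    Assumes the graph is connected; only the degree condition is checked.
    """
    odd = sum(1 for d in vertex_degrees(edges).values() if d % 2 == 1)
    return odd in (0, 2)


degrees = vertex_degrees(BRIDGES)
solvable = euler_walk_possible(BRIDGES)
```

All four land masses end up with odd degree, so the check fails, which is consistent with Euler's result.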
Euler slide by Morten Mørup

Network measures
A.  Node degree or connectivity (k)
    –  Average degree (<k>)
B.  Shortest path (l)
    –  Average path length (<l>)
    –  Network diameter
C.  Degree distribution P(k)
D.  Clustering coefficient (C)
    –  Average clustering coefficient (<C>)

Node degree or connectivity (k)
<k>: average degree of the network
(Barabási and Oltvai, 2004)
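As a minimal sketch, node degree and average degree can be computed directly from an edge list. The four-node graph below is hypothetical, chosen only to have 4 nodes and 4 edges. For an undirected network the average degree equals twice the number of edges divided by the number of nodes.

```python
from collections import Counter

# Hypothetical undirected network with 4 nodes and 4 edges.
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]


def node_degrees(edges):
    """Degree k of every node: the number of edges touching it."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


def average_degree(edges):
    """<k> = 2 * (number of edges) / (number of nodes), undirected case."""
    degree = node_degrees(edges)
    return sum(degree.values()) / len(degree)


k = node_degrees(EDGES)
k_mean = average_degree(EDGES)
```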
Path
A sequence of nodes such that each node is connected to the next node along the path by a link.

Shortest path
The path with the shortest length between two nodes (distance).
[Figure: example graph with nodes 1 to 5] $l_{1 \to 4} = 3$, $l_{1 \to 5} = 2$
Slide by Barabási

Diameter
The longest shortest path in a graph. In the example graph: $l_{1 \to 4} = 3$

Average path length
$\langle l \rangle = (l_{1 \to 2} + l_{1 \to 3} + l_{1 \to 4} + l_{1 \to 5} + l_{2 \to 3} + l_{2 \to 4} + l_{2 \to 5} + l_{3 \to 4} + l_{3 \to 5} + l_{4 \to 5}) / 10 = 1.6$
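The path measures can be sketched with breadth-first search, which finds shortest paths in an unweighted network. The edge list below is an assumption: it is one five-node graph that reproduces the numbers on the slide (distance 3 from node 1 to node 4, distance 2 from node 1 to node 5, average 1.6), not necessarily the graph drawn there. The sketch also assumes the graph is connected.

```python
from collections import deque
from itertools import combinations

# Hypothetical 5-node graph chosen to reproduce the slide's numbers.
EDGES = [(1, 2), (2, 3), (2, 5), (3, 5), (4, 5)]


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Shortest-path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def path_statistics(edges):
    """Return (all-pairs distances, diameter, average path length)."""
    adj = build_adjacency(edges)
    dist = {s: bfs_distances(adj, s) for s in adj}
    lengths = [dist[u][v] for u, v in combinations(sorted(adj), 2)]
    return dist, max(lengths), sum(lengths) / len(lengths)


distances, diameter, average_path_length = path_statistics(EDGES)
```

Running one search per node is fine for small examples; large networks usually call for a dedicated graph library.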
The average path length is the average of the shortest paths for all pairs of nodes. (Path slides by Barabási.)

Degree distribution P(k)
•  The probability that a randomly selected node has exactly k links (i.e. has degree k)
N = total number of nodes
N_k = number of nodes with degree k
$P(k) = N_k / N$
Example: P(1) = 6/10
[Figure: bar plot of P(k) (axis 0.1 to 0.6) against k (1 to 4)]
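The definition P(k) = N_k / N translates directly into code. The ten-node tree below is a hypothetical example, built only so that six of its ten nodes have degree 1, matching P(1) = 6/10. The remaining values are properties of this made-up graph, not of the plotted one.

```python
from collections import Counter

# Hypothetical 10-node tree: nodes 0-3 are internal, nodes 4-9 are leaves.
EDGES = [
    (0, 1), (0, 2), (0, 3), (0, 4),
    (1, 5), (1, 6),
    (2, 7), (2, 8),
    (3, 9),
]


def degree_distribution(edges):
    """P(k) = N_k / N for every degree k that occurs in the network."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n_nodes = len(degree)
    n_k = Counter(degree.values())  # N_k: number of nodes with degree k
    return {k: count / n_nodes for k, count in sorted(n_k.items())}


P = degree_distribution(EDGES)
```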
Clustering coefficient (C)
What portion of your neighbors are connected?
$C_I = \frac{n_I}{\binom{k}{2}} = \frac{2 n_I}{k (k - 1)}$
k: number of neighbors of node I
n_I: number of edges between node I's neighbors
Example: n_A = 1, k_A = 5
C_A = (2*1)/(5*(5-1)) = 0.1
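The formula can be sketched as a small Python function. The graph is hypothetical: node A has five neighbors (B to F) and exactly one edge among them (B-C), which reproduces n_A = 1, k_A = 5. Nodes with fewer than two neighbors are given C = 0 here, which is a convention assumed for this sketch.

```python
from itertools import combinations

# Hypothetical graph: A has neighbors B-F, and B-C is the only edge among them.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("A", "F"), ("B", "C")]


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering_coefficient(adj, node):
    """C = 2 * n / (k * (k - 1)), with n = edges among the node's k neighbors."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention assumed here: no neighbor pairs to connect
    n = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * n / (k * (k - 1))


adj = build_adjacency(EDGES)
C = {node: clustering_coefficient(adj, node) for node in adj}
C_mean = sum(C.values()) / len(C)  # <C>: average over all nodes
```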
Clustering coefficient (C)
C: clustering coefficient of a node
–  C in [0, 1]
–  0: none of the neighboring genes are connected
–  1: all of the neighboring genes are interconnected
[Figure panels: Clustering Coefficient; Characteristic Path Length] Ravasz et al. Science 297, 1551 (2002)
<C>: average clustering coefficient of a network. Characterizes the overall tendency to form clusters/groups.
Characteristic path length: average shortest path between all possible pairs of nodes.
C(k): average clustering coefficient of all nodes with k links.
Many real networks: $C(k) \sim k^{-1}$ → hierarchical network
Network measures
A.  Node degree or connectivity (k)
    –  Average degree (<k>)
B.  Shortest path (l)
    –  Average path length (<l>)
    –  Network diameter
C.  Degree distribution P(k)
D.  Clustering coefficient (C)
    –  Average clustering coefficient (<C>)
    –  Average clustering coefficient of all nodes with k links (C(k))
Slide annotations: "Depend on number of nodes and links in network"; "Independent of network size"

Network models/topology
Barabási AL, Oltvai ZN. Network biology: understanding the cell's functional organization. Nat Rev Genet. 2004 Feb;5(2):101-13
The degree distribution for scale-free networks follows a power law
$P(k) \sim k^{-\gamma}$
γ: degree exponent
•  The smaller the number, the more important are the hubs
"Networks with a power degree distribution are called scale-free, a name that is rooted in statistical physics literature. It indicates the absence of a typical node in the network (one that could be used to characterize the rest of the nodes)."
Barabási AL, Oltvai ZN. Network biology: understanding the cell's functional organization. Nat Rev Genet. 2004 Feb;5(2):101-13
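A small numerical sketch shows why a smaller degree exponent means more prominent hubs. It normalises k^(-γ) over degrees 1 to k_max and measures the share of nodes at or above a hub threshold. The cutoff `k_max`, the threshold of 10 links, and the function names are illustrative assumptions, not values from the slides.

```python
def power_law_pmf(gamma, k_max):
    """Discrete P(k) proportional to k**(-gamma), normalised over k = 1..k_max."""
    weights = {k: k ** -gamma for k in range(1, k_max + 1)}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def hub_share(gamma, k_max=1000, hub_threshold=10):
    """Share of nodes with degree >= hub_threshold under the power law."""
    pmf = power_law_pmf(gamma, k_max)
    return sum(p for k, p in pmf.items() if k >= hub_threshold)


share_gamma_2 = hub_share(2.0)  # heavier tail: hubs are more common
share_gamma_3 = hub_share(3.0)  # lighter tail: hubs are rarer
```

Comparing the two shares for other thresholds gives the same ordering, since a smaller γ puts more weight on every high degree.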
Many real-world networks have a similar architecture: scale-free networks
WWW, Internet (routers and domains), electronic circuits, computer software, movie actors, coauthorship networks, sexual web, instant messaging, email web, citations, phone calls, metabolic, protein interaction, protein domains, brain function web, linguistic networks, comic book characters, international trade, bank system, encryption trust net, energy landscapes, earthquakes, astrophysical network…
Slide by Barabási

Features of scale-free topology: robustness against random failure
Network topology plays an important role in this robustness. Even if ~80% of nodes fail, the remaining ~20% still maintain network connectivity.
[Figure: % of essential proteins vs. # of links; labelled values 21% and 62%; annotation "But constitute ~93% of all proteins"]
•  Hubs are important components of the network: if you destroy the hubs, the network will fall apart
•  The network is robust to random attacks: most such attacks would delete lowly connected nodes
H. Jeong, S. P. Mason, A.-L. Barabási and Z. N. Oltvai. Lethality and centrality in protein networks. Nature 411, 41-42 (2001)

In human cells: essential genes, not the disease genes, encode hubs (based on data from 2007)
Absence is associated with embryonic lethality
Barabási, A.-L., Gulbahce, N., & Loscalzo, J. (2011). Network medicine: a network-based approach to human disease. Nature Reviews Genetics, 12(1), 56–68. doi:10.1038/nrg2918

Features of scale-free topology: 'small world effect'
•  Any 2 nodes can be connected with relatively few links
•  Also a property of random networks
•  Scale-free networks are ultra small → local perturbations can reach the whole network very quickly

Origin of scale-free networks: 'rich-gets-richer' mechanism
•  Growth
   –  Add a new node with m links
•  Preferential attachment
   –  New nodes prefer to link to highly connected nodes
The Barabási & Albert model. Barabási & Albert, Science 286, 509 (1999)
Background text on the slide (Barabási & Oltvai, 2004): "Two further results offer direct evidence that network growth is responsible for the observed topological features. The scale-free model (BOX 2) predicts that the nodes that appeared early in the history of the network are the most connected ones [15]. Indeed, an inspection of the metabolic hubs indicates that the remnants of the RNA world, such as coenzyme A, NAD and GTP, are among the most connected substrates of the metabolic network, as are elements of some of the most ancient metabolic pathways, such as glycolysis and the tricarboxylic acid cycle [17]. In the context of the protein interaction networks, cross-genome comparisons have found that, on average, the evolutionarily older proteins have more links to other proteins than their younger counterparts [45,46]. This offers direct empirical evidence for preferential attachment."

Origin of scale-free topology in the cell: gene duplication
•  Evidence: in PPI, evolutionarily older proteins have, on average, more links to other proteins compared to younger proteins.
[Figure 3b: proteins and genes before duplication and after duplication]
"Figure 3 | The origin of the scale-free topology and hubs in biological networks. The origin of the scale-free topology in complex networks can be reduced to two basic mechanisms: growth and preferential attachment. Growth ..."
Barabási AL, Oltvai ZN. Network biology: understanding the cell's functional organization. Nat Rev Genet. 2004 Feb;5(2):101-13
Background text on the slide (Barabási & Oltvai, 2004): "Motifs, modules and hierarchical networks. Cellular functions are likely to be carried out in a highly modular manner [1]. In general, modularity refers to a group of physically or functionally linked molecules (nodes) that work together to achieve a (relatively) distinct function [1,6,8,47]. Modules are seen in many systems, for example, circles of friends in social networks or websites that are devoted to similar topics on the World Wide Web. Similarly, in many complex engineered systems, from a modern aircraft to a computer chip, a highly modular structure is a fundamental design attribute. Biology is full of examples of modularity. Relatively invariant protein–protein and protein–RNA complexes ..."
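Growth and preferential attachment, together with the robustness claims for scale-free networks, can be explored with a short simulation. This is a sketch under stated assumptions, not the published model code: the network starts from a small fully connected core (an assumption of this sketch), each new node links to m distinct existing nodes chosen in proportion to their degree, and the sizes, seeds, and 20% removal fraction are arbitrary illustrative choices.

```python
import random


def barabasi_albert(n, m, seed=0):
    """Grow a network of n nodes; each new node attaches to m existing nodes
    picked with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Start from a fully connected core of m + 1 nodes (assumption of this sketch).
    adj = {u: {v for v in range(m + 1) if v != u} for u in range(m + 1)}
    # Each node appears once per link, so uniform draws are degree-proportional.
    stubs = [u for u in adj for _ in adj[u]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        adj[new] = set(targets)
        for t in sorted(targets):
            adj[t].add(new)
            stubs.extend((new, t))
    return adj


def largest_component_fraction(adj, removed):
    """Fraction of surviving nodes that sit in the largest connected component."""
    alive = set(adj) - set(removed)
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in adj[node]:
                if neighbor in alive and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best / len(alive) if alive else 0.0


def compare_failures(n=500, m=2, fraction=0.2, seed=0):
    """Remove the same number of nodes at random and by highest degree."""
    adj = barabasi_albert(n, m, seed)
    count = int(fraction * n)
    random_removed = random.Random(seed + 1).sample(sorted(adj), count)
    hubs_removed = sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:count]
    return (largest_component_fraction(adj, random_removed),
            largest_component_fraction(adj, hubs_removed))


after_random_failure, after_hub_attack = compare_failures()
```

The two returned fractions are meant to be compared with each other: random removal mostly hits lowly connected nodes, while removing the same number of hubs fragments the network far more.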
Personal interaction network at our center
Personal interaction network at CBS, Technical University of Denmark
[Figure: network of people; edge types: daily, weekly, monthly. Annotations: Hanne and Bjørn: married today; Kristoffer: head of systems administration; Søren: center director]
Juncker, Fausbøll & de Lichtenberg, 2004
Thomas Skøt Jensen, Introduction to Systems Biology, September 15th 2009

Key challenge: community detection
Personal interaction network at CBS, Technical University of Denmark
[Figure: the same network (edge types: daily, weekly, monthly) with a research group highlighted]
General assumption in network biology: topological, functional and disease modules overlap
Barabási, A.-L., Gulbahce, N., & Loscalzo, J. (2011). Network medicine: a network-based approach to human disease. Nature Reviews Genetics, 12(1), 56–68. doi:10.1038/nrg2918

Community detection
•  100s-1000s of methods for finding modules in networks
   –  fast vs slow
   –  overlapping vs non-overlapping
   –  single-scale vs multi-scale
•  Many methods are implemented as Cytoscape apps.
   –  MCODE
      •  Non-edge-weighted networks, topology based
   –  ClusterONE
      •  Edge-weighted networks, topology based
   –  jActiveModules
      •  Node-weight, seed and extend
[Figure 3: (a) Causal network identification: derive gene scores from SNPs (-log P-value along Chromosome A and Chromosome B). (b) Seed and extend: jActiveModules and NE...: greedily add highest weight nodes starting at seed. Example genes G1 to G6, with the seed node marked.]
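The seed-and-extend idea in the figure (greedily add the highest-weight neighboring node, starting from a seed) can be sketched as follows. This is a deliberately simplified illustration and not the actual jActiveModules algorithm. The gene network, the node weights, and the stopping rule (a fixed module size) are all hypothetical.

```python
# Hypothetical gene network and node weights (e.g. gene scores); not real data.
ADJACENCY = {
    "G1": {"G2", "G4"},
    "G2": {"G1", "G3"},
    "G3": {"G2", "G6"},
    "G4": {"G1", "G5"},
    "G5": {"G4"},
    "G6": {"G3"},
}
WEIGHTS = {"G1": 5.0, "G2": 3.0, "G3": 4.0, "G4": 1.0, "G5": 2.0, "G6": 0.5}


def seed_and_extend(adj, weights, seed, max_size):
    """Grow a module from `seed` by repeatedly adding the highest-weight
    node adjacent to the current module, until `max_size` is reached."""
    module = [seed]
    members = {seed}
    while len(module) < max_size:
        frontier = {nb for node in members for nb in adj[node]} - members
        if not frontier:
            break  # nothing left to add
        # sorted() makes ties deterministic (alphabetical order wins).
        best = max(sorted(frontier), key=lambda gene: weights[gene])
        module.append(best)
        members.add(best)
    return module


module = seed_and_extend(ADJACENCY, WEIGHTS, seed="G1", max_size=4)
```

Note that G5 has a higher weight than G4 but is never adjacent to the growing module before the size limit is reached, which shows how the greedy search is constrained by network topology.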
Community detection figure: Leiserson et al, Curr Opin Genet Dev, 2013

Summary
•  Networks are made up of nodes and edges
•  The degree of a node is the number of edges to/from the node
•  Cluster coefficient is a measure of the connection density in a subgraph
•  High cluster coefficients are found in protein complexes and modules
•  'Real' networks are often found to be 'scale-free'
   –  High clustering coefficient and short average path length
•  Scale-free networks are very robust to random failure, but fall apart if the hubs are destroyed
Want to learn more?
Readings for today
Barabási and Oltvai, 2004
REVIEWS
NETWORK BIOLOGY:
UNDERSTANDING THE CELL’S
FUNCTIONAL ORGANIZATION
Albert-László Barabási* & Zoltán N. Oltvai‡
A key aim of postgenomic biomedical research is to systematically catalogue all molecules and
their interactions within a living cell. There is a clear need to understand how these molecules and
the interactions between them determine the function of this enormously complex machinery, both
in isolation and when surrounded by other cells. Rapid advances in network biology indicate that
cellular networks are governed by universal laws and offer a new conceptual framework that could
potentially revolutionize our view of biology and disease pathologies in the twenty-first century.
PROTEIN CHIPS
Similar to cDNA microarrays,
this evolving technology
involves arraying a genomic set
of proteins on a solid surface
without denaturing them. The
proteins are arrayed at a high
enough density for the
detection of activity, binding
to lipids and so on.
*Department of Physics,
University of Notre Dame,
Notre Dame, Indiana 46556,
USA.
‡Department of Pathology,
Northwestern University,
Chicago, Illinois 60611,
USA.
e-mails: [email protected];
[email protected]
doi:10.1038/nrg1272
Reductionism, which has dominated biological research
for over a century, has provided a wealth of knowledge
about individual cellular components and their functions. Despite its enormous success, it is increasingly
clear that a discrete biological function can only rarely
be attributed to an individual molecule. Instead, most
biological characteristics arise from complex interactions between the cell’s numerous constituents, such as
proteins, DNA, RNA and small molecules1–8. Therefore,
a key challenge for biology in the twenty-first century is to
understand the structure and the dynamics of the complex intercellular web of interactions that contribute to
the structure and function of a living cell.
The development of high-throughput data-collection
techniques, as epitomized by the widespread use of
microarrays, allows for the simultaneous interrogation
of the status of a cell’s components at any given time.
In turn, new technology platforms, such as PROTEIN CHIPS
or semi-automated YEAST TWO-HYBRID SCREENS, help to determine how and when these molecules interact with each
other. Various types of interaction webs, or networks,
(including protein–protein interaction, metabolic, signalling and transcription-regulatory networks) emerge
from the sum of these interactions. None of these networks are independent, instead they form a ‘network of
networks’ that is responsible for the behaviour of the
cell. A major challenge of contemporary biology is to
embark on an integrated theoretical and experimental
programme to map out, understand and model in quantifiable terms the topological and dynamic properties of the
various networks that control the behaviour of the cell.
Help along the way is provided by the rapidly developing theory of complex networks that, in the past few
years, has made advances towards uncovering the organizing principles that govern the formation and evolution
of various complex technological and social networks9–12.
This research is already making an impact on cell biology.
It has led to the realization that the architectural features
of molecular interaction networks within a cell are shared
to a large degree by other complex systems, such as the
Internet, computer chips and society. This unexpected
universality indicates that similar laws may govern most
complex networks in nature, which allows the expertise
from large and well-mapped non-biological systems to be
used to characterize the intricate interwoven relationships
that govern cellular functions.
In this review, we show that the quantifiable tools of
network theory offer unforeseen possibilities to understand the cell’s internal organization and evolution,
fundamentally altering our view of cell biology. The
emerging results are forcing the realization that, notwithstanding the importance of individual molecules,
cellular function is a contextual attribute of strict
and quantifiable patterns of interactions between the
myriad of cellular constituents. Although uncovering
the generic organizing principles of cellular networks
NATURE REVIEWS | GENETICS, VOLUME 5 | FEBRUARY 2004 | 101
Network topology
Barabási AL, Oltvai ZN. Network biology:
understanding the cell's functional organization.
Nat Rev Genet. 2004 Feb;5(2):101-13
CBS, Technical University of Denmark
A helping hand for
understanding the
methods and data
Course 27040 – Introduction to Systems Biology
21/02/2013
Or still not convinced: https://www.youtube.com/watch?v=nJmGrNdJ5Gw