STBC2023 – Introduction to Bioinformatics
Intermolecular Interactions &
Biological Pathways
M. Firdaus Raih
Room 1166, Bangunan Sains Biologi
Office Hours: Wednesdays
Phone: 0389215961 Email: [email protected]
Ver. 02-Mar-09-1
Pre-session Questions
• Do biological macromolecules act alone?
• What are intermolecular interactions?
• What are biological pathways?
• How do we identify biological interactions?
• Can these interactions and pathways be organised?
• Where can information on intermolecular interactions
and biological pathways be found?
• Can these interactions and pathways be visualised?
• Can these interactions and pathways be searched and
compared?
• What can be derived from the compiled information on
interactions/pathways?
Do biological macromolecules act alone?
• What are the macromolecules involved?
• Do these macromolecules act alone?
• Think of the central dogma of molecular biology and
relate it to the questions here.
Do biological macromolecules act alone?
• What are the macromolecules involved?
DNA, RNA, Proteins and carbohydrates/sugars
• Do these macromolecules act alone?
NO.
• For example in the central dogma:
• DNA replication involves the interaction of a protein (DNA
polymerase) with the DNA being copied.
• RNA synthesis involves interactions of proteins with DNA and RNA.
• Protein synthesis involves interactions of proteins with RNA.
• For many biological functions to take place, intermolecular
interactions are involved.
What are intermolecular interactions?
• What are intermolecular interactions?
Interactions between different molecules – we are interested in the functions or
outcomes of these interactions
• What are biological pathways?
Interactions result in reactions → the outcomes of the reactions are involved in
other interactions and so on – this series of reactions forms a pathway.
Can be defined as: a modular unit of interacting molecules to fulfill a cellular
function.
e.g. a metabolic pathway. Can you think of other examples?
• Where is all this leading to?
The post-genomics age – there is now a huge amount of data available. Individual
biochemical components of organisms can be assembled into a representative
model for an organism → systems biology
What are intermolecular interactions?
Challenges of the post genomics age:
– Data integration: integrate diverse biological information (amidst constant
data generation)
• Scientific literature, existing body of knowledge about cellular systems
• Genomic sequences
• Protein sequences, motifs, and structures
• Expression data from microarray, dbEST, and RT-PCR
• Protein-protein interaction data from large-scale screening
– Functional discovery: assign functions to the 60,000+ human genes
• Only 5% of known genes have an assigned function
• The function of the majority of discovered genes is unknown
• In addition to the genes themselves, the epigenome is also believed to play an
important role.
• Without understanding function, drug discovery cannot proceed, whether for small
molecules or for biopharmaceuticals.
• Will be the focus of the next 20 years of life-science research.
Where does the data come from?
• Generally, the data on interactions come from the (wet) laboratory.
• Numerous methods are available to study different types of
macromolecular interactions.
• These studies are then collated into pathways.
• It then becomes necessary to organise/store, integrate and visualise
interactions/pathways data.
• The integrated data must also be open to interrogation and
comparison.
Databases of Interactions and Pathways
• The level of complexity of pathway databases is higher than that of
the databases previously looked at.
– The genome
• 4 bases
• 3 billion bp total
• 3 billion bp/cell, identical
– The proteome
• 20 amino acids
• ~60K genes, ~200K proteins
• ~10K proteins/cell; different cells/conditions, different expressions
– The pathome
• ~200K reactions
• ~20K pathways
• ~1K pathways/cell; different cells/conditions, different expressions
Databases of Interactions and Pathways
• Homology (evolutionary relationships) is perhaps the most important
assumption for bioinformatics analysis at the sequence level.
• For pathways, evolutionary relationships also apply.
• For pathways we look at:
• Evolution of a simple pathway to a more complex pathway
• Duplication, diversification and modular re-use of pathways
• This can be done by first building the pathways, followed by
comparisons of the pathways. This is not unlike sequence and
structure comparisons albeit with its own uniqueness.
Databases of Interactions and Pathways
• Why study pathways?
To answer fundamental questions about biology.
• What are the minimum pathways required for life?
• How does a new function arise?
• How can answers to these fundamental questions be applied?
Some examples:
Complete information of pathways will give rise to knowledge of what is
essential and what is redundant. Such knowledge can then be tapped for
discovery and/or design of drugs.
The uniqueness of individual pathomes can be tapped for concepts such as
personalized medicine.
Databases of Interactions and Pathways
What are the data types, file formats, data
structures/representations for pathway databases?
– Data types:
• Sequence, interaction, ligand, literature, annotations (functions,
mechanisms etc.), structures, structural interactions etc.
– Data (file) formats – mainly text
• Such as FASTA format for sequences, PDB format for structures and
miscellaneous text files containing interaction data and
relationship data.
– These individual bits of data can be represented visually; while the
data can be structured and arranged as tables in relational
databases.
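As an illustration of how a text format such as FASTA can be turned into structured data, here is a minimal Python sketch. The function name parse_fasta and the two records (IDs and sequences) are made up for this example; it assumes each header line starts with '>' and that the record ID is the first word of the header.

```python
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Return {record_id: sequence} from FASTA-formatted text."""
    records: Dict[str, str] = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: take the first word after '>' as the record ID
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence line: append to the current record
            records[current_id] += line
    return records


# Hypothetical input: both records are invented for illustration
example = """>protA hypothetical protein A
MKTAYIAKQR
QISFVKSHFS
>protB hypothetical protein B
MSTNPKPQRK
"""
records = parse_fasta(example)
print(records)
```

Once parsed, each record could be stored as one row of a relational table (ID in one column, sequence text in another).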
Databases of Interactions and Pathways
Relational database implementation example (with only protein nodes shown).
Gene_Table: gene_id, chromosome, start, stop
Protein_Table: seq_id, cellular location, seq_txt, gene_id
Motif_Def_Table: motif_id, description, regular expression, HMM_matrix
Protein_Motifs: motif_id, seq_id <fk>
Interaction_Table: protein A, protein B, pathway_id, literature_id (protein = seq_id)
Pathway_Table: pathway_id, pathway_name, description, species, curator, entry_data
Literature_Table: literature_id, author, journal, pub_date, PDF_file
Links between tables: gene_id, seq_id, motif_id, pathway_id, literature_id (arrows show the info flow direction)
Visualisation of Interactions and Pathways
• Lab data can be collated and built into a pathway.
• In general - pathways can be visualised symbolically.
A circle indicates a protein or a non-protein biomolecule.
A symbol in between indicates the nature of molecule-molecule interaction.
Visualisation of Interactions and Pathways
• Generally we can visualise a pathway as a network.
• In graph theory:
Relationships between data can be visualised using constructs
referred to as graphs.
A graph consists of a set of nodes (vertices) connected by
edges (which define the relationship(s) between the nodes).
[Figure: example graphs; one labels its arcs/edges and vertices/nodes, another has nodes labelled R, Y, L and I.]
Visualisation of Interactions and Pathways
• Another feature of graphs in graph theory is that they can be directed
or undirected. In a directed graph each edge has a direction, so a
relationship can be drawn as an arrow (vector) from one node to another.
• How can a pathway visualisation be a graph?
• What other examples of graphs can you think of?
Visualisation of Interactions and Pathways
• How can graphs be represented for the computer?
[Figure: a weighted graph with nodes a, b, c and d and edge weights 8, 4, 6, 2 and 10.]
The graphs can be represented by matrices.
Adjacency matrix (rows = node an edge starts from, columns = node it points to, - = no edge):
     a    b    c    d
a    -    -    8    4
b    -    -    -    -
c    -    6    -    -
d    -   10    2    -
Adjacency list:
a: c (8), d (4)
b: (none)
c: b (6)
d: c (2), b (10)
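The two representations can be built in a few lines of Python. This is a sketch only: it assumes the edges are directed as read from the adjacency list on the slide, and it uses 0 in the matrix to mean "no edge". The variable names are chosen for this example.

```python
# Nodes and weighted edges (source, target, weight) as read from the slide
nodes = ["a", "b", "c", "d"]
edges = [("a", "c", 8), ("a", "d", 4), ("c", "b", 6), ("d", "c", 2), ("d", "b", 10)]

# Adjacency matrix: matrix[i][j] holds the weight of the edge from nodes[i] to nodes[j]
index = {name: i for i, name in enumerate(nodes)}
matrix = [[0] * len(nodes) for _ in nodes]
for source, target, weight in edges:
    matrix[index[source]][index[target]] = weight

# Adjacency list: each node maps to the (neighbour, weight) pairs it points to
adj_list = {name: [] for name in nodes}
for source, target, weight in edges:
    adj_list[source].append((target, weight))

for name, row in zip(nodes, matrix):
    print(name, row)
print(adj_list)
```

The matrix gives constant-time lookup of any pair of nodes but stores every empty cell; the list stores only the edges that exist, which suits sparse pathway graphs.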
Curation of Pathway Databases
• Database entries can come from:
• Raw experimental data
• Other databases / existing datasets
• Annotations / supplementary information
→ The data can themselves contain errors, in addition to introduced errors.
• Database curation involves not only annotation and integration of new data but
also measures which include quality control and data integrity safeguards.
• Typically, databases can be populated via high-throughput means which can then
be curated by computer programs or expert manual curation via appropriate
interfaces. The same concepts are also applied for pathway databases.
• Many databases are relational type databases, however there are some which may
employ other approaches such as XML.
• Some databases validate data at the same time as they curate it.
Databases of Interactions and Pathways
Relational database implementation example (with only protein nodes shown).
[Figure: relational schema made up of Gene_Table, Protein_Table, Motif_Def_Table, Protein_Motifs, Interaction_Table, Pathway_Table and Literature_Table, linked through gene_id, seq_id, motif_id, pathway_id and literature_id, with arrows showing the info flow direction.]
In this example:
A pathway database can be built from an integration of data from external databases.
Tables other than the ‘Pathway Table’ are sourced from existing data sources.
Data for the ‘Pathway Table’ is curated manually either from raw data or collated data/information. A
‘curator’ field enables identification of the curator for a particular entry.
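Part of this example schema can be sketched with Python's built-in sqlite3 module. This is a minimal sketch, not the actual implementation behind the slide: the column types, the underscore column names (cellular_location, protein_a, protein_b), the in-memory database and every inserted row are assumptions made for illustration, and the gene and motif tables are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE Protein_Table (
    seq_id TEXT PRIMARY KEY, cellular_location TEXT, seq_txt TEXT, gene_id TEXT);
CREATE TABLE Pathway_Table (
    pathway_id TEXT PRIMARY KEY, pathway_name TEXT, description TEXT,
    species TEXT, curator TEXT, entry_data TEXT);
CREATE TABLE Literature_Table (
    literature_id TEXT PRIMARY KEY, author TEXT, journal TEXT, pub_date TEXT);
CREATE TABLE Interaction_Table (
    protein_a TEXT REFERENCES Protein_Table(seq_id),
    protein_b TEXT REFERENCES Protein_Table(seq_id),
    pathway_id TEXT REFERENCES Pathway_Table(pathway_id),
    literature_id TEXT REFERENCES Literature_Table(literature_id));
""")

# Hypothetical rows, invented purely to exercise the schema
conn.executemany(
    "INSERT INTO Protein_Table VALUES (?, ?, ?, ?)",
    [("P1", "cytoplasm", "MKT", None), ("P2", "nucleus", "MSA", None)],
)
conn.execute(
    "INSERT INTO Pathway_Table VALUES (?, ?, ?, ?, ?, ?)",
    ("PW1", "toy pathway", "made-up example", "Homo sapiens", "curator_1", "example entry"),
)
conn.execute("INSERT INTO Interaction_Table VALUES (?, ?, ?, ?)", ("P1", "P2", "PW1", None))

# List the interactions of one pathway together with the curator of that entry
rows = conn.execute(
    """
    SELECT i.protein_a, i.protein_b, p.pathway_name, p.curator
    FROM Interaction_Table AS i
    JOIN Pathway_Table AS p ON p.pathway_id = i.pathway_id
    WHERE p.pathway_id = ?
    """,
    ("PW1",),
).fetchall()
print(rows)
```

The join shows why the ‘curator’ field is useful: every interaction returned for a pathway can be traced back to the person who entered it.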
Navigating Pathway Databases
• Overall search strategy
• Relevancy and specificity of purpose of the database used – growing number
of databases; some are specific for particular pathways, diseases, organisms
while some are well established for well known metabolic pathways;
Searches must therefore be targeted at the correct resource.
• Several methods can be used to navigate pathway databases:
• Browsing
• Hyper-links, menus – based on either protein name/family, ligand name, substrates,
pathway name etc.
• Hyperlinked images / pathway diagrams to browse pathways reaction by reaction.
• Directed searches
• Boolean type searches / ID or keyword searches
• Sequence database search
• Profile searches
• Motif searches
• Structure matching searches (e.g. small-molecule comparisons to identify similar ligands)
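A motif search of the kind listed above can be sketched with a regular expression, in the spirit of the "regular expression" field in the motif definition table. Everything specific here is an assumption for illustration: the pattern is a toy pattern, the sequences and IDs are made up, and find_motif is a hypothetical helper name.

```python
import re
from typing import Dict, List

# Toy pattern: N, then any residue except P, then S or T
TOY_MOTIF = r"N[^P][ST]"

# Hypothetical sequences keyed by made-up IDs
sequences = {"protA": "MKNASTLLNPSA", "protB": "MAAAGGG"}


def find_motif(pattern: str, seqs: Dict[str, str]) -> Dict[str, List[int]]:
    """Return the 0-based start positions of non-overlapping matches in each sequence."""
    compiled = re.compile(pattern)
    return {seq_id: [m.start() for m in compiled.finditer(seq)] for seq_id, seq in seqs.items()}


hits = find_motif(TOY_MOTIF, sequences)
print(hits)
```

Note that re.finditer reports non-overlapping matches only, so a dedicated motif tool may report more hits than this sketch on repetitive sequences.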
Searching & Comparing Pathways
• Comparisons of pathway graphs can be used to search for similar pathways.
• Comparisons can be done using several established algorithms and will depend on
the type of comparison carried out.
• Example: the graph comparisons can be carried out as graph/subgraph
isomorphism problems.
Can you present how this can be done in concept?
[Figure: an input pattern (a small graph of vertices/nodes joined by arcs/edges) is compared by the algorithm against subgraphs of a larger graph.]
How can the previous subgraph isomorphism representation be made as an input
for computation?
Can you find other ways on how pathway searches and comparisons can be
carried out?
Can you find other applications of graph theory in bioinformatics?
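One possible answer to the question of computational input: give the algorithm each graph as a list of nodes and a list of directed edges, then try every way of placing the pattern nodes onto distinct target nodes. The Python sketch below does exactly that by brute force. It is an illustration under stated assumptions, not the method used by any particular pathway database: find_subgraph_matches is a hypothetical name, the target graph is the small four-node graph from the adjacency list slide, and only the presence of the pattern's edges is checked (extra edges in the target are allowed).

```python
from itertools import permutations
from typing import Dict, Iterator, List, Tuple

Edge = Tuple[str, str]


def find_subgraph_matches(
    pattern_nodes: List[str],
    pattern_edges: List[Edge],
    target_nodes: List[str],
    target_edges: List[Edge],
) -> Iterator[Dict[str, str]]:
    """Yield every mapping of pattern nodes to distinct target nodes
    under which each pattern edge is also an edge of the target graph."""
    target_edge_set = set(target_edges)
    for candidate in permutations(target_nodes, len(pattern_nodes)):
        mapping = dict(zip(pattern_nodes, candidate))
        if all((mapping[u], mapping[v]) in target_edge_set for u, v in pattern_edges):
            yield mapping


# Target: the four-node graph from the adjacency list slide (weights ignored)
target_nodes = ["a", "b", "c", "d"]
target_edges = [("a", "c"), ("a", "d"), ("c", "b"), ("d", "c"), ("d", "b")]

# Input pattern: a chain of two consecutive steps, x -> y -> z
pattern_nodes = ["x", "y", "z"]
pattern_edges = [("x", "y"), ("y", "z")]

matches = list(find_subgraph_matches(pattern_nodes, pattern_edges, target_nodes, target_edges))
for match in matches:
    print(match)
```

Trying every assignment grows factorially with graph size, so this only works for very small graphs; established algorithms such as VF2 prune the search and are what one would use in practice.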
Investigating Interactions
• To identify which approach is best fitted for your purposes, several questions need
to be posed and answered. Examples:
• What is it ultimately that you are interested in?
• Components of a single interaction?
• Interactions involving members of a family of proteins?
• Some strategies which can be considered:
• Interested in a single interaction?
• Extract the information available for the interaction of interest; identify the components and
other information of interest.
• Is structural interaction data available? Visualize and explore the interactions at atomic level.
What is involved in this interaction at atomic level?
• Are there existing inhibitors? Find other similar inhibitors perhaps by small molecule structure
comparison methods.
• Interested in understanding interactions for a family of proteins, specific sets or
in general – getting the macro information out.
• What is important at this level? Generalisation – can be presented as statistics.
• For example:
(1) Identify all protein structures which are bound to DNA.
(2) Get statistics for the binding: which residues prefer to bind DNA, which motifs, etc?
What can such information be used for?
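Step (2) of the example can be sketched as a simple counting exercise. The contact lists below are entirely made up (they are not taken from any real structure), and the structure IDs are placeholders; the sketch only shows the shape of the calculation.

```python
from collections import Counter

# Hypothetical data: one-letter codes of residues found in contact with DNA,
# listed per (made-up) protein-DNA structure
contacts = {
    "struct1": ["R", "K", "R", "S"],
    "struct2": ["K", "R", "N"],
}

# Count how often each residue type appears across all structures
counts = Counter(res for residues in contacts.values() for res in residues)
total = sum(counts.values())

# Convert counts to fractions of all observed contacts
frequencies = {res: n / total for res, n in counts.items()}

for res, n in counts.most_common():
    print(res, n, round(frequencies[res], 3))
```

A real analysis would also compare these fractions with how common each residue is overall, so that abundant residues are not mistaken for preferred ones.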
Building Pathway Databases: in Practice
Pathway Databases: Examples
Pathway Databases: in Practice
Applying Pathway Information
What can pathway information be used for?
• Understanding pathways in order to inhibit / intervene.
• To build up background data for simulation purposes.
Useful for:
• Drug discovery and design.
• Experimental design and refinements.
• Whole cell/organism simulations.
• Synthetic biology.
• Personalized medicine.
Self Study and Self Assessment
• The self study module for this series of lectures on analyses of
intermolecular interactions and biological pathways will be available for
download from SPIN. The file format is PowerPoint Show (.pps).
• The self assessment quiz is accessible from within the SPIN interface.
• Both these materials are for self assessment and self study use and
DO NOT contribute to your final grades for this course.
• Also explore the references and texts listed in the course information file
and reading list.
• Explore resources made available via this self-study material.
Guide
• This is an electronic self study and self assessment module based on the
lectures covering the topic – Intermolecular Interactions & Biological
Pathways of the STBC2023 – Introduction to Bioinformatics course.
• To navigate this module, use the buttons provided, mostly in the bottom right
hand corner of the page or, in some slides, the bottom left hand corner. The Home
icon button returns to the slide listing the key questions which we
are trying to answer with this course material. Several pages have hyperlinks
which navigate either to specific slides or away from this
module via the default web browser. To return, simply click back to this file. If
the buttons are not clicked properly, the slides advance in normal PowerPoint
slideshow order instead of navigating to the directed pages.
• Practicals and self assessment questions to gauge your comprehension of a given
concept or practical session are also provided throughout. Please attempt the
practicals and the questions on your own before resorting to the solutions or
answers provided.
Further Reading
Recommended Textbook (Lesk, 2nd Ed.)
• Basics – Chapter 1
– Pages 1-59
• Sequence alignments – Chapter 5, Chapter 1
– Pages 242-270
– Pages 21-59
Other Textbooks
• Baxevanis & Ouellette, 3rd edition
– Chapters 5-7
• Pevsner