Gene expression clustering using gene ontology and biological networks
Student: Peter Leontev, SPbAU
Scientific advisor: Son Pham, UCSD
Gene expression data source
● Microarrays
● RNA-seq
● Single-cell RNA-seq
Gene expression data representation
● CPM, FPKM, RPKM, TPM
● Heatmap/PCA/tSNE/...
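The normalized units listed on this slide can be illustrated with a minimal, dependency-free sketch. It uses the commonly cited definitions of CPM, RPKM/FPKM and TPM; the function names, read counts and gene lengths below are made up for illustration and are not taken from the talk.

```python
def cpm(counts):
    """Counts per million: scale each count by the sample's library size."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


def rpkm(counts, lengths_bp):
    """Reads per kilobase per million mapped reads (FPKM is the fragment analogue)."""
    total = sum(counts)
    return [c / (length / 1e3) / (total / 1e6)
            for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rates = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    return [r / scale * 1e6 for r in rates]


# Hypothetical sample with three genes (illustrative numbers only).
counts = [100, 300, 600]
lengths_bp = [1000, 3000, 2000]

cpm_values = cpm(counts)
rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)
```

In this toy sample the first two genes get the same TPM despite different raw counts, because the second gene is three times longer.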
Clustering in a nutshell
● Similar items should fall into the same cluster, whereas dissimilar items should fall into different clusters.
● There is no single best criterion for obtaining a partition, because no single, precise definition of a cluster exists.
● As a result, there is a tremendous number of methods.
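Proximity measure
● Gene expression values can be formalized as numerical vectors: each gene is a vector of its values across samples, and each sample is a vector of its values across genes.
● There are many proximity measures, such as L1 and L2 norms, Mahalanobis distance, correlation, etc.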
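Three of the measures named above can be sketched in plain Python. The two expression vectors are hypothetical; Mahalanobis distance is left out because it needs a covariance estimate from a full dataset.

```python
import math


def l1_distance(x, y):
    """Manhattan (L1) distance between two expression vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def l2_distance(x, y):
    """Euclidean (L2) distance between two expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_correlation(x, y):
    """Pearson correlation; 1 - r is often used as a dissimilarity."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Two hypothetical genes with the same pattern at different scales.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]

d1 = l1_distance(gene_a, gene_b)
d2 = l2_distance(gene_a, gene_b)
r = pearson_correlation(gene_a, gene_b)
```

The two genes are far apart under L1 and L2 but perfectly correlated, which is why the choice of proximity measure changes which genes end up clustered together.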
Clustering by genes, samples and biclustering
Two popular clustering algorithms
Hierarchical clustering (HC) allows us to cluster both genes and samples in one picture and see the whole dataset.
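As a rough illustration of how HC builds its tree, here is a minimal agglomerative sketch. Average linkage and Euclidean distance are assumptions chosen for the example (the slide does not name a linkage), and the expression profiles are invented.

```python
import math


def average_linkage(c1, c2, data):
    """Mean pairwise Euclidean distance between the members of two clusters."""
    total = sum(math.dist(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hierarchical_clustering(data, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns the final clusters (lists of row indices) and the merge history,
    which is the information a dendrogram is drawn from.
    """
    clusters = [[i] for i in range(len(data))]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], data)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters, merges


# Hypothetical expression profiles: rows are genes, columns are samples.
profiles = [
    [1.0, 1.1, 0.9],
    [1.2, 1.0, 1.0],
    [5.0, 5.2, 4.9],
    [5.1, 4.8, 5.0],
]
clusters, merges = hierarchical_clustering(profiles, n_clusters=2)
```

Running the same function on the transposed matrix clusters samples instead of genes, which is how a heatmap ends up with a dendrogram on each axis.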
Two popular clustering algorithms
K-means is a partitioning method that requires a predefined number (K) of clusters.
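A minimal sketch of K-means using Lloyd's iteration, assuming Euclidean distance and random initialization from the data points. The `kmeans` helper and the toy profiles are illustrative, not the implementation used in the talk.

```python
import math
import random


def kmeans(data, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct data points.
    centroids = [list(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in data
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical expression profiles forming two obvious groups.
profiles = [
    [1.0, 1.1, 0.9],
    [1.2, 1.0, 1.0],
    [5.0, 5.2, 4.9],
    [5.1, 4.8, 5.0],
]
labels, centroids = kmeans(profiles, k=2)
```

K has to be supplied up front; on real data the result also depends on the initialization, so the algorithm is usually run several times with different seeds.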
Disadvantages of purely proximity-based algorithms
● They use simple distance metrics for large-scale datasets.
● They consider genes as independent entities.
● They ignore known functions of genes.
Ref: When Is “Nearest Neighbor” Meaningful? Beyer et al., 1999
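The cited paper by Beyer et al. argues that, under broad conditions, the distance to the nearest neighbor approaches the distance to the farthest one as dimensionality grows. A small simulation can illustrate the effect; uniformly random points and the `relative_contrast` helper are assumptions made for this sketch, not an experiment from the talk.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


# Contrast between nearest and farthest neighbor for growing dimensionality.
contrast_by_dim = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}
```

The contrast shrinks as the dimension grows, so with thousands of genes per sample a plain Euclidean distance separates "near" from "far" only weakly.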
Embedding prior knowledge
● Many clustering algorithms have been developed that use knowledge databases in the clustering process.
● However, they assume that integrating a single external source of knowledge is enough to guide the algorithm.
Existing approaches (gene ontology)
● A knowledge-based clustering algorithm driven by Gene Ontology, Cheng et al., 2004.
● Seeing the forest for the trees: using the Gene Ontology to restructure hierarchical clustering, Dotan-Cohen et al., 2009.
Existing approaches (biological networks)
● A network-assisted co-clustering algorithm to discover cancer subtypes based on gene expression, Liu, 2014.
● Improving clustering with metabolic pathway data, Diego et al., 2015.
The goal
To combine different biological networks and gene ontology and integrate them into a gene expression (bi)clustering algorithm.
Proposed clustering algorithm
Combining biological networks
● If possible, consider any network as directed and weighted.
● Take into account as much of each edge's information as possible.
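One possible reading of these two rules, sketched under stated assumptions: an undirected edge becomes two directed arcs, an unweighted edge gets a default weight of 1.0, and the combined network keeps a per-source weight for every arc instead of collapsing them. The function names, gene names, network names and weights are all hypothetical; the slides do not specify the actual combination scheme.

```python
from collections import defaultdict


def to_directed_weighted(edges, directed, default_weight=1.0):
    """Normalize an edge list to a {(source, target): weight} mapping.

    Each edge is (u, v) or (u, v, weight). Undirected edges are expanded
    into two arcs; missing weights fall back to default_weight.
    """
    arcs = {}
    for edge in edges:
        u, v = edge[0], edge[1]
        weight = edge[2] if len(edge) > 2 else default_weight
        arcs[(u, v)] = weight
        if not directed:
            arcs[(v, u)] = weight
    return arcs


def combine_networks(networks):
    """Merge named networks, keeping each source's weight per arc."""
    combined = defaultdict(dict)
    for name, arcs in networks.items():
        for arc, weight in arcs.items():
            combined[arc][name] = weight
    return dict(combined)


# Hypothetical inputs: an undirected, unweighted interaction network and a
# directed, weighted regulatory network.
networks = {
    "interaction": to_directed_weighted([("geneA", "geneB")], directed=False),
    "regulatory": to_directed_weighted(
        [("geneA", "geneB", 0.8), ("geneB", "geneC", 0.5)], directed=True
    ),
}
combined = combine_networks(networks)
```

Keeping one weight per source defers the decision of how to aggregate them (sum, maximum, learned weighting) to the clustering step, so no edge information is discarded early.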
Evaluation
Hypothesis: genes that show similar expression patterns have a common function and so may have similar network features too.
Evaluation must answer the following questions:
1. How similar are the gene expression values?
2. Do we have consistency in the gene ontology terms?
3. Can we infer some pathways? (*)
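Question 2 could be quantified in several ways. One simple option, shown here purely as an assumption and not as the evaluation used in this work, is the mean pairwise Jaccard similarity between the GO term sets of the genes in a cluster. The gene names and term identifiers are placeholders.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def go_consistency(cluster_genes, annotations):
    """Mean pairwise Jaccard similarity of GO term sets within one cluster."""
    pairs = list(combinations(cluster_genes, 2))
    if not pairs:
        return 0.0
    total = sum(
        jaccard(annotations.get(g1, set()), annotations.get(g2, set()))
        for g1, g2 in pairs
    )
    return total / len(pairs)


# Placeholder annotations: gene -> set of GO term identifiers.
annotations = {
    "geneA": {"term1", "term2"},
    "geneB": {"term1", "term2"},
    "geneC": {"term3"},
}

coherent = go_consistency(["geneA", "geneB"], annotations)
mixed = go_consistency(["geneA", "geneB", "geneC"], annotations)
```

A cluster whose genes share annotations scores close to 1, while a functionally mixed cluster scores lower. This flat comparison ignores the ontology's hierarchy, which a real evaluation would likely need to account for.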
Dataset
● Darmanis S, Sloan SA, Zhang Y, Enge M et al. “A survey of human brain transcriptome diversity at the single cell level” (GSE67835).
Relevance
● Do we have enough knowledge to cluster gene expression data to determine cell types?
● Do we have enough knowledge to cluster gene expression data to determine cell subtypes?
Thank you!
Questions?