Differentially expressed genes selection via
Laplacian regularized low-rank representation
Qufu Normal University
[email protected]
Yaxuan Wang
2016.10.5
Contents
The study background
The Laplacian regularized low-rank representation method (LLRR)
Results and discussion
The study background
What is cancer?
Cancer can start any place in the body.
It starts when cells grow out of control and
crowd out normal cells. This makes it hard
for the body to work the way it should.
Cancer is not just one disease. There are many
types of cancer. Cancer can start in the lungs, the
breast, the colon, or even in the blood. Cancers are
alike in some ways, but they are different in the ways
they grow and spread.
With the development of DNA sequencing
technology and next-generation sequencing
technology, a large amount of genomic data has
been generated.
In recent years, many methods have been
proposed and applied to discover differentially
expressed genes, such as robust principal
component analysis (RPCA), penalized matrix
decomposition (PMD) and so on.
Low-rank representation model
The low-rank representation method has attracted
a lot of attention because it is effective at
exploring low-dimensional subspace structures
embedded in data.
Low-rank representation aims to find the
lowest-rank representation of all the data jointly.
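The low-rank representation model
The data matrix D is decomposed into a low-rank part DZ and a sparse error part E:
\[ D = DZ + E. \]
The corresponding optimization problem is
\[ \min_{Z,E} \ \operatorname{rank}(Z) + \lambda \|E\|_0 \quad \text{s.t.} \quad D = DZ + E. \]
This problem is difficult to solve because of the discrete nature of the rank function and the L0-norm. Its convex relaxation is written as follows:
\[ \min_{Z,E} \ \|Z\|_* + \lambda \|E\|_1 \quad \text{s.t.} \quad D = DZ + E. \]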
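To make the relaxed objective concrete, here is a minimal NumPy sketch that evaluates \|Z\|_* + \lambda\|E\|_1 for a given Z, with E = D - DZ so that the constraint holds. The function name, the random test matrix and the value of lam are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def lrr_relaxed_objective(D, Z, lam):
    """Value of ||Z||_* + lam * ||E||_1 with E = D - D Z (constraint satisfied)."""
    E = D - D @ Z
    nuclear = np.linalg.norm(Z, ord="nuc")  # nuclear norm: sum of singular values
    l1 = np.abs(E).sum()                    # entrywise L1 norm
    return nuclear + lam * l1

# Illustrative data only (assumption): 6 features, 4 samples.
rng = np.random.default_rng(0)
D = rng.standard_normal((6, 4))

# Two feasible extremes: Z = I gives E = 0 but full rank; Z = 0 gives E = D.
obj_identity = lrr_relaxed_objective(D, np.eye(4), lam=0.5)
obj_zero = lrr_relaxed_objective(D, np.zeros((4, 4)), lam=0.5)
```

The two extremes show the trade-off that \lambda controls: a perfect fit with a full-rank Z versus a zero-rank Z that pushes all the data into E.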
The disadvantage of LRR
The LRR method does not take into account
the non-linear geometric structures within the
data, so the locality and similarity information
among the data may be lost in the learning
process. To improve LRR in this regard, we
introduce graph regularization into the LRR
method.
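The graph regularization
Given a matrix D \in \mathbb{R}^{m \times n}, define a symmetric weight matrix W \in \mathbb{R}^{n \times n} by
\[ w_{ij} = \begin{cases} 1 & \text{if } y_i \in \mathcal{N}(y_j) \text{ or } y_j \in \mathcal{N}(y_i), \\ 0 & \text{otherwise,} \end{cases} \]
where \mathcal{N}(y_j) denotes the set of neighbors of y_j.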
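One possible way to build such a weight matrix is sketched below. It assumes that the points y_i are the columns of D and that the neighborhood is the set of k nearest neighbors under Euclidean distance. The slides do not specify either choice, and the function name and k are hypothetical.

```python
import numpy as np

def knn_weight_matrix(D, k):
    """Symmetric 0/1 weights: w_ij = 1 if column i is among the k nearest
    neighbours of column j, or column j is among those of column i."""
    n = D.shape[1]
    sq = (D ** 2).sum(axis=0)
    # Pairwise squared Euclidean distances between columns.
    dist = sq[:, None] + sq[None, :] - 2.0 * (D.T @ D)
    np.fill_diagonal(dist, np.inf)           # a point is not its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(W, W.T)                # the "or" in the definition

# Toy example (assumption): four 1-D samples forming two tight pairs.
points = np.array([[0.0, 1.0, 10.0, 11.0]])
W = knn_weight_matrix(points, k=1)
```

Taking the elementwise maximum of W and its transpose is what makes the matrix symmetric even when the nearest-neighbor relation is not.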
Graph embedding aims at describing each
vertex of the graph by a low-dimensional vector that
preserves the affinity between vertices, where the
similarity is measured by the edge weight:
\[ \min_{\{z_k\}} \ \sum_{ij} \|z_i - z_j\|^2 w_{ij}. \]
The degree matrix D (distinct from the data matrix above) is a
diagonal matrix with entries
\[ d_{ii} = \sum_j w_{ij}. \]
The graph Laplacian matrix is defined as:
\[ L = D - W. \]
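It is easy to prove that, with the vectors z_i as the columns of Z,
\sum_{ij} \|z_i - z_j\|^2 w_{ij} = 2\operatorname{tr}(ZLZ^T), so the graph embedding can
be rewritten as:
\[ \min_{Z} \ \operatorname{tr}(ZLZ^T). \]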
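The equivalence between the pairwise form and the trace form can be checked numerically. The sketch below uses a random symmetric 0/1 weight matrix and a random Z whose columns play the role of the vectors z_i. The sizes and the random seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 5, 3

# Random symmetric 0/1 weight matrix with a zero diagonal.
upper = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
W = (upper + upper.T).astype(float)

deg = np.diag(W.sum(axis=1))   # degree matrix, d_ii = sum_j w_ij
L = deg - W                    # graph Laplacian

Z = rng.standard_normal((r, n))  # column i is the embedding z_i

pairwise = sum(
    W[i, j] * np.sum((Z[:, i] - Z[:, j]) ** 2)
    for i in range(n)
    for j in range(n)
)
trace_form = np.trace(Z @ L @ Z.T)
```

The pairwise sum equals twice the trace, so both have the same minimizer.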
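LLRR model
Adding the graph regularization term to LRR gives the LLRR model:
\[ \min_{Z,E} \ \|Z\|_* + \lambda \|E\|_1 + \frac{\mu}{2} \operatorname{tr}(ZLZ^T) \quad \text{s.t.} \quad X = XZ + E. \]
We use the augmented Lagrangian method and the
alternating direction method of multipliers to
solve the optimization problem. With an auxiliary variable J, the problem becomes
\[ \min_{Z,E,J} \ \|J\|_* + \lambda \|E\|_1 + \frac{\mu}{2} \operatorname{tr}(ZLZ^T) \quad \text{s.t.} \quad X = XZ + E, \ Z = J. \]
The augmented Lagrangian function, with Lagrange multipliers Y_1 and Y_2, is
\[ \Psi(Z, J, E, Y_1, Y_2) = \|J\|_* + \lambda \|E\|_1 + \frac{\mu}{2} \operatorname{tr}(ZLZ^T) + \langle X - XZ - E, Y_1 \rangle + \langle Z - J, Y_2 \rangle + \frac{\mu}{2} \|X - XZ - E\|_F^2 + \frac{\mu}{2} \|Z - J\|_F^2. \]
The variables are updated alternately:
\[ Z_{k+1} = D_{1/\eta}\left( Z_k - \nabla_Z q(Z_k) / \eta \right), \]
\[ E_{k+1} = S_{\lambda/\mu}\left( X - XZ_{k+1} + \frac{Y_{1,k}}{\mu} \right), \]
\[ J_{k+1} = \max\left\{ S_{\lambda/\mu}\left( Z_{k+1} + \frac{Y_{2,k}}{\mu} \right), 0 \right\}. \]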
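The E and J updates can be sketched in a few lines of NumPy. This assumes that S denotes the standard entrywise soft-thresholding (shrinkage) operator, which the slides do not define. The Z update is left out because the operator D_{1/\eta} and the function q are not specified here. All function names are hypothetical.

```python
import numpy as np

def soft_threshold(A, tau):
    """Entrywise shrinkage: S_tau(a) = sign(a) * max(|a| - tau, 0)."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def update_E(X, Z, Y1, lam, mu):
    """E_{k+1} = S_{lam/mu}(X - X Z_{k+1} + Y1_k / mu)."""
    return soft_threshold(X - X @ Z + Y1 / mu, lam / mu)

def update_J(Z, Y2, lam, mu):
    """J_{k+1} = max{S_{lam/mu}(Z_{k+1} + Y2_k / mu), 0}, taken entrywise."""
    return np.maximum(soft_threshold(Z + Y2 / mu, lam / mu), 0.0)

# Small illustrative inputs (assumptions, not data from the slides).
shrunk = soft_threshold(np.array([-3.0, -0.5, 0.0, 0.5, 3.0]), 1.0)
E_new = update_E(np.eye(2), np.zeros((2, 2)), np.zeros((2, 2)), lam=1.0, mu=2.0)
J_new = update_J(np.array([[1.5, -2.0]]), np.zeros((1, 2)), lam=1.0, mu=2.0)
```

Entries whose magnitude is below the threshold are set to zero, which is what makes E sparse.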
Identification of differentially expressed genes
After the observation matrix has been decomposed
using LLRR, the sparse perturbation matrix is
obtained. The differentially expressed genes
can then be identified from this sparse matrix.
\[ \hat{E} = \left( \hat{e}_1, \hat{e}_2, \ldots, \hat{e}_m \right), \qquad \hat{e}_j = \sum_{i=1}^{n} \hat{e}_{ij}, \]
\[ \bar{E} = \left( \bar{e}_1, \bar{e}_2, \ldots, \bar{e}_m \right). \]
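A minimal sketch of how genes might be scored and ranked from a sparse matrix is given below. It rests on several assumptions that the slides do not state: rows of E correspond to genes, each gene is scored by the sum of the absolute values of its entries, and genes are sorted by descending score. The gene names and the toy matrix are hypothetical.

```python
import numpy as np

def rank_genes(E, gene_names, top):
    """Score each gene (row of E) by the sum of absolute entries and
    return the `top` genes with the largest scores."""
    scores = np.abs(E).sum(axis=1)
    order = np.argsort(-scores, kind="stable")  # descending by score
    return [(gene_names[i], float(scores[i])) for i in order[:top]]

# Toy sparse matrix (assumption): 4 genes, 3 samples.
E = np.array([
    [0.0,  0.0, 0.0],
    [1.0, -2.0, 0.0],
    [0.5,  0.0, 0.0],
    [-3.0, 0.0, 3.0],
])
names = ["g1", "g2", "g3", "g4"]
top_genes = rank_genes(E, names, top=2)
```

Genes with all-zero rows in E get a score of zero and fall to the bottom of the ranking.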
Datasets
The Cancer Genome Atlas (TCGA): TCGA is a
collaboration between the National Cancer Institute
(NCI) and the National Human Genome Research
Institute (NHGRI) that has generated comprehensive,
multi-dimensional maps of the key genomic changes
in 33 types of cancer.
Datasets used: pancreatic cancer (PAAD)
and cholangiocarcinoma (CHOL).
Results and discussion
Summary
We applied the LLRR method to discover
differentially expressed genes. The method
outperforms many other methods.
In the future, we can take full advantage of the
low-rank structure and combine it with other useful
structures to develop better methods for selecting
differentially expressed genes.