Differentially expressed genes selection via Laplacian regularized low-rank representation

Qufu Normal University
[email protected]
Yaxuan Wang
2016.10.5

Contents
The study background
The Laplacian regularized low-rank representation method (LLRR)
Results and discussion

The study background

What is cancer?
Cancer can start anywhere in the body. It starts when cells grow out of control and crowd out normal cells, which makes it hard for the body to work the way it should. Cancer is not just one disease: there are many types, and it can start in the lungs, the breast, the colon, or even the blood. Cancers are alike in some ways, but they differ in how they grow and spread.

With the development of DNA sequencing and next-generation sequencing technology, a large amount of genomic data has been generated. In recent years, many methods have been proposed and applied to discover differentially expressed genes, such as robust principal component analysis (RPCA) and penalized matrix decomposition (PMD).

The low-rank representation model
Low-rank representation (LRR) has attracted a lot of attention because of its effectiveness in exploring the low-dimensional subspace structures embedded in data. LRR aims to find the lowest-rank representation of all the data jointly. The data matrix D is modeled as

$$D = DZ + E,$$

where Z is the representation matrix and E is the sparse error term. Z and E are found by solving

$$\min_{Z,E} \ \operatorname{rank}(Z) + \lambda \|E\|_0 \quad \text{s.t.} \quad D = DZ + E.$$

This optimization problem is difficult to solve because of the discrete nature of the rank function and the L0-norm. A convex relaxation, which replaces the rank with the nuclear norm and the L0-norm with the L1-norm, is written as follows:

$$\min_{Z,E} \ \|Z\|_* + \lambda \|E\|_1 \quad \text{s.t.} \quad D = DZ + E.$$

The disadvantage of LRR
Because LRR does not take into account the non-linear geometric structures within the data, the locality and similarity information among the data may be lost in the learning process.
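As a side illustration of the convex relaxation in the LRR model above, the following minimal sketch compares the rank and the L0-norm with their convex surrogates on a toy matrix. It assumes NumPy; the function names and the toy data are hypothetical and are not taken from the slides.

```python
import numpy as np

def nuclear_norm(Z):
    """Sum of singular values: the convex surrogate for rank(Z)."""
    return np.linalg.svd(Z, compute_uv=False).sum()

def l1_norm(E):
    """Sum of absolute entries: the convex surrogate for the L0-norm of E."""
    return np.abs(E).sum()

def lrr_objective(Z, E, lam):
    """Relaxed LRR objective ||Z||_* + lam * ||E||_1 (constraint not checked)."""
    return nuclear_norm(Z) + lam * l1_norm(E)

# Toy data (hypothetical): a rank-1 matrix and a matrix with one nonzero entry.
rng = np.random.default_rng(0)
u = rng.standard_normal((6, 1))
v = rng.standard_normal((1, 5))
low_rank = u @ v
sparse = np.zeros((6, 5))
sparse[2, 3] = 4.0

rank_value = np.linalg.matrix_rank(low_rank)   # discrete quantity
l0_value = np.count_nonzero(sparse)            # discrete quantity
nuc_value = nuclear_norm(low_rank)             # continuous, convex in Z
l1_value = l1_norm(sparse)                     # continuous, convex in E
```

The rank and the nonzero count only take integer values, whereas the nuclear norm and the L1-norm vary continuously with the entries, which is what makes the relaxed problem tractable for standard convex solvers.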
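To address this disadvantage of LRR, we introduce graph regularization into the LRR method.

The graph regularization
Given a matrix $D \in \mathbb{R}^{m \times n}$, define a symmetric weight matrix $W \in \mathbb{R}^{n \times n}$ by

$$w_{ij} = \begin{cases} 1 & \text{if } y_i \in N(y_j) \text{ or } y_j \in N(y_i), \\ 0 & \text{otherwise,} \end{cases}$$

where $N(y_j)$ denotes the set of neighbors of the data point $y_j$.

Graph embedding aims to describe each vertex of the graph by a low-dimensional vector that preserves the affinity between vertices, where similarity is measured by the edge weight:

$$\min_{\{z_k\}} \ \sum_{ij} \|z_i - z_j\|^2 \, w_{ij}.$$

The degree matrix D (distinct from the data matrix above) is a diagonal matrix with entries

$$d_{ii} = \sum_j w_{ij}.$$

The graph Laplacian matrix is defined as

$$L = D - W.$$

It is easy to prove that the graph embedding problem can be rewritten as

$$\min_{Z} \ \operatorname{tr}(Z L Z^T).$$

LLRR model
The LLRR model adds the graph regularization term to the LRR objective, with the data matrix now written as X:

$$\min_{Z,E} \ \|Z\|_* + \lambda \|E\|_1 + \frac{\mu}{2} \operatorname{tr}(Z L Z^T) \quad \text{s.t.} \quad X = XZ + E.$$

We use the augmented Lagrangian method and the alternating direction method of multipliers to solve this optimization problem. Introducing an auxiliary variable J gives the equivalent problem

$$\min_{Z,E,J} \ \|J\|_* + \lambda \|E\|_1 + \frac{\mu}{2} \operatorname{tr}(Z L Z^T) \quad \text{s.t.} \quad X = XZ + E, \ Z = J.$$

The augmented Lagrangian function is

$$\Psi(Z, J, E, Y_1, Y_2) = \|J\|_* + \lambda \|E\|_1 + \frac{\mu}{2} \operatorname{tr}(Z L Z^T) + \langle X - XZ - E, Y_1 \rangle + \langle Z - J, Y_2 \rangle + \frac{\mu}{2} \|X - XZ - E\|_F^2 + \frac{\mu}{2} \|Z - J\|_F^2,$$

where $Y_1$ and $Y_2$ are the Lagrange multipliers. The variables are updated alternately:

$$Z_{k+1} = D_{1/\eta}\!\left( Z_k - \nabla_Z q(Z_k) / \eta \right),$$

$$E_{k+1} = S_{\lambda/\mu}\!\left( X - X Z_{k+1} + \frac{Y_1^k}{\mu} \right),$$

$$J_{k+1} = \max\left\{ S_{\lambda/\mu}\!\left( Z_{k+1} + \frac{Y_2^k}{\mu} \right), 0 \right\}.$$

Identification of differentially expressed genes
After the observation matrix has been decomposed by LLRR, the sparse perturbation matrix is obtained. The differentially expressed genes can therefore be identified from this sparse matrix:

$$\hat{E} = (\hat{e}_1, \hat{e}_2, \ldots, \hat{e}_m), \qquad \hat{e}_j = \sum_{i=1}^{n} \hat{e}_{ij}, \qquad \bar{E} = (\bar{e}_1, \bar{e}_2, \ldots, \bar{e}_m).$$

Datasets
The Cancer Genome Atlas (TCGA): TCGA is a collaboration between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) that has generated comprehensive, multi-dimensional maps of the key genomic changes in 33 types of cancer.
Pancreatic cancer (PAAD) and cholangiocarcinoma (CHOL)

Results and discussion

Summary
We applied the LLRR method to discover differentially expressed genes.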
This method outperforms many other methods. In the future, we can take full advantage of the low-rank structure and combine it with other useful structures to develop better methods for selecting differentially expressed genes.
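The graph regularization and gene identification steps from the slides can be sketched as follows. This is a minimal illustration assuming NumPy, not the authors' implementation. The k-nearest-neighbor choice for the neighborhood N(.), the use of absolute values when summing the entries of the sparse matrix, the descending sort, and all function names and toy data are assumptions introduced here.

```python
import numpy as np

def knn_weight_matrix(X, k):
    """Symmetric 0/1 weights: w_ij = 1 if sample i is among the k nearest
    neighbours of sample j, or vice versa (samples are columns of X).
    Using k nearest neighbours for N(.) is an assumption of this sketch."""
    n = X.shape[1]
    sq = (X ** 2).sum(axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * X.T @ X  # squared distances
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(W, W.T)  # the "or" in the definition symmetrizes W

def graph_laplacian(W):
    """L = D - W, with D the diagonal degree matrix d_ii = sum_j w_ij."""
    return np.diag(W.sum(axis=1)) - W

def rank_genes(E, top):
    """Score each gene (row of E) and return the `top` highest-scoring rows.
    Absolute values and the descending sort are assumptions of this sketch."""
    scores = np.abs(E).sum(axis=1)
    order = np.argsort(-scores, kind="stable")
    return order[:top], scores

# Toy data (hypothetical): 20 genes x 8 samples.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 8))
W = knn_weight_matrix(X, k=3)
L = graph_laplacian(W)

# Check the trace identity on a random Z whose columns are the vectors z_i:
# sum_ij w_ij ||z_i - z_j||^2 = 2 * tr(Z L Z^T).
Z = rng.standard_normal((8, 8))
diff = Z[:, :, None] - Z[:, None, :]          # diff[a, i, j] = Z[a, i] - Z[a, j]
pairwise = (W * (diff ** 2).sum(axis=0)).sum()
trace_form = np.trace(Z @ L @ Z.T)

# Toy sparse matrix: gene 3 is strongly perturbed, gene 1 weakly.
E = np.zeros((5, 4))
E[3] = [2.0, -2.0, 1.0, 0.0]
E[1] = [0.5, 0.0, 0.0, -0.5]
top_genes, gene_scores = rank_genes(E, top=2)
```

The pairwise sum and the trace form differ only by a constant factor of 2, so they share the same minimizer, which is why the embedding objective can be written as tr(ZLZ^T). In the gene ranking step, summing absolute values keeps positive and negative perturbations of the same gene from cancelling.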