
Bayesian Factor Regression Models in the “Large p, Small n” Paradigm
Mike West, Duke University
Presented by: John Paisley, Duke University

Outline
• Empirical Factor Regression (SVD)
• Latent Factor Regression
• Sparse Factor Regression

Linear Regression & Empirical Factor Regression
• Linear Regression
• SVD Regression: D is a diagonal matrix of singular values

Empirical Factor Regression
• Regression is now done in factor space using generalized shrinkage (ridge regression) priors on the factor-space coefficients, e.g. RVM
• Problem of inversion: the mapping from β to the factor-space coefficients is many-to-one; the canonical choice is the “least-norm” inverse

Example: Biscuit Dough Data
• NIR spectroscopy reflectance values are predictors
• Response is fat content of dough samples
• 39 training, 39 testing: data are pooled and testing data responses are treated as missing values to be imputed
• Top 16 factors used, based on size of singular values

Example: Biscuit Dough Data (2)
• Left: Fitted and predicted vs true values
• Right: Least-norm inverse of β
• The ~1700 nm range is the absorbance region for fat
• As can be seen, the solution is not sparse

Latent Factor Regression
• Loosen the exact factorization of X to a latent factor model with loadings matrix B and idiosyncratic noise
• Under proper constraints on B, this finds common structure in X and isolates idiosyncrasies to noise
• Now, variation in X has less effect on y
• The latent factor model implies a prior for the regression of y on X; when the noise variance Φ → 0, this reverts to empirical linear regression

Sparse Latent Factor Regression
• With respect to gene expression profiling, “multiple biological factors underlie patterns of gene expression variation, so latent factor approaches are natural – we imagine that latent factors reflect individual biological functions… This is a motivating context for sparse models.”
• Columns of B represent the genes involved in a particular biological factor.
• Rows of B represent a particular gene’s involvement across biological factors.
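As a rough illustration of the empirical factor regression slides, the sketch below takes the SVD of a predictor matrix, keeps the top k factors, computes a ridge-type shrinkage estimate in factor space, and maps it back to predictor space with the least-norm inverse. The notation X = A D F (X is p × n, one column per sample), the function name `empirical_factor_regression`, the `ridge` penalty, and the synthetic data are all assumptions made for illustration and are not taken from the slides; a plain ridge estimate stands in for the Bayesian shrinkage priors, and X and y are assumed to be centered.

```python
import numpy as np

def empirical_factor_regression(X, y, k, ridge=1.0):
    """Hypothetical helper. X: (p, n) predictors, one column per sample;
    y: (n,) responses; k: number of SVD factors to keep."""
    # Thin SVD: X = A @ diag(d) @ F, singular values sorted in decreasing order
    A, d, F = np.linalg.svd(X, full_matrices=False)
    A, d, F = A[:, :k], d[:k], F[:k, :]
    # Ridge regression of y on the factors F (k x n).
    # The rows of F are orthonormal, so (F F' + ridge I)^{-1} F y = F y / (1 + ridge).
    theta = (F @ y) / (1.0 + ridge)
    # Least-norm inverse: the minimum-norm beta with diag(d) A' beta = theta
    beta = A @ (theta / d)
    return beta, theta, F

# Synthetic "large p, small n" data (illustrative only)
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 12))   # p = 200 predictors, n = 12 samples
y = rng.standard_normal(12)
beta, theta, F = empirical_factor_regression(X, y, k=5)

# Fitted values agree whether computed in predictor space or factor space
fitted_predictor_space = X.T @ beta
fitted_factor_space = F.T @ theta
```

Because many different β vectors give the same factor-space coefficients, the least-norm choice spreads weight across all predictors, which is consistent with the slides' remark that the resulting solution is not sparse.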
Example: Gene Expression Data
• p = 6128 genes measured using Affymetrix DNA microarrays
• n = 49 breast cancer tumor samples
• k = 25 factors
• Factor 3 separates tumors by estrogen receptor (ER) status (red: ER positive; blue: ER negative)

Example: Gene Expression Data
• Comparison with results obtained using empirical SVD factors

Conclusion
• Sparse factor regression modeling is a promising framework for dimensionality reduction of predictors.
• Only those factors that are relevant (e.g. factor 3) are of interest. Therefore, only those genes with non-zero values in that column of B are meaningful.
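To make the role of the sparse loadings matrix B concrete, here is a minimal sketch on synthetic data, assuming B is already available (in the slides it is inferred, not given). It simulates predictors from a sparse latent factor model, lets the response depend on a single factor, and reads off the genes with non-zero loadings in that factor's column. The sizes, the sparsity level, the noise scale, and the helper names `genes_in_factor` and `factors_for_gene` are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, k = 200, 49, 5   # synthetic sizes (the slides' example has p = 6128, k = 25)

# Sparse loadings B (p x k): roughly 10% of entries are non-zero
B = rng.standard_normal((p, k)) * (rng.random((p, k)) < 0.1)

# Latent factors (k x n) plus idiosyncratic noise give the predictors X (p x n)
Lam = rng.standard_normal((k, n))
X = B @ Lam + 0.1 * rng.standard_normal((p, n))

# The response depends only on the third factor (index 2)
theta = np.zeros(k)
theta[2] = 1.5
y = theta @ Lam + 0.1 * rng.standard_normal(n)

def genes_in_factor(B, j):
    """Row indices (genes) with a non-zero loading on factor j (a column of B)."""
    return np.flatnonzero(B[:, j])

def factors_for_gene(B, g):
    """Column indices (factors) in which gene g (a row of B) is involved."""
    return np.flatnonzero(B[g, :])

relevant_factors = np.flatnonzero(theta)          # factors that drive y
relevant_genes = genes_in_factor(B, relevant_factors[0])
```

Here `relevant_genes` plays the part of the genes with non-zero values in the column of B for the one factor that matters for the response.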
