L11: Uses of Bayesian Networks
Nevin L. Zhang
Department of Computer Science & Engineering
The Hong Kong University of Science & Technology
http://www.cse.ust.hk/~lzhang/

Page 2
Outline
- Traditional Uses
- Structure Discovery
- Density Estimation
- Classification
- Clustering
- An HKUST Project

Page 3
Traditional Uses
- Probabilistic expert systems
  - Diagnosis
  - Prediction
- Example: BN for diagnosing “blue baby” over the phone in a London hospital
  - Comparable to specialists, better than others

Page 4
Traditional Uses
- Language for describing probabilistic models in science and engineering
- Example: BN for turbo code

Page 5
Traditional Uses
- Language for describing probabilistic models in science and engineering
- Example: BN from bioinformatics

Page 6
Outline
- Traditional Uses
- Structure Discovery
- Density Estimation
- Classification
- Clustering
- An HKUST Project

Page 7
BN for Structure Discovery
- Given: data set D on variables X1, X2, …, Xn
- Discover dependence, independence, and even causal relationships among the variables.
- Example: evolution trees

Page 8
Phylogenetic Trees
- Assumption
  - All organisms on Earth have a common ancestor.
  - This implies that any set of species is related.
- Phylogeny
  - The relationship between any set of species.
- Phylogenetic tree
  - Usually, the relationship can be represented by a tree, which is called a phylogenetic (evolution) tree.
  - This is not always true.

Page 9
Phylogenetic Trees
[Figure: phylogenetic tree along a time axis, current-day species at the bottom: giant panda, lesser panda, moose, goshawk, duck, vulture, alligator]

Page 10
Phylogenetic Trees
- Taxa (sequences) identify species
- Edge lengths represent evolution time
- Assumption: bifurcating tree topology
[Figure: tree along a time axis with node sequences AAGGCCT, AAGACTT, AGCACTT, AAGGCAT, AGGGCAT, AGCACAA, TAGACTT, TAGCCCA, AGCGCTT]

Page 11
Probabilistic Models of Evolution
- Characterize the relationship between taxa using substitution probabilities:
  - P(x | y, t): probability that ancestral sequence y evolves into sequence x along an edge of length t
[Figure: tree with nodes x5, x6, x7, leaves s1, s2, s3, s4, and edge lengths t1, …, t6]
- P(X7), P(X5 | X7, t5), P(X6 | X7, t6), P(S1 | X5, t1), P(S2 | X5, t2), …

Page 12
Probabilistic Models of Evolution
- What should P(x | y, t) be?
- Two assumptions of commonly used models
  - There are only substitutions, no insertions/deletions (aligned)
    - One-to-one correspondence between sites in different sequences
  - Each site evolves independently and identically
    - $P(x \mid y, t) = \prod_{i=1}^{m} P(x(i) \mid y(i), t)$, where m is the sequence length
[Figure: sequences AAGGCCT, AAGACTT, AGCACTT, AAGGCAT, AGGGCAT, TAGACTT, TAGCCCA, AGCACAA, AGCGCTT]

Page 13
Probabilistic Models of Evolution
- What should P(x(i) | y(i), t) be?
- Jukes-Cantor (character evolution) model [1969]
  - Rate of substitution a (constant or parameter?)
  - Substitution probabilities over an edge of length t:

          A     C     G     T
      A   r_t   s_t   s_t   s_t
      C   s_t   r_t   s_t   s_t
      G   s_t   s_t   r_t   s_t
      T   s_t   s_t   s_t   r_t

  - $r_t = \frac{1}{4}\left(1 + 3e^{-4at}\right)$
  - $s_t = \frac{1}{4}\left(1 - e^{-4at}\right)$
  - Limit values when t = 0 or t = infinity?
- Multiplicativity (lack of memory):
  $P(c \mid a, t_1 + t_2) = \sum_{b} P(b \mid a, t_1)\, P(c \mid b, t_2)$

Page 14
Tree Reconstruction
- Given: collection of current-day taxa
  - AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT
- Find: tree
  - Tree topology: T
  - Edge lengths: t
- Maximum likelihood
  - Find the tree that maximizes P(data | tree)
[Figure: tree with leaves AGGGCAT, AGCACAA, TAGACTT, TAGCCCA, AGCGCTT]

Page 15
Tree Reconstruction
- When restricted to one particular site, a phylogenetic tree is an LT (latent tree) model where
  - The structure is a binary tree and variables share the same state space.
  - The conditional probabilities are from the character evolution model, parameterized by edge lengths instead of the usual parameterization.
  - The model is the same for different sites.
[Figure: tree with node sequences AAGGCCT, AAGACTT, AGCACTT, AGGGCAT, AGCACAA, TAGACTT, TAGCCCA, AGCGCTT]

Page 16
Tree Reconstruction
- Current-day taxa: AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT
- Samples for the LT model: one sample per site. The samples are i.i.d.
  - 1st site: (A, T, T, A, A)
  - 2nd site: (G, A, A, G, G)
  - 3rd site: (G, G, G, C, C)
[Figure: tree with node sequences AAGGCCT, AAGACTT, AGCACTT, AGGGCAT, AGCACAA, TAGACTT, TAGCCCA, AGCGCTT]

Page 17
Tree Reconstruction
- Finding the ML phylogenetic tree == finding the ML LT model
- Model space:
  - Model structures: binary trees where all variables share the same state space, which is known.
  - Parameterization: one parameter for each edge. (In general, P(x|y) has (|x| - 1)|y| free parameters.)
- The objective is to find relationships among variables.
- Applying new LTM algorithms to phylogenetic tree reconstruction?

Page 18
Outline
- Traditional Uses
- Structure Discovery
- Density Estimation
- Classification
- Clustering
- An HKUST Project

Page 19
BN for Density Estimation
- Given: data set D on variables X1, X2, …, Xn
- Estimate: P(X1, X2, …, Xn) under some constraints
- ..
- Uses of the estimate:
  - Inference
  - Classification

Page 20
BN Methods for Density Estimation
- Chow-Liu tree with X1, X2, …, Xn as nodes
  - Easy to compute
  - Easy to use
  - Might not be a good estimate of the “true” distribution
- BN with X1, X2, …, Xn as nodes
  - Can be a good estimate of the “true” distribution
  - Might be difficult to find
  - Might be complex to use

Page 21
BN Methods for Density Estimation
- LC model with X1, X2, …, Xn as manifest variables (Lowd and Domingos 2005)
  - Determine the cardinality of the latent variable using hold-out validation
  - Optimize the parameters using EM
  - ..
- Easy to compute
- Can be a good estimate of the “true” distribution
- Might be complex to use (cardinality of the latent variable might be very large)

Page 22
BN Methods for Density Estimation
- LT model for density estimation
- Pearl 1988: as models over manifest variables, LTMs
  - Are computationally very simple to work with.
  - Can represent complex relationships among manifest variables.

Page 23
BN Methods for Density Estimation
- New approximate inference algorithm for Bayesian networks (Wang, Zhang and Chen, AAAI 08; JAIR 32: 879-900, 08)
[Figure: diagram with the labels Sample, Learn, sparse, dense]

Page 24
Outline
- Traditional Uses
- Structure Discovery
- Density Estimation
- Classification
- Clustering
- An HKUST Project

Page 25
Bayesian Networks for Classification
- The problem:
  - Given data:

      A1   A2   …   An   C
      0    1    …   0    T
      1    0    …   1    F
      ..   ..   ..  ..   ..

  - Find mapping (A1, A2, …, An) |-> C
- Possible solutions
  - ANN
  - Decision tree (Quinlan)
  - …
  - (SVM: continuous data)

Page 26
Bayesian Networks for Classification
- Naïve Bayes model
  - From data, learn P(C), P(Ai | C)
  - Classification: arg max_c P(C=c | A1=a1, …, An=an)
  - Very good in practice

Page 27
Bayesian Networks for Classification
- Drawback of NB:
  - Attributes are assumed mutually independent given the class variable
  - Often violated, leading to double counting.
- Fixes:
  - General BN classifiers
  - Tree augmented Naïve Bayes (TAN) models
  - Hierarchical NB
  - Bayes rule + density estimation
  - …

Page 28
Bayesian Networks for Classification
- General BN classifier
  - Treat the class variable just as another variable
  - Learn a BN.
  - Classify the next instance based on values of variables in the Markov blanket of the class variable.
  - Pretty bad: because of the Markov boundary, it does not utilize all available information.

Page 29
Bayesian Networks for Classification
- TAN model
  - Friedman, N., Geiger, D., and Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 29:131-163.
  - Capture dependence among attributes using a tree structure.
  - During learning,
    - First learn a tree among attributes: use the Chow-Liu algorithm
    - Add the class variable and estimate parameters
  - Classification: arg max_c P(C=c | A1=a1, …, An=an)

Page 30
Bayesian Networks for Classification
- Hierarchical Naïve Bayes models
  - N. L. Zhang, T. D. Nielsen, and F. V. Jensen (2002). Latent variable discovery in classification models. Artificial Intelligence in Medicine, to appear.
  - Capture dependence among attributes using latent variables
  - Detect interesting latent structures besides classification
  - Algorithm in the step of DHC..
  - …

Page 31
Bayesian Networks for Classification
- Bayes rule
  - Chow-Liu
  - LC model
  - LT model
- Wang Yi: Bayes rule + LT model is far superior

Page 32
Outline
- Traditional Uses
- Structure Discovery
- Density Estimation
- Classification
- Clustering
- An HKUST Project

Page 33
BN for Clustering
- Latent class (LC) model
  - One latent variable
  - A set of manifest variables
- Conditional independence assumption:
  - The Xi’s are mutually independent given Y.
  - Also known as the local independence assumption
- Used for cluster analysis of categorical data
  - Determine the cardinality of Y: number of clusters
  - Determine P(Xi | Y): characteristics of clusters

Page 34
BN for Clustering
Clustering Criteria
- Distance-based clustering:
  - Minimizes intra-cluster variation and/or maximizes inter-cluster variation
- LC model-based clustering:
  - The criterion follows from the conditional independence assumption
  - Divide data into clusters such that, in each cluster, manifest variables are mutually independent under the empirical distribution.

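A minimal sketch of how an LC model with known parameters assigns a record to a cluster. Under local independence, P(Y | x) is proportional to P(Y) times the product of the P(xi | Y). The function names and the two-cluster parameter values below are hypothetical, chosen only for illustration; they do not come from the slides.

```python
from math import prod


def lc_posterior(prior, cpts, x):
    """Return the list P(Y = y | X = x) for a latent class model.

    prior[y]      : P(Y = y)
    cpts[i][y][v] : P(X_i = v | Y = y)
    x[i]          : observed state of manifest variable X_i
    """
    # Local independence: P(x, y) = P(y) * prod_i P(x_i | y)
    joint = [
        prior[y] * prod(cpts[i][y][x[i]] for i in range(len(x)))
        for y in range(len(prior))
    ]
    total = sum(joint)  # P(x), the marginal probability of the record
    return [j / total for j in joint]


def assign_cluster(prior, cpts, x):
    """Hard assignment: the state of Y with the largest posterior."""
    post = lc_posterior(prior, cpts, x)
    return max(range(len(post)), key=post.__getitem__)


# Hypothetical model: two clusters, three binary manifest variables.
PRIOR = [0.6, 0.4]
CPTS = [
    [[0.9, 0.1], [0.2, 0.8]],  # P(X_0 | Y = 0), P(X_0 | Y = 1)
    [[0.8, 0.2], [0.3, 0.7]],  # P(X_1 | Y = 0), P(X_1 | Y = 1)
    [[0.7, 0.3], [0.1, 0.9]],  # P(X_2 | Y = 0), P(X_2 | Y = 1)
]
```

With these illustrative numbers, `assign_cluster(PRIOR, CPTS, (0, 0, 0))` selects cluster 0 and `assign_cluster(PRIOR, CPTS, (1, 1, 1))` selects cluster 1. In practice the parameters would be estimated from data (for example with EM) and the cardinality of Y chosen separately.
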
Page 35
BN for Clustering
- Local independence assumption often not true
- LT models generalize LC models
  - Relax the independence assumption
  - Each latent variable gives a way to partition data… multidimensional clustering

Page 36
ICAC Data
  // 31 variables, 1200 samples
  C_City: s0 s1 s2 s3 // very common, quite common, uncommon, ..
  C_Gov: s0 s1 s2 s3
  C_Bus: s0 s1 s2 s3
  Tolerance_C_Gov: s0 s1 s2 s3
  Tolerance_C_Bus: s0 s1 s2 s3
  WillingReport_C: s0 s1 s2 // yes, no, depends
  LeaveContactInfo: s0 s1 // yes, no
  I_EncourageReport: s0 s1 s2 s3 s4 // very sufficient, sufficient, average, ...
  I_Effectiveness: s0 s1 s2 s3 s4 // very e, e, a, in-e, very in-e
  I_Deterrence: s0 s1 s2 s3 s4 // very sufficient, sufficient, average, ...
  // totally intolerable, intolerable, tolerable, ...
  …..
  -1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 0 -1 -1 -1 0 1 1 -1 -1 2 0 2 2 1 3 1 1 4 1 0 1.0
  -1 -1 -1 0 0 -1 -1 1 1 -1 -1 0 0 -1 1 -1 1 3 2 2 0 0 0 2 1 2 0 0 2 1 0 1.0
  -1 -1 -1 0 0 -1 -1 2 1 2 0 0 0 2 -1 -1 1 1 1 0 2 0 1 2 -1 2 0 1 2 1 0 1.0
  ….

Page 37
Latent Structure Discovery
- Y2: Demographic info
- Y3: Tolerance toward corruption
- Y4: ICAC performance
- Y7: ICAC accountability
- Y5: Change in level of corruption
- Y6: Level of corruption
(Zhang, Poon, Wang and Chen 2008)

Page 38
Interpreting Partition
- Y2 partitions the population into 4 clusters
- What is the partition about? What is the “criterion”?
- On what manifest variables do the clusters differ the most?
- Mutual information:
  - The larger I(Y2, X), the more the 4 clusters differ on X

Page 39
Interpreting Partition
- Information curves:
  - Partition of Y2 is based on Income, Age, Education, Sex
  - Interpretation: Y2 represents a partition of the population based on demographic information
  - Y3 represents a partition based on tolerance toward corruption

Page 40
Interpreting Clusters
- Y2=s0: low-income youngsters
- Y2=s1: women with no/low income
- Y2=s2: people with good education and good income
- Y2=s3: people with poor education and average income

Page 41
Interpreting Clustering
- Y3=s0: people who find corruption totally intolerable; 57%
- Y3=s1: people who find corruption intolerable; 27%
- Y3=s2: people who find corruption tolerable; 15%
Interesting finding:
- Y3=s2: 29+19=48% find C-Gov totally intolerable or intolerable; 5% for C-Bus
- Y3=s1: 54% find C-Gov totally intolerable; 2% for C-Bus
- Y3=s0: same attitude toward C-Gov and C-Bus
- People who are tough on corruption are equally tough toward C-Gov and C-Bus.
- People who are relaxed about corruption are more relaxed toward C-Bus than C-Gov.

Page 42
Relationship Between Dimensions
Interesting finding: relationship between background and tolerance toward corruption
- Y2=s2 (good education and good income): the least tolerant; 4% tolerable
- Y2=s3 (poor education and average income): the most tolerant; 32% tolerable
- The other two classes are in between.

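The mutual information criterion used on Pages 38 and 39 to interpret a partition can be sketched as follows. This is an illustration only: the function name and both joint tables are hypothetical, and natural logarithms are assumed because the slides do not state a base.

```python
from math import log


def mutual_information(joint):
    """I(Y; X) in nats, given joint[y][x] = P(Y = y, X = x)."""
    p_y = [sum(row) for row in joint]        # marginal of the latent variable
    p_x = [sum(col) for col in zip(*joint)]  # marginal of the manifest variable
    mi = 0.0
    for y, row in enumerate(joint):
        for x, p in enumerate(row):
            if p > 0.0:  # a zero cell contributes nothing to the sum
                mi += p * log(p / (p_y[y] * p_x[x]))
    return mi


# Hypothetical joints of a 2-state latent variable with a binary manifest variable.
JOINT_INFORMATIVE = [[0.45, 0.05], [0.05, 0.45]]  # clusters differ strongly on X
JOINT_INDEPENDENT = [[0.25, 0.25], [0.25, 0.25]]  # clusters do not differ on X
```

Ranking the manifest variables by this quantity indicates which of them a latent variable's clusters differ on the most; a variable that is independent of the latent variable scores zero.
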
Page 43
Result of LCA
- Partition not meaningful
- Reason:
  - Local independence not true
- Another way to look at it
  - LCA assumes that all the manifest variables jointly define a meaningful way to cluster data
  - Obviously not true for the ICAC data
  - Instead, one should look for subsets that do define meaningful partitions and perform cluster analysis on them
  - This is what we do with LTA

Page 44
Finite Mixture Models
- Y: discrete latent variable
- Xi: continuous
- P(X1, X2, …, Xn | Y): usually multivariate Gaussian
- No independence assumption
- Assume states of Y: 1, 2, …, k
- $P(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{k} P(Y=i)\, P(X_1, X_2, \ldots, X_n \mid Y=i)$: mixture of k Gaussian components

Page 45
Finite Mixture Models
- Used to cluster continuous data
- Learning: determine
  - k: number of clusters
  - P(Y)
  - P(X1, …, Xn | Y)
- Also assume: all attributes define a coherent partition
  - Not realistic
  - LT models are a natural framework for clustering high-dimensional data

Page 46
Outline
- Traditional Uses
- Structure Discovery
- Density Estimation
- Classification
- Clustering
- An HKUST Project

Page 47
Observation on How Human Brain Does Thinking
- Human beings often invoke latent variables to explain regularities that we observe.
- Example 1
  - Observe regularity:
    - Beers and diapers are often bought together in the early evening
  - Hypothesize (latent) cause:
    - There must be a common (latent) cause
  - Identify the cause and explain the regularity
    - Shopping by fathers of babies on the way home from work
    - Based on our understanding of the world

Page 48
Observation on How Human Brain Does Thinking
- Example 2
  - Background: at night, watch the lighting through the windows of apartments in big buildings
  - Observe regularity:
    - Lighting from several apartments was changing in brightness and color at the same times and in perfect synchrony.
  - Hypothesize common (latent) cause:
    - There must be a (latent) common cause
  - Identify the cause and explain the phenomenon:
    - People watching the same TV channel.
    - Based on understanding of the world

Page 49
Back to Ancient Time
- Observe regularity
  - Several symptoms often occur together
  - ‘Intolerance to cold’, ‘cold limbs’, and ‘cold lumbus and back’
- Hypothesize common latent cause:
  - There must be a common latent cause
- Identify the cause
  - Answer based on the understanding of the world at that time, primitive
  - Conclusion: Yang deficiency (阳虚)
  - Explanation: Yang is like the sun; it warms your body. If you don’t have enough of it, you feel cold.

Page 50
Back to Ancient Time
- Regularity observed:
  - Several symptoms often occur together
  - Tidal fever (潮热), heat sensation in palms and feet (手足心热), palpitation (心慌心跳), thready and rapid pulse (脉细数)
- Hypothesize common latent cause:
  - There must be a common latent cause
- Identify the cause and explain the regularity
  - Yin deficiency causing internal heat (阴虚内热)
  - Yin and Yang should be in balance. If Yin is deficient, Yang will be in relative excess, and hence causes heat.

Page 51
Traditional Chinese Medicine (TCM)
- Claim
  - TCM theories = statistical regularities + subjective interpretations
- How to justify the claim

Page 52
A Case Study
- We collected a data set about kidney deficiency (肾虚)
  - 35 symptom variables, 2600 records
- Result of data analysis
  - Y0-Y34: manifest variables from data
  - X0-X13: latent variables introduced by data analysis
  - The structure is interesting and supports TCM’s theories about various symptoms.

Page 53

Page 54
Other TCM Data Sets
- From Beijing U of TCM, 973 project
  - Depression
  - Hepatitis B
  - Chronic Renal Failure
  - COPD
  - Menopause
- China Academy of TCM
  - Subhealth
  - Diabetes
- In all cases, results of LT analysis match relevant TCM theories

Page 55
Result on the Depression Data

Page 56
Significance
- Conclusion
  - TCM theories = statistical regularities + subjective interpretations
- Significance
  - TCM theories are partially based on objective facts
    - Boost user confidence
  - Can help to lay a modern statistical foundation for TCM
    - Systematically identify statistical regularities about the occurrence of symptoms, find natural partitions
    - Establish objective and quantitative diagnosis standards
    - Assist in double-blind experiments to evaluate and improve the efficacy of TCM treatment

Page 57
Outline
- Traditional Uses
- Structure Discovery
- Density Estimation
- Classification
- Clustering
- An HKUST Project