Basic of Probability Theory for Ph.D. students in Education, Social Sciences and Business
Shing On LEUNG and Hui Ping WU (May 2015)

This is a series of 3 talks, respectively on:
A. Probability Theory
B. Hypothesis Testing
C. Bayesian Inference

Lecture 2: Hypothesis Testing

B. Hypothesis Testing
Probability theory only gives us a basic framework. How can we proceed to make decisions? Two approaches: Classical vs Bayesian.

Null hypothesis, e.g. H0: score of boys = score of girls.

Decision     H0 TRUE        H0 wrong
Accept H0    Correct        Type II error
Reject H0    Type I error   Correct (Power)

Step 1: Construct a test statistic (e.g. the t-value for a t-test).
Step 2: Get the null distribution of that test statistic (e.g. the t-distribution, or the Normal).
Step 3: Construct a critical region (5%) under H0.
Decision rule: under H0, create a critical region (usually 5%). If the data (summarized by the test statistic) fall into this region, we reject H0 and conclude that the alternative (not H0) is true.

Hypothesis testing and a courtroom trial
The null hypothesis is the hypothesis of innocence: the defendant is first assumed to be innocent, and is convicted only if there is enough incriminating evidence. The hypothesis of innocence is rejected only when an error is very unlikely (5% or even less). But the error of the second kind (acquitting a person who committed the crime, i.e. accepting a wrong hypothesis) can be quite large. Please refer to the following link for details: http://en.wikipedia.org/wiki/Statistical_hypothesis_testing

The p-value
Before the computer age, we could only judge whether the observation fell into the critical region by matching it against a pre-computed table, so results were reported as p > or < 5%, > or < 1%, > or < 0.1%, etc. Now, with computers, we can compute the observed significance level, the p-value (or p). If p < 0.05, reject H0 (as such a result would be rare if H0 were true). But if p > 0.05, we cannot say we accept H0, because … If p = 0.06, what would you say? Marginally significant!
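To make Steps 1 to 3 and the p-value concrete, here is a minimal sketch for the boys-vs-girls example. It assumes large samples and therefore uses the Normal null distribution (a two-sample z-test) instead of the t-distribution, because Python's standard library has no t-distribution. The function name `two_sample_z_test` and the summary scores are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def two_sample_z_test(mean1, sd1, n1, mean2, sd2, n2, alpha=0.05):
    """Large-sample two-sided test of H0: mu1 == mu2 (Normal approximation)."""
    # Step 1: test statistic = difference in means divided by its standard error
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se
    # Step 2: under H0 (and large n) z is approximately standard Normal
    null = NormalDist(0.0, 1.0)
    # Step 3: two-sided critical region |z| > z_crit holds alpha of the null mass
    z_crit = null.inv_cdf(1 - alpha / 2)
    reject = abs(z) > z_crit
    # p-value: chance under H0 of a statistic at least as extreme as the observed one
    p_value = 2 * (1 - null.cdf(abs(z)))
    return z, z_crit, p_value, reject


# Hypothetical summary data: boys vs girls, 100 students each
z, z_crit, p, reject = two_sample_z_test(52.0, 10.0, 100, 49.0, 10.0, 100)
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, p = {p:.4f}, reject H0: {reject}")
```

The critical-region decision and the "p < 0.05" rule always agree; the exact p-value additionally tells us how far inside or outside the region the data fall, which is why reporting p itself is more informative than a bare "significant / not significant".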
Some regard p between 0.05 and 0.1 as marginally significant, but there is no consensus. It is better to look at the exact value of p and be statistically sensitive.

A good statistician should not think in binary terms (extreme opposites), i.e. black vs white, good vs bad, etc., but in terms of degree (otherwise, you are not a good one!). Statistics provides analyses (or indicators) to support decisions; it does not make the decisions.

Types of hypotheses
There can be many (infinitely many) hypotheses. We just highlight those commonly encountered in Education, Business and Social Sciences: the study of relations / differences.
H0: no relation / no difference
H1: otherwise, i.e. there is a relation / difference
Examples: t-test, ANOVA, correlation r = 0, etc. (please refer to other sources). These are well-established tests with (i) a test statistic, (ii) a null distribution and, of course, (iii) a critical region. They are only popular, simple examples of hypothesis testing; these ingredients are not necessarily available in other cases.

Reviewing the procedure of hypothesis testing "in general"
Step 1: Construct a test statistic (can be difficult).
Step 2: Get the null distribution of that test statistic (the most difficult step).
Step 3: Decide, by constructing a critical region (5%).

The t-test (the complications behind it; please refer to other sources)
It is the "t-test", not the "T-test".
Step 1: Construct the statistic, t.
Step 2: (i) If the variance is known, use the Normal distribution (not realistic); (ii) if the variance is unknown, use the t-distribution (realistic), whose pdf with ν degrees of freedom (ν = N - 1 for a one-sample t-test) is
$f(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}$
Of course, when N is large (say N > 30), t can be approximated by the Normal, but such a shortcut does not necessarily hold for other distributions.
Step 3: Decision (easy).
But computers do all of this for you.

Even for hypothesis testing with the simple t-test (which is the simplest), there are some complications behind it. Other tests are much more complicated.

Nuisance parameters (a nuisance, but they can be important)
In the t-test, we are interested in µ, but σ is unknown. σ makes things complicated and is called a nuisance parameter. There can be many nuisance parameters.
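A minimal sketch of the one-sample t-test, showing how the nuisance parameter σ is handled: it is replaced by the sample standard deviation, and the price paid is that the null distribution becomes the t-distribution with N - 1 degrees of freedom instead of the Normal. Since Python's standard library has no t-distribution, this sketch integrates the t pdf numerically with Simpson's rule and finds the two-sided 5% critical value by bisection; these are implementation choices of the example, and all function names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def one_sample_t(data, mu0):
    """t statistic for H0: mu == mu0, with sigma (a nuisance parameter) unknown."""
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample SD (divisor n - 1) replaces the unknown sigma
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1


def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c) * (1 + x * x / nu) ** (-(nu + 1) / 2)


def t_cdf(x, nu, steps=2000):
    """P(T <= x) = 0.5 + integral of the pdf from 0 to x (symmetry; Simpson's rule)."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3


def t_critical(nu, alpha=0.05):
    """Two-sided critical value c with P(|T| > c) = alpha, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


t, df = one_sample_t([1.0, 2.0, 3.0, 4.0, 5.0], mu0=2.0)  # made-up data
print(f"t = {t:.3f} with {df} df, 5% critical value = {t_critical(df):.3f}")
for nu in (9, 29, 100):
    print(nu, round(t_critical(nu), 3))
print("Normal", round(NormalDist().inv_cdf(0.975), 3))
```

The printed critical values shrink toward the Normal value of about 1.96 as the degrees of freedom grow, which is the sense in which t can be approximated by the Normal when N is large.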
For example, in EFA or CFA we may want to confirm, say, 3 factors with 30 variables. The number of nuisance parameters is at least 90 (= 30 × 3) factor loadings in EFA and 30 (= 10 × 3, i.e. 10 variables per factor) in CFA! And we haven't yet counted the variances! We are not interested in the particular values of these parameters; we just want to confirm the factors. The number of parameters matters very much in complex modeling, e.g. EFA, CFA, SEM. Hence the likelihood ratio test (LRT), parametric bootstrapping and Bayesian analysis are used.

Hypothesis testing for model fitting
H0: a specific model fits
H1: otherwise
How do we get (i) a test statistic and (ii) its null distribution?

Likelihood and the likelihood ratio test (LRT)
Likelihood: L(θ | x) = Pr(x | θ). Usually θ is not a single value (scalar) but many values (a vector).
Pr(x | θ) is the chance of an outcome given the parameters. For example, given that a coin is fair (θ = 0.5) and we flip it 10 times, what is the chance of getting at least 5 heads, Pr(X ≥ 5 | θ = 0.5)?
Likelihood (not probability) is a function of the parameters (θ) given our data (x). If we flip a coin 10 times and observe 6 heads, what is the most likely value of θ (the chance of getting a head)? This is not a probability. The two are mathematically (numerically) the same, but their roles are different.

Maximum likelihood estimation (MLE or mle)
This means finding the most likely values of the parameters θ given the data x, which is done by maximizing L with respect to θ: which values of θ give the maximum value of L? It is one (popular) method of parameter estimation; it is an estimation method, not a test.

Likelihood ratio test (LRT) (this is a test)
H0: θ = θ0 (Model 1 fits)
H1: θ = θ1 (Model 2 fits), say
The LRT compares H0 with H1 through the likelihood ratio LR = L(θ0 | x) / L(θ1 | x). Usually we take the logarithm, which turns the ratio into a difference: the log-likelihood ratio statistic. Usually, θ0 fixes some parameter values, say at 0 (e.g.
the correlation between two variables is zero, or a factor loading equals zero in CFA, etc.). θ1, in contrast, takes the most likely values when the parameters are not fixed (e.g. the MLE). So the null is usually less likely than the alternative, and the ratio is smaller than 1. If the ratio is too small, the null is much less likely than the alternative and we reject it; "too small" means falling in the 5% critical region.

Step 1, the test statistic, is constructed; but Step 2, its distribution? Usually it is not known and is quite complicated. If N is large (plus other "regularity conditions"), -2 log(LR) ~ χ²(df), where df (degrees of freedom) = the difference in the number of free parameters between the null and the alternative. A big "if"! http://en.wikipedia.org/wiki/Likelihood-ratio_test
Besides the LRT there are other tests, but they are usually complicated and also assume that N is large, etc. Other ways: (i) parametric bootstrapping, (ii) the Bayesian approach.

Asymptotic vs exact p-values
Usually, asymptotic (approximate) p-values are used, typically assuming Normality. In some simple classroom problems, e.g. tossing a coin, exact tests are available, but not for complicated problems. Parametric bootstrapping provides computer-generated p-values, which are close to the exact ones (perhaps the best we human beings can do!).

Parametric bootstrapping
Step 1: Take the real data.
Step 2: Estimate the parameters (e.g. μ in the Normal, or the factor loadings in EFA or CFA).
Step 3: Compute a statistic (t-test, or a fit statistic of EFA, CFA, etc.).
Steps 4, 5 and 6 are repeated many times:
Step 4 (repeat): Generate a data set from the model estimated in Step 2.
Step 5 (repeat): Estimate the parameters for the data from Step 4, as Step 2 did for the data from Step 1 (the most time-consuming step).
Step 6 (repeat): Repeat Step 3, using the data from Step 4 and the parameters from Step 5.
Step 7: Repeating Steps 4 to 6 forms an empirical null distribution.
Step 8: Compare the statistic from Step 3 with the null distribution from Step 7.
(Search for "parametric bootstrap", or ask me later.) It is a computationally intensive method.

Common problem for most Ph.D.
students: if a specific model fits, this does not imply that other models do not fit (this is common to parametric bootstrapping and Bayesian inference). Moreover, fit vs don't fit is, in many cases, a matter of degree (unless the p-value is > 0.8, or < 0.05).

Classical Pr(X | θ) vs Bayesian Pr(θ | X) (next lecture, on Bayesian inference)

Q&A
Shing On LEUNG
Hui Ping WU
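As a worked illustration of the likelihood ratio test and of the parametric bootstrap steps described earlier, the sketch below tests H0: μ = μ0 for Normal data, with the variance σ² as the nuisance parameter (estimated by maximum likelihood under each hypothesis). The model, the data and the function names are assumptions made for illustration, not the authors' method. It compares the asymptotic χ²(1) p-value (one parameter is fixed under H0) with a parametric bootstrap p-value, computed here as the simple proportion of simulated statistics at least as large as the observed one.

```python
import math
import random
import statistics


def lrt_normal_mean(data, mu0=0.0):
    """-2 log LR for H0: mu == mu0 vs H1: mu free; sigma^2 is estimated in both."""
    n = len(data)
    xbar = statistics.fmean(data)
    # MLEs of the nuisance variance (divisor n, not n - 1) under each hypothesis
    var0 = sum((x - mu0) ** 2 for x in data) / n   # mu fixed at mu0
    var1 = sum((x - xbar) ** 2 for x in data) / n  # mu at its MLE, xbar
    # With sigma^2 at its MLE, log L = -n/2 * log(2*pi*var) - n/2, hence:
    return n * math.log(var0 / var1)


def chi2_1_pvalue(stat):
    """Asymptotic p-value: P(chi-square(1) > stat) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2))


def parametric_bootstrap_pvalue(data, mu0=0.0, n_boot=2000, seed=1):
    rng = random.Random(seed)
    n = len(data)
    # Steps 1-3: fit the null model to the real data, compute the observed statistic
    sigma0 = math.sqrt(sum((x - mu0) ** 2 for x in data) / n)
    observed = lrt_normal_mean(data, mu0)
    exceed = 0
    for _ in range(n_boot):
        # Step 4: simulate a data set from the fitted null model
        fake = [rng.gauss(mu0, sigma0) for _ in range(n)]
        # Steps 5-6: re-estimate both models on the fake data, recompute the statistic
        if lrt_normal_mean(fake, mu0) >= observed:
            exceed += 1
    # Steps 7-8: the simulated statistics form an empirical null distribution;
    # here we only count how many are at least as large as the observed one
    return exceed / n_boot


data = [0.8, -0.3, 1.5, 0.9, 0.2, 1.1, -0.4, 0.7, 1.3, 0.6]  # made-up scores
stat = lrt_normal_mean(data, mu0=0.0)
print(f"-2 log LR = {stat:.3f}")
print(f"asymptotic p = {chi2_1_pvalue(stat):.4f}")
print(f"bootstrap p  = {parametric_bootstrap_pvalue(data, mu0=0.0):.4f}")
```

With these made-up data the asymptotic p-value is roughly 0.006; because n = 10 is small, the bootstrap p-value will typically come out somewhat larger, which illustrates why the large-N χ² approximation is "a big if".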