[Slide 1] Introduction to Hypothesis Testing

[Slide 2] Hypothesis Testing
A hypothesis test is a statistical procedure that uses sample data to evaluate a hypothesis about a population
  Hypothesis is stated in terms of the population
  Predict sample statistics based on population parameters (e.g. X̄ ≈ µ)
  Select random sample from population
  Compare observed sample data with predicted values

[Slide 3] Step 1: State the Hypotheses
The null hypothesis, H0, states that in the population there is no change, no difference, or no relationship
  H0: µtreatment = constant (e.g. µ)
  e.g. H0: µtreatment = 100
  This is read as: “The null hypothesis is that the population mean of people receiving the treatment equals 100”
  H0 is that the treatment had no effect

[Slide 4] H0
The null hypothesis must contain an equal sign of some sort (=, ≥, ≤)
Statistical tests are designed to reject H0, never to accept it

[Slide 5] H1: The Alternative Hypothesis
The alternative hypothesis usually takes the following form:
  H1: µtreatment ≠ constant (e.g. µ)
  e.g. H1: µtreatment ≠ 100
  This is read as: “The alternative hypothesis states that the population mean of people receiving the treatment does not equal 100”
  H1 is that the treatment had an effect

[Slide 6] H0 and H1
Together, the null and alternative hypotheses must be mutually exclusive and exhaustive
  Mutual exclusion implies that H0 and H1 cannot both be true at the same time
  Exhaustive implies that each of the possible outcomes of the experiment must make either H0 or H1 true

[Slide 7] Step 2: Set the Decision Criteria
What sample means are consistent with H0 and what sample means are consistent with H1?
Separate the distribution of sample means into two sets of regions: one whose means are consistent with H0 and one whose means are consistent with H1
[Figure: distribution of sample means for n = 25, µ = 100, σ = 15, with the axis running from 90 to 110. Sample means close to H0 (near 100) are high-probability values if H0 is true; the extreme values in both tails are low-probability values if H0 is true.]

[Slide 8] α Level
The α level (alpha level; level of significance) is a probability value that is used to define the very unlikely sample outcomes if H0 is true
Psychologists usually adopt α = 0.05, although α = 0.01 and α = 0.001 are sometimes used
The critical region is composed of the extreme sample values that are very unlikely (as specified by the α level) to be obtained if H0 is true

[Slide 9] Critical Regions
Since we can reject H0 two ways (extremely small or extremely large sample means), the α level is divided across the two tails of the distribution
Find the z-score whose area above equals α / 2
  z = 1.96 for α = 0.05
Find the raw scores that correspond to that z-score
  X̄ = 100 + 1.96 · 3 = 105.9
  X̄ = 100 - 1.96 · 3 = 94.1
[Figure: the same distribution of sample means (axis from 90 to 110), with the critical regions marked beyond z = -1.96 and z = 1.96. Sample means between the two boundaries are high-probability values if H0 is true.]

[Slide 10] Step 3: Collect Data & Compute Sample Statistics
Randomly sample from the population
  In this example, n = 25
Give the sample the treatment
Measure the dependent variable
Calculate the z-score of the sample mean in the sampling distribution
  In this example the sample statistics are X̄ = 107, s = 14; population parameters from slide 7 (IQs: µ = 100, σ = 15)

[Slide 11] Step 4: Make a Decision
X̄ = 107; z = (107 - 100) / 3 = 2.33
If the sample mean’s z-score is in the extreme tails of the sampling distribution (i.e. in the critical region), reject H0; otherwise, fail to reject H0
Critical region is z > 1.96 or z < -1.96 for α = 0.05
The example z is 2.33. It is in the critical region.
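Steps 2 through 4 can be sketched in a few lines of Python. This is only an illustration under the slides' assumptions (known σ, two-tailed test); the function name z_test and its parameters are hypothetical and not part of the original slides, and the critical z is taken from the standard library's normal distribution rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z test of H0: mu = mu0, assuming sigma is known.

    Hypothetical helper for illustration only.
    """
    std_error = sigma / sqrt(n)                    # sigma / sqrt(n) = 15 / 5 = 3
    z = (sample_mean - mu0) / std_error            # z-score of the sample mean
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # area above z_crit is alpha / 2
    # Raw-score boundaries of the critical region
    bounds = (mu0 - z_crit * std_error, mu0 + z_crit * std_error)
    reject = abs(z) > z_crit                       # True if z is in either tail
    return z, z_crit, bounds, reject


# Values from the running example: sample mean 107, mu = 100, sigma = 15, n = 25
z, z_crit, bounds, reject = z_test(sample_mean=107, mu0=100, sigma=15, n=25)
print(f"z = {z:.2f}, critical z = +/-{z_crit:.2f}")
print(f"critical raw scores: {bounds[0]:.1f} and {bounds[1]:.1f}")
print("reject H0" if reject else "fail to reject H0")
```

With the example values, the sketch gives z = 2.33 against a critical value of 1.96, with raw-score boundaries of 94.1 and 105.9, matching the figures on the slides.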
Therefore, reject H0
It is likely the case that the treatment had an effect
[Figure: distribution of sample means (axis from 90 to 110) with the critical regions beyond z = -1.96 and z = 1.96. Sample means close to H0 are high-probability values if H0 is true; the extreme values in both tails are low-probability values if H0 is true.]

[Slide 12] Reject H0 or Fail to Reject H0
The only decisions you ever make in hypothesis testing are
  Reject H0, or
  Fail to reject H0
No other decisions are possible
  Never reject H1
  Never accept H1
  Never accept H0

[Slide 13] Type I (α) Error
A type I (or α) error occurs when a researcher rejects H0 when H0 is really true
  Researcher concludes that the treatment had an effect when it did not
This should happen with a probability equal to α

[Slide 14] Type II (β) Errors
A type II (or β) error occurs when a researcher fails to reject H0 when H0 is really false
  Researcher concludes that there is insufficient evidence to suggest that the treatment had an effect when in fact it does have an effect
This should happen with a probability equal to β

[Slide 15] β
Unlike α, β is not directly set by the researcher
  β depends on the sample size (n)
  β depends on how much the treatment affects the dependent variable
  β depends on the variability of the data
  β depends on α

[Slide 16] Type-I and Type-II Errors
Ideally, we would like to minimize both Type-I and Type-II errors
This is not possible for a given sample size
  When we lower the α level to minimize the probability of making a Type-I error, the β level will rise
  When we lower the β level to minimize the probability of making a Type-II error, the α level will rise

[Slide 17] Type-I and Type-II Errors
[Figure: Type-I and Type-II errors]

[Slide 18] Factors that Influence a Hypothesis Test
The size of the mean difference
  The larger the mean difference is, the more likely you are to reject H0
The variability of the scores
  The more variable the scores are, the less likely you are to reject H0
The number of scores in the sample
  The larger the sample size, the more likely you are to reject H0

[Slide 19] Assumptions of the z-Score Hypothesis Test
Random sampling
  If the sample is not selected randomly from the population, it probably will not represent the population
Independent observations
σ does not change as a result of the treatment
Distribution of sample means is normal

[Slide 20] Directional vs Non-Directional Hypotheses
The hypotheses we have been talking about are called non-directional hypotheses because they do not specify how the population mean should differ from the constant
  That is, they do not say that the population mean should be larger than the constant
  They only state that the population mean should differ from the constant
Non-directional hypotheses are sometimes called two-tailed tests

[Slide 21] Directional vs Non-Directional Hypotheses
Directional hypotheses include an ordinal relation between the population mean and the constant
  That is, they state that the population mean should be larger than the constant
For directional hypotheses, H0 and H1 are written as:
  H0: µtreatment ≤ constant
  H1: µtreatment > constant
Directional hypotheses are sometimes called one-tailed tests

[Slide 22] 1 Tailed
When performing a one-tailed test, all of the critical region is in one tail of the distribution of sample means
Do not divide α by two when finding the z-score for the critical region
This increases statistical power, the probability of correctly rejecting a false H0

[Slide 23] 1 Tailed vs. 2 Tailed
[Figure: two standard normal curves with z axes from -3 to 3. 1 Tailed: α = .05, z = 1.65, critical region in one tail. 2 Tailed: α = .05, z = -1.96 and z = 1.96, critical region in two tails.]

[Slide 24] Concerns about Hypothesis Testing
Hypothesis testing focuses on the data, and not the hypothesis
  When we reject H0, we should really say “This specific sample mean is very unlikely (p < .05) if the null hypothesis is true”
Statistical significance ≠ practical significance
  The effect size can be small, but still be statistically significant if the sample size is sufficiently large

[Slide 25] Effect Size
A measure of effect size is intended to provide a measurement of the absolute magnitude of a treatment effect, independent of the size of the sample(s) being used
Cohen’s d is a measure of effect size
  d = mean difference / standard deviation

[Slide 26] Effect Size
What is the effect size for the example on slide 5?

  Magnitude of d    Evaluation of Effect Size
  d = 0.2           Small effect
  d = 0.5           Medium effect
  d = 0.8           Large effect

This is a small effect

[Slide 27] Statistical Power
Statistical power is the probability that a statistical test will correctly reject a false H0
  Probability that a statistical test will identify a treatment effect if one really exists
Power = 1 - β = 1 - probability of a Type II error

[Slide 28] Statistical Power
Calculate before performing the study
Need to know / estimate
  How much the treatment changes the dependent variable
  Sample size
  α
  σ, µ

[Slide 29] Statistical Power Example
How much the treatment changes the dependent variable
  Researchers hypothesize that having proper nutrition during the first two years will increase IQ by 3 points (note: this is a one-tailed hypothesis)
µ = 100, σ = 15
Sample size
  n = 25
α = .05

[Slide 30] Distribution of Sample Means
If the treatment has no effect, by the central limit theorem, the distribution of sample means will have:
  a mean = population mean = 100
  a standard deviation = σ/√n = 15 / √25 = 3
If the treatment has the hypothesized effect, the distribution of sample means will have:
  a mean = population mean + effect of treatment = 100 + 3 = 103
  a standard deviation = σ/√n = 15 / √25 = 3
  Adding a constant to all scores does not change the standard deviation

[Slide 31] z Score of Critical Region
This is a one-tailed test with α = .05
Consult a table to find the z with an area above equal to .05
  z = 1.65

[Slide 32] Statistical Power Example
[Figure: the no-treatment distribution of sample means (centered at 100) and the treatment distribution (centered at 103) on an axis from 91 to 115, with the critical region beginning at z = 1.65 in the no-treatment distribution.]

[Slide 33] Statistical Power Example
Power equals the area to the right of the z-score for the critical region under the treatment distribution of sample means
  Areas to the right of the z-score for the critical region correspond to rejecting H0
  Areas under the treatment distribution of sample means correspond to a false H0
  Both combined correspond to rejecting a false H0 = power

[Slide 34] Statistical Power Example
Find the z-score in the treatment distribution of sample means that is at the same location as the z-score for the critical region in the no-treatment distribution of sample means
  ztreatment = zcritical region - zmean of treatment
  zmean of treatment = (103 - 100) / 3 = 1
  ztreatment = 1.65 - 1 = 0.65
Power = area above z = 0.65
Power = .26
  Only about a 1 in 4 chance of observing this effect

[Slide 35] Factors that Influence Power
Sample size
  As sample size increases, power increases
α level
  As α decreases (fewer Type I errors), β increases (more Type II errors), and 1 - β (power) decreases
Number of tails (directional vs non-directional)
  One-tailed tests have more statistical power than two-tailed tests. Can you explain why?
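
The power calculation above, and the three factors just listed, can be explored with a short Python sketch. It assumes a z test with known σ and a treatment that raises the mean; the function name z_test_power and its parameters are hypothetical, chosen only for this illustration. It uses the exact critical z from the standard library (about 1.645) instead of the table value 1.65, so results can differ from hand calculations in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05, tails=1):
    """Power of a z test when the treatment shifts the mean up by `effect`.

    Hypothetical helper for illustration; assumes known sigma and effect > 0.
    """
    std_normal = NormalDist()
    std_error = sigma / sqrt(n)                        # sigma / sqrt(n)
    z_crit = std_normal.inv_cdf(1 - alpha / tails)     # alpha is split only if tails == 2
    z_shift = effect / std_error                       # z of the treatment mean
    # Area above the critical boundary under the treatment distribution
    power = 1 - std_normal.cdf(z_crit - z_shift)
    if tails == 2:
        # Small extra area in the lower critical region
        power += std_normal.cdf(-z_crit - z_shift)
    return power


# Slide example: 3-point effect, sigma = 15, n = 25, alpha = .05, one-tailed
print(f"baseline:        {z_test_power(3, 15, 25):.2f}")
print(f"n = 100:         {z_test_power(3, 15, 100):.2f}")
print(f"alpha = .01:     {z_test_power(3, 15, 25, alpha=0.01):.2f}")
print(f"two-tailed test: {z_test_power(3, 15, 25, tails=2):.2f}")
```

The baseline call reproduces the power of about .26 from the example. Raising n increases power, while lowering α or switching to a two-tailed test decreases it, because each of those changes moves the critical boundary further from the treatment mean in standard-error units.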