5/6/12

Empirical Loop
[Figure: cycle linking Hypothesis, Research Design, Collect Data, Descriptive Statistics, and Inferential Statistics]

Z-scores: apples vs. oranges
[Figure: probability distributions of apple sizes and orange sizes; x-axis: fruit diameter in decimeters; y-axis: probability]
[Figure: the same two distributions replotted; x-axis: fruit diameter in z-scores]
Mathematically: $z = \frac{X - \mu}{\sigma}$
Result: the orange is bigger! (Well, it's bigger relative to other oranges than the apple is relative to other apples.)

Z-scores: generally speaking
[Figure: standard normal curve; x-axis: z-scores from -4 to 4]
Mathematically: $z = \frac{X - \mu}{\sigma}$
• 68% of data points fall within 1 SD of the mean.
• 95% of data points fall between z = -1.96 and z = +1.96.
• Values outside that range are really unlikely (really rare).

So you can calculate a z-score for individual data points.
Mathematically: $z = \frac{X - \mu}{\sigma}$
This is the mean-centered score divided by the standard deviation.

You can also calculate a z-score for a sample, but it means something different.
Mathematically: $z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$
This is the mean-centered sample average divided by the standard error.

Z-distribution of samples: sampling distribution of the mean
[Figure: standard normal curve; x-axis: z-score of the number of heads in samples of 100 coin tosses]
$z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$
• 95% of data points fall in the central region.
• Values in the tails are really unlikely to occur for this distribution.
IMPORTANT: We can do this because we know what the mean and standard error should be for 100 tosses of a fair coin.

Z-test
Z-value for z-test: $z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$
Standard error of estimate: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$
For the same population distribution, a larger sample size results in a smaller standard error. (The more observations, the more accurate your estimate is.)

When you might use a z-test
• I'm a farmer. I have developed a new breed of Granny Smith apples, the Great Granny.
I want to be able to say that my apples are notably bigger than regular Granny Smiths.
• You want to know whether UCSD undergrads' GRE scores are higher than the national average.

You have to know sigma (σ)!
• Agricultural data: probably
• Standardized tests: definitely
• Reaction times in a lexical decision experiment? …
• Spatial frames of reference in residents of Papua New Guinea? …
• BOLD activation when looking at faces, houses, robots

[Figure: z-table with columns for the area from 0 to z and the area from z to ∞]
What would B be when z = 0? What would C be when z = 0?

What if you don't know sigma?
• That is the case most of the time.
• Statistics to the rescue!
• If you don't know sigma, you can estimate it from your own sample.
• You have to correct for it, of course (degrees of freedom, df).
• There's a sampling distribution like the z-distribution, but for unknown population σ: the t-distribution!

The t distribution
[Figure: t-distributions for several values of df]
• Unlike z, there is a different t-distribution for each sample size.
• For extremely large samples, it converges to the z-distribution.
• Usually, we don't have samples big enough to get to the z-distribution, so we use t.
• Use it just like the z-distribution, but it has heavier tails.
More on this next week.

Empirical Loop
[Figure: cycle linking Hypothesis, Research Design, Collect Data, Descriptive Statistics, and Inferential Statistics]

Hypothesis testing
[Figure: standard normal curve; blue line = your "baseline" (some probability distribution of the mean, with µ and σ); green = your sample; central region labeled "Way.", tails labeled "No way."; x-axis: z-scores; y-axis: probability]

Binomial Distribution
How many outcomes of "heads" will we get from N flips of a coin with weighting p?
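The binomial question above can be answered numerically. The following is a minimal sketch, assuming Python's standard library only; the function name `binomial_pmf` and the example values (4 flips, p = 0.5) are illustrative choices, not taken from the slides.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n flips of a coin with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical example values: 4 flips of a fair coin.
n_flips = 4
p_heads = 0.5

# Probability of each possible number of heads, from 0 to n_flips.
distribution = [binomial_pmf(k, n_flips, p_heads) for k in range(n_flips + 1)]

for k, prob in enumerate(distribution):
    print(f"{k} heads: {prob:.4f}")
```

Changing `p_heads` shows how a weighted coin shifts the distribution toward more or fewer heads.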
Introduction to Hypothesis Testing: The Binomial Test

Let's try it ourselves…

Hypothesis Testing: Binomial Distribution
Try it yourself at: http://www.adsciengineering.com/bpdcalc/index.php

Neyman-Pearson paradigm for hypothesis testing:
1. Assume a probabilistic model for the data (the "null hypothesis", H0).
2. Define an "alternative hypothesis" H1 (this can be very vague).
3. Define a "decision rule" which specifies which future observations will lead you to reject the null hypothesis.
4. Collect observations (data).
5. If the data are highly unlikely in a way that favors your alternative hypothesis, then reject the null hypothesis.
6. Otherwise, "retain" the null hypothesis.

Hypothesis Testing, translated:
Neyman-Pearson paradigm for hypothesis testing:
1. Null hypothesis H0: What would be the situation if there's no difference?
2. Alternative hypothesis H1: What would be the situation if there is a difference?
3. Define what numeric outcome would convince you that there is a difference.
4. Collect observations (data).
5. If the data are highly unlikely given the no-difference scenario, reject the null hypothesis. (Yay! Usually.)
6. Otherwise, "retain" the null hypothesis.

Hypothesis Testing:
NOTE: Retaining the null hypothesis H0 is NOT the same as proving that H0 is true. It simply means that we didn't have enough evidence to reject it (e.g., we might have, given more data).
This is analogous to when a jury declares someone "not guilty." It does not mean that the person is innocent, only that there is not enough evidence to show that she/he is guilty.
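The six steps of the paradigm can be sketched for a coin-flip experiment. This is only an illustration under stated assumptions: the number of flips (20), the threshold for "highly unlikely" (0.05), and all function and variable names are hypothetical and do not come from the slides.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n flips with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail_prob(k, n, p):
    """Probability of k or more heads in n flips with P(heads) = p."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))


# Step 1: null hypothesis H0 (assumed model): the coin is fair.
P_NULL = 0.5
# Step 2: alternative hypothesis H1 (vague): the coin favors heads.
# Hypothetical experiment size and threshold for "highly unlikely".
N_FLIPS = 20
THRESHOLD = 0.05

# Step 3: decision rule. Find the smallest head count whose upper-tail
# probability under H0 is at or below the threshold.
critical_heads = next(
    k for k in range(N_FLIPS + 1)
    if upper_tail_prob(k, N_FLIPS, P_NULL) <= THRESHOLD
)


def decide(observed_heads):
    """Steps 5 and 6: reject H0 only if the data fall in the rejection region."""
    return "reject H0" if observed_heads >= critical_heads else "retain H0"


# Step 4: collect data (a made-up observation for illustration).
observed = 16
print(critical_heads, decide(observed))
```

Note that the decision rule is fixed before the data are examined, and that "retain H0" is returned rather than "accept H0", in line with the jury analogy above.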
Hypothesis Testing:
Possible outcomes:

              | True state of the world
Your decision | H0 is CORRECT          | H0 is INCORRECT
Accept H0     | Correct decision (1-α) | Type II Error (β)
Reject H0     | Type I Error (α)       | Correct decision (1-β)

Hypothesis testing
[Figure: null hypothesis (H0) shown as some probability distribution with µ and σ; x-axis: z-scores; a criterion separates the central "Way" region from the "No way" tails; green = your sample; regions labeled Type I Error (α) and Type II Error (β)]

Type I Error: We reject the null hypothesis even though it's true ("False positive")
Alpha (α): P(Type I Error)
p-Value: The smallest α you could have used and still rejected the null hypothesis given your data
P is for probability

Type II Error: We retain the null hypothesis even though it's false ("False negative")
Beta (β): P(Type II Error)
Power: 1-β
We often don't know what beta is because our alternative hypotheses are too vague.

Hypothesis testing
[Figure: null hypothesis (H0) distribution with µ and σ and your sample in green; x-axis: z-scores; the criterion divides the areas 1-α, α, β, and 1-β (power)]

Hypothesis Testing:
Trade-off between Type I & Type II Error: The smaller your α, the larger your β.
[You can achieve more power by accepting a greater chance of making a false positive.]
Effect of Sample Size on Type I and Type II Error: The bigger your sample size, the more power you can achieve with a fixed α.

Question: Is the simulated coin toss at an online casino biased?

A Test:
1. Flip the coin twice.
2. If the coin comes up heads both times, we decide it's a biased magic store coin.

Assume a Fair Coin:
[Figure: probability of each number of heads; regions labeled Decide "Fair" and Decide "Biased"; x-axis: # of heads; y-axis: probability]
α = P(Type I error) = .25 (that is, 0.5 × 0.5)

A Better Test:
1. Flip the coin four times.
2. If the coin comes up heads all four times, we decide it's a biased coin.

Assume a Fair Coin:
[Figure: probability of each number of heads; regions labeled Decide "Fair" and Decide "Cheat"; x-axis: # of heads; y-axis: probability]
α = P(Type I error) = .06 (that is, 0.5^4 = .0625)

Assume an unfair coin:
Assessing power: Suppose P(Heads) = 0.6.
We can only assess the actual power of a statistical test by imagining what the world might be like (other than like H0).
[Figure: probability of each number of heads; regions labeled Decide "Fair" and Decide "Cheat"; x-axis: # of heads; y-axis: probability]
α = P(Type I error) = .06
β = P(Type II error) = .87
Power = 1-β = .13 (that is, 0.6^4 ≈ .13)

Hypothesis Testing:
Effect of Increasing Sample Size: By increasing your sample size you can decrease beta without increasing alpha.

A Less Conservative Decision Rule:
[Figure: probability of each number of heads; regions labeled Decide "Fair" and Decide "Cheat"; x-axis: # of heads; y-axis: probability]
α = P(Type I error) = .31

Hypothesis Testing:
Trade-off between Type I and Type II Error: You can adjust your decision rule so that it decreases alpha, but it will increase beta (and vice versa).

Empirical Loop
[Figure: cycle linking Hypothesis, Research Design, Collect Data, Descriptive Statistics, and Inferential Statistics]

Inferential Statistics
Hypothesis Testing
• binomial test
• z-test

Lecture Outline
• z-test Review & Cohen's d
• How many samples do we need?
• How good is my estimate of the mean?
• What do we do when we don't know the standard deviation of the null hypothesis?

Is there a reason why some people perceive clockwise vs. counterclockwise rotation?
http://www.news.com.au/perthnow/story/0,21598,22492511-5005375,00.html

Hypothesis Testing Form (n=56)
Null Hyp. (H0): Data come from a normal distribution with μ=0.5, σ=0.5
Alt. Hyp. (H1): μ≠0.5
Tail of Test: two-tailed
Type of Test: z-test
Alpha Level: α=0.05
Critical Value(s): mean S to N > .631 or mean S to N < .369
Observed Value: 37/56 = .661 S to N
Decision: Reject H0
p-value: p=.0164

Cohen's d: A unitless measure of effect size
$d = \frac{\bar{x} - \mu}{\sigma}$
µ = mean of H0
σ = standard deviation of H0
x̄ = sample mean (i.e., estimate of population mean)
0.20 small
0.50 medium
0.80 large
(see pg. 299)

Reporting our results:
We found that participants were significantly more likely to perceive the dancer as spinning south to north (z(n=56)=2.40, p=.0164, d=.322).

Central Limit Theorem
For large n (e.g., 25-100), the sum (or mean) of n independent samples of random variable X is approximately normally distributed.
[Figure: distribution of outcomes, starting with 1 flip]

Do UCSD students do better on the SAT than average?
The mean and standard deviation of all SAT scores in the USA for 2007 are 1050 and 70, respectively. To find out if UCSD students do better than average on the SAT, I randomly select 49 students and collect their SAT scores. The mean of those 49 scores is 1090. Can I be 95% sure that UCSD students really are better?

z-Test
Useful for testing hypotheses when:
• you know the mean and the standard deviation of the null hypothesis
• your data are normally distributed OR you have a sufficiently large sample size (e.g., 25-100)

z-Test
1. Decide if you need to do a two-tailed, upper-tailed, or lower-tailed test.
2. Compute the mean of your data, X̄.
3. Compute the standard error of the mean of the distribution of the null hypothesis.
4. Convert X̄ into a z-score.
5. If the z-score of X̄ is more extreme than your critical z-score, then reject the null hypothesis.

Cohen's d
A unitless measure of effect size:
• the magnitude of d doesn't depend on sample size (unlike p-values)
• useful for getting a sense of how big an effect is, whereas p-values give you a sense of how reliable an effect is

Each of the following statements could inspire a hypothesis test. For each statement, would you use a two-tailed, upper-tailed, or lower-tailed test? State H0 and H1.
a) To increase rainfall, extensive cloud-seeding experiments are to be conducted and the results are to be compared with a baseline figure of 0.54 inches (SD=.11) of rainfall (the amount of rain when cloud seeding wasn't done).
b) Public health statistics indicate that American males gain an average of 23 pounds (SD=10) during the 20-year period after age 40. An ambitious weight-loss program, spanning 20 years, is being tested with a random sample of 40-year-old men.
c) A basketball coach wonders if listening to CDs of positive comments during sleep will affect a player's performance. On the one hand, it may boost self-confidence and subsequently boost performance. On the other hand, it may disturb their sleep and hinder their performance.
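The z-test steps and Cohen's d from this lecture can be sketched in Python and applied to the two worked examples (the spinning dancer and the SAT question). This is a minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; the function name `z_test` and its parameters are illustrative, not part of the lecture. Small differences from the reported values (z=2.40, p=.0164, d=.322) are expected because the slides appear to round intermediate results.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma0, n, tail="two"):
    """Return (z, p, d) for a z-test against a null with known mu0 and sigma0.

    tail is "two", "upper", or "lower" (step 1 of the procedure).
    """
    # Step 3: standard error of the mean under the null hypothesis.
    standard_error = sigma0 / sqrt(n)
    # Step 4: convert the sample mean into a z-score.
    z = (sample_mean - mu0) / standard_error

    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "upper":
        p = 1 - std_normal.cdf(z)
    elif tail == "lower":
        p = std_normal.cdf(z)
    else:
        p = 2 * (1 - std_normal.cdf(abs(z)))

    # Cohen's d: mean difference in units of the null standard deviation.
    d = (sample_mean - mu0) / sigma0
    return z, p, d


# Dancer example: 37 of 56 participants reported south-to-north rotation.
z_dancer, p_dancer, d_dancer = z_test(37 / 56, mu0=0.5, sigma0=0.5, n=56)
print(f"dancer: z={z_dancer:.2f}, p={p_dancer:.4f}, d={d_dancer:.3f}")

# SAT example: upper-tailed test, since the question is whether UCSD is higher.
z_sat, p_sat, d_sat = z_test(1090, mu0=1050, sigma0=70, n=49, tail="upper")
print(f"SAT: z={z_sat:.2f}, p={p_sat:.6f}, d={d_sat:.3f}")
```

Step 5 is then a comparison of `p` with the chosen α (or, equivalently, of `z` with the critical z-score). Returning d alongside p keeps the size of the effect separate from its reliability, as the Cohen's d slide recommends.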