Experimental Statistics - week 2
Review Continued
• Sampling Distributions
  – Chi-square
  – F
• Statistical Inference
  – Confidence Intervals
  – Hypothesis Tests

Chi-Square Distribution (distribution of the sample variance)
IF:
• Data are Normally Distributed
• Observations are Independent
Then:
$$\frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2}$$
has a Chi-Square distribution with n - 1 degrees of freedom.

[Figure: Chi-square Distribution, Figure 7.10, page 357]

F-Distribution
IF:
• S₁² and S₂² are sample variances from 2 samples
• the samples are independent
• both populations are normal
Then:
$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$
has an F-distribution with ν₁ = n₁ - 1 and ν₂ = n₂ - 1 df.

[Figure: F-distribution, Figure 7.10, page 357]

(1 - α)×100% Confidence Intervals for μ
Setting:
• Data are Normally Distributed
• Observations are Independent
Case 1: σ known
$$\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Case 2: σ unknown
$$\bar{X} - t_{\alpha/2}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2}\frac{S}{\sqrt{n}} \qquad (df = n - 1)$$

CI Example
An insurance company is concerned about the number and magnitude of hail damage claims it received this year. A random sample of 20 of the thousands of claims it received this year resulted in an average claim amount of $6,500 and a standard deviation of $1,500. What is a 95% confidence interval on the mean claim damage amount?
Suppose that company actuaries believe the company does not need to increase insurance rates for hail damage if the mean claim damage amount is no greater than $7,000. Use the above information to make a recommendation regarding whether rates should be raised.

Interpretation of a 95% Confidence Interval
[Figure: 100 different 95% CIs plotted for a case in which the true mean is 80]
About 95% of these confidence intervals should "cover" the true mean.

Concern has been mounting that SAT scores are falling.
• 3 years ago: National AVG = 955
• Random sample of 200 graduating high school students this year: sample AVG = 935
(In each case the standard deviation is about 100.)
Question: Have SAT scores dropped?
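The hail-claim question on the CI Example slide is Case 2 (σ unknown, so the interval uses S and t with n - 1 = 19 df). Below is a minimal Python sketch using only the standard library. Because `math` and `statistics` provide no t distribution, the helpers `t_pdf`, `t_cdf`, `t_quantile`, and `t_interval` are hypothetical names that approximate it numerically (Simpson's rule for the CDF, bisection for the quantile). They stand in for a t table or a statistics package and are not the course's prescribed method.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p: float, df: float) -> float:
    """Invert t_cdf by bisection; assumes the answer lies in [-20, 20]."""
    lo, hi = -20.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(mean: float, sd: float, n: int, conf: float = 0.95):
    """Case 2 interval: mean +/- t_{alpha/2} * S / sqrt(n), with df = n - 1."""
    alpha = 1 - conf
    t_crit = t_quantile(1 - alpha / 2, n - 1)
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hail-claim example: n = 20, sample mean $6,500, sample SD $1,500.
lo, hi = t_interval(6500, 1500, 20)
print(f"95% CI for the mean claim: (${lo:,.0f}, ${hi:,.0f})")
```

With these inputs, t₀.₀₂₅,₁₉ ≈ 2.093 and the interval is roughly ($5,798, $7,202). Its upper limit therefore sits slightly above the actuaries' $7,000 threshold. The rate recommendation itself is left to the exercise.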
Procedure: Determine how "extreme" or "rare" our sample AVG of 935 is if the population AVG really is 955. We must decide between:
• The sample came from a population with population AVG = 955, and just by chance the sample AVG is "small."
OR
• We are not willing to believe that the population AVG this year is really 955. (Conclude that SAT scores have fallen.)

Hypothesis Testing Terminology
Statistical Hypothesis: a statement about the parameters of one or more populations
Null Hypothesis (H₀):
- the hypothesis to be "tested" (standard, traditional, claimed, etc.)
- the hypothesis of no change, effect, or difference (usually what the investigator wants to disprove)
Alternative Hypothesis (Hₐ):
- the null is not correct (usually the hypothesis the investigator suspects or wants to show)

Basic Hypothesis Testing Question
Do the data provide sufficient evidence to refute the null hypothesis?

Hypothesis Testing (cont.)
Critical Region (Rejection Region): the region of values of the test statistic that leads to rejection of the null (e.g., t > c)
Critical Value: an endpoint of the critical region
Significance Level (α):
- the probability that the test statistic will be in the critical region if the null is true
- the probability of rejecting the null when it is true

Types of Hypotheses
One-Sided Tests
$$H_0: \mu \le \mu_0 \quad \text{vs.} \quad H_a: \mu > \mu_0$$
$$H_0: \mu \ge \mu_0 \quad \text{vs.} \quad H_a: \mu < \mu_0$$
Two-Sided Tests
$$H_0: \mu = \mu_0 \quad \text{vs.} \quad H_a: \mu \ne \mu_0$$

Rejection Regions for One- and Two-Sided Alternatives
$$H_0: \mu \le \mu_0 \text{ vs. } H_a: \mu > \mu_0: \quad \text{reject } H_0 \text{ if } t \ge t_{\alpha}$$
$$H_0: \mu \ge \mu_0 \text{ vs. } H_a: \mu < \mu_0: \quad \text{reject } H_0 \text{ if } t \le -t_{\alpha}$$
[Figure: t density with lower-tail area α to the left of the critical value -t_α]
$$H_0: \mu = \mu_0 \text{ vs. } H_a: \mu \ne \mu_0: \quad \text{reject } H_0 \text{ if } |t| \ge t_{\alpha/2}$$

A Standard Hypothesis Test Write-up
1. State the null and alternative hypotheses.
2. Give the significance level, the test statistic, and the rejection region.
3. Show calculations.
4. State the conclusion:
   - the statistical decision
   - the conclusion in the language of the problem

Hypothesis Testing Example 1
A solar cell requires a special crystal. If properly manufactured, the mean weight of these crystals is 0.4 g.
Suppose that 25 crystals are selected at random from a batch of crystals and that, for these crystals, the average weight is 0.41 g with a standard deviation of 0.02 g. At the α = .01 level of significance, can we conclude that the batch is bad?

Hypothesis Testing Example 2
A box of detergent is designed to weigh 3.25 lbs on average. A random sample of 18 boxes taken from the production line on a single day has a sample average of 3.238 lbs and a standard deviation of 0.037 lbs. Test whether the boxes seem to be underfilled.

Errors in Hypothesis Testing
                                    Actual Situation
Conclusion          Null is True                Null is False
Do Not Reject H₀    Correct Decision (1 - α)    Type II Error (β)
Reject H₀           Type I Error (α)            Correct Decision (1 - β) (Power)

$$H_0: \mu \ge \mu_0 \text{ vs. } H_a: \mu < \mu_0: \quad \text{reject } H_0 \text{ if } t \le -t_{\alpha}$$
Note: "Large negative values" of t make us believe the alternative is true.
p-Value: the probability, when the null is true, of an observation as extreme as or more extreme than the one observed.
Suppose t = -2.39 is observed from the data for the test above. Then the p-value is the area under the t density to the left of -2.39 (the observed value of t), i.e., P(T ≤ -2.39).
[Figure: t density with the p-value shaded to the left of t = -2.39]

Note:
- If the p-value is less than or equal to α, then we reject the null at the α significance level.
- The p-value is the smallest level of significance at which the null hypothesis would be rejected.

Find the p-values for Examples 1 and 2.

Two Independent Samples
• Assumptions: Measurements from Each Population are
  – Mutually Independent (Independent within Each Sample; Independent Between Samples)
  – Normally Distributed (or the Central Limit Theorem can be Invoked)
• Analysis Differs Based on Whether the Two Populations Have the Same Standard Deviation

Two Types of Independent Samples
• Population Standard Deviations Equal
  – Can Obtain a Better Estimate of the Common Standard Deviation by Combining or "Pooling" Individual Estimates
• Population Standard Deviations Different
  – Must Estimate Each Standard Deviation
  – Very Good Approximate Tests are Available
If Unsure, Do Not Assume Equal Standard Deviations

Equal Population Standard Deviations
Test Statistic:
$$t = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
where
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad s_p = \sqrt{s_p^2}, \qquad df = n_1 + n_2 - 2$$

Behrens-Fisher Problem
If σ₁ ≠ σ₂, then
$$\frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
does not have an exact t distribution.

Satterthwaite's Approximate t Statistic
If σ₁ ≠ σ₂, then
$$\frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \approx t$$
(i.e., approximately t), with approximate df
$$df = \frac{(a + b)^2}{\frac{a^2}{n_1 - 1} + \frac{b^2}{n_2 - 1}}, \qquad a = \frac{s_1^2}{n_1}, \quad b = \frac{s_2^2}{n_2}$$

Often-Recommended Strategy for Tests on Means
Test whether σ₁ = σ₂ (F-test):
– If the test is not rejected, use the 2-sample t statistic, assuming equal standard deviations.
– If the test is rejected, use Satterthwaite's approximate t statistic.
NOTE: This Is Not a Wise Strategy. The F-test is highly susceptible to non-normality.
Recommended Strategy:
– If uncertain about whether the standard deviations are equal, use Satterthwaite's approximate t statistic.

Example 3: Comparing the Mean Breaking Strengths of 2 Plastics
Question: Is there a difference between the 2 plastics in terms of mean breaking strength?
Plastic A: n_A = 35, ȳ_A = 28.3, s_A = 3.3
Plastic B: n_B = 40, ȳ_B = 26.7, s_B = 4.9
Assumptions:
• Mutually independent measurements
• Normal distributions for measurements from each type of plastic
• Equal population standard deviations

New diet: Is it effective?
Design: 50 people; randomly assign 25 to go on the diet and 25 to eat normally for the next month. Assess results by comparing weights at the end of 1 month.
Diet: X̄_D, S_D        No Diet: X̄_ND, S_ND
Run a 2-sample t-test using the guidelines we have discussed. Is this a good design?

Better Design: Randomly select subjects and measure them before and after 1 month on the diet.
Subject   Before   After   Difference
1         150      147       3
2         210      195      15
:         :        :         :
n         187      190      -3
Procedure: Calculate the differences, and analyze the differences using a 1-sample test ("Paired t-Test").

Example 4: International Gymnastics Judging
Question: Do judges from a contestant's country rate their own contestant higher than do foreign judges?
Data:
Contestant   Native Judge   Foreign Judges
1            6.8            6.7
2            4.5            4.3
3            8.0            8.1
4            7.2            7.2
5            8.7            8.3
6            4.5            4.6
7            6.6            5.4
8            5.8            5.9
9            6.0            6.1
10           8.8            9.1
11           8.7            8.7
12           4.4            4.3
i.e., test
$$H_0: \mu_N \le \mu_F \quad \text{vs.} \quad H_a: \mu_N > \mu_F$$
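The p-values asked for in Examples 1 and 2, and the tests for Examples 3 and 4, can be sketched the same way. This standard-library Python sketch is self-contained, so it repeats a numerically integrated t CDF (Simpson's rule) in place of a t table. All helper names (`one_sample_t`, `pooled_t`, `satterthwaite_t`, `paired_t`) are hypothetical. The following choices are assumptions rather than statements from the slides:
- Example 1 is treated as two-sided, since a batch could be bad in either direction.
- Example 2 is treated as lower-tailed (underfilled boxes).
- Example 4 is upper-tailed, with differences taken as native minus foreign.
- Example 3 is run both with the pooled statistic and with Satterthwaite's approximation.

```python
import math
import statistics


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def one_sample_t(mean, sd, n, mu0):
    """t = (mean - mu0) / (sd / sqrt(n)), with df = n - 1."""
    return (mean - mu0) / (sd / math.sqrt(n)), n - 1


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Equal-SD two-sample t using the pooled variance; df = n1 + n2 - 2."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2)), df


def satterthwaite_t(m1, s1, n1, m2, s2, n2):
    """Unequal-SD two-sample t with Satterthwaite's approximate df."""
    a, b = s1**2 / n1, s2**2 / n2
    df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return (m1 - m2) / math.sqrt(a + b), df


def paired_t(first, second):
    """Paired t-test: a one-sample t-test (mu0 = 0) on first - second."""
    d = [x - y for x, y in zip(first, second)]
    return one_sample_t(statistics.mean(d), statistics.stdev(d), len(d), 0.0)


# Example 1 (two-sided, H0: mu = 0.4): crystal weights.
t1, df1 = one_sample_t(0.41, 0.02, 25, 0.40)
p1 = 2 * (1 - t_cdf(abs(t1), df1))

# Example 2 (lower-tailed, Ha: mu < 3.25): detergent boxes.
t2, df2 = one_sample_t(3.238, 0.037, 18, 3.25)
p2 = t_cdf(t2, df2)

# Example 3 (two-sided): plastics A vs. B, pooled and Satterthwaite versions.
tp, dfp = pooled_t(28.3, 3.3, 35, 26.7, 4.9, 40)
ts, dfs = satterthwaite_t(28.3, 3.3, 35, 26.7, 4.9, 40)
pp = 2 * (1 - t_cdf(abs(tp), dfp))
ps = 2 * (1 - t_cdf(abs(ts), dfs))

# Example 4 (upper-tailed, Ha: mu_N > mu_F): gymnastics judging.
native = [6.8, 4.5, 8.0, 7.2, 8.7, 4.5, 6.6, 5.8, 6.0, 8.8, 8.7, 4.4]
foreign = [6.7, 4.3, 8.1, 7.2, 8.3, 4.6, 5.4, 5.9, 6.1, 9.1, 8.7, 4.3]
t4, df4 = paired_t(native, foreign)
p4 = 1 - t_cdf(t4, df4)

for name, t, df, p in [("Ex 1", t1, df1, p1), ("Ex 2", t2, df2, p2),
                       ("Ex 3 pooled", tp, dfp, pp),
                       ("Ex 3 Satterthwaite", ts, dfs, ps),
                       ("Ex 4 paired", t4, df4, p4)]:
    print(f"{name}: t = {t:.3f}, df = {df:.1f}, p = {p:.4f}")
```

To reach the statistical decision, compare each printed p-value with α, following the rule on the p-value slide. The conclusion in the language of the problem is still part of the write-up.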