Worksheet on Statistical Tests

Testing Proportions

A genetic marker is expected to be present in approximately 1/3 of the population. A simple random sample of 70 individuals is tested, and 20 turn out to have the marker. Do a test for the null hypothesis p = 1/3:

H0: p = 1/3
H1: p ≠ 1/3

• Choose a level, and reject or not the hypothesis on the basis of your chosen level
• Find the p-value for this test

Solutions

The sample proportion for this sample is 20/70 = 0.28571. We assume p = 1/3, which implies a standard deviation of √(p(1 − p)) = √((1/3)·(2/3)) = √2/3 for the Bernoulli random variable, and a "standard error" (that is, √(p(1 − p)/n)) of 0.05634. Assuming that we can use the normal approximation, the z-score for our observations turns out to be

z = (0.28571 − 1/3) / 0.05634 = −0.8452

which corresponds to a lower-tail probability of 0.19901 (for the two-sided alternative p ≠ 1/3, the p-value is twice that, about 0.398). That is very high, which implies that we cannot reject the null hypothesis at any reasonable significance level. Indeed, the two-sided acceptance regions for the usual levels are

Level   Left      Right
90%     0.24066   0.42601
95%     0.22290   0.44376
99%     0.18820   0.47846

all of which include our sample proportion.

p-value when working with tables

The p-value above was calculated by a spreadsheet. If you are using tables, you will have to look up the probability corresponding to the z-score −0.845. This is rounded to three decimal places, while tables are rounded to two, but you can take the midpoint between the entries for 0.84 and 0.85. Using the negative score table, or subtracting the entries for 0.84 and 0.85 in the positive score table from 1, we get 0.1991, which misses the computer value by 0.0001.

"Real" Data

BMI

A table (taken from another textbook) lists values of the Body Mass Index for a sample of females. The sample is unlikely to have been a true simple random one, and some of the data are by definition extraneous, since the BMI standard for teenagers and younger is different from the standard for adults 20 and over.
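As an aside, the arithmetic of the proportion test above is easy to check by machine. The following is a minimal sketch, assuming Python with scipy available; all the numbers come from the worksheet.

```python
# One-proportion z-test for H0: p = 1/3 against H1: p != 1/3,
# with n = 70 trials and 20 successes (the worksheet's numbers).
from math import sqrt
from scipy.stats import norm

n, x, p0 = 70, 20, 1 / 3
p_hat = x / n                      # sample proportion, about 0.28571
se = sqrt(p0 * (1 - p0) / n)       # standard error under H0, about 0.05634
z = (p_hat - p0) / se              # z-score, about -0.8452

one_tail = norm.cdf(z)             # lower-tail probability, about 0.199
p_value = 2 * one_tail             # two-sided p-value, about 0.398

print(p_hat, se, z, one_tail, p_value)
```

Since the p-value is far above every usual significance level, the code confirms that the null hypothesis cannot be rejected.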
After purging the table of these spurious entries, we have the following summaries:

Sum     Count   Sum of squares
874.1   33      24471.3

The BMI standard for adults is the following:

BMI              Weight Status
Below 18.5       Underweight
18.5 – 24.9      Normal
25.0 – 29.9      Overweight
30.0 and above   Obese

We want to test whether the mean female is overweight or not, according to this standard.

1. Null hypothesis: the mean is not overweight
   H0: μ = 24.9
   H1: μ > 24.9

2. Null hypothesis: the mean is overweight
   H0: μ = 24.9
   H1: μ < 24.9

Solutions

The essential statistics for our sample are

Mean                26.4879
Standard Error       1.11729
Standard Deviation   6.41837
Sample Variance     41.1955

Recall: the "standard error" is s/√n, the term multiplying the critical t value in interval estimates and in computing critical values.

We can compute the t-score

t = (x̄ − μ) / (s/√n)

where μ is the mean indicated in the null hypothesis (in both our cases, 24.9). This turns out to be equal to 1.42118, corresponding to a right-tail p-value of 0.08247.

Of course, since our t-score is positive, it cannot fall in the left tail, so if we are following test #2 we have no way of rejecting the null hypothesis (that the mean is overweight). If we are following test #1, where the critical region is the right tail, we are in the acceptance region at any level lower than 8.247%, such as the usual 1%, 5%, or even 8%; that is, at these levels, we cannot reject the hypothesis that the mean of the population is not overweight. We would reject this hypothesis (hence decide that there is enough evidence to suggest that the mean is overweight) if we tested at the 10% level.

p-value when working with tables

The p-value above was calculated by a spreadsheet. If you are using tables, you will have a hard time coming up with a precise p-value, but you will be able to produce bounds. In our case, the t-score of 1.42118 lies (in the row for 32 degrees of freedom) between 1.309 (probability 0.1) and 1.694 (probability 0.05).
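Before settling the table bounds, it is worth seeing how the BMI statistics follow from the summaries alone. The sketch below (again assuming scipy) recovers mean, variance, standard error, t-score, and exact p-value from just the sum, count, and sum of squares, the way a spreadsheet would; small last-digit differences from the worksheet's figures are rounding effects.

```python
# One-sample right-tailed t-test for the BMI data, computed entirely
# from the summaries: sum = 874.1, count = 33, sum of squares = 24471.3.
from math import sqrt
from scipy.stats import t as t_dist

n, total, total_sq = 33, 874.1, 24471.3
mean = total / n                           # about 26.4879
var = (total_sq - total**2 / n) / (n - 1)  # sample variance, about 41.195
sd = sqrt(var)                             # about 6.4184
se = sd / sqrt(n)                          # standard error, about 1.1173

mu0 = 24.9                                 # null-hypothesis mean
t_score = (mean - mu0) / se                # about 1.4212
p_right = t_dist.sf(t_score, df=n - 1)     # right-tail p-value, about 0.082

print(mean, sd, se, t_score, p_right)
```

The exact p-value lands between the two table probabilities (0.05 and 0.1), as the bracketing argument predicts.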
Your answer would then be that the p-value lies between 0.05 and 0.1, which for acceptance/rejection purposes is often enough, as significance levels between these two are not commonly used.

Screws

These are measurements reportedly made by the author on a box of screws. Assuming it is reasonable to treat this as a simple random sample from the manufacturer's production (a shaky assumption), which is assumed to be normally distributed around the nominal value of 0.75 (not unreasonable), we can test whether this is indeed the true mean:

Count            50
Sum              37.341
Sum of squares   27.8944

H0: μ = 0.75
H1: μ ≠ 0.75

Solution

This is a two-tailed test, and the statistics for our sample are

Mean                0.74682
Standard Error      0.00174
Standard Deviation  0.01232
Sample Variance     0.00015

The t-score resulting from these numbers (referring to an assumed true mean of 0.75) is −1.82491 (negative, since the sample mean falls below the nominal value); its absolute value corresponds to a one-tail probability of 0.03706. This is a two-tailed test, so that, at level α, each tail has an area (probability) of α/2: at 10%, each tail has probability 5% (we reject the null hypothesis); at 8%, a probability of 4% (we reject the null hypothesis); at 5%, a probability of 2.5% (we cannot reject the null hypothesis). In other words, we will reject the null hypothesis (that the screws are of the nominal size) only at levels higher than about 7.4%.

p-value when working with tables

The p-value above was calculated by a spreadsheet. If you are using tables, you will have a hard time coming up with a precise p-value, but you will be able to produce bounds. We should deal with 49 degrees of freedom, which is not usually listed explicitly. In our case, the t-score of 1.82491 (in absolute value) lies approximately between 1.676 (probability 0.05) and 2.009 (probability 0.025) in the 50 d.o.f. row, and your probability bounds are the same if you look at the 45 and 40 d.o.f. rows.
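The two-tailed decision logic just described can also be sketched in code. This version (scipy assumed) recomputes everything from the raw sums, so the t-score comes out around −1.83 rather than the worksheet's 1.82491, which was presumably computed from less-rounded data; the level-by-level decisions are the same.

```python
# Two-tailed t-test for the screws: H0: mu = 0.75 vs H1: mu != 0.75,
# from the summaries count = 50, sum = 37.341, sum of squares = 27.8944.
from math import sqrt
from scipy.stats import t as t_dist

n, total, total_sq = 50, 37.341, 27.8944
mean = total / n                                 # about 0.74682
sd = sqrt((total_sq - total**2 / n) / (n - 1))   # about 0.0123
se = sd / sqrt(n)                                # about 0.0017

t_score = (mean - 0.75) / se                     # about -1.83
p_two = 2 * t_dist.sf(abs(t_score), df=n - 1)    # two-sided p-value, about 0.073

# Decisions at a few common significance levels:
for alpha in (0.10, 0.08, 0.05):
    print(alpha, "reject" if p_two < alpha else "do not reject")
```

As in the text: reject at 10% and 8%, but not at 5%.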
Your answer would then be that the p-value lies between 0.025 and 0.05, which for acceptance/rejection purposes is often enough, as significance levels between these two are not commonly used.

Freshmen weights (paired samples)

The following data summarize the weights of a group of freshmen in September and April. The two sets are clearly not independent, so the test we can apply refers to the differences between the two measurements, which we will assume to be normally distributed.

Count                                                  67
Sum of weights, September                            4359
Sum of weights, April                                4438
Sum of differences (equal to the difference of sums)   79
Sum of squares of differences                        1091

As usual, we can set up various tests, depending on our goals. For example,

1. H0: d = 0; H1: d > 0
2. H0: d = 0; H1: d < 0
3. H0: d = 0; H1: d ≠ 0

Solutions

The statistics for these data are

Mean                0.45418
Standard Error      0.15388
Standard Deviation  1.25954
Sample Variance     1.58643

The t-score from these numbers is 2.95, which is high: we are going to reject the null hypothesis in case 1 (the right tail is the critical region) and case 3 (two-tailed test), since the one-tail p-value is 0.00219. Only if the critical region in the right tail were smaller than 0.22% (which is incompatible with any "normal" significance level) would we fail to reject the null hypothesis in the first and third tests. The second test has the left tail as its critical region; since our t-score is positive, we cannot reject the null hypothesis there - essentially, the hypothesis that the average weight difference is greater than or equal to zero.

p-value when working with tables

The p-value above was calculated by a spreadsheet. If you are using tables, you will have a hard time coming up with a precise p-value, but you will be able to produce bounds.
It is unlikely that any table will have a row for 66 degrees of freedom, but a quick inspection will show that 2.95 corresponds to a p-value smaller than 0.005 for any number of degrees of freedom larger than 14, let alone 66. Thus, your answer would be that the p-value is lower than 0.005, which is usually more than enough for rejection purposes.

Remark

For practical purposes, we did not report the raw data, which consist of 67 pairs of measurements, but only the summaries of the differences between each pair. This underlines the fact that "paired data" tests are simply run-of-the-mill t-tests, where the basic data are the differences between pairs of data points - but otherwise no different from tests on simple samples. Traditional books (including our book) and software place this topic in the "two samples" chapter, but this is somewhat misleading, since, in fact, we are dealing with one sample (the collection of differences). We will quickly look at the protocol for true "two sample" tests - comparing two independent samples - and, although, in the end, this still results in a t-test (assuming normality), the statistic to be used is computed in a significantly different way.
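The remark above can be illustrated directly: scipy's paired test produces exactly the same statistic and p-value as a one-sample t-test on the per-pair differences. The weights below are made up for illustration only (they are NOT the worksheet's 67 freshmen, whose raw data we do not have).

```python
# A paired t-test is just a one-sample t-test on the differences.
# The data here are hypothetical, generated only to demonstrate the identity.
import numpy as np
from scipy.stats import ttest_rel, ttest_1samp

rng = np.random.default_rng(0)
september = rng.normal(65, 8, size=12)           # hypothetical September weights (kg)
april = september + rng.normal(1, 2, size=12)    # hypothetical April weights

paired = ttest_rel(april, september)             # "two sample" paired test
one_sample = ttest_1samp(april - september, 0)   # one-sample test on differences

# The two procedures agree exactly:
assert np.isclose(paired.statistic, one_sample.statistic)
assert np.isclose(paired.pvalue, one_sample.pvalue)
```

This is why the "two samples" chapter placement is misleading: the paired procedure never uses the two samples separately, only their differences.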