Chapter 15 Nonparametric Methods

Outline
15.1 The Sign Test: A Hypothesis Test about the Median
15.2 The Wilcoxon Rank Sum Test

1.1 Nonparametric Tests

A. One-Sample Mean Test

Many tests are concerned with a parameter of an assumed distribution. To test $H_0: \mu = \mu_0$ vs $H_1: \mu \neq \mu_0$ under a normal population $N(\mu, \sigma^2)$ when $\sigma^2$ is known, the Z-test

$$Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$$

is recommended, where $\bar{X}$ is the sample mean and $n$ is the sample size.

B. Two-Sample Mean Tests

Test $H_0: \mu_1 = \mu_2$ vs $H_1: \mu_1 \neq \mu_2$ under two normal populations $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$. If $\sigma_1^2$ and $\sigma_2^2$ are unknown, a t-test is suggested.

Comparing Means of Two Populations ($\sigma_1^2$, $\sigma_2^2$ unknown)

In most cases the variances are unknown. The test statistic is

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} \sim t_{n_1 + n_2 - 2}$$

where the pooled variance is

$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{(n_1 - 1) + (n_2 - 1)}$$

and
$\bar{X}_1$ = mean of the sample taken from population 1
$s_1^2$ = variance of the sample taken from population 1
$n_1$ = size of the sample taken from population 1
$\bar{X}_2$ = mean of the sample taken from population 2
$s_2^2$ = variance of the sample taken from population 2
$n_2$ = size of the sample taken from population 2

If the data are not normally distributed, the distribution of the t-statistic is unknown and depends on the distribution of the populations. There is a huge range of possible underlying distributions. Can we have tests that are distribution-free? Nonparametric tests are tests of this kind.

Example 1.1 Delivery times

A local pizza restaurant located close to a college campus advertises that its delivery time to a college dormitory is shorter than that of a local branch of a national pizza chain. To determine whether this advertisement is valid, you and some friends decide to order 10 pizzas from the local restaurant and 10 pizzas from the national chain, all at different times. The delivery times in minutes (data set PIZZATIME) are shown.
Example 1.1 Delivery times: testing for the difference in the mean delivery times

| Local | 16.8 | 18.1 | 11.7 | 14.1 | 15.6 | 21.8 | 16.7 | 13.9 | 17.5 | 20.8 |
|---|---|---|---|---|---|---|---|---|---|---|
| Chain | 22.0 | 19.5 | 15.2 | 17.0 | 18.7 | 19.5 | 15.6 | 16.5 | 20.8 | 24.0 |

We can use the t-test for this comparison if the delivery times are normally distributed. Since the delivery times are not normally distributed, the t-test is difficult to justify.

We can instead compare the two restaurants pair by pair, recording "+" when the chain took longer than the local restaurant and "-" otherwise:

| Local | Chain | Result |
|---|---|---|
| 16.8 | 22.0 | + |
| 11.7 | 15.2 | + |
| 15.6 | 18.7 | + |
| 16.7 | 15.6 | - |
| 17.5 | 20.8 | + |
| 18.1 | 19.5 | + |
| 14.1 | 17.0 | + |
| 21.8 | 19.5 | - |
| 13.9 | 16.5 | + |
| 20.8 | 24.0 | + |

1.2 Sign Test

If the two restaurants have the same level of delivery time, each pair has probability 0.5 of giving "+" and 0.5 of giving "-". The number of "+", denoted by T, therefore follows the binomial distribution with p = 0.5, and so does the number of "-". In this example T = 8 out of n = 10 pairs.

Review: Binomial Distribution

A. Bernoulli trials
A trial with only two outcomes (yes or no, success or failure, boy or girl, win or loss, 1 or 0), with probabilities $p$ and $1 - p$, is called a Bernoulli trial.

B. Several Bernoulli trials
Let $X$ be the number of successes in $n$ independent, identical Bernoulli trials. The random variable $X$ is said to follow a binomial distribution $B(n; p)$.

C. Binomial probability distribution
The probability of $X = k$ is given by

$$P(X = k) = \frac{n!}{k!\,(n - k)!}\, p^k (1 - p)^{n - k}$$

where
$P(X = k)$ = probability of $k$ successes given $n$ and $p$
$n$ = number of observations
$p$ = probability of success
$1 - p$ = probability of failure
$k$ = number of successes in the sample ($k = 0, 1, \ldots, n$)

1.2 Sign Test: Example 1.1

One-tailed test: $H_0: p = 0.5$ vs $H_1: p > 0.5$

$$P(X \ge 8) = P(X = 8) + P(X = 9) + P(X = 10) = 0.055$$

$$p\text{-value} = P(X \ge 8) = 0.055$$

Two-tailed test: $H_0: p = 0.5$ vs $H_1: p \neq 0.5$

$$p\text{-value} = 2\,P(X \ge 8) = 0.11$$

SPSS result (Binomial Test, variable: result). SPSS reports the two-tailed significance:

| | Category | N | Observed Prop. | Test Prop. | Exact Sig. (2-tailed) |
|---|---|---|---|---|---|
| Group 1 | 1.00 | 8 | .80 | .50 | .109 |
| Group 2 | .00 | 2 | .20 | | |
| Total | | 10 | 1.00 | | |

Example 1.2: Product Preference

An Italian restaurant close to a college campus contemplated a new recipe for the sauce used on its pizza. A random sample of eight students was chosen, and each was asked to rate, on a scale from 1 to 10, the taste of the original sauce and of the proposed new one. The scores from the taste comparison are:

[Score table missing]

We cannot use the t-test for these data because the scores are not normally distributed. The statistic T, the number of "+", follows $B(7; 0.5)$: the difference for case G is zero, so that case is dropped and n = 7. This sample gives T = 2.

1.2 Sign Test: Example 1.2

One-tailed test:
$H_0: p = 0.5$ (there is no overall tendency to prefer one product to the other)
$H_1: p < 0.5$ (a majority prefer the new product, i.e. fewer than 50% prefer the old product)

$$p\text{-value} = P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.2266$$

Two-tailed test: $H_0: p = 0.5$ vs $H_1: p \neq 0.5$

$$p\text{-value} = 2\,P(X \le 2) = 2 \times 0.2266 = 0.4532$$

Also, note that $p\text{-value} = P(X \le 2) + P(X \ge 5) = 0.4532$.

SPSS result (Binomial Test, variable: VAR00005):

| | Category | N | Observed Prop. | Test Prop. | Exact Sig. (2-tailed) |
|---|---|---|---|---|---|
| Group 1 | .00 | 5 | .71 | .50 | .453 |
| Group 2 | 1.00 | 2 | .29 | | |
| Total | | 7 | 1.00 | | |

Review: Binomial Distribution

C. Properties of the binomial distribution
The expectation of $B(n; p)$ is $np$.
The variance of $B(n; p)$ is $np(1 - p)$.
The standard deviation of $B(n; p)$ is $\sqrt{np(1 - p)}$.

D. Normal Approximation (Section 6.4 of the book)
For $X \sim B(n; p)$,

$$P(a \le X \le b) = P\!\left( \frac{a - np}{\sqrt{np(1 - p)}} \le \frac{X - np}{\sqrt{np(1 - p)}} \le \frac{b - np}{\sqrt{np(1 - p)}} \right) \approx \Phi\!\left( \frac{b - np}{\sqrt{np(1 - p)}} \right) - \Phi\!\left( \frac{a - np}{\sqrt{np(1 - p)}} \right)$$

where $\Phi(\cdot)$ is the distribution function of $N(0, 1)$.

Example 1.3 Customer Sales (Example 6.8, p. 213)

A saleswoman makes initial telephone contact with potential customers in an effort to assess whether a follow-up visit to their homes is likely to be worthwhile. Her experience suggests that 40% of the initial contacts lead to a follow-up visit. If she contacts 100 people by telephone, what is the probability that between 45 and 50 home visits will result?

Solution to Example 1.3: Customer Sales

Let $X$ be the number of follow-up visits. Then $X$ has a binomial distribution with $n = 100$ and $p = 0.40$. Approximating the required probability gives

$$P(45 \le X \le 50) \approx P\!\left( \frac{45 - (100)(0.4)}{\sqrt{(100)(0.4)(0.6)}} \le Z \le \frac{50 - (100)(0.4)}{\sqrt{(100)(0.4)(0.6)}} \right) = P(1.02 \le Z \le 2.04)$$

$$= \Phi(2.04) - \Phi(1.02) = 0.9793 - 0.8461 = 0.1332$$

This probability is shown as an area under the standard normal curve.

[Figure: area under the standard normal curve; horizontal axis: Number of Successes]

The continuity correction

Since the binomial distribution is discrete and the normal distribution is continuous, it is common practice to use a continuity correction in the approximation:

$$P(a \le X \le b) \approx \Phi\!\left( \frac{b + 0.5 - np}{\sqrt{np(1 - p)}} \right) - \Phi\!\left( \frac{a - 0.5 - np}{\sqrt{np(1 - p)}} \right)$$

Return to Example 1.3:

$$P(45 \le X \le 50) \approx \Phi\!\left( \frac{50 + 0.5 - (100)(0.4)}{\sqrt{(100)(0.4)(0.6)}} \right) - \Phi\!\left( \frac{45 - 0.5 - (100)(0.4)}{\sqrt{(100)(0.4)(0.6)}} \right)$$

$$= \Phi(2.14) - \Phi(0.92) = 0.9838 - 0.8212 = 0.1626$$

1.2 Sign test: normal approximation

Under $H_0$, $p = 0.5$, so $np = 0.5n$ and $\sqrt{np(1 - p)} = 0.5\sqrt{n}$. The approximate test statistic is

$$Z = \frac{T^* - 0.5n}{0.5\sqrt{n}}$$

where $T^*$ is $T$ corrected for continuity, defined as follows:

a. For a two-tail test, $T^* = T + 0.5$ if $T < 0.5n$, and $T^* = T - 0.5$ if $T > 0.5n$.
b. For an upper-tail test, $T^* = T - 0.5$.
c. For a lower-tail test, $T^* = T + 0.5$.

Example 1.4 Ice Cream

Solution: use the normal approximation equations with $n = 96$ and $T = 40$, so that $0.5n = 48$ and $0.5\sqrt{n} = 4.899$. Since $40 < 48$, $T^* = 40.5$ and

$$Z = \frac{T^* - 48}{4.899} = \frac{40.5 - 48}{4.899} = -1.53$$

$$p\text{-value} = 2 \times 0.0630 = 0.126$$

The SPSS output (Binomial Test, variable: VAR00002):

| | Category | N | Observed Prop. | Test Prop. | Asymp. Sig. (2-tailed) |
|---|---|---|---|---|---|
| Group 1 | 56.00 | 56 | .58 | .50 | .125 (a) |
| Group 2 | 40.00 | 40 | .42 | | |
| Total | | 96 | 1.00 | | |

a. Based on Z Approximation.

1.3 Sign test for single population median

Example 1.5

The dean of the School of Business Administration at a particular university would like information about the starting incomes of recent college graduates. A random sample of 23 recent graduates indicated the following starting salaries:

29250 29900 28070 31400 31100 29000 33000 50000 28500 31000 34800 42100 33200 36000 65800 34000 29900 32000 31500 29900 32890 36000 35000

Do the data indicate that the median starting income differs from $35000?

Solution: $H_0$: Median $= \$35000$ vs $H_1$: Median $\neq \$35000$

Solution to Example 1.5

Since the distribution of incomes is often skewed, the sign test is recommended. If the hypothesis is true, each income has a 0.5 chance of being below $35,000 and a 0.5 chance of being above it. One observation equals $35,000 exactly and is dropped, so $n = 23 - 1 = 22$. Let T be the number of incomes below $35,000; then T = 17 (the remaining 5 incomes are above $35,000).

$$np = 0.5n = 0.5 \times 22 = 11, \qquad 0.5\sqrt{n} = 0.5\sqrt{22} = 2.345$$

$$Z = \frac{T - 0.5 - 11}{2.345} = \frac{17 - 0.5 - 11}{2.345} = 2.35$$

$$p\text{-value} = 2 \times 0.0094 = 0.0188$$

SPSS output to Example 1.5 (Binomial Test, variable: VAR00001):

| | Category | N | Observed Prop. | Test Prop. | Exact Sig. (2-tailed) |
|---|---|---|---|---|---|
| Group 1 | <= 35000 | 17 | .77 | .50 | .017 |
| Group 2 | > 35000 | 5 | .23 | | |
| Total | | 22 | 1.00 | | |

1.4 Wilcoxon Rank Sum Test

Test of whether two populations are identical

Take a sample of size $n_1$ from the first population, with distribution $F_1(x)$, and a sample of size $n_2$ from the second population, with distribution $F_2(x)$. We want to test

$$H_0: F_1 = F_2 \quad \text{vs} \quad H_1: F_1 \neq F_2$$

The sign test does not use all the information in the data set. The sign test for the delivery times in Example 1.1 ignores the size of each time difference.
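To make concrete what the sign test does and does not use, the following minimal sketch reruns the sign test for Example 1.1 from the signs alone, and then applies the continuity-corrected normal approximation to the counts of Example 1.5. It is an illustration under stated assumptions rather than the method used to produce the SPSS output: the function names (`binom_upper_tail`, `sign_test_exact`, `sign_test_normal`, `phi`) are hypothetical, only the Python standard library is used, and "+" is taken to mean that the chain time exceeds the local time.

```python
from math import comb, erf, sqrt


def binom_upper_tail(k, n):
    """P(X >= k) for X ~ B(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def sign_test_exact(diffs):
    """Return (number of '+', usable n, two-sided exact p-value).

    Zero differences are dropped, as in Examples 1.2 and 1.5.
    """
    plus = sum(d > 0 for d in diffs)
    minus = sum(d < 0 for d in diffs)
    n = plus + minus
    p_two_sided = min(1.0, 2 * binom_upper_tail(max(plus, minus), n))
    return plus, n, p_two_sided


def phi(z):
    """Distribution function of N(0, 1)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def sign_test_normal(t, n):
    """Two-tail normal approximation with continuity correction."""
    mu, sigma = 0.5 * n, 0.5 * sqrt(n)
    if t == mu:
        return 0.0, 1.0
    t_star = t + 0.5 if t < mu else t - 0.5
    z = (t_star - mu) / sigma
    return z, 2 * (1 - phi(abs(z)))


# Example 1.1: paired delivery times in minutes.
local = [16.8, 11.7, 15.6, 16.7, 17.5, 18.1, 14.1, 21.8, 13.9, 20.8]
chain = [22.0, 15.2, 18.7, 15.6, 20.8, 19.5, 17.0, 19.5, 16.5, 24.0]
diffs = [c - l for l, c in zip(local, chain)]  # only the sign is used

T, n, p_exact = sign_test_exact(diffs)
print(T, n, round(p_exact, 3))

# Example 1.5: T = 17 of n = 22 incomes fall on one side of $35,000.
z, p_norm = sign_test_normal(17, 22)
print(round(z, 2), round(p_norm, 4))
```

The first call reproduces T = 8, n = 10 and the two-tailed exact p-value of about 0.109. The second gives z of about 2.35 and a p-value close to 0.019; it differs slightly from 0.0188 because the text looks up the table at the rounded value z = 2.35. Note that `diffs` enters the test only through its signs, which is the information loss that the rank-based test addresses.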
The Wilcoxon rank sum test provides a method to incorporate information about the magnitude of the differences between two populations.

The two samples are pooled and sorted in ascending order. Let T denote the sum of the ranks of the observations from the first population.

Wilcoxon Rank Sum Test: Example 1.1

Sorted Local data: 11.7, 13.9, 14.1, 15.6, 16.7, 16.8, 17.5, 18.1, 20.8, 21.8
Sorted Chain data: 15.2, 15.6, 16.5, 17.0, 18.7, 19.5, 19.5, 20.8, 22.0, 24.0

Sorted pooled data (tied values share the average of their ranks):

| Rank | Local | Chain |
|---|---|---|
| 1 | 11.7 | |
| 2 | 13.9 | |
| 3 | 14.1 | |
| 4 | | 15.2 |
| 5.5 | 15.6 | |
| 5.5 | | 15.6 |
| 7 | | 16.5 |
| 8 | 16.7 | |
| 9 | 16.8 | |
| 10 | | 17.0 |
| 11 | 17.5 | |
| 12 | 18.1 | |
| 13 | | 18.7 |
| 14.5 | | 19.5 |
| 14.5 | | 19.5 |
| 16.5 | 20.8 | |
| 16.5 | | 20.8 |
| 18 | 21.8 | |
| 19 | | 22.0 |
| 20 | | 24.0 |

Sum of the ranks:

$$T_{\text{local}} = 1 + 2 + 3 + 5.5 + 8 + 9 + 11 + 12 + 16.5 + 18 = 86$$

Test statistic: $T = T_{\text{local}} = 86$

Normal approximation:

$$E(T) = \frac{10(10 + 10 + 1)}{2} = 105, \qquad \mathrm{Var}(T) = \frac{10 \times 10 (10 + 10 + 1)}{12} = 175$$

$$Z = \frac{86 - 105}{\sqrt{175}} = -1.4363$$

$$p\text{-value} = 2 \times 0.0755 = 0.151$$

SPSS output to Example 1.1

Ranks (variable: time):

| group | N | Mean Rank | Sum of Ranks |
|---|---|---|---|
| local | 10 | 8.60 | 86.00 |
| chain | 10 | 12.40 | 124.00 |
| Total | 20 | | |

Test Statistics (b) (variable: time):

| Statistic | Value |
|---|---|
| Mann-Whitney U | 31.000 |
| Wilcoxon W | 86.000 |
| Z | -1.438 |
| Asymp. Sig. (2-tailed) | .150 |
| Exact Sig. [2*(1-tailed Sig.)] | .165 (a) |

a. Not corrected for ties.
b. Grouping Variable: group

Example 1.6

Solution:

$$E(T) = \frac{80(80 + 80 + 1)}{2} = 6440, \qquad \mathrm{Var}(T) = \frac{80 \times 80 (80 + 80 + 1)}{12} = 85867$$

$$Z = \frac{7287 - 6440}{\sqrt{85867}} = 2.89$$

$$p\text{-value} = 2 \times 0.0019 = 0.0038$$
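The rank sum calculation for Example 1.1 can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure: the helper names `midranks` and `rank_sum_test` are hypothetical, tied values receive the average of their ranks, and the variance is the plain formula without a tie correction, as in the hand calculation (SPSS applies a tie correction to Z, which is why it reports -1.438).

```python
from math import erf, sqrt


def midranks(values):
    """Rank values 1..N in ascending order, averaging ranks within ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_test(sample1, sample2):
    """Return (T, E(T), Var(T), Z, two-sided p) for the rank sum of sample1."""
    n1, n2 = len(sample1), len(sample2)
    ranks = midranks(list(sample1) + list(sample2))
    t = sum(ranks[:n1])                    # ranks belonging to sample1
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12     # no tie correction
    z = (t - mean) / sqrt(var)
    phi_abs_z = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return t, mean, var, z, 2 * (1 - phi_abs_z)


local = [16.8, 18.1, 11.7, 14.1, 15.6, 21.8, 16.7, 13.9, 17.5, 20.8]
chain = [22.0, 19.5, 15.2, 17.0, 18.7, 19.5, 15.6, 16.5, 20.8, 24.0]

t, mean, var, z, p = rank_sum_test(local, chain)
print(t, mean, var, round(z, 4), round(p, 3))
```

With the delivery-time data this gives T = 86, E(T) = 105 and Var(T) = 175, matching the hand calculation, with Z of about -1.436 and a two-sided p-value of about 0.151.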