Sampling Distribution of the Mean: Central Limit Theorem

Given a population with mean μ and variance σ², the sampling distribution of the mean will have:
- a mean: μ_x̄ = μ
- a variance: σ²_x̄ = σ²/N
- a standard error: σ_x̄ = σ/√N

As N increases, the shape of the sampling distribution becomes normal, whatever the shape of the population.

Testing Hypotheses When μ and σ Are Known

Remember: we can test a hypothesis concerning a population and a single score with
  z = (x − μ)/σ,
obtain p(z), and use the z table. We will continue the same logic with a sample mean.

Example. Behavior Problem Scores of 10-year-olds: μ = 50, σ = 10. A sample of N = 5 ten-year-olds under stress gives x̄ = 56.
  H0: μ = 50    H1: μ ≠ 50
Because we know μ and σ, we can use the Central Limit Theorem to obtain the sampling distribution when H0 is true. That sampling distribution has
  μ_x̄ = 50,  σ²_x̄ = σ²/N = 10²/5 = 20,  σ_x̄ = √20 = 4.47  (the standard error).
We can find areas under the distribution by referring to the z table. We need to know p(x̄ ≥ 56). The only change from the single-score z is that the standard error replaces σ:
  z = (x̄ − μ)/σ_x̄ = (x̄ − μ)/(σ/√N)
The formula changes because we are dealing with a distribution of means, NOT of individual scores. With our data:
  z = (56 − 50)/4.47 = 6/4.47 = 1.34
From the z table, p(z ≥ 1.34) = 0.0901. Because we want a two-tailed test, we double this: (2)(0.0901) = 0.1802. Since 0.1802 > 0.05, we do NOT reject H0 (equivalently, the one-tailed 0.0901 > 0.025).

One-Sample t Test

For a population with μ known but σ² unknown, we must estimate σ² with S². Because we use S, we can no longer declare the answer to be a z; now it is a t. Why?
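The z-test example above can be checked numerically. This is a minimal sketch using only Python's standard library; the normal CDF is built from `math.erf` rather than a z table, so the p-value differs from the table value (0.0901 at z = 1.34) only by rounding.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, n, xbar = 50, 10, 5, 56    # population parameters and sample mean from the example

se = sigma / math.sqrt(n)             # standard error of the mean: 10/sqrt(5) ~ 4.47
z = (xbar - mu) / se                  # ~ 1.34
p_one_tail = 1.0 - normal_cdf(z)      # ~ 0.090 (z table: 0.0901)
p_two_tail = 2.0 * p_one_tail         # ~ 0.18, greater than 0.05, so H0 is not rejected

print(f"se = {se:.2f}, z = {z:.2f}, two-tailed p = {p_two_tail:.3f}")
```

Since the two-tailed p exceeds α = 0.05, the code reaches the same decision as the text: do not reject H0.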
Sampling Distribution of t

- S² is an unbiased estimator of σ².
- The problem is the shape of the sampling distribution of S²: it is positively skewed, thus S² is more likely to UNDERESTIMATE σ² (especially with small N), and thus t is likely to be larger than z (S² is in the denominator).

The t statistic substitutes S for σ in the z formula:
  z = (x̄ − μ)/σ_x̄ = (x̄ − μ)/(σ/√N)
becomes
  t = (x̄ − μ)/S_x̄ = (x̄ − μ)/(S/√N)
To treat t as a z would give us too many significant results.

Student's t Distribution

("Student" was W. S. Gosset of the Guinness Brewing Company.) When we use S², we switch from the z table to the t table. Unlike z, the t distribution is a function of df (degrees of freedom); as N → ∞, t → z.

Degrees of Freedom

For one-sample cases, df = N − 1. One df is lost because we used x̄ (the sample mean) to calculate S²: since Σ(x − x̄) = 0, all the x values can vary save for one.

Example: One Sample, σ Unknown

Effect of statistics tutorials. Over the last 100 years (no tutorials): μ = 76.0. This year (tutorials): x̄ = 79.3, N = 20, S = 6.4.
  H0: μ = 76    H1: μ ≠ 76
  t = (sample mean − population mean)/(standard error) = (x̄ − μ)/S_x̄, where S_x̄ = S/√N
  t = (79.3 − 76)/(6.4/√20) = 3.3/1.43 = 2.31

The t Table

The t table does not give the area (p) above or below a value of t; it gives the t values that cut off certain critical areas, e.g., α = 0.05. t is also defined for each df. With N = 20, df = N − 1 = 20 − 1 = 19. From the table, t.05(19) = 2.093, the critical value. Since 2.31 > 2.093, reject H0.

Factors Affecting the Magnitude of t and the Decision

1. The difference between x̄ and μ: the larger the numerator, the larger the t value.
2. The size of S²: as S² decreases, t increases.
3. The size of N: as N increases, the denominator decreases and t increases.
4. One- versus two-tailed test.
5. The α level.

Confidence Limits on the Mean

- Point estimate: a specific value taken as the estimator of a parameter.
- Interval estimate: a range of values estimated to include the parameter.
- Confidence limits: a range of values that has a specified probability (p) of bracketing the parameter; the end points are the confidence limits.

The question: how large or small could μ be without our rejecting H0 if we ran a t test on the obtained sample mean?

Confidence Limits (C.I.)
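The one-sample t test and the confidence limits that follow from it can be sketched in a few lines. This is a minimal numeric check of the tutorial example; the critical value 2.093 for t.05(19) is hard-coded from the t table rather than computed.

```python
import math

mu0 = 76.0                   # long-run mean under H0 (no tutorials)
xbar, s, n = 79.3, 6.4, 20   # this year's sample (tutorials)

se = s / math.sqrt(n)        # estimated standard error, ~ 1.43
t = (xbar - mu0) / se        # ~ 2.31

t_crit = 2.093               # t.05(19), two-tailed, from the t table
print(f"t = {t:.2f}, reject H0: {abs(t) > t_crit}")

# 95% confidence limits on mu: xbar +/- t_crit * se, ~ (76.3, 82.3)
lower = xbar - t_crit * se
upper = xbar + t_crit * se
print(f"CI.95 = ({lower:.2f}, {upper:.2f})")
```

Note the duality: the obtained t exceeds the critical value exactly when μ0 = 76 falls outside the 95% confidence interval.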
Starting from t = (x̄ − μ)/S_x̄ with S_x̄ = S/√N: we already know x̄, S, and N, and we know the critical value for t at α = .05, t.05(19) = 2.093. We solve for μ:
  ±2.093 = (79.3 − μ)/(6.4/√20) = (79.3 − μ)/1.43
Rearranging:
  μ = 79.3 ∓ (2.093)(1.43) = 79.3 ∓ 2.993
Using +2.993 and −2.993:
  upper limit = 79.3 + 2.993 = 82.29
  lower limit = 79.3 − 2.993 = 76.31
  C.I.95: 76.31 ≤ μ ≤ 82.29

Two Related Samples t

Related samples: a design in which the same subject is observed under more than one condition (repeated measures, matched samples). Each subject has two measures, x1 and x2, which will be correlated; this must be taken into account.

Example: promoting social skills in adolescents, measured before and after an intervention.
  H0: μ1 = μ2, or μ1 − μ2 = 0   (before vs. after)

Difference scores: the set of scores representing the difference between the subject's performance on the two occasions, D = x1 − x2. For the N = 15 subjects, the summary statistics are:
  x1 (before):  mean = 13.333, S = 6.914
  x2 (after):   mean = 11.133, S = 5.998
  D = x1 − x2:  mean = 2.200,  S_D = 2.933
Our data become the D column: H0: μ1 − μ2 = 0 becomes H0: μ_D = 0, so we are testing a hypothesis using ONE sample.

Related-Samples t

  t = (x̄ − μ)/S_x̄ now becomes t = (D̄ − 0)/S_D̄ = D̄/(S_D/√N), where N = the number of D scores.

Degrees of freedom: the same as for the one-sample case; for our data, df = N − 1 = 15 − 1 = 14.

  t = (2.20 − 0)/(2.933/√15) = 2.20/0.757 = 2.91
From the table, t.05(14) = 2.145. Since 2.91 > 2.145, reject H0.

Advantages of Related Samples

1. Avoids the problems that come with subject-to-subject variability: the difference between x1 = 26 and x2 = 24 is the same as between x1 = 6 and x2 = 4. (Less variance means a lower denominator and a greater t, which increases power.)
2. Control of extraneous variables.
3. Requires fewer subjects.

Disadvantages

1. Order effects.
2.
Carry-over effects.

Two Independent Samples t

  H0: μ1 − μ2 = 0

The sampling distribution of differences between means. Suppose we have two populations with means μ1, μ2 and variances σ1², σ2². Draw pairs of samples of sizes N1 and N2, record the means x̄1 and x̄2 and the difference x̄1 − x̄2 for each pair of samples, and repeat many times. The distribution of the differences x̄1 − x̄2 has:
  Mean:            μ1 − μ2
  Variance:        σ²_(x̄1−x̄2) = σ1²/N1 + σ2²/N2
  Standard error:  σ_(x̄1−x̄2) = √(σ1²/N1 + σ2²/N2)

Variance Sum Law: the variance of a sum or difference of two INDEPENDENT variables equals the sum of their variances. The distribution of the differences is also normal.

t for the Difference Between Means

  z = [(x̄1 − x̄2) − (μ1 − μ2)] / σ_(x̄1−x̄2) = [(x̄1 − x̄2) − (μ1 − μ2)] / √(σ1²/N1 + σ2²/N2)
We must estimate σ² with s²:
  t = [(x̄1 − x̄2) − (μ1 − μ2)] / s_(x̄1−x̄2)
Because H0: μ1 − μ2 = 0,
  t = (x̄1 − x̄2) / s_(x̄1−x̄2) = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)
Using s1² and s2² separately like this is O.K. only when the n's are the same size (n1 = n2). When n1 ≠ n2, we need a better estimate of σ². We must assume homogeneity of variance (σ1² = σ2²). Rather than using s1² or s2² to estimate σ², we use their average; because n1 ≠ n2, it must be a weighted average, weighted by their degrees of freedom:
  s_p² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)   (the pooled variance)
Now
  t = (x̄1 − x̄2) / s_(x̄1−x̄2) = (x̄1 − x̄2) / √(s_p²/n1 + s_p²/n2) = (x̄1 − x̄2) / √[s_p²(1/n1 + 1/n2)]
The terms 1/n1 and 1/n2 come from the formula for the standard error.

Degrees of Freedom

Two means have been used to calculate s_p², so
  df = (n1 − 1) + (n2 − 1) = n1 + n2 − 2

Example:

  Group 1: 17 17 21 18 22 18 16 15 18 20 21 16 15 16 20                    (n1 = 15)
  Group 2: 13 18 17 13 14 13 18 19 16 14 13 15 14 16 15 15 13 17 17 15    (n2 = 20)

  x̄1 = 18.00, s1² = 5.286;  x̄2 = 15.25, s2² = 3.671

We have the numerator, 18.00 − 15.25. We need the denominator: the pooled variance, because n1 ≠ n2.
  s_p² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)
       = [14(5.286) + 19(3.671)] / (15 + 20 − 2)
       = (74.004 + 69.749) / 33
       = 4.356

The denominator becomes
  √(s_p²/n1 + s_p²/n2) = √(4.356/15 + 4.356/20) = √0.5082 = 0.713

  t = (x̄1 − x̄2) / √(s_p²/n1 + s_p²/n2) = (18.00 − 15.25)/0.713 = 2.75/0.713 = 3.86
  df = n1 + n2 − 2 = 15 + 20 − 2 = 33
From the table, t.05(33) = 2.04. Since 3.86 > 2.04, reject H0.

Summary

- If μ and σ² are known, treat x̄ as a score in the z formula; σ_x̄ replaces σ:
    z = (x̄ − μ) / (σ/√n)
- If μ is known and σ² is unknown, s replaces σ:
    t = (x̄ − μ) / (s/√n)
- If two related samples, D̄ replaces x̄ and s_D replaces s:
    t = (D̄ − 0) / (s_D/√n)
- If two independent samples and the n's are of equal size:
    t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)
- If two independent samples and the n's are NOT equal, s1² and s2² are replaced by s_p²:
    t = (x̄1 − x̄2) / √[s_p²(1/n1 + 1/n2)]
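The two remaining tests in the summary can be reproduced numerically. This sketch, using only the standard library, runs the related-samples test from its summary statistics and the independent-samples test from the raw scores in the example table, recomputing the means, variances, and pooled variance along the way.

```python
import math

# --- Related samples (summary statistics from the text; N = 15 difference scores) ---
d_bar, s_d, n = 2.200, 2.933, 15
t_related = (d_bar - 0) / (s_d / math.sqrt(n))       # ~ 2.91 on df = 14

# --- Independent samples (raw scores from the example table) ---
g1 = [17, 17, 21, 18, 22, 18, 16, 15, 18, 20, 21, 16, 15, 16, 20]
g2 = [13, 18, 17, 13, 14, 13, 18, 19, 16, 14, 13, 15, 14, 16, 15,
      15, 13, 17, 17, 15]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    """Unbiased sample variance S^2 (divides by N - 1)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

n1, n2 = len(g1), len(g2)          # 15 and 20
m1, m2 = mean(g1), mean(g2)        # 18.00 and 15.25
v1, v2 = var(g1), var(g2)          # ~ 5.286 and ~ 3.671

# Pooled variance, weighted by degrees of freedom
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)   # ~ 4.356
se_diff = math.sqrt(sp2 / n1 + sp2 / n2)                # ~ 0.713
t_indep = (m1 - m2) / se_diff                           # ~ 3.86 on df = 33

print(f"related t({n - 1}) = {t_related:.2f}")
print(f"independent t({n1 + n2 - 2}) = {t_indep:.2f}")
```

Both statistics exceed their tabled critical values (2.145 and 2.04), matching the decisions reached in the text. In practice the same results come from `scipy.stats.ttest_rel` and `scipy.stats.ttest_ind`; the long-hand version above makes the pooled-variance step explicit.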