Major Points
• Formal Tests of Mean Differences
• Review of Concepts: Means, Standard Deviations, Standard Errors, Type I errors
• New Concepts: One- and Two-Tailed Tests
• Significance of Differences

Important Concepts
• Concepts critical to hypothesis testing
  – Decision
  – Type I error
  – Type II error
  – Critical values
  – One- and two-tailed tests

Decisions
• When we test a hypothesis we draw a conclusion; it is either correct or incorrect.
  – Type I error
    • Reject the null hypothesis when it is actually true.
  – Type II error
    • Retain the null hypothesis when it is actually false.

Possible Scenarios

                         Null hypothesis is actually
  Results show null is   TRUE                 FALSE
  TRUE (retain)          Correct acceptance   Type II error
  FALSE (reject)         Type I error         Correct rejection

Type I Errors
• Assume there are no differences (null hypothesis is true)
• Assume our results show that they are not the same (we reject the null hypothesis)
• This is a Type I error
  – Probability set at alpha (α)
    • usually at .05
  – Therefore, probability of Type I error = .05

Type II Errors
• Assume there are differences (alternative hypothesis is true)
• Assume that we conclude they are the same (we retain the null hypothesis)
• This is also an error
  – Probability denoted beta (β)
    • We can't set beta easily.
    • We'll talk about this issue later.
• Power = (1 - β) = probability of correctly rejecting a false null hypothesis.

Critical Values
• These represent the point at which we decide to reject the null hypothesis.
• e.g. We might decide to reject the null when p(data | null) < .05.
  – Our test statistic has some value with p = .05
  – We reject when we exceed that value.
  – That value is the critical value.

One- and Two-Tailed Tests
• Two-tailed test rejects the null when the obtained value is too extreme in either direction
  – Decide on this before collecting data.
• One-tailed test rejects the null if the obtained value is too low (or too high)
  – We only set aside one direction for rejection.
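As a rough illustration of the critical-value and one- versus two-tailed ideas above, the sketch below uses a standard normal (z) statistic rather than any particular test from these slides. That choice, and the helper names `critical_z` and `reject_null`, are assumptions made only for illustration. The one-tailed case assumes the rejection region is the upper tail.

```python
from statistics import NormalDist


def critical_z(alpha: float, tails: int) -> float:
    """Return the positive critical z value for a one- or two-tailed test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test splits alpha evenly between the two tails.
    tail_area = alpha / tails
    return NormalDist().inv_cdf(1 - tail_area)


def reject_null(z: float, alpha: float = 0.05, tails: int = 2) -> bool:
    """Return True when the obtained z falls in the rejection region.

    Assumption: for tails=1 the rejection region is the upper tail only.
    """
    cutoff = critical_z(alpha, tails)
    if tails == 2:
        return abs(z) > cutoff
    return z > cutoff


one_tailed = critical_z(0.05, 1)  # about 1.645
two_tailed = critical_z(0.05, 2)  # about 1.960
print(f"one-tailed cutoff: {one_tailed:.3f}, two-tailed cutoff: {two_tailed:.3f}")
```

With alpha fixed at .05, the one-tailed cutoff is smaller than the two-tailed one, so an obtained value such as z = 1.8 is rejected under the one-tailed rule but retained under the two-tailed rule.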
One- & Two-Tailed Example
• One-tailed test
  – Reject null if the number of red Halloween candies is higher
• Two-tailed test
  – Reject null if the number of red Halloween candies is different (whether higher or lower)

Within subjects t tests
• Related samples
• Difference scores
• t tests on difference scores
• Advantages and disadvantages

Related Samples
• The same participant / thing gives us data on two measures
  – e.g. Before and After treatment
  – Usability problems before training on PP and after training
  – Darts and Pros during the same time period
• With related samples, someone high on one measure is probably high on the other (individual variability).

Related Samples--cont.
• Correlation between before and after scores
  – Causes a change in the statistic we can use
• Sometimes called matched samples or repeated measures

Difference Scores
• Calculate the difference between the first and second score
  – e.g. Difference = Before - After
• Base subsequent analysis on difference scores
  – Ignoring Before and After data

Difference between Darts and Pros

  TIMENO  TIME                            PROS    DARTS
   1.00   January-June 1990              12.70      .00
   2.00   February-July 1990             26.40     1.80
   3.00   March-August 1990               2.50   -14.30
   4.00   April-September 1990          -20.00    -7.20
   5.00   May-October 1990              -37.80   -16.30
   6.00   June-November 1990            -33.30   -27.40
   7.00   July-December 1990            -10.20   -22.50
   8.00   August 1990-January 1991      -20.30   -37.30
   9.00   September 1990-February 1991   38.90    -2.50
  10.00   October 1990-March 1991        20.20    11.20
  11.00   November 1990-April 1991       50.60    72.90
  12.00   December 1990-May 1991         66.90    16.60
  13.00   January-June 1991               7.50    28.70
  14.00   February-July 1991             17.50    44.80
  15.00   March-August 1991              39.60    71.30

Results
• Pros got more gains than Darts
• Was this enough of a difference to be significant?
• If no difference, mean of computed differences should be zero
  – So, test the obtained mean of difference scores against μ = 0.
  – Use the same test as in the one-sample t test

t test
• $\bar{D}$ and $s_D$ = mean and standard deviation of differences.
$t = \frac{\bar{D} - \mu}{s_D / \sqrt{n}} = \frac{8.22 - 0}{3.6 / \sqrt{9}} = \frac{8.22}{1.2} = 6.85$

df = n - 1 = 9 - 1 = 8

t test--cont.
• With 8 df, t.01 = ±3.355 (Table E.6)
• We calculated t = 6.85
• Since 6.85 > 3.355, reject H0
• Conclude that the Pros did get significantly more than the Darts

Advantages of Related Samples
• Eliminate subject-to-subject variability
• Control for extraneous variables
• Need fewer subjects

Disadvantages of Related Samples
• Order effects
• Carry-over effects
• Subjects no longer naïve
• Change may just be a function of time
• Sometimes not logically possible

Between subjects t test
• Distribution of differences between means
• Heterogeneity of Variance
• Nonnormality

Pros during ups and downs in DOW
• Effect of fluctuations in DOW: did they affect the Pros?
  – Different question than previously
• Now we have two independent groups of data
  – Pros during positive DOW, and Pros during negative DOW
  – We want to compare the means of the two groups

Effect of changes in DOW

  TIMENO  TIME                            PROS    DARTS     DJIA  DowTrend
   1.00   January-June 1990              12.70      .00     2.50     1
   2.00   February-July 1990             26.40     1.80    11.50     1
   3.00   March-August 1990               2.50   -14.30    -2.30     0
   4.00   April-September 1990          -20.00    -7.20    -9.20     0
   5.00   May-October 1990              -37.80   -16.30    -8.50     0
   6.00   June-November 1990            -33.30   -27.40   -12.80     0
   7.00   July-December 1990            -10.20   -22.50    -9.30     0
   8.00   August 1990-January 1991      -20.30   -37.30     -.80     0
   9.00   September 1990-February 1991   38.90    -2.50    11.00     1
  10.00   October 1990-March 1991        20.20    11.20    15.80     1
  11.00   November 1990-April 1991       50.60    72.90    16.20     1
  12.00   December 1990-May 1991         66.90    16.60    17.30     1
  13.00   January-June 1991               7.50    28.70    17.70     1
  14.00   February-July 1991             17.50    44.80     7.60     1
  15.00   March-August 1991              39.60    71.30     4.40     1
  16.00   April-September 1991           15.60     2.80     3.40     1

Differences from within subjects test
• Cannot compute pairwise differences, since we cannot compare two random data points
• We want to test differences between the two sample means (not between a sample and a population)

Analysis
• How are sample means distributed if H0 is true?
• Need sampling distribution of differences between means
  – Same idea as before, except the statistic is $(\bar{X}_1 - \bar{X}_2)$ (mean 1 - mean 2)

Sampling Distribution of Mean Differences
• Mean of sampling distribution = μ1 - μ2
• Standard deviation of sampling distribution (standard error of mean differences) =
  $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

Sampling Distribution--cont.
• Distribution approaches normal as n increases.
• Later we will modify this to "pool" variances.

Analysis--cont.
• Same basic formula as before, but with accommodation to 2 groups.
  $t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
• Note parallels with earlier t

Degrees of Freedom
• Each group has 50 data points.
• Each group has n - 1 = 50 - 1 = 49 df
• Total df = n1 - 1 + n2 - 1 = n1 + n2 - 2 = 50 + 50 - 2 = 98 df
• t.01(98) = ±2.62 (approx.)

Assumptions
• Two major assumptions
  – Both groups are sampled from populations with the same variance
    • "homogeneity of variance"
  – Both groups are sampled from normal populations
    • Assumption of normality
      – Frequently violated with little harm.

Heterogeneous Variances
• Refers to the case of unequal population variances.
• We don't pool the sample variances.
• We adjust df and look t up in tables for the adjusted df.
• Minimum df = smaller n - 1.
  – Most software calculates optimal df.
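The two-group t statistic and the df adjustment for unequal variances can be sketched in a few lines. The example below is only an illustration under stated assumptions: it splits the PROS column of the "Effect of changes in DOW" table by DowTrend (10 periods with DowTrend = 1, 6 with DowTrend = 0), the function name `two_sample_t` is hypothetical, and the adjusted df uses the Welch-Satterthwaite approximation, which is assumed here to be the kind of "optimal df" that software reports. The pooled branch uses the usual pooled-variance formula, which the slides mention but do not write out.

```python
from math import sqrt
from statistics import mean, variance

# PROS values from the table, grouped by DowTrend (1 = up, 0 = down).
pros_dow_up = [12.70, 26.40, 38.90, 20.20, 50.60, 66.90, 7.50, 17.50, 39.60, 15.60]
pros_dow_down = [2.50, -20.00, -37.80, -33.30, -10.20, -20.30]


def two_sample_t(x, y, pooled=False):
    """Return (t, df) for the difference between two independent means."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)  # sample variances (n - 1 denominator)
    diff = mean(x) - mean(y)
    if pooled:
        # Homogeneity of variance assumed: pool the two sample variances.
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = sqrt(sp2 * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Heterogeneous variances: unpooled standard error with
        # Welch-Satterthwaite adjusted df (an assumption, see lead-in).
        a, b = v1 / n1, v2 / n2
        se = sqrt(a + b)
        df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return diff / se, df


t_pooled, df_pooled = two_sample_t(pros_dow_up, pros_dow_down, pooled=True)
t_welch, df_welch = two_sample_t(pros_dow_up, pros_dow_down)
print(f"pooled:   t = {t_pooled:.2f}, df = {df_pooled}")
print(f"unpooled: t = {t_welch:.2f}, df = {df_welch:.1f}")
```

The adjusted df always falls between the smaller n - 1 and n1 + n2 - 2, which matches the "minimum df" rule above. The group sizes here differ from the n = 50 used in the df slide, so the numbers are not expected to match that slide.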