The Nine-Step Version of Hypothesis Testing
• State H0.
• State Ha.
• Specify α (the desired level of significance).
• Specify the effect size (ES).
• Specify the desired level of power.
• Determine the proper size of the sample.
• Collect and analyze the sample data.
• Refer to a criterion to assess the sample evidence.
• Make a decision to retain or reject H0.

Step 4: Specifying the Effect Size
• Deciding on the smallest study finding that is worth considering.
• Can be small, medium, or large.
• The criteria depend on the kind of test (t-test, correlation, etc.).
• e.g., for a t-test: 0.2, 0.5, and 0.8 for small, medium, and large, respectively.
• The effect size is the researcher's judgment of what counts as a meaningful deviation from the null hypothesis.
• Example in Excel.
• Effect size calculator: http://www.uccs.edu/~faculty/lbecker/

Step 5: Specifying the Desired Level of Power
• Power is the probability of rejecting H0 when it is false, i.e., when H0 is off by at least the previously established effect size.
• Power is a probability value (0–1).
• The complement of power is the probability of a Type II error.
• Usually set to .8, meaning the chance of a Type II error (retaining H0 when it is false) is .2.
• Power is not usually set greater than 90% (.9) for two reasons:
– Trivial deviations from H0 would end up labeled statistically significant.
– It puts too much demand on the researcher when computing the sample size needed to meet the required power.

Step 6: Determination of the Sample Size Needed
• Use a calculator when you know the confidence interval and the population size:
• http://www.surveysystem.com/sscalc.htm
• Another way to determine the sample size uses the margin of error:
• http://www.raosoft.com/samplesize.html

Hypothesis Testing Using Confidence Intervals
• Used as an alternative to the critical value or the p-value.
• Provides insight into why H0 was retained or rejected.
• Compute an interval around the sample data instead of a single value.
• The alpha level must be specified (an α of 0.05 corresponds to a 95% interval).
• Locate the pinpoint value stated in H0: if it falls outside the CI, H0 is rejected; otherwise it is retained.
• Adding and subtracting one standard error to the sample statistic is not alpha-driven; it yields roughly a 68% interval, whereas an alpha-driven interval with α = 0.05 is a 95% interval.
– However, it is traditional to use the standard error.

Adjusting for an Inflated Type I Error
• The risk of an inflated Type I error arises when the alpha level is used more than once in a study, i.e., when more than one H0 is tested. What is an intuitive solution?
• Solution: adjust the alpha level by dividing alpha by the number of tests in which it is used. This is known as the Bonferroni technique.
• Example on page 196 of the textbook.

Warnings
• Two types of effect size:
– In the 9-step hypothesis testing, ES is specified as the minimum meaningful effect size prior to evaluating the study data.
– In the 7-step hypothesis testing, ES is computed from the data. Be careful which one you report!
• The criteria for effect size (small, medium, large) depend on the study statistic:
– For the mean: .2, .5, and .8 are small, medium, and large.
– For a coefficient: .1, .3, and .5 are small, medium, and large.
• The six-step hypothesis testing version is simplistic but unfortunately widely adopted.
• Use the Bonferroni technique to reduce the possibility of an inflated Type I error.

Chapter 9: Statistical Inferences Concerning Bivariate Coefficients
• Making an educated guess based on the sample data only.
• Statistical tests applied to correlation coefficients.
• How the Bonferroni technique is used in conjunction with correlation coefficients.
• How to compare two or more correlation coefficients to see if they are significantly different.
• How statistical tests can be applied to reliability and validity coefficients.
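The confidence-interval decision rule described above can be sketched in Python. This is a minimal illustration with hypothetical data (the sample values and the pinpoint value of 5.0 are invented for the example; scipy is assumed for the t critical value): H0 is rejected only if its pinpoint value falls outside the alpha-driven interval.

```python
import math
from scipy import stats

sample = [5.1, 4.8, 5.4, 5.0, 4.7, 5.3, 5.2, 4.9]  # hypothetical measurements
alpha = 0.05                                        # alpha-driven: a 95% interval
h0_pinpoint = 5.0                                   # H0: mu = 5.0 (illustrative)

n = len(sample)
mean = sum(sample) / n
s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))  # sample SD

t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)  # two-tailed critical value
half_width = t_crit * s / math.sqrt(n)         # alpha-driven, not just 1 SE
ci = (mean - half_width, mean + half_width)

# Decision rule: reject H0 only if the pinpoint value is outside the CI.
reject_h0 = not (ci[0] <= h0_pinpoint <= ci[1])  # here 5.0 is inside, so retain
```

Note the contrast with the "mean ± 1 standard error" habit mentioned above: multiplying the standard error by the t critical value is what makes the interval alpha-driven (95%) rather than a rough 68% interval.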
Statistical Tests: Single Correlation Coefficient
• Purpose of inference:
– Not being able to test the entire population; infer from the sample data.
• The Null Hypothesis:
– A null correlation hypothesis is usually implied as H0: ρ = 0.00.
• Deciding if r is statistically significant:
– By comparing the p-value associated with r against α (usually α is set to 0.05 [5%]).
– By comparing the r calculated from the sample to a table of critical values.

Statistical Tests: Single Correlation Coefficient, cont'd
• One-tailed and two-tailed tests on r:
– Most of the time, two-tailed is assumed.
– That is, testing for both negative and positive correlation.
• Tests on specific kinds of correlation:
– Spearman, Pearson, phi, etc.
– If r is given with no type indicated, the Pearson product-moment correlation is assumed.

Tests on Many Correlation Coefficients
• Sometimes two or more correlations are inferentially tested in the same study.
• Presented in various ways:
• Tests on the entries of a correlation matrix:
– The correlation coefficient does not test whether the variables per se are correlated.
– Rather, it is the measurements of the variables that are correlated.
• Tests on several correlation coefficients reported in the text.
• The Bonferroni Adjustment Technique:
– Compare each p-value against α divided by the number of correlations tested.
– Holds down the chance of an inflated Type I error.
• Comparing two correlation coefficients statistically:
– Use http://faculty.vassar.edu/lowry/rdiff.html, which uses the Fisher r-to-z transformation.

Chapter 10: Inference Concerning One or Two Means (t-tests and z-tests)

Inference Concerning a Single Mean
• Single sample.
• The sample mean (X̄) is in focus for inferential matters.
• Two approaches:
– Using confidence intervals.
– Using the mean to evaluate a null hypothesis.

The Inferential Purpose
• An inference about µ is made based on the known value of X̄.
• Interval Estimation:
– A confidence interval is built around the sample mean.
– The CI indicates that the population mean (µ) will probably fall inside the interval.
– The accompanying level (usually 95%) means that if many samples were drawn, that percentage of the associated CIs would overlap the population mean (µ).
– http://pirate.shu.edu/~wachsmut/Teaching/MATH1101/Testing/confidencemean.html
– The CI is affected by the sample size (n), the sample mean (X̄), and the standard deviation (s).
– In Excel, the function is =CONFIDENCE(alpha, standard deviation of the sample, sample size).
– A good article about the use of CONFIDENCE: http://support.microsoft.com/kb/828124

The Inferential Purpose, cont'd
• Tests Concerning a Null Hypothesis:
– When H0 involves a pinpoint mean value, a test between the sample mean and H0 takes place.
• H0: µ = a, where a is the pinpoint value chosen by the researcher.
– If the difference is significant, H0 is rejected.
– The t-test and z-test are the most popular approaches for this kind of testing.
– The z-test is slightly biased, but the bias is negligible when the sample size is sufficiently large.
– Excel t-test tutorials:
• http://www.youtube.com/watch?v=wGoMEYinf6Y
• http://www.wellesley.edu/Psychology/Psych205/onettest.html

Inferences Concerning Two Means
• Several points to consider:
– Independent vs. correlated samples.
– Testing with two means (comparing samples) using confidence intervals.
– Testing with two means (comparing samples) by setting up an H0.

Inferences Concerning Two Means: Independent vs. Correlated Samples
• Correlated samples: a relationship exists between each member of one sample and one and only one member of the other sample.
– Test and re-test of the same group.
– Matching: each member of the second sample is chosen to match a member of the first sample for a different test.
– Biological twins split up between the samples.
• Independent samples: no such relationship exists.
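The independent/correlated distinction determines which test statistic applies. As a sketch with hypothetical scores (the data are invented; scipy is assumed), independent samples use an independent-samples t-test, while correlated samples use the paired t-test, which compares scores pair by pair:

```python
from scipy import stats

# Hypothetical scores (not from the text), e.g. a test and re-test of 8 people
group_a = [72, 75, 68, 80, 77, 74, 69, 78]
group_b = [65, 70, 62, 74, 68, 66, 63, 71]

# Independent samples: members of the two groups are treated as unrelated.
t_ind, p_ind = stats.ttest_ind(group_a, group_b)

# Correlated samples: the same people measured twice, so each score in
# group_a is paired with exactly one score in group_b.
t_rel, p_rel = stats.ttest_rel(group_a, group_b)
```

When the pairs are positively correlated, the paired test is usually the more powerful of the two, because pairing removes the person-to-person variability from the error term; using the independent-samples test on correlated data wastes that information.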
Inferences Concerning Two Means: The Inferential Purpose
• When the two types of samples are compared in terms of their means:
– The inference applies to the populations from which the samples were drawn.
– The inference is made about the populations, NOT the samples.

Inferences Concerning Two Means: Setting Up and Testing a Null Hypothesis
• Usually, H0 is not stated. Assume it is that no difference between the means exists, H0: µ1 − µ2 = 0, unless otherwise indicated.
• Use a t-test or F-test.
• The F-test gives the probability that the variances of the two samples are not significantly different.
• The t-test is similar but with a slightly different outcome.

Excel Example: t-Test: Two-Sample Assuming Unequal Variances
Data (the F-test value for these data was 0.000556):

X: 1 2 3 4 5 6 7 8
Y: 22 5 26 34 41 14 18 15

Output:
                                X          Y
Mean                            4.5        21.875
Variance                        6          134.125
Observations                    8          8
Hypothesized Mean Difference    0
df                              8
t Stat                          -4.15157
P(T<=t) one-tail                0.001601
t Critical one-tail             1.859548
P(T<=t) two-tail                0.003202
t Critical two-tail             2.306004

F-test, again
IQ Group1: 123 123 111 113 101 103 99 89 110 105
IQ Group 2: 78 33 23 45 54 34 61 45 65 65
F-test result: 0.160883202

An F-test returns the two-tailed probability that the variances in array1 (IQ Group1) and array2 (IQ Group2) are not significantly different. Use this function to determine whether two samples have different variances. For example, given test scores from public and private schools, you can test whether these schools have different levels of test score diversity. In the example above, the result (0.161) is greater than 0.05, so we cannot conclude that the variances in the two groups (112.5 and 299.3) are significantly different.

ANOVA (Analysis of Variance)
Anova: Single Factor (for the X/Y data above)

SUMMARY
Groups   Count   Sum   Average   Variance
X        8       36    4.5       6
Y        8       175   21.875    134.125

ANOVA
Source of Variation   SS         df   MS         F         P-value    F crit
Between Groups        1207.563   1    1207.563   17.2355   0.000979   4.60011
Within Groups         980.875    14   70.0625
Total                 2188.438   15

The most important values are F and P. F = Between Groups MS / Within Groups MS.

Here is a good video explaining ANOVA: http://www.youtube.com/watch?v=A6j9oxAkQ3g

ANOVA Example
IQ Group1: 123 123 111 113 101 103 99 89 110 105
IQ Group 2: 78 33 23 45 54 34 61 45 65 65

Anova: Single Factor

SUMMARY
Groups       Count   Sum    Average   Variance
IQ Group1    10      1077   107.7     112.4556
IQ Group 2   10      503    50.3      299.3444

ANOVA
Source of Variation   SS        df   MS        F          P-value    F crit
Between Groups        16473.8   1    16473.8   80.00874   4.83E-08   4.413873
Within Groups         3706.2    18   205.9
Total                 20180     19

• The between-groups sum of squares measures the differences between the averages for each level; a mean square = SS / df.
• The within-groups sum of squares measures the variance within each level.
• Degrees of freedom: 2 groups give df between = 2 − 1 = 1; 20 subjects give df within = 20 − 2 = 18.
• The F statistic = mean square (between) / mean square (within).

Interval Estimation with Two Means
• A CI is built around the difference in means and used instead of a significance test of the difference.
• If the CI does not overlap ZERO, the difference is significant.

Multiple Dependent Variables
• Results presented in text:
– With respect to each variable, state the difference: the t-test, t(N−2), and the p-value.
– For an F-test, you need F(1, N−2) = F-value in addition to the p-value.
– If comparing means, you may need to state the means and the standard deviations.
• Results presented in a table:
– Looks like a correlation matrix.
– State the variables and conditions, and give the means, the SDs, and the t-statistic in addition to the p-value for each variable.
– The excerpt on page 242 is a good example.

Use of the Bonferroni Adjustment Technique
• Usually by dividing the alpha level (0.05) by the number of dependent variables, each with its own H0.
• When alpha is decreased to a lower level chosen by the researcher, the technique is called pseudo-Bonferroni.

Effect Size Assessment and Power Analysis
• Has to deal with the issue of 'practical significance', not just statistical significance.
• Effect size calculator: http://www.cemcentre.org/evidencebased-education/effect-size-calculator
• Online calculator: http://www.uccs.edu/~faculty/lbecker/
• Cohen's d = (M1 − M2) / s_pooled, where s_pooled = √[(s1² + s2²) / 2]
• TABLE ON PAGE 246

Post Hoc Power Analysis
• An alternative to the estimated effect size.
• Used to clarify the results.
• Usually done if the difference turned out to be insignificant.

Comments
• Insignificance does not mean H0 is true. Why?
– There could be more than one conflicting null hypothesis.
– The measurement may not be reliable.
– A statistical power analysis may not have been done before drawing final conclusions.
• Overlapping Distributions:
– Even though the means can be significantly different, scores may overlap.
– The standard deviation can reveal such a case.
• The Typical Use of the t-Test:
– The t-test is typically used for purposes such as evaluating an H0 with one or two means.
– It can also be used to measure the difference between two correlations.
• Practical vs. statistical significance.
• Type I and Type II error.
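The Cohen's d formula in the effect-size section can be sketched in Python using the IQ-group data from the ANOVA example. This is a sketch of the simple pooled-SD form shown above (the square root of the average of the two sample variances), which is the form used by the Becker calculator linked earlier; only the standard library is needed:

```python
import math

def cohens_d(sample1, sample2):
    """Cohen's d with the simple pooled SD: s_pooled = sqrt((s1^2 + s2^2) / 2)."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = sum(sample1) / n1, sum(sample2) / n2
    v1 = sum((x - m1) ** 2 for x in sample1) / (n1 - 1)  # sample variance s1^2
    v2 = sum((x - m2) ** 2 for x in sample2) / (n2 - 1)  # sample variance s2^2
    s_pooled = math.sqrt((v1 + v2) / 2)
    return (m1 - m2) / s_pooled

iq_group1 = [123, 123, 111, 113, 101, 103, 99, 89, 110, 105]
iq_group2 = [78, 33, 23, 45, 54, 34, 61, 45, 65, 65]

d = cohens_d(iq_group1, iq_group2)  # roughly 4.0
```

The result (about 4.0) is far beyond the 0.8 "large" benchmark for means, which matches the huge F value (80.0) seen for the same data in the ANOVA example: the practical significance here is as clear as the statistical significance.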