* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Statistical Guide - St. Cloud State University
Bootstrapping (statistics) wikipedia , lookup
Degrees of freedom (statistics) wikipedia , lookup
Foundations of statistics wikipedia , lookup
Taylor's law wikipedia , lookup
Resampling (statistics) wikipedia , lookup
Analysis of variance wikipedia , lookup
Categorical variable wikipedia , lookup
Omnibus test wikipedia , lookup
Statistical Guide Table of Contents Useful Tips for Survey Design........................................................................................................................ 2 Types of Questions.................................................................................................................................... 2 Hypothesis Testing ........................................................................................................................................ 3 Null Hypothesis (Ho ) ................................................................................................................................. 3 Alternative Hypothesis (Ha)....................................................................................................................... 3 Alpha () ................................................................................................................................................... 3 Chi-Square (χ2) .............................................................................................................................................. 4 Values to report: ....................................................................................................................................... 5 Example ..................................................................................................................................................... 5 Correlations (r) .............................................................................................................................................. 6 Values to report: ....................................................................................................................................... 6 Example ..................................................................................................................................................... 6 T-tests (t) ....................................................................................................................................................... 8 T-Test (Paired Samples/Within Cases) ...................................................................................................... 8 T-test (Independent/Group) ..................................................................................................................... 9 Levene’s Test for Equality of Variances Decision ...................................................................................... 9 Values to report: ....................................................................................................................................... 9 Example ..................................................................................................................................................... 9 Analysis of Variance (ANOVA) (F)................................................................................................................ 10 Values to report: ..................................................................................................................................... 11 Example ................................................................................................................................................... 11 Cronbach's Alphas (α) ................................................................................................................................. 12 Values to report: ..................................................................................................................................... 12 Example ................................................................................................................................................... 12 Regression ................................................................................................................................................... 13 Models: ................................................................................................................................................... 13 Values to report: ..................................................................................................................................... 14 Example ................................................................................................................................................... 14 1 Useful Tips for Survey Design Follow the 6 steps of successful surveys. For more detailed information about the six steps please refer to the following article: http://www.stcloudstate.edu/graduatestudies/current/culmProject/documents/WhitePapers_SurveySu ccess.pdf 1. Clearly define objective(s) and goal(s) of the survey. 2. Ask the right questions; get information needed to achieve survey objectives. Types of Questions • Multiple Choice with no response • Fill in the blank (ex: Age) • Categories with ranges (ex: Income) • Statement with range of agreement • Open-ended • Check all that apply 3. Select target audience (and method of communication to them). 4. Solicit participation. 5. Test the survey. 6. Execute and analyze. 2 Hypothesis Testing Formal procedures used to determine the probability that a given hypothesis is true. Six general steps 1. State the hypotheses. 2. Formulate a plan for analysis. Decide on level of significance (alpha) and sample size. 3. Select and compute the appropriate test statistic. 4. Interpret results. 5. Apply decision rule. 6. State conclusion. Null Hypothesis (Ho ) Hypothesis that observations made are the result of pure chance. In statistics, the only way of supporting your hypothesis is to refute the null hypothesis. Ho (ex: Ho: M1 = M2; the mean of group 1 is equal to the mean of group 2). Alternative Hypothesis (Ha) What the researcher wants to prove. Observations made are influenced by some non-random cause. Ha (ex: Ha: M1 ≠ M2; the mean of group 1 is not equal to the mean of group 2). In Hypothesis Testing, the goal is usually to reject Ho. Alpha () Alpha is the level of acceptable significance; you choose the alpha level that you will be using prior to data collection. Note: typical alpha levels are shown below. Decision to reject or accept Ho is based on Alpha and the appropriate test statistic when applied to the data. A result is deemed statistically significant if it is unlikely to have occurred by chance, and thus provides enough evidence to reject the null hypothesis. Levels of Alpha used in Research: p < .10 = 90% confident in the decision to reject Ho and conclude HA p < .05 (most common) = 95% confident p < .01 = 99% confident 3 Chi-Square (χ2) This test is used when you wish to examine the relationship between two categorical variables that contain two or more levels in each. χ2 compares the observed frequencies, or proportions of cases that occur in each of the levels within each category, with the values that would be expected if there was no association between the two variables Ho: There is not a relationship between “witnessed abduction” and “outcome status” Ha: There is a relationship between “witnessed abduction” and “outcome status” Chi-Square =157.614, sig. = .000 Therefore; Conclusion: Ha One of the assumptions of Chi-Square is the expected cell count. This means that you need at least 5 observations in every cell. The correction for this is done by subtracting 0.5 from the difference between each observed value and its expected value. This reduces the chi-squared value obtained and thus increases its p-value. *This is shown in the example above, under the table. 4 Values to report: Chi-Square statistics are reported with degrees of freedom and sample size in parentheses, the Pearson chi-square value (rounded to two decimal places), and the significance level. Example The percentage of outcome status differed based on whether or not the abduction was witnessed, 2(6, N = 186) = 157.61, p < .05. Therefore, there is a relationship between witnessed abduction and outcome status. In the cases of assumed abduction the outcome status two count (44) was a lot larger than the expected count (13.7). In the cases that were inapplicable the outcome status two count (2) was a lot smaller than the expected count (36.6). Additionally, the cases of assumed abduction and outcome status one had not observations therefore the count was zero which was a lot lower than the expected count (19.2). This shows where the differences are occurring based on the abduction type and outcome types. 5 Correlations (r) A correlation describes relationships between variables (it does not imply causation). The following is the range of an r value -1 ≤ r ≤ +1 *sign and size matter. The closer r is to ±1, the stronger the relationship between the variables. The sign ± indicates the direction of the relationship. A negative relationship indicates that as one variable increases (decreases) the other variable decreases (increases). A positive relationship indicates that as one variable increase (decreases) the other variable also increases (decreases). General rule of thumb for interpretation: r value of ± .7 to ± 1.0 - Indicates a strong relationship r value of ± .35 to ± .69 - Moderately strong relationship r value of < ± .35 - Weak or no relationship Values to report: Correlation (r) and significance level (p). Example Referring back to the hypothesis testing procedure, the following example will illustrate the procedure in its most basic form. 1. State the hypotheses. a. The null hypothesis (Ho) is that there is no relationship between Total FY and FY Years. The alternative hypothesis (Ha) is that there is a statistically significant relationship between Total FY and FY Years. 6 2. Formulate a plan for analysis. Decide on level of significance (sig.) and sample size. a. A correlation between the students survey responses on variables Total FY and FY Years will be ran. The sample size will be approximately 732 students (n=732). The alpha level that will be used to test the significance of the relationship will be p < .05. 3. Select and compute the appropriate test statistic. a. This is what we do for you! The results for this example are shown above. 4. Interpret results. a. Total FY and FY Years were moderately strong in their correlation (r =.577) and since the Sig.(2-tailed) is .000, you can state that with 99% confidence. 5. Apply decision rule. a. The Sig.(2-tailed) is .000 which indicates that p < .05 and the population r value (r = .577 based on sample data) is between + .35 to + .69 indicating a moderate positive relationship between Total FY and FY Years. 6. State conclusion. a. The null hypothesis is rejected (the alternative hypothesis is accepted) because, a moderately strong positive relationship between Total FY and FY Years was found, r = .57, p < .05. 7 T-tests (t) T-tests can also be used to assesses whether the means of two groups are statistically different from each other. T-Test (Paired Samples/Within Cases) A paired sample T-test is used when each observation in one group is paired with a related observation in the other group (i.e. pre-test and post-test by same group of individuals). Ho : The mean of Initial_interest is equal to the mean of subinter_vs_Initlnt Ha : The mean of Initial_interest is not equal to the mean of subinter_vs_Initlnt Pair 1: t = -12.173, sig. = .000 If sig. > α , HO If sig. < α , Ha thus, rejecting the null hypothesis (Support of HA) Conclusion: The mean of Initial_interest (M = 33.22 ) is not equal to the mean of Subinter_vs_InitInt (M = 51.54) 8 T-test (Independent/Group) An Independent T-test is used to compare the means of two different groups that is, the groups are independent from one another. Ho : Men’s GlobScore = Female’s Globscore Ha : Men’s GlobScore ≠ Female’s Globscore Levene’s Test for Equality of Variances Decision There are two different formulas for computing a T-test value. One formula uses the variance from each of the two groups because they are not statistically equal. The other formula for t is based on a pooled variance from the two groups because they are equal. To determine which t-test formula should be used, either the “equal variances assumed” formula or the “equal variances not assumed” formula, the Levene’s Test is used. Use the F value and Sig. level under the Levene’s Test for Equality of Variances for the steps below: If Sig. > α, then equal variances is assumed and If Sig. ≤ α, then equal variances is not assumed Since Levene’s Test shows (F = 8.29, Sig = .007), then to determine Ha, you must use equal variances not assumed. In conclusion, t = 1.257, p > .238, thus, you are unable to reject Ho (*see Example section below for written explanation) Values to report: Means (M) and standard deviations (SD) for each group, t value (t), degrees of freedom (in parentheses next to t), and significance level (p). Example Men’s GlobScore was not (M = 44.85, SD = 3.93) significantly higher than women’s GlobScore, (M = 41.67, SD = 7.23), t(9.69) = 1.26, p = ns. 9 Analysis of Variance (ANOVA) (F) ANOVAs are used in designs that involve two or more groups and a comparison of groups means is required. ANOVA will tell whether the responses significantly vary across the groups, but not precisely which group is significantly different from the others. If significance is found, posttests (e.g. Tukey) can be computed to determine where the differences are. The example below is for a One-way ANOVA (between subjects) Hypotheses: Ho : µ1 = µ2 = µ3 Note: µ = population mean Ha : at least one mean is different from the others Descriptives barks N Mean Std. Deviation Std. Error 95% Confidence Interval for Minimum Maximum Mean Lower Bound Upper Bound pug 10 4.2000 1.61933 .51208 3.0416 5.3584 1.00 6.00 lab 10 10.5000 1.84089 .58214 9.1831 11.8169 8.00 14.00 golden 10 7.8000 1.03280 .32660 7.0612 8.5388 6.00 9.00 30 7.5000 3.01433 .55034 6.3744 8.6256 1.00 14.00 retriever Total ANOVA barks Sum of Squares Between Groups Within Groups Total df Mean Square 199.800 2 99.900 63.700 27 2.359 263.500 29 F 42.344 Sig. .000 10 Multiple Comparisons Dependent Variable: barks Tukey HSD (I) type (J) type Mean Difference Std. Error Sig. (I-J) lab lab pug golden retriever pug Upper Bound -6.30000 .68691 .000 -8.0031 -4.5969 -3.60000 * .68691 .000 -5.3031 -1.8969 6.30000 * .68691 .000 4.5969 8.0031 2.70000 * .68691 .001 .9969 4.4031 3.60000 * .68691 .000 1.8969 5.3031 -2.70000 * .68691 .001 -4.4031 -.9969 golden retriever lab Lower Bound * pug golden retriever 95% Confidence Interval *. The mean difference is significant at the 0.05 level. Values to report: Means (M) and standard deviations (SD) for each group, F value (F), degrees of freedom (numerator, denominator; in parentheses separated by a comma next to F), and significance level (p). Example The number of barks per day were significantly different based on the three types of dogs, F(2, 27) = 42.34, p < .05. The Tukey HSD post hoc comparison showed that labs (M = 10.50, SD = 1.84), golden retrievers (M = 7.80, SD = 1.03), and pugs (M = 4.20, SD = 1.62) all were significantly different from one another based on the number of barks per day. 11 Cronbach's Alphas (α) Cronbach’s alpha is a coefficient of internal consistency and is used as an estimate of reliability, typically in surveys. Case Processing Summary Agreeableness N Valid Cases % 6 100.0 Excluded 0 .0 Total 6 100.0 a a. Listwise deletion based on all variables in the procedure. Reliability Statistics Cronbach's Alpha N of Items .934 3 Values to report: The number of items that make up the subscale, and the associated Cronbach's alpha. Example The agreeableness subscale consisted of 3 items (a = .93). 12 Regression Regression is used for estimating the relationship between a continuous dependent variable (DV) and one or more continuous or discrete independent, or predictor variables. Builds a predictive model of the dependent variable based on the information provided in the independent variable(s). The value of the regression (simple and multiple) lies in its capacity to estimate the relative importance of one or several hypothesized predictors and its ability to assess the contribution of the combined variable(s) to change in the DV. The (Adj.) R2 value is an indicator of how well the model fits the data. It indicates the amount (percent) of variability in the dependent variable that can be explained by the regression model. Models: Simple: Uses 1 independent variable (predictor) in the equation to predict the dependent variable. Multiple: Forces all predictors (2 or more) into equation to build best representation of relationship between the dependent variable and the independent variables in the equation. Stepwise: Builds a step-by-step regression model to predict the DV by examining the set of IVs and using the most significant variable remaining in the list. *See example on next page for more information. 13 *The example below is an example of stepwise regression and in the column on the left labeled model, the numbers 1 and 2 refer to the steps in the regression. Values to report: Results should at least present the unstandardized or standardized slope (beta), whichever is more interpretable given the data (beta for comparing within study and b for generalizations across studies), along with the t-test and the corresponding significance level. (Degrees of freedom for the t-test is N-k-1 where k equals the number of predictor variables.) It is also customary to report the percentage of variance explained along with the corresponding F test. Example In the first step, campus climate significantly predicted FFYears, = .138, t(571) = 3.33, p < .05. In the second step of the regression, when income code was add, the model accounted for 3% of the variance in FFYears (R2 = .03, F(2, 571) = 9.53, p < .05) which was significantly more variance, R2Change = .013, F(1,571) = 7.82, p < .05. Prediction Model (2): FFYears = -.536 + .014 (CampusClimate) + .047 (IncomeCode) (t = 3.23, sig. = .001) (t = 2.796, sig. = .005) 14