Download Statistical Guide - St. Cloud State University

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Bootstrapping (statistics) wikipedia , lookup

Degrees of freedom (statistics) wikipedia , lookup

Foundations of statistics wikipedia , lookup

Taylor's law wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Analysis of variance wikipedia , lookup

Categorical variable wikipedia , lookup

Omnibus test wikipedia , lookup

Misuse of statistics wikipedia , lookup

Student's t-test wikipedia , lookup

Transcript
Statistical Guide
Table of Contents
Useful Tips for Survey Design........................................................................................................................ 2
Types of Questions.................................................................................................................................... 2
Hypothesis Testing ........................................................................................................................................ 3
Null Hypothesis (Ho ) ................................................................................................................................. 3
Alternative Hypothesis (Ha)....................................................................................................................... 3
Alpha () ................................................................................................................................................... 3
Chi-Square (χ2) .............................................................................................................................................. 4
Values to report: ....................................................................................................................................... 5
Example ..................................................................................................................................................... 5
Correlations (r) .............................................................................................................................................. 6
Values to report: ....................................................................................................................................... 6
Example ..................................................................................................................................................... 6
T-tests (t) ....................................................................................................................................................... 8
T-Test (Paired Samples/Within Cases) ...................................................................................................... 8
T-test (Independent/Group) ..................................................................................................................... 9
Levene’s Test for Equality of Variances Decision ...................................................................................... 9
Values to report: ....................................................................................................................................... 9
Example ..................................................................................................................................................... 9
Analysis of Variance (ANOVA) (F)................................................................................................................ 10
Values to report: ..................................................................................................................................... 11
Example ................................................................................................................................................... 11
Cronbach's Alphas (α) ................................................................................................................................. 12
Values to report: ..................................................................................................................................... 12
Example ................................................................................................................................................... 12
Regression ................................................................................................................................................... 13
Models: ................................................................................................................................................... 13
Values to report: ..................................................................................................................................... 14
Example ................................................................................................................................................... 14
1
Useful Tips for Survey Design
Follow the 6 steps of successful surveys. For more detailed information about the six steps please refer
to the following article:
http://www.stcloudstate.edu/graduatestudies/current/culmProject/documents/WhitePapers_SurveySu
ccess.pdf
1. Clearly define objective(s) and goal(s) of the survey.
2. Ask the right questions; get information needed to achieve survey objectives.
Types of Questions
•
Multiple Choice with no response
•
Fill in the blank (ex: Age)
•
Categories with ranges (ex: Income)
•
Statement with range of agreement
•
Open-ended
•
Check all that apply
3. Select target audience (and method of communication to them).
4. Solicit participation.
5. Test the survey.
6. Execute and analyze.
2
Hypothesis Testing
Formal procedures used to determine the probability that a given hypothesis is true.
Six general steps
1. State the hypotheses.
2. Formulate a plan for analysis. Decide on level of significance (alpha) and sample size.
3. Select and compute the appropriate test statistic.
4. Interpret results.
5. Apply decision rule.
6. State conclusion.
Null Hypothesis (Ho )
Hypothesis that observations made are the result of pure chance.
In statistics, the only way of supporting your hypothesis is to refute the null hypothesis.
Ho (ex: Ho: M1 = M2; the mean of group 1 is equal to the mean of group 2).
Alternative Hypothesis (Ha)
What the researcher wants to prove.
Observations made are influenced by some non-random cause.
Ha (ex: Ha: M1 ≠ M2; the mean of group 1 is not equal to the mean of group 2).
In Hypothesis Testing, the goal is usually to reject Ho.
Alpha ()
Alpha is the level of acceptable significance; you choose the alpha level that you will be using prior to
data collection. Note: typical alpha levels are shown below.
Decision to reject or accept Ho is based on Alpha and the appropriate test statistic when applied to the
data.
A result is deemed statistically significant if it is unlikely to have occurred by chance, and thus provides
enough evidence to reject the null hypothesis.
Levels of Alpha used in Research:
p < .10 = 90% confident in the decision to reject Ho and conclude HA
p < .05 (most common) = 95% confident
p < .01 = 99% confident
3
Chi-Square (χ2)
This test is used when you wish to examine the relationship between two categorical variables that
contain two or more levels in each.
χ2 compares the observed frequencies, or proportions of cases that occur in each of the levels within
each category, with the values that would be expected if there was no association between the two
variables
Ho: There is not a relationship between “witnessed abduction” and “outcome status”
Ha: There is a relationship between “witnessed abduction” and “outcome status”
Chi-Square =157.614, sig. = .000 Therefore; Conclusion: Ha
One of the assumptions of Chi-Square is the expected cell count. This means that you need at
least 5 observations in every cell. The correction for this is done by subtracting 0.5 from the
difference between each observed value and its expected value. This reduces the chi-squared
value obtained and thus increases its p-value. *This is shown in the example above, under the
table.
4
Values to report:
Chi-Square statistics are reported with degrees of freedom and sample size in parentheses, the Pearson
chi-square value (rounded to two decimal places), and the significance level.
Example
The percentage of outcome status differed based on whether or not the abduction was witnessed, 2(6,
N = 186) = 157.61, p < .05. Therefore, there is a relationship between witnessed abduction and outcome
status. In the cases of assumed abduction the outcome status two count (44) was a lot larger than the
expected count (13.7). In the cases that were inapplicable the outcome status two count (2) was a lot
smaller than the expected count (36.6). Additionally, the cases of assumed abduction and outcome
status one had not observations therefore the count was zero which was a lot lower than the expected
count (19.2). This shows where the differences are occurring based on the abduction type and outcome
types.
5
Correlations (r)
A correlation describes relationships between variables (it does not imply causation).
The following is the range of an r value -1 ≤ r ≤ +1 *sign and size matter.
The closer r is to ±1, the stronger the relationship between the variables.
The sign ± indicates the direction of the relationship. A negative relationship indicates that as one
variable increases (decreases) the other variable decreases (increases). A positive relationship indicates
that as one variable increase (decreases) the other variable also increases (decreases).
General rule of thumb for interpretation:
r value of ± .7 to ± 1.0 - Indicates a strong relationship
r value of ± .35 to ± .69 - Moderately strong relationship
r value of < ± .35 - Weak or no relationship
Values to report:
Correlation (r) and significance level (p).
Example
Referring back to the hypothesis testing procedure, the following example will illustrate the procedure
in its most basic form.
1. State the hypotheses.
a. The null hypothesis (Ho) is that there is no relationship between Total FY and FY
Years. The alternative hypothesis (Ha) is that there is a statistically significant
relationship between Total FY and FY Years.
6
2. Formulate a plan for analysis. Decide on level of significance (sig.) and sample size.
a. A correlation between the students survey responses on variables Total FY and FY
Years will be ran. The sample size will be approximately 732 students (n=732). The
alpha level that will be used to test the significance of the relationship will be p <
.05.
3. Select and compute the appropriate test statistic.
a. This is what we do for you! The results for this example are shown above.
4. Interpret results.
a. Total FY and FY Years were moderately strong in their correlation (r =.577) and since
the Sig.(2-tailed) is .000, you can state that with 99% confidence.
5. Apply decision rule.
a. The Sig.(2-tailed) is .000 which indicates that p < .05 and the population r value (r =
.577 based on sample data) is between + .35 to + .69 indicating a moderate positive
relationship between Total FY and FY Years.
6. State conclusion.
a. The null hypothesis is rejected (the alternative hypothesis is accepted) because, a
moderately strong positive relationship between Total FY and FY Years was found, r
= .57, p < .05.
7
T-tests (t)
T-tests can also be used to assesses whether the means of two groups are statistically different from
each other.
T-Test (Paired Samples/Within Cases)
A paired sample T-test is used when each observation in one group is paired with a related observation
in the other group (i.e. pre-test and post-test by same group of individuals).
Ho : The mean of Initial_interest is
equal to the mean of
subinter_vs_Initlnt
Ha : The mean of Initial_interest is not
equal to the mean of
subinter_vs_Initlnt
Pair 1:
t = -12.173, sig. = .000
If sig. > α , HO
If sig. < α , Ha
thus, rejecting the null hypothesis (Support of HA)
Conclusion:
The mean of Initial_interest (M = 33.22 ) is not equal to the mean of Subinter_vs_InitInt (M = 51.54)
8
T-test (Independent/Group)
An Independent T-test is used to
compare the means of two
different groups that is, the groups
are independent from one another.
Ho : Men’s GlobScore = Female’s Globscore
Ha : Men’s GlobScore ≠ Female’s Globscore
Levene’s Test for Equality of Variances Decision
There are two different formulas for computing a T-test value. One formula uses the variance from each
of the two groups because they are not statistically equal. The other formula for t is based on a pooled
variance from the two groups because they are equal.
To determine which t-test formula should be used, either the “equal variances assumed” formula or the
“equal variances not assumed” formula, the Levene’s Test is used.
Use the F value and Sig. level under the Levene’s Test for Equality of Variances for the steps below:
If Sig. > α, then equal variances is assumed and If Sig. ≤ α, then equal variances is not assumed
Since Levene’s Test shows (F = 8.29, Sig = .007), then to determine Ha, you must use equal variances not
assumed. In conclusion, t = 1.257, p > .238, thus, you are unable to reject Ho (*see Example section
below for written explanation)
Values to report:
Means (M) and standard deviations (SD) for each group, t value (t), degrees of freedom (in parentheses
next to t), and significance level (p).
Example
Men’s GlobScore was not (M = 44.85, SD = 3.93) significantly higher than women’s GlobScore, (M =
41.67, SD = 7.23), t(9.69) = 1.26, p = ns.
9
Analysis of Variance (ANOVA) (F)
ANOVAs are used in designs that involve two or more groups and a comparison of groups means is
required.
ANOVA will tell whether the responses significantly vary across the groups, but not precisely which
group is significantly different from the others. If significance is found, posttests (e.g. Tukey) can be
computed to determine where the differences are.
The example below is for a One-way ANOVA (between subjects)
Hypotheses:
Ho : µ1 = µ2 = µ3 Note: µ = population mean
Ha : at least one mean is different from the others
Descriptives
barks
N
Mean
Std. Deviation Std. Error
95% Confidence Interval for
Minimum
Maximum
Mean
Lower Bound
Upper Bound
pug
10
4.2000
1.61933
.51208
3.0416
5.3584
1.00
6.00
lab
10
10.5000
1.84089
.58214
9.1831
11.8169
8.00
14.00
golden
10
7.8000
1.03280
.32660
7.0612
8.5388
6.00
9.00
30
7.5000
3.01433
.55034
6.3744
8.6256
1.00
14.00
retriever
Total
ANOVA
barks
Sum of Squares
Between Groups
Within Groups
Total
df
Mean Square
199.800
2
99.900
63.700
27
2.359
263.500
29
F
42.344
Sig.
.000
10
Multiple Comparisons
Dependent Variable: barks
Tukey HSD
(I) type
(J) type
Mean Difference
Std. Error
Sig.
(I-J)
lab
lab
pug
golden retriever
pug
Upper Bound
-6.30000
.68691
.000
-8.0031
-4.5969
-3.60000
*
.68691
.000
-5.3031
-1.8969
6.30000
*
.68691
.000
4.5969
8.0031
2.70000
*
.68691
.001
.9969
4.4031
3.60000
*
.68691
.000
1.8969
5.3031
-2.70000
*
.68691
.001
-4.4031
-.9969
golden retriever
lab
Lower Bound
*
pug
golden retriever
95% Confidence Interval
*. The mean difference is significant at the 0.05 level.
Values to report:
Means (M) and standard deviations (SD) for each group, F value (F), degrees of freedom (numerator,
denominator; in parentheses separated by a comma next to F), and significance level (p).
Example
The number of barks per day were significantly different based on the three types of dogs, F(2, 27) =
42.34, p < .05. The Tukey HSD post hoc comparison showed that labs (M = 10.50, SD = 1.84), golden
retrievers (M = 7.80, SD = 1.03), and pugs (M = 4.20, SD = 1.62) all were significantly different from one
another based on the number of barks per day.
11
Cronbach's Alphas (α)
Cronbach’s alpha is a coefficient of internal consistency and is used as an estimate of reliability, typically
in surveys.
Case Processing Summary
Agreeableness
N
Valid
Cases
%
6
100.0
Excluded
0
.0
Total
6
100.0
a
a. Listwise deletion based on all variables in the
procedure.
Reliability Statistics
Cronbach's Alpha
N of Items
.934
3
Values to report:
The number of items that make up the subscale, and the associated Cronbach's alpha.
Example
The agreeableness subscale consisted of 3 items (a = .93).
12
Regression
Regression is used for estimating the relationship between a continuous dependent variable (DV) and
one or more continuous or discrete independent, or predictor variables.
Builds a predictive model of the dependent variable based on the information provided in the
independent variable(s).
The value of the regression (simple and multiple) lies in its capacity to estimate the relative importance
of one or several hypothesized predictors and its ability to assess the contribution of the combined
variable(s) to change in the DV.
The (Adj.) R2 value is an indicator of how well the model fits the data. It indicates the amount (percent)
of variability in the dependent variable that can be explained by the regression model.
Models:
Simple: Uses 1 independent variable (predictor) in the equation to predict the dependent
variable.
Multiple: Forces all predictors (2 or more) into equation to build best representation of
relationship between the dependent variable and the independent variables in the equation.
Stepwise: Builds a step-by-step regression model to predict the DV by examining the set of IVs
and using the most significant variable remaining in the list.
*See example on next page for more information.
13
*The example below is an example of stepwise regression and in the column on the left labeled
model, the numbers 1 and 2 refer to the steps in the regression.
Values to report:
Results should at least present the unstandardized or standardized slope (beta), whichever is more
interpretable given the data (beta for comparing within study and b for generalizations across studies),
along with the t-test and the corresponding significance level. (Degrees of freedom for the t-test is N-k-1
where k equals the number of predictor variables.) It is also customary to report the percentage of
variance explained along with the corresponding F test.
Example
In the first step, campus climate significantly predicted FFYears, = .138, t(571) = 3.33, p < .05. In the
second step of the regression, when income code was add, the model accounted for 3% of the variance
in FFYears (R2 = .03, F(2, 571) = 9.53, p < .05) which was significantly more variance, R2Change = .013,
F(1,571) = 7.82, p < .05.
Prediction Model (2):
FFYears = -.536 + .014 (CampusClimate) + .047 (IncomeCode)
(t = 3.23, sig. = .001)
(t = 2.796, sig. = .005)
14