Inference Concerning One or Two Means (t-test and z-test)
The Nine-Step Version of Hypothesis Testing
1. State H0.
2. State Ha.
3. Specify α (the level of significance desired).
4. Specify the effect size (ES).
5. Specify the level of power desired.
6. Determine the proper size of the sample.
7. Collect and analyze the sample data.
8. Refer to a criterion to assess the sample evidence.
9. Make a decision to retain or discard H0.
Step 4: Specifying the Effect Size
• Deciding on the smallest study finding that is worth considering.
• Can be small, medium, or large.
• Depends on the kind of test (t-test, correlation, etc.).
• E.g., in a t-test: 0.2, 0.5, and 0.8 for small, medium, and large respectively.
• The effect size is the researcher's judgment of the meaningful deviation from the null hypothesis.
• Example in Excel.
• Effect size calculator: http://www.uccs.edu/~faculty/lbecker/
Step 5: Specifying the Desired Level of Power
• Power is the probability of rejecting H0 when it is in fact off by the previously established effect size.
• Power is a probability value (0-1).
• The complement of power is the probability of a Type II error.
• Usually set to .8, meaning that the chance of a Type II error (retaining H0 when it is false) is .2.
• Power is not usually set greater than 90% (.9) for two reasons:
  – It raises the probability that trivial deviations from H0 will be labeled statistically significant.
  – It puts too much demand on the researcher when computing the sample size needed to meet the required power.
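The link between α, power, effect size, and sample size can be sketched with the usual normal-approximation formula for a two-sample t-test. This is a sketch, not from the slides; the z constants below are standard-normal quantiles for α = .05 (two-tailed) and power = .80.

```python
import math

# Per-group sample size via the normal approximation:
# n ≈ 2 * ((z_alpha/2 + z_power) / d)^2
# The z values are conventional standard-normal quantiles (assumed constants).
Z_ALPHA_TWO_TAILED = 1.959964  # 97.5th percentile (alpha = .05, two-tailed)
Z_POWER_80 = 0.841621          # 80th percentile (power = .80)

def sample_size_per_group(effect_size_d, z_alpha=Z_ALPHA_TWO_TAILED, z_power=Z_POWER_80):
    """Approximate per-group n needed to detect effect size d."""
    return math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)

# A "medium" effect (d = 0.5) demands far more subjects than a "large" one (d = 0.8):
print(sample_size_per_group(0.5))  # 63
print(sample_size_per_group(0.8))  # 25
```

This illustrates the second warning above: raising the power target pushes the required n up quickly.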
Step 6: Determination of the Sample Size Needed
• Use a calculator when you know the confidence interval and the population size:
• http://www.surveysystem.com/sscalc.htm
• Another way to determine the sample size is from the margin of error:
• http://www.raosoft.com/samplesize.html
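Calculators like the ones above essentially apply the standard margin-of-error formula. A minimal sketch for a proportion, assuming a large population; the z = 1.96 (95% confidence) and p = 0.5 (most conservative) defaults are conventional assumptions, not values from the slides:

```python
import math

# Sample size for a proportion from the desired margin of error E:
# n = z^2 * p * (1 - p) / E^2
def sample_size(margin_of_error, confidence_z=1.96, p=0.5):
    # p = 0.5 maximizes p*(1-p), so it is the most conservative choice.
    return math.ceil(confidence_z**2 * p * (1 - p) / margin_of_error**2)

print(sample_size(0.05))  # 385 -- a common default for a 5% margin at 95% confidence
```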
Hypothesis Testing Using Confidence Intervals
• Used as an alternative to the critical value or the p-value.
• Provides insight into why H0 was retained or rejected.
• Compute an interval around the sample statistic instead of a single value.
• The alpha level has to be specified (an α of 0.05 indicates a 95% interval).
• Locate the pinpoint value of H0: if it falls outside the CI, H0 is rejected; otherwise, it is retained.
• Simply adding and subtracting the standard error to the sample statistic is not alpha-driven; it gives roughly a 68% interval, whereas the alpha-driven (α = .05) interval is the 95% interval.
  – However, it is traditional to report the standard error.
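The decision rule above can be sketched in a few lines. The sample numbers are hypothetical, and z = 1.96 is the conventional α = .05 (95%) value; z = 1.0 would give the roughly-68% standard-error interval mentioned above.

```python
import math

# Build an interval around the sample mean and reject H0 if its pinpoint
# value falls outside the interval.
def ci_test(sample_mean, s, n, h0_pinpoint, z=1.96):
    margin = z * s / math.sqrt(n)
    lower, upper = sample_mean - margin, sample_mean + margin
    reject = not (lower <= h0_pinpoint <= upper)
    return (lower, upper), reject

# Hypothetical numbers: mean 104, s = 15, n = 25, H0: mu = 100.
(lo, hi), reject = ci_test(104, 15, 25, 100)
print((round(lo, 2), round(hi, 2)), reject)  # (98.12, 109.88) False -> retain H0
```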
Adjusting for an Inflated Type I Error
• The risk of an inflated Type I error arises when the alpha level is used more than once in a study; that is, when H0 is tested more than once.
What is an intuitive solution?
• Solution: adjust the alpha level by dividing alpha by the number of tests in which it is used. AKA the Bonferroni technique. Example on page 196 of the textbook.
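A minimal sketch of the Bonferroni adjustment described above; the p-values are hypothetical:

```python
# Bonferroni: divide alpha by the number of tests, then compare each
# test's p-value against the adjusted level.
def bonferroni_alpha(alpha, n_tests):
    return alpha / n_tests

# Hypothetical p-values from five tests sharing an overall alpha of .05:
p_values = [0.004, 0.03, 0.012, 0.20, 0.009]
adjusted = bonferroni_alpha(0.05, len(p_values))  # 0.01
print([p < adjusted for p in p_values])  # [True, False, False, False, True]
```

Note that two tests (p = .03 and p = .012) would have looked significant against the unadjusted α = .05.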
Warnings
• Two types of effect size:
  – In the 9-step hypothesis testing, ES is specified as the minimum meaningful effect size prior to evaluating the study data.
  – In the 7-step hypothesis testing, ES is computed from the data.
  Be careful which one you report!
• The criteria for effect size (small, medium, large) depend on the study statistic:
  – For the mean: .2, .5, and .8 are small, medium, and large.
  – For a coefficient: .1, .3, and .5 are small, medium, and large.
• The six-step hypothesis testing version is simplistic but unfortunately widely adopted.
• Use the Bonferroni technique to reduce the possibility of an inflated Type I error.
Chapter 9
Statistical Inferences Concerning
Bivariate Coefficients
Statistical Inferences Concerning Bivariate
Coefficients
• Making educated guesses about the population based on the sample data only.
• Statistical tests applied to correlation coefficients.
• How the Bonferroni technique is used in conjunction with correlation coefficients.
• How to compare two or more correlation coefficients to see if they are significantly different.
• How statistical tests can be applied to reliability and validity coefficients.
Statistical Tests: Single Correlation Coefficient
• Purpose of Inference
  – We cannot test the entire population, so we infer from the sample data.
• The Null Hypothesis
  – A null correlation hypothesis is usually implied as H0: ρ = 0.00.
• Deciding If r Is Statistically Significant
  – By comparing the p-value associated with r against α (usually set to 0.05 [5%]).
  – By comparing the r calculated from the sample to a table of critical values.
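The critical-value approach can also be carried out with the standard t statistic for a Pearson r. The r, n, and critical value below are hypothetical illustrations, not taken from the text; 2.101 is the conventional two-tailed critical t for α = .05 and df = 18.

```python
import math

# Test statistic for a single Pearson r against H0: rho = 0:
# t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom.
def t_for_r(r, n):
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

# Hypothetical example: r = .50 observed on n = 20 pairs.
t = t_for_r(0.50, 20)
print(round(t, 3), abs(t) > 2.101)  # 2.449 True -> r is significant at alpha = .05
```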
Statistical Tests: Single Correlation Coefficient, cont'd
• One-Tailed and Two-Tailed Tests on r
  – Most of the time, two-tailed is assumed.
  – That is, testing for both negative and positive correlation.
• Tests on specific kinds of correlation
  – Spearman, Pearson, Phi, etc.
  – If r is indicated with no type, the Pearson product-moment correlation is assumed.
Tests on Many Correlation Coefficients
• Sometimes two or more correlations are inferentially tested in the same study.
• Presented in various ways:
• Tests on the Entries of a Correlation Matrix
  – The correlation coefficient does not test whether the variables per se are correlated.
  – Rather, it is the measurements of the variables that are correlated.
• Tests on several correlation coefficients reported in the text.
• The Bonferroni Adjustment Technique
  – Adjust by dividing the overall α by the number of correlations tested.
  – Holds down the chance of an inflated Type I error.
• Comparing Two Correlation Coefficients Statistically
  – Use http://faculty.vassar.edu/lowry/rdiff.html, which uses the Fisher r-to-z transformation.
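A sketch of the Fisher r-to-z comparison that such a calculator performs; the r and n values below are hypothetical:

```python
import math

# Compare two independent correlations: transform each r with
# z = atanh(r), then divide the difference by the standard error
# sqrt(1/(n1 - 3) + 1/(n2 - 3)). |z| > 1.96 means the correlations
# differ at alpha = .05 (two-tailed).
def compare_correlations(r1, n1, r2, n2):
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se

# Hypothetical: r = .60 in one sample of 50, r = .20 in another sample of 50.
z = compare_correlations(0.60, 50, 0.20, 50)
print(round(z, 3), abs(z) > 1.96)  # 2.377 True -> significantly different
```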
Chapter 10
Inference Concerning One or Two
Means (t-tests & z-test)
Inference Concerning a Single Mean
• Single sample.
• The sample mean (X̄) is in focus for inferential matters.
• Two approaches:
  – Using confidence intervals.
  – Using the mean to evaluate a null hypothesis.
The Inferential Purpose
• An inference about µ is made based on the known value of X̄.
• Interval Estimation
  – A confidence interval is built around the sample mean.
  – The CI indicates that the population mean (µ) will (probably) fall within it.
  – The accompanying confidence level (usually 95%) means that if many samples were drawn, 95% of the associated CIs would capture the population mean (µ).
  – http://pirate.shu.edu/~wachsmut/Teaching/MATH1101/Testing/confidencemean.html
  – The CI is affected by the sample size (n), the sample mean (X̄), and the standard deviation (s).
  – In Excel, the function is: =CONFIDENCE(alpha, standard deviation of the sample, sample size)
  – A good article about the use of CONFIDENCE is here:
    http://support.microsoft.com/kb/828124
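What CONFIDENCE returns can be sketched in a few lines: the z-based margin of error, which is then added to and subtracted from the sample mean. The z = 1.96 constant is the conventional α = .05 quantile, and the sample numbers are hypothetical.

```python
import math

# Margin of error around a sample mean: z * s / sqrt(n).
# This mirrors Excel's =CONFIDENCE(alpha, s, n) for alpha = .05 (z hardcoded).
def confidence_margin(s, n, z=1.96):
    return z * s / math.sqrt(n)

# Hypothetical sample: mean 50, s = 10, n = 100.
m = confidence_margin(10, 100)
print(round(m, 2), (50 - round(m, 2), 50 + round(m, 2)))  # 1.96 (48.04, 51.96)
```

Note how the margin shrinks with larger n and grows with larger s, as the bullet above states.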
The Inferential Purpose, cont'd
• Tests Concerning a Null Hypothesis
  – When H0 involves a pinpoint mean value, testing between the sample mean and H0 takes place.
    • H0: µ = a, where a is the pinpoint value chosen by the researcher.
  – If the difference is significant, H0 is rejected.
  – The t-test and z-test are the most popular approaches for this kind of testing.
  – The z-test is slightly biased, but the bias is negligible when the sample size is large enough.
  – t-tests in Excel are shown here:
    • http://www.youtube.com/watch?v=wGoMEYinf6Y
    • And http://www.wellesley.edu/Psychology/Psych205/onettest.html
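A minimal sketch of the one-sample t statistic; the data and pinpoint value are hypothetical:

```python
import math

# One-sample t statistic against a pinpoint H0: mu = a:
# t = (xbar - a) / (s / sqrt(n)), with n - 1 degrees of freedom.
def one_sample_t(data, mu0):
    n = len(data)
    xbar = sum(data) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))  # sample SD
    return (xbar - mu0) / (s / math.sqrt(n))

# Hypothetical data tested against H0: mu = 3.
t = one_sample_t([4, 5, 3, 6, 5, 4, 6, 5], 3)
print(round(t, 2))  # about 4.78 -- compare against the critical t for df = 7
```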
Inferences Concerning Two Means
• Several points to consider:
– Independent vs. correlated samples
– Testing with two means (comparing samples)
using confidence intervals.
– Testing with two means (comparing samples) by
setting up an H0.
Inferences Concerning Two Means
Independent vs. Correlated Samples
• Correlated samples: a relationship exists between each member of one sample and one and only one member of the other sample.
  – Test and retest of the same group.
  – Matching: each member of the first sample is matched with a member of the second sample for a different test.
  – Biological twins split between the two samples.
• Independent samples: no such relationship exists.
Inferences Concerning Two Means
The Inferential Purpose
• When either type of samples is compared in terms of their means:
  – The inference applies to the populations from which the samples were drawn.
  – The inference is made about the populations, NOT the samples.
Inferences Concerning Two Means
Setting Up and Testing a Null Hypothesis
• Usually, H0 is not stated; assume it is that no difference between the means exists, H0: µ1 − µ2 = 0, unless otherwise indicated.
• Use a t-test or F-test.
• The F-test assesses whether the variances of the two samples are significantly different (useful for choosing between the equal- and unequal-variance t-tests).
• The t-test is similar in spirit but compares the means rather than the variances.
• Excel examples:
t-Test: Two-Sample Assuming Unequal Variances

Data (the F-test value for these data was 0.000556):

X:  1   2   3   4   5   6   7   8
Y: 22   5  26  34  41  14  18  15

                                     X         Y
Mean                               4.5    21.875
Variance                             6   134.125
Observations                         8         8
Hypothesized Mean Difference         0
df                                   8
t Stat                        -4.15157
P(T<=t) one-tail              0.001601
t Critical one-tail           1.859548
P(T<=t) two-tail              0.003202
t Critical two-tail           2.306004
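The t Stat in this output can be reproduced from scratch. This is Welch's unequal-variance formula, which is what Excel's "Assuming Unequal Variances" tool computes:

```python
import math

def mean_var(xs):
    """Sample mean and sample (n - 1) variance."""
    n = len(xs)
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / (n - 1)
    return m, v

# Welch's t: difference in means over sqrt(v1/n1 + v2/n2).
def welch_t(xs, ys):
    mx, vx = mean_var(xs)
    my, vy = mean_var(ys)
    return (mx - my) / math.sqrt(vx / len(xs) + vy / len(ys))

X = [1, 2, 3, 4, 5, 6, 7, 8]
Y = [22, 5, 26, 34, 41, 14, 18, 15]
print(round(welch_t(X, Y), 5))  # -4.15157, matching the t Stat above
```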
F-test, again

F-test (two-tailed probability): 0.160883202

IQ Group 1   IQ Group 2
   123           78
   123           33
   111           23
   113           45
   101           54
   103           34
    99           61
    89           45
   110           65
   105           65
An F-test returns the two-tailed probability
that the variances in array1 (IQ Group1) and
array2 (IQ Group2) are not significantly
different. Use this function to determine
whether two samples have different variances.
For example, given test scores from public and
private schools, you can test whether these
schools have different levels of test score
diversity.
In the example above, the returned value (0.161) is that two-tailed probability; since it is larger than α = 0.05, the variances of the two groups (about 112 and 299) are not significantly different.
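The variance figures behind that probability can be checked directly. This is a pure-Python sketch; computing the 0.161 p-value itself would additionally require the F distribution, which Excel's F-test handles internally.

```python
# Sample (n - 1) variance of a list of scores.
def sample_variance(xs):
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

g1 = [123, 123, 111, 113, 101, 103, 99, 89, 110, 105]
g2 = [78, 33, 23, 45, 54, 34, 61, 45, 65, 65]
v1, v2 = sample_variance(g1), sample_variance(g2)
print(round(v1, 4), round(v2, 4))  # 112.4556 299.3444 -- the two sample variances
# The F ratio (larger variance over smaller) is what gets converted to p = 0.161:
print(round(max(v1, v2) / min(v1, v2), 3))
```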
ANOVA (Analysis of Variance)

Anova: Single Factor

SUMMARY
Groups   Count   Sum   Average   Variance
X            8    36       4.5          6
Y            8   175    21.875    134.125

ANOVA
Source of Variation        SS   df        MS         F   P-value    F crit
Between Groups       1207.563    1  1207.563   17.2355  0.000979   4.60011
Within Groups         980.875   14   70.0625
Total                2188.438   15

The most important values are F and P.
F = Between Groups MS / Within Groups MS.

Here is a good video explaining ANOVA:
http://www.youtube.com/watch?v=A6j9oxAkQ3g
ANOVA Example

IQ Group 1   IQ Group 2
   123           78
   123           33
   111           23
   113           45
   101           54
   103           34
    99           61
    89           45
   110           65
   105           65

Anova: Single Factor

SUMMARY
Groups       Count    Sum   Average   Variance
IQ Group 1      10   1077     107.7   112.4556
IQ Group 2      10    503      50.3   299.3444

ANOVA
Source of Variation       SS   df       MS          F    P-value    F crit
Between Groups       16473.8    1  16473.8   80.00874   4.83E-08  4.413873
Within Groups         3706.2   18    205.9
Total                  20180   19

Notes on the table:
• The between-groups sum of squares reflects the differences between the averages for each level (group).
• The within-groups sum of squares reflects the variance within each level.
• Degrees of freedom: between = 2 groups − 1 = 1; within = 20 subjects − 2 groups = 18; total = 19.
• Mean sum of squares (MS) = SS / df.
• The F statistic = mean sum of squares (between) / mean sum of squares (within).
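The F value in the table can be reproduced from scratch with a minimal single-factor ANOVA sketch:

```python
# Single-factor ANOVA: F = MS(between) / MS(within).
def anova_f(groups):
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    # Between-groups SS: group sizes times squared deviations of group means.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups SS: squared deviations of scores from their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

g1 = [123, 123, 111, 113, 101, 103, 99, 89, 110, 105]
g2 = [78, 33, 23, 45, 54, 34, 61, 45, 65, 65]
print(round(anova_f([g1, g2]), 5))  # 80.00874, matching the table
```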
Interval Estimation with Two Means
• The interval is built around the difference between the two means and can be used instead of a direct significance test of the difference.
• If the CI does not overlap ZERO, the difference is significant.
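A sketch of such an interval for the X and Y data from the t-test example earlier; the critical t of 2.306 (df = 8, two-tailed, α = .05) is the value from that Excel output, and the rest is illustration:

```python
import math

# CI around the difference in means, using the unequal-variance (Welch)
# standard error and a supplied two-tailed critical t.
def diff_ci(xs, ys, t_crit=2.306):
    def mean_var(zs):
        m = sum(zs) / len(zs)
        return m, sum((z - m) ** 2 for z in zs) / (len(zs) - 1)
    mx, vx = mean_var(xs)
    my, vy = mean_var(ys)
    se = math.sqrt(vx / len(xs) + vy / len(ys))
    d = mx - my
    return d - t_crit * se, d + t_crit * se

X = [1, 2, 3, 4, 5, 6, 7, 8]
Y = [22, 5, 26, 34, 41, 14, 18, 15]
lo, hi = diff_ci(X, Y)
print(round(lo, 2), round(hi, 2))  # the interval excludes zero -> significant difference
```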
Multiple Dependent Variables
• Results Presented in Text
  – With respect to each variable, state the difference statistic: for a t-test, t(N − 2) and the p-value.
  – For an F-test, you need F(1, N − 2) = F-value in addition to the p-value.
  – If comparing means, you may need to state the means and the standard deviations.
• Results Presented in a Table
  – Looks like a correlation matrix.
  – State the variables and conditions, and put the means, SDs, and t-statistic in addition to the p-value for each variable.
  – The excerpt on page 242 is a good example.
Use of the Bonferroni Adjustment Technique
• Usually by dividing the alpha level (e.g., 0.05) by the number of dependent variables, each with its own H0.
• When alpha is decreased to some lower level chosen by the researcher, the technique is called pseudo-Bonferroni.
Effect Size Assessment and Power Analysis
• Has to deal with the issue of 'practical significance', not just statistical significance.
• Effect size calculator: http://www.cemcentre.org/evidence-based-education/effect-size-calculator
• Online calculator: http://www.uccs.edu/~faculty/lbecker/
• Cohen's d = (M1 − M2) / s_pooled, where s_pooled = √[(s1² + s2²) / 2]
• TABLE ON PAGE 246
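A minimal sketch of Cohen's d with this pooled-SD formula, applied to the IQ-group summary values from the ANOVA example:

```python
import math

# Cohen's d with the pooled SD taken as the root mean square of the two
# sample SDs (assumes equal group sizes): d = (M1 - M2) / sqrt((s1^2 + s2^2)/2).
def cohens_d(m1, m2, s1, s2):
    s_pooled = math.sqrt((s1**2 + s2**2) / 2)
    return (m1 - m2) / s_pooled

# IQ-group means and SDs (SDs are the square roots of the table's variances):
d = cohens_d(107.7, 50.3, math.sqrt(112.4556), math.sqrt(299.3444))
print(round(d, 2))  # 4.0 -- far beyond the 0.8 "large" threshold
```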
Post Hoc Power Analysis
• An alternative to the estimated effect size.
• Used to clarify the results.
• Usually done if the difference turned out to be insignificant.
Comments
• Insignificance does not mean H0 is true. Why?
  – There could be more than one conflicting null hypothesis.
  – The measurement may not be reliable.
  – A statistical power analysis may not have been done before drawing final conclusions.
• Overlapping Distributions
  – Even though the means can be significantly different, the scores may overlap.
  – The standard deviation can reveal such a case.
• The Typical Use of the t-Test
  – The t-test is typically used for purposes such as evaluating an H0 involving one or two means.
  – It can also be used to test the difference between two correlations.
• Practical vs. Statistical Significance
• Type I and Type II error.