Statistics Primer

ORC Staff:
Jayme Palka
Peter Boedeker
Marcus Fagan
Trey Dejong

Quick Overview of Statistics
Descriptive vs. Inferential Statistics

• Descriptive Statistics: summarize and describe data (central tendency, variability, skewness)

• Inferential Statistics: procedures for making inferences about population parameters using sample statistics
Measures of Central Tendency

• Mode: the most frequently occurring value in a distribution
  • Select the value(s) with the highest frequency

• Median: the value representing the middle point of a distribution
  1. Order the data
  2. Determine the median position = (n + 1) / 2
  3. Locate the median based on step 2

• Mean: the arithmetic average of a distribution
  • Sum all the data values and divide by the number of values:

    $\bar{x} = \frac{\sum x}{n}$
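As a quick check of these definitions, here is a minimal Python sketch using only the standard library (the scores are made up for illustration):

```python
import statistics

data = [4, 2, 7, 4, 9, 4, 6]    # hypothetical scores

print(statistics.mode(data))    # mode: most frequent value -> 4
print(statistics.median(data))  # sorted: [2, 4, 4, 4, 6, 7, 9]; position (7+1)/2 -> 4
print(statistics.mean(data))    # sum / n = 36 / 7 ≈ 5.14
```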
Measures of Variability

• Range: difference between the largest and smallest values in the data:

  $X_H - X_L$

• Mean deviation: the average of the absolute deviations from the mean (uncommonly used):

  $MD = \frac{\sum |x - \bar{x}|}{n}$

• These measures are not very descriptive of a distribution’s variability; better measures are needed…
Measures of Variability Cont.

• Sum of squares: sum of the squared deviation scores; used to compute variance and standard deviation:

  $SS = \sum (x - \bar{x})^2$

• Variance: the average squared deviation from the mean:

  $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

• Standard deviation: square root of the variance; commonly used:

  $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Variance and Sum of Squares

$SS = \sum (x - \bar{x})^2$   $s^2 = \frac{SS}{n - 1}$   $s = \sqrt{\frac{SS}{n - 1}}$

Student    x     (x − x̄)    (x − x̄)²
Girl #1    90      41        1681
Girl #2    23     −26         676
Girl #3    26     −23         529
Boy #1     83      34        1156
Boy #2     48      −1           1
Boy #3     24     −25         625

Average: x̄ = 294 / 6 = 49       Sum: SS = 4668
s² = 4668 / (6 − 1) = 933.6     s = √933.6 ≈ 30.55
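The same computation in Python, verifying the worked table above (a sketch using only the standard library):

```python
import statistics

scores = [90, 23, 26, 83, 48, 24]          # the six scores from the table

mean = statistics.mean(scores)             # 294 / 6 = 49.0
ss = sum((x - mean) ** 2 for x in scores)  # sum of squares = 4668.0
var = ss / (len(scores) - 1)               # s^2 = SS / (n - 1) = 933.6
sd = var ** 0.5                            # s ≈ 30.55
print(mean, ss, var, round(sd, 2))         # matches statistics.variance / stdev
```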
Empirical Rule

• The empirical rule states that a symmetric, normal distribution with population mean μ and standard deviation σ has the following properties:
  • About 68% of values fall within one standard deviation of the mean (μ ± σ)
  • About 95% fall within two standard deviations (μ ± 2σ)
  • About 99.7% fall within three standard deviations (μ ± 3σ)
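A quick simulation sketch of the rule (assuming NumPy is available; the values of μ and σ here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 100, 15                          # arbitrary normal parameters
x = rng.normal(mu, sigma, size=1_000_000)

for k in (1, 2, 3):
    within = np.mean(np.abs(x - mu) <= k * sigma)
    print(f"within {k} SD: {within:.3f}")    # ≈ 0.683, 0.954, 0.997
```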
Sampling Distribution

• Theoretical distribution of sample statistics (e.g., the mean, standard deviation, Pearson’s r), as opposed to individual scores

• NOT the same thing as a sample distribution or a population distribution

• Used to help generalize the findings of our sample statistics back to our populations

• Tough to understand; a concrete example is on the next slide
Sampling Distribution

Suppose two balls are sampled, with replacement, from a set of three balls numbered 1, 2, and 3, and the mean of each sample is computed. All possible outcomes are shown below in Table 1.

Table 1. All possible outcomes when two balls are sampled with replacement.

Outcome   Ball 1   Ball 2   Mean
1         1        1        1.0
2         1        2        1.5
3         1        3        2.0
4         2        1        1.5
5         2        2        2.0
6         2        3        2.5
7         3        1        2.0
8         3        2        2.5
9         3        3        3.0
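Table 1 can be reproduced by brute-force enumeration; a minimal Python sketch:

```python
from itertools import product
from collections import Counter

balls = [1, 2, 3]
means = [(a + b) / 2 for a, b in product(balls, repeat=2)]  # 9 equally likely samples

# The sampling distribution of the mean for n = 2:
for m, count in sorted(Counter(means).items()):
    print(f"mean = {m:.1f}, probability = {count}/9")
# mean = 1.0 -> 1/9, 1.5 -> 2/9, 2.0 -> 3/9, 2.5 -> 2/9, 3.0 -> 1/9
```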
Sampling Error

As has been stated before, inferential statistics involve using a representative sample to make judgments about a population. Let’s say that we wanted to determine the nature of the relationship between county and achievement scores among Texas students. We could select a representative sample of, say, 10,000 students to conduct our study. If we find that there is a statistically significant relationship in the sample, we could then generalize this to the entire population.

However, even the most representative sample is not going to be exactly the same as its population. Given this, there is always a chance that the things we find in a sample are anomalies and do not occur in the population that the sample represents. This error is referred to as sampling error.
Sampling Error

A formal definition of sampling error is as follows:

Sampling error occurs when random chance produces a sample statistic that is not equal to the population parameter it represents.

Due to sampling error, there is always a chance that we are making a mistake when rejecting or failing to reject our null hypothesis.

Remember that inferential procedures are used to determine which of the statistical hypotheses is true. This is done by rejecting or failing to reject the null hypothesis at the end of a procedure.
Sampling Distribution and Standard Error (SE)

• Video overview: https://www.youtube.com/watch?v=hvIDuEmWt2k
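In place of the video, a short simulation illustrates the key idea: the standard deviation of the sampling distribution of the mean (the standard error) equals σ/√n. This sketch assumes NumPy; the sample size and population SD are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 25, 10                       # arbitrary sample size and population SD

# Draw 100,000 samples of size n and take each sample's mean
sample_means = rng.normal(0, sigma, size=(100_000, n)).mean(axis=1)

print(round(sample_means.std(), 2))     # empirical SE, ≈ 2.0
print(sigma / np.sqrt(n))               # theoretical SE = σ/√n = 2.0
```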
Hypothesis Testing

• Null Hypothesis Significance Testing (NHST)
  • Testing p-values using statistical significance tests

• Effect Size
  • Measures the magnitude of the effect (e.g., Cohen’s d)
Null Hypothesis Significance Testing

• Statistical significance testing answers the following question:
  • Assuming the sample data came from a population in which the null hypothesis is exactly true, what is the probability of obtaining the sample statistic one got for one’s sample data with the given sample size? (Thompson, 1994)

• Alternatively:
  • Statistical significance testing is used to examine a statement about a relationship between two variables.
Hypothetical Example

• Is there a difference between the reading abilities of boys and girls?

• Null Hypothesis (H0): There is not a difference between the reading abilities of boys and girls.

• Alternative Hypothesis (H1): There is a difference between the reading abilities of boys and girls.

• Alternative hypotheses may be non-directional (above) or directional (e.g., boys have a higher reading ability than girls).
Testing the Hypothesis

• Use a sampling distribution to calculate the probability of a statistical outcome.

• pcalc = likelihood of the sample’s result
  • If pcalc < pcritical: reject H0
  • If pcalc ≥ pcritical: fail to reject H0
Level of Significance (pcrit)

• The alpha level (α) determines:
  • The probability at which you reject the null hypothesis
  • The probability of making a Type I error (typically .05 or .01)

                      True Outcome in Population
Observed Outcome      H0 is true          H0 is false
Reject H0             Type I error (α)    Correct decision
Fail to reject H0     Correct decision    Type II error (β)
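The meaning of α as the Type I error rate can be checked by simulation: when H0 is true, a test at α = .05 should falsely reject about 5% of the time. A sketch assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
alpha, trials, rejections = 0.05, 10_000, 0

for _ in range(trials):
    a = rng.normal(0, 1, 10)            # both groups drawn from the
    b = rng.normal(0, 1, 10)            # same population, so H0 is true
    if stats.ttest_ind(a, b).pvalue < alpha:
        rejections += 1

print(rejections / trials)              # ≈ 0.05 = the Type I error rate
```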
Example: Independent t-test

• Research Question: Is there a difference between the reading abilities of boys and girls?

• Hypotheses:
  • H0: There is not a difference between the reading abilities of boys and girls.
  • H1: There is a difference between the reading abilities of boys and girls.
Dataset

• Reading test scores (out of 100)

Boys   Girls
88     88
82     90
70     95
92     81
80     93
71     86
73     79
80     93
85     89
86     87
Significance Level

• α = .05, two-tailed test

• df = n1 + n2 − 2 = 10 + 10 − 2 = 18

• Use a t-table to determine tcrit:
  • tcrit = ±2.101
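Instead of a printed t-table, tcrit can be looked up with SciPy (assuming SciPy is available):

```python
from scipy import stats

alpha, df = 0.05, 18
t_crit = stats.t.ppf(1 - alpha / 2, df)  # upper-tail cutoff for a two-tailed test
print(round(t_crit, 3))                  # 2.101
```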
Decision Rules

• If |tcalc| > tcrit, then pcalc < pcrit
  • Reject H0

• If |tcalc| ≤ tcrit, then pcalc ≥ pcrit
  • Fail to reject H0

• Two-tailed rejection regions: p = .025 in each tail, beyond tcrit = −2.101 and +2.101
Computations

                          Boys     Girls
Frequency (n)             10       10
Sum (Σx)                  807      881
Mean (X̄)                  80.70    88.10
Variance (s²)             55.34    26.54
Standard deviation (s)    7.44     5.15
Computations cont.

• Pooled variance:

  $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{9(55.34) + 9(26.54)}{18} = 40.944$

• Standard error:

  $SE_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} = \sqrt{40.944 \times 0.2} = 2.862$
Computations cont.

• Compute tcalc:

  $t = \frac{\bar{X}_1 - \bar{X}_2}{SE_{\bar{X}_1 - \bar{X}_2}} = \frac{80.70 - 88.10}{2.862} = -2.586$

• Decision: |−2.586| > tcrit = 2.101, so reject H0. Girls scored statistically significantly higher on the reading test than boys did.
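The whole test can be reproduced in a few lines with SciPy's pooled-variance independent t-test (a sketch; assumes SciPy is installed):

```python
from scipy import stats

boys  = [88, 82, 70, 92, 80, 71, 73, 80, 85, 86]
girls = [88, 90, 95, 81, 93, 86, 79, 93, 89, 87]

t, p = stats.ttest_ind(boys, girls)   # equal-variance t-test, df = 18
print(round(t, 3), round(p, 3))       # t = -2.586, p ≈ 0.019 < .05 -> reject H0
```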
Confidence Intervals

• Sample means provide a point estimate of our population means. Due to sampling error, our sample estimates may not perfectly represent our populations of interest. It would be useful to have an interval estimate of our population means so we know a plausible range of values that our population means may fall within.

• 95% confidence intervals do this.

• Can help reinforce the results of the significance test.

  $CI_{95} = (\bar{X}_1 - \bar{X}_2) \pm t_{crit}(SE) = -7.4 \pm 2.101(2.862) = [-13.412, -1.387]$
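The interval can be computed directly from the quantities above; a Python sketch assuming NumPy and SciPy:

```python
import numpy as np
from scipy import stats

boys  = [88, 82, 70, 92, 80, 71, 73, 80, 85, 86]
girls = [88, 90, 95, 81, 93, 86, 79, 93, 89, 87]

diff = np.mean(boys) - np.mean(girls)          # -7.4
n1, n2 = len(boys), len(girls)
sp2 = ((n1 - 1) * np.var(boys, ddof=1) +
       (n2 - 1) * np.var(girls, ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))          # 2.862
t_crit = stats.t.ppf(0.975, n1 + n2 - 2)       # 2.101
print(diff - t_crit * se, diff + t_crit * se)  # ≈ [-13.41, -1.39]
```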
Statistical Significance vs. Importance of Effect

• Does finding that p < .05 mean the finding is relevant to the real world?
  • Not necessarily…
  • https://www.youtube.com/watch?v=5OL1RqHrZQ8

• Effect size provides a measure of the magnitude of an effect
  • Practical significance
  • Cohen’s d, η², and R² are all types of effect sizes
Cohen’s d

• Equation (standardized mean difference, using the pooled standard deviation):

  $d = \frac{\bar{X}_1 - \bar{X}_2}{s_p} = \frac{80.70 - 88.10}{\sqrt{40.944}} = -1.16$

• Guidelines (in absolute value):
  • d = .2 = small
  • d = .5 = moderate
  • d = .8 = large

• Not only is our effect statistically significant, but the effect size is large.
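Cohen's d for the example, computed from the pooled variance (a sketch assuming NumPy):

```python
import numpy as np

boys  = [88, 82, 70, 92, 80, 71, 73, 80, 85, 86]
girls = [88, 90, 95, 81, 93, 86, 79, 93, 89, 87]

n1, n2 = len(boys), len(girls)
sp2 = ((n1 - 1) * np.var(boys, ddof=1) +
       (n2 - 1) * np.var(girls, ddof=1)) / (n1 + n2 - 2)
d = (np.mean(boys) - np.mean(girls)) / np.sqrt(sp2)
print(round(d, 2))   # -1.16: a large effect by Cohen's guidelines
```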