IQL Chapter 10 – t Tests, Two-Way Tables, and ANOVA
Statistical Reasoning for Everyday Life, Bennett, Briggs, Triola, 3rd Edition
10.1 t Distribution for Inferences about a Mean
LEARNING GOAL
Understand when it is appropriate to use the Student t distribution rather than the normal distribution for
constructing confidence intervals or conducting hypothesis tests for population means, and know how to make
proper use of the t distribution.
t-DISTRIBUTION FOR INFERENCES ABOUT A MEAN
Much of the work in preceding sections assumed the sampling distribution is normal, but a review of articles in
professional journals shows that professional statisticians rarely use the normal distribution for confidence
intervals and hypothesis tests in real applications.
A major reason for this is that the normal distribution requires that we know the population standard deviation
σ. Because we generally do not know σ, we must estimate it with the sample standard deviation s. Statisticians
therefore prefer an approach that does not require knowing σ.
Such is the case with the Student t distribution, or t distribution for short, which can be used when we do not
know the population standard deviation and either the sample size is greater than 30 or the population has a
normal distribution.
For comparison, with the normal distribution we estimated the margin of error, E, for a 95% confidence interval to be:
E ≈ 2s/√n
and we used this formula for the standard score of the sample mean:
z = (x̄ − μ)/(σ/√n)
Inferences about a Population Mean: Choosing between t and Normal Distributions
t distribution:
  Population standard deviation is not known and the population is normally distributed.
  or
  Population standard deviation is not known and the sample size is greater than 30.
Normal distribution:
  Population standard deviation is known and the population is normally distributed.
  or
  Population standard deviation is known and the sample size is greater than 30.
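The decision rule in the box above can be sketched as a small Python function. The function name, its parameters, and the "neither" result for cases the box does not cover are illustrative assumptions, not part of the textbook.

```python
def choose_distribution(sigma_known: bool, n: int, population_normal: bool) -> str:
    """Suggest a distribution for inferences about a mean, following the box above."""
    # Both options require a normally distributed population or a sample size above 30.
    if not (population_normal or n > 30):
        return "neither"  # assumption: the box does not address this case
    # Known sigma points to the normal distribution; unknown sigma points to t.
    return "normal" if sigma_known else "t"


print(choose_distribution(sigma_known=False, n=12, population_normal=True))
print(choose_distribution(sigma_known=True, n=50, population_normal=False))
```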
CONFIDENCE INTERVALS USING THE t DISTRIBUTION
To specify a confidence interval, we must first calculate the margin of error, E.
With a t distribution, the formula is
E = t · s/√n
Confidence Interval for a Population Mean (μ) with the t Distribution
If conditions require use of the t distribution (σ not known, and either n > 30 or the population is normally
distributed), the confidence interval for the true value of the population mean (μ) extends from the
sample mean minus the margin of error to the sample mean plus the margin of error. That is, the
confidence interval for the population mean is
x̄ − E < μ < x̄ + E
where the margin of error is
E = t · s/√n
and we find t from Table 10.1.
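As a minimal sketch of this calculation, the Python function below computes the interval from a sample and a t value looked up in a table. The function name and the sample data are hypothetical. The value t = 2.262 is the standard table entry for 95% confidence with 9 degrees of freedom (n − 1 = 9), and the small sample is assumed to come from a normally distributed population.

```python
import math
import statistics


def t_confidence_interval(sample, t_critical):
    """Return (low, high) for the population mean, using E = t * s / sqrt(n)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    margin = t_critical * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical sample of n = 10 measurements, assumed drawn from a normal population.
sample = [98, 102, 100, 97, 103, 101, 99, 100, 104, 96]
low, high = t_confidence_interval(sample, t_critical=2.262)
print(f"95% confidence interval: {low:.3f} to {high:.3f}")
```

Passing t in as a parameter mirrors the table lookup described above and keeps the sketch free of third-party libraries.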
Hypothesis Tests Using the t Distribution
When the t distribution is used for a hypothesis test of a claim about a population mean (H0: μ = claimed value),
the t value plays the role that the standard score z played when we studied these hypothesis tests in Section 9.2.
With the t distribution, instead of calculating the standard score z, we use the following formula to calculate t:
t = (x̄ − μ)/(s/√n)
where n is the sample size, x̄ is the sample mean, s is the sample standard deviation, and μ is the population mean
claimed by the null hypothesis.
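The t formula can be illustrated with a short Python sketch. The function name and the sample values are hypothetical. The result would still need to be compared with a critical value from a t table to reach a decision.

```python
import math
import statistics


def t_statistic(sample, claimed_mean):
    """Compute t = (x_bar - mu) / (s / sqrt(n)) for a claimed population mean mu."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return (x_bar - claimed_mean) / (s / math.sqrt(n))


# Hypothetical sample; the null hypothesis claims a population mean of 5.0.
sample = [5.2, 4.8, 5.5, 5.1, 4.9, 5.3, 5.0, 5.4]
t = t_statistic(sample, claimed_mean=5.0)
print(f"t = {t:.3f}")
```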
10.2 Hypothesis Testing with Two-Way Tables
LEARNING GOAL
Interpret and carry out hypothesis tests for independence of variables with data organized in two-way tables.
IDENTIFYING THE HYPOTHESES WITH TWO VARIABLES
Suppose that administrators at a college are concerned that there may be bias in the way degrees are awarded
to men and women in different departments. They therefore collect data on the number of degrees awarded to
men and women in different departments.
These data concern two variables: major and gender.
To test whether there is bias in the awarding of degrees, the administrators ask the following question:
Do the data suggest a relationship between the two variables?
Null and Alternative Hypotheses with Two Variables
The null hypothesis, H0, states that the variables are independent (there is no relationship between them).
The alternative hypothesis, Ha, states that there is a relationship between the two variables.
DISPLAYING THE DATA IN TWO-WAY TABLES
With the hypotheses identified, the next step in the hypothesis test is to examine the data set to see if it
supports rejecting or not rejecting the null hypothesis.
We can display the data efficiently with a two-way table (also called a contingency table), so named because it
displays two variables.
Two-Way Tables
A two-way table shows the relationship between two variables by listing one variable in the rows and the other
variable in the columns.
The entries in the table’s cells are called frequencies (or counts).
CARRYING OUT THE HYPOTHESIS TEST
The basic idea of the hypothesis test is the same as always—to decide whether the data provide enough
evidence to reject the null hypothesis.
For the case of a test with a two-way table, the specific steps are as follows:
As always, we start by assuming that the null hypothesis is true, meaning there is no relationship between the
two variables. In that case, we would expect the frequencies (the numbers in the individual cells) in the two-way
table to be those that would occur by pure chance.
Our first step, then, is to find a way to calculate the frequencies we would expect by chance.
We next compare the frequencies expected by chance to the observed frequencies from the sample, which are
the frequencies displayed in the table.
We do this by calculating something called the chi-square statistic, denoted χ² (pronounced “ky-square”), for the
sample data, which here plays a role similar to the role of the standard score z in the hypothesis tests we carried
out in Chapter 9 or the role of the t test statistic in Section 10.1.
Recall that for the hypothesis tests in Chapter 9, we made the decision about whether to reject or not reject the
null hypothesis by comparing the computed value of the standard score for the sample data to critical values
given in tables; similarly, in Section 10.1 we compared computed values of the t test statistic to values found in a
table.
Here, we do the same thing, except rather than using critical values for the standard score or t, we use
critical values for the chi-square statistic.
Definition
The expected frequencies in a two-way table are the frequencies we would expect by chance if there were no
relationship between the row and column variables.
Computing the Chi-Square Statistic
Finding the Frequencies Expected by Chance
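The details of this step did not survive in these notes. A common way to find the frequency expected by chance for a cell, assumed here, is (row total × column total) / grand total. The Python sketch below applies that formula to every cell. The function name and the counts are hypothetical.

```python
def expected_frequencies(observed):
    """Return the table of frequencies expected if rows and columns are independent."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Assumed formula: expected = row total * column total / grand total.
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2 x 2 table of observed counts (rows: one variable, columns: the other).
observed = [[30, 20],
            [10, 40]]
expected = expected_frequencies(observed)
for row in expected:
    print(row)
```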
Computing the Chi-Square Statistic
Finding the Chi-Square Statistic
Step 1. For each cell in the two-way table, identify O as the observed frequency and E as the expected
frequency if the null hypothesis is true (no relationship between the variables).
Step 2. Compute the value (O − E)²/E for each cell.
Step 3. Sum the values from step 2 to get the chi-square statistic:
χ² = Σ (O − E)²/E
The larger the value of χ², the greater the average difference between the observed and expected
frequencies in the cells.
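The three steps can be sketched in Python as follows. The function name and the counts are hypothetical, and the expected frequency of each cell is assumed to be (row total × column total) / grand total, the usual formula for a test of independence. The result would then be compared with a critical value for the chi-square statistic.

```python
def chi_square_statistic(observed):
    """Sum (O - E)^2 / E over all cells of a two-way table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Step 1: expected frequency under the null hypothesis (assumed formula).
            e = row_totals[i] * col_totals[j] / grand_total
            # Steps 2 and 3: compute (O - E)^2 / E and add it to the running sum.
            chi2 += (o - e) ** 2 / e
    return chi2


# Hypothetical 2 x 2 table of observed counts.
observed = [[25, 15],
            [15, 45]]
chi2 = chi_square_statistic(observed)
print(f"chi-square = {chi2:.4f}")
```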
10.3 Analysis of Variance (One-Way ANOVA)
LEARNING GOAL
Interpret and carry out hypothesis tests using the method of one-way analysis of variance.
HYPOTHESIS TESTING FOR VARIANCE
We follow the same general principles laid out for hypothesis testing in Section 9.1.
To begin with, we identify the null hypothesis: the mean Flesch scores for all three books (by Clancy, Rowling, and
Tolstoy) are equal. The alternative hypothesis, then, is that the three population means are not all equal.
The hypothesis test must tell us whether to reject or not reject the null hypothesis.
Rejecting the null hypothesis would allow us to conclude that the books really do have different mean Flesch
scores, as we expect.
Not rejecting the null hypothesis would tell us that the data do not provide sufficient evidence for concluding
that the mean Flesch scores are different.
We write the null hypothesis as
H0: μClancy = μRowling = μTolstoy
We need a hypothesis test that will allow us to determine whether three different populations have the same
mean.
The method we use is called analysis of variance, commonly abbreviated ANOVA.
The name comes from the formal statistic known as the variance of a set of sample values; as we noted briefly in
Section 4.3, variance is defined as the square of the sample standard deviation, or s².
Definition
Analysis of variance (ANOVA) is a method of testing the equality of three or more population means
by analyzing sample variances.
One-Way ANOVA for Testing H0: μ1 = μ2 = μ3 = . . .
Step 1. Enter sample data into a statistical software package, and use the software to determine the test
statistic (F = variance between samples / variance within samples) and the P-value of the test
statistic.
Step 2. Make a decision to reject or not reject the null hypothesis based on the P-value of the test
statistic:
• If the P-value is less than or equal to the significance level, reject the null hypothesis of equal
means and conclude that at least one of the means is different from the others.
• If the P-value is greater than the significance level, do not reject the null hypothesis of
equal means.
This method is valid as long as the following requirements are met: The populations have distributions that are
approximately normal with the same variance, and the samples from each population are simple random
samples that are independent of each other.
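To show what the software computes in Step 1, here is a minimal Python sketch of the F test statistic using the usual one-way ANOVA definitions: the variance between samples is the between-group sum of squares divided by (k − 1), and the variance within samples is the within-group sum of squares divided by (N − k), where k is the number of samples and N is the total number of values. The function name and the data are hypothetical, and the sketch does not compute the P-value, which a statistics package would supply (for example, `scipy.stats.f_oneway` returns both F and the P-value).

```python
import statistics


def one_way_anova_f(*samples):
    """Return F = variance between samples / variance within samples."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    # Between-sample sum of squares: spread of the sample means around the grand mean.
    ss_between = sum(len(s) * (statistics.mean(s) - grand_mean) ** 2 for s in samples)
    # Within-sample sum of squares: spread of values around their own sample mean.
    ss_within = sum(sum((x - statistics.mean(s)) ** 2 for x in s) for s in samples)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical scores for three groups (not the textbook's Flesch data).
f_value = one_way_anova_f([1, 2, 3], [2, 3, 4], [5, 6, 7])
print(f"F = {f_value:.2f}")
```

A large F means the sample means differ by more than the variation within the samples would suggest, which corresponds to a small P-value in Step 2.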