8.5 – Goodness of Fit Test
Suppose we want to make an inference about a group of data
(instead of just one or two). Or maybe we want to test counts of
categorical data. Chi-square (or χ²) testing allows us to make
such inferences.
There are several types of Chi-square tests, but in this section we
will focus on the goodness-of-fit test. The goodness-of-fit test is
used to test how well the category proportions in one sample
“match up” with the known population proportions stated in the
null hypothesis. The Chi-square goodness-of-fit test
extends inference on proportions to more than two proportions
by enabling us to determine whether a particular population
distribution has changed from a specified form.
The null and alternative hypotheses do not lend themselves to
symbols, so we will define them with words.
H0: _____ is the same as _____
Ha: _____ is different from _____
For each problem you will make a table with the following
headings:
Observed Counts (O)    Expected Counts (E)    (O − E)² / E
The sum of the third column is called the Chi-square test
statistic:
$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
Table D gives p-values for χ² with n – 1 degrees of freedom,
where n is the number of categories.
Chi-square distributions take only positive values and are
skewed right. As the degrees of freedom increase, the distribution
becomes more normal. The total area under the χ² curve is 1.
The assumptions for a Chi-square goodness-of-fit test are:
2. The sample must be an SRS from the populations of
interest.
3. The population size is at least ten times the size of the
sample.
4. All expected counts must be at least 5.
To find probabilities for χ² distributions:
TI-83/84 calculator uses the command χ²cdf found under the
DISTR menu.
R-Studio command is: 1 – pchisq(test statistic, df)
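For readers without a TI calculator or R at hand, the upper-tail probability (the area to the right of the test statistic, which is what 1 – pchisq(...) returns) can be sketched in plain Python. This is only an illustration: the function name chi2_upper_tail is made up for this note, and it uses a textbook series for the incomplete gamma function rather than whatever the calculator does internally. It is fine for classroom-sized statistics but loses accuracy for very large values.

```python
import math

def chi2_upper_tail(x, df):
    """Approximate P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, h = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, h):
    # P(a, h) = h^a * e^(-h) / Gamma(a + 1) * sum over n >= 0 of h^n / ((a+1)...(a+n))
    term = 1.0
    total = 1.0
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= h / (a + n)
        total += term
    lower = math.exp(a * math.log(h) - h - math.lgamma(a + 1.0)) * total
    return 1.0 - lower  # area to the right of x

# 3.841 is the familiar 5% critical value for 1 degree of freedom,
# so this should print a value close to 0.05.
print(round(chi2_upper_tail(3.841, 1), 3))
```

The call mirrors the R form: chi2_upper_tail(test statistic, df).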
Examples:
1. The Mixed-Up Nut Company advertises that their nut mix
contains (by weight) 40% cashews, 15% Brazil nuts, 20%
almonds and only 25% peanuts. The truth-in-advertising
investigators took a random sample (of size 50 lbs) of the nut
mix and found the distribution to be as follows:
Cashews    Brazil Nuts    Almonds    Peanuts
15 lb      11 lb          13 lb      11 lb
At the 1% level of significance, is the claim made by Mixed-Up
Nuts true?
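As a sketch of how the table works for this example, assume the sample weights pair up with the categories in the order the claim lists them: cashews 15 lb, Brazil nuts 11 lb, almonds 13 lb, peanuts 11 lb. The expected weights are then 50 times each claimed proportion, that is 20, 7.5, 10, and 12.5 lb, and the test statistic works out to roughly:

$$\chi^2 = \frac{(15-20)^2}{20} + \frac{(11-7.5)^2}{7.5} + \frac{(13-10)^2}{10} + \frac{(11-12.5)^2}{12.5} \approx 1.250 + 1.633 + 0.900 + 0.180 = 3.963$$

With 4 – 1 = 3 degrees of freedom, the area to the right of 3.963 is about 0.27, which is well above 0.01. Under this reading of the data we would fail to reject H0: the sample does not give convincing evidence against the advertised mix.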
Summary of how we decide what to do in Chapter 7 and
Chapter 8
Test 3 review
Questions 1-7 are 10 points each and questions 8-17 are 3 points each.
1) The one-sample t statistic for a test of H0: μ = 19 vs. Ha: μ
< 19 based on n = 27 observations has the test statistic
value of t = -2.58. What is the p-value for this test?
TI-83/84:
tcdf(-10^99, -2.58, 26) = .00794
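The tcdf call can be cross-checked without a calculator. The sketch below integrates the t density numerically with Simpson's rule; the function names t_pdf and t_lower_tail are invented for this illustration, and the method is a generic numerical approach, not a description of how the TI computes tcdf.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    c /= math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_lower_tail(t, df, steps=2000):
    """Approximate P(T < t): Simpson's rule on [0, |t|] plus symmetry about 0."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    s = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    middle = s * h / 3  # area between 0 and |t|
    return 0.5 + middle if t >= 0 else 0.5 - middle

# Question 1: Ha is mu < 19, so the p-value is the area to the left of t = -2.58.
p_value = t_lower_tail(-2.58, 26)
print(round(p_value, 4))
```

The printed value should land close to the calculator's .00794.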
2) Let x represent the hemoglobin count (HC) in grams per
100 milliliters of whole blood. The distribution for HC is
approximately normal with μ = 14 for healthy adult
women. Suppose that a female patient has taken 10
laboratory blood samples in the last year. The HC data
sent to her doctor is listed below. Test whether these data
cast doubt on the current belief. (use α = 0.05)
State the null and alternative hypotheses, give the p-value,
sketch the rejection region, and state your conclusion based on a 5%
significance level.
TI-84: use T-Test
I get
t = 1.428
P = .18
Sample mean = 15.4
Sample sd = 3.098
n = 10
df = n - 1 = 9, significance level .05
Rejection region: invT(.05/2, 9) = -2.26, so reject if t < -2.26 or t > 2.26
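The test statistic can be reproduced from the summary values, taking the hypotheses as H0: μ = 14 against Ha: μ ≠ 14 (an assumption based on the two-sided critical values used here):

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{15.4 - 14}{3.098/\sqrt{10}} \approx \frac{1.4}{0.980} \approx 1.43$$

The small difference from 1.428 comes from rounding the sample mean and standard deviation. Because 1.43 lies between -2.26 and 2.26, and P = .18 is greater than .05, these values lead to failing to reject H0.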
3) Based on information from a large insurance company, 67%
of all damage liability claims are made by single people under
the age of 25. A random sample of 51 claims showed that 44
were made by single people under the age of 25. Does this
indicate that the percentage of claims made by single people under the
age of 25 is higher than the percentage reported by the
large insurance company?
State your null and alternative hypotheses.
H0: p = .67, Ha: p > .67
Sketch the rejection region.
Calculate the test statistic. Plot this value in your sketch.
TI-83/84:
1-PropZTest gives z = 2.9273
Rejection region for significance level .05:
invNorm(.05) = -1.64; change the sign to 1.64 since Ha is >
Note: for Ha < keep it at -1.64
For Ha not equal, use invNorm(.05/2)
Determine the P-value for your test. P = .0017
State your conclusions clearly in complete sentences.
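The 1-PropZTest numbers can be checked by hand. Here is a minimal Python sketch of the upper-tailed one-proportion z test; the function name one_prop_z_test is made up for this note, and NormalDist comes from Python's standard statistics module.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(x, n, p0):
    """Upper-tailed one-proportion z test (Ha: p > p0)."""
    p_hat = x / n
    se = sqrt(p0 * (1 - p0) / n)       # standard error computed under H0
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z
    return z, p_value

z, p_value = one_prop_z_test(44, 51, 0.67)
z_crit = NormalDist().inv_cdf(1 - 0.05)  # right-tail critical value, about 1.645

print(round(z, 4), round(p_value, 4), z > z_crit)
```

Since z falls in the rejection region and the P-value is below .05, the data support Ha: p > .67.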
5) A 97% confidence interval for the mean of a population is
to be constructed and must be accurate to within 0.3
units. A preliminary sample standard deviation is 1.7. The
smallest sample size n that provides the desired accuracy
is
a) 167
*b) 152 (151.20)
c) 139
d) 143
e) 138
Critical value for a 97% CI (TI-84):
invNorm(.97 + (1 - .97)/2) = invNorm(.985) = 2.17
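The step from the critical value to the answer uses n = (z* · s / E)², rounded up. A short Python sketch of that calculation follows; the function name sample_size_for_mean is invented for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(conf_level, sd, margin):
    """Smallest n with z* * sd / sqrt(n) <= margin, plus the intermediate values."""
    z_star = NormalDist().inv_cdf(conf_level + (1 - conf_level) / 2)
    n_raw = (z_star * sd / margin) ** 2
    return z_star, n_raw, ceil(n_raw)  # always round up to a whole observation

z_star, n_raw, n = sample_size_for_mean(0.97, 1.7, 0.3)
print(round(z_star, 2), round(n_raw, 2), n)
```

The raw value is about 151.2, so the sample size is rounded up to 152, matching choice b.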
6) Mars Inc. claims that they produce M&Ms with the following
distributions:
Brown    Orange    Red    Green    Yellow    Blue
20%      5%        25%    15%      25%       10%
A bag of M&Ms was randomly selected from the grocery store
shelf, and the color counts were:
Brown    Orange    Red    Green    Yellow    Blue
25       13        23     15       21        14
Use the χ² goodness-of-fit test (α = 0.05) to determine whether the
color proportions of M&Ms are what is claimed. Select the [test statistic,
p-value, decision to reject (RH0) or failure to reject (FRH0)].
a) [χ² = 6.865, p-value = 0.983, RH0]
b) [χ² = 13.730, p-value = 0.009, RH0]
*c) [χ² = 13.730, p-value = 0.017, RH0]
d) [χ² = 13.730, p-value = 0.017, FRH0]
e) [χ² = 6.865, p-value = 0.983, FRH0]
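To see where 13.730 comes from, the observed/expected table can be built in a few lines of Python. The sketch assumes the pairing Brown 20% / 25, Orange 5% / 13, Red 25% / 23, Green 15% / 15, Yellow 25% / 21, Blue 10% / 14; the variable names are chosen only for this illustration.

```python
claimed = {"Brown": 0.20, "Orange": 0.05, "Red": 0.25,
           "Green": 0.15, "Yellow": 0.25, "Blue": 0.10}
observed = {"Brown": 25, "Orange": 13, "Red": 23,
            "Green": 15, "Yellow": 21, "Blue": 14}

n = sum(observed.values())  # total number of candies in the bag
rows = []
for color, p in claimed.items():
    o = observed[color]
    e = n * p  # expected count under H0
    rows.append((color, o, e, (o - e) ** 2 / e))

chi_sq = sum(r[3] for r in rows)            # sum of the third column
df = len(rows) - 1                          # categories minus one
all_expected_ok = all(r[2] >= 5 for r in rows)

for color, o, e, contrib in rows:
    print(f"{color:<7} O={o:>3}  E={e:6.2f}  (O-E)^2/E={contrib:7.3f}")
print(round(chi_sq, 3), df, all_expected_ok)
```

Orange contributes most of the statistic. The p-value then comes from χ²cdf(13.73, 1E99, 5) on the TI or 1 – pchisq(13.73, 5) in R; it is about 0.017, which is below 0.05, so H0 is rejected.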
7) True or false
a) In a hypothesis test, if the computed P-value is less
than 0.001, there is very strong evidence to
reject the null hypothesis.
b) In a hypothesis test, if the computed P-value is
greater than a specified level of significance,
then we fail to reject the null hypothesis.
c) What will reduce the width of a confidence interval?
Decrease variance.
8) A simple random sample of 49 8th graders at a large
suburban middle school indicated that 84% of them are
involved with some type of after school activity. Find the 98%
confidence interval that estimates the proportion of them
that are involved in an after school activity.
Ans: [0.718, 0.962]
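The interval follows from p̂ ± z* · sqrt(p̂(1 − p̂)/n). A small Python sketch, with proportion_ci as a made-up helper name:

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, conf_level):
    """Large-sample z interval for a single proportion."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(0.84, 49, 0.98)
print(round(low, 3), round(high, 3))
```

On the TI-84 the same interval comes from 1-PropZInt, which asks for a whole-number count x rather than p̂, so the calculator result can differ slightly from the answer above.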
9) Television viewers often express doubts about the
validity of certain commercials. In an attempt to
answer their critics, a large advertiser wants to
estimate the true proportion of consumers who
believe what is shown in commercials. Preliminary
studies indicate that about 40% of those surveyed
believe what is shown in commercials. What is the
minimum number of consumers that should be
sampled by the advertiser to be 99% confident that
their estimate will fall within 2% of the true
population proportion?
Ans: 3982
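The answer depends slightly on how the critical value is rounded. The sketch below applies n = z*² · p(1 − p) / E² with the table value z* = 2.576 and with an unrounded z*; the helper name sample_size_for_proportion is made up for this note.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(p_guess, margin, z_star):
    """Smallest n with z* * sqrt(p(1 - p)/n) <= margin."""
    return ceil(z_star ** 2 * p_guess * (1 - p_guess) / margin ** 2)

n_table = sample_size_for_proportion(0.40, 0.02, 2.576)    # table value of z*
z_exact = NormalDist().inv_cdf(0.995)                      # about 2.5758
n_exact = sample_size_for_proportion(0.40, 0.02, z_exact)

print(n_table, n_exact)  # 3982 with the table value, 3981 with the unrounded z*
```

The stated answer of 3982 corresponds to the table value 2.576.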
10) The gas mileage for a certain model of car is known to
have a standard deviation of 6 mi/gallon. A simple random
sample of 49 cars of this model is chosen and found to have a
mean gas mileage of 28.4 mi/gallon. Construct a 97%
confidence interval for the mean gas mileage for this car
model.
Ans: [26.540, 30.260]
(TI-84: STAT, TESTS, 7:ZInterval)
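By hand, the interval uses z* = invNorm(.985) ≈ 2.17, because σ is known:

$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 28.4 \pm 2.17 \cdot \frac{6}{\sqrt{49}} \approx 28.4 \pm 1.860$$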
11) An SRS of 28 students at UH gave an average height
of 5.9 feet and a standard deviation of .1 feet. Construct a
90% confidence interval for the mean height of students at
UH.
Ans: [5.868, 5.932]
TI-84: TInterval
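Here the standard deviation comes from the sample, so the critical value is from the t distribution with 28 - 1 = 27 degrees of freedom, t* = invT(.95, 27) ≈ 1.703:

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 5.9 \pm 1.703 \cdot \frac{0.1}{\sqrt{28}} \approx 5.9 \pm 0.032$$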