```
Name                         Period          Date            Number
Chi Square Test
m&m Statistical Analysis
Have you ever wondered why the package of m&m’s you just bought never seems to have enough of your
favorite color? Or why you always seem to get the package of mostly brown m&m’s? What’s going on at the
Mars Company? Is the number of each color of m&m’s in a package really different from one package to the
next, or does the Mars Company do something to ensure that each package gets the correct number of each color of
m&m?
Do not fear, today you will learn a statistical test that allows you to answer these important questions. This test is
the Chi Square statistical test. This test allows you to measure how far observed data deviate from expected data.
If the difference between observed (collected) data and expected data is large, a large chi square value results, while
a small difference results in a small chi square value. The formula for Chi Square is
Χ² = Σ [ (O – E)² / E ]
O represents the observed count, E represents the expected count, and Σ represents summation over all of the
categories (here, the six colors).
The Mars Company Web Page states that each package of m&m’s should contain specific percentages of
each color, representative of the proportion of each color the factory puts out.
Color     m&m’s   Peanut m&m’s
Brown     13%     20%
Green     16%     10%
Orange    20%     10%
Blue      24%     20%
Red       13%     20%
Yellow    14%     20%
One way that we could determine if the Mars Co. is true to its word is to sample a package of m&m’s and do a chi
square test on the data. This will allow us to determine if any differences between our observed (counted) data and
our expected data are simply due to chance sampling error or to some other reason. We will begin with the null
hypothesis, which states that any deviation between observed and expected data is due to random chance
alone. We will accept this null hypothesis as long as the deviation is not too great. This lab will allow you to
learn how to run a chi square analysis, and when to accept or reject the null hypothesis.
What would the null hypothesis be with regard to m&m’s?
To test this hypothesis we will need to collect data and calculate the Χ² statistic.
Procedure:
1. Wash your hands, as you will be handling food that you may want to munch on later.
2. Separate the m&m’s on your team’s plate by color and count the number of each color of m&m you have.
3. Record your observed counts under each category on the master candy list on the chalkboard, and in your
individual data table.
4. Sum the observed counts for each color category.
5. Determine the total number of m&m’s in your group’s data set.
6. Calculate the expected numbers of m&m’s in each color category using the information published by the Mars
Company.
7. Record the total expected counts in the individual data table.
8. Use this data to determine the probability that the difference between observed and expected values occurred by
chance. To do this, statisticians have constructed tables of chi-square values with their associated probabilities.
These probabilities also depend on the “degrees of freedom,” which is always one less than the number of
classes in the experiment (here, the number of color categories). Statisticians have generally agreed upon a
probability value of 0.05 as the lowest acceptable value derived from the Χ² test. This number indicates that if the
experiment were repeated many times, the deviations from expected due to chance alone would be as large as or
larger than those observed about 5% of the time. Probabilities equal to or greater than 0.05 are considered to
support the hypothesis, while probabilities lower than 0.05 do not support the hypothesis. If the probability is
unacceptable, the experimental design (including the method of data collection), the hypothesis, or both need to
be reexamined.
For the chi square tests you will conduct, use the following distribution table.
CHI-SQUARE DISTRIBUTION TABLE

                                                 Probability (p)
            |------------------------------ Accept Hypothesis -------------------------------|Reject Hypothesis
Degrees of
Freedom     0.95     0.90     0.80     0.70     0.50     0.30     0.20     0.10     0.05     0.01     0.001
1           0.004    0.02     0.06     0.15     0.46     1.07     1.64     2.71     3.84     6.64     10.83
2           0.10     0.21     0.45     0.71     1.39     2.41     3.22     4.60     5.99     9.21     13.82
3           0.35     0.58     1.01     1.42     2.37     3.66     4.64     6.25     7.82     11.34    16.27
4           0.71     1.06     1.65     2.20     3.36     4.88     5.99     7.78     9.49     13.38    18.47
5           1.14     1.61     2.34     3.00     4.35     6.06     7.29     9.24     11.07    15.09    20.52
6           1.63     2.20     3.07     3.83     5.35     7.23     8.56     10.64    12.59    16.81    22.46
7           2.17     2.83     3.82     4.67     6.35     8.38     9.80     12.02    14.07    18.48    24.32
8           2.73     3.49     4.59     5.53     7.34     9.52     11.03    13.36    15.51    20.09    26.12
9           3.32     4.17     5.38     6.39     8.34     10.66    12.24    14.68    16.92    21.67    27.88
10          3.94     4.86     6.18     7.27     9.34     11.78    13.44    15.99    18.31    23.21    29.59
Individual Data Table

Colors    Observed (O)   Expected (E)   O – E   (O – E)²   (O – E)²/E
Brown
Green
Orange
Blue
Red
Yellow
Totals                                                     Χ² =
1. This sentence represents a summary of your results.
With ______ degrees of freedom I have a Χ² value of ______, which has a p value between ______ and ______.
I therefore ______ the null hypothesis.
2. Use this data to explain possible reasons for accepting/rejecting the null hypothesis. Recall in your explanation that
you only have a single bag of m&m’s which is essentially a random sample of what is being pumped out of the m & m
factory.
On the Smartboard, consolidate your individual data into data for the whole class, and
conduct another chi square test.
Class Data Table

Colors    Observed (O)   Expected (E)   O – E   (O – E)²   (O – E)²/E
Brown
Green
Orange
Blue
Red
Yellow
Totals                                                     Χ² =
Use this data to answer the following questions
3. This sentence represents a summary of your results.
With ______ degrees of freedom I have a Χ² value of ______, which has a p value between ______ and ______.
I therefore ______ the null hypothesis.
4. After completing two chi squared tests, do you believe the individual bags of m & m’s have an accurate representation
of colors as purported by the Mars Co. and what they are producing in their factory? Explain how your chi squared
data might help support your claim.
Another group of high school students accumulated the following data for peanut m&m’s.
Use the column for peanut m&m’s on the front page.
Class Data Table for Peanut m&m’s

Colors    Observed (O)   Expected (E)   O – E   (O – E)²   (O – E)²/E
Brown     1520
Green     774
Orange    815
Blue      1620
Red       1603
Yellow    1550
Totals                                                     Χ² =
5. This sentence represents a summary of your results.
With ______ degrees of freedom I have a Χ² value of ______, which has a p value between ______ and ______.
I therefore ______ the null hypothesis.
```
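If you would like to check your hand calculations, the procedure can be sketched in a few lines of Python. This is only a minimal sketch, not part of the original lab. It uses the Peanut m&m’s class counts and published percentages from the worksheet, plus the 5 degrees of freedom row of the distribution table. The function names (`expected_counts`, `chi_square`, `p_bracket`) are made up for illustration.

```python
# Published Peanut m&m's color percentages, copied from the worksheet.
PEANUT_PERCENT = {"Brown": 20, "Green": 10, "Orange": 10,
                  "Blue": 20, "Red": 20, "Yellow": 20}

# Observed class counts for Peanut m&m's, copied from the worksheet.
PEANUT_OBSERVED = {"Brown": 1520, "Green": 774, "Orange": 815,
                   "Blue": 1620, "Red": 1603, "Yellow": 1550}

# Row for 5 degrees of freedom from the worksheet's distribution table,
# as (probability, chi-square value) pairs in table order.
DF5_ROW = [(0.95, 1.14), (0.90, 1.61), (0.80, 2.34), (0.70, 3.00),
           (0.50, 4.35), (0.30, 6.06), (0.20, 7.29), (0.10, 9.24),
           (0.05, 11.07), (0.01, 15.09), (0.001, 20.52)]


def expected_counts(observed, percent):
    """Expected count per color = total candies x published percentage."""
    total = sum(observed.values())
    return {color: total * percent[color] / 100 for color in observed}


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over every color category."""
    return sum((observed[c] - expected[c]) ** 2 / expected[c]
               for c in observed)


def p_bracket(chi2, row):
    """Return the two table probabilities that the chi-square value falls
    between. None at either end means the value is off the table."""
    upper = None
    for p, critical in row:
        if chi2 < critical:
            return (upper, p)
        upper = p
    return (upper, None)


expected = expected_counts(PEANUT_OBSERVED, PEANUT_PERCENT)
chi2 = chi_square(PEANUT_OBSERVED, expected)
df = len(PEANUT_OBSERVED) - 1          # one less than the number of colors
bracket = p_bracket(chi2, DF5_ROW)

# Worksheet rule: a probability of 0.05 or greater supports the hypothesis.
# In the df = 5 row that corresponds to a chi-square value of at most 11.07.
decision = "accept" if chi2 <= 11.07 else "reject"

print(f"df = {df}, chi-square = {chi2:.2f}, p between {bracket}, {decision}")
```

To reuse the sketch for your own bag or for the class data, swap in your observed counts and the plain m&m’s percentages. The table row only needs to change if the number of color categories changes.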