Nathan Schafer
Math 1040
E-portfolio post
4-27-2015
During this semester we conducted a project based on Skittles. Each student purchased a small
bag of Skittles. We then counted each candy by color and compiled the data as a class. We used these
numbers to learn different concepts during the semester. Part 1 of the project was
collecting the data.
Part 2 (Individual)
1. Write a paragraph discussing your findings about the variable “Total candies in each bag”.
Address the following in your writing: What is the shape of the distribution? Do the graphs
reflect what you expected to see? Does the overall data collected by the whole class agree with
your own data from a single bag of candies? Include the number of candies from your own bag
and the total number of bags in the class sample in your discussion.
The total candies in each bag surprised me. When I looked at the graphs, I didn’t expect
to see the distribution skewed left; I expected to see a more symmetric distribution. The overall class
data is pretty close to my individual bag. My bag contained 63 candies, while the median for the class
was 60 and the mean was 59, across the 49 bags in the class sample.
2. In a half page, explain the difference between categorical and quantitative data. Address the
following in your writing: What types of graphs make sense and what types of graphs do not
make sense for categorical data? For quantitative data? Explain why. What types of
calculations make sense and what types of calculations do not make sense for categorical
data? For quantitative data? Explain why.
Categorical data records qualities that can be observed but not measured, for example the colors in
a painting. Quantitative data, by contrast, is a set of numbers that can be counted or measured, for
example the heights or weights of the students in our class. Bar graphs and pie charts are the best
graphs to use for categorical data, because they show how large each category is in comparison to the
whole population of observations. Histograms, stem plots, and box plots are better suited to quantitative
data, because they show how often values occur and how the data is spread out.
Part 2 (Group)
Math 1040 Class Skittles Proportions
Color                                    Count   Proportion of Total
Red Skittles                               564   0.199
Orange Skittles                            564   0.199
Green Skittles                             566   0.199
Purple Skittles                            559   0.197
Yellow Skittles                            586   0.206
Total number of Skittles in the class     2839   1.000
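Each proportion in the table is a color's count divided by the class total. A short Python sketch of that arithmetic, using the counts from the table (the names CLASS_COUNTS and color_proportions are only illustrative):

```python
# Color counts copied from the class table.
CLASS_COUNTS = {
    "Red": 564,
    "Orange": 564,
    "Green": 566,
    "Purple": 559,
    "Yellow": 586,
}


def color_proportions(counts):
    """Return each color's share of the total, plus the total itself."""
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}, total


proportions, total = color_proportions(CLASS_COUNTS)
for color, p in proportions.items():
    print(f"{color:<7} {CLASS_COUNTS[color]:>5} {p:.3f}")
print(f"{'Total':<7} {total:>5} {sum(proportions.values()):.3f}")
```

Rounded to three decimals, the shares reproduce the table, and they sum to 1.000.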
Does the class data represent a random sample?
Yes, the class data does represent a random sample. Each student was asked to buy their own bag of
Skittles, so not every bag of Skittles in the region had an equal chance of being selected. However, the
distribution of Skittles from the central plant/warehouse was most likely random. The Skittles company
most likely does not count colors as it fills the bags and simply fills them by weight. Assuming students
did not make any biased decisions about which bag to grab off the shelf, every bag produced had an
equal chance of being shipped to any location in the country and being selected at random by a
student in the class.
What would the population be?
In this study, the sample is the class data. Since not everyone in the class is currently living in the
same state, the population would be all 2.17-ounce Skittles bags in the United States. Because there are
also manufacturing plants operating overseas, the population can only reasonably be extended to the
United States distribution network.
Part 3 (Individual)
These charts compare the total count of Skittles for the entire class against the total of my
individual bag. In my initial observation of my bag, I expected to see more uniformity among the different
colors. I thought the colors might be off by one or two; I didn’t expect to see a five-candy difference. After
seeing the totals for the class, I was completely shocked. Because of what I saw in my individual bag, I
expected a wide variation in the class totals. Instead, the color counts for the class are much closer to
one another than those in my individual bag, which is not what I expected at all.
Part 3 (Group)
1. Using the total number of candies in each bag in our class sample, compute the
following measures for the variable “Total candies in each bag”:
(a) mean number of candies per bag
The mean number of candies per bag is 59.1 candies.
(b) standard deviation of the number of candies per bag
The standard deviation is 6.4 candies per bag.
(c) 5-number summary for the number of candies per bag
The 5-number summary (minimum, Q1, median, Q3, maximum) is 34, 58, 60, 62, 71.
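The three measures above can be sketched in Python. This assumes the sample standard deviation (divisor n - 1) and the median-of-halves quartile convention common in introductory courses, where the overall median is left out of both halves when the number of values is odd. Python's statistics.quantiles interpolates by default, so its quartiles can differ slightly. The function names are illustrative.

```python
import statistics


def median(values):
    """Median of an already sorted list."""
    n = len(values)
    mid = n // 2
    if n % 2:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves convention.

    For an odd number of values, the overall median is excluded from both halves.
    """
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]
    upper = data[(n + 1) // 2 :]
    return data[0], median(lower), median(data), median(upper), data[-1]


def describe(totals):
    """Mean, sample standard deviation (n - 1 divisor), and 5-number summary.

    Needs at least two values.
    """
    return {
        "mean": statistics.mean(totals),
        "sd": statistics.stdev(totals),
        "five_number": five_number_summary(totals),
    }
```

The 49 individual bag totals are not listed in this post, so the sketch is shown without them. Given those totals, the mean and standard deviation should match the values above up to rounding, while the quartiles depend on the convention used.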
Part 4 (Individual)
The purpose of a confidence interval is to give a range of plausible values for a population parameter,
such as the population mean. For example, if we take 100 responses from a population of 250,000 people,
we get a sample mean. If we take another 100 responses, we get a slightly different mean, and so on for
each repetition. A confidence interval uses one sample to estimate the range in which the population
mean is likely to lie: if the sampling were repeated many times, about 95% of the 95% confidence intervals
built this way would contain the true population mean. The confidence interval is an estimate of the
population parameter.
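The repeated-sampling idea can be checked with a small simulation. This Python sketch assumes a made-up population of 250,000 normally distributed values (the mean of 60 and SD of 6 are arbitrary choices, not survey data), draws 100 responses at a time, and counts how often an approximate large-sample 95% interval, sample mean +/- z * s / sqrt(n), contains the true population mean.

```python
import random
import statistics

# Hypothetical population: 250,000 values with mean about 60 and SD about 6.
# These parameters are made up for illustration only.
rng = random.Random(1040)
population = [rng.gauss(60, 6) for _ in range(250_000)]
true_mean = statistics.fmean(population)

Z_95 = statistics.NormalDist().inv_cdf(0.975)  # about 1.96


def sample_interval(sample):
    """Approximate 95% interval: sample mean +/- z * s / sqrt(n)."""
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return m - Z_95 * se, m + Z_95 * se


def coverage(trials=1000, n=100):
    """Fraction of repeated samples whose interval contains the true mean."""
    hits = 0
    for _ in range(trials):
        low, high = sample_interval(rng.sample(population, n))
        hits += low <= true_mean <= high
    return hits / trials


print(f"about {coverage():.0%} of the intervals contain {true_mean:.2f}")
```

Each sample gives a different mean and a different interval, but the share of intervals that capture the fixed population mean stays close to 95%.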
Part 4 (Group)
99% Confidence Interval estimate for the population proportion of yellow candies
X = 586
n = 2839
Z-value for a 99% CI = 2.576
p = 586/2839 = 0.206
Standard error = sqrt(p(1 - p)/n) = 0.007596 (computed with p before rounding)
0.206 +/- 2.576 * (0.007596)
0.206 +/- 0.01957
99% Confidence Interval Estimate: (0.186, 0.226)
Confidence interval estimates for a population proportion use sample data to estimate, with the
specified degree of confidence, the proportion of a population that has a given characteristic. In
relation to the Skittles, we are 99% confident that the population proportion of yellow Skittles is
between 0.186 and 0.226.
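The interval can be reproduced with the normal-approximation (Wald) formula, p +/- z * sqrt(p(1 - p)/n). This Python sketch takes the z-value from the standard library's NormalDist instead of a table; the function name proportion_ci is illustrative.

```python
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.99):
    """Normal-approximation (Wald) interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 2.576 for 99%
    se = (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - z * se, p_hat + z * se


low, high = proportion_ci(586, 2839)
print(f"99% CI for the yellow proportion: ({low:.3f}, {high:.3f})")
```

The sketch keeps p = 586/2839 (about 0.2064) unrounded, so its lower bound rounds to 0.187; the hand calculation rounded p to 0.206 before subtracting the margin, which gives 0.186.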
95% Confidence Interval estimate for the population mean number of skittles per bag
n= 49
Sx = 6.38
Sample mean= 59.15
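Standard error of the mean = Sx/√n = 6.38/√49 = 0.9114
The degrees of freedom are n - 1 = 48. A t-table was consulted using the row for 50 degrees of freedom,
which gives a t-value of 2.009. (The exact value for 48 degrees of freedom, about 2.011, gives the same
interval to two decimal places.)
59.15 +/- t*(0.9114) = 59.15 +/- 2.009*(0.9114) = 59.15 +/- 1.83
59.15 + 1.83 = 60.98
59.15 - 1.83 = 57.32
95% Confidence Interval Estimate: (57.32, 60.98)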
Confidence interval estimates of the population mean use sample data to construct an interval that,
with the specified degree of confidence, should contain the population mean. In this case, we are 95%
confident that the population mean number of Skittles per bag is between 57.32 and
60.98.
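The t interval above follows mean +/- t * s / sqrt(n). Python's standard library has no t quantile function, so this sketch takes the critical value as an argument, here the table value 2.009 used above. The function name mean_ci is illustrative.

```python
import math


def mean_ci(sample_mean, sample_sd, n, t_crit):
    """t interval for a population mean: mean +/- t * s / sqrt(n).

    t_crit is looked up in a t-table for the chosen confidence level.
    """
    se = sample_sd / math.sqrt(n)
    margin = t_crit * se
    return sample_mean - margin, sample_mean + margin


low, high = mean_ci(59.15, 6.38, 49, t_crit=2.009)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

If SciPy is available, scipy.stats.t.ppf(0.975, 48) returns the critical value for exactly 48 degrees of freedom (about 2.011), which gives the same interval to two decimal places.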
98% confidence interval estimate for the population standard deviation of the number of
candies per bag
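n = 49
$s = 6.378$
$s^2 = 40.679$
For a 98% interval, $\alpha = 0.02$, so the critical values $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$ have areas of 0.99 and 0.01 to their right.
On the chi-square distribution chart, the row for 50 degrees of freedom was used (the exact degrees of
freedom are n - 1 = 48). The value for $\chi^2_{1-\alpha/2}$ was 29.707, and the value for $\chi^2_{\alpha/2}$ was 76.154.
Each bound is $\sqrt{(n-1)s^2/\chi^2}$, with $n - 1 = 48$:
Lower bound: $\sqrt{48 \times 40.679 / 76.154} \approx 5.06$
Upper bound: $\sqrt{48 \times 40.679 / 29.707} \approx 8.11$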
Confidence interval estimates for the population standard deviation use the sample standard
deviation to generate an interval that, with the specified level of confidence, should contain the
population standard deviation of the number of candies per bag. In this case, we are 98%
confident that the population standard deviation is between 5.06 and 8.11 candies. The problem
with confidence interval estimates based on the sample standard deviation is that the sample
standard deviation may be quite different from the actual population standard deviation. This
chi-square method also assumes the population is approximately normal, which the left-skewed bag
totals call into question.
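The standard deviation interval follows sqrt((n - 1)s^2 / chi-square). This Python sketch takes the two chart values as arguments, since the standard library has no chi-square quantile function. Reproducing the bounds above requires n - 1 = 48 in the numerator together with the 50-degrees-of-freedom chart values, and the sketch does exactly that. The function name sd_ci is illustrative.

```python
import math


def sd_ci(sample_sd, n, chi2_upper, chi2_lower):
    """Chi-square interval for a population standard deviation.

    chi2_upper: critical value with area alpha/2 to its right (the larger value)
    chi2_lower: critical value with area 1 - alpha/2 to its right (the smaller value)
    """
    ss = (n - 1) * sample_sd ** 2  # (n - 1) * s^2
    return math.sqrt(ss / chi2_upper), math.sqrt(ss / chi2_lower)


low, high = sd_ci(6.378, 49, chi2_upper=76.154, chi2_lower=29.707)
print(f"98% CI for the SD: ({low:.2f}, {high:.2f})")
```

If SciPy is available, scipy.stats.chi2.ppf(0.01, 48) and scipy.stats.chi2.ppf(0.99, 48) would give the critical values for exactly 48 degrees of freedom, which would change the bounds somewhat.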
Reflection
I never really understood how vital statistics were to the world. It was great to learn all of the
different ways statistics are used in everyday life and science to help better our understanding of things.
During the semester we did a project with each person buying a small bag of Skittles. We then collected
all the individual students’ candy counts by color and compiled the data into a chart. We then used this
data for learning different types of statistical analysis. We made charts, graphs, and confidence intervals.
This project really changed the way I think about information put out by market research firms
and other groups that use statistical data. I learned that information can be misrepresented simply by
using the wrong medium to present it, for example using pictures that grow in size to represent growth
instead of a chart that shows the larger picture. Enlarged pictures can make it seem like more growth
has occurred than the data actually shows.
One of the other things I enjoyed seeing was how important the starting point of a chart’s y-axis is
for a true representation of data. You can make the data show large or small differences by choosing
the starting point of the y-axis. Once I learned this, it was amazing to see it in use in magazine ads for
major companies trying to show why they are better at something than others, when they are all very
similar. Company A simply presents the data in a way that makes you think company A is so much better.
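The y-axis effect is simple arithmetic: a bar’s drawn height is its value minus the axis starting point. This Python sketch (the function name apparent_ratio and the 550 starting point are illustrative choices) compares the yellow and purple counts from the class table.

```python
def apparent_ratio(taller, shorter, baseline=0):
    """How many times taller one bar looks than another when the y-axis starts at baseline."""
    return (taller - baseline) / (shorter - baseline)


# Yellow (586) vs. purple (559) counts from the class table.
print(round(apparent_ratio(586, 559), 3))                # axis starts at 0 -> 1.048
print(round(apparent_ratio(586, 559, baseline=550), 3))  # axis starts at 550 -> 4.0
```

With the axis at 0, the yellow bar is under 5% taller; starting the axis at 550 makes it look four times as tall, even though the counts are identical.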
This class has really changed the way I think about data, not only the way it is represented but
also how it is collected. There is so much room for bias in data collection and representation that you
really have to pay attention, or you can easily be misled by the information being presented
to you. I truly never knew how misleading data can be until I learned some of the concepts in this class.
This new information will help me analyze things differently in the future, not just in math but in everyday
life.