Abby White
Intro to Statistics
Math 1040 Skittles Term Project
This project is a way for the Math 1040 students to apply the concepts learned in class to
practical data and practice their skills. Each student had their own bag of Skittles, which they
sorted by color, counting the number of candies in each of the color categories Red, Green, Orange,
Purple, and Yellow. The data from both statistics classes were then compiled into a spreadsheet.
The students used this data to determine the proportion of each color within the sample and
displayed the number of candies in each color with a Pie Chart and a Pareto Chart. The students
then observe the data and reflect on how the two graphs compare. Next, students organize and
display quantitative data: they calculate the mean, standard deviation, and 5-number summary,
and use the results to create a frequency histogram and a boxplot. They then determine the shape
of the data. The next part of the project is about confidence intervals and hypothesis tests.
First, students explain the general purpose and meaning of a confidence interval and then solve
three problems dealing with a proportion, a mean, and a standard deviation. In the second part of
this section, students explain the purpose and meaning of hypothesis tests and then solve two
hypothesis test problems, showing their work and interpreting the results. The last part of this
section is the reflection: students draw conclusions from the interval estimates and hypothesis
tests and discuss whether the samples meet the requirements. Students also reflect on possible
errors that could be made by using this data and suggest improvements.
[Figure: Pie chart. Title: Proportion of each color of skittles in the sample. Red 18%, Orange 20%, Yellow 21%, Green 21%, Purple 20%.]
[Figure: Pareto chart. Title: The number of skittles in each color category. Green 648, Yellow 629, Purple 612, Orange 607, Red 548.]
The two graphs reflect what I expected to see: for the most part, the proportions of the
different colors of Skittles are very close to one another. The Pie chart does a good job of
showing how close they are and also displays the percentages, which is very helpful. The Pareto
chart makes it very easy to see which color has the most candies and which has the least, as well
as to compare all the colors to each other. The distribution of the Skittles is what I would expect
to see in a single bag of candies. In my personal bag of Skittles the proportions of some colors
are quite similar to the sample proportions. For example, the sample proportion of orange
Skittles is .199 and the proportion of orange Skittles in my individual bag is .175, which is
fairly similar. For other colors, however, I found a larger difference. For example, the sample
proportion of green Skittles is .213 and the proportion from my bag is .143, a difference of 7
percentage points, which is not a huge difference, but I expected them to be closer.
Number of Red Candies: 9
Number of Orange Candies: 11
Number of Yellow Candies: 17
Number of Green Candies: 9
Number of Purple Candies: 17
This is the data from my personal bag of Skittles.
The proportion of Skittles by color in the sample is:
Red=0.180
Orange=0.199
Yellow=0.207
Green=0.213
Purple=0.201
The proportion of Skittles by color in my personal bag is:
Red=0.143
Orange=0.175
Yellow=0.270
Green=0.143
Purple=0.270
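The proportions listed above can be reproduced from the raw counts. The following is a minimal Python sketch, assuming the class totals are the five values shown on the Pareto chart (Green 648, Yellow 629, Purple 612, Orange 607, Red 548). The variable and function names are illustrative only and are not part of the original project.

```python
# Counts per color. The sample counts are assumed to be the five bar values
# from the Pareto chart; the personal counts come from the personal-bag data.
sample_counts = {"Red": 548, "Orange": 607, "Yellow": 629, "Green": 648, "Purple": 612}
personal_counts = {"Red": 9, "Orange": 11, "Yellow": 17, "Green": 9, "Purple": 17}


def proportions(counts):
    """Return each category's share of the total, rounded to 3 decimals."""
    total = sum(counts.values())
    return {color: round(n / total, 3) for color, n in counts.items()}


sample_p = proportions(sample_counts)
personal_p = proportions(personal_counts)

# Print the two proportions side by side with their difference.
for color in sample_counts:
    diff = personal_p[color] - sample_p[color]
    print(f"{color:<7} sample={sample_p[color]:.3f} bag={personal_p[color]:.3f} diff={diff:+.3f}")
```

Under these assumed counts the class sample holds 3044 candies and the personal bag holds 63.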
Organizing and Displaying Quantitative Data
[Figure: frequency histogram. Title: Frequency of skittles per bag of candy.]
[Figure: boxplot. Title: Distribution of the number of skittles per bag.]
The total number of bags in the sample was 50. The data has an outlier, a bag with 86
Skittles, which skews the data slightly to the right. The graphs reflect what I was expecting,
which was a high number of bags with 60-65 candies and a few bags with slightly higher and lower
amounts. However, the outlier is much higher than I expected: it is nearly double the amount in
the bag with the lowest count, which was 44. My personal bag had 63 candies, which fits into the
60-65 candies per bag range.
For the sample of data:
Mean = 60.88
Standard Deviation = 5.509
The 5-number Summary:
Minimum = 44
Q1 = 60
Median = 61
Q3 = 63
Maximum = 86
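One common way to check whether a value counts as an outlier is the 1.5 x IQR rule. The project does not say which rule was used, so that rule is an assumption here. The sketch below applies it to the reported quartiles and to the smallest and largest bags (44 and 86 candies).

```python
# Reported summary values for the number of candies per bag.
q1, q3 = 60, 63
smallest, largest = 44, 86

# 1.5 * IQR fences (an assumed rule; the project does not name the one it used).
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
for value in (smallest, largest):
    flagged = value < lower_fence or value > upper_fence
    print(f"{value}: {'outside' if flagged else 'inside'} the fences")
```

With these quartiles the fences are 55.5 and 67.5, so the 86-candy bag lies well above the upper fence. Under this assumed rule the 44-candy bag would also fall below the lower fence.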
Reflection
Categorical data consists of names or labels that do not represent counts or measurements.
Quantitative data, on the other hand, consists of numbers representing counts or measurements.
The types of graphs that make sense for categorical data are Pie charts and Pareto charts. Pie
charts are good for categorical data because you can clearly distinguish and compare the
different categories. Pareto charts allow one axis to represent the categories and the other to
represent their frequencies. The types of graphs that make sense for quantitative data are
histograms and boxplots. Histograms suit numerical data because both axes of the graph are
numeric, and boxplots likewise are drawn along a number line. The calculations that make sense
for categorical data are proportions. The calculations that make sense for quantitative data are
the mean, standard deviation, and 5-number summary. This is because quantitative data consists of
numbers that can be averaged and ordered, whereas categorical data consists of labels that can
only be tallied by category.
Confidence Interval Estimates
A confidence interval is a range of values that is likely, at a specified level of confidence, to
contain the population parameter. Confidence intervals are used in statistics to estimate a
parameter; the range of values reflects a margin of error, because we cannot be 100% certain of the
true value. How confident you want to be also affects the margin of error. For example, if you want
to be 99% confident, the range of values will be wider than if you only want to be 95% confident.
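The effect of the confidence level on the width of an interval can be illustrated with a short Python sketch. It assumes the usual normal-approximation interval for a proportion and uses the class sample's purple count (assumed to be 612 out of 3044 candies, from the chart values). The function name `proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Assumed counts: 612 purple candies out of 3044 in the class sample.
purple, total = 612, 3044
ci95 = proportion_ci(purple, total, 0.95)
ci99 = proportion_ci(purple, total, 0.99)
for level, (low, high) in ((95, ci95), (99, ci99)):
    print(f"{level}% CI: {low:.4f} to {high:.4f} (width {high - low:.4f})")
```

The 99% interval comes out wider than the 95% interval because its critical value is larger (about 2.576 instead of about 1.96).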
For the first problem I constructed a 95% confidence interval estimate for the true
proportion of purple candies. I am 95% confident that between 18.68% and 21.52% of all Skittles
are purple. If I wanted to be even more certain of the proportion of purple candies, I could
construct a 99% confidence interval, which would give a wider range of proportions. This is shown
in the next problem, where I estimated the true mean number of candies per bag. We cannot be 100%
sure what the true mean number of candies in a bag of Skittles is, because we only have a sample
mean, not a population mean. However, based on our sample we can be 99% confident that the true
mean number of Skittles per bag is between 58.79 and 62.97. In the last confidence interval
problem we constructed a 98% confidence interval estimate for the standard deviation of the number
of candies per bag. We are 98% confident that the true standard deviation of the number of candies
per bag is between 4.419 and 7.075 candies.
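The intervals for the mean and the standard deviation can be sketched from the reported summary statistics (n = 50, mean 60.88, standard deviation 5.509). The critical values below are copied from standard t and chi-square tables rather than computed, and they are assumptions. In particular, the chi-square values come from the df = 50 table row, which appears to reproduce the reported interval; exact values for 49 degrees of freedom would give a slightly different result.

```python
from math import sqrt

# Reported sample statistics for the number of candies per bag.
n, mean, sd = 50, 60.88, 5.509

# Critical values copied from standard tables (assumptions, not computed here):
T_CRIT_99 = 2.680        # t for 99% confidence, 49 degrees of freedom
CHI2_RIGHT_98 = 76.154   # chi-square, right-tail area 0.01, table row df = 50
CHI2_LEFT_98 = 29.707    # chi-square, right-tail area 0.99, table row df = 50

# 99% interval for the mean: mean +/- t * s / sqrt(n)
margin = T_CRIT_99 * sd / sqrt(n)
mean_ci = (mean - margin, mean + margin)

# 98% interval for the standard deviation:
# sqrt((n - 1) s^2 / chi2_right) < sigma < sqrt((n - 1) s^2 / chi2_left)
scaled_var = (n - 1) * sd ** 2
sd_ci = (sqrt(scaled_var / CHI2_RIGHT_98), sqrt(scaled_var / CHI2_LEFT_98))

print(f"99% CI for the mean: {mean_ci[0]:.2f} to {mean_ci[1]:.2f}")
print(f"98% CI for the standard deviation: {sd_ci[0]:.3f} to {sd_ci[1]:.3f}")
```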
Hypothesis Tests
A hypothesis test determines whether a hypothesis, a claim about a population parameter, should be
rejected or could be valid. You can never prove a hypothesis; you simply fail to reject it. There
could always be lurking variables.
For the first problem I used a 0.01 significance level to test the claim that 20% of all
Skittles candies are green. The test statistic was not in the rejection region, so I failed to
reject the null hypothesis that 20% of all Skittles candies are green. This does not prove that
20% of all Skittles candies are green; there could be lurking variables, and hypothesis tests
never prove anything.
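The test of the 20% claim can be sketched as a one-proportion z-test. This sketch assumes a two-tailed test and a green count of 648 out of 3044 candies (taken from the chart values). The project does not show its work, so the exact setup the author used is not known.

```python
from math import sqrt
from statistics import NormalDist

# Assumed inputs: 648 green candies out of 3044 in the class sample.
green, total = 648, 3044
p0 = 0.20      # claimed proportion (null hypothesis)
alpha = 0.01   # significance level

p_hat = green / total
# The test statistic uses the claimed proportion in the standard error.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / total)

# Two-tailed test (assumed): reject when |z| exceeds the critical value.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject = abs(z) > z_crit

print(f"z = {z:.3f}, critical value = +/-{z_crit:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject else "Fail to reject H0")
```

Under these assumptions the test statistic is about 1.78, inside the critical values of about +/-2.576, which agrees with the decision to fail to reject the null hypothesis.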
For the second problem I used a 0.05 significance level to test the claim that the mean
number of candies in a bag of Skittles is 56. After doing the calculations, my test statistic was
within the rejection region, so I rejected the null hypothesis that the mean number of Skittles in
a bag is 56. The mean number of Skittles per bag in our class sample was about 61 candies. I think
that if we tested the claim that the mean number of candies per bag was 61, we would fail to
reject the claim.
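The test of the claimed mean of 56 can be sketched as a one-sample t-test on the reported summary statistics (n = 50, mean 60.88, standard deviation 5.509). The sketch assumes a two-tailed test, and the critical value is copied from a t table rather than computed.

```python
from math import sqrt

# Reported sample statistics for the number of candies per bag.
n, mean, sd = 50, 60.88, 5.509
mu0 = 56   # claimed mean (null hypothesis)

# Two-tailed critical value for alpha = 0.05 with 49 degrees of freedom,
# copied from a t table (an assumption, not computed here).
T_CRIT = 2.010

t = (mean - mu0) / (sd / sqrt(n))
reject = abs(t) > T_CRIT

print(f"t = {t:.3f}, critical value = +/-{T_CRIT}")
print("Reject H0" if reject else "Fail to reject H0")
```

The test statistic is about 6.26, far beyond the critical value, so the null hypothesis is rejected. Replacing `mu0` with 61 gives a t statistic of about -0.15, which lies inside the critical values, consistent with the expectation that a claimed mean of 61 would not be rejected.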
Reflection
For p:
The sample must be a simple random sample.
The conditions for a binomial distribution must be met: fixed number of trials, trials are
independent, there are two categories of outcomes, and the probabilities remain constant for each
trial.
There are at least 5 successes and 5 failures.
For mean:
The sample must be a simple random sample.
The population is normally distributed or n>30.
For Standard Deviation:
The sample must be a simple random sample.
The population must have normally distributed values (even in large samples).
I think our sample of Skittles is a simple random sample, because each bag was chosen by
chance from a larger population and each bag had the same probability of being chosen.
However, because all the students who bought a bag of Skittles live in the same general area,
bags sold in the greater Salt Lake area had a much greater probability of being chosen than bags
in a different state or even a different part of Utah. In that sense this is not a simple random
sample of all bags of Skittles, just a simple random sample of bags of Skittles in the greater
Salt Lake area.
The conditions for the binomial distribution are met: there is a fixed number of trials (50),
the trials are all independent of one another, and the probabilities remain constant for each
trial. Also, there are at least five successes and five failures, and the sample size is greater
than thirty. The conditions for the standard deviation are both met: it is a simple random sample
and the population is normally distributed.
Possible errors for the data could be that some people got the wrong size of Skittles bag,
either too small or too large. This would affect the mean number of Skittles in the bag, the
proportion of each color of Skittles in each bag and the standard deviation.
The sampling method could be improved by ensuring that students only get 2.17 oz bags
of Original Skittles rather than varying sizes or other flavors. Also, if there were a way to get
a true simple random sample of Skittles from the factory that made them, before they are shipped
all over the world, the data would be a lot more accurate.
In conclusion, the results suggest that 18.7%-21.5% of all Skittles are purple, that the mean
number of Skittles in a 2.17 ounce bag of Original flavored Skittles is between 58.79 and 62.97,
and that the standard deviation of the number of Skittles per bag is between 4.419 and 7.075
Skittles.
Reflective Writing and e-Portfolio
This project has been very educational. I feel that I have learned more about proportions,
probability, graphs, confidence intervals, and hypothesis tests from participating in this project.
Going to class and doing homework is helpful in learning new concepts, but applying the concepts
to a real world situation solidifies them and makes them easier to comprehend. Besides gaining a
greater understanding of the statistical concepts we have learned in class, I have also learned a
very useful skill that will help me throughout the rest of my life in many ways other than math
or homework assignments. Becoming more comfortable with Excel is a very practical skill I have
gained from this Skittles project. I was not very comfortable with the program before, so
discovering how to make different types of graphs and charts and how to sum, average, and find
the standard deviation of different columns and rows will be useful in practical real life
situations such as budgeting.
The math skills that I applied to this project will impact the other classes I take during my
school career in many ways. For instance, Microsoft Excel is a great program to become more
confident in and will definitely aid me in other classes. Also, the critical thinking skills I needed
for this project will be extremely useful. I learned how to compare and contrast data and different
graphs, which I think is an important skill in education, for example when comparing the
information given in a research paper or a scholarly journal. This skill will help me be better
informed by letting me recognize deceptive graphs and information.
Problem solving skills are essential to being a good student and really a good citizen. I
feel that the problem solving and critical thinking skills I have learned in this class and by doing
this project will be very beneficial in my other classes, as I continue on to nursing school, and
then after graduation in many aspects of life, from budgeting to working as a nurse.
Real world math problems are things we deal with every day, and many people don’t even
realize it. They can be anything from determining how many miles per gallon of gas your car gets
to deciding which product to buy at the store. A project such as this opens my mind to thinking
about real world math problems, and hopefully next time I will recognize when I am using the
skills from this project and apply them to the real world.