Angela Morrow
Professor Sanborn
Skittles Term Project
Introduction:
The goal of our project was to apply statistics to daily life. Each of the 27 members of our
class was to obtain a 2.17 oz bag of Skittles, sort the candies by color (red, orange, yellow,
green, and purple), count and record the number of each color, and give the results to the
teacher. Our teacher then compiled a spreadsheet of individual and combined data. This data
lets each student practice the different statistical methods that we have learned throughout
this semester. I initially thought that there would be little difference from bag to bag, but to
my surprise my hypothesis was incorrect. There is actually quite a bit of difference from bag to
bag, but when the data is compiled together, the differences seem less significant.
Categorical Data: Colors
[Pie chart: Skittles Proportions by Color. Red 23%, Purple 20%, Yellow 20%, Orange 19%, Green 18%]
[Pareto chart: Skittles Proportions by Color. Red 376, Purple 329, Yellow 328, Orange 315, Green 291]
I created a pie chart and a Pareto chart to show the proportion of candies of each color for
the whole class (27 bags). The two charts reflect very similar results, so many of the students
must have gotten similar color counts. Although the bags all state the same weight, some bags
contain more or fewer candies than others. This could introduce some variation in the data,
including outliers. Shown below are tables of my individual data and the data the class
collected as a group from our Skittles bags.
Individual Data
Color     Count   Proportion
Red       10      0.163
Orange    12      0.196
Yellow    14      0.229
Green     8       0.131
Purple    17      0.278
Total     61

Group Data
Color     Count   Proportion
Red       376     0.229
Orange    315     0.192
Yellow    328     0.200
Green     291     0.177
Purple    329     0.200
Total     1639

Summary statistics
Column            n    Mean    Variance   Std. dev.   Median   Range   Min   Max   Q1   Q3
Candies per Bag   27   60.73   2.28       1.51        61       6       58    64    59   62
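The proportions in the Individual Data and Group Data tables can be reproduced directly from the counts. The following is a minimal Python sketch, assuming only the counts reported above; the helper name `proportions` is hypothetical and chosen for illustration. Ordinary rounding may differ in the third decimal place from the table values, which appear to have been truncated rather than rounded.

```python
# Color counts as reported in the Individual Data and Group Data tables.
individual = {"Red": 10, "Orange": 12, "Yellow": 14, "Green": 8, "Purple": 17}
group = {"Red": 376, "Orange": 315, "Yellow": 328, "Green": 291, "Purple": 329}


def proportions(counts):
    """Return (total, {color: share of total}) for a dict of color counts."""
    total = sum(counts.values())
    return total, {color: n / total for color, n in counts.items()}


ind_total, ind_props = proportions(individual)
grp_total, grp_props = proportions(group)

print("Individual total:", ind_total)
for color, p in ind_props.items():
    print(f"  {color:<7} {p:.3f}")

print("Group total:", grp_total)
for color, p in grp_props.items():
    print(f"  {color:<7} {p:.3f}")
```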
Figure 1 Candies Per Bag
Figure 2 Candies Per Bag
Twenty-seven bags of data were collected. After analyzing this data and compiling it into
various charts, I determined that the results appeared to be normally distributed. This is what
I expected to see: because every bag has the same stated weight, you can assume there will be
almost the same number of candies in each bag, with a few outliers.
Total # of Bags in Sample: 27 Bags
Skittles Candies in my Bag: 61 Candies
Reflection:
There are two types of data, quantitative (numerical) and categorical (qualitative). Quantitative
data is data that can be measured or counted. An example of quantitative data would be the
number of candies of each color of Skittle. Categorical data consists of values or observations
that are sorted into named groups rather than measured. An example would be the colors of
Skittles in each bag.
Pareto and pie charts are used for categorical data. Scatterplots and dot plots are good
examples of graphs for quantitative data. Pie and Pareto charts are not useful for quantitative
data because they graph categories rather than values on a numerical scale. Box plots and
histograms are not ideal for categorical data because these graphs only use numbers.
Confidence Interval Estimates:
The purpose of a confidence interval is to estimate the true value of a population parameter,
such as a population proportion, by using a sample statistic, such as a sample proportion. We
use a confidence interval rather than a single value because a range of values shows how
precise the estimate is.
True proportion of green candies at 99% confidence.
n = 1639
x = 291
p̂ = 0.178
0.153 < p < 0.202
Based on the calculations from our data, we are 99% confident that the interval from 0.153 to
0.202 actually does contain the true value of the population proportion p. This means that if
we were to select many different samples of size 1639 and construct the corresponding
confidence intervals, in the long run 99% of them would actually contain the value of the
population proportion p.
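The interval for the proportion of green candies can be checked with a short calculation. This is a minimal sketch, assuming the usual normal-approximation interval p̂ ± z·√(p̂(1 − p̂)/n); the variable names are illustrative and not part of the original project.

```python
from math import sqrt
from statistics import NormalDist

n = 1639           # total candies in the class sample
x = 291            # green candies in the class sample
confidence = 0.99

p_hat = x / n
# Two-sided critical value z for the chosen confidence level (about 2.576).
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z * sqrt(p_hat * (1 - p_hat) / n)

lower, upper = p_hat - margin, p_hat + margin
print(f"p_hat = {p_hat:.3f}")
print(f"{lower:.3f} < p < {upper:.3f}")
```

Run with the values above, this should give limits of about 0.153 and 0.202, matching the interval reported in the text.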
True mean # of candies per bag at 95% confidence.
n = 27
x̄ = 60.73
s = 1.5
95% CI
t = 2.056
df = 26
60.137 < μ < 61.323
Based on the calculations from our data, we are 95% confident that the interval from 60.137 to
61.323 actually does contain the true value of μ. This means that if we were to select many
different samples of the same size and construct the corresponding confidence intervals, in the
long run 95% of them would actually contain the value of μ.
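The interval for the mean can be reproduced from the values listed above. This is a minimal sketch, assuming the standard t interval x̄ ± t·s/√n and taking the critical value t = 2.056 directly from the text rather than computing it; the variable names are illustrative.

```python
from math import sqrt

n = 27           # bags in the sample
x_bar = 60.73    # sample mean candies per bag
s = 1.5          # sample standard deviation (as used in the text)
t_crit = 2.056   # from the text: t critical value for 95% confidence, df = 26

margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"margin of error = {margin:.3f}")
print(f"{lower:.3f} < mu < {upper:.3f}")
```

The limits agree with the reported 60.137 and 61.323 to within 0.001; the small difference in the last digit depends on whether the margin of error is rounded before it is added and subtracted.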
Standard deviation of # of candies per bag at 98% confidence.
98% CI
n = 27
s = 1.5
df = 26
χ²R = 45.642
χ²L = 12.198
1.132 < σ < 2.190
Based on this result, we have 98% confidence that the limits of 1.132 and 2.190 contain the true
value of σ.
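The limits for σ follow from the chi-square critical values listed above. This is a minimal sketch, assuming the standard interval √((n − 1)s²/χ²R) < σ < √((n − 1)s²/χ²L) and taking both critical values from the text rather than computing them; the variable names are illustrative.

```python
from math import sqrt

n = 27
s = 1.5
df = n - 1            # 26 degrees of freedom
chi2_right = 45.642   # from the text: right-tail critical value for 98% confidence
chi2_left = 12.198    # from the text: left-tail critical value for 98% confidence

# The larger critical value produces the lower limit, and vice versa.
lower = sqrt(df * s**2 / chi2_right)
upper = sqrt(df * s**2 / chi2_left)

print(f"{lower:.3f} < sigma < {upper:.3f}")
```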
Hypothesis Tests
A hypothesis test is a procedure for testing a claim about the value of a population
proportion, a population mean, or a population standard deviation. The purpose of a hypothesis
test is to decide whether the sample data give sufficient evidence to reject the claim.
Claim: 20% of all Skittles candies are purple. (p = 0.20)
n = 1639
x = 329
H₀: p = 0.20
H₁: p ≠ 0.20
Fail to reject H₀.
There is not sufficient evidence to warrant rejection of the claim that 20% of all Skittles candies
are purple.
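The decision for the purple-candy claim can be checked with a one-proportion z test. This is a minimal sketch, assuming the usual test statistic z = (p̂ − p₀)/√(p₀(1 − p₀)/n); the text does not state a significance level for this test, so the sketch only reports the statistic and the two-tailed P-value.

```python
from math import sqrt
from statistics import NormalDist

n = 1639      # total candies in the class sample
x = 329       # purple candies in the class sample
p0 = 0.20     # claimed proportion under H0

p_hat = x / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
# Two-tailed P-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"p_hat = {p_hat:.4f}, z = {z:.3f}, P-value = {p_value:.3f}")
```

The statistic is very close to zero and the P-value is far above any commonly used significance level, which is consistent with failing to reject H₀.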
Claim: The mean number of Skittles in a 2.17 oz bag is 62. (μ = 62)
H₀: μ = 62
H₁: μ ≠ 62
Two-tailed test
α = 0.01
n = 27
x̄ = 60.73
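Using s = 1.5, the test statistic is t = (60.73 − 62) / (1.5 / √27) ≈ −4.40. With df = 26 and
α = 0.01, the two-tailed critical values are t = ±2.779, so the test statistic falls in the
critical region and we reject H₀.
There is sufficient evidence to warrant rejection of the claim that the mean number of
candies in a 2.17 oz bag of Skittles is 62.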
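The test of the claimed mean can be checked from the summary values. This is a minimal sketch, assuming a one-sample t test with s = 1.5 (the value used for the interval estimates) and a critical value of 2.779 taken from a standard t table for df = 26 and α = 0.01, two-tailed; that critical value is not stated in the original and is an assumption here.

```python
from math import sqrt

n = 27
x_bar = 60.73    # sample mean candies per bag
s = 1.5          # sample standard deviation, as used for the interval estimates
mu0 = 62         # claimed mean under H0
t_crit = 2.779   # assumed: t table value for df = 26, alpha = 0.01, two-tailed

t_stat = (x_bar - mu0) / (s / sqrt(n))
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}, critical values = ±{t_crit}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

With these inputs the statistic comes out near −4.40, well beyond the critical values, which points toward rejecting H₀. This agrees with the 95% confidence interval for μ, which does not contain 62.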
The conditions needed for interval estimates include that the sample is a random sample
and that the population is normally distributed. Our samples met these requirements reasonably
well: our bags are a subset of a larger set, or population, of all Skittles bags, and the number
of candies per bag appeared to be normally distributed. For the proportions, the total of
n = 1639 candies is large enough for the normal approximation to apply. Possible errors include
counting errors, both within each color of Skittle and in the total number of candies per bag.
The sampling method could be improved by increasing the sample size and/or encouraging full
participation. We could also improve the sampling method by acquiring bags from different
parts of the country and/or world, rather than only the local/surrounding area. I have drawn
the conclusion that the true mean number of candies in each bag of Skittles is close to the
sample mean we found by compiling our data. I have also drawn the conclusion that each color
of Skittle is somewhat evenly proportioned from one bag to the next.
Reflection on Term Skittles Project
I still remember looking at the instructions for this project and thinking that I was reading
something in another language. I had a hard time working on it because of how difficult it
seemed at first. I was amazed at how much work this was actually going to require. We needed
to collect a random sample of data, organize that data, create graphs and charts, and interpret
what the information means. Throughout the semester we were taught a great variety of
statistical concepts that gave us the necessary tools to complete this project. Little by
little I realized that I was not only capable of gradually understanding the instructions, but
also able to perform the correct sequence of steps for each of the exercises that were part of
this project. As in any other discipline or class, after learning the theory, practice makes
all the difference.
This project allowed us to put into practice key principles studied throughout the term, from
using a sampling method to performing hypothesis testing. The most challenging aspect of the
project was really understanding each concept and how it applied to the whole population of
Skittles, not just our sample.
This project and the class in general have given me the tools to tell valid professional
papers apart from those that end up being questionable sources of information. Analyzing
things like graphs, confidence levels, and confidence intervals can make all the difference
when trying to see whether a study is well done. I am now able to understand the language
behind the statistical analysis of studies, including simple but important terms such as
median, range, mean, and mode that are so frequently used.
I have always struggled with math, but I found that this course allowed me to see how math
applies to real life, and it was refreshing to take a course that went outside the classroom
and the math book. I have always gritted my teeth at story problems, but this course gave a
whole new approach and meaning to the information provided. I never thought I would say this,
but math can be fun at times, especially when you can understand how it applies to real-life
situations and helps us better understand the world around us.