Shelbi Green
Skittles Project
11-21-14
For Math 1040, my classmates and I each went out and bought a 2.17 oz. bag of
Skittles. We set out to compile data to find out how frequently each color of
Skittles occurred. From our data we compiled graphs, confidence intervals, and
hypothesis tests, with a conclusion on what we found. Based on my own personal
data, I predicted that I would have more red Skittles than everyone else's bag. But
comparing against my own bag of Skittles, I expected the class as a whole to have more
yellow Skittles. The Skittle data collected from the whole class of 25 is Red: 321,
Orange: 292, Green: 316, Yellow: 306, Purple: 276; overall 1511 Skittles. My own
personal data is Red: 11, Orange: 7, Yellow: 19, Green: 6, Purple: 17; overall 60
Skittles.
The proportions of each color in the class sample are Red: 0.212,
Orange: 0.193, Yellow: 0.203, Green: 0.209, Purple: 0.183. I calculated each
proportion by dividing the total count for that color by the total
number of all Skittles combined (1511). To better understand these proportions, a
visual like a graph makes them easier to comprehend. The types of graphs that our
class used to show these proportions are a pie chart and a Pareto chart. Refer to the
figures below.
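The proportion calculation described above can be sketched in a few lines of Python. This is only an illustrative check of the arithmetic; the function name `proportions` is made up for this example, and the counts are the class totals reported in this project.

```python
# Class totals by color, as reported in the project.
class_counts = {"Red": 321, "Orange": 292, "Yellow": 306, "Green": 316, "Purple": 276}


def proportions(counts):
    """Return each color's share of the total, rounded to three decimals."""
    total = sum(counts.values())
    return {color: round(n / total, 3) for color, n in counts.items()}


class_props = proportions(class_counts)
for color, p in class_props.items():
    print(f"{color}: {p:.3f}")
```

The same function can be applied to a single bag, for example `proportions({"Red": 11, "Orange": 7, "Yellow": 19, "Green": 6, "Purple": 17})`.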
Both graphs above show the color results for the entire class. I
predicted that yellow Skittles would occur more frequently than the rest of the
colors for the entire class. But my prediction was incorrect, because red Skittles
occurred most frequently with a proportion of 0.212 and green Skittles also had a
higher proportion of 0.209, while yellow Skittles only had a proportion of 0.203.
The orange and purple Skittles had the lowest proportions of 0.193 and 0.183,
respectively. I have provided tables below to show the colors of Skittles for the
entire class as well as my own personal data.
My own Skittle data (60 Skittles overall)
Red    Green    Yellow    Orange    Purple
11     6        19        7         17
The class's Skittle data (1511 Skittles overall)
Red    Green    Yellow    Orange    Purple
321    316      306       292       276
Using the data collected from the 25 Skittle bags, the mean number of
candies per bag is 60.44. The standard deviation is 4.36. The 5-number summary
is Min: 53, Q1: 59, Median: 60, Q3: 62, and Max: 77. I have made a modified box plot
and a frequency histogram with the information I have listed above.
[Figure: Candies per Bag Modified BoxPlot]
The graphs above are skewed to the right. This means the data has an outlier
that is pulling the distribution to the right: the data included a bag whose
count was significantly larger than the counts of the rest of the bags. The graphs
are a good representation of what I expected the data to look like. The modified
box plot and histogram show the outliers in our data. For example, the bag with 77
Skittles is far above the actual mean of 60.44 Skittles per bag. The class's
data did not agree with my own personal bag of 60 Skittles. For example, my
yellow candy proportion is 0.317 and the class's proportion of yellow candies is
0.203, a difference of about 11 percentage points.
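A modified box plot usually flags outliers with the 1.5 x IQR rule. The sketch below applies that rule to the reported 5-number summary. It assumes the class graph used this standard rule, which the report does not state, and the variable names are chosen only for this example.

```python
# Reported 5-number summary for candies per bag.
minimum, q1, median, q3, maximum = 53, 59, 60, 62, 77

# Interquartile range and the usual 1.5 * IQR fences (assumed rule).
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print("Max is an outlier:", maximum > upper_fence)
print("Min is an outlier:", minimum < lower_fence)
```

Any bag count outside the two fences would be drawn as a separate point on a modified box plot rather than as part of a whisker.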
Categorical data, or qualitative data, cannot be measured, nor does it consist of
numbers that have numerical value. Instead, it is sorted into groups by labels or
names. Categorical data allows you to calculate percentages and proportions, from
which I was able to make a pie chart and a bar graph for this Skittles project to
represent the Skittle color proportions. Graphs that you could not use for
categorical data would be a box plot or a scatter plot, because they require
numerical values that can be ordered and measured.
Quantitative data, or numerical data, is the opposite of categorical data. The
data consists of numbers that represent measurements or counts. For the Skittle
project I was able to make a histogram showing the frequencies of the
number of candies per bag in our compiled data, which is quantitative data. I made
another graph to show quantitative data with a box plot, which divides the numerical
data into quartiles. The mean and median can be calculated because the numbers in
quantitative data actually have value. A graph that you would not usually use to
represent quantitative data is a pie chart, because the slices usually represent
categories like names or labels.
A confidence interval gives a range of plausible values for
estimating a population parameter. A parameter is a fixed numerical
measure that describes an entire population.
Shown below are examples of the three types of confidence intervals:
estimating a population proportion, estimating a population mean, and
estimating a population standard deviation/variance. The first one shown is
estimating a population proportion.
The 99% confidence level for estimating the true proportion of yellow
candies was chosen and given to us. From there I was able to solve for the alpha
value: subtracting 99% from 100% gave me 1% (0.01) for alpha. The x value is the
total number of yellow Skittle candies, which is 306. The n value is the total sample
size of Skittles, which is 1511. The sample proportion is found by dividing the
x value by the n value, which is 306/1511 = 0.2025. The critical z score is found by
splitting alpha between the two tails of the normal curve (0.005 in each tail) and
looking up the value in the table, which gives +2.576 and -2.576. The
margin of error is calculated by multiplying the z score by the square root of
(sample proportion multiplied by (1 minus sample proportion)) divided by
the sample size n. The value solved for margin of error is 0.0266. The confidence
interval is found by taking the sample proportion and adding/subtracting the
margin of error to give the values of 0.1759 and 0.2291. I am 99% confident that the
interval from 0.176 to 0.229 contains the true population proportion of yellow
Skittle candies.
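As a cross-check of the steps above, here is a minimal Python sketch of a confidence interval for a proportion. It uses the standard library's `NormalDist` to get the critical z value instead of a printed table; the function name `proportion_ci` is invented for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence):
    """Return (lower, upper, margin of error) for a population proportion."""
    alpha = 1 - confidence
    # Critical z value with alpha/2 in each tail of the standard normal curve.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_hat = x / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


# 306 yellow candies out of 1511, at the 99% confidence level.
prop_low, prop_high, prop_margin = proportion_ci(306, 1511, 0.99)
print(f"E = {prop_margin:.4f}, interval = ({prop_low:.3f}, {prop_high:.3f})")
```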
Here is an example of another type of confidence interval: estimating
the population mean.
The 95% confidence level for estimating the true mean number of candies per bag
was chosen and given to us. From there I was able to solve for the alpha value:
subtracting 95% from 100% gave me 5% (0.05) for alpha. The n value (sample size) is
the total number of Skittle bags, which is 25. I found the mean of 60.44 by taking
the total number of Skittles, 1511, and dividing by the number of Skittle bags in
our data, 25. I figured out the standard deviation by entering the number of candies
in each bag into my TI-83 calculator. I pushed the STAT button, then chose EDIT.
I then entered all the data into L1, pushed the STAT button again, went to
CALC, and chose option 1 (1-Var Stats) to get the standard deviation. The critical t
value is found by splitting alpha between the two tails of the curve and looking it
up in the t table with n - 1 = 24 degrees of freedom, which gives
2.064. The margin of error is calculated by taking the critical t value and
multiplying it by the standard deviation divided by the square root of n. The value
solved for margin of error is 1.800. The confidence interval is found by taking the
sample mean and adding/subtracting the margin of error to give the values of 58.640
and 62.240. I am 95% confident that the interval from 58.640 to 62.240 contains the
true value of the population mean μ.
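The margin of error and interval for the mean can be checked with a short Python sketch. The critical value 2.064 is hard-coded from the t table, as in the report, and s = 4.3597 is the standard deviation value the report uses in its hypothesis test about the mean. The function name `mean_ci` is hypothetical.

```python
from math import sqrt


def mean_ci(sample_mean, s, n, t_crit):
    """Return (lower, upper, margin of error) for a population mean."""
    margin = t_crit * s / sqrt(n)
    return sample_mean - margin, sample_mean + margin, margin


# Table value for a 95% level with 24 degrees of freedom (taken from the report).
T_CRIT_95_DF24 = 2.064

mean_low, mean_high, mean_margin = mean_ci(60.44, 4.3597, 25, T_CRIT_95_DF24)
print(f"E = {mean_margin:.3f}, interval = ({mean_low:.2f}, {mean_high:.2f})")
```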
Here is an example of the third type of confidence interval: estimation of a
population standard deviation/variance.
The 98% confidence level for estimating the standard deviation of the
number of candies per bag was chosen and given to us. From there I was able to
solve for the alpha value: subtracting 98% from 100% gave me 2% (0.02) for alpha.
The n value is the total number of Skittle bags, which is 25. I found the standard
deviation by entering the number of candies in each bag into my
TI-83 calculator. I pushed the STAT button, then chose EDIT. I then entered all
the data into L1, pushed the STAT button again, went to CALC, and chose
option 1 to get the standard deviation. The critical values χ²R and χ²L (the right
and left chi-square critical values) are found by splitting alpha between the two
tails of the curve and looking them up in Table A-4 (chi-square distribution) with
n - 1 = 24 degrees of freedom. The
confidence interval estimate for the standard deviation is then calculated by taking
the square root of ((n-1) multiplied by the standard deviation squared, divided by
χ²R or χ²L). The values are 3.258 and 6.482. I am 98% confident that the interval
from 3.258 to 6.482 contains the true value of the population standard deviation σ.
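The interval for the standard deviation can be reproduced with the sketch below. The chi-square critical values 10.856 and 42.980 are the standard table entries for 24 degrees of freedom with 0.01 in each tail; the report does not print them, so treat them as an assumption about which table values were used. The function name `sd_ci` is made up for this example.

```python
from math import sqrt


def sd_ci(s, n, chi2_left, chi2_right):
    """Return (lower, upper) limits for a population standard deviation."""
    df = n - 1
    # The larger (right-tail) critical value gives the lower limit.
    lower = sqrt(df * s**2 / chi2_right)
    upper = sqrt(df * s**2 / chi2_left)
    return lower, upper


# Assumed table values: df = 24, area 0.01 in each tail (98% confidence level).
sd_low, sd_high = sd_ci(4.3597, 25, chi2_left=10.856, chi2_right=42.980)
print(f"interval = ({sd_low:.3f}, {sd_high:.3f})")
```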
A hypothesis test is a procedure for testing a claim about a property of a
population. A claim concerning a property of a population is defined as a hypothesis.
A hypothesis test consists of four steps. The first step is to state the given claim
and identify the null and alternative hypotheses. The second step is to find the
test statistic, which will be used later in the problem. The third step is to find
either the P-value or the critical value. The fourth step is to compare the P-value
with the significance level, or the test statistic with the critical value, to decide
whether to reject the null hypothesis and what that means for the claim.
Below I have worked out some example problems of hypothesis testing. The
first one is a hypothesis test concerning a population proportion.
Using a 0.05 significance level, I was able to test the claim that 20% of all Skittle
candies are red. From there I was able to state that H0 (the null hypothesis) is
p = 0.20 and H1 (the alternative hypothesis) is that p does not equal 0.20. I was then
able to calculate the test statistic by first finding p-hat: taking the x value of
red Skittles, 321, and dividing by the n value of 1511. I took p-hat and subtracted
p (the claimed proportion), 0.20, to get the value of 0.0124. After that I took p,
0.20, and multiplied it by q (1-p), 0.80, all divided by n, 1511, to get the value of
1.0589x10^-4. I was then able to take the square root of 1.0589x10^-4, which equals
0.0103. I then was able to find my z value by taking my first value of 0.0124 and
dividing it by my last value of 0.0103 (using the values before rounding) to get the
z value, or test statistic, of 1.209. I was then able to find the P-value by looking
up the z value in a table that lists z scores and the corresponding areas. The
area to the right of z = 1.21 is 0.1131, and since this is a two-tailed test
I doubled it to get a P-value of about 0.226. Because this P-value is greater than
the 0.05 significance level, I fail to reject the null hypothesis: there is not
sufficient evidence to reject the claim that 20% of all Skittles are red.
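The test statistic and P-value for this proportion test can be cross-checked with the sketch below. It relies on the fact that a two-tailed P-value is twice the tail area beyond |z|, and it uses the standard library's `NormalDist` instead of a printed z table. The function name `one_proportion_z_test` is invented for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(x, n, p0):
    """Return (z statistic, two-tailed P-value) for H0: p = p0."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    # Two-tailed test: double the area beyond |z| in one tail.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z_stat, p_value = one_proportion_z_test(321, 1511, 0.20)
alpha = 0.05
print(f"z = {z_stat:.3f}, P-value = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```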
Here is another example problem testing the claim about a population
mean.
Using a 0.01 significance level, I was able to test the claim that the mean
number of candies in a bag of Skittles is 55. From there I was able to state that
H0 (the null hypothesis) is that the mean equals 55 and H1 (the alternative
hypothesis) is that the mean does not equal 55. I was then able to calculate the test
statistic by taking the sample mean of 60.44 minus 55, divided by (the standard
deviation of 4.3597 divided by the square root of n, 25).
The test statistic came out to be 6.239. I was then able to find the critical value
by looking up the critical t value using the 0.01 significance level (two tails) and
24 degrees of freedom in a table of critical t values. I found my critical t value to
be +/-2.797. Comparing the test statistic with the critical values, I found that
6.239 is larger than 2.797, so it falls in the critical region. I was then able to
reject the null hypothesis: there is sufficient evidence to reject the claim that the
mean number of candies in each Skittle bag is 55.
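The critical-value comparison above can be written out as a short Python sketch. The critical t value 2.797 is hard-coded from the table, as in the report, and the function name `one_sample_t_statistic` is chosen only for this example.

```python
from math import sqrt


def one_sample_t_statistic(sample_mean, mu0, s, n):
    """Return the t test statistic for H0: mu = mu0."""
    return (sample_mean - mu0) / (s / sqrt(n))


t_stat = one_sample_t_statistic(60.44, 55, 4.3597, 25)

# Two-tailed critical value for alpha = 0.01 with 24 degrees of freedom (from the report).
t_crit = 2.797
reject_null = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}, critical values = +/-{t_crit}")
print("Reject H0" if reject_null else "Fail to reject H0")
```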
Requirements for a confidence interval for a population proportion are: you
must have a simple random sample, the conditions for a binomial distribution must
be satisfied, and there must be at least five successes and at least five failures
in the data. All of the
requirements are met for my 99% confidence interval for a population proportion
shown above. It is a simple random sample. The requirements
for the binomial distribution are met by having a fixed number of trials that are
independent, with two outcomes: yellow candies or not yellow candies.
The probabilities remain constant. There have to be at least five successes and
five failures in the data; there are 306 yellow candies and 1205 not yellow
candies.
The conditions for a confidence interval for a population mean are: you must
have a simple random sample, and the population has to be normally distributed or
n has to be greater than 30. Since n = 25 is not greater than 30, my 95% confidence
interval for a population mean meets the requirements by being a simple random
sample and by my assumption that the data is normally distributed.
The requirements for a confidence interval for a population standard
deviation/variance are: you must have a simple random sample, and the population
has to be normally distributed. Unlike the interval for a mean, a large sample size
(n greater than 30) does not relax the normality requirement. This
estimate of a population standard deviation/variance meets the
requirements as long as it is a simple random sample and I assume that the number
of candies per bag is normally distributed.
There are four requirements to test a claim concerning a population
proportion. The observations must be from a simple random sample, the conditions
for a binomial distribution must be satisfied, np (the sample size multiplied by the
claimed population proportion) must be greater than or equal to five, and nq (the
sample size multiplied by one minus the claimed population proportion) must be
greater than or equal to five. All of the
requirements are met for my first hypothesis test, testing a claim about a population
proportion. It is a simple random sample. The requirements for the binomial
distribution are met by having a fixed number of trials that are independent, with
two outcomes: red Skittles or not red Skittles. The probabilities
remain constant. The requirement that np and nq are both at least five is met
because np = 1511 x 0.20 = 302.2 and nq = 1511 x 0.80 = 1208.8.
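The np and nq requirement for the red Skittles test is quick to check. In this small sketch n is the class total and p0 is the claimed proportion from the hypothesis test; the variable names are chosen only for illustration.

```python
# Sample size and claimed proportion from the red Skittles hypothesis test.
n = 1511
p0 = 0.20

expected_successes = n * p0        # np
expected_failures = n * (1 - p0)   # nq
requirement_met = expected_successes >= 5 and expected_failures >= 5

print(f"np = {expected_successes:.1f}, nq = {expected_failures:.1f}")
print("Requirement met:", requirement_met)
```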
Requirements to test a claim about a population mean are: you must have a
simple random sample, and the population has to be normally distributed or n (the
sample size) has to be greater than 30. This hypothesis test about a population mean
meets the requirements; it is a simple random sample, and since n = 25, I assume
that this data is normally distributed.
Using the data given for the five problems above, I could have made errors.
An error I could have made would be to solve for the wrong confidence interval. For
example, when asked to construct a confidence interval for a population mean, I could
have solved for a population proportion confidence interval, making my results
incorrect. Another error I could have made with the given data would be to look up
the wrong z or t values for the corresponding alpha values.
For this Skittle project the sampling method that we used was random
sampling. To improve the sampling method, more bags could have been added
to our class's compiled Skittle data.
From this Skittle project I have found my prediction to be incorrect. I
predicted that I would have more red Skittles than everyone else's bag. But
comparing against my own bag of Skittles, I expected the class as a whole to have
more yellow Skittles. I found that red Skittles occurred most often in our class
data and that yellow occurred most frequently in my own personal bag of
candy. From that data I can conclude that red Skittles appeared most frequently
overall in our sample. Even with this known information, the Skittle company seems
to randomly fill each Skittle bag, hoping that each color of Skittles will get into
the bags, satisfying everyone.
Reflective Writing
From this project, I have learned that red Skittles were the most prevalent in our
class sample. I have also shown that I am able to take a sample from a population and
calculate reasonable estimates of population parameter values. The math skills that I
have gained from this Math 1040 class will be helpful to figure out the probabilities
of passing classes with certain test scores. Also, when I apply to the nursing program
next semester at the University of Utah, I will be able to take all of my ten
prerequisite classes and figure out what my grade point average is. This will help me
determine if my grade point average is in the desired range of 3.0-4.0.
Problem-solving skills I have gained from this project include interpreting
each problem and figuring out what each problem was asking. I was then able to find
the correct formulas to use for each problem to get the answers I was looking for. I
also found out that labeling the correct information really does help me solve the
problem. For hypothesis testing, if I do not label the claim at the beginning of the
problem, I find that my answer to the question can become very confusing, because I
have forgotten what the claim is.
I have found that real-world math problems take a lot more work than I
ever expected. I thought that this project wasn't going to take much time to
complete, but I was quickly proven wrong. I put a lot of effort into finding the
answers to the questions. The questions required a lot of reflecting and actually
thinking about what my data is telling me. I now have more respect for real-world
math and the time and effort that it actually takes to complete.