Shelbi Green Skittles Project 11-21-14

For Math 1040, my classmates and I went out and bought 2.17 oz. bags of Skittles. We set out to compile data to find out how frequently each color of Skittles occurred. From our data we compiled graphs, confidence intervals, and hypothesis tests, with a conclusion on what we found. Based on my own personal data, I predicted that my bag would have more red Skittles than everyone else's bags. Comparing against my own bag, however, I expected the class as a whole to have more yellow Skittles.

The Skittle data collected from all 25 bags in the class is Red: 321, Orange: 292, Green: 316, Yellow: 306, Purple: 276; overall 1511 Skittles. My own personal data is Red: 11, Orange: 7, Yellow: 19, Green: 6, Purple: 17; overall 60 Skittles. The proportions of the Skittle colors gathered by the class are Red: 0.212, Orange: 0.193, Yellow: 0.203, Green: 0.209, Purple: 0.183. I calculated each proportion by dividing the count for that color by the total number of Skittles (for example, 321/1511 ≈ 0.212 for red). A graph makes these proportions easier to comprehend; our class used a pie chart and a Pareto chart to show them. Refer to the figures below.

Both graphs show the results for the entire class's Skittle colors. I predicted that yellow Skittles would occur more frequently than the other colors for the entire class, but my prediction was incorrect: red Skittles occurred most frequently, with a proportion of 0.212, and green Skittles also had a higher proportion (0.209) than yellow, which had only 0.203. Orange and purple Skittles had the lowest proportions, 0.193 and 0.183, respectively. The tables below show the Skittle colors for the entire class as well as my own personal data.
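As a quick check on the hand calculations, the class proportions can be reproduced with a short Python script. This is a minimal sketch added for illustration, not part of the original project; the names `class_counts` and `proportions` are illustrative choices, and the counts are copied from the class data above.

```python
# Illustrative check, not part of the original project: class color counts from the report.
class_counts = {"Red": 321, "Orange": 292, "Green": 316, "Yellow": 306, "Purple": 276}

total = sum(class_counts.values())  # total candies across all 25 bags

# Proportion of each color = color count / total count, rounded to 3 decimal places.
proportions = {color: round(count / total, 3) for color, count in class_counts.items()}

if __name__ == "__main__":
    print("total:", total)
    # List colors from most to least frequent.
    for color, p in sorted(proportions.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{color}: {p}")
```

Sorting by proportion lists red first, then green, yellow, orange, and purple, matching the order described in the text.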
My own Skittle data (60 Skittles overall)
Red    Green    Yellow    Orange    Purple
11     6        19        7         17

The class's Skittle data (1511 Skittles overall)
Red    Green    Yellow    Orange    Purple
321    316      306       292       276

Using the data collected from the 25 Skittle bags, the mean number of candies per bag is 60.44 and the standard deviation is 4.36. The five-number summary is Min: 53, Q1: 59, Median: 60, Q3: 62, Max: 77. I made a modified box plot and a frequency histogram from this information.

[Figure: Candies per Bag, Modified Box Plot]

The graphs above are skewed to the right: an outlier is pulling the data to the right, meaning one bag held significantly more candies than the rest of the bags. The graphs are a good representation of what I expected the data to look like. The modified box plot and histogram show the outliers in our data. For example, one bag contained 77 Skittles, far above the mean of 60.44 Skittles per bag.

The class's data did not agree with my own bag of 60 Skittles. For example, my yellow proportion is 19/60 ≈ 0.317, while the class's proportion of yellow candies is 0.203, a difference of about 11 percentage points.

Categorical (qualitative) data cannot be measured and does not represent numbers that have numerical value; it is sorted into groups by labels or names. With categorical data you can calculate percentages and proportions, which is how I was able to make a pie chart and a bar graph for this project to represent the Skittle color proportions. Graphs you could not use for categorical data are a box plot or a scatter plot, because they require numerical values. Quantitative (numerical) data is the opposite of categorical data.
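The outliers in a modified box plot are conventionally the values lying more than 1.5 × IQR below Q1 or above Q3. The sketch below applies that rule to the five-number summary given above; it is an illustration added here, assuming the standard 1.5 × IQR rule, and it uses only the summary because the individual bag counts are not listed in the report. Under this rule both the maximum of 77 and the minimum of 53 fall outside the fences.

```python
# Illustrative check, not part of the original project.
# Five-number summary of candies per bag, copied from the report.
minimum, q1, median, q3, maximum = 53, 59, 60, 62, 77

iqr = q3 - q1                 # interquartile range
lower_fence = q1 - 1.5 * iqr  # values below this are outliers
upper_fence = q3 + 1.5 * iqr  # values above this are outliers

def is_outlier(value):
    """Modified box plot rule: flag values beyond 1.5 * IQR from the quartiles."""
    return value < lower_fence or value > upper_fence

# Mean candies per bag = total candies / number of bags.
mean_per_bag = 1511 / 25

if __name__ == "__main__":
    print(f"IQR = {iqr}, fences: {lower_fence} to {upper_fence}")
    print("min is outlier:", is_outlier(minimum))
    print("max is outlier:", is_outlier(maximum))
    print(f"mean per bag: {mean_per_bag:.2f}")
```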
Quantitative data has numbers that represent measurements or have numerical value. For this project I made a histogram showing the frequencies of the number of candies per bag in our compiled data, which is quantitative data. I made another graph of the quantitative data, a box plot, which groups the numerical data into quartiles. The mean and median can be calculated for quantitative data because the numbers have actual numerical value. A graph you could not use to represent quantitative data is a pie chart, because its slices represent categories such as names or labels.

A confidence interval is a range of values used to estimate the true value of a population parameter. A parameter is a fixed numerical measurement describing some characteristic of a population. Shown below are examples of the three types of confidence intervals: estimating a population proportion, estimating a population mean, and estimating a population standard deviation or variance. The first one shown is estimating a population proportion.

We were given a 99% confidence level for estimating the true proportion of yellow candies. Subtracting 99% from 100% gives alpha = 0.01. The x value is the total number of yellow Skittles, 306, and the n value is the total sample size of Skittles, 1511. The sample proportion p-hat is x/n = 306/1511 ≈ 0.2025. The critical z value is found from the standard normal table by placing alpha/2 = 0.005 in each of the two tails, which gives z = ±2.576.
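The table lookup for the critical z value can also be done in code. This is a minimal sketch, assuming Python 3.8 or later (where `statistics.NormalDist` is available); the function name `z_critical` is an illustrative choice. It places alpha/2 in each tail and returns the corresponding quantile of the standard normal distribution.

```python
from statistics import NormalDist  # available in Python 3.8+

def z_critical(confidence):
    """Illustrative helper: two-tailed critical value z_(alpha/2) for a confidence level."""
    alpha = 1 - confidence
    # alpha/2 lies in each tail, so take the (1 - alpha/2) quantile of the standard normal.
    return NormalDist().inv_cdf(1 - alpha / 2)

if __name__ == "__main__":
    print(f"99%: z = {z_critical(0.99):.3f}")
    print(f"95%: z = {z_critical(0.95):.3f}")
```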
The margin of error E is the critical z value multiplied by the square root of (p-hat multiplied by (1 minus p-hat)) divided by the sample size n: E = 2.576 × √(0.2025 × 0.7975 / 1511) ≈ 0.0266. The confidence interval is found by subtracting the margin of error from, and adding it to, the sample proportion, which gives 0.1759 < p < 0.2291. We can be 99% confident that the interval from 0.176 to 0.229 contains the true population proportion of yellow Skittles.

Here is an example of another type of confidence interval: estimating a population mean. We were given a 95% confidence level for estimating the true mean number of candies per bag. Subtracting 95% from 100% gives alpha = 0.05. The n value (sample size) is the total number of Skittle bags, 25. I found the mean of 60.44 by dividing the total number of Skittles, 1511, by the number of bags, 25. I found the standard deviation with my TI-83 calculator: I pressed the STAT button, selected EDIT, and entered the number of candies in each of the 25 bags into L1. I then pressed STAT again, moved to CALC, and chose option 1 (1-Var Stats), which reports the sample standard deviation Sx = 4.3597. The critical t value is found from the t table with n - 1 = 24 degrees of freedom and alpha/2 = 0.025 in each tail, which gives t = 2.064. The margin of error is the critical t value multiplied by the standard deviation divided by the square root of n: E = 2.064 × 4.3597/√25 ≈ 1.800. The confidence interval is found by subtracting the margin of error from, and adding it to, the mean, which gives 58.640 < μ < 62.240.
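The interval arithmetic in this section can be reproduced with a short script: the yellow-proportion and mean intervals above, plus the standard deviation interval in the next example. This is a minimal sketch, assuming Python 3.8 or later; the function names are illustrative. Because the standard library has no t or chi-square distribution, the critical values are passed in as table values: t = 2.064 (24 degrees of freedom, 95%) and the chi-square values 42.980 and 10.856 (24 degrees of freedom, 0.01 in each tail). The standard deviation 4.3597 is the calculator value reported in this project.

```python
from math import sqrt
from statistics import NormalDist  # Python 3.8+

# Illustrative helpers, not part of the original project.

def proportion_ci(x, n, confidence):
    """Interval for a population proportion: p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    q_hat = 1 - p_hat
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin, margin

def mean_ci(mean, s, n, t_crit):
    """Interval for a population mean: x_bar +/- t * s / sqrt(n).

    t_crit is read from a t table with n - 1 degrees of freedom.
    """
    margin = t_crit * s / sqrt(n)
    return mean - margin, mean + margin, margin

def stdev_ci(s, n, chi2_right, chi2_left):
    """Interval for a population standard deviation:
    sqrt((n - 1) s^2 / chi2_R) < sigma < sqrt((n - 1) s^2 / chi2_L).
    """
    numerator = (n - 1) * s ** 2
    return sqrt(numerator / chi2_right), sqrt(numerator / chi2_left)

if __name__ == "__main__":
    low, high, e = proportion_ci(306, 1511, 0.99)
    print(f"yellow proportion: E = {e:.4f}, {low:.4f} < p < {high:.4f}")
    low, high, e = mean_ci(60.44, 4.3597, 25, t_crit=2.064)
    print(f"mean per bag: E = {e:.3f}, {low:.3f} < mu < {high:.3f}")
    low, high = stdev_ci(4.3597, 25, chi2_right=42.980, chi2_left=10.856)
    print(f"std dev per bag: {low:.3f} < sigma < {high:.3f}")
```

Passing the table values in as arguments keeps the sketch dependency-free; a library such as SciPy could compute them instead.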
We can be 95% confident that the interval from 58.640 to 62.240 contains the true mean number of candies per bag, μ.

Here is an example of the third type of confidence interval: estimating a population standard deviation or variance. We were given a 98% confidence level for estimating the standard deviation of the number of candies per bag. Subtracting 98% from 100% gives alpha = 0.02. The n value is the total number of Skittle bags, 25. I found the standard deviation, 4.3597, with my TI-83 in the same way as before: STAT, EDIT, enter the data into L1, then STAT, CALC, option 1 (1-Var Stats). The critical values χ²R and χ²L come from Table A-4 (chi-square distribution) with 24 degrees of freedom and alpha/2 = 0.01 in each tail: χ²R = 42.980 and χ²L = 10.856. The confidence interval limits for the standard deviation are the square root of (n - 1) multiplied by the standard deviation squared, divided by χ²R for the lower limit and by χ²L for the upper limit. This gives 3.258 < σ < 6.482. We can be 98% confident that the interval from 3.258 to 6.482 contains the true standard deviation σ of the number of candies per bag.

A hypothesis test is a procedure for testing a claim about a property of a population; such a claim is called a hypothesis. A hypothesis test consists of four steps. The first step is to state the given claim and identify the null and alternative hypotheses. The second step is to find the test statistic, which is used later in the problem. The third step is to find either the P-value or the critical value.
The fourth step is to compare the P-value with the significance level (or the test statistic with the critical value) to decide whether to reject the null hypothesis, and then to state a conclusion about the original claim. Below I have worked out some example problems of hypothesis testing. The first one is a hypothesis test concerning a population proportion.

Using a 0.05 significance level, I tested the claim that 20% of all Skittle candies are red. The null hypothesis is H0: p = 0.20 and the alternative hypothesis is H1: p ≠ 0.20. To calculate the test statistic, I found p-hat by dividing the number of red Skittles, 321, by n = 1511, and subtracted p = 0.20 to get 0.0124. Next, I multiplied p = 0.20 by q = 1 - p = 0.80 and divided by n = 1511 to get 1.0589 × 10^-4, whose square root is 0.0103. Dividing 0.0124 by 0.0103 (using the unrounded values) gives the test statistic z = 1.209. I then found the P-value from a table of z scores and their corresponding areas: the area to the right of z = 1.21 is 1 - 0.8869 = 0.1131, and since this is a two-tailed test I doubled it to get a P-value of about 0.226. Because the P-value is greater than the 0.05 significance level, I failed to reject the null hypothesis: there is not sufficient evidence to warrant rejection of the claim that 20% of all Skittles are red.

Here is another example problem, testing a claim about a population mean. Using a 0.01 significance level, I tested the claim that the mean number of candies in a bag of Skittles is 55. The null hypothesis is H0: μ = 55 and the alternative hypothesis is H1: μ ≠ 55.
I then calculated the test statistic by subtracting 55 from the mean of 60.44 and dividing by the standard deviation of 4.3597 divided by the square root of n = 25. The test statistic came out to be t = 6.239. I then found the critical value from a table of critical t values, using the 0.01 significance level (two tails) and 24 degrees of freedom: t = ±2.797. The test statistic of 6.239 is larger than the critical value of 2.797, so it falls in the critical region. I therefore rejected the null hypothesis: there is sufficient evidence to warrant rejection of the claim that the mean number of candies in each Skittle bag is 55.

The requirements for a confidence interval for a population proportion are: the sample is a simple random sample, the conditions for a binomial distribution are satisfied, and there are at least five successes and at least five failures in the data. All of the requirements are met for my 99% confidence interval for a population proportion shown above. It is a simple random sample. The requirements for the binomial distribution are met: there is a fixed number of independent trials, there are two outcomes (yellow or not yellow), and the probabilities remain constant. There are at least five successes and five failures: there are 306 yellow candies and 1511 - 306 = 1205 candies that are not yellow.

The conditions for a confidence interval for a population mean are: the sample is a simple random sample, and the population is normally distributed or n is greater than 30. My 95% confidence interval for a population mean meets these requirements: it is a simple random sample, and because n = 25 is not greater than 30, I assume that the data is normally distributed.
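Both hypothesis tests can be checked with a short script. This is a minimal sketch, assuming Python 3.8 or later; the function names are illustrative. The two-tailed P-value is computed by doubling the tail area beyond |z|; with the unrounded z ≈ 1.209 it comes out near 0.23, well above 0.05. The critical value 2.797 is passed in as a t-table value because the standard library has no t distribution. A small helper also checks the np ≥ 5 and nq ≥ 5 requirement for the proportion test.

```python
from math import sqrt
from statistics import NormalDist  # Python 3.8+

# Illustrative helpers, not part of the original project.

def proportion_z_test(x, n, p0):
    """Two-tailed z test of H0: p = p0. Returns (z, P-value)."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    # Two-tailed P-value: double the area in the tail beyond |z|.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def mean_t_statistic(mean, mu0, s, n):
    """Test statistic t = (x_bar - mu0) / (s / sqrt(n))."""
    return (mean - mu0) / (s / sqrt(n))

def meets_np_nq(n, p0):
    """Requirement for the proportion test: np >= 5 and nq >= 5."""
    return n * p0 >= 5 and n * (1 - p0) >= 5

if __name__ == "__main__":
    z, p = proportion_z_test(321, 1511, 0.20)
    print(f"red Skittles: z = {z:.3f}, P-value = {p:.4f}, np/nq ok: {meets_np_nq(1511, 0.20)}")
    print("reject H0" if p <= 0.05 else "fail to reject H0")

    t = mean_t_statistic(60.44, 55, 4.3597, 25)
    t_crit = 2.797  # t table: alpha = 0.01, two tails, 24 degrees of freedom
    print(f"mean per bag: t = {t:.3f}")
    print("reject H0" if abs(t) > t_crit else "fail to reject H0")
```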
The requirements for a confidence interval for a population standard deviation or variance are: the sample is a simple random sample, and the population must be normally distributed (even when the sample size n is large). This estimate of a population standard deviation meets the requirements: it is a simple random sample, and I assume the population is normally distributed.

There are four requirements to test a claim about a population proportion: the observations are from a simple random sample, the conditions for a binomial distribution are satisfied, np (the sample size multiplied by the population proportion) is at least 5, and nq (the sample size multiplied by one minus the population proportion) is at least 5. All of the requirements are met for my first hypothesis test, testing a claim about a population proportion. It is a simple random sample. The requirements for the binomial distribution are met: there is a fixed number of independent trials, there are two outcomes (red or not red), and the probabilities remain constant. The requirements np ≥ 5 and nq ≥ 5 are met because np = 1511 × 0.20 = 302.2 and nq = 1511 × 0.80 = 1208.8. The requirements to test a claim about a population mean are: the sample is a simple random sample, and the population is normally distributed or n (the sample size) is greater than 30. My hypothesis test about a population mean meets these requirements: it is a simple random sample, and I assume that the data is normally distributed.

Using the data given for the five problems above, I could have made errors. One error I could have made would be to solve for the wrong confidence interval.
For example, when asked to construct a confidence interval for a population mean, I could have solved for a population proportion confidence interval instead, making my results incorrect. Another error I could have made would be to look up the wrong z or t values for the corresponding alpha values. For this Skittle project, the sampling method that we used was random sampling. To improve the sampling method, more bags could have been added to our class's compiled Skittle data.

From this Skittle project I have shown my prediction to be incorrect. I predicted that I would have more red Skittles than everyone else's bags, but that the class as a whole would have more yellow Skittles. I found that red Skittles occurred most often in the class's Skittle data and that yellow occurred most often in my own bag. From that data I can conclude that red Skittles appeared most frequently overall in our sample. Even so, the Skittles company seems to fill each bag randomly, hoping that each color will get into the bags and satisfy everyone.

Reflective Writing

From this project, I have learned that red Skittles were the most prevalent color in our sample. I have also shown that I am able to take a sample from a population and calculate reasonable estimates of population parameter values. The math skills that I have gained from this Math 1040 class will help me figure out the probabilities of passing classes with certain test scores. Also, when I apply to the nursing program at the University of Utah next semester, I will be able to take the grades from all ten of my prerequisite classes and calculate my grade point average. This will help me determine whether my grade point average is in the desired range of 3.0 to 4.0. One problem-solving skill I have gained from this project is interpreting each problem to figure out what it is asking.
I was then able to find the correct formulas to use for each problem to get the answers I was looking for. I also found that labeling the given information correctly really does help me solve the problem. In hypothesis testing, if I do not label the claim at the beginning of the problem, my answer can become very confusing because I have forgotten what the claim is.

I have found that real-world math problems take a lot more work than I ever expected. I thought that this project wasn't going to take much time to complete, but I was quickly proven wrong. I put a lot of effort into answering each question. The questions required a lot of reflection and real thought about what my data was telling me. I now have more respect for real-world math and the time and effort that it actually takes to complete.