Statistics 1040 - Miss Stella-Jean
Statistics 1040 – Term Project

Part 1: Choosing Data

1. Tell me which of the three data sets you have chosen to work with this semester.

Data Set 4: Freshman 15 Data. Data Set 4 explores the well-known expression “Freshman 15.” This term has been used to describe the weight gain that takes place during a college student’s freshman year. Both weight (in kilograms) and BMI were collected from multiple male and female students. The weight and BMI were measured in September, the beginning of the freshman academic year, and again in April, the end of the freshman academic year. Results were originally published July 1, 2006 in The Journal of American College Health, volume 55, number 1, page 41; article name “Changes in Body Weight and Fat Mass of Men and Women in the First Year of College: A Study of the ‘Freshman 15.’ ” Article authors include D.J. Hoffman, P. Policastro, V. Quick, and S.K. Lee.

3. Complete the following table for all variables in your chosen data set.

| Variable name in the data set | Describe what the variable means (include units) | Is the variable qualitative or quantitative? |
|---|---|---|
| Sex | Gender (female or male) | Qualitative |
| Weight in September | Weight in kilograms of both male and female participants in September | Quantitative |
| Weight in April | Weight in kilograms of both male and female participants in April | Quantitative |
| BMI in September | BMI of both male and female participants in September | Quantitative |
| BMI in April | BMI of both male and female participants in April | Quantitative |

Part 2: Graphical Representation of Data

1. [Figures: Sample 1 pie chart; Sample 1 Pareto chart]

2. The second sample was obtained by a systematic approach. The first row was selected (row 1) and every other row thereafter (counted by twos). This only got us a sample of n = 34. Since the sample is obtained without replacement, we started again at the remaining data and used the same approach, picking the first remaining row (row 2), and this got us to the sample size n = 35.

3.
Though not much different, the first sample (simple random, n = 35) contained 42.86% women compared to 57.14% men. The second (systematic) sample contained 51.43% women compared to 48.57% men, and so represents both genders more equally.

4. The graphs obtained from the whole data set are very close to those obtained from systematic sample two. The whole data set contains 35 women and 32 men, compared with 18 women and 17 men in systematic sample two. The systematic sample accounts for both genders almost equally. Conversely, sample one contains more men than women, whereas the whole data set contains more women than men. I will say that I noticed how “off” pie charts seem to represent data. For some reason, small differences in data, as in the pie chart from sample 1, appear to be much larger. I can see how pie charts may be misleading with no numerical values. I believe histograms are a much better way of graphically representing data.

Part 3: Sample Statistics

1. Sample statistics for the two group-selected samples:

Simple random sample, n = 35:
Mean: 65.97 kg
Standard deviation: 10.81 kg
Five-number summary: Minimum 50, Q1 57, Median (Q2) 65, Q3 71, Maximum 94

Systematic sample, n = 35 (every 5th row):
Mean: 64.34 kg
Standard deviation: 17.03 kg
Five-number summary: Minimum 50, Q1 56, Median (Q2) 64, Q3 70, Maximum 94

2. Frequency histogram and box plot for each sample

[Figures: frequency histogram and box plot for the simple random sample, n = 35]
[Figures: frequency histogram and box plot for the systematic sample, n = 35]

4. I wish there were some way to contrast the graphs of the two samples. However, both are asymmetric and skewed to the right. This is not surprising because the summary statistics of each sample are very similar. It is reassuring to know that two different approaches to obtaining a sample yielded very similar results.

Part 4: Confidence intervals

1.
Confidence interval for population proportion: The sample proportion p̂ = 0.571 is the best point estimate of the population proportion p. The 95% confidence interval of the population proportion for women was 0.407 < p < 0.735. We are therefore 95% confident that the interval from 0.407 to 0.735 actually does contain the true value of the population proportion p.

Confidence interval for the population mean: The sample mean x̄ = 65.97 is the best point estimate of the population mean μ. The 95% confidence interval for the population mean of “September Weight in Kilograms” is 62.3 < μ < 69.7. We are therefore 95% confident that the interval from 62.3 to 69.7 actually does contain the true value of μ.

Confidence interval for population standard deviation: The sample standard deviation s = 11.3 is the best point estimate of the population standard deviation σ. The 95% confidence interval of the population standard deviation for “September Weight in Kilograms” is 9.194 < σ < 15.296. Based on these calculations, we are 95% confident that the limits of 9.194 and 15.296 contain the true value of σ.

2. For each of the confidence intervals, the value of the population parameter was captured. In the population, there were 35 females and 32 males. With that in mind, the 95% confidence interval for the population proportion is 0.407 to 0.735. This confidence interval captures the population parameter because, proportionally speaking, it means the population of 67 should contain between 27 and 49 students of a given sex, and both counts (35 women, 32 men) fall in that range. The population mean of 65.1 is captured within the 95% confidence interval of 62.3 to 69.7. The same goes for the standard deviation: the 95% confidence interval of the population standard deviation is 9.194 to 15.296, and these limits do contain the population standard deviation of 11.3.
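The proportion and mean intervals above can be reproduced with a short sketch. It assumes the normal approximation for the proportion, and a t critical value of 2.032 for 34 degrees of freedom (a table value, hard-coded here because Python's standard library has no t distribution). The function names are illustrative only. The mean interval uses the simple random sample's statistics from Part 3 (x̄ = 65.97, s = 10.81).

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def mean_ci(x_bar, s, n, t_crit):
    """t-based confidence interval for a population mean (sigma unknown)."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Assumption: two-tailed 95% t critical value for df = 34, taken from a t table.
T_CRIT_DF34 = 2.032

p_low, p_high = proportion_ci(0.571, 35)
m_low, m_high = mean_ci(65.97, 10.81, 35, T_CRIT_DF34)

print(f"{p_low:.3f} < p < {p_high:.3f}")    # 0.407 < p < 0.735
print(f"{m_low:.1f} < mu < {m_high:.1f}")   # 62.3 < mu < 69.7
```

The standard deviation interval is left out of the sketch because it needs chi-square critical values, which also come from a table or a statistics package.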
It is very cool that our intervals captured the population parameters, and that we actually got to compute them and see them correspond to a project that we are personally working on.

Part 5: Hypotheses Testing

1. Level of significance chosen: 5%

2. In an observational study following the weights of females and males during a freshman year, the population proportion consisted of 51.45% females. Use a 0.05 significance level to test the claim that, in a simple random selection of 35 individuals, the proportion of females would be 51.45%.

H0: p = 52.24%
H1: p ≠ 52.24%

Test statistic: z = 0.290, using p̂ = 0.5145, p = 0.49, q = 1 − 0.49 = 0.51, n = 35:

$$z = \frac{\hat{p} - p}{\sqrt{pq/n}} = \frac{0.5145 - 0.49}{\sqrt{(0.49)(0.51)/35}} = 0.290$$

P-value: 2(1 − 0.6141) = 0.772 (two-tailed)

Fail to reject the null hypothesis because the P-value is greater than the significance level. There is not sufficient evidence to warrant rejection of the claim that the proportion of females will be 49%.

3. In an observational study following the weights of females and males during a freshman year, the September weight (kg) of the freshmen is summarized by n = 35, x̄ = 65.97 and s = 10.81. Use a 0.05 significance level to test the claim that the September weight during freshman year had a mean of 61 kg.

H0: μ = 61
H1: μ ≠ 61

Test statistic: t = 2.72, using x̄ = 65.97, n = 35, s = 10.81, μ = 61:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{65.97 - 61.0}{10.81/\sqrt{35}} = 2.72$$

P-value = 0.01

Reject H0 because the P-value is less than the significance level 0.05. There is sufficient evidence to warrant rejection of the claim that the sample is from a population with a mean weight of 61 kg during the September month of freshman year.

4. Given this information, the hypothesis test for the population mean weight in kg is consistent with the actual population mean (from the original whole data set) of 65.1. I’m sure I didn’t form a correct hypothesis test, however. In regard to the hypothesis test for the population proportion, the population (the original whole data set) has a proportion of 52.24% females.
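The two test statistics from items 2 and 3 can be checked with a short sketch. It plugs in the values exactly as used in the calculations above (p̂ = 0.5145 against p = 0.49, and x̄ = 65.97 against μ = 61). The two-tailed P-value for the z test is taken as 2(1 − Φ(|z|)). For the t test, the statistic is compared with a table critical value of 2.032 for 34 degrees of freedom, since Python's standard library has no t distribution. The function names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(p_hat, p0, n):
    """z statistic and two-tailed P-value for a one-proportion test."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def one_sample_t(x_bar, mu0, s, n):
    """t statistic for a one-sample test of a mean (sigma unknown)."""
    return (x_bar - mu0) / (s / sqrt(n))


# Assumption: two-tailed 5% t critical value for df = 34, taken from a t table.
T_CRIT_DF34 = 2.032

z, p_value = one_proportion_z(0.5145, 0.49, 35)
t = one_sample_t(65.97, 61.0, 10.81, 35)

print(f"z = {z:.2f}, two-tailed P-value = {p_value:.2f}")   # z = 0.29, P-value = 0.77
print(f"t = {t:.2f}, reject H0: {abs(t) > T_CRIT_DF34}")    # t = 2.72, reject H0: True
```

Both decisions match the ones reached above: the proportion test fails to reject H0, and the mean test rejects it.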
Also, when doing another simple random sample in StatCrunch, the proportion of females in a sample of 35 was 57.14%. The hypothesis test for the population proportion found that there was not sufficient evidence to warrant rejection of the claim, and so I guess the claim about the population proportion is supported, because it can’t actually be rejected when the P-value is greater than the significance level. Okay, am I right here? Partially right?

Part 6: Reflection

Many people hear the word statistics and automatically assume they will be learning a different form of mathematics. I was the same way. I believed, as with the many mathematics courses in my time, that there was nothing beyond a letter grade or beyond the actual course itself. To be honest, I would say there are dozens of algebraic equations that I will never use in my lifetime. I’m not saying that these equations are not important and don’t have real-life applications; however, it’s hard to see these applications beyond the scope of a textbook, and there is truth in the saying, “When will I ever use this?”

Statistics is completely different from what I had expected. It truly is a subject that goes beyond the final letter grade and follows us into real life. There isn’t a single person who wouldn’t benefit from the knowledge acquired in a statistics course. I would say the striking difference between other mathematics courses I have taken and this statistics course is critical thinking. I couldn’t get by on solely knowing an equation. It required me to put thought into what I was doing, and it pushed me to ask important questions about the information put before me and whether I was just going to take it at face value or instead approach it from the position of an informed consumer.

This term project taught me to do something I have never before been able to do. I was able to take a set of data and scrutinize it with my own statistical analysis.
I was able to use rules and equations to determine whether or not the data was something I could trust. I learned about the various ways to graphically portray data and how some ways are much more advantageous than others. I learned how to sum up a set of data purely with “summary statistics,” how to construct confidence intervals, and how to test claims about the population’s proportion and mean. Less importantly but to a mind-blowing degree, I learned that whoever created the algorithms used to construct the glorious technology of the TI-84 should be given their weight in gold.

One of the ways I’m most interested in applying what I’ve learned this semester is in scrutinizing the statistical information given for various medical topics. Healthcare is a billion-dollar industry with various special interest groups forcing the hands of politicians and researchers in order to shape policy and research studies in their favor. Medicine is my passion. I want to know everything I can about everything there is, and I simply can’t get enough. But I also need to know that I can trust the information being presented, and not simply assume it’s coming from a reputable, independent source. I now have the tools to look at statistical data and determine its worth and whether or not it’s something I will find useful and valuable in terms of my own professional standard of care.