Statistics Projects

Here is a list of possible statistics projects for your convenience. If you can’t think of something to do on your own, you may select any of the following. I will try to give you all the instructions necessary so that you will only need to follow instructions to complete the project. If you’d like to think of something to do on your own, you are welcome to do that, too. All right, here are the possible projects and the instructions.

1. Hypothesis: The average miles driven by cars in Parking Lot “A” are equal to the average miles driven by cars in Parking Lot “B”.

This project will determine if there’s a difference between the average miles driven by cars in the two lots. The “alternative hypothesis” is that the average miles driven by cars in the two lots are not the same. Select α = .05.

If you are able to get the mileage from 30 cars in each of the two lots, then the sample size will be sufficiently large that you could use a Z statistic to test the claim, but I would still suggest you use a “t” statistic to do this test. If you get fewer than 30 cars from each of the two lots, then you will have to use a “t” statistic to test the claim.

Go out and get the mileage from the desired number of cars in each of Parking Lots A and B. It is best to have the same number of cars measured in each of the two lots. Determine the mean of the mileage driven by the cars in Parking Lot A. Determine the mean of the mileage driven by the cars in Parking Lot B. Determine the standard deviation of the mileage driven by the cars in Parking Lot A. Determine the standard deviation of the mileage driven by the cars in Parking Lot B. Determine the critical values for this two-tailed t statistic (with alpha set at .05). Calculate the “t” statistic by plugging the appropriate numbers you’ve discovered in your research into the “t” statistic formula.
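The formula itself is not printed in this handout. Assuming the pooled-variance form of the independent-samples t test (an assumption, though it matches the n1 + n2 – 2 degrees of freedom used in the later projects), it can be sketched as follows. The symbols are illustrative labels: x̄, s, and n are the mean, standard deviation, and number of cars for each lot, with subscript 1 for Lot A and 2 for Lot B.

```latex
% Pooled-variance two-sample t statistic (assumes roughly equal population variances)
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}},
\qquad
\mathrm{df} = n_1 + n_2 - 2
```

When both lots have the same number of cars, the pooled variance is simply the average of the two sample variances.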
If the actual value of the “t” statistic falls outside the critical values (below the lower one or above the upper one), then you will reject the null hypothesis (stated in #1 above). If the actual value of the “t” statistic falls between the critical values, then you will retain the null hypothesis, i.e., the average mileage driven by the cars in the two lots is the same.

Discuss the results in English. What do the results of the hypothesis test mean? Be sure to discuss how you selected the cars you measured. Were they selected by a random process or did you just start measuring mileage in cars that were convenient for you? (This will have bearing on your conclusions, as the process is based on random selection and not on convenience sampling.) Indicate in your write-up why you chose to use the “t” statistic or the “z” statistic. What are the assumptions of these statistics, and did your investigation and use of these statistics conform to those assumptions?

2. Another possibility. It is claimed that the average age of agriculture students at BHE is lower than the average age of students who are in the Liberal Arts and Sciences (LAS) at BHE.

Recognize that this claim is made in the “alternative hypothesis” language. You will need to state the null hypothesis, i.e., that the average age of students in both departments is the same or that the average age of the aggies is greater than that of the LAS people. Therefore, the null hypothesis is this: H0: μ(aggies) ≥ μ(LAS people). The alternative is this: H1: μ(aggies) < μ(LAS people). Recognize that this is a one-tailed test because we are saying “younger.” If we had said “not the same age,” then it would have been a two-tailed test.

Randomly select 30 students from the agriculture department and get their ages. Randomly select 30 students from the Liberal Arts department and get their ages. Determine the appropriate critical value for the “t” statistic you will use to make your hypothesis test.
This is a “t” test of independent samples; your degrees of freedom will be n1 + n2 – 2. It will be a one-tailed test; you will have one critical “t” value. Determine the mean age of the aggies. Determine the mean age of the LAS people. Determine the standard deviation of the aggies. Determine the standard deviation of the LAS people. Plug these numbers into the “t” formula and calculate the “actual t” value resulting from your data. Compare the “actual t” value to your “critical t” value. This is going to be a one-tailed test to the left. If you reject the null hypothesis, you will be concluding that the average age of the aggies is lower than the average age of the LAS people. If you retain the null hypothesis, you will be concluding that the average age of the aggies is equal to or greater than the average age of the LAS people.

Be sure to discuss your results in English. What does it mean to retain the null hypothesis or to reject it? Indicate that you have used the appropriate statistic, i.e., the “t” statistic, and why. What assumptions does the independent t test make, and did your procedure conform to those assumptions? Describe your results so that the average person who does not know a thing about statistics can understand what you’ve concluded and why.

3. Another example: It is claimed that the average horse-science aggie spends at least 2.5 hours a week grooming their animals.

We are not comparing two groups here, only one group to a stated norm (2.5 hours), and so we’ll only need to sample one group. You will want to sample approximately 30 horse-science students (if possible) or at least as many as you can find. You will ask these students, “How many minutes a week do you spend grooming your horse?” The null hypothesis is this: H0: μ ≥ 150 minutes (2.5 hours) per week. The alternative hypothesis is this: H1: μ < 150 minutes per week. If you have a sample of fewer than 30, you will use the t-statistic.
Because we do not know the population standard deviation for this, we must use the “t” statistic even if we have a sample size of 30 or more. The critical value is a one-tailed value to the left. You will find this critical value using n – 1 degrees of freedom. Use α = .01, for example. Determine the mean and the standard deviation for your sample. Plug the numbers into the t-statistic, where X-bar is your sample mean and μ = 150 minutes. The standard deviation is the standard deviation of your sample and the “n” value is the size of your sample. Calculate the actual “t” value using the formula and compare it to the critical value you’ve already found. If the actual value is less than the critical value, you will reject the null hypothesis and retain the alternative hypothesis. If the actual value is greater than the critical value, you will retain the null hypothesis.

Write up your results in English so that someone who does not know statistics can understand what you’ve done and why you make the conclusions you’ve made. Make sure that you indicate the assumptions of the “t” statistic and that your investigation has complied with those assumptions. Make sure you indicate to me what the mean, standard deviation, and n-size are, as well as your critical and actual values for the “t” statistic, so that I can determine if you’ve done the right thing.

4. Another example: It is claimed that the average GPA of males at BHE is not equal to the average GPA of females at BHE.

Recognize that this is a two-tailed test (“not equal”). You will get the GPA from a sample of 30 males and a sample of 30 females at BHE. You will select an alpha (α) value of, say, .05 or .01. You will use a “t” test of independent samples in this problem. You will determine the degrees of freedom as n1 + n2 – 2. You will note that the claim is made in alternative hypothesis language. You will need to determine the appropriate null hypothesis language, i.e., H0: male-GPA = female-GPA.
The alternative is the claim, i.e., H1: male-GPA ≠ female-GPA. You will now find the critical values of this “t” statistic. You will determine the mean and standard deviation of the males’ GPAs. You will determine the mean and standard deviation of the females’ GPAs. You will plug these values into the “t” statistic and calculate an “actual value” of the “t” statistic. You will compare the actual value to the two critical values. If the actual “t” falls between the two critical values, you will retain the null hypothesis. If the actual “t” falls either higher than the high critical value or lower than the low critical value, you will reject the null hypothesis and retain the alternative hypothesis.

Write up your results in English so that somebody who does not know statistics can understand what you’ve done. Make sure you discuss the assumptions of the “t” statistic and the procedure you’ve followed so as to indicate that you’ve complied with the assumptions of the process. Indicate to me the means, standard deviations, n-sizes, critical values, the calculated actual value, and your decision regarding the null hypothesis so that I know that you know what you’re doing.

Here is a list of other things you could investigate in your projects:

5. Miles driven to school by agriculture students vs. liberal arts and sciences students.
6. Hours spent per week doing homework by aggies vs. liberal arts people.
7. Amount spent on groceries per week by males vs. females at BHE.
8. Amount spent on clothes in the last month by males vs. females at BHE.
9. Distance from school to parents’ house for aggies vs. liberal arts people.
10. Number of books read in the last year by males vs. females at BHE.

These are just a few of the projects that come to my mind. There are hundreds of things you could measure and compare between departments, genders, ages, parking lots, buildings, class times, etc.
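For the one-group case in project 3, the calculation and the left-tailed decision rule can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function names `one_sample_t` and `left_tail_decision` are made up for illustration, the five grooming times are invented data (a real project should use about 30 students), and the critical value is read from a t-table rather than computed.

```python
import math
import statistics


def one_sample_t(minutes, mu0):
    """Return (t, df) for a one-sample t test of the sample mean against mu0."""
    n = len(minutes)
    mean = statistics.mean(minutes)
    sd = statistics.stdev(minutes)  # sample standard deviation (n - 1 divisor)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


def left_tail_decision(t_actual, t_critical):
    """Left-tailed rule: reject H0 only if t_actual is below the negative critical value."""
    return "reject H0" if t_actual < t_critical else "retain H0"


# Hypothetical grooming times in minutes per week (invented for illustration only).
sample = [120, 130, 140, 150, 160]
t_actual, df = one_sample_t(sample, mu0=150)

# Critical value from a t-table: df = 4, alpha = .01, one tail to the left.
T_CRITICAL = -3.747
decision = left_tail_decision(t_actual, T_CRITICAL)
print(f"t = {t_actual:.4f}, df = {df}, decision: {decision}")
```

With these invented numbers the sample mean (140 minutes) is below 150, but the actual t is not below the critical value, so the null hypothesis is retained. The same decision function applies to project 2, which is also a one-tailed test to the left.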
We have already done, in a fashion, some of these projects in class. So that you might see a complete example, I will do one here for your examination.

A Comparison of Average Age Between Day and Evening Students at BHE
By: Robert E. Lee
In partial fulfillment of the requirements for Statistics
April 2, 2000

Abstract: This project compares the average ages of students at BHE during the day and during the evening. The null hypothesis for the project was that the average age of the day and evening students would be the same. The alternative hypothesis for the project was that the average age of the day and evening students would be different. Results of the hypothesis test indicate that the average age of the day and evening students is not the same and that evening students are likely older than day-time students.

Introduction: It has long been claimed that the average age of students taking classes at BHE in the evening is not the same as that of the students taking classes at BHE during the day. While this claim has been widely circulated and believed by students and staff at the College, no one has, to this writer’s knowledge, actually tested this claim using standard statistical procedures. Circulation of unsupported claims can make rumor appear as fact; it is better to subject the claim to standard statistical verification than to believe an unsupported claim. Therefore, this project is designed to actually test this claim. The operating hypotheses for this project are as follows:
H0: age-day = age-evening
H1: age-day ≠ age-evening

Method: Thirty students were randomly selected from day classes. The method employed in their random selection involved their order of leaving two randomly selected classes on Friday, March 29, 2000. I asked every other student leaving these two classes what their age was; it is assumed that order of leaving a class is random. One of the classes selected was during the morning and one in the early afternoon.
It was assumed that in doing this, there would be no bias of age incorporated into the data. Thirty students were randomly selected from evening classes. Again, the method of selection involved randomly picking two evening classes and then asking every other student leaving these two classes what their age was. It was assumed that, with this manner of selecting students for participation in the study, there was no systematic bias operating with respect to students’ ages and that, as best possible, a random selection of students in both the day and evening comparison groups was achieved.

Statistic Employed: As there were 30 students in each of two groups (day and evening students) but the population standard deviation was not known for day and evening students, I chose the t-test for independent samples as the appropriate statistic to test the claim. The assumptions for the statistic are that the two groups employed are not related to each other, i.e., independent groups, and that the ages of the two groups are approximately normally distributed as well as having variances which are approximately equal.

Statistic Critical Value: I was interested in a 95% probability of making a correct decision regarding my conclusion and, hence, α = .05. There were 60 students combined in the two groups, hence my degrees of freedom in this t-test of independent samples was 58. A table lookup on the t-table revealed the critical values in this hypothesis test to be +1.96 and –1.96 (these are the large-sample values; the exact values for 58 degrees of freedom are about ±2.00, which does not change the decision in this project). If the actual value of the t-statistic was less than –1.96 or greater than +1.96, then I would reject the null hypothesis (stated above) and retain the alternative (also stated above).

Sample Statistics: The average age of the 30 day-time students was 23.2 years of age with a standard deviation of 3.2 years. The average age of the 30 evening students was 26.8 years with a standard deviation of 4.5 years.
As you can see, the standard deviations of the two groups were not substantially different and so the assumption that the variances of the two groups are approximately equal is maintained.

Results of the Hypothesis Test: The appropriate statistical test to test the hypothesis has been indicated above, i.e., a t-test of independent samples. The resulting calculation of the actual “t” value for this project was –3.5709. As you can see, this value is less than the lower critical value of –1.96. As a result, I am forced to reject the null hypothesis (that the day and evening students are of the same average age) and accept the alternative hypothesis, i.e., that the day and evening students’ average ages are not the same. Examination of the average ages reveals the day group (mean = 23.2) and evening group (mean = 26.8). In the context of the rejection of the null hypothesis, the data appear to indicate that the day students are, in fact, younger than the evening students. Alternatively, we could say that the evening students are, in fact, older than the day students, with a p-value of .0007, i.e., if the two groups truly had the same average age, the probability that a difference this large would occur by chance alone is 7 in 10,000. It is likely the case that, indeed, evening students are older than day students. Statistical analysis of these data does, in fact, support the claim that evening students are older than day students.

Discussion: Claims have been made at BHE that the evening students are older than the day-time students. Results of the present study appear to strongly support the claim. While it has historically been thought this was the case, we can now specify with statistical clarity that it is true, with a probability of my being in error of about .0007. It is possible that the selection method of students for the two groups may not have been as random as possible; my method for their selection was not by random number table or by computer selection.
Nonetheless, this finding is substantial and suggests that evening students are older than day-time students. It is possible that, based on these findings, the college may desire to examine its instructional methods for day and evening students considering this age differential.

Summary numbers for Mr. Lee:
Mean(daytime) = 23.2 years; standard deviation = 3.2 years; subjects = 30
Mean(evening) = 26.8 years; standard deviation = 4.5 years; subjects = 30
t-critical values = –1.96 and +1.96
t-actual value = –3.5709
p < .05; actual p-value = .0007
Reject the null hypothesis; analysis supports the claim.

Books used:
Statistics Handbook for the TI-83 by Larry Morgan, 1997.
Elementary Statistics by Robert Johnson, 3rd Edition, 1980.
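The summary numbers in the example can be checked with a short script. This is a sketch only: it assumes the pooled-variance form of the independent-samples t test, and instead of a t-table or the TI-83 it approximates the t distribution numerically (Simpson's rule for the area under the curve, bisection for the critical value). The function names are hypothetical helpers, not part of any library.

```python
import math


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))               # standard error of the difference
    return (m1 - m2) / se, df


def t_pdf(x, df):
    """Density of Student's t distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and the symmetry of the curve."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(alpha, df, tails=2):
    """Positive critical value found by bisection on the numerical CDF."""
    target = 1 - alpha / tails
    lo, hi = 0.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Summary numbers from the example: day students first, evening students second.
t_actual, df = pooled_t(23.2, 3.2, 30, 26.8, 4.5, 30)
crit = t_critical(0.05, df, tails=2)
p_value = 2 * (1 - t_cdf(abs(t_actual), df))

print(f"t = {t_actual:.4f}, df = {df}")
print(f"two-tailed critical values = -{crit:.3f} and +{crit:.3f}")
print(f"two-tailed p-value = {p_value:.4f}")
print("reject H0" if abs(t_actual) > crit else "retain H0")
```

Run as written, the actual t agrees with the reported value to about three decimal places and the p-value is close to .0007. The computed critical values for 58 degrees of freedom come out near ±2.00 rather than ±1.96; the decision to reject the null hypothesis is the same either way.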