Statistics Projects
Here is a list of possible statistics projects for your convenience. If you can’t think of
something to do on your own, you may select any of the following. I will try to give you
all the instructions necessary so that you will only need to follow instructions to complete
the project. If you’d like to think of something to do on your own, you are welcome to
do that, too.
All right, here are the possible projects and the instructions.
1. Hypothesis: The average miles driven by cars in Parking Lot “A” are equal to the
average miles driven by cars in Parking Lot “B”.
 This project will determine if there’s a difference between average miles driven
by cars in the two lots.
 The “alternative hypothesis” is that the average miles driven by cars in the two
lots are not the same.
 Select α = .05
 If you are able to get the mileage from 30 cars in each of the two lots then the
sample size will be sufficiently large that you could use a Z statistic to test the
claim, but I would still suggest you use a “t” statistic to do this test.
 If you get fewer than 30 cars from each of the two lots, then you will have to use a
“t” statistic to test the claim.
 Go out and get the mileage from the cars in Parking Lots A and B of the desired
number of cars in each lot. It is best to have the same number of cars measured in
each of the two lots.
 Determine the mean of the mileage driven by the cars in Parking Lot A.
 Determine the mean of the mileage driven by the cars in Parking Lot B.
 Determine the standard deviation of the mileage driven by the cars in Parking Lot
A.
 Determine the standard deviation of the mileage driven by the cars in Parking Lot
B.
 Determine the critical values for this two-tailed t statistic (with the alpha set at
.05).
 Calculate the “t” statistic by plugging in the appropriate numbers you’ve
discovered in your research into the “t” statistic formula.
 If the actual value of the “t” statistic exceeds the critical values, then you will
reject the null hypothesis (stated in #1 above).
 If the actual value of the “t” statistic does not exceed the critical values, then you
will retain the null hypothesis, i.e., the average mileage driven by the cars in the
two lots is the same.
 Discuss the results in English. What do the results of the hypothesis test mean?
 Be sure to discuss your method of getting your cars to measure. Were they
selected by a random process or did you just start measuring mileage in cars that
were convenient for you? (This will have bearing on your resultant conclusions as
the process is based on random selection and not on convenience sampling.)

Indicate in your write-up why you chose to use the “t” statistic or the “z” statistic.
What are the assumptions of these statistics and did your investigation and use of
these statistics conform to the assumptions of these statistics?
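The calculation steps in project 1 can be sketched in a few lines of Python. This is only an illustration, not the required method: it assumes the pooled (equal-variance) form of the independent-samples “t” statistic, the function names pooled_t and reject_two_tailed are made up for this sketch, and the mileage figures are hypothetical. The critical value still comes from a t-table.

```python
import math
import statistics

def pooled_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for an equal-variance two-sample t test."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (divides by n - 1)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df

def reject_two_tailed(t, critical):
    """Two-tailed rule: reject H0 when t falls outside -critical..+critical."""
    return abs(t) > critical

# Hypothetical odometer readings in thousands of miles (not real data)
lot_a = [42.0, 55.5, 61.2, 38.9, 70.1, 49.3]
lot_b = [66.4, 72.8, 58.0, 81.5, 69.9, 75.2]

t_actual, df = pooled_t(lot_a, lot_b)
critical = 2.228  # t-table value for df = 10, alpha = .05, two-tailed
print(f"t = {t_actual:.3f}, df = {df}, reject H0: {reject_two_tailed(t_actual, critical)}")
```

With only six cars per lot the table value is 2.228; with 30 cars per lot you would look up the value for 58 degrees of freedom instead.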
2. Another possibility. It is claimed that the average age of agriculture students at BHE
is younger than the average age of students who are in the Liberal Arts and Sciences
at BHE.
 Recognize that this claim is made in the “alternative hypothesis” language. You
will need to state the null hypothesis, i.e., that the average age of students in both
departments is the same or that the average age of the aggies is greater than that of
the LAS people. Therefore, the null hypothesis is this: H0: μ(aggies) ≥ μ(LAS people). The
alternative is this: H1: μ(aggies) < μ(LAS people).
 Recognize that this is a one-tailed test because we are saying “younger.” If we
had said “not the same age” then it would have been a two-tailed test.
 Randomly select 30 students from the agriculture department and get their ages.
 Randomly select 30 students from the Liberal Arts department and get their ages.
 Determine the appropriate critical value for this “t” statistic you will use to make
your hypothesis test. This is a “t” test of independent samples; your degrees of
freedom will be n1 + n2 – 2. It will be a one-tailed test; you will have one critical
“t” value.
 Determine the mean age of the aggies.
 Determine the mean age of the LAS people.
 Determine the standard deviation of the aggies.
 Determine the standard deviation of the LAS people.
 Plug in these numbers into the “t” formula and calculate the “actual t” value
resulting from your data.
 Compare the “actual t” value to your “critical t” value. This is going to be a one-tailed
test to the left. If you reject the null hypothesis, you will be concluding that
the average age of the aggies is lower than the average age of the LAS people. If
you retain the null hypothesis, you will be concluding that the average age of the
aggies is equal to or greater than the age of the LAS people.
 Be sure to discuss your results in English. What does it mean to retain the null
hypothesis or to reject it?
 Indicate that you have used the appropriate statistic, i.e., the “t” statistic and why.
What assumptions does the independent t test make and did your procedure
conform to those assumptions? Describe your results so that the average person
who does not know a thing about statistics can understand what you’ve concluded
and why.
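For reference, the “t” formula for independent samples that goes with n1 + n2 – 2 degrees of freedom is usually written in the pooled-variance form below. This is a sketch under that assumption; check it against the formula in your textbook. Here group 1 is the aggies and group 2 is the LAS people, s1 and s2 are the sample standard deviations, and the symbol s_p is introduced here for the pooled standard deviation.

```latex
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}
```

For the one-tailed test to the left, you reject the null hypothesis when the actual t is below the negative critical value.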
3. Another example: It is claimed that the average horse-science aggie spends at least
2.5 hours a week grooming their animals.
 We are not comparing two groups here, only one to a stated norm (2.5 hours) and
so we’ll only need to sample 1 group.
 You will want to sample approximately 30 horse-science students (if possible) or
at least as many as you can find.
 You will ask these students “How many minutes a week do you spend grooming
your horse?”
 The null hypothesis is this: H0: μ ≥ 150 minutes (2.5 hours) per week
 The alternative hypothesis is this: H1: μ < 150 minutes per week
 If you have a sample of fewer than 30, you will use the t-statistic. Because we do
not know the population standard deviation for this, we must use the “t” statistic
even if we have a sample size of 30 or more.
 The critical value is a one-tailed value to the left. You will find this critical
value using n-1 degrees of freedom. Use α = .01, for example.
 Determine the mean and the standard deviation for your sample.
 Plug the numbers into the t-statistic, where X-bar is your sample mean and μ =
150 minutes. The standard deviation is the standard deviation of your sample and
the “n” value is the size of your sample.
 Calculate the actual “t” value using the formula and compare it to the critical
value you’ve already found.
 If the actual value is less than the critical value, you will reject the null hypothesis
and retain the alternative hypothesis.
 If the actual value is greater than the critical value, you will retain the null
hypothesis.
 Write up your results in English so that someone who does not know statistics can
understand what you’ve done and why you make the conclusions you’ve made.
 Make sure that you indicate the assumptions of the “t” statistic and that your
investigation has complied with the assumptions of the “t” statistic.
 Make sure you indicate to me what the mean, standard deviation, and n-size are as
well as your critical and actual values for the “t” statistic so that I can determine if
you’ve done the right thing.
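The one-sample calculation in project 3 might look like the following sketch. The function name one_sample_t is made up for illustration and the grooming times are hypothetical; the formula is the usual t = (X-bar – μ) / (s / √n), and the critical value is taken from a t-table.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, degrees of freedom) for a one-sample t test against mu0."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return (x_bar - mu0) / (s / math.sqrt(n)), n - 1

# Hypothetical answers, in minutes per week, from 10 students (not real data)
minutes = [120, 135, 150, 110, 140, 125, 160, 130, 115, 145]

t_actual, df = one_sample_t(minutes, 150)
critical_left = -2.821  # t-table value for df = 9, alpha = .01, one-tailed to the left
reject = t_actual < critical_left
print(f"t = {t_actual:.3f}, df = {df}, reject H0: {reject}")
```

Because the test is one-tailed to the left, only a sufficiently negative t leads to rejecting the null hypothesis.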
4. Another example: It is claimed that the average GPA of males at BHE is not equal to
the average GPA of females at BHE.
 Recognize that this is a two-tailed test (“not equal”).
 You will get the GPA from a sample of 30 males and a sample of 30 females at
BHE.
 You will select an alpha (α) value of, say, .05 or .01
 You will use a “t” test of independent samples in this problem.
 You will determine the degrees of freedom as n1 + n2 – 2.
 You will note that the claim is made in alternative hypothesis language. You will
need to determine the appropriate null hypothesis language, i.e., H0: μ(male-GPA) =
μ(female-GPA). The alternative is the claim, i.e., H1: μ(male-GPA) ≠ μ(female-GPA).
 You will now calculate the critical values of this “t” statistic.
 You will determine the mean and standard deviation of the males’ GPAs.
 You will determine the mean and standard deviation of the females’ GPAs.
 You will plug these values into the “t” statistic and calculate an “actual value” of
the “t” statistic.
 You will compare the actual value to the two critical values. If the actual “t” falls
between the two critical values, you will retain the null hypothesis.
 If the actual “t” falls either higher than the high critical value or lower than the
low critical value, you will reject the null hypothesis and retain the alternative
hypothesis.
 Write up your results in English so that somebody who does not know statistics
can understand what you’ve done.
 Make sure you discuss the assumptions of the “t” statistic and the procedure
you’ve followed so as to indicate that you’ve complied with the assumptions of the
process.
 Indicate to me the means, standard deviations, n-sizes, critical values and resultant
calculated actual value and your decision regarding the null hypothesis so that I
know that you know what you’re doing.
Here is a list of other things you could investigate in your projects:
5. Miles driven to school by agriculture students vs liberal arts and sciences students.
6. Hours spent per week doing homework between aggies and liberal arts people.
7. Amount spent on groceries per week between males and females at BHE.
8. Amount spent on clothes in the last month between males and females at BHE.
9. Distance from school to parents’ house between aggies and liberal arts people.
10. Number of books read in the last year between males and females at BHE.
These are just a few of the projects that come to my mind. There are hundreds of things
you could measure and compare between departments, genders, ages, parking lots,
buildings, class times, etc.
We have already done, in a fashion, some of these projects in class.
So that you might see a complete example, I will do one here for your examination.
A Comparison of Average Age Between Day and Evening Students at BHE
By: Robert E. Lee
In partial fulfillment of the requirements for Statistics
April 2, 2000
Abstract: This project compares the average ages of students at BHE during the day and
during the evening. The null hypothesis for the project was that the average age between
the day and evening students would be the same. The alternative hypothesis for the
project was that the average age between the day and evening students would be
different. Results of the hypothesis test indicate that the average age between the day and
evening students is not the same and that evening students are likely older than day-time
students.
Introduction: It has long been claimed that the average age of students taking classes at
BHE in the evening is not the same as that of the students taking classes at BHE during
the day. While this claim has been widely circulated and believed by students and staff at
the College no one has, to this writer’s knowledge, actually tested this claim using
standard statistical procedures. Circulation of unsupported claims can make rumor
appear as fact; it is better to subject the claim to standard statistical verification than to
believe an unsupported claim. Therefore, this project is designed to actually test this
claim. The operating hypotheses for this project are as follows:
H0: age-day age-evening
H1: age-day age-evening
Method: Thirty students were randomly selected from day classes. The method
employed in their random selection involved their order of leaving two randomly selected
classes on Friday, March 29, 2000. I asked every-other student leaving these two classes
what their age was; it is assumed that order of leaving a class is random. One of the
classes selected was during the morning and one in the early afternoon. It was assumed
that in doing this, there would be no bias of age incorporated into the data. Thirty
students were randomly selected from evening classes. Again, the method of selection
involved randomly picking two evening classes and then asking every-other student
leaving these two classes what their age was. It was assumed that, with this manner of
selecting students for participation in the study, there was no systematic bias operating
with respect to students’ ages and that, as best possible, a random selection of students
in both the day and evening comparison groups was achieved.
Statistic Employed: As there were 30 students in each of two groups (day and evening
students) but the population standard deviation was not known for day and evening
students I chose the t-test for independent samples as the appropriate statistic to test the
claim. The assumptions for the statistic are that the two groups employed are not related
to each other, i.e., independent groups, and, that the ages of the two groups are
approximately normally distributed as well as having variances which are approximately
equal.
Statistic Critical Value: I was interested in a 95% probability of making a correct
decision regarding my conclusion and, hence, α = .05. There were 60 students combined
in the two groups, hence my degrees of freedom in this t-test of independent samples was
58. A table-lookup on the t-table revealed the critical value in this hypothesis test to be
+1.96 and –1.96 (strictly the large-sample value; the tabled t value for 58 degrees of
freedom is closer to ±2.00). If the actual value of the t-statistic was less than –1.96t or
greater than +1.96t, then I would reject the null hypothesis (stated above) and retain the
alternative (also stated above).
Sample Statistics: The average age of the 30 day-time students was 23.2 years of age
with a standard deviation of 3.2 years. The average age of the 30 evening students was
26.8 years with a standard deviation of 4.5 years. As you can see, the standard deviations
between the two groups were not substantially different and so the assumption that the
variances between the two groups are approximately equal is maintained.
Results of the Hypothesis Test: The appropriate statistical test to test the hypothesis has
been indicated above, i.e., a t-test of independent samples. The resulting calculation of
the actual “t” value for this project was –3.5709t. As you can see, this value is less than
the lower critical value of –1.96t. As a result, I am forced to reject the null hypothesis
(that the day- and evening-students are of the same average age) and accept the
alternative hypothesis, i.e., that the day- and evening-students average ages are not the
same. Examination of the average ages reveals the day group (mean=23.2) and evening
group (mean=26.8). In the context of the rejection of the null hypothesis, the data appear
to indicate that the day students are, in fact, younger than the evening students.
Alternatively, we could say that the evening students are, in fact, older than the day
students, with a p-value equivalent to .0007, i.e., the probability that this finding would
occur by chance alone is 7 in 10,000. It is likely the case that, indeed, evening students
are older than day students. Statistical analysis of these data does, in fact, support the claim
that evening students are older than day students.
Discussion: Claims have been made at BHE that the evening students are older than the
day-time students. Results of the present study appear to strongly support the claim.
While it has historically been thought this was the case we can now specify with
statistical clarity that it is true with a probability of my being in error less than .0007. It
is possible that the selection method of students for the two groups may not have been as
random as possible; my method for their selection was not by random number table or by
computer selection. Nonetheless, this finding is substantial and suggests that evening
students are older than day-time students. It is possible that, based on these findings, the
college may desire to examine its instructional methods for day- and evening-students
considering this age differential.
Summary numbers for Mr. Lee:
Mean(daytime) = 23.2 years; standard deviation = 3.2 years; subjects = 30
Mean(evening) = 26.8 years; standard deviation = 4.5 years; subjects = 30
t-critical values = -1.96 and +1.96
t-actual value = -3.5709
p<.05; actual p-value = .0007
Reject the null hypothesis; analysis supports the claim.
Books used: Statistics Handbook for the TI-83 by Larry Morgan, 1997.
Elementary Statistics by Robert Johnson, 3rd Edition, 1980.
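As a check on the sample paper, the summary numbers above can be plugged into the pooled-variance “t” formula. This is a sketch assuming the reported standard deviations are sample standard deviations and that the equal-variance form was used (it matches the 58 degrees of freedom in the paper); the function name pooled_t_from_summary is made up for illustration.

```python
import math

def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, degrees of freedom) for an equal-variance t test from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df

# Day students first, evening students second, as in the summary numbers
t_actual, df = pooled_t_from_summary(23.2, 3.2, 30, 26.8, 4.5, 30)
print(round(t_actual, 2), df)  # -3.57 58
```

The result agrees with the t value of about –3.57 reported in the Results section; small differences in later decimal places can arise because the summary statistics are rounded.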