FALL 2013
Team Members:
Boris Jurosevik
Gabriela Guerra
Brett Hoffmann
Yoni Zuniga
Mustafa Alasadi
Nicolas Araneda
The following data were compiled from a list of 1340 baseball players. The 1340 players are
broken down into categories by primary position played, and a pie graph shows the proportion of
players at each position.
Primary Position    Total
D                       8
1                     139
3                     145
2                     148
S                     154
C                     254
O                     492
Grand Total          1340

[Pie graph: proportion of players by primary position (O, C, S, 2, 3, 1, D)]
The next data set was collected from the above population using a simple random sample.
Primary Position    Total
1                       1
2                       2
3                       5
C                       6
O                      15
S                       5
Grand Total            34
[Pie graph: proportion of sampled players by primary position (O, C, S, 3, 2, 1)]
[Bar charts: Total Count by Primary Position Played]
This sample was chosen using a simple random sample. First, a random number was generated
for each player. The numbers were then ranked from smallest to largest, and the first 34 players in
that ranking were chosen. From our sample of 34 players, we created a frequency table to show how
many players play each position.
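The selection steps described above can be sketched in Python. This is only an illustration under stated assumptions: the roster here is a made-up stand-in (player ids with placeholder positions), and the function name and seed are hypothetical, not part of the original project.

```python
import random
from collections import Counter


def simple_random_sample(players, n, seed=None):
    """Pick n players by the random-key method: number, rank, take the first n."""
    rng = random.Random(seed)
    # Generate one random number for each player.
    keyed = [(rng.random(), player) for player in players]
    # Rank the players by their random number, smallest to largest.
    keyed.sort(key=lambda pair: pair[0])
    # Keep the first n players in the ranking.
    return [player for _, player in keyed[:n]]


# Hypothetical stand-in roster: (player id, primary position) pairs.
roster = [(i, "O" if i % 3 == 0 else "C") for i in range(1, 1341)]

sample = simple_random_sample(roster, 34, seed=2013)

# Frequency table: how many sampled players play each position.
frequency_table = Counter(position for _, position in sample)
print(frequency_table)
```

Because every player receives an independent random number, each set of 34 players is equally likely to be chosen, which is what makes this a simple random sample.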
Comparing the sample with the population, we can see that they are very similar. In both the
population and our sample, outfielders have the most players, followed by catchers and then
shortstops. For this particular statistic, our random sample is a good representation of the
population.
The following data were collected from the above population using a convenience sample.
Primary Position    Total
2                       2
3                       3
C                       7
D                       1
O                      16
S                       4
Grand Total            33
[Pie graph: proportion of sampled players by primary position (O, C, S, 3, 2, D)]
[Bar charts: Total Count by Primary Position Played]
This sample was chosen using a convenience sample. First the data was arranged
alphabetically by the first name of each player. Then the first 34 players were chosen. From our
sample of 34 players, we created a frequency table to show how many players play each position.
The sample in this case is also very similar to the population. In both the population data and
our sample data, outfielders have the most players, followed by catchers and then shortstops. We
think that for this particular statistic our convenience sample is a good representation of the
population.
The population of baseball players referenced above was found to have a mean ‘on base
percentage’ of 0.336 with a standard deviation of 0.034. A random sample of 30 players was then
generated from the population of 1340 baseball players. The sample has a mean ‘on base percentage’
of 0.344 with a standard deviation of 0.032. The boxplot and frequency histogram below show the
distribution of the sample values.
[Figure: Random Sample Box Plot, On Base Percentage]
[Figure: Random Sample Frequency Histogram, On Base Percentage. Vertical axis: Frequency.
Horizontal axis: On Base Percentage, in bins 0.250 - 0.279, 0.280 - 0.309, 0.310 - 0.339,
0.340 - 0.369, 0.370 - 0.399, 0.400 - 0.429]
Another sample of 30 players was chosen from the population of 1340 baseball players, this
time using a systematic sampling method. The sample has a mean ‘on base percentage’ of 0.344 with
a standard deviation of 0.030. The boxplot and frequency histogram below show the distribution of
the sample values.
[Figure: Systematic Sample Box Plot, On Base Percentage]
[Figure: Systematic Sample Frequency Histogram, On Base Percentage. Vertical axis: Frequency.
Horizontal axis: On Base Percentage, in bins 0.250 - 0.279, 0.280 - 0.309, 0.310 - 0.339,
0.340 - 0.369, 0.370 - 0.399, 0.400 - 0.429]
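The report does not say how the systematic sample was drawn, so the following Python sketch shows one common approach rather than the team's actual procedure. It assumes a sampling interval of k = 1340 // 30 = 44 and a random starting point within the first interval; the function name and the use of player ids are hypothetical.

```python
import random


def systematic_sample(items, n, seed=None):
    """Take every k-th item after a random start, where k = len(items) // n."""
    if n <= 0 or n > len(items):
        raise ValueError("n must be between 1 and the population size")
    rng = random.Random(seed)
    k = len(items) // n        # sampling interval (44 when 1340 players, n = 30)
    start = rng.randrange(k)   # random start inside the first interval
    # The last index is start + (n - 1) * k, which is always below len(items).
    return [items[start + i * k] for i in range(n)]


# Hypothetical stand-in for the ordered list of 1340 players.
player_ids = list(range(1, 1341))
systematic = systematic_sample(player_ids, 30, seed=2013)
print(systematic[:5])
```

Only the starting point is random here; after that the selection is fully determined by the order of the list, so the result depends on how the players were sorted.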
Both of the sampling methods used above generated a mean and standard deviation of ‘on base
percentage’ similar to the population mean and standard deviation. Based on the box plots and
frequency histograms, both samples appear to be approximately normally distributed. The box plots
for both sampling methods have a similar shape. Both of the frequency histograms start low, rise to
a clear high point, and then fall again. It appears that the random sample and systematic sample
both yielded results similar to what we would expect to see in the population.
- On page two, a random sample of 34 baseball players was generated. 15 of those players listed
  ‘Outfielder’ as their primary position played. The resulting 95% confidence interval estimate
  of the proportion of ‘Outfielders’ in the population of baseball players is 0.280 < p < 0.602.
- On page three, a convenience sample of 33 baseball players was generated. 16 of those players
  listed ‘Outfielder’ as their primary position played. The resulting 95% confidence interval
  estimate of the proportion of ‘Outfielders’ in the population of baseball players is
  0.314 < p < 0.655.
- On page four, a random sample of 30 baseball players was generated. This sample of players
  showed a mean ‘on base percentage’ of 0.344 with a standard deviation of 0.032. The resulting
  95% confidence interval estimate of the mean ‘on base percentage’ of the population of baseball
  players is 0.332 < μ < 0.356.
- On page five, a systematic sample of 30 baseball players was generated. This sample of players
  showed a mean ‘on base percentage’ of 0.344 with a standard deviation of 0.030. The resulting
  95% confidence interval estimate of the mean ‘on base percentage’ of the population of baseball
  players is 0.333 < μ < 0.355.
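The report does not state which formula was used for the two proportion intervals. As a sketch, the standard normal-approximation (Wald) interval can be computed as below; the function name is hypothetical. For 16 outfielders out of 33 it reproduces the convenience-sample interval listed above. For 15 out of 34 it gives roughly 0.274 < p < 0.608, slightly wider than the random-sample interval reported above, which may reflect rounding or a different method.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(16, 33)
print(f"{low:.3f} < p < {high:.3f}")  # 0.314 < p < 0.655
```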
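These confidence intervals indicate that we are 95% confident that the population proportion
(items 1 and 2) and the population mean (items 3 and 4) fall within the ranges noted above. In all
four cases the actual population parameter falls within the confidence interval: the population
proportion of outfielders is 492/1340 ≈ 0.367, and the population mean ‘on base percentage’ is
0.336.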
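The two intervals for the mean are consistent with a t interval using the sample standard deviation. The sketch below assumes that method and takes the critical value 2.045 (two-sided 95%, 29 degrees of freedom) from a standard t table rather than computing it; the function and constant names are hypothetical.

```python
from math import sqrt

# Two-sided 95% critical value for 29 degrees of freedom, read from a t table.
T_CRIT_DF29 = 2.045


def mean_ci(sample_mean, sample_sd, n, t_crit):
    """t confidence interval for a mean, given a critical value from a table."""
    margin = t_crit * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin


POPULATION_MEAN = 0.336

for label, sd in (("random", 0.032), ("systematic", 0.030)):
    low, high = mean_ci(0.344, sd, 30, T_CRIT_DF29)
    covers = low < POPULATION_MEAN < high
    print(f"{label}: {low:.3f} < mu < {high:.3f}, covers 0.336: {covers}")
```

With the sample values from the report, both intervals round to the figures listed above and both contain the population mean of 0.336.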
Part V
Given that the population mean was 0.336 and the population standard deviation was 0.034, we
took a sample of 30 players and calculated a sample mean of 0.344 with a sample standard deviation
of 0.032. We tested the claim that the population mean is not equal to 0.336 at the 0.05
significance level, so the null hypothesis is that the population mean equals 0.336. Using the
known population standard deviation, we got a test statistic of z = 1.288. Since it was a
two-tailed test, the critical values are ±1.96. Because our test statistic of 1.288 falls between
-1.96 and 1.96, it is not in the rejection region. Thus we fail to reject the null hypothesis:
there is not enough evidence to support the claim that the population mean differs from 0.336.
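The test above can be checked with a short Python sketch of a two-tailed one-sample z test, assuming the known population standard deviation of 0.034 is used in the standard error. The function name and return layout are hypothetical. With the rounded values from the report, the statistic comes out near 1.29.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z test of H0: mu = mu0 with known population sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Positive critical value; the rejection region is |z| > z_crit.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject = abs(z) > z_crit
    return z, z_crit, p_value, reject


z, z_crit, p_value, reject = one_sample_z_test(0.344, 0.336, 0.034, 30)
print(f"z = {z:.3f}, critical values = +/-{z_crit:.2f}, reject H0: {reject}")
```

Since |z| is below 1.96, the sketch reaches the same decision as the report: fail to reject the null hypothesis.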
Part VI
This project has helped us understand the concepts we have learned from the beginning of the
semester until now. These concepts apply in many situations because they help us understand the
statistics we encounter in our daily lives. We have developed statistical reasoning from the skills
learned in this class and will apply it to any situation in life that requires the use of
statistics.