Group Work STAT 366 1/27/10
1. Which data set has a larger standard deviation?
A) thirty “30’s” and thirty “90’s”, so that’s 60 data points in all with mean = 60
B) 31,32,33,…88,89,90 again, that’s 60 data points in all with mean = 60.5
(Hint: Don’t try to calculate the actual standard deviations. What does standard deviation measure?)
The answer is (A). SD measures the typical “distance” between the data points and the mean. In (A), every
data point is exactly 30 away from the mean of 60, so the SD must be 30 as well. In (B), this “distance”
varies from 0.5 to 29.5: only the two extreme points, 31 and 90, are close to 30 away from the mean, and
the rest are closer, so the SD must be smaller for (B).
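The hint says not to calculate, but a quick check can confirm the reasoning. The following is a minimal sketch that assumes the population SD (dividing by n) is the intended measure; the variable names are illustrative only.

```python
from statistics import pstdev

# Data set (A): thirty 30's and thirty 90's.
data_a = [30] * 30 + [90] * 30
# Data set (B): the integers 31 through 90.
data_b = list(range(31, 91))

# Population standard deviation (assumption: divide by n, not n - 1).
sd_a = pstdev(data_a)
sd_b = pstdev(data_b)

print(f"SD of (A): {sd_a:.2f}")
print(f"SD of (B): {sd_b:.2f}")
print("(A) has the larger SD:", sd_a > sd_b)
```

Using the sample SD (`statistics.stdev`) instead changes both values slightly but not the comparison.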
2. For the 2001-2002 season, salary statistics for the NFL and NBA were as follows:
        Mean          Median        Standard Deviation
NFL     $1,175,506    $521,660      $1,595,995
NBA     $3,470,946    $2,400,000    $3,798,535
A friend of yours says “Whoa!! I know this info is correct, but how can the SD be bigger than the
mean (xbar)? That means that the range of salaries xbar ± SD would include negative numbers! The
empirical rule for “mound-shaped” data is that this range is supposed to contain about 68% of the data,
but I sure don’t know of any players with negative salaries.” Briefly explain his error.
The friend’s error is applying the empirical rule where it doesn’t hold. A relatively small number of huge
salaries drags the mean out to the right and greatly enlarges the SD, which makes the distribution of
salaries highly right skewed, not “mound-shaped”. (In both leagues the median is well below the mean,
another sign of right skew.) So the empirical rule, which your friend cites as evidence that there’s
something fishy about the SD, doesn’t apply.
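One way to see how badly a mound-shaped model fits is to ask what fraction of salaries a normal curve with the reported mean and SD would place below $0. The sketch below uses only the means and SDs from the table; the normal model is the friend's implicit assumption, not a claim about the real salary distribution.

```python
from statistics import NormalDist

# (mean, SD) for each league, taken from the table above.
salary_stats = {
    "NFL": (1_175_506, 1_595_995),
    "NBA": (3_470_946, 3_798_535),
}

# Fraction of salaries a normal model would put below zero.
implied_negative = {
    league: NormalDist(mu, sigma).cdf(0)
    for league, (mu, sigma) in salary_stats.items()
}

for league, share in implied_negative.items():
    print(f"{league}: normal model implies {share:.1%} of salaries below $0")
```

Since no salary is actually negative, an implied share this large suggests the normal model, and with it the empirical rule, is the wrong tool for these data.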
3. Suppose that, in the city of Kalamazoo, the weekly grocery expenses for a family of four have a mean
of $125 with an SD of $20. Assuming the data are normally distributed, find the % of families which spend
more than $147 per week.
For x=$147, the Zscore=(147-125)/20=1.1, meaning $147 is 1.1 SD’s above the mean. From the Z-table,
this corresponds to an area of 0.864 under the curve to the left of 1.1. Therefore, the area to the right,
or, in other words, the % of data greater than $147 (1.1 SD’s above the mean), is 1-0.864=0.136 or
13.6%. Hence 13.6% of the families spend more than $147 per week on groceries.
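The same calculation can be checked without a Z-table. This is a small sketch using `NormalDist` from Python's standard library; the variable names are illustrative.

```python
from statistics import NormalDist

mean, sd = 125, 20   # weekly grocery expenses, in dollars
x = 147              # spending threshold, in dollars

# Z-score: how many SDs the threshold lies above the mean.
z = (x - mean) / sd

# Area to the right of x under the normal curve.
upper_tail = 1 - NormalDist(mean, sd).cdf(x)

print(f"z = {z:.2f}")
print(f"P(X > {x}) = {upper_tail:.3f}")
```

The result agrees with the table-based answer of about 13.6% after rounding.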
4. (Hypothetical: Really, I haven’t any reasonable idea) A study was done in Michigan on the homeless
rate among small to large cities (population greater than 20,000). The mean rate (homeless persons per
1000 population) was 0.54 with an SD=0.13. Assuming the data are normally distributed, what homeless rate
would a city need to be in the top 10% (90th percentile or higher)? Make a rough sketch of the
normal curve associated with this data and the location of the value (homeless rate) you found above.
The 90th percentile (the value below which 90% of the data lie) corresponds to a
zscore of 1.28, meaning 1.28 SD’s above the mean. So, for a normal distribution with mean=0.54 and
SD=0.13, the 90th percentile will be x where (x-0.54)/0.13=1.28, or x=0.54+(0.13)*1.28=0.7064. That
means for a city to be in the top 10%, it needs a homeless rate of 0.7064 (homeless persons per 1000
population) or higher. This means a little less than 3 homeless people for every 4000 in the population.
[Figure: Distribution Plot. Normal, Mean=0.54, StDev=0.13. Horizontal axis: X; vertical axis: Density. The values 0.54 and 0.707 are marked on the X axis, and the value 0.1 is labeled on the plot.]
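The percentile lookup can also be done by inverting the normal CDF rather than reading a Z-table. A minimal sketch with Python's standard library follows; the variable names are illustrative, and the unrounded z-score makes the result differ slightly from the 0.7064 obtained with z=1.28.

```python
from statistics import NormalDist

mean, sd = 0.54, 0.13   # homeless persons per 1000 population

# Z-score with 90% of the standard normal curve to its left.
z90 = NormalDist().inv_cdf(0.90)

# Convert the z-score back to a homeless rate.
cutoff = mean + z90 * sd

# The same rate expressed per 4000 people.
per_4000 = cutoff * 4

print(f"z for the 90th percentile: {z90:.4f}")
print(f"cutoff rate: {cutoff:.4f} per 1000")
print(f"equivalent: {per_4000:.2f} per 4000")
```

Calling `NormalDist(mean, sd).inv_cdf(0.90)` directly gives the same cutoff in one step.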