Review Exercises
Average and Standard Deviation
Chapter 4, FPP, p. 74-76
Dr. McGahagan
Problem 1. Basic calculations. Find the mean, median, and SD of the list x = (50 41 48 54 57 50)
Mean = (sum x) / 6 = 300 / 6 = 50
Median = 50. Note that the list must be sorted first: (41 48 50 50 54 57).
Deviations and squared deviations from the mean, using the sorted data:
  x     x - Xbar          (x - Xbar)^2
  41    41 - 50 = -9      81
  48    48 - 50 = -2       4
  50    50 - 50 =  0       0
  50    50 - 50 =  0       0
  54    54 - 50 =  4      16
  57    57 - 50 =  7      49
  Sum              0     150
(Note that the deviations sum to 0.)
Sum of squared deviations = 81 + 4 + 0 + 0 + 16 + 49 = 85 + 65 = 150
Variance (mean squared deviation) = 150 / 6 = 25
Standard deviation = square root of variance = 5.0
Half a standard deviation is 2.5, so the values within half an SD of the mean are in the interval 50 +/- 2.5,
that is, from 47.5 to 52.5. In the sorted list (41 48 50 50 54 57), these are the values 48, 50, and 50.
A one-SD range goes from 45 to 55; this includes all the values except the first and the last (41 and 57), that is, 4 of the 6 values.
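The same calculation can be checked by machine. The sketch below uses Python's standard `statistics` module (an assumption; the course itself uses EcLS). `pstdev` divides by the number of entries, as FPP does, rather than by n - 1.

```python
import statistics

# The list from Problem 1.
x = [50, 41, 48, 54, 57, 50]

mean = statistics.mean(x)      # 300 / 6 = 50
median = statistics.median(x)  # sorts internally: 41 48 50 50 54 57
sd = statistics.pstdev(x)      # square root of (150 / 6), dividing by n as FPP does

deviations = [xi - mean for xi in x]
squared = [d * d for d in deviations]

# Values within half an SD, and within one SD, of the mean.
within_half_sd = [xi for xi in sorted(x) if abs(xi - mean) <= sd / 2]
within_one_sd = [xi for xi in sorted(x) if abs(xi - mean) <= sd]

print(mean, median, sd)
print(sum(deviations), sum(squared))
print(within_half_sd, within_one_sd)
```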
ASSIGNED. Problem 2. Which list has the smaller SD? Explain.
Information: Both lists have a mean of 50.
Don't calculate; explain the reasoning behind your conclusion.
ASSIGNED. Problem 3. Guessing the mean and the SD of the list
X = (0.7 1.6 9.8 3.2 5.4 0.8 7.7 6.3 2.2 4.1 8.1 6.5 3.7 0.6 6.9 9.9 8.8 3.1 5.7 9.1)
Part a. Guess whether the average is closer to 1, 5, or 10.
Part b. Guess whether the standard deviation is closer to 1, 3, or 6.
Problem 4. Relation of mean and median.
Part a. For income in the US, the distribution will be right-skewed: there are a few households with very large incomes
and a much larger number of middle- and lower-income households, so the mean should be above the median. Household income from
the Census Bureau's Income, Poverty and Health Insurance Coverage in the US, 2007, available from
http://www.census.gov/hhes/www/income/income.html, Table A-1, page 31:
Median income = $50,233 (standard error reported as 140); mean income = $67,609 (SE = 236).
The standard error is a measure of the likely error of the estimate, related to the SD of the population and to the
size of the sample; we will meet it later in much more detail.
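To see why right skew pushes the mean above the median, here is a minimal Python sketch on a small made-up list of incomes (hypothetical numbers in thousands of dollars, not the Census figures): one very large value drags the mean up while leaving the median where it was.

```python
import statistics

# Hypothetical household incomes in thousands of dollars (NOT Census data):
# most are moderate, and one very large income forms a long right tail.
incomes = [20, 30, 40, 50, 60, 70, 400]

median = statistics.median(incomes)  # the middle (4th) value, 50
mean = statistics.mean(incomes)      # 670 / 7, about 95.7, pulled up by the tail

print(median, round(mean, 1))
```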
Part b. Years of schooling completed by US citizens over 25 may well be left-skewed: the median is most likely at 12 to 14 years,
with the mean possibly pulled below this by the few without any high school.
Data from Census 2000 for males aged 25 through 34 (19,902,737 in this category):
Less than 9 years (4.5):        1,077,492
9-12 years (10.5):              2,519,649
HS diploma (12):                5,464,280
Some college (13):              4,418,322
Associate degree (14):          1,298,577
Bachelor's (16):                3,773,593
Graduate, prof. (18):           1,350,824
Total:                         19,902,737
As an additional exercise, translate these data into percentages and find the mean, median, and SD.
The numbers in parentheses may be taken as the midpoints of the categories for the purposes of this exercise
(they are my guesses rather than the Census Bureau's).
Weighted mean = sum of (percent * midpoint) over all the categories. So create variables for the midpoints and percents:
use (bind midpoints (list 4.5 10.5 12 13 14 16 18)) to create your list of midpoints, and
(bind percents (list of your numbers)), then use the EcLS command (sum (* percents midpoints)), which gives 12.9 years.
Note that the percents must be entered as decimals (proportions adding up to 1) for this to work, and no, you don't have to divide by 7.
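To find the median, use the command (cusum percents), which gives the cumulative percents: these first pass 0.5 (50%)
in the "some college" category, so the median is 13 years. Since the median (13) is slightly above the mean (12.9),
the distribution is left-skewed, if only slightly.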
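For readers without EcLS, the steps above can be sketched in Python (an assumption; this mirrors `bind`, `sum`, and `cusum` rather than calling EcLS). Here `itertools.accumulate` plays the role of `cusum`, and the weighted SD is taken as the square root of the proportion-weighted average squared deviation from the weighted mean.

```python
import itertools
import math

# Census 2000 counts for males aged 25-34, in the order of the table above.
counts = [1_077_492, 2_519_649, 5_464_280, 4_418_322, 1_298_577, 3_773_593, 1_350_824]
# Category midpoints in years of school (the author's guesses, not Census values).
midpoints = [4.5, 10.5, 12, 13, 14, 16, 18]

total = sum(counts)                      # 19,902,737
percents = [c / total for c in counts]   # decimals that add up to 1

# Weighted mean: sum of (percent * midpoint); no division by 7 is needed.
mean = sum(p * m for p, m in zip(percents, midpoints))

# Weighted SD: square root of the weighted average squared deviation from the mean.
sd = math.sqrt(sum(p * (m - mean) ** 2 for p, m in zip(percents, midpoints)))

# Cumulative percents, like (cusum percents); the median is the midpoint of the
# first category whose cumulative percent reaches 0.5.
cumulative = list(itertools.accumulate(percents))
median = next(m for m, c in zip(midpoints, cumulative) if c >= 0.5)

print(total, round(mean, 1), round(sd, 1), median)
```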
ASSIGNED. Problem 5. Unusual blood pressure?
Explain whether or not the given readings would be considered unusual.
Hint: standardize the readings by subtracting the mean, then dividing by the SD: standard units = (reading - mean) / SD.
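Standardizing is a one-line formula. The sketch below is a hypothetical Python helper, `standardize`, applied to the Problem 1 list (mean 50, SD 5) rather than to the blood-pressure readings, so it illustrates the method without answering the problem.

```python
def standardize(value, mean, sd):
    """Convert a value to standard units: how many SDs it lies above (+) or below (-) the mean."""
    return (value - mean) / sd


# Illustration with the Problem 1 list (mean 50, SD 5), not the blood-pressure data.
for value in [41, 48, 57]:
    print(value, standardize(value, 50, 5))
```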
Problem 6. Sketches of histograms:
(i) Left skewed, with highest point at 75
(ii) Symmetric, with highest point at 50
(iii) Right skewed, with highest point at 25
Note that the median will be closer than the mean to the mode or highest point for skewed distributions.
Since each mean is pulled away from its peak toward the long tail, the left-skewed distribution will have the highest
average and the right-skewed distribution the lowest.
a. The averages 40, 50, and 60 are not really in scrambled order: 60 goes with (i), 50 with (ii), and 40 with (iii).
b. Median < average for (iii); median = average for (ii); median > average for (i).
c. The SD for histogram (ii) [I know the text asks about (iii), but the SD of (ii) should be easier to judge first] is
definitely less than 25: shade the area between 25 and 75 and you have most of the area shaded in, whereas a
"mean +/- one SD" range should hold only about 68 percent of the area.
Likewise, shading the area between 45 and 55 (mean +/- 5) would give you a narrow central strip, certainly less than 68%,
so the SD is bigger than 5. This leaves 15 as the most likely SD here.
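The 68 percent argument in part c can be made concrete by computing areas under a normal curve. This Python sketch assumes, purely for illustration, that histogram (ii) looks like a normal curve with average 50 and SD 15; the helpers `phi` and `normal_area_within` are hypothetical names, and `math.erf` supplies the normal distribution function.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def normal_area_within(low, high, mean, sd):
    """Fraction of a normal curve with this mean and SD lying between low and high."""
    return phi((high - mean) / sd) - phi((low - mean) / sd)


# Assumption for illustration: histogram (ii) is a normal curve with mean 50, SD 15.
for low, high in [(45, 55), (35, 65), (25, 75)]:
    print(low, high, round(normal_area_within(low, high, 50, 15), 2))
```

Under that assumption, the strip from 45 to 55 holds only about a quarter of the area, 35 to 65 holds about 68 percent, and 25 to 75 holds about 90 percent, in line with the reasoning above.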
d. SD for histogram (iii) will be bigger than for (ii) -- since the mean is off to one side of the highest
point, there will be a lot of points to the left of the mean, and this will run up the sum of squared deviations.
But note that the SD cannot be as high as 50, for any of the distributions, because this would cover ALL the data.
Problem 7. Weights of college students.
Men: average = 66 kg; SD = 9 kg
Women: average = 55 kg; SD = 9 kg
(a) Average and SD in pounds.
For the averages: men = 2.2 * 66 kg = 145 lbs; women = 2.2 * 55 kg = 121 lbs.
For the SD (the same for both groups): 2.2 * 9 kg = 19.8 lbs, about 20 lbs.
(b) For men, the one-SD range runs from 66 - 9 = 57 kg to 66 + 9 = 75 kg; if weights follow the normal curve,
about 68 percent of the men should be in this range.
(c) The weights of men and women together will be more spread out than either group alone. With equal numbers of both,
the average will be in the middle, at (66 + 55) / 2 = 60.5 kg. Since most men are grouped around 66 kg and most women
around 55 kg, each group's center is 5.5 kg away from this combined average, so the deviations from 60.5 kg are on the whole
larger than the deviations within each group: the SD will be larger for the combined group than for either group
separately (about 10.5 kg rather than 9 kg). Whether the combined histogram looks BIMODAL depends on how far apart the
groups are compared with their spread; here the means are only 11 kg apart while each SD is 9 kg, so the two humps
overlap heavily and may blend into one broad hump.
Simulation: the RNORM command below generates 5000 random numbers with mean 66 and SD 9 for men:
(bind wt0 (rnorm 5000 66 9)) and for women:
(bind wt1 (rnorm 5000 55 9))
Check whether the mean and SD of these weights agree with what we asked the computer to do
(they won't agree perfectly; that's what random means!). Use (mean wt0) and (sd wt0).
Then create the weights in pounds:
(bind lbs0 (* 2.2 wt0)) and (bind lbs1 (* 2.2 wt1)); are the means and SDs close to the values in (a)?
For part b, use (hist wt0), then shade the bins from 57 to 75: (shadebins 57 75).
For part c, combine the weights of men and women into the variable wts: (bind wts (combine wt0 wt1)),
find its mean and SD, and confirm that the SD is larger than 9. The density plot (an outline sketch of a histogram)
shows the shapes of the distributions more clearly than the histogram: create it with the command (density-plot wt0 wt1).
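The same simulation can be sketched in Python (an assumption; `random.gauss` stands in for EcLS's `rnorm`, and the seed is arbitrary, so the numbers will differ from an EcLS run). It covers parts a, b, and c.

```python
import random
import statistics

random.seed(1)  # fixed seed so reruns give the same "random" numbers

# Like (rnorm 5000 66 9) and (rnorm 5000 55 9): normal draws for men and women, in kg.
wt0 = [random.gauss(66, 9) for _ in range(5000)]
wt1 = [random.gauss(55, 9) for _ in range(5000)]

# (a) Means and SDs in kg, then in pounds (multiplying by 2.2 scales both).
lbs0 = [2.2 * w for w in wt0]
lbs1 = [2.2 * w for w in wt1]
print(statistics.mean(wt0), statistics.pstdev(wt0))
print(statistics.mean(lbs0), statistics.pstdev(lbs0))
print(statistics.mean(lbs1), statistics.pstdev(lbs1))

# (b) Share of men within one SD of their average, 57 to 75 kg (about 68% expected).
share = sum(57 <= w <= 75 for w in wt0) / len(wt0)
print(share)

# (c) Combine both groups: the SD comes out larger than 9 (close to 10.5).
wts = wt0 + wt1
print(statistics.mean(wts), statistics.pstdev(wts))
```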
ASSIGNED. Problem 8. Average heights of boys and girls.
Given: average height of boys at age 9: 136 cm; at age 11: 146 cm.
Average height of a mixed random sample of boys and girls at age 11: 147 cm.
Part a. Are boys taller than girls at age 11? Explain your reasoning.
Part b. Estimate the average height of boys at age 10. Explain your reasoning.
Part c. (not in text). Suppose that the sample of 11 year olds had 600 girls and 400 boys. What would
the average height of girls at 11 be?
ASSIGNED. Problem 9. Mean, median and outliers.
A computer file of 1,000 households has incomes ranging from $5,800 to $98,600.
By accident, the highest income gets an extra zero and is recorded as $986,000.
Part a. Is the average affected? If so, what is the new average?
Part b. Is the median affected? If so, what is the new median?
ASSIGNED. Problem 10. Law school scores.
Incoming students have mean LSAT = 163 and SD = 8. Pick a student at random and guess their score.
For each point you are off, you are penalized $1, so your loss in dollars is |actual score - your guess|.
Part a. What should you guess the score will be? Answer: 163. Strictly, the guess that minimizes the average absolute
loss is the median, but if the scores are roughly symmetric, the median and the mean are both about 163.
Part b. If the scores follow the normal curve, about 68% are within 1 SD of the mean, so you have about 1 chance in 3
of being more than 1 SD (8 points) off; in that case you will lose more than $8.
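Both parts of Problem 10 can be checked with a short Python sketch. The scores below are hypothetical and deliberately skewed, so that the mean (about 158.6) and the median (163) differ; a brute-force search over whole-number guesses on the 120 to 180 LSAT scale shows that the total absolute loss is smallest at the median. The second part assumes the scores follow the normal curve.

```python
import math

# Hypothetical, left-skewed LSAT scores (not real data): mean about 158.6, median 163.
scores = [130, 155, 160, 163, 165, 167, 170]


def total_loss(guess):
    """Total penalty in dollars, at $1 per point off, for a single guess."""
    return sum(abs(s - guess) for s in scores)


# Part a: try every whole-number guess on the LSAT scale and keep the cheapest.
best = min(range(120, 181), key=total_loss)
print(best, total_loss(best))  # 163 57: the median beats guessing near the mean

# Part b: chance of being more than 1 SD off if scores follow the normal curve.
p_more_than_1sd = 1 - math.erf(1 / math.sqrt(2))
print(round(p_more_than_1sd, 3))  # 0.317, about 1 chance in 3
```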
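Problem 11. If you guess the mean, the root mean square of your losses equals the standard deviation. Note, however,
that this is not your "average loss": the average of the losses themselves is never larger than their r.m.s., and is usually smaller.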
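A quick check of r.m.s. loss versus average loss, reusing the Problem 1 list (so the numbers are illustrative, not LSAT scores), sketched in Python:

```python
import math
import statistics

# Problem 1's list again; guess the mean (50) every time.
x = [50, 41, 48, 54, 57, 50]
guess = statistics.mean(x)
losses = [abs(xi - guess) for xi in x]

rms_loss = math.sqrt(sum(loss * loss for loss in losses) / len(losses))
average_loss = sum(losses) / len(losses)

print(rms_loss, statistics.pstdev(x))  # both 5.0: the r.m.s. loss is the SD
print(average_loss)                    # 22 / 6, about 3.67: smaller than the SD
```

Here the r.m.s. loss is 5, equal to the SD, while the average loss is about 3.67.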
Problem 12. "Underclass." Since the percentage of people in poverty has remained roughly constant, can we
conclude that there is a "permanent underclass"? Not necessarily, since there is no guarantee that the same
individuals are in poverty from one year to the next -- those newly unemployed will join the lowest ranks of the
income distribution, and those newly employed will leave it. Note also that the definition of the poverty line has
changed over time.