STAT301 Solutions 2
(1a)
(i) I count the number of words on a page. On seven pages the word count is
100, 200, 500, 600, 800, 1300, 1600.
What is the mean and standard deviation for the above data set?
The mean is 728.5714 and the standard deviation is 552.9144.
Transform the above data using the z-score Z = (X − X̄)/s.d. After making this transformation you should have a new data set which is also of size 7.
Evaluate the mean and standard deviation for this new data set.
The new data set is
−1.1368, −0.9560, −0.4134, −0.2325, 0.1292, 1.0335, 1.5761.
The mean of this transformed data is zero and the standard deviation
is one.
(ii) I measure the length of 7 ants, this is the data
0.01, 0.015, 0.016, 0.02, 0.024, 0.027, 0.03.
What is the mean and standard deviation for the above data set? Transform the above data using the z-score Z = (X − X̄)/s.d. After making this transformation you should have a new data set which is also of size 7.
Evaluate the mean and standard deviation for this new data set.
The mean is 0.0203 and the standard deviation is 0.0071. The new data set is
−1.4416, −0.7408, −0.6007, −0.0400, 0.5206, 0.9411, 1.3615.
The mean of this transformed data is zero and the standard deviation
is one.
(b) What happens to the mean and standard deviation of both transformed data sets?
Although the spreads of the two data sets are very different, after using the
z-transform both data sets have mean zero and standard deviation one. We
can also see the difference when looking at the numbers: before transformation
(i) takes values ranging from small to extremely large, whereas (ii) takes only
very small values. After the z-transform both data sets take values mainly in
[-2,2].
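As a quick check (a sketch in Python, not part of the Statcrunch workflow), the z-transform can be applied to both data sets and the resulting means and standard deviations verified; the sample standard deviation (divisor n − 1) is used, matching the values above.

from statistics import mean, stdev  # stdev uses the n - 1 divisor

# The two data sets from part (a): word counts and ant lengths.
for data in ([100, 200, 500, 600, 800, 1300, 1600],
             [0.01, 0.015, 0.016, 0.02, 0.024, 0.027, 0.03]):
    m, s = mean(data), stdev(data)
    z = [(x - m) / s for x in data]               # z-score of each observation
    print([round(v, 4) for v in z])               # the values given above (up to rounding)
    print(round(mean(z), 4), round(stdev(z), 4))  # always 0.0 and 1.0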
(2) Using the instructions in HW1, load the yearly temperature data into Statcrunch.
This temperature is in Celsius. Using the class notes, convert the temperature into
Fahrenheit (in Statcrunch go to transform data, then put the expression
32 + (9/5) × global in the top bar - include the global variable by highlighting it
and pressing add column, then press compute - this should give you a new column with
the global temperature in Fahrenheit). Make relative frequency histogram plots and
summary statistics (both mean and standard deviation) of the temperature in both
Celsius and Fahrenheit (you do not need to use the same x-axis for both plots in this
question).
(i) What is the relationship between the mean and standard deviations in Celsius
and Fahrenheit?
For Celsius and Fahrenheit the summary statistics are:

              mean     standard deviation   First Quartile   Third Quartile
Celsius      -0.183    0.0527               -0.334           -0.023
Fahrenheit    31.67    0.1709                31.39            31.958

It is clear that Fahrenheit mean = 32 + (9/5) × (−0.183) = 31.67 and
Fahrenheit standard deviation = (9/5) × Celsius standard deviation, which
confirms the relationship between the data and the transformed data.
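The same relationship can be checked numerically: under the linear transform F = 32 + (9/5)C the mean transforms in exactly the same way as the data, while the standard deviation only picks up the scale factor 9/5. A minimal Python sketch, using made-up anomaly values rather than the actual data set:

from statistics import mean, stdev

celsius = [-0.3, -0.25, -0.2, -0.1, -0.05]       # hypothetical anomaly values
fahrenheit = [32 + (9/5) * c for c in celsius]   # the Statcrunch expression

# The mean transforms like the data itself; the sd is only rescaled.
print(mean(fahrenheit), 32 + (9/5) * mean(celsius))  # equal (up to floating point)
print(stdev(fahrenheit), (9/5) * stdev(celsius))     # equal (up to floating point)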
(ii) If you use Statcrunch’s default binwidth the shape of the relative frequency plots
in Celsius and Fahrenheit will look different. However, in class we said that a
linear transformation should preserve the shape. Can you give an explanation
as to why the histograms you have made have different shapes (hint: count the
number of bins in both histograms)?
The histograms for both Celsius and Fahrenheit using the default binwidths are shown in Figure 1.
Figure 1: The left plot is the histogram in Celsius; the right plot is the histogram in Fahrenheit.
However, choosing a slightly narrower binwidth for the Fahrenheit data gives the
histogram in Figure 2, which is closer in shape to the Celsius plot. Thus the
difference between the plots is due to the default binwidths that are being
used (one cannot expect a plot with 5 bins to have the same shape as one with
7 bins).
Figure 2: The Fahrenheit histogram using 7 bins rather than 5.
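The effect of the bin count can be reproduced outside Statcrunch: drawing the same data with 5 bins and then with 7 bins changes the apparent shape even though the data are unchanged. A matplotlib sketch with hypothetical Fahrenheit values (not the class data):

import matplotlib.pyplot as plt

fahrenheit = [31.4, 31.5, 31.55, 31.6, 31.6, 31.65,
              31.7, 31.75, 31.8, 31.9, 31.95, 32.0]   # hypothetical values

fig, (left, right) = plt.subplots(1, 2)
left.hist(fahrenheit, bins=5, density=True)    # same data ...
left.set_title("5 bins")
right.hist(fahrenheit, bins=7, density=True)   # ... different bin count
right.set_title("7 bins")
plt.show()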
(3) Using the instructions in HW1, load the calf weight data (see the link next to this
homework) into Statcrunch (make sure the delimiter is set to comma when uploading
the data). Make a relative frequency histogram of the calf weight data at birth (that
is column Wt 0) and overlay it with a normal density (in Statcrunch on the third
page of the histogram option in overlay density choose option normal).
You should have a relative frequency histogram with a normal density plot over the
top.
(i) Do you think the calf weights at birth are close to normally distributed? What is
the mean and standard deviation of the calf weights at birth?
There is not a great match, but normality does seem plausible. The mean of the calf weights at birth is 93.2 pounds and the standard deviation is 7.78 pounds.
(ii) Let us assume that the normal density overlaying the histogram (note that the mean
and standard deviation are calculated from the data) is the true density
for the distribution of calf weights at birth. Using this normal density, calculate
the probability a newborn calf weighs more than 100 pounds.
The normal distribution used is N(93.2, 7.78). First we assume that the
weights are normally distributed. Under this assumption, to calculate
the chance a randomly selected calf weighs more than 100 pounds, we need to
make a z-transform (and a plot) Z = (100 − 93.2)/7.78 = 0.874. Looking
this up in the normal tables (look from outside in), the area less than
0.874 is 0.81, so the area above it is 1 − 0.81 = 0.19. Thus, given that
the weights of newborn calves are normally distributed with mean 93.2
pounds and standard deviation 7.78 pounds, there is a 19% chance a
randomly selected calf weighs more than 100 pounds.
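The table lookup can be reproduced with scipy's normal distribution (a sketch; norm takes the mean and standard deviation as the loc and scale arguments):

from scipy.stats import norm

z = (100 - 93.2) / 7.78   # the z-transform, about 0.874
print(norm.cdf(z))        # P(Z < 0.874), about 0.81
print(1 - norm.cdf(100, loc=93.2, scale=7.78))   # P(X > 100), about 0.19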
(4) Suppose that the weight of hogs is normally distributed with mean 250 pounds and
standard deviation 20 pounds. Calculate the probability that a randomly
selected hog weighs between 240 and 290 pounds.
We start by making a z-transform of the weights
240 → z = (240 − 250)/20 = −0.5
290 → z = (290 − 250)/20 = 2.
The probability we are after is the area between [−0.5, 2] on the standard
normal curve. Looking up the tables, the area less than 2 is 0.977 and the
area less than −0.5 is 0.308. Thus the area between 2 and −0.5 is 0.977−0.308 =
0.669. Thus the chance of a hog's weight lying between 240 and 290 pounds is
66.9%.
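The same area can be computed directly as a difference of two normal cdf values, a quick scipy check of the table lookup:

from scipy.stats import norm

# Area between 240 and 290 under the N(250, 20) curve.
p = norm.cdf(290, loc=250, scale=20) - norm.cdf(240, loc=250, scale=20)
print(p)   # about 0.669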
(5) The weight of jersey cows is normally distributed with mean 300 pounds and standard
deviation 25 pounds, whereas the weight of a highland cow is normally distributed with
mean 380 pounds and standard deviation 40. A farmer has a jersey cow that weighs
320 pounds and a highland cow that weighs 400 pounds. Which cow is heavier relative
to its breed of cows (in other words, which cow is in the higher percentile)?
Of course in real terms the highland cow is heavier than the jersey cow, but
from the distributions of their respective breeds we can see this is expected.
What we want to know is which cow is in the higher percentile for her breed.
To do this we make a z-transform for each cow and then look these numbers up
in the tables.
zJ = (320 − 300)/25 = 0.8
zH = (400 − 380)/40 = 0.5.
Looking up these numbers in the z-tables gives that the jersey cow is in
the 78.8th percentile whereas the highland cow is in the 69.1st percentile.
Thus, in terms of breed, the jersey cow is heavier than 78.8% of jersey cows
and is the heavier cow relative to her breed.
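Both percentiles are simply standard normal cdf values at the two z-scores, so the comparison is easy to verify (a scipy sketch):

from scipy.stats import norm

z_jersey = (320 - 300) / 25     # 0.8
z_highland = (400 - 380) / 40   # 0.5
print(norm.cdf(z_jersey))       # about 0.788, the jersey cow's percentile
print(norm.cdf(z_highland))     # about 0.691, the highland cow's percentile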
(6) Let us return to Question 4 in homework 1, in particular the relative frequency
histograms of both the total number of M&Ms in a bag and the TotalGroup (the 34
averages). In Statcrunch (using the same procedure given in Q2) overlay the default
normal densities over both relative frequency histograms. By comparing the relative
frequency histograms with the overlaid normal densities, which is more normal: the
distribution of the total number of M&Ms in a bag or the distribution of the average
number of M&Ms (recall this is the average of 5 bags)?
Neither plot looks very normal, but the plot of TotalGroup is certainly
closer in structure to a normal: it is not bimodal and its values are
concentrated about the mean. The histogram of the number of M&Ms is
clearly far from normal; indeed, it is bimodal (has two peaks).
Figure 3: The top plot is the histogram of the number of M&Ms in a bag. The lower plot is a
histogram of the average number of M&Ms in a bag, where the average is taken over 5 bags.
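This is the averaging effect at work: means of several bags are less spread out, and closer to normal, than the individual counts, even when the individual counts are bimodal. A small simulation sketch with made-up bimodal bag counts (not the class data):

import random
from statistics import stdev

random.seed(0)
# Hypothetical bimodal counts: two fillers centred at 52 and 60 M&Ms.
bags = [random.gauss(52, 2) if random.random() < 0.5 else random.gauss(60, 2)
        for _ in range(5000)]
# Average over consecutive groups of 5 bags, mimicking TotalGroup.
groups = [sum(bags[i:i + 5]) / 5 for i in range(0, len(bags), 5)]

print(stdev(bags), stdev(groups))   # the averages are far less spread out
# A histogram of groups is unimodal and bell-shaped; one of bags has two peaks.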