Download HW3_sol - UCLA Statistics

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
HOMEWORK #3
SOLUTIONS
Statistics 10
William Anderson,
Teaching Assistant
________________________________________________________________________
#3.1
The frequency table is:
Value
Observed Frequency
1
2
3
4
5
6
7
8
9
10
11
1
22
18
22
13
8
6
1
6
2
1
The mean or average length of a word from the data is 4.4 characters. The average
legnth is found by summing up all of the observations, and then dividing by the
number of observations. The mathematical notation is
1 n
 xi
n i 1
where xi denotes an individual observation and n is the number of observations. In
this case, n  100 .
Mean  X 
The mean or average length of a word from the fequency of occurances is 4.4
characters. This is called the expected value and is denoted
E ( X ) . The expected
value is calculated (for the descrete case) by summing up the values of the
observations multiplied by the probability of the observation. The mathematical
notation is
E  X    xi p xi 
n
i 1
where
xi
denotes an individual observation, pxi  is the probability of the
observation
xi , and n is the number of observations.
In this case, the probability of
the observation is just the observed frequency divided by the number of observations.
For example, the probability of observing a word length of two is 22/100 or 0.220.
The standard deviation of the length of a word is 2.2 characters. This is found by
summing up the squared differences of each observation and the mean, dividing by
the number of observations (less one), and then taking the square root. The
mathematical notation is
 x  X 
n
s
where
X
i 1
2
i
n 1
is the mean of the observations.
Using the estimates derived above, we would expect the distribution of English words
to follow the normal distribution (bell shaped curve) with a mean word length of 4.4
and a standard deviation of 2.2.
The chance of randomly selecting a word of length 2 is
X  4.4 

Pr  X  2  Pr  Z 
  Pr Z  1.1  0.36
2.2 

or 36%.
#3.2
X ~ N 3,4
The standard normal distribution is
Z ~ N 0,1 , so the Z
values are the number of
standard deviations away from the mean. The normal distribution is symmetric about
the mean.
For
x =-5, we have
Z
X 


X 3 53

 2 ,
4
4
or –5 is 2 standard deviations away from the mean.
For
x =11,
Z
X  3 11  3

2
4
4
or 11 is 2 standard deviations away from the mean.
For
x =5,
Z
X 3 53

 0.5
4
4
or 5 is 0.5 standard deviations away from the mean.
For
x =1.4,
Z
X  3 1.4  3

 0.4
4
4
or 1.4 is 0.4 standard deviations away from the mean.
Finally,
Pr  3  X  1  1  Pr  3  X  1   1  Pr  X  3 Pr( X  1) 
C
 3  3
1 3


 1  Pr  Z 
 Pr  Z 
  1  Pr Z  1.5 Pr Z  1 
4 
4 


 1  0.5  0.4331  0.3413  0.5  1  0.9082  0.0918
Note that
Pr(3  X  1) and Pr(3  X  1) are the same, since X is a
continuous variable (for discrete variables these probabilities are different, in
general). In the second probability, it includes the endpoints (-3 and –1), while the
first does not include the endpoints. However, for discrete variables they represent
the same areas under the probability density curve.
Related documents