```Module H1 Practical 9
Approximations by the Normal Distribution
1. Normal approximation to the binomial distribution
The data for this exercise comes from a survey conducted in Mauritius to investigate
cardiovascular (heart) disease among bus drivers and conductors and how it relates to
smoking. The data are available in worksheet BusData in file H1_data.xls. The variables
in the data set are:
Job: Job type (1=Driver, 2=Conductor)
Age: Age (in years)
Height: Height (in metres)
Weight: Weight (in kg)
Serumtri: Serum triglyceride (in milligrams per decilitre, i.e. mg/dL)
Systolic: Systolic blood pressure (in mm Hg)
Smoking: Smoking status (1=non-smoker; 2=smoker)
In analysing these data, suppose it is initially of interest to determine the chance that there
will be 3 or fewer smokers in a randomly selected sample of size 10.
(a) First estimate the probability (say p) that one randomly selected person will be a
smoker. Use this to find the mean and standard deviation of the random variable
X=number of smokers in a sample of size 10. Note down your answer below.
p =          ;   mean of X =          ;   std. dev. of X =
(b) Next compute the exact answer to the question asked by assuming a binomial
distribution for the number of smokers in the sample of 10, stating clearly any assumptions
you make. Use the Excel function BINOMDIST for this purpose (with its last parameter
set to TRUE, so that it returns the cumulative probability).
(c) Now use the normal approximation to the binomial. Note down your answer, compare
it with the exact results in (b) and comment on whether you think the normal
approximation is working well. If not, can you tell why?
(d) One problem in approximating binomial probabilities by normal probabilities is that a
discrete distribution, i.e. the binomial, is being approximated by a continuous distribution,
i.e. the normal.
This can be partly overcome by noting that any question relating to the binomial (where
the values increment by 1) can be better approximated by moving the binomial value up or
down (as appropriate) by 0.5.
Thus, in the question above, what was required was Pr(X ≤ 3). Change this to Pr(X ≤ 3.5),
and compute this probability using normal tables or the Excel function NORMDIST or
NORMSDIST.
State what you now think about the normal approximation. Is it any better?
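# A minimal Python sketch of parts (b) to (d), using only the standard library.
# ASSUMPTION: p = 0.4 is a hypothetical placeholder, not the estimate from
# worksheet BusData; substitute your own value from part (a).
from math import comb, sqrt
from statistics import NormalDist

n = 10   # sample size, as in the question
p = 0.4  # hypothetical probability that one randomly selected person smokes

mean_x = n * p                # mean of X under the binomial model
sd_x = sqrt(n * p * (1 - p))  # standard deviation of X under the binomial model

# Exact Pr(X <= 3): the same quantity as Excel's BINOMDIST(3, n, p, TRUE).
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(4))

normal = NormalDist(mean_x, sd_x)
plain = normal.cdf(3)        # part (c): no continuity correction
corrected = normal.cdf(3.5)  # part (d): binomial value moved up by 0.5

print(f"exact = {exact:.4f}, plain = {plain:.4f}, corrected = {corrected:.4f}")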
2. Normal approximation to the Poisson distribution
The number X of micro-organisms per litre of drinking water follows a Poisson
distribution with parameter λ = 5000.
(a) First compute the mean and standard deviation of the random variable X.
Mean of X =          ;   Std. deviation of X =          .
(b) Use the normal approximation to the Poisson distribution to find the probability that
there would be more than 5200 micro-organisms in a litre of drinking water.
(c) What is the probability that in a litre of drinking water, the number of micro-organisms
will be in the range from 4900 to 5100?
(d) Discuss with your neighbour in class whether there would be any practical benefit,
in terms of how well the normal distribution approximates the Poisson distribution, from
making an adjustment of the kind suggested in part (d) of question 1 when computing the
probabilities required in parts (b) and (c) above.
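# A minimal Python sketch of parts (a) to (d), using only the standard library.
# ASSUMPTIONS: the Poisson parameter is 5000, and the range in part (c) is read
# as inclusive, i.e. 4900 <= X <= 5100.
from math import sqrt
from statistics import NormalDist

lam = 5000
mean_x = lam      # a Poisson mean equals its parameter
sd_x = sqrt(lam)  # a Poisson variance equals its parameter
normal = NormalDist(mean_x, sd_x)

# Part (b): Pr(X > 5200), without and with the 0.5 adjustment.
more_plain = 1 - normal.cdf(5200)
more_corrected = 1 - normal.cdf(5200.5)

# Part (c): Pr(4900 <= X <= 5100), without and with the 0.5 adjustment.
range_plain = normal.cdf(5100) - normal.cdf(4900)
range_corrected = normal.cdf(5100.5) - normal.cdf(4899.5)

print(f"(b) plain = {more_plain:.5f}, corrected = {more_corrected:.5f}")
print(f"(c) plain = {range_plain:.4f}, corrected = {range_corrected:.4f}")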
3. The worksheet Distribu in file H1_data.xls contains data on seven different
variables y1, y2, …, y7. The aim of this exercise is to give you a “feel” for the type of
normal probability plots you might get with data sets having a range of different
distributions.
(a) Produce histograms for each variable and comment on the shape of each below.
Some instructions on how to produce a histogram in Excel have already been given to
you in Question 3 of the previous Practical 9. You may wish to review these
instructions for the purpose of this exercise.
Variable    Shape (symmetric, skew, uniform, has a bell shape, etc.)
C1
C2
C3
C4
C5
C6
C7
(b) Now try a normal probability plot for each variate using the menu sequence SSC-Stat,
Visualisation, Normal Probability Plot. What do you conclude about the validity of the
normality assumption for each variate? Is it consistent with your answers
concerning the histograms above?
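# A minimal Python sketch of what a normal probability plot computes, using
# only the standard library. It is not the SSC-Stat procedure.
# ASSUMPTIONS: the plotting positions (i - 0.5)/n are one common convention,
# and both samples below are synthetic, not taken from worksheet Distribu.
import random
from math import sqrt
from statistics import NormalDist, mean


def normal_plot_points(data):
    """Pair each sorted observation with its standard normal quantile."""
    n = len(data)
    std_normal = NormalDist()
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, sorted(data)))


def straightness(points):
    """Pearson correlation of the plotted points; near 1 means nearly straight."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


random.seed(1)
bell = [random.gauss(50, 10) for _ in range(200)]         # bell-shaped sample
skewed = [random.expovariate(0.1) for _ in range(200)]    # right-skewed sample

r_bell = straightness(normal_plot_points(bell))
r_skewed = straightness(normal_plot_points(skewed))
print(f"bell-shaped: r = {r_bell:.3f}; skewed: r = {r_skewed:.3f}")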