Module H1 Practical 9
Approximations by the Normal Distribution

1. Normal approximation to the binomial distribution

The data for this exercise come from a survey conducted in Mauritius to investigate cardiovascular (heart) disease among bus drivers and conductors and how it relates to smoking. The data are available in worksheet BusData in file H1_data.xls. The variables in the data set are:

Job:       Job type (1=Driver, 2=Conductor)
Age:       Age (in years)
Height:    Height (in metres)
Weight:    Weight (in kg)
Serumtri:  Serum Triglyceride (in milligrams per decilitre, i.e. mg/dL)
Systolic:  Systolic Blood Pressure (in mm Hg)
Smoking:   Smoking status (1=non-smoker; 2=smoker)

In analysing these data, suppose it is initially of interest to determine the chance that there will be 3 or fewer smokers in a randomly selected sample of size 10.

(a) First estimate the probability (say p) that one randomly selected person will be a smoker. Use this to find the mean and standard deviation of the random variable X = number of smokers in a sample of size 10. Note down your answers below.

p = ______   mean of X = ______   std. dev. of X = ______

(b) Next compute the exact answer to the question asked by assuming a binomial distribution for the number of smokers in the sample of 10, stating clearly any assumptions you make. Use the Excel function BINOMDIST for this purpose (with its last parameter as TRUE), and write down your answer below.

(c) Now use the normal approximation to the binomial. Note down your answer, compare it with the exact result in (b) and comment on whether you think the normal approximation is working well. If not, can you tell why?

SADC Course in Statistics Module H1 Practical 9 – Page 1

(d) One problem in approximating binomial probabilities by normal probabilities is that a discrete distribution, i.e. the binomial, is being approximated by a continuous distribution, i.e. the normal.
This can be partly overcome by noting that any question relating to the binomial (where the values increment by 1) can be better approximated by moving the binomial value up or down (as appropriate) by 0.5. This adjustment is known as the continuity correction. Thus, in the question above, what was required was Pr(X ≤ 3). Change this to Pr(X ≤ 3.5), and compute this probability using normal tables or the Excel function NORMDIST or NORMSDIST. State what you now think about the normal approximation. Is it any better?

2. Normal approximation to the Poisson distribution

The number X of micro-organisms per litre of drinking water follows a Poisson distribution with parameter λ = 5000.

(a) First compute the mean and standard deviation of the random variable X.

Mean of X = ______   Std. deviation of X = ______

(b) Use the normal approximation to the Poisson distribution to find the probability that there would be more than 5200 micro-organisms in a litre of drinking water.

(c) What is the probability that in a litre of drinking water, the number of micro-organisms will be in the range from 4900 to 5100?

(d) Discuss with your neighbour in class whether there is any practical benefit, in how well the normal distribution approximates the Poisson distribution, from making an adjustment like the one suggested in part (d) of Question 1 when computing the probabilities required in parts (b) and (c) above.

3. The worksheet Distribu in file H1_data.xls contains data on seven different variables y1, y2, …, y7. The aim of this exercise is to give you a "feel" for the type of normal probability plots you might get with data sets having a range of different distributions.

(a) Produce histograms for each variable and comment on the shape of each below. Some instructions on how to produce a histogram in Excel have already been given to you in Question 3 of the previous practical.
You may wish to review these instructions for the purpose of this exercise.

Variable   Comments on shape of histogram (symmetric, skewed, uniform, bell-shaped, etc.)
y1
y2
y3
y4
y5
y6
y7

(b) Now try a normal probability plot for each variate using the menu sequence SSC-Stat, Visualisation, Normal Probability Plot. What do you conclude about the validity of the normality assumption for each variate? Is it consistent with your answers above concerning the histograms?
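If you want to check the Excel calculations for Questions 1 and 2 outside Excel, the following is a minimal sketch. It assumes Python 3.8 or later and uses only the standard library.

- `binomial_cdf` plays the role of BINOMDIST(k, n, p, TRUE).
- `statistics.NormalDist(mean, sd).cdf` plays the role of NORMDIST with its cumulative argument set to TRUE.
- The helper names are illustrative and not part of the course materials.
- The value `p_hat = 0.5` is a hypothetical placeholder, not the proportion of smokers in BusData. Substitute your own estimate from 1(a).
- "From 4900 to 5100" in 2(c) is read here as including both endpoints.

```python
import math
from statistics import NormalDist


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p), like BINOMDIST(k, n, p, TRUE)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def binomial_normal_approx(k: int, n: int, p: float, continuity: bool) -> float:
    """Normal approximation to P(X <= k), evaluated at k or (with the 0.5
    continuity correction) at k + 0.5, using mean np and sd sqrt(np(1-p))."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    x = k + 0.5 if continuity else k
    return NormalDist(mean, sd).cdf(x)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed on the log scale.

    exp(-5000) underflows to 0.0 in double precision, so e^(-lam), lam^k
    and 1/k! are combined as logarithms before exponentiating.
    """
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_cdf(k: int, lam: float) -> float:
    """Exact P(X <= k) for X ~ Poisson(lam) by direct summation."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


if __name__ == "__main__":
    # Question 1. HYPOTHETICAL placeholder: replace 0.5 with the proportion
    # of smokers you estimated from worksheet BusData in part (a).
    p_hat, n, k = 0.5, 10, 3
    print("mean of X:", n * p_hat, " sd of X:", math.sqrt(n * p_hat * (1 - p_hat)))
    print("exact binomial  P(X <= 3):", binomial_cdf(k, n, p_hat))
    print("normal at 3     P(X <= 3):", binomial_normal_approx(k, n, p_hat, False))
    print("normal at 3.5   P(X <= 3):", binomial_normal_approx(k, n, p_hat, True))

    # Question 2. For a Poisson variable the mean and variance both equal lam.
    lam = 5000
    approx = NormalDist(lam, math.sqrt(lam))
    # P(X > 5200) = 1 - P(X <= 5200); the adjusted version evaluates at 5200.5.
    print("P(X > 5200)  normal:  ", 1 - approx.cdf(5200))
    print("P(X > 5200)  adjusted:", 1 - approx.cdf(5200.5))
    print("P(X > 5200)  exact:   ", 1 - poisson_cdf(5200, lam))
    # P(4900 <= X <= 5100) = F(5100) - F(4899), endpoints included.
    print("P(4900 <= X <= 5100)  normal:  ", approx.cdf(5100) - approx.cdf(4900))
    print("P(4900 <= X <= 5100)  adjusted:", approx.cdf(5100.5) - approx.cdf(4899.5))
    print("P(4900 <= X <= 5100)  exact:   ",
          poisson_cdf(5100, lam) - poisson_cdf(4899, lam))
```

Each question prints three versions of the same probability: unadjusted normal, adjusted by 0.5, and exact. Comparing them directly shows how much the 0.5 adjustment matters for a small binomial sample and for a Poisson variable with a very large mean, which is what parts 1(d) and 2(d) ask you to think about.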