MAS113 Fundamentals of Statistics I
Practical 6
Using simulation to investigate the Normal approximation to the
binomial distribution
For a specified value of p we can compare the grouped histogram of
simulated binomial data with the normal curve. This shows that the
binomial distribution tends to that of a normal random variable as the
parameter n increases.
Use Calc → Random Data → Binomial to simulate 1000 rows of data
from a binomial distribution with Number of trials n = 10 and probability
of success p = 0.2, store the values in C1. So C1 holds the simulations for
X.
To see a histogram with normal approximation do the following: Stat →
Basic Statistics → Display Descriptive Statistics... Select the column of the
binomial values in C1 and in Graphs... tick Histogram of data, with normal
curve, OK, OK. Look at the histogram.
Now simulate 1000 values from the binomial distribution, still with p =
0.2, but use n = 50. Look at the histograms again. As n gets larger the
histogram should look closer to the normal curve.
Repeat with p = 0.5. Erase the data using Data → Erase variables and
close the graphs.
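The same comparison can be sketched outside Minitab. The following is a minimal illustration in Python, using only the standard library; the function names, the seed and the use of the theoretical mean np and standard deviation sqrt(np(1-p)) for the normal curve are assumptions made for this sketch (Minitab fits the curve from the sample mean and standard deviation instead).

```python
import random
from collections import Counter
from statistics import NormalDist


def simulate_binomial(n, p, size, rng):
    """Return `size` draws from Bin(n, p), each the sum of n Bernoulli(p) trials."""
    return [sum(rng.random() < p for _ in range(n)) for _ in range(size)]


def histogram_vs_normal(n, p, size=1000, seed=1):
    """Pair each observed value k with its relative frequency and with the
    N(np, np(1-p)) density at k. Histogram bars have width 1, so the
    relative frequency and the density are directly comparable."""
    rng = random.Random(seed)  # hypothetical seed, only for reproducibility
    draws = simulate_binomial(n, p, size, rng)
    normal = NormalDist(mu=n * p, sigma=(n * p * (1 - p)) ** 0.5)
    counts = Counter(draws)
    return [(k, counts[k] / size, normal.pdf(k)) for k in sorted(counts)]


if __name__ == "__main__":
    for n in (10, 50):
        print(f"n = {n}, p = 0.2")
        for k, freq, dens in histogram_vs_normal(n, 0.2):
            print(f"  k={k:2d}  relative frequency={freq:.3f}  normal density={dens:.3f}")
```

Calling histogram_vs_normal(n, 0.5) repeats the comparison for p = 0.5.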
Illustrating the Central Limit Theorem via simulations
We know from the lectures that if we take the sample mean of n independent
random variables with finite mean and variance, then as n increases the
distribution of the sample mean becomes more like a normal distribution. To
investigate this we generate 1000 values from an Exp(1) random variable in
each of columns c1-c16 (you can do this at one time by entering c1-c16 as
the columns for storing the random data).
You can obtain 1000 simulations of the sample mean of 2 observations
by Calc → Row Statistics then put a tick in ’mean’ and enter c1-c2 in Input
Variables. Store the values in C17. Repeat with means based on columns
c1-c4, c1-c8 and c1-c16, storing the results in c18, c19 and c20 respectively.
Use the histogram with the normal curve (as for the binomial) to see
how well the distribution of the sample mean of 2, 4, 8 and 16 independent
exponentials is approximated by the normal distribution.
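As a rough cross-check of the row-statistics steps above, here is a minimal Python sketch using only the standard library. The function names and the seed are assumptions for illustration. It prints the mean, standard deviation and coefficient of skewness of the simulated sample means next to the values for the mean of k independent Exp(1) variables: 1, 1/sqrt(k) and 2/sqrt(k). The skewness shrinking towards 0 is one numerical sign of the approach to normality.

```python
import random
from statistics import mean, stdev


def sample_skewness(xs):
    """Moment-based coefficient of skewness, m3 / m2**1.5."""
    m = mean(xs)
    m2 = mean([(x - m) ** 2 for x in xs])
    m3 = mean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5


def simulate_row_means(rows=1000, cols=16, sizes=(2, 4, 8, 16), seed=1):
    """Fill a rows x cols table with Exp(1) draws (like c1-c16) and return,
    for each k in sizes, the row means over the first k columns."""
    rng = random.Random(seed)  # hypothetical seed, only for reproducibility
    data = [[rng.expovariate(1.0) for _ in range(cols)] for _ in range(rows)]
    return {k: [mean(row[:k]) for row in data] for k in sizes}


if __name__ == "__main__":
    for k, values in simulate_row_means().items():
        print(
            f"k={k:2d}  mean={mean(values):.3f} (theory 1)  "
            f"sd={stdev(values):.3f} (theory {k ** -0.5:.3f})  "
            f"skewness={sample_skewness(values):.3f} (theory {2 * k ** -0.5:.3f})"
        )
```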
Using simulation to investigate the joint distribution of the sample
mean and variance when the sample is from a normal distribution
We obtain 1000 simulations of values of $Z_1, \ldots, Z_9$, which are an
independent sample of size 9 from the $N(0, 1)$ distribution. We then compute
the corresponding simulations for $U = \sqrt{9}\,\bar{X}$ and
$V = 8S^2 = \sum_{j=1}^{9} (Z_j - \bar{Z})^2$, where $\bar{X} = \bar{Z}$ is the
sample mean, and then demonstrate that U and V are independent with
$U \sim N(0, 1)$ and $V \sim \chi^2_8$.
Use Calc → Random Data → Normal to simulate 9 columns each of
1000 rows of data from a normal distribution with mean zero and standard
deviation one, store the values in C1-C9. So each row of columns C1-C9
holds a simulation of values of $Z_1, \ldots, Z_9$.
Now use Calc → Row Statistics and put a tick in the box for the mean
to store the corresponding simulated values of $\bar{X}$ in C10. Repeat, but
put a tick in the box for standard deviation to store simulated values of the
sample standard deviation $S = \sqrt{S^2}$ in C11. Now use the calculator so
that C12=3*C10 and C13=8*(C11**2) store the values of $U = 3\bar{X}$ and
$V = 8S^2$ for the simulations.
Next use Calc → Random Data → Chi-squared to store 1000 simulated
values from $\chi^2_8$ in column C14 for comparison with the simulations for V.
We first look at independence of U and V . Do a scatterplot. What should
this look like if U and V are independent? Calculate the sample coefficient of
correlation (exclude the p-value). Copy these into your report and comment
on the results.
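The construction of U and V and the correlation check can be sketched in Python with the standard library alone. This is a minimal illustration: the function names and the seed are assumptions, and the correlation is computed by hand rather than with a library routine. If U and V are independent, the sample correlation should be close to 0 and the scatterplot should show no trend. A correlation near 0 is consistent with independence but does not prove it.

```python
import random
from statistics import mean, stdev


def simulate_u_v(rows=1000, n=9, seed=1):
    """For each simulated row Z_1..Z_n of N(0,1) values, return
    U = sqrt(n) * (row mean) and V = (n - 1) * (row sample variance)."""
    rng = random.Random(seed)  # hypothetical seed, only for reproducibility
    u, v = [], []
    for _ in range(rows):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        u.append(n ** 0.5 * mean(z))      # with n = 9 this is 3 * mean, as in C12
        v.append((n - 1) * stdev(z) ** 2)  # with n = 9 this is 8 * S**2, as in C13
    return u, v


def correlation(xs, ys):
    """Sample (Pearson) coefficient of correlation."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    u, v = simulate_u_v()
    print(f"sample correlation of U and V: {correlation(u, v):.4f}")
```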
Next we look to see if the distribution of U is N(0, 1). Simulations for
U are in column C12. Use Stat → Basic Statistics → Display Descriptive
Statistics to display only the sample size, mean, standard deviation and
coefficient of skewness for the data in C12 and to plot the histogram with
normal curve. Also look at a test for normality using Stat → Basic Statistics
→ Normality Test to get a probability plot and a test for normality (use the
Kolmogorov-Smirnov test). The hypothesis being tested is that the data are
from a normal distribution. The probability plot is of the observed ordered
values against the expectation of the ordered values (obtained if the normality
hypothesis is correct), which should be very close to the straight line if U has
a normal distribution. Copy the results into your report and comment.
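To see what a Kolmogorov-Smirnov type statistic measures, the sketch below computes the largest gap between the empirical distribution function of the simulated U values and a normal distribution function fitted with the sample mean and standard deviation. It uses only the Python standard library. The function names and the seed are assumptions, no p-value is computed, and the details of Minitab's own test may differ. An Exp(1) sample is included only as a contrast that is clearly not normal.

```python
import random
from statistics import NormalDist, mean, stdev


def ks_distance_to_normal(xs):
    """Largest vertical gap between the empirical CDF of xs and the CDF of
    a normal distribution with the sample mean and standard deviation."""
    fitted = NormalDist(mean(xs), stdev(xs))
    ordered = sorted(xs)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        f = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def simulate_u(rows=1000, n=9, seed=1):
    """U = sqrt(n) * mean of n independent N(0,1) values, one per row."""
    rng = random.Random(seed)  # hypothetical seed, only for reproducibility
    return [n ** 0.5 * mean([rng.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(rows)]


if __name__ == "__main__":
    u = simulate_u()
    rng = random.Random(2)
    skewed = [rng.expovariate(1.0) for _ in range(1000)]
    print(f"U:      mean={mean(u):.3f} sd={stdev(u):.3f} D={ks_distance_to_normal(u):.4f}")
    print(f"Exp(1): D={ks_distance_to_normal(skewed):.4f}")
```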
Finally we look to see if the distribution of V is $\chi^2_8$. Display
descriptive statistics for C13 and C14 for only the sample size, mean,
standard deviation and coefficient of skewness. Note that a $\chi^2_n$
distribution has $\mu = n$, $\sigma^2 = 2n$ and $\sqrt{\beta_1} = \sqrt{8/n}$,
so the measures from your simulations should be close to these
(with n = 8). Also plot histograms on the same panel for columns C13 and
C14. Use these to compare simulations of V in C13 with simulations from
$\chi^2_8$ in C14. Copy the results into your report and comment.
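The comparison of summary measures can be sketched in Python using only the standard library. The function names and seeds are assumptions for illustration. The chi-squared draws use the fact that a chi-squared distribution with df degrees of freedom is a gamma distribution with shape df/2 and scale 2.

```python
import random
from statistics import mean, stdev


def sample_skewness(xs):
    """Moment-based coefficient of skewness, m3 / m2**1.5."""
    m = mean(xs)
    m2 = mean([(x - m) ** 2 for x in xs])
    m3 = mean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5


def chi2_theory(n):
    """Mean, standard deviation and skewness of a chi-squared(n) distribution."""
    return {"mean": n, "sd": (2 * n) ** 0.5, "skewness": (8 / n) ** 0.5}


def simulate_v(rows=1000, n=9, seed=1):
    """V = (n - 1) * S**2 for each simulated row of n N(0,1) values (like C13)."""
    rng = random.Random(seed)  # hypothetical seed
    return [(n - 1) * stdev([rng.gauss(0.0, 1.0) for _ in range(n)]) ** 2
            for _ in range(rows)]


def simulate_chi2(rows=1000, df=8, seed=2):
    """Direct chi-squared(df) draws via Gamma(shape=df/2, scale=2) (like C14)."""
    rng = random.Random(seed)  # hypothetical seed
    return [rng.gammavariate(df / 2, 2.0) for _ in range(rows)]


if __name__ == "__main__":
    print("theory:", chi2_theory(8))
    for label, xs in (("V", simulate_v()), ("chi-squared(8)", simulate_chi2())):
        print(f"{label}: n={len(xs)} mean={mean(xs):.3f} "
              f"sd={stdev(xs):.3f} skewness={sample_skewness(xs):.3f}")
```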