Crude Test for Normality: Normal Probability Plots

Be gracious with me on this one. I haven’t yet had a chance to proofread and wordsmith it.

Recall that the Central Limit Theorem (CLT) tells us that the sample mean $\bar{x}$ for samples of size $n$ is approximately normal when $n$ is large enough, and that the distribution of $\bar{x}$ becomes more closely normal as the sample size increases. A related fact is that if the original distribution of $x$ is normal, then the distribution of $\bar{x}$ is exactly normal for any $n$. If $n$ is small and $x$ is not normal, however, the distribution of $\bar{x}$ may be far from normal. This technically invalidates the use of the $z$- and $t$-distributions for constructing confidence intervals and performing tests of hypotheses. Although there is no universal agreement regarding what constitutes a sufficiently large sample size, most texts suggest that for $n$ greater than 30 or so, the distribution of $\bar{x}$ can usually be safely regarded as “normal enough.”

We’ve adopted the point of view that, in the absence of any other information, we can use the $z$ distribution if $\sigma_x$ is known and the $t$ distribution if $\sigma_x$ is unknown and must be approximated by the sample standard deviation $S$. However, when $n$ is small we have to take numerical results with a grain of salt unless we know that $x$ itself is close to being normally distributed.

Sometimes we can use a stem-and-leaf diagram or a relative frequency histogram to judge whether $x$ is approximately normal. With a small data set, however, either display can be awkward to construct and can give misleading results. Several quite sophisticated tests for normality exist, but they are a bit beyond us in this course. An alternative, crude way to assess normality for small data sets is the normal probability plot, which is constructed so that it is roughly linear when $x$ is approximately normal. Here’s the basic idea.
Recall that if $x$ is normal with mean $\mu_x$ and standard deviation $\sigma_x$, the corresponding standardized normal variable is defined by

$$z = \frac{x - \mu_x}{\sigma_x}.$$

Written another way, $z = mx + b$, where the slope $m$ and the intercept $b$ are given by

$$m = \frac{1}{\sigma_x} \quad \text{and} \quad b = -\frac{\mu_x}{\sigma_x}.$$

If $z$ is plotted versus $x$, we thus obtain a straight line. In practice we do not know $\mu_x$ and $\sigma_x$, so instead each ordered data value is paired with the $z$ value we would expect for its rank if the data came from a normal distribution (how these expected values are obtained is described below). If $x$ is approximately normal, these pairs should fall close to a straight line, so we expect a graph that is approximately linear.

On the TI-83, we can construct a normal probability plot in the following manner.

• Enter the $x$ data in a list, say, L1.
• Although it is not absolutely necessary, it’s a good idea to sort the data in ascending order. Do this using STAT/EDIT/SortA(L1).
• In 2nd STAT PLOT, set up Plot 1 (turned On) as follows.
  • Select the lower right plot picture for the plot Type.
  • Choose L1 for the Data List.
  • For the Data Axis, choose X if you want a plot of $z$ vs $x$, and choose Y if you want a plot of $x$ vs $z$.
  • Choose any of the three available Mark symbols; the $(x, z)$ or $(z, x)$ points will be plotted using the symbol you choose.
• Use ZOOM 9 to generate the desired normal probability plot.
• Inspect the plot to see whether it is roughly linear, indicating that $x$ is roughly normal.

We note that the plot is also useful for detecting outliers, which show up as points lying well off the line formed by the rest of the data. Generally, it’s a good idea to delete such outliers from L1 and repeat the process using the reduced list of data.

Here is a brief explanation of what the TI does to generate the plot. It first estimates the cumulative frequencies $f_i$ for each of the ordered values $x_i$. Commonly used approximations for these frequencies are

$$f_i = \frac{i}{n+1}, \quad i = 1, \ldots, n$$

and

$$f_i = \frac{i - 0.375}{n + 0.25}, \quad i = 1, \ldots, n.$$

The expected values of $z$ are then calculated using invNorm($f_i$,0,1), and these expected $z$ values are used in the normal plot. Different statistics packages estimate the $f_i$ in various ways, but all are based on the idea that $z$ and $x$ should be roughly linearly related if $x$ is roughly normal.