SAS PROC UNIVARIATE
Last Updated: September 3, 2009

PROC UNIVARIATE

This procedure computes statistics about the distribution of a numeric variable. The output includes the skewness and kurtosis and, when requested, the Shapiro-Wilk test. In addition, a stem and leaf plot, a box plot, and a normal probability plot can be requested. These plots are character plots. High resolution plots of the histogram, the normal probability plot, and the quantile-quantile plot (q-q plot) are discussed in Sections 2 and 3.

The general format of this procedure is:

    PROC UNIVARIATE DATA=dataset <options>;
      VAR variable1 variable2 etc.;
    RUN;

If multiple variables are listed in the VAR statement, then the statistics and plots are computed for each variable listed. If no VAR statement is used, the statistics are computed on all the numeric variables in the data set.

Example 1

This example shows the output of PROC UNIVARIATE when no additional information is requested. Using the country data, we'll look at the population in 1992.

    DATA country;
      /* DLM='09'x: tab-delimited file; FIRSTOBS=2: start reading at line 2 */
      INFILE '/u2/example/Country.dat' FIRSTOBS=2 DLM='09'x;
      INPUT cont $ country $ pop92 urban gdp lifeexpm lifeexpf birthrat deathrat;
    RUN;

    PROC UNIVARIATE DATA=country;
      VAR pop92;
    RUN;

The output for this code follows:

                              The SAS System                              1
                                               10:14 Tuesday, June 2, 2009

                          The UNIVARIATE Procedure
                              Variable:  pop92

                                  Moments
    N                         122    Sum Weights               122
    Mean               40.7485574    Sum Observations     4971.324
    Std Deviation      134.819843    Variance             18176.39
    Skewness            7.0532748    Kurtosis            53.137302
    Uncorrected SS     2401917.47    Corrected SS       2199343.19
    Coeff Variation    330.857953    Std Error Mean      12.206015

                        Basic Statistical Measures
        Location                    Variability
    Mean     40.74856     Std Deviation            134.81984
    Median    9.90000     Variance                     18176
    Mode       .          Range                         1169
                          Interquartile Range       22.91100

                        Tests for Location: Mu0=0
    Test           -Statistic-    -----p Value------
    Student's t    t    3.3384    Pr > |t|    0.0011
    Sign           M        61    Pr >= |M|   <.0001
    Signed Rank    S    3751.5    Pr >= |S|   <.0001

                        Quantiles (Definition 5)
                        Quantile      Estimate
                        100% Max      1169.619
                        99%            886.362
                        95%            121.644
                        90%             59.640
                        75% Q3          27.351
                        50% Median       9.900
                        25% Q1           4.440
                        10%              2.376
                        5%               1.574
                        1%               1.082
                        0% Min           0.739

                           Extreme Observations
                 -----Lowest-----      -----Highest-----
                   Value      Obs         Value      Obs
                   0.739       61       158.000       57
                   1.082        6       195.000       74
                   1.106       19       256.561       67
                   1.285       47       886.362       84
                   1.300       27      1169.619       68

End of Example 1

1. Keywords in the PROC Statement

These keywords are put in the PROC statement.

ALPHA=
    Sets the significance level α for the (1-α)100% confidence limits. The default value is 0.05, so this parameter only needs to be specified if α differs from 0.05.

CIPCTLDF
    Requests distribution-free confidence intervals for the quantiles.

MU0=
    Sets the value μ0 used in the null hypothesis in the output section labeled Tests for Location. To test μ0 = 10, use MU0=10.

NORMAL
    Requests statistics to test whether the distribution of the data is normal. One of these tests is the Shapiro-Wilk test.

PLOTS
    Requests a stem and leaf plot, a box plot, and a normal probability plot. These plots are character plots and are not high resolution graphics. The high resolution normal probability plot is discussed in Section 3.

Example 2

The following is an example of using PROC UNIVARIATE on the jackknife residuals. In order to further analyze any of the residuals, they must be in a SAS data set. The OUTPUT statement in PROC REG was used to create the data set rescount used in this procedure.
The code to read the data, perform the regression, and put the residuals into a data set follows:

    DATA country;
      INFILE '/u2/example/Country.dat' FIRSTOBS=2 DLM='09'x;
      INPUT cont $ country $ pop92 urban gdp lifeexpm lifeexpf birthrat deathrat;
    RUN;

    PROC REG DATA=country;
      MODEL lifeexpf = birthrat;
      /* Save the jackknife (RSTUDENT) and ordinary (R) residuals in rescount */
      OUTPUT OUT=rescount RSTUDENT=jackknife R=resid;
    RUN;
    QUIT;

To analyze the distribution of the jackknife residual, we use PROC UNIVARIATE as below.

    PROC UNIVARIATE DATA=rescount PLOTS NORMAL;
      VAR jackknife;
    RUN;

The following partial output shows what is created by the NORMAL keyword.

                           Tests for Normality
    Test                  --Statistic---    -----p Value------
    Shapiro-Wilk          W     0.977718    Pr < W      0.0422
    Kolmogorov-Smirnov    D     0.094688    Pr > D     <0.0100
    Cramer-von Mises      W-Sq  0.202382    Pr > W-Sq  <0.0050
    Anderson-Darling      A-Sq  1.037651    Pr > A-Sq   0.0096

Below are the character plots created by the keyword PLOTS. The stem and leaf plot may be replaced by a horizontal bar chart if the number of values in the data is sufficiently large.

    Stem Leaf                           #  Boxplot
       3 4                              1     0
       2
       2
       1 56777888                       8     |
       1 22222344                       8     |
       0 555566667777778999            18  +-----+
       0 0001111111122333333344444     25  |     |
      -0 4433333222221111111111000000  28  *--+--*
      -0 9987766655555                 13  +-----+
      -1 433111000                      9     |
      -1 997655                         6     |
      -2 310                            3     0
      -2 5                              1     0
      -3 2                              1     0
         ----+----+----+----+----+---

Normal Probability Plot

[Character normal probability plot: jackknife on the vertical axis from -3.25 to 3.25, normal quantiles on the horizontal axis from -2 to +2; asterisks (*) mark the data and plus signs (+) mark the reference line.]

End of Example 2

Example 3

In this example, the 99% confidence intervals for the quantiles and the test of location for μ0 = 9 are requested. The data for this example are the country data, and the variable to be analyzed is the population in 1992.

    DATA country;
      INFILE '/u2/example/Country.dat' FIRSTOBS=2 DLM='09'x;
      INPUT cont $ country $ pop92 urban gdp lifeexpm lifeexpf birthrat deathrat;
    RUN;

    PROC UNIVARIATE DATA=country MU0=9 CIPCTLDF ALPHA=0.01;
      VAR pop92;
    RUN;

Setting MU0=9 changes the Tests for Location to test the null hypothesis μ0 = 9. With the mean 40.7486 and standard error 12.2060 reported in Example 1, the t statistic is (40.7486 - 9)/12.2060 = 2.6011.

                        Tests for Location: Mu0=9
    Test           -Statistic-      -----p Value------
    Student's t    t    2.601058    Pr > |t|    0.0105
    Sign           M           3    Pr >= |M|   0.6510
    Signed Rank    S      1321.5    Pr >= |S|   0.0006

The CIPCTLDF and ALPHA=0.01 options change the Quantiles portion of the UNIVARIATE output to add the confidence limits and order statistics.

                              Quantiles (Definition 5)
                            99% Confidence Limits     -------Order Statistics-------
    Quantile      Estimate    Distribution Free       LCL Rank   UCL Rank   Coverage
    100% Max      1169.619
    99%            886.362    256.561    1169.619          120        122      58.26
    95%            121.644     59.640    1169.619          110        122      98.99
    90%             59.640     44.149     256.561          102        120      99.02
    75% Q3          27.351     17.631      56.386           80        105      99.04
    50% Median       9.900      7.515      16.095           47         76      99.16
    25% Q1           4.440      2.792       6.828           18         43      99.04
    10%              2.376      1.106       3.187            3         21      99.02
    5%               1.574      0.739       2.376            1         13      98.99
    1%               1.082      0.739       1.106            1          3      58.26
    0% Min           0.739

End of Example 3

2. Histogram

A histogram of the variable can also be requested using the statement:

    HISTOGRAM;

This statement requests a histogram of the variables listed in the VAR statement. The histogram is a high resolution plot and is displayed in the SAS Graph window.

To have a normal distribution curve fitted on the histogram for comparison, the keyword NORMAL can be put after a slash. This will use the mean and standard deviation from the data.

    HISTOGRAM / NORMAL;

For other distributions, see the SAS Help and Documentation from the SAS Help Menu or the SAS online documentation (http://support.sas.com/91doc/docMainpage.jsp).
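As one illustration of fitting a different distribution, the sketch below overlays a lognormal curve on the histogram of pop92, which Example 1 showed to be strongly right-skewed. The choice of LOGNORMAL and of this variable is an assumption made here for illustration and is not part of the original examples; it also assumes the country data set from Example 1 already exists in the session.

```sas
/* Sketch only: the lognormal fit is an illustrative assumption. */
PROC UNIVARIATE DATA=country;
  VAR pop92;
  /* LOGNORMAL overlays a fitted lognormal density. With no parameters given,
     the threshold (THETA) is 0 and the scale (ZETA) and shape (SIGMA)
     parameters are estimated from the data. */
  HISTOGRAM / LOGNORMAL;
RUN;
```

Whether the lognormal is an adequate description of these data should be judged from the resulting plot and fit statistics, not assumed in advance.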
Example 4

Using the data set rescount created in Example 2 above, the following code requests the analysis of the variable jackknife and a histogram.

    PROC UNIVARIATE DATA=rescount;
      VAR jackknife;
      HISTOGRAM / NORMAL;
    RUN;

The histogram with the normal density curve is displayed in Figure 1.

Figure 1 Histogram of Jackknife Residuals

End of Example 4

3. Normal Probability Plot and Quantile-Quantile Plot

Two other plots that are used to assess whether data are normally distributed are the normal probability plot and the quantile-quantile plot (q-q plot).

The normal probability plot displays the ordered data against the percentiles of a normal distribution. The variable to be tested is plotted on the y-axis and the percentiles are plotted on the x-axis. To request this plot, the following statement is placed in the PROC UNIVARIATE step.

    PROBPLOT / NORMAL(MU=EST SIGMA=EST);

This will create a normal probability plot with a reference line.

Another plot used in testing the normality of the residuals is the q-q plot. As with the normal probability plot, SAS plots the residual on the y-axis; the x-axis shows the normal quantiles. The q-q plot is requested using the following statement.

    QQPLOT / NORMAL(MU=EST SIGMA=EST);

This statement requests a q-q plot along with a reference line.

Example 5

Continuing Example 2, the following PROC UNIVARIATE step analyzes the jackknife residual and requests the normal probability and q-q plots.

    PROC UNIVARIATE DATA=rescount;
      VAR jackknife;
      PROBPLOT / NORMAL(MU=EST SIGMA=EST);
      QQPLOT / NORMAL(MU=EST SIGMA=EST);
    RUN;

The next two figures are the plots generated by this request.

Figure 2 Normal Probability Plot of Jackknife Residuals

Figure 3 Quantile-Quantile Plot of Jackknife Residuals

End of Example 5
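
The keywords and statements from Sections 1 to 3 can also be combined in a single step. The following is a minimal sketch, not one of the original examples; it assumes the rescount data set from Example 2 is available and uses only the NORMAL keyword and the HISTOGRAM, PROBPLOT, and QQPLOT statements described earlier.

```sas
/* Sketch: one step requesting the normality tests and all three
   high resolution plots for the jackknife residuals. */
PROC UNIVARIATE DATA=rescount NORMAL;
  VAR jackknife;
  HISTOGRAM / NORMAL;                    /* histogram with fitted normal curve */
  PROBPLOT / NORMAL(MU=EST SIGMA=EST);   /* normal probability plot with reference line */
  QQPLOT / NORMAL(MU=EST SIGMA=EST);     /* q-q plot with reference line */
RUN;
```

Running the plots together with the Tests for Normality table makes it easier to compare the formal test results with the visual evidence.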