SAS--Proc Means (Descriptive Stats)
UNC-Wilmington
Department of Economics and Finance
ECN 377
Dr. Chris Dumas

SAS--Proc Means

Proc Means is the basic SAS procedure used to compute descriptive statistics for numeric, measurement variables. (Proc Means should not be used for character/text variables, nor for nominal or ordinal numeric variables.) The keywords below specify the statistics that you want Proc Means to compute and the order in which to display them in the output.

Descriptive statistics keywords used in Proc Means:

N                  (for each variable, gives number of rows in data set with non-missing data for that variable)
NMISS              (for each variable, gives number of rows with missing data for that variable)
SUM                (for each variable, gives sum of all values in all rows for that variable)
MEAN               (gives mean of each variable)
MEDIAN             (gives median of each variable)
MODE               (gives mode of each variable)
MAX                (gives max of each variable)
MIN                (gives min of each variable)
RANGE              (gives range of each variable)
VAR                (calculates the variance for each variable)
VARDEF= N or DF    (sets the divisor used for VAR and STD: N for a population, DF, i.e. n-1, for a sample)
STD or STDDEV      (calculates the standard deviation for each variable)
CV                 (calculates the coefficient of variation for each variable)
SKEWNESS or SKEW   (calculates skewness for each variable)
KURTOSIS or KURT   (calculates kurtosis for each variable)
STDERR             (calculates the standard error for each sample mean)
CLM                (calculates two-sided confidence limits for each sample mean)
UCLM               (calculates one-sided, upper confidence limit for each sample mean)
LCLM               (calculates one-sided, lower confidence limit for each sample mean)
MAXDEC=2           (sets the number of decimal places to show in the output, here 2)

Examples

The following command will calculate a long list of descriptive statistics for every numeric, measurement variable in dataset01. The "vardef=df" option calculates statistics for a data set that is a sample of the population (if your data set covers the entire population, then use "vardef=n" instead).
The "maxdec=3" option sets the number of decimal places shown in the output to 3.

    proc means data=dataset01 vardef=df maxdec=3
        N NMISS MEAN MEDIAN MODE VAR STD CV MIN MAX RANGE SUM SKEW KURT;
    run;

Note: Standard error, confidence limits for the mean, and the Student's t-test are calculated for a sample of data (rather than data on the whole population), so you must use VARDEF=DF when you request them.

Suppose you want descriptive statistics for just some of the variables (particular columns) in your dataset. Use the "var" statement to specify which variables you want to analyze. The following commands produce output for variables X1, X2 and X3 only:

    proc means data=dataset01 vardef=df maxdec=3
        N NMISS MEAN MEDIAN MODE VAR STD CV MIN MAX RANGE SUM SKEW KURT;
        var X1 X2 X3;
    run;

Suppose you want descriptive statistics for just some of the observations (particular rows) in your dataset. First, sort the data set using the Proc Sort command. Second, use a "by" statement in Proc Means to produce a set of descriptive statistics for each value of the "by" variable. For example, the following commands sort dataset01 by the variable "Region", then Proc Means produces a separate set of output for each Region:

    proc sort data=dataset01;
        by Region;
    run;

    proc means data=dataset01 vardef=df maxdec=3
        N NMISS MEAN MEDIAN MODE VAR STD CV MIN MAX RANGE SUM SKEW KURT;
        by Region;
    run;

Alternatively, you could use "where" instead of "by" (sorting is not required for "where"):

    proc means data=dataset01 vardef=df maxdec=3
        N NMISS MEAN MEDIAN MODE VAR STD CV MIN MAX RANGE SUM SKEW KURT;
        where Region='coast';
    run;

The "where" statement tells SAS to run Proc Means on only those data rows that meet the condition specified in the "where" statement.
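
The Note above says that standard error and confidence limits require VARDEF=DF, but none of the examples requests those statistics. The following is a minimal sketch of how such a request might look. The data set "demo", its values, and the choice of a 95% confidence level (alpha=0.05) are all hypothetical and were invented only for illustration; they are not from the handout.

```sas
/* Hypothetical toy data, made up only for illustration */
data demo;
    input Region $ X1;
    datalines;
coast 12.5
coast 14.1
coast 13.3
inland 9.8
inland 11.2
inland 10.4
;
run;

/* STDERR and CLM treat the data as a sample, so vardef=df is used.   */
/* alpha=0.05 requests 95% two-sided confidence limits for the mean.  */
proc means data=demo vardef=df maxdec=3 alpha=0.05
    N MEAN STD STDERR CLM;
    var X1;
    where Region='coast';
run;
```

Replacing CLM with UCLM or LCLM in the keyword list would request a one-sided limit instead, as described in the keyword list above.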