AGRO 6005
Lecture 5
Using Graphics Procedures
A good introduction to graphics in SAS is in Section 1.9 of the book by Der and Everitt. See
also the graphical description of a data set in Chapter 2 of the same book.
The GCHART and GPLOT procedures
PROC CHART | GCHART DATA=_______;
BY ________;
VBAR _________ / options;
HBAR _________ / options;
PIE _________;
BLOCK _________;
STAR _________;
Options:
levels = 5
midpoints = 10 20 30 | 10 to 100 by 10
subgroup =
type = freq | pct | sum | mean
sumvar = (variable over which the mean or sum is computed)
group = (adjacent charts)
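Here is a minimal sketch of how these options combine in one GCHART step. The data set FINCAS and its values are hypothetical. LOCAL is a site, HIBRIDO a hybrid and RENDIM a yield, and all three were invented for illustration.

```sas
/* Hypothetical data: site, hybrid, yield (kg/ha) */
data fincas;
input local $ hibrido $ rendim @@;
datalines;
Isabela A 5200 Isabela B 7300 Isabela A 4800 Isabela B 6900
Lajas A 6100 Lajas B 8100 Lajas A 5600 Lajas B 7700
;
run;

proc gchart data=fincas;
/* frequency bars at explicit midpoints 4000, 5000, ..., 9000 */
vbar rendim / midpoints=4000 to 9000 by 1000;
/* mean yield per hybrid, bars clustered in adjacent groups by site */
vbar hibrido / type=mean sumvar=rendim group=local;
/* percentage of observations per hybrid, each bar divided by site */
hbar hibrido / type=pct subgroup=local;
/* pie slices proportional to total yield per site */
pie local / type=sum sumvar=rendim;
run;
quit;
```

TYPE=MEAN and TYPE=SUM need SUMVAR=. TYPE=FREQ and TYPE=PCT simply count observations.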
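PROC PLOT | GPLOT DATA=__________;
BY ______;
PLOT yvar*xvar=variable
     yvar*xvar='char'
     yvar*(xvar1 xvar2)
     yvar1*xvar1='char1' yvar2*xvar2='char2' / overlay
('char' is PROC PLOT syntax; in GPLOT, use yvar*xvar=n, where n is the number of a SYMBOL definition.)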
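The following sketch runs each plot-request form through line-printer PROC PLOT. The data set MEDIDAS is hypothetical: days after planting, ear length, ear diameter and a group code, all invented.

```sas
/* Hypothetical data: days after planting, ear length, ear diameter, group */
data medidas;
input dias longmazo diammazo grupo $ @@;
datalines;
60 14.1 4.0 A 70 15.3 4.2 A 80 16.8 4.4 B 90 17.5 4.5 B
;
run;

proc plot data=medidas;
plot longmazo*diammazo=grupo;        /* symbol = first character of GRUPO */
plot longmazo*dias='L';              /* fixed plotting character */
plot longmazo*(dias diammazo);       /* one plot per horizontal variable */
plot longmazo*dias='L' diammazo*dias='D' / overlay; /* both on one set of axes */
run;
quit;
```

With OVERLAY, the vertical axis is scaled to cover both LONGMAZO and DIAMMAZO.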
Menu-driven graphics:
SAS/ASSIST
(Solutions > ASSIST)
The following material is taken from the SAS online documentation.
Proc Univariate: Generating Line Printer Plots
The PLOTS option in the PROC UNIVARIATE statement provides up to four diagnostic line printer
plots to examine the data distribution. These plots are the stem-and-leaf plot or horizontal bar chart, the
box plot, the normal probability plot, and the side-by-side box plots. If you specify the WEIGHT
statement, PROC UNIVARIATE provides a weighted histogram, a weighted box plot based on the
weighted quantiles, and a weighted normal probability plot.
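This is a minimal sketch of requesting these plots, first unweighted and then with a WEIGHT statement. The data set MUESTRA and its values are hypothetical. The `ods graphics off` line is an assumption: in recent SAS releases, enabled ODS Graphics may replace the line printer versions.

```sas
/* Hypothetical data: value Y and weight W */
data muestra;
input y w @@;
datalines;
12 1 15 2 14 1 30 1 13 2 20 1
22 1 19 2 25 1 21 1 17 2 16 1
;
run;

ods graphics off;  /* assumption: keep the text (line printer) plots */

/* stem-and-leaf plot, box plot, normal probability plot */
proc univariate data=muestra plots;
var y;
run;

/* weighted histogram, weighted box plot, weighted normal probability plot */
proc univariate data=muestra plots;
var y;
weight w;
run;
```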
Box Plot
The box plot, also known as a schematic plot, appears beside the stem-and-leaf plot. Both plots use the
same vertical scale. The box plot provides a visual summary of the data and identifies outliers. The
bottom and top edges of the box correspond to the sample 25th (Q1) and 75th (Q3) percentiles. The box
length is one interquartile range (Q3 - Q1). The center horizontal line with asterisk endpoints
corresponds to the sample median. The central plus sign (+) corresponds to the sample mean. If the
mean and median are equal, the plus sign falls on the line inside the box. The vertical lines that project
out from the box, called whiskers, extend as far as the data extend, up to a distance of 1.5 interquartile
ranges. Values farther away are potential outliers. The procedure identifies the extreme values with a
zero or an asterisk (*). If zero appears, the value is between 1.5 and 3 interquartile ranges from the top
or bottom edge of the box. If an asterisk appears, the value is more extreme.
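The marking rule can be reproduced by hand to see which values get a 0 or an asterisk. This sketch is not how PROC UNIVARIATE computes it internally. It takes quartiles from PROC MEANS, assuming the default definition there (QNTLDEF=5) matches the one used for the box plot, and it treats the 1.5 and 3 IQR limits as strict inequalities, which is also an assumption. The data set OBS is hypothetical.

```sas
/* Hypothetical data */
data obs;
input y @@;
datalines;
2 3 3 4 4 5 5 6 7 14 20
;
run;

/* sample quartiles Q1 and Q3 */
proc means data=obs noprint;
var y;
output out=quart q1=q1 q3=q3;
run;

data flagged;
length mark $1;
if _n_ = 1 then set quart(keep=q1 q3);  /* Q1, Q3 retained for every row */
set obs;
iqr  = q3 - q1;                  /* box length */
dist = max(q1 - y, y - q3);      /* distance outside the box; <= 0 inside */
if dist > 3*iqr then mark = '*';         /* more extreme than 3 IQR */
else if dist > 1.5*iqr then mark = '0';  /* between 1.5 and 3 IQR */
else mark = ' ';                         /* within whisker range */
run;
```

In this data Q1 = 3 and Q3 = 7, so the IQR is 4. The value 14 lies 7 units above the box and gets a 0. The value 20 lies 13 units above it and gets an asterisk.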
To generate box plots using high-resolution graphics, use the BOXPLOT procedure in SAS/STAT
software.
Normal Probability Plot
The normal probability plot is a quantile-quantile plot of the data. The procedure plots the empirical
quantiles against the quantiles of a standard normal distribution. Asterisks (*) indicate the data values.
The plus signs (+) provide a straight reference line that is drawn by using the sample mean and standard
deviation. If the data are from a normal distribution, the asterisks tend to fall along the reference line.
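Here is a sketch of the same construction built by hand. Each value is paired with a normal score, and the reference line is mean + sd × score. Blom's plotting positions (PROC RANK NORMAL=BLOM) are an assumption; PROC UNIVARIATE may use a different formula for its plot. The data set QQDATOS is hypothetical.

```sas
/* Hypothetical data */
data qqdatos;
input y @@;
datalines;
4 7 5 9 6
;
run;

/* normal scores (standard normal quantiles) from Blom's positions */
proc rank data=qqdatos out=puntajes normal=blom;
var y;
ranks z;
run;

proc means data=puntajes noprint;
var y;
output out=stats mean=mu std=sigma;
run;

data qq;
if _n_ = 1 then set stats(keep=mu sigma);
set puntajes;
ref = mu + sigma*z;   /* reference line from sample mean and sd */
run;

/* asterisks = data, plus signs = reference line, as in the text */
proc plot data=qq;
plot y*z='*' ref*z='+' / overlay;
run;
quit;
```

The median observation (6) gets a normal score of 0, so its reference value equals the sample mean, 6.2.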
Side-by-Side Box Plots
When you use a BY statement with the PLOT option, PROC UNIVARIATE produces full-page side-by-side box plots, one for each BY group. The box plots (also known as schematic plots) use a common
scale that allows you to compare the data distribution across BY groups. This plot appears after the
univariate analyses of all BY groups. Use the NOBYPLOT option to suppress this plot.
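This sketch runs the same BY-group analysis with and without the side-by-side box plots. The data set PORLOCAL is hypothetical. As before, `ods graphics off` is an assumption to keep line printer output.

```sas
/* Hypothetical data: site and value */
data porlocal;
input local $ y @@;
datalines;
Isabela 12 Isabela 15 Isabela 14 Isabela 30 Isabela 13
Lajas 20 Lajas 22 Lajas 19 Lajas 25 Lajas 21
;
run;

/* BY processing requires data sorted by the BY variable */
proc sort data=porlocal;
by local;
run;

ods graphics off;  /* assumption: keep line printer output */

/* per-site plots, then side-by-side box plots on a common scale */
proc univariate data=porlocal plot;
by local;
var y;
run;

/* same analysis, side-by-side box plots suppressed */
proc univariate data=porlocal plot nobyplot;
by local;
var y;
run;
```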
Generating High-Resolution Graphics
If your site licenses SAS/GRAPH software, you can use the HISTOGRAM statement, PROBPLOT
statement, and QQPLOT statement to create high-resolution graphs.
The HISTOGRAM statement generates histograms and comparative histograms that allow you to
examine the data distribution. You can optionally fit families of density curves and superimpose kernel
density estimates on the histograms.
The PROBPLOT statement generates a probability plot, which compares ordered values of a variable
with percentiles of a specified theoretical distribution. The QQPLOT statement generates a quantile-quantile plot, which compares ordered values of a variable with quantiles of a specified theoretical
distribution. Thus, you can use these plots to determine how well a theoretical distribution models a set
of measures.
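Here is a minimal sketch using the three statements on one variable, checked against a normal model. The data set ALTURAS and its values are hypothetical. MU=EST and SIGMA=EST ask for the normal parameters to be estimated from the sample.

```sas
/* Hypothetical plant heights (cm) */
data alturas;
input altura @@;
datalines;
190 205 214 220 198 210 202 215 208 195 212 200
;
run;

proc univariate data=alturas noprint;
/* histogram with a fitted normal curve and a kernel density estimate */
histogram altura / normal kernel;
/* ordered values against normal percentiles */
probplot altura / normal(mu=est sigma=est);
/* ordered values against normal quantiles */
qqplot altura / normal(mu=est sigma=est);
run;
```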
PROC BOXPLOT: Overview
The BOXPLOT procedure creates side-by-side box-and-whisker plots of measurements
organized in groups. A box-and-whisker plot displays the mean, quartiles, and minimum and
maximum observations for a group. Throughout this chapter, this type of plot, which can
contain one or more box-and-whisker plots, is referred to as a box plot.
The PLOT statement of the BOXPLOT procedure produces a box plot. You can specify more
than one PLOT statement to produce multiple box plots.
You can use options in the PLOT statement to
- control the style of the box-and-whisker plots
- specify one of several methods for calculating quantile statistics (percentiles)
- add block legends and symbol markers to reveal stratification in data
- display vertical and horizontal reference lines
- control axis values and labels
- control the layout and appearance of the plot
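This sketch shows two PLOT statements in one BOXPLOT step. Each statement uses a different box style, and the second also sets a percentile definition and a reference line. The data set PLANTAS and its values are hypothetical, as is the reference value of 210.

```sas
/* Hypothetical data: hybrid and plant height */
data plantas;
input hibrido $ altura @@;
datalines;
MLxHU3 214 MLxHU3 205 MLxHU3 220 MLxHU3 210
TTxHU5 220 TTxHU5 198 TTxHU5 215 TTxHU5 202
;
run;

/* group values must be contiguous, so sort by the group variable */
proc sort data=plantas;
by hibrido;
run;

proc boxplot data=plantas;
/* schematic: whiskers up to 1.5 IQR, more distant values marked */
plot altura*hibrido / boxstyle=schematic;
/* skeletal: whiskers to min and max; percentile definition 4; line at 210 */
plot altura*hibrido / boxstyle=skeletal pctldef=4 vref=210;
run;
```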
Controlling graphics and output in SAS
To send graphics output somewhere other than the GRAPH window, we must define the
"graphics device" with the DEVICE= option of the GOPTIONS statement. Some useful settings
are:
goptions device=activex;
goptions device=java;
goptions device=gif;
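Here is a hedged sketch of writing a chart to a GIF file instead of a window. The fileref GRAFOUT, the file name and the data set DEMO are hypothetical. GSFNAME= and GSFMODE= are GOPTIONS settings, but whether the file is actually written depends on which ODS destinations are open. The sketch assumes the LISTING destination receives the graph.

```sas
ods listing;                       /* assumption: LISTING receives the graph */
filename grafout 'rendim.gif';     /* hypothetical output file */

/* clear earlier settings, then route SAS/GRAPH output to the GIF file */
goptions reset=all device=gif gsfname=grafout gsfmode=replace;

/* Hypothetical data */
data demo;
input hibrido $ rendim @@;
datalines;
A 5200 B 7300 A 4800 B 6900
;
run;

proc gchart data=demo;
vbar hibrido / type=mean sumvar=rendim;
run;
quit;
```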
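data maiz;
/* LOCAL may contain one embedded blank ("Juana Diaz"), hence the & modifier;
   at least two blanks must separate it from the planting date */
input local & $10. siembra : mmddyy8. hibrido $ repet peso
rendim altura longmazo diammazo;
/* set lengths so '2.MEDIO' is not truncated to the length of '1.ALTO' */
length mazorca $13 catrend $7;
format siembra date9.;
messiem=month(siembra);  /* planting month */
/* ear length class: shorter than 16 is a short ear (CORTA) */
if .<longmazo<16 then mazorca='MAZORCA CORTA';
else if longmazo>=16 then mazorca='MAZORCA LARGA';
/* yield class */
if rendim>7000 then catrend='1.ALTO';
else if 5000<rendim<=7000 then catrend='2.MEDIO';
else if .<rendim<=5000 then catrend='3.BAJO';
datalines;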
Juana Diaz  03/20/96 MLxHU3 1 70 2735 214 17.5 4.3
Juana Diaz  03/20/96 TTxHU5 1 69 4109 220 18.9 4.1
…
Isabela  12/12/96 TTxHU5 4 76.5 2003 190 15.8 4.5
Isabela  12/12/96 MLxTT1 4 75.5 5056 210 17 5
Isabela  12/12/96 JRExRE 4 76 4485 210 16.3 4.6
;
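proc sort data=maiz;
by local hibrido;
run;
ods rtf;
ods html;
goptions device=activex;
/* per site: yield histogram with fitted normal curve, and normal
   probability plot of height with estimated mean and sd */
proc univariate noprint data=maiz;
by local;
histogram rendim / normal;
probplot altura / normal(mu=est sigma=est) pctlminor;
run;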
proc boxplot data=maiz;
by local;
plot altura*hibrido;
run;
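/* mean ear diameter per hybrid; yield frequencies grouped by site; pie of yield classes */
proc gchart data=maiz;
vbar hibrido / type=mean sumvar=diammazo discrete;
vbar rendim / type=freq group=local levels=8;
pie catrend / type=freq;
run;
quit;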
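/* GPLOT takes plot markers from SYMBOL definitions referenced by number
   (=1, =2), not from quoted characters as in PROC PLOT */
symbol1 value=dot;
symbol2 value=triangle;
proc gplot data=maiz;
plot longmazo*diammazo=local;
plot longmazo*siembra=1 diammazo*siembra=2 / overlay;
run;
quit;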
data hat;
do x=-5 to 5 by 0.25;
do y=-5 to 5 by 0.25;
z=sin(sqrt(x*x+y*y));
output;
end;
end;
run;
/* 3-D surface of z = sin(sqrt(x**2 + y**2)) */
proc g3d data=hat;
plot y*x=z;
run;
quit;
ods html close;
ods rtf close;