Reading assignment: SAS textbook, Chapter 2

STAT 6250
Descriptive Statistics
Dr. Fan
Supplementary Materials
Based on the data in HTWT.txt
SAS Code:
DATA HTWT;
INPUT
GENDER $
HEIGHT
WEIGHT
COLLEGE $;
DATALINES;
M 68.5 155 SCI
F 61.2 99 BSNS
F 63.0 115 BSNS
M 70.0 205 SCI
M 68.6 170 ARTS
F 65.1 125 BSNS
M 72.4 220 ARTS
M 69.5 188 SCI
;
RUN;
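The DATA step above enters the values inline with DATALINES. Since the data are described as coming from HTWT.txt, a minimal sketch of reading the same variables from that file might look like the following. The file location and the layout (space-delimited, one student per line, no header row) are assumptions not stated in the handout, so adjust both to your own setup.

```sas
/* Sketch only: the path is a placeholder, and the file layout
   (space-delimited, one record per line, no header row) is assumed */
DATA HTWT;
INFILE 'HTWT.txt';
INPUT GENDER $ HEIGHT WEIGHT COLLEGE $;
RUN;
/* Print the data set to check that every row was read as expected */
PROC PRINT DATA=HTWT;
RUN;
```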
For Quantitative Variables: PROC UNIVARIATE
PROC UNIVARIATE produces extensive descriptive statistics for quantitative variables,
including normality tests (with the option NORMAL) and stemplots and boxplots (with the
option PLOT). You can also get histograms, normal Q-Q plots, and normal probability
plots with the HISTOGRAM, QQPLOT, and PROBPLOT statements, respectively.
SAS Syntax:
PROC UNIVARIATE DATA=dataname NORMAL PLOT;
TITLE "the title you want for the output";
VAR var1 var2 … vari;
HISTOGRAM var1 var2 … varj;
QQPLOT var1 var2 … vark;
PROBPLOT var1 var2 … varl;
RUN;
Example:
We wish to get descriptive statistics including stemplots and boxplots for the
variable height. Also verify whether the variable height follows a normal
distribution.
SAS Code:
PROC UNIVARIATE DATA=HTWT NORMAL PLOT;
TITLE "DESCRIPTIVE STATISTICS + PLOTS";
VAR HEIGHT WEIGHT;
HISTOGRAM HEIGHT / MIDPOINTS=60 TO 75 BY 5 NORMAL;
INSET
MEAN='Mean' (5.2)
STD='Standard deviation' (5.2);
QQPLOT HEIGHT;
PROBPLOT HEIGHT;
RUN;
Exercise: Obtain descriptive statistics for weight. Is it reasonable to assume weight
follows a normal distribution? Does the mean weight of the population significantly
differ from 160 lb?
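As a possible starting point for the last part of the exercise, PROC UNIVARIATE has a MU0= option that sets the hypothesized mean used in its tests for location (the default is 0). The sketch below is one way to set it up; interpreting the output is left as part of the exercise.

```sas
/* Sketch: MU0=160 makes the tests for location compare the mean to 160 */
PROC UNIVARIATE DATA=HTWT NORMAL PLOT MU0=160;
TITLE "DESCRIPTIVE STATISTICS FOR WEIGHT";
VAR WEIGHT;
RUN;
```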
Example:
We wish to get the descriptive statistics and histogram for height for each
gender separately.
SAS Code:
PROC SORT DATA=HTWT;
BY GENDER;
RUN;
/** NUMERICAL AND VISUAL SUMMARIES FOR QUANTITATIVE VAR'S **/
PROC UNIVARIATE DATA=HTWT NORMAL PLOT;
TITLE "MORE DESCRIPTIVE STATISTICS + HISTOGRAMS";
BY GENDER;
VAR HEIGHT;
HISTOGRAM HEIGHT; /* one histogram per BY group */
RUN;
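The BY statement above requires the data to be sorted by GENDER first. As an alternative sketch, and not the handout's own method, the CLASS statement in PROC UNIVARIATE splits the analysis by group without a prior PROC SORT. The title text is made up for illustration.

```sas
/* Sketch: CLASS groups the analysis by gender; no sorting is needed */
PROC UNIVARIATE DATA=HTWT NORMAL;
TITLE "HEIGHT BY GENDER USING CLASS";
CLASS GENDER;
VAR HEIGHT;
HISTOGRAM HEIGHT; /* comparative histograms, one panel per gender */
RUN;
```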
For Categorical Variables: PROC FREQ and PROC GCHART
PROC FREQ outputs frequency tables for categorical variables. PROC GCHART
produces bar charts for categorical variables.
SAS Syntax:
PROC FREQ DATA=dataname;
TABLES var1 var2 … vari;
RUN;
PROC GCHART DATA=dataname;
VBAR var1 var2 … vari;
RUN;
Example:
Summarize the variables college and gender.
SAS Code:
PROC FREQ DATA=HTWT;
TITLE "FREQUENCY TABLE";
TABLES GENDER COLLEGE;
RUN;
/** VISUAL SUMMARY FOR CATEGORICAL VAR'S **/
PROC GCHART DATA=HTWT;
TITLE "BAR CHART";
VBAR GENDER COLLEGE;
RUN;
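PROC GCHART is part of SAS/GRAPH. If that product is not available in your installation, PROC SGPLOT can draw similar bar charts. The sketch below is offered as an alternative, not as the handout's method; it runs the procedure once per variable because each chart here has its own category variable.

```sas
/* Sketch: one SGPLOT call per categorical variable */
PROC SGPLOT DATA=HTWT;
TITLE "BAR CHART OF GENDER";
VBAR GENDER;
RUN;
PROC SGPLOT DATA=HTWT;
TITLE "BAR CHART OF COLLEGE";
VBAR COLLEGE;
RUN;
```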
Relationship among Variables:

1 categorical + 1 quantitative variables
Example: Draw a plot to illustrate how the distribution of height differs between male
and female students. Describe what you see.
/* Sorting the data to be used in the side by side boxplots */
proc sort data=htwt OUT=HTWT_SORTED;
by gender;
run;
PROC BOXPLOT DATA=HTWT_SORTED;
TITLE "SIDE-BY-SIDE BOXPLOT FOR GENDER AND HEIGHT";
PLOT HEIGHT*GENDER; /* QUANTITATIVE*CATEGORICAL */
RUN;
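PROC BOXPLOT expects the observations for each group to be stored together, which is why the data are sorted first. As an alternative sketch, the VBOX statement in PROC SGPLOT draws side-by-side boxplots from unsorted data; this is a swapped-in technique, not the one used above.

```sas
/* Sketch: CATEGORY= draws one box per gender; no prior sort is required */
PROC SGPLOT DATA=HTWT;
TITLE "SIDE-BY-SIDE BOXPLOT FOR GENDER AND HEIGHT";
VBOX HEIGHT / CATEGORY=GENDER;
RUN;
```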

2 categorical variables
Example: Draw a plot to illustrate how the distribution of the colleges of male students
differs from that of female students. Describe what you see.
PROC FREQ DATA=HTWT;
TITLE "TWO-WAY TABLE";
TABLES GENDER*COLLEGE;
RUN;
PROC GCHART DATA=HTWT;
TITLE "SIDE-BY-SIDE BAR CHART FOR GENDER AND COLLEGE";
VBAR COLLEGE / GROUP= GENDER;
RUN;
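When comparing how colleges are distributed within each gender, the row percentages of the two-way table are the most relevant numbers. A minimal sketch that suppresses the other percentages so the rows are easier to read (the title text is made up for illustration):

```sas
/* Sketch: keep frequencies and row percentages only.
   NOCOL drops column percentages; NOPERCENT drops overall percentages. */
PROC FREQ DATA=HTWT;
TITLE "COLLEGE DISTRIBUTION WITHIN EACH GENDER";
TABLES GENDER*COLLEGE / NOCOL NOPERCENT;
RUN;
```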

2 quantitative variables
Example: Draw a plot to illustrate the relationship between height and weight. Describe
what you see.
PROC GPLOT DATA=HTWT;
TITLE "SCATTERPLOT OF WEIGHT VS. HEIGHT";
PLOT WEIGHT*HEIGHT; /* Y*X */
RUN;
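A scatterplot shows the form and direction of the relationship. If a numerical summary of its strength is also wanted, one option (an addition to the handout, not part of it) is the Pearson correlation from PROC CORR:

```sas
/* Sketch: Pearson correlation between the two quantitative variables */
PROC CORR DATA=HTWT;
TITLE "CORRELATION BETWEEN HEIGHT AND WEIGHT";
VAR HEIGHT WEIGHT;
RUN;
```

With only eight observations, the correlation should be read alongside the plot rather than on its own.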
1 categorical + 2 quantitative variables
Example: Draw a plot to illustrate the relationship between height and weight for
different genders. Describe what you see.
PROC GPLOT DATA=HTWT;
TITLE "SCATTERPLOT FOR WEIGHT BY HEIGHT WITH GENDER GROUP";
PLOT WEIGHT*HEIGHT=GENDER;
RUN;
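By default the plotting symbols for the two genders may be hard to tell apart. A sketch of how SYMBOL statements could be used to make the groups more distinct; the shapes and colors are arbitrary choices, and the assignment of SYMBOL1 to F and SYMBOL2 to M assumes the levels are taken in sorted order.

```sas
/* Sketch: SYMBOL definitions stay in effect until they are redefined */
SYMBOL1 VALUE=DOT COLOR=BLUE;   /* assumed to apply to the first level, F */
SYMBOL2 VALUE=CIRCLE COLOR=RED; /* assumed to apply to the second level, M */
PROC GPLOT DATA=HTWT;
TITLE "SCATTERPLOT FOR WEIGHT BY HEIGHT WITH GENDER GROUP";
PLOT WEIGHT*HEIGHT=GENDER;
RUN;
```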

2 categorical + 1 quantitative variables
Example: Draw a plot to illustrate how the distribution of height differs between male
and female students within each college group. Describe what you see.
/* Sorting the data to be used in the side by side boxplots */
Proc Sort Data=HTWT;
By COLLEGE GENDER;
run;
/* Printing the side by side boxplots of height by college and gender*/
Proc Boxplot data = HTWT;
Title "Boxplot of height by College and Gender";
Plot Height*GENDER (COLLEGE);
run;
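As an alternative sketch, and not the handout's own method, PROC SGPANEL can draw one panel per college with a box for each gender inside it, and it does not need the data sorted first:

```sas
/* Sketch: PANELBY creates one panel per college;
   CATEGORY= draws one box per gender within each panel */
Proc Sgpanel data = HTWT;
Title "Boxplot of height by Gender within each College";
Panelby COLLEGE;
Vbox Height / category=GENDER;
run;
```

In this small data set some college and gender combinations have no students, so some boxes will be missing from the panels.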