Fall 11, Midterm

Statistics 7110
Instructor: Athanasios C. Micheas, Ph.D.
Midterm examination (in class)
MDLBH 7, 10:00-10:50 p.m., Friday, October 28, 2011
Directions: Create a .docx file named "'your name' SAS exam.docx" and enter
all your output, comments, plots, etc. there. Email the file to the instructor at the
end of class. Clearly mark your answers, and be sure to answer all questions. Make
sure you include all your input (SAS code) and output (output window and graphics
window). Answer all posed questions using SAS procedures only, not proc insight or
the ASSIST module. The dataset needed for the problems can be found on the class
website at http://www.stat.missouri.edu/~amicheas/stat305/datasets. Work on the
problems alone; send the file to: [email protected].
Problem 1. (40 pts) A summary of the current, past, and projected U.S. population can
be found in the file USPOP.dat. The study was conducted in 1991 by the U.S.
Population Reference Bureau.
The variables are:
section: section of the country, NE = New England, MW = Midwest, and so forth
zone: coded time zone
state: state
pop1991: population in year 1991 in thousands
pop1990: population in year 1990 in thousands
pop1980: population in year 1980 in thousands
pop2010: population in year 2010 in thousands
area: total area in thousands of square miles
popdens: population density in people per square mile
medage: average age
perc18: percentage (proportion) of population under 18 years as of 1991
perc65: percentage of population above 65 years as of 1991
coded: coded section
a) (25 pts) Create a SAS program that will read the data. In order to visualize
differences in the average proportions of people younger than 18 and people older
than 65 across sections of the country, produce a plot with the variable section
on the x-axis and perc18 and perc65 on the y-axis. Which section of the country has
the highest average percentage of young people (younger than 18)? Which has the
highest average percentage of people older than 65? Is it the same section? Use green
for the perc18 line and blue for the perc65 line, and connect the points on the
graph. (Hint: You will need to create a dataset that contains the means of perc18 and
perc65 for each section to answer this problem. For the plot, use the overlay option
in the plot statement, and two symbol statements with the appropriate color and
interpol options.)
b) (15 pts) We wish to assess equality of the average proportions of people
younger than 18 (perc18) and people older than 65 (perc65) for the year 1991, using a
statistical procedure. Conduct a formal test for equality of the average proportions and
comment on the results (use α = .05). Make sure to check the validity of the test. You
may assume independence between the two age groups.
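For part (b), the two-sample statistics commonly used for such a comparison can be written out as follows. This is only a sketch under the independence assumption stated in the problem; the symbols $\bar{x}_{18}$, $s_{18}^2$, $n_{18}$ (and their perc65 counterparts) are notation introduced here for the sample means, variances, and sizes, not names taken from the exam.

```latex
\begin{align*}
% Pooled (equal-variance) two-sample t statistic
t &= \frac{\bar{x}_{18} - \bar{x}_{65}}
          {s_p \sqrt{\dfrac{1}{n_{18}} + \dfrac{1}{n_{65}}}},
&
s_p^2 &= \frac{(n_{18}-1)\,s_{18}^2 + (n_{65}-1)\,s_{65}^2}{n_{18} + n_{65} - 2},
\\[6pt]
% Unequal-variance version of the statistic
t' &= \frac{\bar{x}_{18} - \bar{x}_{65}}
           {\sqrt{\dfrac{s_{18}^2}{n_{18}} + \dfrac{s_{65}^2}{n_{65}}}}
\end{align*}
```

Under the null hypothesis of equal means, with normal data and equal variances, $t$ follows a t distribution with $n_{18} + n_{65} - 2$ degrees of freedom. Checking validity then amounts to examining approximate normality of each variable and whether the two variances are comparable; if they are not, the unequal-variance form $t'$ is the safer choice.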
Problem 2. (60 pts)
a) (20 pts) Create a SAS program that will contain variables named x1-x12, each
containing 20 generated values from a binomial distribution with sample sizes
(numbers of trials) 1, 2, 3, ..., 11, 12, respectively, and probability of success .3
for all variables. (HINT: recall the do-loop examples…)
b) (20 pts) Create a horizontal bar chart displaying 12 bars, with each bar having
length equal to the corresponding average of x1, x2, ..., x12 from the generated data.
Note: Since we know that the theoretical means of the random variables x1-x12 are
i*.3, i=1,2,...,12, the graph should look like a ladder going up.
c) (20 pts) Looking at the generated data as a 20x12 matrix of integers, compute the
frequency of each possible value 0,1,2,...,12 that the variables x1-x12 may take, and
print your results. (Hint: Create a variable taking values 0,1,2,...,12 and another
variable freq that will contain the frequency of those values in the table.)
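The logic of Problem 2 can be sketched outside SAS to make the three steps concrete. The following is a minimal Python illustration, not a SAS answer to the exam: the function names, the seed value, and the text-based bars are all assumptions introduced here. It draws each binomial value as a sum of Bernoulli indicators, computes the column means to compare with i*.3, and tabulates the frequency of each value 0,1,...,12 over the 20x12 table.

```python
import random
from collections import Counter

P_SUCCESS = 0.3   # success probability stated in Problem 2
N_ROWS = 20       # generated values per variable
N_VARS = 12       # variables x1..x12; variable i uses i trials


def binomial_draw(rng, trials, p):
    """One Binomial(trials, p) value as a sum of Bernoulli(p) indicators."""
    return sum(rng.random() < p for _ in range(trials))


def generate(seed=2011):
    """Return a 20x12 list of lists; column i-1 holds the draws for x_i.

    The seed is an arbitrary choice made here for reproducibility.
    """
    rng = random.Random(seed)
    return [[binomial_draw(rng, i, P_SUCCESS) for i in range(1, N_VARS + 1)]
            for _ in range(N_ROWS)]


def column_means(data):
    """Part (b): sample mean of each x_i, to compare with i * 0.3."""
    return [sum(row[j] for row in data) / len(data) for j in range(N_VARS)]


def value_frequencies(data):
    """Part (c): count of each value 0..12 over all 240 cells."""
    counts = Counter(v for row in data for v in row)
    return {value: counts.get(value, 0) for value in range(N_VARS + 1)}


if __name__ == "__main__":
    data = generate()
    for i, mean in enumerate(column_means(data), start=1):
        # Crude horizontal bar: one '#' per 0.1 of the sample mean.
        bar = "#" * round(mean * 10)
        print(f"x{i:<2} mean={mean:5.2f} theory={i * P_SUCCESS:4.1f} {bar}")
    for value, freq in value_frequencies(data).items():
        print(value, freq)
```

Because the frequencies are counted over every cell of the table, they should sum to 20*12 = 240, which gives a quick sanity check on any implementation of part (c).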