STAT 361 - Applied Statistics I
Syllabus
Fall 2007
Instructor:
Glen Takahara - Jeffery Hall 407
Phone: 533-2430, Email: [email protected]
Course Web Site:
http://www.mast.queensu.ca/~stat361
The web site is an integral resource for the course. All assignments (and
solutions), links to software and datasets, statistical demos, important announcements, and other resources will be posted here.
Lecture:
Slot 3 (Monday 10:30, Wednesday 9:30, Friday 8:30), Room Jeffery 116
Office Hours:
Wednesday 11:30-1:20, or by appointment
Text:
Applied Regression Analysis, A Research Tool, Second Edition, by J. O.
Rawlings, S. G. Pantula, and D. A. Dickey, Springer, 1998.
Assignments:
There will be 6 homework assignments. These will be posted on the class
web site; no paper copies will be handed out. Assignment 1 is due on
Friday, Sep. 21. Solutions to the assignments will be posted on the course
web page.
Grading:
25% homework, 25% mid-term test, 50% final exam.
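As a quick illustration of how these weights combine, here is a minimal sketch in Python. The component scores are made-up numbers and `final_grade` is a hypothetical helper written for this example; neither comes from the course.

```python
# Weights taken from the grading scheme above.
WEIGHTS = {"homework": 0.25, "midterm": 0.25, "final": 0.50}


def final_grade(scores):
    """Return the weighted course grade from component scores out of 100."""
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS)


# Hypothetical component scores, for illustration only.
example = {"homework": 80.0, "midterm": 70.0, "final": 90.0}
print(final_grade(example))  # 0.25*80 + 0.25*70 + 0.50*90 = 82.5
```

Because the final exam carries half the weight, a 10-point change on the final moves the course grade by 5 points, twice the effect of the same change on the homework or the midterm.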
Midterm Test:
The midterm test is scheduled for Friday, Oct. 19, in class (8:30-9:20). It
will also have a take-home component.
Prerequisites:
A course in linear algebra; one of STAT 251, 269, 351, or 356; and one of
STAT 261, 263, 264, 267, 367; or permission of the instructor.
Description:
This course is an introduction to applied regression. I plan to cover Chapters
1-11 of the text, plus as much of Chapters 12-14 and Chapter 18 as time
allows. You are expected to read the text in addition to following along in the
lectures. Mandatory reading will be assigned in the homeworks. Computing
will be an integral part of the course (see the section on software below).
Software:
We will use R, a free, professional-grade, full-featured statistical
computing package available for the Windows, Mac, and Linux platforms.
Go to http://www.r-project.org and click on the download link to
download the package. Documentation on R is also available at this web
site. You will be using R for homeworks and the take-home portion of the
midterm exam. I will use R in some classroom lectures and will also provide
relevant tutorial information on R in some lectures.
Course Outline
• Review of Probability, Exploratory Data Analysis, Simple Linear Regression and Linear Algebra: review of probability concepts, including random variables, expectation, variance,
quantiles, probability distributions; review of descriptive statistics and graphical procedures,
including sample mean, sample variance, sample quantiles, scatterplots, histograms, boxplots, and quantile-quantile plots; review of the simple linear regression model, including
estimation, properties of estimates, analysis of variance, hypothesis testing, confidence intervals, and regression through the origin; review of linear algebra, including determinants, matrix inverses, geometry, linear equations, orthogonal transformations and projections, eigenvalues and eigenvectors, singular value decomposition, and principal components. (Class
notes and Chapters 1-2 of text).
• Multiple Linear Regression: the regression model in matrix form, estimation, properties
of estimates, analysis of variance, quadratic forms, hypothesis testing, confidence regions,
geometry of least squares, variable selection. (Chapters 3-7 of text).
• Special Cases: polynomial regression; class variables. (Chapters 8-9 of text).
• Regression Diagnostics: problem areas in regression, diagnostics, transformation of variables, collinearity. (Chapters 10-13 of text).
• Mixed Effects Models: random effects, fixed and random effects, general mixed linear model.
(Chapter 18 of text).
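To make the simple linear regression review in the first outline item concrete: for the model y = b0 + b1 x + error, the least-squares estimates are b1 = Sxy / Sxx and b0 = ybar - b1 xbar. The sketch below computes them in plain Python, assuming made-up data; the function name `fit_simple_regression` is hypothetical. In R, the course software, `lm(y ~ x)` fits the same model.

```python
def fit_simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope). Assumes x has at least two distinct values.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sum of squares of x and corrected sum of cross products.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_regression(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print("intercept:", intercept)
print("slope:", slope)
# With an intercept in the model, least-squares residuals sum to zero
# (up to floating-point rounding).
print("sum of residuals:", sum(residuals))
```

The residuals summing to zero is one of the properties of the estimates that the review covers; it does not hold for regression through the origin.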
More on Prerequisites
The most important prerequisite or background material for this course is outlined below.
Several families of probability distributions are commonly used in regression modelling. For
this course, the most important by far are the Normal (or Gaussian), t, F, and χ² families, with
the normal distribution being the most important. I expect you to be familiar with the properties
of the (univariate) normal distribution as a prerequisite. I strongly recommend that you spend an
hour or so on your own reading about the basic properties of the t, F, and χ² distributions, in
particular how they are derived from normal samples, their parametrizations, and their means and
variances. The text assumes you already know this. Any text on mathematical statistics will have
this material, for example Mathematical Statistics and Data Analysis, 2nd Ed. by John A. Rice,
Duxbury Press, 1994.
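For reference, the standard constructions from normal samples are summarized below in LaTeX notation. These are textbook facts; the symbols Z, U, V, T, W and the degrees-of-freedom labels k, m, n are notation chosen for this summary and are not taken from the course text.

```latex
% Chi-square: Z_1, ..., Z_k independent N(0,1).
\chi^2_k:\quad V = \sum_{i=1}^{k} Z_i^2,
  \qquad E[V] = k, \quad \mathrm{Var}(V) = 2k.

% Student t: Z ~ N(0,1) and V ~ \chi^2_k, independent.
t_k:\quad T = \frac{Z}{\sqrt{V/k}},
  \qquad E[T] = 0 \ (k > 1), \quad \mathrm{Var}(T) = \frac{k}{k-2} \ (k > 2).

% F: U ~ \chi^2_m and V ~ \chi^2_n, independent.
F_{m,n}:\quad W = \frac{U/m}{V/n},
  \qquad E[W] = \frac{n}{n-2} \ (n > 2), \quad
  \mathrm{Var}(W) = \frac{2n^2(m+n-2)}{m(n-2)^2(n-4)} \ (n > 4).
```

One consequence worth noting: if T has a t distribution with k degrees of freedom, then T² has an F distribution with 1 and k degrees of freedom, which is why t tests and F tests for a single coefficient agree.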
The text also assumes you are familiar with the basic ideas and methods of hypothesis testing
and confidence intervals. I am assuming you have been introduced to both these concepts in one
of the prerequisite courses.