STAT 361 - Applied Statistics I
Syllabus, Fall 2007

Instructor: Glen Takahara - Jeffery Hall 407
Phone: 533-2430, Email: [email protected]

Course Web Site: http://www.mast.queensu.ca/~stat361
The web site is an integral resource for the course. All assignments (and solutions), links to software and datasets, statistical demos, important announcements, and other resources will be posted here.

Lecture: Slot 3 (Monday 10:30, Wednesday 9:30, Friday 8:30), Room Jeffery 116

Office Hours: Wednesday 11:30-1:20, or by appointment

Text: Applied Regression Analysis, A Research Tool, Second Edition, by J. O. Rawlings, S. G. Pantula, and D. A. Dickey, Springer, 1998.

Assignments: There will be 6 homework assignments. These will be posted on the class web site; no paper copies will be handed out. Assignment 1 is due on Friday, Sep. 21. Solutions to the assignments will be posted on the course web page.

Grading: 25% homework, 25% mid-term test, 50% final exam. Midterm Test is scheduled for Friday, Oct. 19 in class (8:30-9:20). The midterm will also have a take-home component.

Prerequisites: A course in linear algebra; one of STAT 251, 269, 351, or 356; and one of STAT 261, 263, 264, 267, 367; or permission of the instructor.

Description: This course is an introduction to applied regression. I plan to cover Chapters 1-11 of the text, plus as much of Chapters 12-14 and Chapter 18 as time allows. You are expected to read the text in addition to following along in the lectures. Mandatory reading will be assigned in the homeworks. Computing will be an integral part of the course (see the section on software below).

Software: We will use the free software, R, which is a professional grade, full featured and very powerful statistical computing package, and is available for the Windows, Mac, and Linux platforms. Go to http://www.r-project.org and click on the download link to download the package. Documentation on R is also available at this web site.
You will be using R for homeworks and the take-home portion of the midterm exam. I will use R for some classroom lectures and will also provide relevant tutorial information on R in some lectures.

Course Outline

• Review of Probability, Exploratory Data Analysis, Simple Linear Regression and Linear Algebra: review of probability concepts, including random variables, expectation, variance, quantiles, probability distributions; review of descriptive statistics and graphical procedures, including sample mean, sample variance, sample quantiles, scatterplots, histograms, boxplots, and quantile-quantile plots; review of the simple linear regression model, including estimation, properties of estimates, analysis of variance, hypothesis testing, confidence intervals, and regression through the origin; review of linear algebra, including determinants, matrix inverses, geometry, linear equations, orthogonal transformations and projections, eigenvalues and eigenvectors, singular value decomposition, and principal components. (Class notes and Chapters 1-2 of text).

• Multiple Linear Regression: the regression model in matrix form, estimation, properties of estimates, analysis of variance, quadratic forms, hypothesis testing, confidence regions, geometry of least squares, variable selection. (Chapters 3-7 of text).

• Special Cases: polynomial regression; class variables. (Chapters 8-9 of text).

• Regression Diagnostics: problem areas in regression, diagnostics, transformation of variables, collinearity. (Chapters 10-13 of text).

• Mixed Effects Models: random effects, fixed and random effects, general mixed linear model. (Chapter 18 of text).

More on Prerequisites

The most important prerequisite or background material for this course is outlined below. Several families of probability distributions are commonly used in regression modelling.
For this course, the most important by far are the Normal (or Gaussian), t, F and χ² families, with the normal distribution being the most important. I expect you to be familiar with the properties of the (univariate) normal distribution as a prerequisite. I strongly recommend that you spend an hour or so on your own reading about the basic properties of the t, F, and χ² distributions, in particular how they are derived from normal samples, their parametrizations, and their means and variances. The text assumes you already know this. Any text on mathematical statistics will have this material, for example Mathematical Statistics and Data Analysis, 2nd Ed. by John A. Rice, Duxbury Press, 1994.

The text also assumes you are familiar with the basic ideas and methods of hypothesis testing and confidence intervals. I am assuming you have been introduced to both these concepts in one of the prerequisite courses.
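
As a starting point for the recommended reading on the t, F, and χ² distributions, the standard constructions from normal samples can be summarized as below. This is a sketch of textbook results, not material taken from the course text; the symbols Z, V, U, W and the degrees-of-freedom labels k, m, n are notation assumed here for illustration.

```latex
% Assumed notation: Z, Z_1, ..., Z_k are independent N(0,1) random variables.

% Chi-square with k degrees of freedom: a sum of squared standard normals.
V = \sum_{i=1}^{k} Z_i^2 \sim \chi^2_k,
\qquad E[V] = k, \qquad \operatorname{Var}(V) = 2k.

% Student's t with k degrees of freedom: Z must be independent of V.
T = \frac{Z}{\sqrt{V/k}} \sim t_k,
\qquad E[T] = 0 \ (k > 1), \qquad \operatorname{Var}(T) = \frac{k}{k-2} \ (k > 2).

% F with (m, n) degrees of freedom: U ~ chi^2_m and W ~ chi^2_n, independent.
F = \frac{U/m}{W/n} \sim F_{m,n},
\qquad E[F] = \frac{n}{n-2} \ (n > 2),
\qquad \operatorname{Var}(F) = \frac{2n^2(m+n-2)}{m(n-2)^2(n-4)} \ (n > 4).

% Link between the families: the square of a t_k variable is F_{1,k}.
T^2 \sim F_{1,k}.
```

The degrees of freedom are the only parameters of these three families. The relation T² ~ F(1, k) is one reason a two-sided t test on a single coefficient and the corresponding F test give the same conclusion.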