Transcript
```
Simple Linear Regression and Correlation: Inferential Methods
Chapter 13
AP Statistics
Peck, Olsen and Devore
Topic 2: Summary of Bivariate Data



In Topic 2 we discussed summarizing
bivariate data
Specifically we were interested in
summarizing linear relationships between
two measurable characteristics
We summarized these linear relationships
by performing a linear regression using
the method of least squares
Least Squares Regression

Graphically display the data in a scatterplot
    Form, strength and direction
Calculate the Pearson’s Correlation Coefficient
    The strength of the linear association
Perform the least squares regression
    $\hat{y} = a + bx$
Determine if the model is appropriate
    Inspect the residual plot
    No patterns
Determine the Coefficient of Determination
    How good is the model as a prediction tool
Use the model as a prediction tool
Interpretation




Pearson’s correlation coefficient
Coefficient of Determination
Variables in $\hat{y} = a + bx$
Standard deviation of the residuals
Minitab Output
Simple Linear Regression Model





‘Simple’ because we had only one independent variable: $\hat{y} = a + bx$
We interpreted $\hat{y}$ as a predicted value of y given a specific
value of x
When $y = f(x)$ we can describe this as a deterministic
model. That is, the value of y is completely determined by a
given value of x
That wasn’t really the case when we used our linear
regressions. The value of y was equal to our predicted value
+/- some amount. That is, $y = a + bx + e$
We call this a probabilistic model.
So, without e, the (x, y) pairs (observed points) would fall on
the regression line.
Now consider this …



How did we calculate the coefficients in our
linear regression models?
We were actually estimating a population
parameter using a sample. That is, the
simple linear regression $y = a + bx + e$ is an
estimate for the population regression line
$y = \alpha + \beta x + e$
We can consider a, b as
estimates for $\alpha$, $\beta$
Basic Assumptions for the Simple
Linear Regression Model




The distribution of e at any particular value
of x has a mean value of 0. That is, $\mu_e = 0$
The standard deviation of e is the same for
any value of x. It is always denoted by $\sigma$
The distribution of e at any value of x is
normal
The random deviations are independent.

Another interpretation of ŷ



Consider $y = \alpha + \beta x + e$, where the
coefficients are fixed and e is distributed
normally. Then the sum of a fixed number
and a normally distributed
variable is
normally distributed (Chapter 7). So y is
normally distributed.
Now the mean of y will be equal to $\alpha + \beta x$
plus the mean of e, which is equal to 0
So another interpretation of $\hat{y}$ is an estimate of the mean y
value for a given x value, $\alpha + \beta x$
Distribution of y



Where $y = \alpha + \beta x + e$ we can now see that
y is distributed normally with a mean of $\alpha + \beta x$
The variance for y is the same as the
variance of e, which is $\sigma^2$
An estimate for $\sigma^2$ is $s_e^2$
Assumption


The major assumption behind all of this is that
the random deviation e is normally
distributed.
We’ll talk later about how to check that this
assumption is reasonable.
Inferences about the slope of the
population regression line

Now we are going to make some
inferences about the slope of the
regression line. Specifically, we’ll
construct a confidence interval and then
perform a hypothesis test – a model utility
test for simple linear regression
Just to repeat …

We said the population regression model is
$y = \alpha + \beta x + e$

The coefficients of this model are fixed but
unknown (parameters) – so using the method
of least squares, we estimate these
parameters using a sample of data (statistics)
and we get
$y = a + bx + e$
Sampling distribution of b


We use b as an estimate for the
population coefficient $\beta$
in the simple
regression model
b is therefore a statistic determined by a
random sample and it has a sampling
distribution

Sampling distribution of b

When the four assumptions of the linear
regression model are met


The mean value of the sampling distribution of b
is $\beta$. That is, $\mu_b = \beta$
The standard deviation of the statistic b is
$\sigma_b = \frac{\sigma}{\sqrt{\sum (x - \bar{x})^2}}$
The sampling distribution of b is normally
distributed.
Estimates for …

The estimate for the standard deviation of b is
$s_b = \frac{s_e}{\sqrt{\sum (x - \bar{x})^2}}$
When we standardize b it has a t distribution
with n-2 degrees of freedom
$t = \frac{b - \beta}{s_b}$
Confidence Interval


Sample statistic +/- critical value * standard deviation of the statistic
$b \pm t^{*} \cdot s_b$
Hypothesis Test


We’re normally interested in the null $H_0: \beta = 0$
because if we reject the null, the data
suggest there is a useful linear relationship
between our two variables
We call this ‘Model Utility Test for Simple
Linear Regression’
Summary of the Test



$H_0: \beta = 0$
$H_A: \beta \neq 0$
Test Statistic
$t = \frac{b}{s_b}$
Assumptions are the same four as those for
the simple linear regression model.
Minitab Output
```
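To make the slope inference concrete, here is a minimal Python sketch of the quantities from the slides: the least squares coefficients a and b, the residual standard deviation s_e, the estimated standard deviation s_b, the confidence interval b +/- t* s_b, and the model utility test statistic t = b / s_b. The function name `slope_inference`, the five data points, and the critical value 3.182 (a t-table value for 95% confidence with n - 2 = 3 degrees of freedom) are illustrative assumptions and are not taken from the slides. The sketch also uses the standard formula s_e = sqrt(SSE / (n - 2)), which the slides do not spell out.

```python
import math


def slope_inference(x, y, t_crit):
    """Least squares fit plus slope inference for simple linear regression.

    t_crit is the critical value t* for n - 2 degrees of freedom. It is
    looked up in a t table by the caller (an assumption of this sketch).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sums of squares and cross products about the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Least squares estimates of beta (slope) and alpha (intercept).
    b = sxy / sxx
    a = y_bar - b * x_bar

    # s_e estimates sigma, using n - 2 degrees of freedom.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))

    # Estimated standard deviation of the statistic b.
    s_b = s_e / math.sqrt(sxx)

    # Model utility test statistic for H0: beta = 0.
    t = b / s_b

    # Confidence interval: b +/- t* times s_b.
    ci = (b - t_crit * s_b, b + t_crit * s_b)

    return {"a": a, "b": b, "s_e": s_e, "s_b": s_b, "t": t, "ci": ci}


# Hypothetical sample, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

result = slope_inference(x, y, t_crit=3.182)
for name, value in result.items():
    print(name, value)
```

With a two-sided alternative, we would reject the null when |t| exceeds the same t* used for the interval. That is equivalent to the confidence interval for the slope not containing 0.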