M243
PPSS
Washington’s Birthday
1. Two inference problems.
(a) Can we generalize our conclusions to a larger group of units?
(b) Can we infer cause and effect in the relationship that we have found?
2. Population
A population is a well-defined collection of individuals (the possible cases).
3. Sample
A sample is a subset of the population.
(a) We want our sample to be representative of our population.
(b) A sampling method that tends to produce samples that are unrepresentative of the population in some
systematic way is biased.
(c) To ensure an unbiased method, we introduce randomness.
(d) A simple random sample of size n is a sample produced by a method that ensures that each group of n
individuals in the population is equally likely to be the sample.
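Here is a short R sketch of the difference between a biased method and a simple random sample. The population is simulated (hypothetical heights, already sorted like a roster listed from shortest to tallest). The names `population`, `first_n_mean`, and `srs_means` are illustrative only. Taking the first n people on a sorted list misses the population mean systematically. The average over many simple random samples lands close to it.

```r
set.seed(3)
# Hypothetical population (simulated, not real data), sorted from smallest to largest
population <- sort(rnorm(2000, mean = 66, sd = 4))
n  <- 20
mu <- mean(population)

# Biased method: always take the first n individuals on the list
first_n_mean <- mean(population[1:n])

# Simple random sample: sample() without replacement makes every subset of size n equally likely
srs_means <- replicate(2000, mean(sample(population, n)))

c(mu = mu, first_n = first_n_mean, average_srs = mean(srs_means))
```

The first-n mean is far below `mu` every time the method is used. The simple random samples scatter around `mu` with no systematic direction.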
4. Parameter
A parameter is a numerical characteristic of (a model of) the population. Parameters are (almost)
always denoted by Greek letters.
5. Statistic
A statistic is a numerical characteristic of the sample. A statistic is computed from the data. Statistics
are never denoted by Greek letters.
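The distinction can be illustrated with a minimal R sketch that uses a simulated (hypothetical) population, which we pretend to see in full. The population mean `mu` is a parameter and stays fixed. Each sample mean is a statistic and changes with the sample drawn.

```r
set.seed(1)
# Hypothetical population of 1000 measurements (simulated, not real data)
population <- rnorm(1000, mean = 170, sd = 10)

mu <- mean(population)          # parameter: a fixed property of the whole population

x1 <- sample(population, 25)    # one simple random sample of size 25
x2 <- sample(population, 25)    # a second sample drawn the same way
xbar1 <- mean(x1)               # statistic computed from the first sample
xbar2 <- mean(x2)               # statistic computed from the second sample

c(mu = mu, xbar1 = xbar1, xbar2 = xbar2)
```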
6. Sampling error
Sampling error is the error in estimating a parameter with a statistic that arises because we are using
a sample rather than the whole population. (We will never know exactly how large the sampling error is.)
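In practice we cannot measure the sampling error, because the parameter is unknown. A simulation can still show how it behaves. The following R sketch uses a hypothetical simulated population so that the true mean is available. It draws many simple random samples of size 30 and records each sample mean's error. The population, the sample size, and the number of repetitions are arbitrary choices made for the illustration.

```r
set.seed(2)
# Hypothetical skewed population (simulated); we pretend we can see it so the error can be measured
population <- rexp(5000, rate = 1/40)
mu <- mean(population)

# Many simple random samples of size 30; record each sample mean's error
xbars <- replicate(1000, mean(sample(population, 30)))
sampling_error <- xbars - mu

mean(sampling_error)   # near 0: random sampling produces no systematic error
sd(sampling_error)     # typical size of the error for a single sample
```

Any one sample's error can be sizable in either direction. The errors average out to about zero, which is what "unbiased" means here.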
Useful R
> sample(x,5,replace=F)
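Here is a usage sketch of `sample()`, assuming `x` holds labels for the individuals in the population. The labels 1 to 30 and the seed value are hypothetical. With `replace = FALSE` (the default), no individual can be chosen twice, so the result is a simple random sample of size 5. Spelling out `FALSE` is safer than `F`, because `F` is an ordinary variable in R that can be reassigned.

```r
set.seed(243)                        # fix the random number stream so the draw can be reproduced
x <- 1:30                            # hypothetical labels for 30 individuals in a population
s <- sample(x, 5, replace = FALSE)   # simple random sample of 5 labels, no repeats
s
```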
Homework
1. Read Chapter 12, 287–294.
2. Practice problems (due Thursday, February 25)
12.1, 3, 5, 7, 9
3. Problems to turn in (due Friday, February 26)
12.26
> plot(Hand~Height)
> l=lm(Hand~Height)
> l

Call:
lm(formula = Hand ~ Height)

Coefficients:
(Intercept)       Height
    -4.7883       0.3756

> summary(l)

Call:
lm(formula = Hand ~ Height)

Residuals:
     Min       1Q   Median       3Q      Max
-2.99817 -1.12483  0.00183  1.12407  3.25072

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.78832    5.00777  -0.956    0.347
Height       0.37555    0.07213   5.206 1.75e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.534 on 27 degrees of freedom
Multiple R-squared:  0.501,  Adjusted R-squared:  0.4825
F-statistic: 27.11 on 1 and 27 DF,  p-value: 1.750e-05

> cor(Hand,Height)
[1] 0.7077955
> anova(l)
Analysis of Variance Table

Response: Hand
          Df Sum Sq Mean Sq F value    Pr(>F)
Height     1 63.779  63.779  27.105 1.750e-05 ***
Residuals 27 63.531   2.353
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> sd(residuals(l))
[1] 1.506309
> sd(Hand)
[1] 2.132322
> plot(residuals(l)~Height)
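Most of the summary numbers in this session follow from the two sums of squares in the `anova(l)` table. The following R sketch recomputes them using only the printed values. It takes n = 29, which follows from the 27 residual degrees of freedom plus the 2 estimated coefficients.

```r
# Values copied from the anova(l) output for Hand ~ Height
ss_reg   <- 63.779   # Sum Sq for Height (variation explained by the line)
ss_resid <- 63.531   # Sum Sq for Residuals (variation left over)
df_resid <- 27
n <- df_resid + 2    # intercept and slope use 2 degrees of freedom, so n = 29

ss_total  <- ss_reg + ss_resid               # total variation in Hand
r_squared <- ss_reg / ss_total               # Multiple R-squared, about 0.501
r         <- sqrt(r_squared)                 # cor(Hand, Height); positive because the slope is positive
rse       <- sqrt(ss_resid / df_resid)       # Residual standard error, about 1.534
f_stat    <- ss_reg / (ss_resid / df_resid)  # F value, about 27.1
sd_hand   <- sqrt(ss_total / (n - 1))        # sd(Hand), about 2.132
sd_resid  <- sqrt(ss_resid / (n - 1))        # sd(residuals(l)), about 1.506

round(c(r_squared = r_squared, r = r, rse = rse, F = f_stat,
        sd_hand = sd_hand, sd_resid = sd_resid), 3)
```

This explains why `sd(residuals(l))` (1.506) is slightly smaller than the residual standard error (1.534). Both use the same residual sum of squares, but `sd()` divides by n - 1 = 28, while the regression divides by n - 2 = 27.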
The three most important descriptions of the relationship:
1. We would predict an increase of 0.38 cm in hand span for each increase of 1 inch in height.
2. Height "explains" 50% of the variation in hand span (R-squared = 0.501), at least for these individuals.
3. We would predict an increase of about 0.71 of a standard deviation in hand span for each increase of 1 standard
deviation in height.
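The three descriptions are linked by standard identities for simple linear regression. The squared correlation equals R-squared. The slope converted to standard-deviation units, slope × sd(x) / sd(y), equals the correlation r. This R sketch checks both identities. Because the class data are not reproduced here, it uses simulated, hypothetical data (`height` in inches, `hand` in cm) loosely patterned on the fitted line above.

```r
set.seed(243)
# Hypothetical data (simulated, NOT the class data): 29 heights in inches, hand spans in cm
height <- rnorm(29, mean = 68, sd = 4)
hand   <- -4.8 + 0.38 * height + rnorm(29, sd = 1.5)

fit <- lm(hand ~ height)
b   <- unname(coef(fit)["height"])   # description 1: predicted change in hand span per inch
r   <- cor(hand, height)

r2 <- summary(fit)$r.squared         # description 2: R-squared, equal to r^2 here

b_std    <- b * sd(height) / sd(hand)                         # description 3: slope in SD units
b_scaled <- unname(coef(lm(scale(hand) ~ scale(height)))[2])  # same slope after standardizing both

c(slope = b, r = r, r_squared = r2, slope_sd_units = b_std, slope_scaled = b_scaled)
```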