Download Notes - Wharton Statistics

Statistics 112 Notes 1
Reading: Review Chapter 2.
I. Basic Goal of Statistical Inference
[Diagram: "Population" and "Sample of Data", linked by "Inference about population using statistical tools"]
Example: The data set birthweight.JMP contains a sample
of the birthweights of 1236 babies born in the United States
from the Child Health and Development Studies. From this
sample, we would like to know something about the
population distribution of birthweights in the United States
– this would help doctors to judge when a baby has an
abnormally low or large birthweight.
II. Properties of Random Variable
A random variable is a variable whose value is a numerical
outcome of a random phenomenon.
Examples:
1. Toss two fair coins. The number of heads Y in the two
tosses is a random variable.
2. Observe the birthweight of a randomly chosen baby
from the U.S. population. The baby’s birthweight is a
random variable.
Probability distribution of a random variable: The
proportion of times the random variable takes on each
of its possible values over many repetitions of the random
phenomenon.
Example 1: For tossing two fair coins independently,
$P(Y = 0) = \frac{1}{4}$, $P(Y = 1) = \frac{1}{2}$, $P(Y = 2) = \frac{1}{4}$.
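The probabilities in Example 1 can be checked by listing the four equally likely outcomes of two fair coin tosses and counting the heads in each. The following is a minimal Python sketch; the function name `heads_distribution` is illustrative and not part of the course materials.

```python
from fractions import Fraction
from itertools import product


def heads_distribution(n_coins=2):
    """Return {number of heads: probability} for n_coins fair, independent coins."""
    outcomes = list(product("HT", repeat=n_coins))  # all equally likely outcomes
    dist = {}
    for outcome in outcomes:
        heads = outcome.count("H")
        # Each outcome contributes 1 / (number of outcomes) to its head count.
        dist[heads] = dist.get(heads, Fraction(0)) + Fraction(1, len(outcomes))
    return dist


dist = heads_distribution(2)
for y in sorted(dist):
    print(f"P(Y = {y}) = {dist[y]}")
```

Exact fractions are used so that the result can be compared directly with the values 1/4, 1/2 and 1/4 in the notes.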
Independence of random variables: Captures the idea that
two random variables X and Y are unrelated, that knowing
the value of X does not help to predict Y . The formal
definition is that X and Y are independent if the chance
that simultaneously $X = x$ and $Y = y$ can be found by
multiplying the separate probabilities:
$P(X = x, Y = y) = P(X = x) \cdot P(Y = y)$ for every $x, y$.
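The product rule can be checked mechanically when the joint probabilities are written out as a table. The sketch below assumes the joint distribution is given as a Python dictionary; the function name `is_independent` and both tables are made up for illustration and do not come from the notes.

```python
from fractions import Fraction


def is_independent(joint):
    """joint maps (x, y) to P(X = x, Y = y); True if the product rule holds for every pair."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    # Marginal probabilities: sum the joint probabilities over the other variable.
    p_x = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}
    p_y = {y: sum(joint.get((x, y), 0) for x in xs) for y in ys}
    return all(joint.get((x, y), 0) == p_x[x] * p_y[y] for x in xs for y in ys)


# Hypothetical table built so that every cell equals the product of its marginals:
# P(X = 0) = 1/3, P(X = 1) = 2/3, P(Y = 0) = 1/4, P(Y = 1) = 3/4.
product_table = {
    (0, 0): Fraction(1, 12), (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 6), (1, 1): Fraction(1, 2),
}
# Hypothetical table in which X and Y always take the same value.
matched_table = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

print(is_independent(product_table))  # True
print(is_independent(matched_table))  # False
```

Exact fractions make the equality test safe; with floating-point probabilities a small tolerance would be needed instead.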
Check your understanding:
For the population of people, do you think X = height and
Y = weight are independent?
For undergraduates, is it plausible that X = age and
Y = gender are independent?
If I flip two fair coins, a dime and a quarter, so that
$P(HH) = P(HT) = P(TH) = P(TT) = \frac{1}{4}$, then is it true or
false that getting a head on the dime is independent of
getting a head on the quarter?
Expected value (mean) of a random variable: The mean
value of the random variable over many repetitions of
the random phenomenon. The expected value of a random
variable is the sum of its possible values weighted by their
probabilities.
Example 1 continued: For tossing two fair coins
independently,
$E(Y) = 0 \cdot \frac{1}{4} + 1 \cdot \frac{1}{2} + 2 \cdot \frac{1}{4} = 1$,
so I expect 1 head when I flip two fair coins. I might
actually get 0 heads or 2 heads, but 1 head is what is
expected on average.
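Both descriptions of the expected value, the probability-weighted sum and the long-run average, can be illustrated in a few lines of Python. This is a sketch only; the names `expected_value` and `simulated_mean`, the seed, and the number of simulated repetitions are arbitrary choices made here.

```python
import random
from fractions import Fraction


def expected_value(dist):
    """dist maps each possible value to its probability."""
    return sum(value * prob for value, prob in dist.items())


# Distribution of Y, the number of heads in two fair coin tosses (Example 1).
two_coins = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
print(expected_value(two_coins))  # 1

# Long-run average over many simulated repetitions of the two tosses.
rng = random.Random(0)  # fixed seed so the run is repeatable
n_reps = 100_000
simulated_mean = sum(rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_reps)) / n_reps
print(simulated_mean)  # close to 1, but not exactly 1
```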
Variance and standard deviation: The standard deviation of
a random variable Y measures how far Y typically is from
its expectation $E(Y)$. Being too high is as bad as being too
low: we care about errors and don’t care about their signs.
So we look at the squared difference between Y and $E(Y)$,
namely $D = \{Y - E(Y)\}^2$, which is, itself, a random
variable. The variance of Y is the expected value of D and
the standard deviation is the square root of the variance,
$\mathrm{Var}(Y) = E[\{Y - E(Y)\}^2]$ and $\mathrm{SD}(Y) = \sqrt{\mathrm{Var}(Y)}$.
Example 1 continued: Toss two fair coins independently.
$P(Y = 0) = \frac{1}{4}$, $P(Y = 1) = \frac{1}{2}$, $P(Y = 2) = \frac{1}{4}$, $E(Y) = 1$.
$D = \{Y - E(Y)\}^2$ takes the value $(0 - 1)^2 = 1$ with
probability ¼, the value $(1 - 1)^2 = 0$ with probability ½ and
the value $(2 - 1)^2 = 1$ with probability ¼. The variance of
Y is the expected value of D, namely:
$\mathrm{Var}(Y) = E(D) = 1 \cdot \frac{1}{4} + 0 \cdot \frac{1}{2} + 1 \cdot \frac{1}{4} = \frac{1}{2}$.
So the standard deviation is
$\mathrm{SD}(Y) = \sqrt{\mathrm{Var}(Y)} = \sqrt{\frac{1}{2}} \approx 0.707$.
So when I flip two fair coins, I expect one head but often I
get 0 or 2 heads instead, and the typical deviation from
what I expect is 0.707 heads. This 0.707 reflects the fact
that I get exactly what I expect, namely 1 head, half the
time, but I get 1 more than I expect a quarter of the time,
and one less than I expect a quarter of the time.
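The variance and standard deviation in this example follow the same recipe for any random variable with finitely many values: compute the mean, then average the squared deviations using the probabilities as weights. A minimal Python sketch, with the function name `variance` chosen here for illustration:

```python
from fractions import Fraction
from math import sqrt


def variance(dist):
    """dist maps each possible value to its probability; returns E[(Y - E(Y))^2]."""
    mean = sum(value * prob for value, prob in dist.items())
    return sum((value - mean) ** 2 * prob for value, prob in dist.items())


# Distribution of Y, the number of heads in two fair coin tosses (Example 1).
two_coins = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

var_y = variance(two_coins)
sd_y = sqrt(var_y)  # standard deviation is the square root of the variance

print(var_y)           # 1/2
print(round(sd_y, 3))  # 0.707
```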
Check your understanding:
If a random variable has zero variance, how often does it
differ from its expectation?
Consider the height Y of a randomly chosen adult male in
the U.S.
What is a reasonable number for $E(Y)$? Pick one: 4 feet,
5’9’’, 7 feet.
What is a reasonable number for $\mathrm{SD}(Y)$? Pick one: 1 inch,
4 inches, 3 feet.
III. Normal distribution
Continuous random variable: A continuous random
variable can take values with any number of decimals, like
1.2361248912. Weight measured perfectly, with all the
decimals and no rounding, is a continuous random variable.
Because it can take so many different values, each value
winds up having probability zero. If I ask you to guess
someone’s weight, not approximately to the nearest
millionth of a gram, but rather exactly to all the decimals,
there is no way you can guess correctly – each value with
all the decimals has probability zero. But for an interval,
say the nearest kilogram, there is a nonzero chance that you
can guess correctly. This idea is captured by the density
function.
Practical Note: We will often model a random variable as
continuous when it can take on many values, even if it can
take on only a finite number of values. For example, it is
reasonable to model a child’s birthweight in ounces as a
continuous random variable.
Density functions: A density function defines probability
for a continuous random variable. It attaches zero
probability to every number, but positive probability to
ranges (e.g., nearest kilogram). The probability that the
random variable Y takes values between 3.9 and 6.2 is the
area under the density function between 3.9 and 6.2. The
total area under the density function is 1.
Normal distribution: A random variable is said to have a
Normal distribution if it has the Normal density, which is
the familiar bell-shaped curve.
The standard Normal distribution has expected value 0 and
standard deviation 1. The probability that a random
variable with a standard Normal distribution takes on
values between -1 and 1 is about 2/3 and the probability
that a standard normal variable takes on values between -2
and 2 is about .95 (To be more precise, there is a 95%
chance that a standard Normal random variable will be
between -1.96 and 1.96).
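These standard Normal probabilities are areas under the density, and they can be computed from the standard Normal cumulative distribution function. The sketch below writes that function in terms of Python's `math.erf`; the helper names `standard_normal_cdf` and `prob_between` are illustrative.

```python
from math import erf, sqrt


def standard_normal_cdf(z):
    """P(Z <= z) for a standard Normal Z, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def prob_between(lo, hi):
    """P(lo < Z < hi): area under the standard Normal density between lo and hi."""
    return standard_normal_cdf(hi) - standard_normal_cdf(lo)


for lo, hi in [(-1, 1), (-2, 2), (-1.96, 1.96)]:
    print(f"P({lo} < Z < {hi}) = {prob_between(lo, hi):.4f}")
```

The three printed values are 0.6827, 0.9545 and 0.9500, in line with the "about 2/3" and "about .95" figures quoted above.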
If Z is a standard Normal random variable and $\mu$ and
$\sigma > 0$ are two numbers, then $Y = \mu + \sigma Z$ has the Normal
distribution with mean $\mu$ and standard deviation $\sigma$. The
density function for Y continues to be bell shaped.
For a Normal random variable Y with mean $\mu$ and standard
deviation $\sigma$, we can find the probability that Y is between
a and b using the standard Normal density:
$P(a \le Y \le b) = P(a \le \mu + \sigma Z \le b) = P\left(\frac{a - \mu}{\sigma} \le Z \le \frac{b - \mu}{\sigma}\right)$.
Check your understanding: A company that offers an
expensive stereo component is considering offering a
warranty on the component. Suppose the population of
lifetimes of the components is a normal distribution with a
mean of 84 months and a standard deviation of 7 months.
What is the probability that a randomly chosen component
will last between 86 and 90 months?
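The standardization step used for problems of this kind translates directly into code: convert a and b to standard units, then take the difference of two standard Normal cumulative probabilities. The sketch below is one way to write it in Python; the function name `normal_prob_between` is illustrative, and the numbers in the usage line are hypothetical values chosen so as not to give away the exercise.

```python
from math import erf, sqrt


def normal_prob_between(a, b, mu, sigma):
    """P(a <= Y <= b) for Y Normal with mean mu and standard deviation sigma > 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")

    def phi(z):
        # Standard Normal cumulative distribution function.
        return 0.5 * (1.0 + erf(z / sqrt(2.0)))

    z_lo = (a - mu) / sigma  # a in standard units
    z_hi = (b - mu) / sigma  # b in standard units
    return phi(z_hi) - phi(z_lo)


# Hypothetical values for illustration: mean 10, SD 2, interval from 8 to 12.
print(round(normal_prob_between(8, 12, mu=10, sigma=2), 4))  # 0.6827
```

The interval in the usage line runs from one standard deviation below the mean to one above, so the answer matches the "about 2/3" rule for a standard Normal variable.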
IV. Inference.
The population is typically described by certain parameters
about which we would like to make inferences based on a
sample.
Suppose a population is such that a randomly chosen
member of the population has a Normal distribution with
mean $\mu$ and standard deviation $\sigma$. The parameters $\mu$ and
$\sigma$ describe the population.
We obtain a random sample $Y_1, \ldots, Y_n$ from the population.
A random sample means that we draw individuals at random
from the population with equal probability.
We would like to make inferences about $\mu$ and $\sigma$:
Point Estimates: best estimates of $\mu$ and $\sigma$.
95% Confidence Intervals for $\mu$: an interval that is likely to
contain the true $\mu$; the interval will contain the true $\mu$ in 95% of
random samples.
Point estimates:
$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} Y_i$
$\hat{\sigma} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \hat{\mu})^2}$
Approximate 95% CI: $\hat{\mu} \pm 2 \frac{\hat{\sigma}}{\sqrt{n}}$
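For readers working outside JMP, the point estimates and the approximate 95% confidence interval can be computed by hand in a few lines. The Python sketch below assumes the sample is a plain list of numbers; the function name `estimate_mean_sd_ci` and the five data values are made up for illustration and are not the birthweight data.

```python
from math import sqrt


def estimate_mean_sd_ci(sample):
    """Return (mu_hat, sigma_hat, (lower, upper)) using the mean +/- 2 SD / sqrt(n) rule."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mu_hat = sum(sample) / n
    # Sample standard deviation with the n - 1 divisor.
    sigma_hat = sqrt(sum((y - mu_hat) ** 2 for y in sample) / (n - 1))
    half_width = 2 * sigma_hat / sqrt(n)
    return mu_hat, sigma_hat, (mu_hat - half_width, mu_hat + half_width)


# Made-up measurements for illustration only.
sample = [118.0, 122.0, 120.0, 116.0, 124.0]
mu_hat, sigma_hat, ci = estimate_mean_sd_ci(sample)
print(f"mu_hat = {mu_hat:.2f}, sigma_hat = {sigma_hat:.2f}")
print(f"approximate 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```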
JMP computes these estimates automatically.
Click Analyze, then Distribution. Put Y variable in Y,
Columns.
For birthweight data,
Distributions: Birthweight
[Histogram of Birthweight; axis ticks from 50 to 170]
Quantiles
100.0%  maximum   176.00
 99.5%            169.82
 97.5%            155.08
 90.0%            142.30
 75.0%  quartile  131.00
 50.0%  median    120.00
 25.0%  quartile  108.25
 10.0%             97.00
  2.5%             81.00
  0.5%             65.56
  0.0%  minimum    55.00
Moments
Mean             119.57686
Std Dev          18.236452
Std Err Mean     0.5187177
upper 95% Mean   120.59453
lower 95% Mean   118.5592
N                1236
$\hat{\mu} = 119.58$, $\hat{\sigma} = 18.24$
95% CI: (118.56, 120.59)
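The Std Err Mean in the JMP output is the sample standard deviation divided by the square root of N, and the approximate 95% CI adds and subtracts two standard errors from the mean. The sketch below redoes that arithmetic in Python from the reported Mean, Std Dev and N; the variable names are illustrative.

```python
from math import sqrt

# Values copied from the JMP Moments output for the birthweight data.
mean = 119.57686
std_dev = 18.236452
n = 1236
jmp_upper = 120.59453

std_err = std_dev / sqrt(n)   # standard error of the mean
lower = mean - 2 * std_err    # approximate 95% CI, mean +/- 2 standard errors
upper = mean + 2 * std_err

# Multiplier implied by JMP's reported upper limit.
implied_multiplier = (jmp_upper - mean) / std_err

print(f"Std Err Mean = {std_err:.4f}")
print(f"Approximate 95% CI = ({lower:.2f}, {upper:.2f})")
print(f"Implied multiplier = {implied_multiplier:.3f}")
```

The plus-or-minus 2 rule gives limits slightly wider than the (118.56, 120.59) reported by JMP. The multiplier implied by JMP's output is about 1.96 rather than 2, which accounts for the small difference.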
Check your understanding:
Based on these estimates, what is the approximate
probability that a baby will have birthweight less than 100
ounces?
Preview of the rest of the course: While understanding the
distribution of birthweights is of some interest, a more
interesting question is how birthweights vary with
certain characteristics, such as whether the mother smokes.
This course will focus on making inferences about the
mean of a response variable Y (e.g., birthweight) for the
subpopulation of individuals with covariates $X_1, \ldots, X_p$
(e.g., whether the mother smokes, gestation length) and making
inferences about how the mean of Y changes as
$X_1, \ldots, X_p$ change.