STAT 311 REVIEW (Quick & Dirty)
• Chapter 1 - Overview and Descriptive Statistics
• Chapter 2 - Probability
• Chapter 3 - Discrete Random Variables and Probability Distributions
• Chapter 4 - Continuous Random Variables and Probability Distributions
• Chapter 5 - Joint Probability Distributions and Random Samples
• Chapter 6 - Point Estimation
Random Variable X: assigns a number to each population unit
• Numerical: Continuous or Discrete
• Categorical: k = 2 categories (Binary, 0/1) or k > 2 categories

Population Distribution of X = Right foot length (mm)
[Figure: density curve f(x)]
f(x) = probability density function (pdf)
Properties?
• $f(x) \ge 0$
• $\int_{-\infty}^{\infty} f(x)\,dx = 1$
The probability density curve pictured above is “skewed to the right”
(or “positively skewed”). But many other possibilities exist, such as:
• skewed to the left (i.e., negatively skewed)
• symmetric (no skew)
• unimodal (one peak)
• bimodal (two peaks)
• uniform (i.e., flat) over some finite interval
• normal (i.e., the “bell curve”)

Parameters of the population distribution (continuous X, pdf f(x))
• mean: $\mu = E[X] = \int_{-\infty}^{\infty} x\,f(x)\,dx$
• variance: $\sigma^2 = E[(X-\mu)^2] = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx$
  $\phantom{\sigma^2} = E[X^2] - \mu^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2$
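
As a rough numerical check of the pdf properties and the two variance formulas, the sketch below uses a made-up density, f(x) = 2x on [0, 1]. This density is not from the slides, and the helper `integrate` is a hypothetical midpoint-rule routine written just for this illustration.

```python
# Hypothetical density, chosen only for illustration: f(x) = 2x on [0, 1].
def f(x):
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def integrate(g, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * width) for i in range(steps)) * width


total = integrate(f, 0.0, 1.0)                                     # pdf property: area = 1
mu = integrate(lambda x: x * f(x), 0.0, 1.0)                       # mean E[X], exact value 2/3
var_def = integrate(lambda x: (x - mu) ** 2 * f(x), 0.0, 1.0)      # E[(X - mu)^2]
var_short = integrate(lambda x: x * x * f(x), 0.0, 1.0) - mu ** 2  # E[X^2] - mu^2

print(total, mu, var_def, var_short)
```

Both variance formulas should agree (the exact value for this density is 1/18), which is the point of the shortcut form.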

cumulative distribution function (cdf) of a continuous X
$F(x) = P(X \le x)$
Probability of an interval = area under the density curve between a and b:
$P(a \le X \le b) = \int_a^b f(x)\,dx = F(b) - F(a)$
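
The identity P(a ≤ X ≤ b) = F(b) − F(a) can be checked numerically. The sketch below assumes a hypothetical density f(x) = 2x on [0, 1], whose cdf is F(x) = x² on that interval; neither is taken from the slides.

```python
# Hypothetical density and its cdf, for illustration only.
def f(x):
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def F(x):
    """cdf F(x) = P(X <= x); equals x^2 on [0, 1] for this density."""
    if x < 0.0:
        return 0.0
    if x > 1.0:
        return 1.0
    return x * x


a, b = 0.25, 0.75

# Route 1: difference of cdf values.
p_from_cdf = F(b) - F(a)

# Route 2: area under the pdf between a and b (midpoint rule).
steps = 10_000
width = (b - a) / steps
p_from_pdf = sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

print(p_from_cdf, p_from_pdf)
```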

Discrete Random Variable X
Population Distribution of X = Shoe size (…, 9, 9½, 10, 10½, 11, 11½, …)

x_i      p(x_i)
x_1      p(x_1)
x_2      p(x_2)
x_3      p(x_3)
⋮        ⋮
Total    1

p(x) = probability mass function (pmf)
Properties?
• $p(x) \ge 0$
• $\sum_{x} p(x) = 1$
Parameters
• mean: $\mu = E[X] = \sum_{x} x\,p(x)$
• variance: $\sigma^2 = E[(X-\mu)^2] = \sum_{x} (x-\mu)^2 p(x)$
  $\phantom{\sigma^2} = E[X^2] - \mu^2 = \sum_{x} x^2 p(x) - \mu^2$
cumulative distribution function (cdf)
$F(x) = P(X \le x)$
$P(a \le X \le b) = \sum_{x=a}^{b} p(x) = F(b) - F(a^-)$
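
The discrete formulas can be traced with exact fractions. The pmf values below are invented for illustration (the slides give no actual shoe-size probabilities), and `F_left` is a hypothetical helper standing in for F(a⁻) = P(X < a).

```python
from fractions import Fraction

# Hypothetical pmf over a few shoe sizes; the probabilities are made up.
pmf = {
    Fraction(9): Fraction(1, 10),
    Fraction(19, 2): Fraction(2, 10),   # size 9 1/2
    Fraction(10): Fraction(4, 10),
    Fraction(21, 2): Fraction(2, 10),   # size 10 1/2
    Fraction(11): Fraction(1, 10),
}
total = sum(pmf.values())                                   # pmf property: sums to 1

mu = sum(x * p for x, p in pmf.items())                     # mean E[X]
var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())    # E[(X - mu)^2]
var_short = sum(x * x * p for x, p in pmf.items()) - mu ** 2  # E[X^2] - mu^2


def F(t):
    """cdf: P(X <= t)."""
    return sum(p for x, p in pmf.items() if x <= t)


def F_left(t):
    """Left limit of the cdf: P(X < t), written F(t^-) on the slide."""
    return sum(p for x, p in pmf.items() if x < t)


a, b = Fraction(19, 2), Fraction(21, 2)
p_direct = sum(p for x, p in pmf.items() if a <= x <= b)    # add up p(x) from a to b
p_cdf = F(b) - F_left(a)                                    # same value via the cdf
p_wrong = F(b) - F(a)                                       # drops the mass at x = a

print(mu, var_def, p_direct, p_cdf, p_wrong)
```

Using F(a) instead of F(a⁻) loses the probability sitting exactly at a, which is why the discrete formula differs from the continuous one.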

Parameter Estimation
Population Distribution of X ~ Dist($\mu$, $\sigma$), with pdf f(x) (continuous) or pmf p(x) (discrete)
Random sample of size n: $X_1, \ldots, X_n$
How do we obtain a random sample-based estimator $\hat{\mu}$ of the population mean $\mu$?
  $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
How do we obtain a random sample-based estimator $\hat{\sigma}^2$ of the population variance $\sigma^2$?
  $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
Moreover, $E[\bar{X}] = \mu$ and $E[S^2] = \sigma^2$.
• $\bar{X}$ is an unbiased estimator of $\mu$.
• $S^2$ is an unbiased estimator of $\sigma^2$.
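
A small simulation can make the unbiasedness claims concrete. This is a sketch under assumed values: the normal population with μ = 10 and σ = 2, the sample size, the number of repetitions, and the seed are all arbitrary choices, not from the slides.

```python
import random
import statistics

random.seed(311)            # arbitrary seed so the run is repeatable

MU, SIGMA = 10.0, 2.0       # hypothetical population mean and standard deviation
n, reps = 5, 20_000         # small samples, many repetitions

xbars, s2_values = [], []
for _ in range(reps):
    sample = [random.gauss(MU, SIGMA) for _ in range(n)]
    xbars.append(statistics.fmean(sample))         # sample mean, divisor n
    s2_values.append(statistics.variance(sample))  # sample variance, divisor n - 1

avg_xbar = statistics.fmean(xbars)     # should land near MU
avg_s2 = statistics.fmean(s2_values)   # should land near SIGMA ** 2

print(avg_xbar, avg_s2)
```

Averaged over many samples, the estimates settle near μ = 10 and σ² = 4, which is what "unbiased" means here.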

In general…
Population Distribution of X ~ Dist($\theta$)
Random sample of size n: $X_1, \ldots, X_n$
How do we obtain a random sample-based estimator $\hat{\theta}$ of a population parameter $\theta$?
$\hat{\theta} = \hat{\theta}(X_1, X_2, \ldots, X_n)$
Method of Moments, MLE,… (Stat 311)
Properties (e.g., bias)? Improvement?
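
One standard illustration of "bias" and "improvement" is the divisor-n variance estimator, which is what the method of moments produces (and the MLE under a normal model). The derivation below is a sketch that takes $E[S^2] = \sigma^2$ as given; the symbol $\hat{\sigma}^2$ for the divisor-n estimator is a label introduced here.

```latex
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2 = \frac{n-1}{n}\,S^2
\quad\Longrightarrow\quad
E[\hat{\sigma}^2] = \frac{n-1}{n}\,\sigma^2,
\qquad
\text{bias} = E[\hat{\sigma}^2] - \sigma^2 = -\frac{\sigma^2}{n}.
```

Multiplying $\hat{\sigma}^2$ by $n/(n-1)$ removes the bias and gives back $S^2$.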

Repeated random samples from the population distribution of X:
• Sample 1, size n: $\bar{X}_1$
• Sample 2, size n: $\bar{X}_2$
• Sample 3, size n: $\bar{X}_3$
• Sample 4, size n: $\bar{X}_4$
• … etc…
How are these random $\bar{X}$ values distributed?

Sampling Distribution of $\bar{X}$
IMPORTANT FACT! As long as $\mu$ and $\sigma$ exist,
$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$ approximately, for "large" values of n (> 30).
$\frac{\sigma}{\sqrt{n}}$ = “standard error” of the mean (SEM)
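
The sketch below simulates this fact for a deliberately skewed population. The exponential population with rate 1 (so μ = 1 and σ = 1), the sample size n = 40, the number of repetitions, and the seed are all assumptions made for illustration.

```python
import math
import random
import statistics

random.seed(311)             # arbitrary seed so the run is repeatable

# Hypothetical skewed population: exponential with rate 1, so mu = 1 and sigma = 1.
MU, SIGMA = 1.0, 1.0
n, reps = 40, 10_000         # "large" n in the sense of the slide (n > 30)

xbars = []
for _ in range(reps):
    sample = [random.expovariate(1.0) for _ in range(n)]
    xbars.append(statistics.fmean(sample))

sem = SIGMA / math.sqrt(n)              # theoretical standard error of the mean
center = statistics.fmean(xbars)        # should be close to MU
spread = statistics.stdev(xbars)        # should be close to sem

# Share of sample means within 1.96 SEM of mu; roughly 0.95 if the
# normal approximation is reasonable.
share = sum(abs(x - MU) <= 1.96 * sem for x in xbars) / reps

print(center, spread, sem, share)
```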

Sampling Distribution of $\bar{X}$ when the population is normal
Suppose X follows a normal distribution, $X \sim N(\mu, \sigma)$.
Then $\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$ exactly, for ALL values of n.
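
Normal Distribution $N(\mu, \sigma)$
In general….
What symmetric interval about the mean contains $100(1-\alpha)\%$ of the population values?
$\mu - z_{\alpha/2}\,\sigma \;\le\; X \;\le\; \mu + z_{\alpha/2}\,\sigma$
[Figure: normal curve with central area $1-\alpha$ and area $\alpha/2$ in each tail]
$\pm z_{\alpha/2}$ = “$\alpha/2$ critical values”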
Example: $\alpha = .05$
[Figure: normal curve with central area .95 and area .025 in each tail]
$\pm z_{.025} = \pm 1.96$ = “.025 critical values”
Interval: $\mu - 1.96\,\sigma$ to $\mu + 1.96\,\sigma$
“Approximately 95% of any normally-distributed population lies within 2 standard deviations of the mean.”
Use the included table or R (the arguments .025 and .975 are cumulative areas):
> qnorm(c(.025, .975))
[1] -1.959964 1.959964
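
For readers without R, the same critical values can be obtained in Python with the standard library's `statistics.NormalDist`. The population N(μ = 100, σ = 15) below is a hypothetical example, not from the slides.

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()                         # standard normal, N(0, 1)

# Counterpart of qnorm(c(.025, .975)): inverse cdf at the cumulative areas.
lower = z.inv_cdf(alpha / 2)
upper = z.inv_cdf(1 - alpha / 2)

# Hypothetical normal population, N(mu = 100, sigma = 15).
mu, sigma = 100.0, 15.0
pop = NormalDist(mu, sigma)

# Symmetric interval mu -/+ z_{alpha/2} * sigma and the area it contains.
interval = (mu + lower * sigma, mu + upper * sigma)
coverage = pop.cdf(interval[1]) - pop.cdf(interval[0])

# The "within 2 standard deviations" rule of thumb, for comparison.
within_2sd = pop.cdf(mu + 2 * sigma) - pop.cdf(mu - 2 * sigma)

print(lower, upper, interval, coverage, within_2sd)
```

The exact multiplier is about 1.96, so the "2 standard deviations" rule covers slightly more than 95% of the population.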

To summarize…
Population Distribution of X ~ Dist($\mu$, $\sigma$), continuous (pdf f(x)) or discrete (pmf p(x))
Suppose we wish to estimate the mean $\mu$ from a particular random sample $x_1, x_2, \ldots, x_n$ of size n:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Sampling Distribution of $\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$, approximately
We can now use known properties of the “bell curve” to improve our estimate.
END REVIEW