Chapter 2 Descriptive Statistics
Sample mean: \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i

Sample variance: s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} = \frac{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \cdots + (x_n-\bar{x})^2}{n-1}

Sample standard deviation: s = \sqrt{s^2}

Calculating the sample variance (computational formula for s^2):
s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]

Empirical Rule: For a normally distributed population, this rule tells us that 68.26 percent, 95.44 percent, and 99.73 percent of the population measurements are within one, two, and three standard deviations, respectively, of the population mean.

Chebyshev's theorem: A theorem that (for any population) allows us to find an interval that contains a specified percentage of the individual measurements in the population.

z score: z = \frac{x - \text{mean}}{\text{standard deviation}}

Coefficient of variation: \text{Coefficient of variation} = \frac{\text{Standard deviation}}{\text{Mean}} \times 100

pth percentile: For a set of measurements arranged in increasing order, a value such that p percent of the measurements fall at or below the value, and (100 − p) percent of the measurements fall at or above the value.

Weighted mean: \bar{x} = \frac{\sum w_i x_i}{\sum w_i}

Sample mean for grouped data: \bar{x} = \frac{\sum f_i M_i}{n}

Sample variance for grouped data: s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n-1}

Population mean for grouped data: \mu = \frac{\sum f_i M_i}{N}

Population variance for grouped data: \sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}
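As a quick check on the two equivalent sample-variance formulas above, here is a minimal Python sketch (the function names are my own, chosen for illustration):

```python
def sample_variance_definitional(xs):
    """s^2 = sum((x_i - xbar)^2) / (n - 1), the definitional form."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_variance_computational(xs):
    """Computational shortcut: (sum(x^2) - (sum x)^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

data = [10, 12, 14, 16, 18]
s2_def = sample_variance_definitional(data)
s2_comp = sample_variance_computational(data)
print(s2_def, s2_comp)  # both give 10.0 for this data
```

Both forms agree term for term; the computational form only avoids computing every deviation from the mean.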
Chapter 3 Probability
Computing the probability of an event:
P(\text{event}) = \frac{\text{Number of sample space outcomes that correspond to the event}}{\text{Total number of sample space outcomes}}

The rule of complements: P(\bar{A}) = 1 - P(A)

The addition rule: P(A \cup B) = P(A) + P(B) - P(A \cap B)

Mutually exclusive events: P(A \cap B) = 0

The addition rule for two mutually exclusive events: P(A \cup B) = P(A) + P(B)

The addition rule for N mutually exclusive events:
P(A_1 \cup A_2 \cup \cdots \cup A_N) = P(A_1) + P(A_2) + \cdots + P(A_N)

Conditional probability: P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B \mid A) = \frac{P(A \cap B)}{P(A)}

The general multiplication rule: P(A \cap B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)

Independent events: P(A \mid B) = P(A) \text{ and } P(B \mid A) = P(B)

The multiplication rule for two independent events: P(A \cap B) = P(A)\,P(B)

The multiplication rule for N independent events:
P(A_1 \cap A_2 \cap \cdots \cap A_N) = P(A_1)\,P(A_2)\cdots P(A_N)

Bayes' theorem: P(S_i \mid E) = \frac{P(S_i \cap E)}{P(E)} = \frac{P(S_i)\,P(E \mid S_i)}{P(E)}
Chapter 4 Discrete Random Variables
Properties of a discrete probability distribution P(x): P(x) \ge 0 \text{ and } \sum_{\text{all } x} P(x) = 1

The mean, or expected value, of a discrete random variable: \mu_x = \sum_{\text{all } x} x \, P(x)

The variance and standard deviation of a discrete random variable:
\sigma_x^2 = \sum_{\text{all } x} (x - \mu_x)^2 \, P(x), \quad \sigma_x = \sqrt{\sigma_x^2}

The binomial distribution: P(X = x) = \frac{n!}{x!\,(n-x)!}\, p^x q^{n-x}

The mean, variance, and standard deviation of a binomial random variable:
\mu_x = np, \quad \sigma_x^2 = npq, \quad \sigma_x = \sqrt{npq}

The Poisson distribution: P(x) = \frac{e^{-\mu}\mu^x}{x!}

The mean, variance, and standard deviation of a Poisson random variable:
\mu_x = \mu, \quad \sigma_x^2 = \mu, \quad \sigma_x = \sqrt{\mu}

The hypergeometric distribution: P(x) = \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}

The mean and variance of a hypergeometric random variable:
\mu_x = n\left(\frac{r}{N}\right), \quad \sigma_x^2 = n\left(\frac{r}{N}\right)\left(1 - \frac{r}{N}\right)\left(\frac{N-n}{N-1}\right)
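The binomial and Poisson formulas above are easy to verify numerically: summing x P(x) and (x − μ)² P(x) over all x should reproduce np and npq. A short sketch with made-up parameters (n = 10, p = 0.3):

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, mu):
    """P(x) = e^(-mu) mu^x / x!."""
    return exp(-mu) * mu**x / factorial(x)

n, p = 10, 0.3
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
var = sum((x - mean)**2 * binomial_pmf(x, n, p) for x in range(n + 1))
print(mean, var)  # mean = np = 3.0, var = npq = 2.1 (up to rounding)
```

The same check works for the Poisson pmf, whose mean and variance both equal μ.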
Chapter 5 Continuous Random Variables
Properties of a continuous probability distribution: f(x) \ge 0, and the total area under the curve f(x) equals 1.

The uniform distribution:
f(x) = \begin{cases} \dfrac{1}{d-c} & \text{for } c \le x \le d \\ 0 & \text{otherwise} \end{cases}
\quad \text{with } \mu_x = \frac{c+d}{2} \text{ and } \sigma_x = \frac{d-c}{\sqrt{12}}

The normal probability distribution: f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}

z values: z = \frac{x - \mu}{\sigma}

The standard normal distribution: the normal distribution with \mu = 0 and \sigma = 1.

Normal approximation to the binomial distribution: Consider a binomial random variable x where n is the number of trials and p is the probability of success. If np \ge 5 and n(1 - p) \ge 5, then x is approximately normal with mean \mu = np and standard deviation \sigma = \sqrt{npq}. To standardize (with the continuity correction), use z = \frac{(x - 0.5) - \mu}{\sigma} or z = \frac{(x + 0.5) - \mu}{\sigma}.

The exponential distribution:
f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{for } x \ge 0 \\ 0 & \text{otherwise} \end{cases}
\quad \text{and } P(x \le a) = 1 - e^{-\lambda a}

Mean and standard deviation of an exponential distribution: \mu_x = \frac{1}{\lambda} \text{ and } \sigma_x = \frac{1}{\lambda}
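The normal approximation with continuity correction can be sketched using only the standard library (the example numbers, n = 100 and p = 0.5, are my own):

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Approximate P(x <= 55) for a binomial with n = 100, p = 0.5.
n, p = 100, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))
z = (55 + 0.5 - mu) / sigma  # continuity correction: use x + 0.5 for P(x <= 55)
approx = normal_cdf(z)
print(approx)  # roughly 0.864
```

Here np = n(1 − p) = 50 ≥ 5, so the approximation condition stated above is comfortably met.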
Chapter 6 Sampling Distributions
Sampling distribution of the sample mean: If x has mean \mu and standard deviation \sigma, then \bar{x} has mean \mu_{\bar{x}} = \mu and standard deviation \sigma_{\bar{x}} = \sigma/\sqrt{n}. In addition, if x follows a normal distribution, then \bar{x} also follows a normal distribution.

Standard deviation of the sampling distribution of the sample mean: \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}

Central limit theorem: If the sample size n is sufficiently large (at least 30), then \bar{x} will follow an approximately normal distribution with mean \mu_{\bar{x}} = \mu and standard deviation \sigma_{\bar{x}} = \sigma/\sqrt{n}.

Sampling distribution of the sample proportion: If np \ge 5 and n(1 - p) \ge 5, then \hat{p} is approximately normal with mean \mu_{\hat{p}} = p and standard deviation \sigma_{\hat{p}} = \sqrt{p(1-p)/n}.

Standard deviation of the sampling distribution of the sample proportion: \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}
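The two standard-error formulas above are one-liners; a small sketch showing how each shrinks as n grows (the numbers are arbitrary):

```python
from math import sqrt

def stderr_mean(sigma, n):
    """sigma_xbar = sigma / sqrt(n)."""
    return sigma / sqrt(n)

def stderr_prop(p, n):
    """sigma_phat = sqrt(p(1 - p) / n)."""
    return sqrt(p * (1 - p) / n)

print(stderr_mean(10, 25))    # 10 / 5 = 2.0
print(stderr_prop(0.5, 100))  # sqrt(0.25 / 100) = 0.05
```

Quadrupling the sample size halves either standard error, which is the practical content of the sqrt(n) in the denominator.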
Chapter 7 Hypothesis Testing
Hypothesis testing steps: 1. State the null and alternative hypotheses. 2. Specify the level of significance. 3. Select the test statistic. 4. Find the critical value (or compute the p-value). 5. Compare the value of the test statistic to the critical value (or the p-value to the level of significance) and decide whether to reject H0.

Hypothesis test about a population mean (σ known): z_0 = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}

Large-sample hypothesis test about a population proportion: z_0 = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}

Sampling distribution of \bar{x}_1 - \bar{x}_2 (independent random samples): \bar{x}_1 - \bar{x}_2 has mean \mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 and standard deviation \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}

Hypothesis test about a difference in population means (σ1 and σ2 known): z_0 = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}

Large-sample hypothesis test about a difference in population proportions where p1 = p2:
z_0 = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \quad \hat{p} = \frac{\text{total number of successes in both samples}}{\text{total number of trials in both samples}}

Large-sample hypothesis test about a difference in population proportions where p1 ≠ p2:
z_0 = \frac{(\hat{p}_1 - \hat{p}_2) - D_0}{\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}

Calculating the probability of a Type II error: z^* = \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}}

Sample-size determination to achieve specified values of α and β: n = \frac{(z^* + z_\beta)^2\,\sigma^2}{(\mu_0 - \mu_a)^2}
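The five testing steps above can be sketched for the σ-known mean test; the sample values are invented for illustration:

```python
from math import erf, sqrt

def z_test_mean(xbar, mu0, sigma, n):
    """z0 = (xbar - mu0) / (sigma / sqrt(n)); returns (z0, two-sided p-value)."""
    z0 = (xbar - mu0) / (sigma / sqrt(n))
    phi = 0.5 * (1 + erf(abs(z0) / sqrt(2)))  # standard normal CDF at |z0|
    return z0, 2 * (1 - phi)

# Hypothetical: test H0: mu = 50 vs Ha: mu != 50 with xbar = 52, sigma = 8, n = 64.
z0, pval = z_test_mean(xbar=52, mu0=50, sigma=8, n=64)
print(z0, pval)  # z0 = 2.0; p-value about 0.046, so reject H0 at alpha = 0.05
```

Comparing the p-value to α (step 5) gives the same decision as comparing z0 to the critical value.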
Chapter 8 Comparing Population Means and Variances Using t Tests and F Ratios
t test about μ: t_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}

t test about μ1 − μ2 when σ1² = σ2²:
t_0 = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \quad s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}, \quad df = n_1 + n_2 - 2

t test about μ1 − μ2 when σ1² ≠ σ2²:
t_0 = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, \quad df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}

Hypothesis test about μd: t_0 = \frac{\bar{d} - D_0}{s_d/\sqrt{n}}

Sampling distribution of s1²/s2² (independent random samples): If \sigma_1^2 = \sigma_2^2, then \frac{s_1^2}{s_2^2} has an F distribution with df1 = n1 − 1 and df2 = n2 − 1.
Hypothesis test about the equality of σ1² and σ2²: For H_a: \sigma_1^2 > \sigma_2^2, F = \frac{s_1^2}{s_2^2} and H_0 is rejected if F exceeds the critical value F_\alpha. For H_a: \sigma_1^2 \ne \sigma_2^2, F = \frac{s_1^2}{s_2^2} and H_0 is rejected if F exceeds the critical value F_{\alpha/2}.
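The pooled two-sample t statistic above can be computed directly from raw data; the two small samples below are made up for illustration:

```python
from math import sqrt

def pooled_t(x1, x2, d0=0.0):
    """t0 = ((xbar1 - xbar2) - D0) / sqrt(sp^2 (1/n1 + 1/n2)); returns (t0, df)."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    s1sq = sum((v - m1) ** 2 for v in x1) / (n1 - 1)
    s2sq = sum((v - m2) ** 2 for v in x2) / (n2 - 1)
    # Pooled variance weights each sample variance by its degrees of freedom.
    sp2 = ((n1 - 1) * s1sq + (n2 - 1) * s2sq) / (n1 + n2 - 2)
    t0 = (m1 - m2 - d0) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t0, n1 + n2 - 2

t0, df = pooled_t([5, 7, 9], [4, 6, 8])
print(t0, df)  # df = n1 + n2 - 2 = 4
```

The statistic would then be compared with a t critical value on df = n1 + n2 − 2 degrees of freedom.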
Chapter 9 Confidence Intervals
z-based confidence interval for a population mean μ with σ known: \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}

t-based confidence interval for a population mean μ with σ unknown: \bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}, \quad df = n - 1

Sample size when estimating μ: n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2

Large-sample confidence interval for a population proportion p: \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}

Sample size when estimating p: n = p(1-p)\left(\frac{z_{\alpha/2}}{E}\right)^2

t-based confidence interval for μ1 − μ2 when σ1² = σ2²:
(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \quad s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}, \quad df = n_1 + n_2 - 2

t-based confidence interval for μ1 − μ2 when σ1² ≠ σ2²:
(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,df}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \quad df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}

Large-sample confidence interval for a difference in population proportions:
(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}
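The large-sample interval for a proportion is a one-line calculation; a minimal sketch with invented counts (40 successes in 100 trials) and the usual z = 1.96 for 95 percent confidence:

```python
from math import sqrt

def prop_ci(successes, n, z=1.96):
    """Large-sample CI: phat +/- z * sqrt(phat (1 - phat) / n)."""
    phat = successes / n
    half = z * sqrt(phat * (1 - phat) / n)
    return phat - half, phat + half

lo, hi = prop_ci(40, 100)
print(lo, hi)  # interval centered at phat = 0.4
```

The interval is symmetric about p̂, so its midpoint recovers the point estimate exactly.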
Chapter 10 Experimental Design and Analysis of Variance
One-way ANOVA sums of squares:
SSB = \sum_{i=1}^{p} n_i(\bar{x}_i - \bar{x})^2, \quad SSE = \sum_{j=1}^{n_1}(x_{1j} - \bar{x}_1)^2 + \sum_{j=1}^{n_2}(x_{2j} - \bar{x}_2)^2 + \cdots + \sum_{j=1}^{n_p}(x_{pj} - \bar{x}_p)^2

The sum of squares total (SST) is SST = SSB + SSE

The between-groups mean square (MSB) is MSB = \frac{SSB}{p-1}

The mean square error (MSE) is MSE = \frac{SSE}{n-p}

One-way ANOVA F test: F = \frac{MSB}{MSE} = \frac{SSB/(p-1)}{SSE/(n-p)}

Estimation in one-way ANOVA: individual 100(1 − α)% confidence interval for μi − μh:
(\bar{x}_i - \bar{x}_h) \pm t_{\alpha/2}\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_h}\right)}, \quad df = n - p

Estimation in one-way ANOVA: Tukey simultaneous 100(1 − α)% confidence interval for μi − μh:
(\bar{x}_i - \bar{x}_h) \pm q\sqrt{\frac{MSE}{2}\left(\frac{1}{n_i} + \frac{1}{n_h}\right)}, with q corresponding to p and n − p.

Estimation in one-way ANOVA: individual 100(1 − α)% confidence interval for μi:
\bar{x}_i \pm t_{\alpha/2}\sqrt{\frac{MSE}{n_i}}, \quad df = n - p

Randomized block sums of squares:
SSB = b\sum_{i=1}^{p}(\bar{x}_i - \bar{x})^2, \quad SSBL = p\sum_{j=1}^{b}(\bar{x}_j - \bar{x})^2, \quad SST = \sum_{i=1}^{p}\sum_{j=1}^{b}(x_{ij} - \bar{x})^2, \quad SSE = SST - SSB - SSBL

Estimation in a randomized block experiment: individual 100(1 − α)% confidence interval for μi − μh:
(\bar{x}_i - \bar{x}_h) \pm t_{\alpha/2}\, s\sqrt{\frac{2}{b}}, \quad df = (p-1)(b-1), \quad s = \sqrt{MSE}

Estimation in a randomized block experiment: Tukey simultaneous 100(1 − α)% confidence interval for μi − μh:
(\bar{x}_i - \bar{x}_h) \pm q\,\frac{s}{\sqrt{b}}, with q corresponding to p and (p − 1)(b − 1).

Two-way ANOVA sums of squares:
SS(1) = bm\sum_{i=1}^{a}(\bar{x}_i - \bar{x})^2, \quad SS(2) = am\sum_{j=1}^{b}(\bar{x}_j - \bar{x})^2, \quad SS(\text{int}) = m\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{x}_{ij} - \bar{x}_i - \bar{x}_j + \bar{x})^2,
SST = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{m}(x_{ij,k} - \bar{x})^2, \quad SSE = SST - SS(1) - SS(2) - SS(\text{int})

Estimation in two-way ANOVA: individual 100(1 − α)% confidence interval for μi − μi′:
(\bar{x}_i - \bar{x}_{i'}) \pm t_{\alpha/2}\sqrt{MSE\left(\frac{2}{bm}\right)}, \quad df = ab(m-1)

Estimation in two-way ANOVA: Tukey simultaneous 100(1 − α)% confidence interval for factor 1, μi − μi′:
(\bar{x}_i - \bar{x}_{i'}) \pm q\sqrt{MSE\left(\frac{1}{bm}\right)}, with q corresponding to a and ab(m − 1).

Estimation in two-way ANOVA: Tukey simultaneous 100(1 − α)% confidence interval for factor 2, μj − μj′:
(\bar{x}_j - \bar{x}_{j'}) \pm q\sqrt{MSE\left(\frac{1}{am}\right)}, with q corresponding to b and ab(m − 1).

Estimation in two-way ANOVA: individual 100(1 − α)% confidence interval for μij:
\bar{x}_{ij} \pm t_{\alpha/2}\sqrt{\frac{MSE}{m}}, \quad df = ab(m-1)
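The one-way ANOVA sums of squares and F statistic can be checked by hand on a tiny made-up data set of p = 3 groups:

```python
def one_way_anova(groups):
    """Returns (SSB, SSE, F) following the one-way ANOVA formulas above."""
    n = sum(len(g) for g in groups)          # total observations
    p = len(groups)                          # number of groups
    grand = sum(sum(g) for g in groups) / n  # grand mean
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    f = (ssb / (p - 1)) / (sse / (n - p))    # MSB / MSE
    return ssb, sse, f

ssb, sse, f = one_way_anova([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(ssb, sse, f)  # SSB = 42, SSE = 6, F = 21
```

Here SST = SSB + SSE = 48, confirming the partition of the total sum of squares.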
Chapter 11 Correlation Coefficient and Simple Linear Regression Analysis
Least squares point estimates of β0 and β1:
b_1 = \frac{SS_{xy}}{SS_{xx}}, where
SS_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}
and SS_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}
b_0 = \bar{y} - b_1\bar{x}

The predicted value of yi: \hat{y}_i = b_0 + b_1 x_i

Point estimate of a mean value of y at x = x0: \hat{y} = b_0 + b_1 x_0

Point prediction of an individual value of y at x = x0: \hat{y} = b_0 + b_1 x_0
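The least squares formulas above translate directly into code; the toy data below lie exactly on y = 1 + 2x, so the estimates are easy to verify by eye:

```python
def least_squares(xs, ys):
    """Returns (b0, b1) from SSxy / SSxx and b0 = ybar - b1 * xbar."""
    n = len(xs)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    ss_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    b1 = ss_xy / ss_xx
    b0 = sum(ys) / n - b1 * sum(xs) / n
    return b0, b1

b0, b1 = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # b0 = 1.0, b1 = 2.0
```

With the estimates in hand, both the mean-value estimate and the individual prediction at x0 are b0 + b1 x0; they differ only in the width of the interval placed around them.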
Chapter 12 Multiple Regression
Chapter 13 Nonparametric Methods
Sign test for a population median: If H_a: M_d < M_0, then S = the number of sample measurements less than M_0. If H_a: M_d > M_0, then S = the number of sample measurements greater than M_0.

Large-sample sign test: z = \frac{(S - 0.5) - 0.5n}{0.5\sqrt{n}}

Wilcoxon rank sum test (T = the sum of the ranks of the observations in sample 1, where n_1 \le n_2):
If H_a: D_1 is shifted to the right of D_2, then reject H_0 if T \ge T_U.
If H_a: D_1 is shifted to the left of D_2, then reject H_0 if T \le T_L.
If H_a: D_1 is shifted to the right or left of D_2, then reject H_0 if T \ge T_U or T \le T_L.

Wilcoxon rank sum test (large-sample approximation):
z = \frac{T - \mu_T}{\sigma_T}, \quad \mu_T = \frac{n_1(n_1 + n_2 + 1)}{2}, \quad \sigma_T = \sqrt{\frac{n_1 n_2(n_1 + n_2 + 1)}{12}}

Wilcoxon signed ranks test:
T^- = sum of the ranks associated with the negative paired differences
T^+ = sum of the ranks associated with the positive paired differences
If H_a: D_1 is shifted to the right of D_2, then reject H_0 if T = T^- \le T_0.
If H_a: D_1 is shifted to the left of D_2, then reject H_0 if T = T^+ \le T_0.
If H_a: D_1 is shifted to the right or left of D_2, then reject H_0 if T = the smaller of T^+ and T^- is \le T_0.

Wilcoxon signed ranks test (large-sample approximation):
z = \frac{T - \mu_T}{\sigma_T}, \quad \mu_T = \frac{n(n+1)}{4}, \quad \sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}}

Kruskal-Wallis H statistic: H = \frac{12}{n(n+1)}\sum_{i=1}^{p}\frac{T_i^2}{n_i} - 3(n+1), where T_i is the sum of the ranks in sample i.

Spearman's rank correlation coefficient: r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}

Spearman's rank correlation test: t = \frac{r_s\sqrt{n-2}}{\sqrt{1 - r_s^2}}
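Spearman's coefficient above is simple to compute when the ranks are untied; the two extreme cases below (perfectly agreeing and perfectly reversed rankings) make good sanity checks:

```python
def spearman_rs(x_ranks, y_ranks):
    """r_s = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)), valid for untied ranks."""
    n = len(x_ranks)
    d2 = sum((a - b) ** 2 for a, b in zip(x_ranks, y_ranks))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

print(spearman_rs([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 1.0
print(spearman_rs([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0
```

Identical rankings give rs = 1 and fully reversed rankings give rs = −1, the two endpoints of its range.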
Chapter 14 Chi-Square Tests
Goodness of fit test for multinomial probabilities: \chi^2 = \sum_{i=1}^{k}\frac{(f_i - E_i)^2}{E_i}

Test for homogeneity: \chi^2 = \sum_{i=1}^{k}\frac{(f_i - E_i)^2}{E_i}

Goodness of fit test for a normal distribution: \chi^2 = \sum_{i=1}^{k}\frac{(f_i - E_i)^2}{E_i}

Chi-square test for independence: \chi^2 = \sum_{\text{all cells}}\frac{(f_{ij} - E_{ij})^2}{E_{ij}}
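All four chi-square tests above use the same statistic, a sum of (observed − expected)² / expected; a minimal sketch with a made-up die-fairness example:

```python
def chi_square_stat(observed, expected):
    """chi^2 = sum((f_i - E_i)^2 / E_i)."""
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))

# Hypothetical: 60 die rolls, so each face has expected count 10 under fairness.
obs = [8, 12, 9, 11, 10, 10]
chi2 = chi_square_stat(obs, [10] * 6)
print(chi2)  # 1.0
```

The statistic would then be compared against a chi-square critical value with k − 1 = 5 degrees of freedom for this goodness-of-fit setting.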
Chapter 15 Decision Theory
Maximin criterion: Find the worst possible payoff for each alternative and then choose the alternative that yields the maximum worst possible payoff.

Maximax criterion: Find the best possible payoff for each alternative and then choose the alternative that yields the maximum best possible payoff.

Expected monetary value criterion: Choose the alternative with the largest expected payoff.

Expected value of perfect information: EVPI = expected payoff under certainty − expected payoff under risk

Expected value of sample information: EVSI = EPS − EPNS (expected payoff with sampling minus expected payoff with no sampling)

Expected net gain of sampling: ENGS = EVSI − cost of sampling
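The three choice criteria above can be sketched over a small payoff table; the alternatives, states, and payoffs below are entirely hypothetical:

```python
def maximin(payoffs):
    """Pick the alternative whose worst-case payoff is largest."""
    return max(payoffs, key=lambda a: min(payoffs[a]))

def maximax(payoffs):
    """Pick the alternative whose best-case payoff is largest."""
    return max(payoffs, key=lambda a: max(payoffs[a]))

def emv(payoffs, probs):
    """Pick the alternative with the largest expected payoff."""
    return max(payoffs, key=lambda a: sum(p * v for p, v in zip(probs, payoffs[a])))

# Rows are alternatives; columns are two states of nature.
table = {"small": [50, 60], "large": [-20, 120]}
print(maximin(table))           # "small": best worst case (50 vs -20)
print(maximax(table))           # "large": best best case (120 vs 60)
print(emv(table, [0.5, 0.5]))   # "small": EMV 55 vs 50
```

The example shows how the three criteria can disagree: the pessimistic and expected-value rules pick one alternative while the optimistic rule picks the other.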
Chapter 16 Time Series Forecasting
No trend: y_t = \beta_0 + \varepsilon_t

Linear trend: y_t = \beta_0 + \beta_1 t + \varepsilon_t

Quadratic trend: y_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \varepsilon_t

Modelling constant seasonal variation by using dummy variables: For a time series with k seasons, define k − 1 dummy variables in a multiple regression model (e.g. for quarterly data, define three dummy variables).

Multiplicative decomposition method: Y_t = TR_t \times SN_t \times CL_t \times IR_t

Simple exponential smoothing: \ell_t = \alpha y_t + (1 - \alpha)\ell_{t-1}

Double exponential smoothing: apply the smoothing recursion a second time, to the output of simple exponential smoothing, so that a linear trend can be tracked.

Mean absolute deviation (MAD): MAD = \frac{\sum |y_t - \hat{y}_t|}{n}

Mean squared deviation (MSD): MSD = \frac{\sum (y_t - \hat{y}_t)^2}{n}

Percentage error (PE): PE = \frac{y_t - \hat{y}_t}{y_t} \times 100

Mean absolute percentage error (MAPE): MAPE = \frac{\sum_{t=1}^{n} |PE_t|}{n}

A simple index: \frac{y_t}{y_0} \times 100

An aggregate price index: \left(\frac{\sum p_t}{\sum p_0}\right) \times 100

A Laspeyres index is \frac{\sum p_t q_0}{\sum p_0 q_0} \times 100

A Paasche index is \frac{\sum p_t q_t}{\sum p_0 q_t} \times 100
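The Laspeyres and Paasche indexes differ only in which period's quantities weight the prices; a minimal sketch over a hypothetical two-good basket:

```python
def laspeyres(p0, pt, q0):
    """Base-period quantities q0 weight both price vectors."""
    return sum(a * b for a, b in zip(pt, q0)) / sum(a * b for a, b in zip(p0, q0)) * 100

def paasche(p0, pt, qt):
    """Current-period quantities qt weight both price vectors."""
    return sum(a * b for a, b in zip(pt, qt)) / sum(a * b for a, b in zip(p0, qt)) * 100

# Hypothetical basket: base prices/quantities and current prices/quantities.
p0, pt = [2.0, 5.0], [3.0, 6.0]
q0, qt = [10, 4], [8, 5]
print(laspeyres(p0, pt, q0))  # 135.0
print(paasche(p0, pt, qt))    # about 131.7
```

The Laspeyres index here exceeds the Paasche index, reflecting that consumers in the example shifted quantities away from the good whose price rose most.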