Statistical Comparison of Two
or More Systems
The most relevant of all the Basic
Theory Lectures.
No Holidays.
THE MISSION

Your analysis task involves manipulating
conditions of the system of interest
from a prescribed set of options.


Design of Experiments: Determine if the different
options are really different. Is the best one really
statistically better?
Ranking and Selection: What’s the probability that
the best sample indicates the best system setting?
VOCABULARY

Factor


An element of the system that will be
manipulated
Setting or Level

A value that a Factor may assume
EXAMPLE : Simulation model
of Football (EA Sports)

Factors




Quarterback
Running Back
Strong Safety
Settings or Levels for Quarterback



Dante’
Bret
Johnny U.
TYPES OF DESIGNS

One Factor, Two Settings




Paired samples
Behrens-Fisher
Question: Which is Best?
More than one Factor



Factorial Designs
Partially Exhaustive Designs
Question: Are the settings significant difference-makers?
PAIRED SAMPLES


Example: Quarterback Controversy!
Simulate St. Louis Rams vs. Tampa Bay Bucs,
recording the Quarterback Rating



Run the simulation 28 times for each player,
resulting in data set



Level 1: Kurt Warner: W1, W2, ..., W28
Level 2: Marc Bulger: B1, B2, ..., B28
Is E[B] > E[W]?
BRUTE FORCE



Confidence interval on the quantity
E[W]-E[B]
If it doesn’t include 0.0, we have
conclusive evidence that there is a
difference
Equivalent to the Hypothesis Test

H0: E[B]=E[W]
CALCULATIONS ON
VARIANCES: SOME BASICS

Let X and Y be random variables
$$\mathrm{VAR}[X] = E\left[(X - E[X])^2\right] = \int (x - E[X])^2 \, dF_X(x)$$

1) $\mathrm{VAR}[X] = E[X^2] - (E[X])^2$
2) $\mathrm{VAR}[X + Y] = \mathrm{VAR}[X] + \mathrm{VAR}[Y] + 2\,\mathrm{COV}[X, Y]$
3) $\mathrm{COV}[X, Y] = E[XY] - E[X]E[Y]$
4) $\mathrm{VAR}[cX] = c^2\,\mathrm{VAR}[X]$
5) $\mathrm{VAR}[X - Y] = \mathrm{VAR}[X] + \mathrm{VAR}[Y] - 2\,\mathrm{COV}[X, Y]$
COV=0 if X and Y are independent.
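The variance identities can be checked exactly on a small example. The sketch below assumes a made-up joint distribution (X is one fair die, Y is X plus a second fair die, so the two are correlated on purpose); the helper names `expect` and `var` are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution: X is one fair die, Y = X + a second fair die.
outcomes = [(d1, d1 + d2) for d1, d2 in product(range(1, 7), repeat=2)]
p = Fraction(1, len(outcomes))  # every (d1, d2) pair is equally likely

def expect(f):
    """Exact expectation of f(x, y) under the joint distribution."""
    return sum(p * f(x, y) for x, y in outcomes)

def var(f):
    """VAR[f] = E[f^2] - (E[f])^2, i.e. identity 1."""
    return expect(lambda x, y: f(x, y) ** 2) - expect(f) ** 2

var_x = var(lambda x, y: x)
var_y = var(lambda x, y: y)
# Identity 3: COV[X, Y] = E[XY] - E[X]E[Y]
cov_xy = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

var_sum = var(lambda x, y: x + y)    # left side of identity 2
var_diff = var(lambda x, y: x - y)   # left side of identity 5
var_3x = var(lambda x, y: 3 * x)     # left side of identity 4 with c = 3

print(var_sum == var_x + var_y + 2 * cov_xy)
print(var_diff == var_x + var_y - 2 * cov_xy)
print(var_3x == 9 * var_x)
```

Exact fractions are used instead of floating point so that the equalities hold without any rounding tolerance.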
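SAMPLE MEAN
For independent, identically distributed X_i (so every COV term is 0):
$$\mathrm{VAR}(\bar{X}) = \mathrm{VAR}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\,\mathrm{VAR}\left(\sum_{i=1}^{n} X_i\right) = \frac{n}{n^2}\,\mathrm{VAR}(X_i) = \frac{\mathrm{VAR}(X)}{n}$$
$$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$$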
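A quick simulation can make the 1/n shrinkage concrete. This is a sketch under assumptions: the draws are Uniform(0, 1), the seed and the number of replications are arbitrary, and n = 28 is borrowed from the quarterback example.

```python
import random
import statistics

rng = random.Random(12345)  # fixed, arbitrary seed so the run is repeatable
n = 28                      # sample size, matching the 28 runs in the example
replications = 20_000       # number of independent sample means (an assumption)

# Each entry is one X-bar computed from n independent Uniform(0, 1) draws.
sample_means = [
    statistics.fmean(rng.random() for _ in range(n))
    for _ in range(replications)
]

var_single = 1 / 12                    # variance of one Uniform(0, 1) draw
predicted = var_single / n             # VAR(X-bar) = VAR(X) / n
observed = statistics.variance(sample_means)
print(f"predicted {predicted:.6f}, observed {observed:.6f}")
```

The observed value should land close to the predicted one, with the gap shrinking as `replications` grows.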
CONFIDENCE INTERVAL


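α/2 probability of Type I error on each end of the confidence interval
The basic interval for X-bar is
$$\bar{X} \pm Z_{\alpha/2}\sqrt{\mathrm{VAR}[\bar{X}]} = \bar{X} \pm Z_{\alpha/2}\sqrt{\frac{\sigma^2}{n}} = \bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$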
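The basic interval can be sketched in a few lines. Note the assumptions: the function name `z_interval` is made up, the sample standard deviation stands in for the unknown σ, and the ratings in the usage lines are invented for illustration.

```python
import math
from statistics import NormalDist, fmean, stdev

def z_interval(sample, alpha=0.05):
    """Return (low, high) for X-bar +/- Z_{alpha/2} * sigma / sqrt(n).

    Assumption: sigma is replaced by the sample standard deviation.
    """
    n = len(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}
    half_width = z * stdev(sample) / math.sqrt(n)
    center = fmean(sample)
    return center - half_width, center + half_width

# Invented quarterback ratings, purely for illustration.
ratings = [88.1, 92.4, 79.5, 95.0, 84.3, 90.2, 87.7, 81.9]
low, high = z_interval(ratings)
print(f"95% interval: ({low:.2f}, {high:.2f})")
```

With alpha = 0.05, each tail outside the interval carries probability 0.025.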
BASIC CONFIDENCE
INTERVAL
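$$(\bar{W} - \bar{B}) \pm Z_{\alpha/2}\sqrt{\mathrm{VAR}[\bar{W} - \bar{B}]}$$
$$\mathrm{VAR}[\bar{W} - \bar{B}] = \mathrm{VAR}[\bar{W}] + \mathrm{VAR}[\bar{B}] - 2\,\mathrm{COV}[\bar{W}, \bar{B}] = \frac{\mathrm{VAR}[W] + \mathrm{VAR}[B]}{28} - 0$$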
SPREADSHEET HIGHLIGHTS 1

(U-0.5)*SQRT(12)
zero mean
unit stddev
m + (U-0.5)*SQRT(12)*σ
mean m
stddev σ
uniform over an interval centered at m and σ*SQRT(12) wide (σ*SQRT(12)/2 on each side of m)
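The spreadsheet formula translates directly to code. In this sketch the function name `scaled_uniform`, the seed, and the values m = 85 and σ = 10 are assumptions chosen for illustration; `rng.random()` plays the role of the spreadsheet's U.

```python
import math
import random
import statistics

def scaled_uniform(rng, m, sigma):
    """m + (U - 0.5) * SQRT(12) * sigma, with U uniform on [0, 1)."""
    return m + (rng.random() - 0.5) * math.sqrt(12) * sigma

rng = random.Random(2024)  # arbitrary seed for repeatability
draws = [scaled_uniform(rng, m=85.0, sigma=10.0) for _ in range(50_000)]

print(f"mean   {statistics.fmean(draws):.3f}")   # should be near m
print(f"stddev {statistics.stdev(draws):.3f}")   # should be near sigma
print(f"range  {min(draws):.2f} .. {max(draws):.2f}")
```

The factor SQRT(12) appears because a Uniform(0, 1) variable has variance 1/12, so multiplying by SQRT(12) gives unit standard deviation.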
COMMON RANDOM NUMBERS




Correlation is not always BAD!
Suppose we could INDUCE
CORRELATION between the W’s and
the B’s without adding any bias?
Reduces the theoretical variance of
W-bar – B-bar
FREE POWER (the probability of
correctly rejecting H0: equal means)
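The effect of common random numbers can be seen in a small experiment. This is a sketch, not the lecture's spreadsheet: the means (90 and 88), the standard deviations (10 and 12), the seed, and the helper names are all assumptions. Ratings are generated with the m + (U-0.5)*SQRT(12)*σ recipe, once with separate uniforms and once reusing the same uniforms for both players.

```python
import math
import random
import statistics

def rating(u, m, sigma):
    """Turn a Uniform(0, 1) value u into a rating with mean m, stddev sigma."""
    return m + (u - 0.5) * math.sqrt(12) * sigma

def variance_of_mean_difference(w, b):
    """Estimate VAR[W-bar - B-bar] as VAR[W_i - B_i] / n from paired runs."""
    diffs = [wi - bi for wi, bi in zip(w, b)]
    return statistics.variance(diffs) / len(diffs)

n = 28
rng = random.Random(7)                    # arbitrary seed
u_w = [rng.random() for _ in range(n)]    # uniforms driving the W runs
u_b = [rng.random() for _ in range(n)]    # separate uniforms for independent B runs

w = [rating(u, 90.0, 10.0) for u in u_w]
b_indep = [rating(u, 88.0, 12.0) for u in u_b]   # independent sampling
b_crn = [rating(u, 88.0, 12.0) for u in u_w]     # common random numbers

var_indep = variance_of_mean_difference(w, b_indep)
var_crn = variance_of_mean_difference(w, b_crn)
print(f"independent: {var_indep:.4f}   common random numbers: {var_crn:.4f}")
```

Reusing the uniforms makes COV[W, B] positive, so the -2COV term shrinks the variance of the difference and the confidence interval narrows without any extra runs.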
STREAMING

Segregate the random number
generation task into streams connected
to phenomena
seed1 -> Zi = a*Zi-1 mod m -> Inter-arrival times
seed2 -> Zi = a*Zi-1 mod m -> Service times
1. Change features of the service.
2. Use exact same arrival stream for
comparing each service setting.
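Streaming can be sketched with two generators of the form Zi = a*Zi-1 mod m, one per phenomenon. Everything specific here is an assumption: the multiplier 16807 and modulus 2^31 - 1 are the classic "minimal standard" choice rather than values from the lecture, the seeds are arbitrary, and exponential inter-arrival and service times are used only to have something concrete to generate.

```python
import math

A = 16807          # multiplier (assumed: "minimal standard" generator)
M = 2**31 - 1      # modulus (assumed)

def lcg_stream(seed):
    """Yield values in (0, 1) from Z_i = A * Z_{i-1} mod M."""
    z = seed
    while True:
        z = (A * z) % M
        yield z / M

def simulate(service_rate, n_customers, arrival_seed=12345, service_seed=67890):
    """Generate inter-arrival and service times from two separate streams."""
    arrivals = lcg_stream(arrival_seed)   # stream 1: inter-arrival times
    services = lcg_stream(service_seed)   # stream 2: service times
    # Inverse-transform sampling of exponentials (arrival rate fixed at 1).
    interarrival = [-math.log(next(arrivals)) for _ in range(n_customers)]
    service = [-math.log(next(services)) / service_rate for _ in range(n_customers)]
    return interarrival, service

slow_arrivals, slow_service = simulate(service_rate=1.0, n_customers=5)
fast_arrivals, fast_service = simulate(service_rate=2.0, n_customers=5)
print(slow_arrivals == fast_arrivals)   # same arrival stream for both settings
```

Because the arrival stream has its own seed, changing the service setting does not disturb the arrivals, which is what makes the comparison between settings fair.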
SPREADSHEET HIGHLIGHTS 2

Use same results of RAND() for building





Bulger samples
Warner samples
Note CI shrinkage
Try with identical sigma
Discuss “Estimation”
Behrens-Fisher Problem




Comparison of Means
No pairing, and no assumption of equal sample sizes or equal variances
Remember that we are after the variance of W-bar – B-bar
Common use: New samples vs. History
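$$\mathrm{VAR}[\bar{W} - \bar{B}] = \mathrm{VAR}[\bar{W}] + \mathrm{VAR}[\bar{B}] - 2\,\mathrm{COV}[\bar{W}, \bar{B}] = \frac{\mathrm{VAR}[W]}{n_W} + \frac{\mathrm{VAR}[B]}{n_B} - 0$$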
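With unequal sample sizes the same interval recipe still applies, using VAR[W]/nW + VAR[B]/nB. The sketch below assumes sample variances stand in for the true ones and keeps the normal quantile used elsewhere in these notes; small-sample treatments often substitute a t quantile instead. The function name and the data are invented for illustration.

```python
import math
from statistics import NormalDist, fmean, variance

def unpaired_interval(w, b, alpha=0.05):
    """Interval for E[W] - E[B] from independent samples of any sizes."""
    # VAR[W-bar - B-bar] = VAR[W]/nW + VAR[B]/nB (covariance term is 0)
    var_diff = variance(w) / len(w) + variance(b) / len(b)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    center = fmean(w) - fmean(b)
    half_width = z * math.sqrt(var_diff)
    return center - half_width, center + half_width

# Invented data: a few new samples compared against a longer history.
new_samples = [91.2, 88.5, 94.1, 90.3, 87.9, 92.6]
history = [84.0, 86.5, 83.2, 88.1, 85.4, 82.7, 87.3, 84.9, 86.0, 85.1]
low, high = unpaired_interval(new_samples, history)
print(f"({low:.2f}, {high:.2f})  excludes 0: {not (low <= 0 <= high)}")
```

If the interval excludes 0, the data give evidence that the two means differ at the chosen level.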
SPREADSHEET HIGHLIGHTS
MULTI-SETTING CASE



Can involve many Factors or just one
Treatment i has mean μi
Analysis of Variance (ANOVA)
Data from treatment 1, 2, ..., n
H0: μ1 = ... = μn-1 = μn
Are the treatments distinguishable?
DESIGN OF EXPERIMENT
Determine
Factors and
Settings
Design = Which Factors,
Which Settings for each
Treatment
Collect Data
According to
Design
State
Conclusion
Perform
ANOVA
FULL FACTORIAL


Build sample of All Combinations
Factors




Quarterback (2)
Running Back (3)
Strong Safety (3)
2x3x3=18 Treatments
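Enumerating a full factorial design is a Cartesian product of the settings. In this sketch the setting labels (QB-1, RB-1, and so on) are placeholders, since only the number of settings per factor matters here.

```python
from itertools import product

# Placeholder setting labels; only the counts (2, 3, 3) come from the example.
factors = {
    "Quarterback": ["QB-1", "QB-2"],
    "Running Back": ["RB-1", "RB-2", "RB-3"],
    "Strong Safety": ["SS-1", "SS-2", "SS-3"],
}

# One treatment per combination of settings, one setting from each factor.
treatments = [
    dict(zip(factors, combo))
    for combo in product(*factors.values())
]

print(len(treatments))   # 2 x 3 x 3 = 18
print(treatments[0])
```

The count grows multiplicatively with each added factor, which is the usual motivation for designs that sample only part of the grid.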
HOW ANOVA WORKS



Xi,j is ith sample from jth treatment point
Assumed iid Normal (never!)
Decomposition of variability



Observation (Obs)
Treatment vs. Grand Mean (Tr)
Within Treatment (Res)
$$X_{i,j} = \mu + \tau_j + e_{i,j}$$
(grand mean μ, effect τ_j of treatment j, residual e_{i,j})
HYPOTHESIS H0



The treatment variability is random
variability
The size of the treatment variability is
in-scale with the residual variability
ANOVA uses sums of squares


g treatments
nt samples from treatment t
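ANOVA TABLE
(x_{j,t} is the jth sample from treatment t)
$SSTr = \sum_{t=1}^{g} n_t (\bar{x}_t - \bar{x})^2$, degrees of freedom: $g - 1$
$SSRes = \sum_{t=1}^{g} \sum_{j=1}^{n_t} (x_{j,t} - \bar{x}_t)^2$, degrees of freedom: $\sum_{t=1}^{g} n_t - g$
$SSObs = \sum_{t=1}^{g} \sum_{j=1}^{n_t} (x_{j,t} - \bar{x})^2$, degrees of freedom: $\sum_{t=1}^{g} n_t - 1$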
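The three sums of squares can be computed directly from grouped data. This is a minimal sketch: the function name `anova_sums_of_squares` and the tiny data set are invented, and each inner list holds the samples from one treatment.

```python
from statistics import fmean

def anova_sums_of_squares(groups):
    """Return {name: (sum of squares, degrees of freedom)} for one-way ANOVA."""
    all_obs = [x for group in groups for x in group]
    grand_mean = fmean(all_obs)
    group_means = [fmean(group) for group in groups]
    g = len(groups)            # number of treatments
    n_total = len(all_obs)     # sum of n_t over all treatments

    # Treatment means vs. grand mean, weighted by treatment sample size.
    ss_tr = sum(len(grp) * (m - grand_mean) ** 2
                for grp, m in zip(groups, group_means))
    # Each observation vs. its own treatment mean.
    ss_res = sum((x - m) ** 2
                 for grp, m in zip(groups, group_means) for x in grp)
    # Each observation vs. the grand mean.
    ss_obs = sum((x - grand_mean) ** 2 for x in all_obs)

    return {
        "SSTr": (ss_tr, g - 1),
        "SSRes": (ss_res, n_total - g),
        "SSObs": (ss_obs, n_total - 1),
    }

# Invented data: three treatments, three samples each.
table = anova_sums_of_squares([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
for name, (ss, df) in table.items():
    print(f"{name}: SS = {ss:.1f}, d.f. = {df}")
```

A useful sanity check is that SSObs equals SSTr + SSRes, and that the degrees of freedom add up the same way.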
REMEMBER chi-SQUARED?
From our Goodness-of-Fit Test




X~N(0,1)
for n independent X’s
the sum of the n values of X^2 is chi-squared with n degrees of freedom
if estimates (X-bar, sigma) were used to
make X’s N(0,1), lose one d.f. per
estimate
F-distribution



X is chi-sq with n d.f.
Y is chi-sq with m d.f.
(X/n)/(Y/m) has F distribution
ANOVA HYPOTHESIS TEST
$$\frac{SSTr / \mathrm{d.f.}_{Tr}}{SSRes / \mathrm{d.f.}_{Res}} \sim F$$
The normalizing σ cancels!
ANOVA HYPOTHESIS TEST



Compare the
test statistic to a
table
Reject if it's big
and conclude
that ...
the Treatments
are Different!
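The test itself is a ratio and a comparison. In this sketch the function name `f_statistic` is made up, the sums of squares are from an invented data set with 3 treatments and 9 observations, and 5.14 is the tabulated 5% critical value of the F distribution with (2, 6) degrees of freedom.

```python
def f_statistic(ss_tr, ss_res, g, n_total):
    """(SSTr / (g - 1)) / (SSRes / (n_total - g))."""
    mean_square_tr = ss_tr / (g - 1)
    mean_square_res = ss_res / (n_total - g)
    return mean_square_tr / mean_square_res

# Invented example: 3 treatments, 9 observations in total.
f_value = f_statistic(ss_tr=42.0, ss_res=6.0, g=3, n_total=9)

# Critical value read from an F table: 5% level, (2, 6) degrees of freedom.
critical_value = 5.14
reject_h0 = f_value > critical_value
print(f"F = {f_value:.2f}, reject H0: {reject_h0}")
```

A large ratio means the treatment-to-treatment variability is out of scale with the residual variability, which is the evidence against H0.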
SPREADSHEET HIGHLIGHTS