Statistical Comparison of Two or More Systems
The most relevant of all the Basic Theory Lectures. No holidays.

THE MISSION
* Your analysis task involves manipulating conditions of the system of interest from a prescribed set of options.
* Design of Experiments: determine whether the different options are really different. Is the best one really statistically better?
* Ranking and Selection: what is the probability that the best sample indicates the best system setting?

VOCABULARY
* Factor: an element of the system that will be manipulated.
* Setting (or Level): a value that a Factor may assume.

EXAMPLE: simulation model of football (EA Sports)
* Factors: Quarterback, Running Back, Strong Safety.
* Settings (Levels) for Quarterback: Dante’, Bret, Johnny U.

TYPES OF DESIGNS
* One Factor, Two Settings: paired samples; Behrens-Fisher. Question: which is best?
* More than one Factor: factorial designs; partially exhaustive designs. Question: are the settings significant difference-makers?

PAIRED SAMPLES
* Example: quarterback controversy! Simulate St. Louis Rams vs. Tampa Bay Bucs, recording the Quarterback Rating.
* Run the simulation 28 times for each player, resulting in the data sets
  Level 1 (Kurt Warner): W1, W2, ..., W28
  Level 2 (Marc Bulger): B1, B2, ..., B28
* Is E[B] > E[W]?

BRUTE FORCE
* Build a confidence interval on the quantity E[W] - E[B].
* If it does not include 0.0, we have conclusive evidence that there is a difference.
* Equivalent to the hypothesis test H0: E[B] = E[W].

CALCULATIONS ON VARIANCES: SOME BASICS
Let X and Y be random variables.
VAR[X] = E[(X - E[X])^2] = \int (x - E[X])^2 \, dF_X(x)
1) VAR[X] = E[X^2] - (E[X])^2
2) VAR[X + Y] = VAR[X] + VAR[Y] + 2\,COV[X, Y]
3) COV[X, Y] = E[XY] - E[X]E[Y]
4) VAR[cX] = c^2\,VAR[X]
5) VAR[X - Y] = VAR[X] + VAR[Y] - 2\,COV[X, Y]
COV[X, Y] = 0 if X and Y are independent.

SAMPLE MEAN
* \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i
* VAR(\bar{X}) = VAR\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} VAR(X_i) = \frac{\sigma^2}{n}

CONFIDENCE INTERVAL
* \alpha/2 probability of Type I error on each end of the confidence interval.
* The basic interval for \bar{X} is \bar{X} \pm z_{\alpha/2}\sqrt{VAR[\bar{X}]} = \bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}

BASIC CONFIDENCE INTERVAL
* (\bar{W} - \bar{B}) \pm z_{\alpha/2}\sqrt{VAR[\bar{W} - \bar{B}]}
* VAR[\bar{W} - \bar{B}] = VAR[\bar{W}] + VAR[\bar{B}] - 2\,COV[\bar{W}, \bar{B}] = \frac{VAR[W]}{28} + \frac{VAR[B]}{28} - 0 (independent runs)

SPREADSHEET HIGHLIGHTS 1
* (U-0.5)*SQRT(12) gives zero mean and unit standard deviation.
* m + (U-0.5)*SQRT(12)*\sigma gives mean m and standard deviation \sigma, uniform over an interval centered at m with half-width \sigma*SQRT(12)/2.

COMMON RANDOM NUMBERS
* Correlation is not always BAD!
* Suppose we could INDUCE CORRELATION between the W's and the B's without adding any bias?
* This reduces the theoretical variance of \bar{W} - \bar{B}.
* FREE POWER (the probability of correctly rejecting H0: equal means).

STREAMING
* Segregate the random-number generation task into streams connected to phenomena:
  seed1 -> Z_i = a Z_{i-1} mod m -> inter-arrival times
  seed2 -> Z_i = a Z_{i-1} mod m -> service times
* 1. Change features of the service.
* 2. Use the exact same arrival stream for comparing each service setting.

SPREADSHEET HIGHLIGHTS 2
* Use the same results of RAND() for building the Bulger samples and the Warner samples.
* Note the CI shrinkage.
* Try with identical sigma.
* Discuss "Estimation".

BEHRENS-FISHER PROBLEM
* Comparison of means without pairs, equal sample sizes, or equal variances.
* Remember that we are after the variance of \bar{W} - \bar{B}.
* Common use: new samples vs. history.
* VAR[\bar{W} - \bar{B}] = VAR[\bar{W}] + VAR[\bar{B}] - 2\,COV[\bar{W}, \bar{B}] = \frac{VAR[W]}{n_W} + \frac{VAR[B]}{n_B} - 0

SPREADSHEET HIGHLIGHTS

MULTI-SETTING CASE
* Can involve many Factors or just one.
* Treatment i has mean m_i.
* Analysis of Variance (ANOVA): data from treatments 1, 2, ..., n.
* H0: m_1 = ... = m_{n-1} = m_n
* Are the treatments distinguishable?
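Before moving to the multi-setting case, here is a minimal Python sketch of the two-system comparison covered above. It is not part of the original spreadsheet demos: the rating means, standard deviations, and the shared "game effect" are invented illustration values. It builds the confidence interval on E[W] - E[B] twice, once treating the runs as unpaired (the Behrens-Fisher form VAR[W]/n_W + VAR[B]/n_B) and once as pairs driven by common random numbers.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 28                                   # 28 simulated games per quarterback

# Hypothetical model: every game has a shared "game difficulty" component.
# Driving both quarterbacks with the SAME component is the common-random-numbers idea.
game_effect = rng.normal(0.0, 12.0, size=n)
warner = 85.0 + game_effect + rng.normal(0.0, 8.0, size=n)   # illustration values only
bulger = 88.0 + game_effect + rng.normal(0.0, 8.0, size=n)

def diff_ci(w, b, paired, alpha=0.05):
    """CI on E[W] - E[B]; reject H0: E[W] = E[B] if the interval excludes 0."""
    if paired:
        d = w - b                        # pairing exploits VAR[W-B] = VAR[W] + VAR[B] - 2 COV[W,B]
        half = stats.t.ppf(1 - alpha / 2, len(d) - 1) * d.std(ddof=1) / np.sqrt(len(d))
        return d.mean() - half, d.mean() + half
    # Behrens-Fisher form: VAR[W]/n_W + VAR[B]/n_B, z interval as on the slides
    se = np.sqrt(w.var(ddof=1) / len(w) + b.var(ddof=1) / len(b))
    center = w.mean() - b.mean()
    half = stats.norm.ppf(1 - alpha / 2) * se
    return center - half, center + half

print("unpaired CI on E[W]-E[B]:", diff_ci(warner, bulger, paired=False))
print("paired (CRN) CI:         ", diff_ci(warner, bulger, paired=True))
```

The paired interval typically comes out noticeably narrower, because the shared game effect cancels in W - B; that cancellation is exactly the CI shrinkage the spreadsheet demo highlights.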
DESIGN OF EXPERIMENT
* Determine Factors and Settings.
* Design = which Factors and which Settings for each Treatment.
* Collect data according to the Design.
* Perform ANOVA.
* State conclusions.

FULL FACTORIAL
* Build a sample of ALL combinations.
* Factors: Quarterback (2), Running Back (3), Strong Safety (3).
* 2 x 3 x 3 = 18 Treatments.

HOW ANOVA WORKS
* X_{i,j} is the i-th sample from the j-th treatment, assumed iid Normal (never!).
* Decomposition of variability: Observation (Obs), Treatment vs. Grand Mean (Tr), Within Treatment (Res).
* X_{i,j} = m_j + e_{i,j}

HYPOTHESIS H0
* The treatment variability is random variability.
* The size of the treatment variability is in scale with the residual variability.
* ANOVA uses sums of squares: g treatments, n_j samples from treatment j.

ANOVA TABLE
Sums of squares and degrees of freedom (n = total number of observations):
* SS_{Tr} = \sum_{j=1}^{g} n_j (\bar{x}_j - \bar{x})^2, with g - 1 degrees of freedom
* SS_{Res} = \sum_{j=1}^{g} \sum_{i=1}^{n_j} (x_{i,j} - \bar{x}_j)^2, with n - g degrees of freedom
* SS_{Obs} = \sum_{j=1}^{g} \sum_{i=1}^{n_j} (x_{i,j} - \bar{x})^2, with n - 1 degrees of freedom

REMEMBER CHI-SQUARED?
* From our goodness-of-fit test: if the X's are independent N(0,1), the sum of the n values X^2 is chi-squared with n degrees of freedom.
* If estimates (\bar{X}, \hat{\sigma}) were used to make the X's N(0,1), lose one degree of freedom per estimate.

F-DISTRIBUTION
* If X is chi-squared with n d.f. and Y is chi-squared with m d.f., then (X/n)/(Y/m) has an F distribution.

ANOVA HYPOTHESIS TEST
* \frac{SS_{Tr}/d.f.}{SS_{Res}/d.f.} \sim F
* The normalizing cancels!
* Compare the test statistic to a table. Reject if it is big and conclude that ... the Treatments are Different!

SPREADSHEET HIGHLIGHTS
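To make the ANOVA mechanics concrete, here is a small Python sketch; it is a rough illustration only, with three invented quarterback settings and made-up rating samples, and scipy.stats.f_oneway used purely as a cross-check. It computes SS_Tr, SS_Res, their degrees of freedom, and the F statistic exactly as in the ANOVA table above.

```python
import numpy as np
from scipy import stats

# Invented quarterback-rating samples for three treatments (one factor, three settings).
treatments = {
    "QB_A": np.array([82.1, 88.4, 79.5, 91.2, 85.0, 87.3]),
    "QB_B": np.array([90.2, 93.1, 88.7, 95.4, 91.8, 89.9]),
    "QB_C": np.array([84.0, 86.5, 83.2, 88.1, 85.7, 87.0]),
}

data = list(treatments.values())
all_obs = np.concatenate(data)
grand_mean = all_obs.mean()
g, n = len(data), len(all_obs)

# Sums of squares exactly as in the ANOVA table: between-treatment and residual.
ss_tr = sum(len(x) * (x.mean() - grand_mean) ** 2 for x in data)   # g - 1 d.f.
ss_res = sum(((x - x.mean()) ** 2).sum() for x in data)            # n - g d.f.

f_stat = (ss_tr / (g - 1)) / (ss_res / (n - g))
p_value = stats.f.sf(f_stat, g - 1, n - g)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")

# Cross-check with SciPy's one-way ANOVA.
print(stats.f_oneway(*data))
```

Reject H0 (all treatment means equal) when F is large relative to the F_{g-1, n-g} table value, i.e. when the p-value falls below alpha.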