The art of computer science analysis
Emmanuel Jeannot (INRIA)
Complex HPC Spring School, May 10, 2011

Outline
1 Comparing Systems Using Sample Data
2 Conclusion

Comparing systems using sample data [Jain 91, Chap 13]
Determine the confidence interval of the mean
Comparing two alternatives
Confidence interval for proportion
Determining sample size

Determine the confidence interval of the mean

Problem
$S = \{x_1, \ldots, x_n\}$: a set of results (a sample).
Determine an interval $[c_1, c_2]$ for the mean $\mu$ of the distribution underlying $S$, such that $P(c_1 \le \mu \le c_2) = 1 - \alpha$.
$\alpha$: significance level (e.g. 0.01)
$1 - \alpha$: confidence level (e.g. 0.99)

Notations
$n$: number of experiments
$\bar{x} = \frac{1}{n} \sum_i x_i$: sample mean
$s = \sqrt{\frac{1}{n-1} \sum_i (\bar{x} - x_i)^2}$: sample standard deviation ($s^2$ is the unbiased estimate of the variance)
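As a minimal sketch of the notation above, the two estimators can be computed directly in R and compared with the built-in `mean()` and `sd()`. The data vector `x` is hypothetical and chosen only for illustration.

```r
# Hypothetical measurements (illustrative values, not taken from the slides)
x <- c(9.8, 10.4, 10.1, 9.6, 10.3)

n    <- length(x)                          # number of experiments
xbar <- sum(x) / n                         # sample mean
s    <- sqrt(sum((xbar - x)^2) / (n - 1))  # sample std. dev., n - 1 in the denominator

# The built-in functions follow the same definitions
same_mean <- isTRUE(all.equal(xbar, mean(x)))
same_sd   <- isTRUE(all.equal(s, sd(x)))
```

Note that `sd()` already divides by n - 1, so no correction is needed when it is used in the interval formulas.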
When n is large (n ≥ 30)

Central-limit theorem: $\bar{x} \sim N(\mu, \sigma/\sqrt{n})$ (the second parameter is the standard deviation).
$\mu$ (resp. $\sigma$): true mean (resp. true std. dev.) of the distribution of the $x_i$.
Let $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$. Then $Z \sim N(0, 1)$.
$P(-c \le Z \le c) = 1 - \alpha \Leftrightarrow c = z_{1-\alpha/2}$
$z_p$: the $p$-quantile of a unit normal variate.
$\alpha = 0.1$: $z_{1-\alpha/2} = z_{0.95} = 1.64$

[Figure: density of N(0,1), with the quantile $z_{0.95} = 1.64$ marked for alpha = 0.1]

$-c \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le c \Leftrightarrow \bar{x} - c\,\sigma/\sqrt{n} \le \mu \le \bar{x} + c\,\sigma/\sqrt{n}$
$\sigma$ is unknown; however, for large $n$, $s \approx \sigma$.
With probability $1 - \alpha$:
$\mu \in [\bar{x} - z_{1-\alpha/2}\, s/\sqrt{n},\ \bar{x} + z_{1-\alpha/2}\, s/\sqrt{n}]$
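The large-sample interval above can be checked empirically. The sketch below is only an illustration under stated assumptions: the samples are drawn from an exponential distribution with true mean 1, with n = 64, 2000 repetitions, and a fixed seed, none of which come from the slides. It counts how often the interval contains the true mean.

```r
# Empirical coverage of the large-sample interval (illustrative simulation)
set.seed(42)                 # fixed seed so that the run is reproducible
n     <- 64                  # sample size, large enough for the CLT argument
alpha <- 0.1                 # significance level
mu    <- 1                   # true mean of Exp(rate = 1)
z     <- qnorm(1 - alpha / 2)

covered <- replicate(2000, {
  x    <- rexp(n, rate = 1)        # non-normal data on purpose
  m    <- mean(x)
  half <- z * sd(x) / sqrt(n)      # half-width, with s in place of sigma
  (m - half <= mu) && (mu <= m + half)
})

coverage <- mean(covered)    # fraction of intervals that contain mu
```

The observed coverage should be close to 1 - alpha; a small gap is expected because the data are skewed and s only approximates σ.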
Example

$\bar{x} = 10$, $n = 64$, $s = 2$
$\alpha = 0.1 \Rightarrow z_{0.95} = 1.64 \Rightarrow \mu \in [10 - 1.64 \times 2/8,\ 10 + 1.64 \times 2/8] \Rightarrow \mu \in [9.59, 10.41]$
$\alpha = 0.01 \Rightarrow z_{0.995} = 2.58 \Rightarrow \mu \in [10 - 2.58 \times 2/8,\ 10 + 2.58 \times 2/8] \Rightarrow \mu \in [9.35, 10.65]$

R code
    # Large-sample confidence interval of the mean (normal quantile)
    interval <- function(x, conf_level = 0.9) {
      n <- length(x)
      X <- mean(x)                 # sample mean
      s <- sd(x)                   # sample standard deviation
      alpha <- 1 - conf_level
      q <- qnorm(1 - alpha / 2)    # z_{1 - alpha/2}
      return(c(X - q * s / sqrt(n), X + q * s / sqrt(n)))
    }

n < 30 and the $x_i$ follow a normal distribution

$t(k)$: Student distribution with $k$ degrees of freedom.
Let $Z = \frac{\bar{x} - \mu}{s/\sqrt{n}}$. Then $Z \sim t(n - 1)$.
$P(-c \le Z \le c) = 1 - \alpha \Leftrightarrow c = t_{[1-\alpha/2,\, n-1]}$
$t_{[p,k]}$: the $p$-quantile of a Student variate with $k$ degrees of freedom.
$\alpha = 0.1$, $n = 5$: $t_{[0.95,4]} = 2.13$
With probability $1 - \alpha$:
$\mu \in [\bar{x} - t_{[1-\alpha/2,\, n-1]}\, s/\sqrt{n},\ \bar{x} + t_{[1-\alpha/2,\, n-1]}\, s/\sqrt{n}]$

R code
    # Small-sample confidence interval of the mean (Student quantile)
    student_interval <- function(x, conf_level = 0.9) {
      n <- length(x); X <- mean(x); s <- sd(x); alpha <- 1 - conf_level
      q <- qt(1 - alpha / 2, n - 1)    # t_{[1 - alpha/2, n - 1]}
      return(c(X - q * s / sqrt(n), X + q * s / sqrt(n)))
    }
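The worked example gives only summary statistics, so the two functions above cannot be applied to it directly. A possible sketch, assuming a hypothetical helper `ci_mean` that takes the mean, the standard deviation and the sample size instead of the raw data, reproduces the numbers and compares the normal and Student quantiles:

```r
# CI of the mean from summary statistics (hypothetical helper, not from the slides)
ci_mean <- function(xbar, s, n, conf_level = 0.9, student = FALSE) {
  alpha <- 1 - conf_level
  # Student quantile for small normal samples, normal quantile otherwise
  q <- if (student) qt(1 - alpha / 2, df = n - 1) else qnorm(1 - alpha / 2)
  half <- q * s / sqrt(n)
  c(lower = xbar - half, upper = xbar + half)
}

ci90 <- ci_mean(10, 2, 64, conf_level = 0.90)   # example with alpha = 0.1
ci99 <- ci_mean(10, 2, 64, conf_level = 0.99)   # example with alpha = 0.01

# Quantiles for alpha = 0.1: Student with 4 degrees of freedom vs. normal
t_small <- qt(0.95, df = 4)
z_ref   <- qnorm(0.95)
```

The Student quantile is larger than the normal one, so for the same s and n the small-sample interval is wider.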
Comparing two alternatives (paired observations)

6 benchmarks were used to compare two systems. The observations are:
{(5.4,19.1),(16.6,3.5),(0.6,3.4),(7.3,1.7),(1.4,2.5),(0.6,3.6)}.
Is one system better than the other?
Differences (first system minus second), 6 observations: {−13.7, 13.1, −2.8, 5.6, −1.1, −3.0}
Sample mean: $\bar{x} = -0.32$
Sample standard deviation: $s = 9.03$
These observations are plausibly normally distributed (p-value of the Shapiro-Wilk test = 0.82 > 0.1, so normality is not rejected): we can use the Student distribution.
$\alpha = 0.1$, $t_{[0.95,5]} = 2.015$. 90% confidence interval:
$\mu \in [-0.32 - 2.015 \times 9.03/\sqrt{6},\ -0.32 + 2.015 \times 9.03/\sqrt{6}] = [-7.75, 7.11]$
The interval contains 0: at the 90% confidence level, we cannot conclude that the two systems are different.

Comparing two alternatives (unpaired observations)

Unpaired observations: no correspondence between the two samples (you cannot subtract them pairwise).
Example: measured bandwidth between Europe and America and between Europe and Asia.
The Student test (t-test) can compute the confidence interval of the difference of the means.

R code
    r <- t.test(x, y, paired = FALSE, conf.level = 0.9)
    r$conf.int[1]    # lower bound of the interval
    r$conf.int[2]    # upper bound of the interval
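The paired example can be replayed in R with the benchmark values listed above. This is a sketch rather than the author's script: the vector names `a` and `b` are assumptions, and the unpaired call on the same data is included only to show the syntax, since these observations are in fact paired.

```r
# Benchmark results of the two systems (values from the paired example)
a <- c(5.4, 16.6, 0.6, 7.3, 1.4, 0.6)
b <- c(19.1, 3.5, 3.4, 1.7, 2.5, 3.6)

d <- a - b                         # pairwise differences
normality <- shapiro.test(d)       # Shapiro-Wilk normality test on the differences

# 90% confidence interval of the mean difference (Student, n - 1 = 5 df)
paired <- t.test(a, b, paired = TRUE, conf.level = 0.9)
ci <- paired$conf.int
contains_zero <- ci[1] <= 0 && 0 <= ci[2]

# Unpaired variant: by default t.test does not assume equal variances (Welch)
unpaired <- t.test(a, b, paired = FALSE, conf.level = 0.9)
```

If `contains_zero` is TRUE, the data do not allow concluding that one system is better at the chosen confidence level.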
Confidence interval for proportions

System A is better than system B in $n_1$ of $n$ experiments.
Sample proportions: $\hat{p}_1 = \frac{n_1}{n}$, $\hat{p}_2 = 1 - \hat{p}_1 = \frac{n - n_1}{n}$
$n_1 \sim B(n, p_1)$ ($p_1$: the true probability that A outperforms B).
If $n p_1 \ge 10$ and $n(1 - p_1) \ge 10$:
$n_1 \sim B(n, p_1) \approx N\left(n p_1, \sqrt{n p_1 (1 - p_1)}\right)$
$\Leftrightarrow \hat{p}_1 = \frac{n_1}{n} \approx N\left(p_1, \sqrt{\frac{p_1 (1 - p_1)}{n}}\right) \approx N\left(p_1, \sqrt{\frac{\hat{p}_1 \hat{p}_2}{n}}\right)$
With probability $1 - \alpha$:
$p_1 \in \left[\hat{p}_1 - z_{1-\alpha/2} \sqrt{\frac{\hat{p}_1 \hat{p}_2}{n}},\ \hat{p}_1 + z_{1-\alpha/2} \sqrt{\frac{\hat{p}_1 \hat{p}_2}{n}}\right]$
If the interval contains 0.5, we cannot conclude that A outperforms B.
Example

An experiment is repeated 40 times. System A is found superior to system B 30 times. Can we state with 99% confidence that system A is superior?
$n = 40$, $n_1 = 30$
$\hat{p}_1 = 30/40 = 0.75$ ($n \hat{p}_1 = 30$, $n(1 - \hat{p}_1) = 10$)
$\sqrt{\frac{\hat{p}_1 \hat{p}_2}{n}} = \sqrt{\frac{0.75 \times 0.25}{40}} = 0.068$
$\alpha = 0.01$, $z_{0.995} = 2.58$
$p_1 \in [0.75 - 2.58 \times 0.068,\ 0.75 + 2.58 \times 0.068] = [0.57, 0.93]$
The confidence interval does not include 0.5. Hence, we can conclude with 99% confidence that system A is superior to system B.

Code

R code
    # x: vector of 0/1 outcomes, 1 when system A wins the experiment
    proportion_test <- function(x, conf_level = 0.9) {
      n <- length(x)
      n1 <- sum(findInterval(x, 1))    # number of entries with x >= 1
      n2 <- n - n1
      p1 <- n1 / n
      p2 <- n2 / n
      if (p1 * n < 10 || p2 * n < 10) {
        stop("Cannot apply normal approximation!")
      }
      alpha <- 1 - conf_level
      q <- qnorm(1 - alpha / 2)
      s <- sqrt(p1 * p2 / n)           # std. dev. of the sample proportion
      return(c(p1 - q * s, p1 + q * s))
    }
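When only the counts are known, as in the 30-out-of-40 example, the same interval can be computed without building a 0/1 vector. The helper `prop_ci` below is a hypothetical variant written for illustration; it applies the same normal approximation and the same validity check on the counts.

```r
# Normal-approximation CI for a proportion, from counts (hypothetical helper)
prop_ci <- function(n1, n, conf_level = 0.9) {
  if (n1 < 10 || n - n1 < 10) {
    stop("Cannot apply normal approximation!")
  }
  p1 <- n1 / n                          # sample proportion
  q <- qnorm(1 - (1 - conf_level) / 2)  # z_{1 - alpha/2}
  half <- q * sqrt(p1 * (1 - p1) / n)   # half-width of the interval
  c(lower = p1 - half, upper = p1 + half)
}

# System A wins 30 of 40 experiments, 99% confidence level
ci <- prop_ci(30, 40, conf_level = 0.99)
a_is_superior <- ci[["lower"]] > 0.5    # interval entirely above 0.5
```

Working on counts keeps the condition on n p̂1 and n(1 − p̂1) explicit and avoids any assumption on how wins are encoded in the data.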
R is nice!

R provides all the above functions:
Confidence interval of the mean: r <- t.test(x, conf.level = 0.9)
Comparing paired experiments: r <- t.test(x, y, paired = TRUE, conf.level = 0.9)
Comparing unpaired experiments: r <- t.test(x, y, paired = FALSE, conf.level = 0.9)
CI for a proportion: r <- prop.test(n1, n, conf.level = 0.9) or r <- binom.test(n1, n, conf.level = 0.9)
In each case, the bounds are: inf <- r$conf.int[1] and sup <- r$conf.int[2]
prop.test and binom.test use other interval constructions (score-based and exact, respectively), so their bounds differ slightly from the normal-approximation interval given earlier.

Computing the number of experiments

Problem
You have a confidence interval: how many experiments ($n$) do you need to reduce its half-width to a given level ($\varepsilon$)?

CI of the mean: $\mu \in [\bar{x} - z_{1-\alpha/2}\, s/\sqrt{n},\ \bar{x} + z_{1-\alpha/2}\, s/\sqrt{n}]$
If you want $\mu \in [\bar{x}(1 - \varepsilon),\ \bar{x}(1 + \varepsilon)]$:
$n \ge \left(\frac{z_{1-\alpha/2}\, s}{\varepsilon\, \bar{x}}\right)^2$

CI of a proportion: $p_1 \in \left[\hat{p}_1 - z_{1-\alpha/2} \sqrt{\frac{\hat{p}_1 \hat{p}_2}{n}},\ \hat{p}_1 + z_{1-\alpha/2} \sqrt{\frac{\hat{p}_1 \hat{p}_2}{n}}\right]$
If you want $p_1 \in [\hat{p}_1 - \varepsilon,\ \hat{p}_1 + \varepsilon]$:
$n \ge \frac{z_{1-\alpha/2}^2\, \hat{p}_1 \hat{p}_2}{\varepsilon^2}$
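The two sample-size bounds above translate directly into R. The function names `n_for_mean` and `n_for_proportion` are hypothetical, and rounding up with `ceiling()` is an added assumption so that the result is a whole number of experiments satisfying the inequality.

```r
# Number of experiments for a relative half-width eps around the mean
n_for_mean <- function(xbar, s, eps, conf_level = 0.9) {
  z <- qnorm(1 - (1 - conf_level) / 2)   # z_{1 - alpha/2}
  ceiling((z * s / (eps * xbar))^2)
}

# Number of experiments for an absolute half-width eps around a proportion
n_for_proportion <- function(p1, eps, conf_level = 0.9) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  ceiling(z^2 * p1 * (1 - p1) / eps^2)
}

# Mean of 10 and std. dev. of 2, target of +/- 5% at 90% confidence
n_mean <- n_for_mean(10, 2, eps = 0.05, conf_level = 0.9)

# Proportion of 0.75, target of +/- 0.1 at 99% confidence (illustrative target)
n_prop <- n_for_proportion(0.75, eps = 0.1, conf_level = 0.99)
```

Both bounds grow with 1/ε², so halving the target half-width multiplies the required number of experiments by four.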
Example

For the mean
$\bar{x} = 10$, $s = 2$
$\alpha = 0.1 \Rightarrow z_{0.95} = 1.64$
$\varepsilon = 0.05 \Rightarrow n \ge \left(\frac{1.64 \times 2}{10 \times 0.05}\right)^2 = 43.03$, i.e. at least 44 experiments

For a proportion
$n = 40$, $n_1 = 30 \Rightarrow \hat{p}_1 = 30/40 = 0.75$, $\hat{p}_2 = 0.25$
$\alpha = 0.01 \Rightarrow z_{0.995} = 2.58$
$\varepsilon^2 = 0.005$ (i.e. $\varepsilon \approx 0.07$) $\Rightarrow n \ge \frac{2.58^2 \times 0.75 \times 0.25}{0.005} \approx 250$
With $\varepsilon = 0.005$, the same formula gives $n \ge \frac{2.58^2 \times 0.75 \times 0.25}{0.005^2} \approx 49923$.

Conclusion

Computer science is also an experimental science.
There are different and complementary approaches to doing experiments in computer science.
Computer scientists often lack the tools and methods to perform insightful experiments:
General methodology
Performance analysis
Statistics and probability
Data analysis and representation