* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Materials discussed during the two review sessions
Survey
Document related concepts
Transcript
Review Session Jehan-François Pâris Agenda Statistical Analysis of Outputs Operational Analysis Case Studies Linear Regression How to use this presentation Most problems have One slide stating the problem One slide explaining how to solve the problem One slide allowing you to check your answer You will learn more by trying first to do the problems on your own than by reading their solutions Do not forget either to review the problems in the original notes Statistical Analysis of Outputs The big picture The problems Constructing confidence intervals Handling auto correlated data The tools Central-Limit Theorem Wilson’s formula Batch means (and regeneration) RNG tricks Confidence Intervals Distinguish between CIs for means CSIM does it for you CIs for proportions We are on our own Major issue is independence of data points CSIM uses batch means Central Limit Theorem If the n mutually independent random variables x1, x2, …, xn have the same distribution, and if their mean m and their variance s2 exist then … Central Limit Theorem The random variable 1 n xi m n i 1 s n is distributed according to the standard normal distribution (zero mean and unit variance). CI for means (I) For large values of n, the (1-)% confidence interval for m is given by z / 2s z / 2s ,x x n n with F ( z / 2 ) 1 2 CI for means (II) F(z) is taken from a table of the normal distribution F(0.025) For smaller values of n, we have to use Student’s t random variable Wider = 1.96 CIs We replace s by the sample standard deviation s Example We have 100 observations for the waiting time xbar = 4.25 minutes s2 = 25 Example We have 100 observations for the waiting time xbar = 4.25 minutes s2 = 25 Answer is 4.25 ± 1.96 sqrt(25/100) = 4.25 ± 0.98 CI for proportions A proportion represents the probability P(X) for some fixed threshold 97% of our customers have to wait less than one minute Distributed according to a binomial law Use Wilson’s formula Wilson’s formula When n > 29, we can use the Wilson’s interval z2 / 2 z2 / 2 qˆ (1 qˆ ) z2 / 2 qˆ (1 qˆ ) z2 / 2 qˆ z / 2 qˆ z / 2 2 2 2 n n 2 n n 4n q 4n 1 P z2 / 2 z2 / 2 1 1 n n where z/2 = 1.96 for a 95% C.I. Example We have want to estimate the proportion of packets that wait more than four slots 400 observations 40 packets waited more than four slots Answer Divisor: 1 + 1.962/400 1.01 (instead of 1.0096) Central term 0.1 + 1.962/(2×400) 0.105 (instead of 1.048) Half width (0.1×0.9)/400 + 1.962/(2×4002) ) sqrt (0.09/400 + (4/800)/400) 1/20 sqrt (0.09 +0.0025) 0.3/20 = 0.015 sqrt( Result is (0.105 ± 0.015)/ 1.01 = 0.104 ± 0.015 Batch means (I) Simulation data are often autocorrelated Packet delays in ALOHA Waiting times in queues … Batch means reduce (but do not completely eliminate) that effect Batch means (II) Group measurements into fixed-size batches of consecutive data Compute mean of each batch If batches are large enough, these means will be independent Can use standard-limit theorem, … In case of doubt, compute autocorrelation function for successive batch means Regeneration (I) The idea Partition simulation data into intervals such that Data measured inside the same interval might be correlated Data measured in different intervals are independent Regeneration (II) How? System goes to a regeneration point each time Its queues become empty All the disk drives are operational … Criterion is system specific Streams When you want to evaluate two different configurations of a system, it is often good idea to use separate random number streams for arrivals and service times Arrival times remain unchanged when we change other parameters of the system Operational Analysis Single server (I) We can measure T the length of the observation period A the number of arrivals during the observation period B the total amount of busy times during the observation period C the number of completions during the observation period Single server (II) We can compute l = A/T X = C/T U = B/T S = B/C There are two ways to compute U U the arrival rate the output rate the utilization the mean service time = B/T = (C/T )(B/C) = XS In general A C and l X Little’s law If W is the total time spent by all tasks inside the system over the observation period, then N = W/T R = W/C Since W/T = (C/T)(W/C) = XR, N = XR This is important A problem An ice-cream parlor Observed during 6 hours Visited by 120 customers Spend an average of 24 minutes inside What is the average number of customers inside the parlor? Answer We compute X and apply Little’s Law Answer We compute X and apply Little’s Law X = 120/6 = 20 customers/hour R = 24 minutes = 0.4 hours N = XR = 8 customers If you did not get it The 120 customers sent a total of 120×24 customer×minutes or 48 customer×hours in the parlor 48 customer×hours/6 hours = 8 customers Same as having 8 customers spending six hours each inside the parlor Network of servers (I) Open network Arrivals Departures Network of servers (II) Closed network Arrivals Departures Operational Quantities Keep same quantities as before but add indices 0 for whole system k for individual servers Two changes We never care about the utilization of the whole system We add number of visits Vk of each server Operational quantities Over the observation period, we measure C = the number of job completions Ck = the number of tasks completed by device k We define X0 = C/T = the system throughput Xk = Ck/T = the output rate at server k Vk = Ck/C = the visit count at server k Important relationships Ck = VkC Since each job requires Vk visits, there are Vk more server completions than job completions Xk = Vk X0 Same property applies to throughputs ni System response time (I) We define Nbar = average number of jobs in the system nbari = average number of jobs at device i Nbar = Σi nbari System response time (II) Applying Little’s law, we have R = Nbar/X0 and nbari = RiXi = RiViX0 Hence R = Σi ViRi Note This result is trivial The total time spent by a job in the system is the sum of the times spent at each server This includes the time spent waiting in the server queues Problem 1 A job requires 100 ms of CPU time 9 disk accesses Each disk access takes 7 ms We want VCPU and SCPU Answer We now that jobs get CPU first and last VCPU = 10 Then SCPU = 100/10 =10s Bottleneck analysis (I) A system has one CPU and one disk drive It processes transactions such that VCPU = 12 and SCPU = 5ms VDisk = 11 and SDISK = 8ms What is the maximum system throughput? Bottleneck analysis (II) We compute first the maximum device throughputs Maximum XCPU = 1/0.005 = 200 requests/s Maximum Xdisk = 1/0.008 = 125 requests/s Since Xi = Vi X0 Maximum throughput compatible with CPU workload is 200/12 = 16.7 transactions/s Maximum throughput compatible with disk workload is 125/11 = 11.4 transactions/s Bottleneck analysis (III) The disk is this the bottleneck It has highest ViSi product Identifying feature of any bottleneck device Increasing the system throughput might require Sharing disk requests with a second disk Increasing the efficiency of the system I/O buffer Problem 2 In the previous example, which device was the bottleneck? What would be the throughput of the system if the bottleneck utilization was 80%? Answer We compare VCPUSCPU VdiskSdisk Answer We compare VCPUSCPU = 100ms VdiskSdisk = 9×7 = 63 ms The CPU is the bottleneck Answer If the bottleneck was operating at 100% utilization, It could process one job each VCPUSCPU time units Or 1/(VCPUSCPU) job per time unit At UCPU utilization, It will process UCPU/(VCPUSCPU) job per time unit Answer X0 = UCPU/(VCPUSCPU) = 0.80/0.10 seconds 8 jobs/second Systems with terminals Whole system M Terminals Interactive response time formula We have M terminals Think time Z between the completion of a job and the submission of the next job Applying Little’s law to the whole system M = (R + Z ) X0 then R = M/X0 – Z Very Important Problem 3 We have M = 50 users Z = 20 s X0 = 2 transactions/s What is the system response time? Answer We apply R = M/X0 – Z Answer We apply R = M/X0 – Z and obtain R = 50/2 – 20 = 5 seconds Problem 4 A system Processes 5 transactions/seconds Has 60 users Achieves a response time of 4 seconds What is the think time? Answer We apply R = M/X0 – Z, Z = M/X0 – R Answer We apply R = M/X0 – Z, Z = M/X0 – R = 60/5 – 4 = 8 seconds Problem 5 We have M = 50 users Z = 20 s R = 4 s What is the system throughput? Answer From R = M/X0 – Z, we have X0 = (R + Z)/M Hence X0 = (20 + 4)/50 = 0.48 tasks/s Problem 6 A system Can process up to 4 transactions/second Has 60 users User think time is 12 seconds Can the system achieve a response time of 2 seconds? Answer Applying R = M/X0 – Z, we compute a lower bound for the response time Rmin = M/X0,max – Z Answer Applying R = M/X0 – Z, we compute a lower bound for the response time Rmin = M/X0,max – Z = 60/4 – 12 = 3 seconds Answer is no Problem 7 Compute the response time of a system knowing the following parameters M = 50 users Z = 15 s VCPU SCPU = 200ms UCPU = 50% Answer Since Xk = Uk /Sk and Xk = VkX0, X0 = Uk /(VkSk) The response time is then given by R = M/X0 – Z Answer Let us compute first the throughput X0 Applying X0 = Uk/(VkSk) X0 = 0.50/0.200 = 2.5 interactions/s The response time is then R = M/X0 – Z = 50/2.5 – 15 = 5 s Simulation Case Studies A simple reminder If interarrival times are Independent identically distributed (i. i. d.) According to an exponential law then the probability of having exactly n arrivals during a fixed interval is distributed according to a Poisson law Explanation (II) Assume that The probability of one arrival during a small interval Dt is lDt The probability of two arrivals during the same small time interval is negligible lDt lDt lDt lDt lDt lDt Explanation (I) The probability of having exactly k arrivals during n slots is n k nk (lDt ) (1 lDt ) k What would happen if the number of time intervals goes to infinity while their total duration T = nDt remains constant Explanation (III) We rewrite the previous expression as n! lT k lT n k ( ) (1 ) k!(n k )! n n n! (lT ) lT n lT k ( 1 ) ( 1 ) k n (n k )! k! n n k and compute separately the limits of its four factors Explanation (IV) lim n n! n(n 1)...( n k 1) 1 k k n (n k )! n (lT ) remains unchanged k! lT n lT lim n (1 ) e n lT k lim n (1 ) 1 n k Explanation (V) We obtain the Poisson distribution ( l T ) lT e k! k The probability that there are no arrivals in the same time interval T (or in any time interval T) is e lT Explanation (VI) This last expression is the probability that the time interval between two consecutive arrivals is greater than T The probability that the time interval between two consecutive arrivals is equal or lesser than T is 1 e lT which is the cdf of the exponential distribution A final observation Use the Poisson distribution to generate number of arrivals during a time interval Use the exponential distribution to generate interarrival times Linear Regression Most important point Compute a regression line Compute regression coefficient Example x[i] y[i] 1.5 3 2.4 4.8 3.5 9 4.4 11 5.1 13.7 Linear Regression We have one independent variable One dependent variable We must find Y = + bX minimizing the sum of squares of errors Si (yi - - bxi)2 Formulas b x y n x x ni xi yi i i 2 2 i y bx i i i i i Calculations (I) Sum Average Squared Sum x[i] y[i] x[i]^2 x[i]y[i] 1.5 3 2.25 4.5 2.4 4.8 5.76 11.52 3.5 9 12.25 31.5 4.4 11 19.36 48.4 5.1 13.7 26.01 69.87 16.9 41.5 65.63 165.8 3.38 8.3 285.6 Calculations (II) Sum Average Squared Sum n beta alpha x[i] y[i] x[i]^2 x[i]y[i] 16.9 41.5 65.63 165.8 3.38 8.3 285.6 5 3 -1.84 Outcome 14 12 10 8 6 4 2 1 2 3 4 5 6 More notations S xx (x x) 1 x n x ( x x ) ( y y ) 2 i i 2 2 i i i i S xy S yy 1 i xi yi i x i i y i n 2 i ( yi y ) 1 2 2 i yi i y i n i i i i More notations (II) Solution can be rewritten b S xy S xx y bx Coefficient of correlation r S xy S xx S yy bS xx S xx b S yy S xx S yy r = 1 would indicate a perfect fit r = 0 would indicate no linear dependency Calculations Sum Squared Sum n Sxx Syy Sxy r x[i] y[i] x[i]^2 y[i]^2 x[i]y[i] 16.9 41.5 65.63 421.7 165.8 285.6 1722 5 8.508 77.28 25.52 0.995