Computational Complexity

"Complexity" in the algorithm-analysis context means the cost of a program's execution (running time, memory, ...), rather than the cost of creating the program (number of statements, development time). In this sense, less-complex programs may require more development time.

Complexity: an introduction
– Time complexity: the time required to solve a problem of a particular size.
– Space complexity: the amount of computer memory the algorithm requires.

Several questions can be asked about a particular algorithm:
1. Is it correct? I.e., does it do what it claims to do when given valid input data?
2. How long does it take to run?
3. How much space is needed to execute it? I.e., how much storage is needed to hold data, arrays, program variables, etc.?

Functions associated with a program
Consider a program with one natural number as input. Two functions that can be associated with the program are:
– f(n), the function computed by the program
– T(n), the running time of the program

Possible size measures for T(n)
– Total number of bits used to encode the input.
– Number of data values in the input (e.g., the size of an array).
The second is viewed as an approximation to the first.

Primitive Operations
These are operations which we don't further decompose in our analysis. They are considered the fundamental building blocks of the algorithm, e.g. +, *, -, /, if( ). Typically, the time they take is assumed to be constant. This assumption is not always valid.

Is multiplication really "primitive"?
In doing arithmetic on arbitrarily large numbers, the size of the numerals representing those numbers may have a definite effect on the time required. Compare:

    2 x 2    vs.    26378491562329846 x 786431258901237

Size of Numerals
For a single number n as input, the size of the corresponding numeral is typically on the order of log(n).
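The log-size claim can be checked numerically. A minimal Java sketch (the class and helper names are ours, not from the notes) compares the actual digit count of a decimal numeral with the floor(log10 n) + 1 estimate:

```java
public class NumeralSize {
    // Number of decimal digits in n (n >= 1), found by formatting n as a string.
    static int digits(long n) {
        return String.valueOf(n).length();
    }

    // The same quantity estimated arithmetically as floor(log10(n)) + 1.
    static int digitsViaLog(long n) {
        return (int) Math.floor(Math.log10(n)) + 1;
    }

    public static void main(String[] args) {
        for (long n : new long[]{9, 10, 12345, 1_000_000}) {
            System.out.println(n + ": " + digits(n) + " digits, log estimate " + digitsViaLog(n));
        }
    }
}
```

The two agree for every positive input, which is why "size of the input numeral" and "log of the input value" are interchangeable up to a constant in the analysis that follows.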
For example, with a decimal numeral encoding:

    size(n) = number of digits of n = ceiling(log10 n)

where ceiling(x) here denotes the smallest integer greater than x (the "ceiling" of x).

Asymptotic Analysis
Typically, in measuring complexity we look at the growth rate of the time function rather than the value itself, and there is therefore a tendency to pay less attention to constant factors. This is called asymptotic analysis. It is only one part of the story; sometimes constants are important.

Step Counting
Using exact running time to measure an algorithm requires calibration based on the type of machine, clock rate, etc. Instead, we usually just count steps taken in the algorithm, often assuming each primitive takes one step. This usually gives us an accurate view of the growth rate.

Straight-Line Code

    x = x + 1;
    v = x / z;
    w = x + v;

3 operations, therefore 3 steps.

Loop Code

    for (int i = 0; i < n; i++) {   // the loop-control operations count as steps too
        x = x + 1;
        v = x / z;
        w = x + v;
    }

n iterations x 5 steps, plus 1 for initialization, = 5n + 1 steps.

Non-Numeric Loop Control

    for (Polylist L = A; !L.isEmpty(); L = L.rest()) {
        ... loop body ...
    }

Number of iterations = A.length().

Recursive Code

    fac(n) = n == 0 ? 1 : n * fac(n-1)

2n + 1 steps, if we count each multiply and subtraction as a step. Steps are involved in the overhead for function calls too; the number of such steps would also be proportional to n in this case.

Time and Space Requirements
Example: given a set X of n elements, some labeled "red" and some labeled "black", we would like to find the number of subsets of X that contain at least one red item. A set with n elements has 2^n subsets, so enumerating them takes at least 2^n steps: infeasible except for small values of n.

Finding the Largest Value in a Finite Sequence
The algorithm searches the sequence S(1), S(2), ..., S(N) for the largest value and returns it as LARGE.
1. [Initialization] Set I := 1, LARGE := S(1)   // I = current element under examination
2. [Find a larger value] If S(I) > LARGE, then LARGE := S(I)   // if a larger value is found, update LARGE
3. [Terminate?] If I = N, then STOP   // the largest value is LARGE
4. [Continue search] I := I + 1; go to Step 2

Execution on 24, 56, 19, 99, 41:
    I=1: LARGE = 24
    I=2: 56 > 24, LARGE = 56
    I=3: 19 < 56
    I=4: 99 > 56, LARGE = 99
    I=5: 41 < 99, DONE

Time/Space
To measure space requirements, determine the amount of memory required to hold variables, array elements, etc. To measure time requirements, count the number of instructions executed (the number of times each loop is executed, or the number of comparisons). Usually we are interested in the complexity of input selected from a particular class: we might select the input of size n that gives the best or worst time or space performance. We look at the best case and the worst case, and also the average case.

Definition
Let f and g be functions on {1, 2, 3, ...}. Write f(n) = O(g(n)), and say f(n) is of order at most g(n), if there exists a positive constant C such that |f(n)| <= C|g(n)| for all but finitely many positive integers n.

Example: n^2 + n + 3 <= 3n^2 + 3n^2 + 3n^2 = 9n^2; with C = 9, n^2 + n + 3 = O(n^2).
Example: 1 + 2 + ... + n = n(n+1)/2 = n^2/2 + n/2 <= n^2/2 + n^2/2 = n^2; deduce 1 + 2 + ... + n = O(n^2).
Example: what is the order of 3n^3 + 6n^2 - 4n + 2?

"O" Notation
"O" is the letter "oh" (for "order"). This is "big-oh"; "little-oh" has a different meaning. It is used to express upper bounds on running time (number of steps):

    T ∈ O(f) means there is a constant c such that T(n) <= c*f(n) for all n.

Think of O(f) as the set of functions that grow no faster than a constant times f.

Constants are irrelevant in "O" comparisons: if g is a function, then any function n => d*g(n), where d is a constant, is in O(g).
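The largest-value algorithm traced above (Steps 1 through 4) can be written as a short Java sketch; the class and method names are ours:

```java
public class FindLargest {
    // Steps 1-4 of the algorithm: scan S, keeping the largest value seen so far.
    static int findLargest(int[] s) {
        int large = s[0];                    // Step 1: I := 1, LARGE := S(1)
        for (int i = 1; i < s.length; i++) { // Steps 3-4: continue until I = N
            if (s[i] > large) {              // Step 2: found a larger value?
                large = s[i];
            }
        }
        return large;
    }

    public static void main(String[] args) {
        // The traced example: 24, 56, 19, 99, 41
        System.out.println(findLargest(new int[]{24, 56, 19, 99, 41}));  // prints 99
    }
}
```

Running it on the traced input reproduces the hand execution: LARGE is updated at 56 and again at 99, and 99 is returned.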
Examples:
– n => 2.5*n^2 is in O(n => n^2)
– n => 100000000*n^2 is in O(n => n^2)

Notation Abuse
It is customary to drop the "n =>" and just use the body expression of the anonymous function:
– O(n^2) instead of O(n => n^2)
– O(n) instead of O(n => n)
– O(1) instead of O(n => 1)
Examples:
– 2.5*n^2 ∈ O(n^2)
– 100000000*n ∈ O(n)
– 10000000000 ∈ O(1)

Answer to the earlier example: 3n^3 + 6n^2 - 4n + 2 <= 6n^3 + 6n^3 + 6n^3 + 6n^3 = 24n^3, so it is O(n^3).

Definition
If an algorithm requires f(n) units of time to terminate for an input of size n, and f(n) = O(g(n)), then the time required by the algorithm is of order at most g(n), or O(g(n)).

Example: Suppose an algorithm is known to require exactly n^2 + n + 3 units of memory for an input of size n. Since n^2 + n + 3 is O(n^2), the space required by this algorithm is O(n^2).

Searching an Unordered Sequence
The algorithm searches the sequence S(1), S(2), ..., S(N) for the first occurrence of KEY and returns its location I. If KEY is not found, the algorithm returns the value 0.
1. [Initialization] Set I := 1
2. [Not found?] If I > N, set I := 0 and terminate (search unsuccessful)
3. [Test] If KEY = S(I), then terminate (search successful)
4. [Continue search] I := I + 1; go to Step 2

Analysis of Searching an Unordered Sequence
Best case: if S(1) = KEY, Step 3 is executed once; runtime O(1).
Worst case: KEY is not in the sequence, and Step 3 is executed N times; runtime O(N).
Average case: if KEY is found in the Ith position, Step 3 is executed I times; if KEY is not in the sequence, Step 3 is executed N times. Hence the average is

    ((1 + 2 + ... + N) + N) / (N+1)
      = (N(N+1)/2 + N) / (N+1)
      <= (N(N+1)/2 + N + 1) / (N+1)
      = N/2 + 1 <= N + 1 <= 2N,

so the average runtime is O(N).

Normally, choose the tightest g(N) for O(g(N)). Beware of constants, though: comparing 300N with 5N^2, when N = 5 the second is smaller; when N = 500 the first is smaller.

Common Growth Rates

    Big Oh                Name
    O(1)                  Constant
    O(log_2(log_2(N)))    Log log
    O(log_2(N))           Logarithmic
    O(N)                  Linear
    O(N log_2(N))         N log N
    O(N^2)                Quadratic
    O(N^3)                Cubic
    O(N^k)                Polynomial
    O(m^N)                Exponential
    O(N!)                 Factorial

Examples

    void method1(int n) {
        for (int i = 1; i <= n; i++)
            System.out.println(n);
    }

method1's loop runs from 1 to n, printing each time: n steps to complete, so O(n).

    void method2(int n) {
        for (int i = 0; i < 2*n; i++)
            for (int j = 0; j < n; j++)
                System.out.println(i*j);
    }

method2 has two loops, inner and outer. For every fixed i in the outer loop, the inner loop runs from j = 0 to j = n-1 in steps of 1, so there are n inner steps for each fixed i. Since the outer loop runs from i = 0 to i = 2n-1, it executes 2n times. Together with the inner loop, the call to System.out.println executes 2n * n = 2n^2 times: O(n^2).

    void method5(int n) {
        for (int i = -2; i < n; i++)
            for (int j = 4; j < n-3; j++)
                System.out.println(j+i);
    }

Two loops, as in method2: the outer loop executes (2+n) times and the inner loop executes (n-7) times, so the inner println executes (2+n)(n-7) times: O(n^2).

    void method6(int n) {
        for (int i = 0; i < 10; i++)
            System.out.println(i*n);
    }

Executes exactly 10 times: O(1).

    int method8(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++)
            sum += i;
        return sum;
    }

Adds the numbers from 0 to n-1: n steps, so O(n).

Examples: Recursive

    void method7(int n) {
        if (n > 2) {
            method7(n-2);
            System.out.println(n);
        }
    }

Example: method7(10) calls method7(8), then method7(6), then method7(4), and so on: roughly n/2 calls, so O(n).

    void method3(int n) {
        if (n > 0) {
            method3(n-1);
            System.out.println(n);
            method3(n-1);
        }
    }

– method3(1) calls method3(0) twice, so if n = 1 the method makes two recursive calls.
– method3(2) calls method3(1) twice, each of which calls method3(0) twice (as we already figured out); therefore the method makes 2 * 2 = 4 recursive calls.
– method3(3) calls method3(2) twice. That call, as we already saw, makes 4 recursive calls, so method3(3) makes 2 * 4 = 8 recursive calls.
– method3(4) calls method3(3) twice.
That call, from before, makes 8 recursive calls, so method3(4) makes 2 * 8 = 16 recursive calls. The pattern should now be clear: if the input to method3 is n, the method makes about 2^n recursive calls, so this method is O(2^n). Instead of relying on "pattern recognition", one could use the principle of induction to mathematically prove that this is indeed the proper order of growth.

Induction Proof for the Complexity of method3
– Base: let n = 1; method3(1) calls method3(0), prints 1, calls method3(0) again, then returns: 2^1 = 2 steps.
– Induction hypothesis: assume method3(n) takes 2^n steps.
– Induction step: method3(n+1) makes 2 calls to method3(n), and this takes 2^n + 2^n = 2 * 2^n = 2^(n+1) steps.

Examples: Recursive

    void method4(int n) {
        if (n > 0) {
            System.out.println(n);
            method4(n / 2);
        }
    }

method4(16) makes 4 recursive calls to itself; if n = 64, it calls itself with 32, 16, 8, 4, 2, 1. In general, if the input is a power of 2, say n = 2^k, the method calls itself k times.
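The halving pattern can be confirmed by counting activations. In this sketch (names are ours), countCalls returns the number of invocations that do real work, i.e. those with n > 0; for n = 2^k that is k + 1, the k recursive calls plus the initial call:

```java
public class Method4Count {
    // Counts activations of the method4 pattern with a positive argument.
    static int countCalls(int n) {
        if (n <= 0) return 0;           // base case: no work done
        return 1 + countCalls(n / 2);   // this activation, plus the halved subproblem
    }

    public static void main(String[] args) {
        System.out.println(countCalls(16));   // 5 activations: 16, 8, 4, 2, 1
        System.out.println(countCalls(64));   // 7 activations: 64, 32, ..., 1
    }
}
```

Each doubling of n adds only one activation, which is the signature of logarithmic growth.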
Since n = 2^k means k = log_2(n), method4 is O(log n).

Summary
– method1: single loop, O(n)
– method2: double loop, O(n^2)
– method3: double recursion, O(2^n)
– method4: recursion with half the input, O(log n)
– method5: double loop, O(n^2)
– method6: no dependence on n, O(1)
– method7: simple recursion, O(n)
– method8: single loop, O(n)

Analysis
[Maple plot comparing 2^x, x^3, log_2(x), x, and 1 over x = -10..10, y = -10..10.]

Why it Matters

Running time as a function of complexity: values of various functions vs. the argument n (logs are base 2).

    f(n)      n=10       n=100       n=1000     n=10^4     n=10^5     n=10^6
    log n     3.32       6.64        9.97       13.3       16.6       19.9
    log^2 n   11.0       44.1        99.3       176.5      275.9      397.2
    sqrt n    3.16       10          31.6       100        316        1000
    n         10         100         1000       10^4       10^5       10^6
    n log n   33.2       664         9966       1.3*10^5   1.7*10^6   2.0*10^7
    n^1.5     31.6       10^3        3.2*10^4   10^6       3.2*10^7   10^9
    n^2       100        10^4        10^6       10^8       10^10      10^12
    n^3       1000       10^6        10^9       10^12      10^15      10^18
    2^n       1024       1.3*10^30   10^301     10^3010    10^30103   10^301030
    n!        3628800    9.3*10^157  10^2567    10^35659   10^456573  10^5565710

Even a computer a trillion times faster won't help in the lower-right region of this table.

Allowable problem size as a function of available time: the increase in the size of problem that can be run, based on the increase in allowed time, assuming the algorithm runs problem size 1 in time 1.

    f(n)      t=10   t=100   t=1000    t=10^4     t=10^5     t=10^6
    log n     1024   10^30   10^300    10^3000    10^30000   10^300000
    log^2 n   8      1024    3*10^9    1.2*10^30  1.5*10^95  1.1*10^301
    sqrt n    100    10^4    10^6      10^8       10^10      10^12
    n         10     10^2    10^3      10^4       10^5       10^6
    n log n   4.5    22      140       1000       7.7*10^3   6.2*10^4
    n^1.5     4.6    21      100       464        2154       10^4
    n^2       3      10      32        100        320        1000
    n^3       2      4       10        21         46         100
    2^n       3      6       9         13         16         19
    n!        3      4       6         7          8          9

Doubling Input Size

    Complexity   Doubling the input causes execution time to...
    O(1)         stay the same
    O(log n)     increase by an additive constant
    O(n^1/2)     increase by a factor of sqrt(2)
    O(n)         double
    O(n log n)   double, plus increase by a constant factor times n
    O(n^2)       increase by a factor of 4
    O(n^k)       increase by a factor of 2^k
    O(k^n)       square

Black-box vs. White-box Complexity
Black-box: we have a copy of the program, with no code; we can run it for different sizes of input.
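The O(n^2) row of the doubling table can be checked by counting steps rather than wall-clock time, which is essentially a black-box measurement. This sketch (names are ours) mirrors method2's double loop and compares the step counts before and after doubling the input:

```java
public class DoublingCheck {
    // Counts how many times the body of a method2-style double loop executes.
    static long countSteps(int n) {
        long steps = 0;
        for (int i = 0; i < 2 * n; i++) {
            for (int j = 0; j < n; j++) {
                steps++;                 // one step per inner-loop body execution
            }
        }
        return steps;                    // = 2 * n * n
    }

    public static void main(String[] args) {
        long t1 = countSteps(100);       // 20000 steps
        long t2 = countSteps(200);       // 80000 steps
        System.out.println("ratio after doubling: " + (t2 / t1));  // prints 4
    }
}
```

The ratio of 4 after doubling is exactly what the table predicts for a quadratic algorithm.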
White-box (aka "clear box"): we have the code; we can analyze it.

Black-Box Complexity
Run the program for different sizes of data set and try to get a fix on the growth rate. What sizes? One approach is to repeatedly double the input size until testing becomes infeasible (e.g., it takes too long, or the program breaks): run on sizes 32, 64, 128, 512, ..., or 10, 100, 1000, 10000, .... For each n, record the time T(n). How can we estimate the order of the run time (e.g., O(n^2), O(n^3), etc.)?

Suppose we are trying to establish the hypothesis T(n) ∈ O(f(n)). From the definition, we know this means there is a constant c such that for all n, T(n) <= c*f(n). If our hypothesis is correct, then we expect T(n)/f(n) < c for all n, so we can simply examine this ratio:
– If T(n)/f(n) stays below some constant c, the hypothesis T(n) ∈ O(f(n)) is supported.
– If T(n)/f(n) is approximately constant as n increases, the bound appears to be tight.
– If T(n)/f(n) decreases as n increases, the bound is loose.
– If T(n)/f(n) increases, we don't have a bound.

Algorithm: Find the Largest Element in a Sequence of Integers
1. Set temp_max to the first element in the sequence (temp_max = the largest integer seen at every stage).
2. Compare the next integer to temp_max; if it is larger than temp_max, set temp_max equal to that integer.
3. Repeat Step 2 if there are more integers.
4. Stop when no integers are left; temp_max is the largest element in the sequence.

Analysis of Finding the Max Element of a Set with n Elements
There are 2 comparisons for each term: one to determine that the end of the list has not been reached, and another to determine whether to update the temporary maximum. That gives 2(n-1) comparisons (2 each for the second through the nth element) plus 1 to exit the loop: 2(n-1) + 1 = 2n - 1 comparisons. Hence the order of the algorithm is polynomial; specifically, it is linear: O(n).

Linear Search for a Particular Element in a List
Worst-case analysis, counting comparisons: each pass through the loop makes 2 comparisons, one to see if we are at the end of the list and one to compare the element with the current term of the list, and one comparison is made outside the loop. A search that succeeds at position i makes 2i + 1 comparisons; when the element is not in the list, at most 2n + 2 comparisons are made (2n in the loop, 1 to exit the loop, 1 outside the loop). Once again this is polynomial; specifically, it is linear: O(n).
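The comparison counts above (2i + 1 on success at position i, 2n + 2 on failure) can be reproduced with an instrumented Java sketch. The names are ours, and the counter placement models the counting in the notes, including the single comparison made outside the loop:

```java
public class SearchCount {
    static int comparisons;

    // Linear search for key in a; returns the 1-based position, or 0 if absent.
    static int search(int[] a, int key) {
        comparisons = 1;                 // the one comparison made outside the loop
        int i = 0;
        while (true) {
            comparisons++;               // "are we at the end of the list?"
            if (i >= a.length) return 0; // exit: key not found
            comparisons++;               // compare key with the current element
            if (a[i] == key) return i + 1;
            i++;
        }
    }

    public static void main(String[] args) {
        int[] a = {24, 56, 19, 99, 41};
        System.out.println(search(a, 19) + " found, " + comparisons + " comparisons");   // 3 found, 7 comparisons
        System.out.println(search(a, 7) + " (absent), " + comparisons + " comparisons"); // 0 (absent), 12 comparisons
    }
}
```

Finding 19 at position 3 costs 2*3 + 1 = 7 comparisons, and a failed search over n = 5 elements costs 2*5 + 2 = 12, matching the analysis exactly.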