SQUARE 2 MAGAZINE

Number 17. December 2008.

Welcome to our latest edition. We remind you that news about what you are doing should be sent to the Editor, David Penman, email address [email protected] Note that the web address has changed slightly in the not-too-distant past: it is now http://www.essex.ac.uk/maths/dept/square2/index.shtm. You will be able to access all copies of the magazine there.

Departmental News

Several postgraduate students in the Department have recently been approved for their Ph.D.s. (In some cases, the award may yet be subject to minor corrections to the thesis.) Rong Gao, who was an undergraduate student in the Department, was recently awarded her Ph.D. for a thesis entitled “Some colouring problems for pseudo-random graphs”. Dr. Penman was her supervisor. Also recently awarded a Ph.D. (to go with an earlier one in geophysics) is Dan Brawn, for a thesis on fitting gamma distributions to observed drop size distributions. Prof. Upton was his supervisor. Tim Earl was jointly supervised with Biology, and his thesis was on levels of nutrients in water. He was supervised by Prof. Upton and Prof. Nedwell (Biology). Caroline Johnston has also recently received her Ph.D. for a thesis about microarray analysis. Dr. Harrison was her supervisor.

Wavelets and their applications

The aim of this article is to introduce, in a rather informal way, the theory of wavelets, which has been of some importance recently. We will also talk about some of the areas in which these ideas are applied. The main sources culled for this article are the websites http://www.amara.com/IEEEwave/IEEEwavelet.html and http://www.pacm.princeton.edu/~ingrid/parlez-vous%20wavelets.pdf for basics, with some additional corroborative detail taken mainly from the textbook by Daubechies. These websites suggest further reading too.

Once upon a time there was Fourier analysis.
Remember the idea: suppose you are given a function (in engineer-speak, a signal) f(t) which has period (let us say) 2π; the period doesn’t really matter, it’s just a matter of scaling. What functions can you think of with period 2π? If you don’t think of cos(nt) and sin(nt) for integers n, please change to Media Studies; if you don’t then think of linear combinations of these functions, please take a course in linear algebra. At this point, it becomes rather more justifiable to run out of ideas: the rough idea is that all “sensible” periodic functions can be obtained as infinite sums of these functions with appropriate weights. If we are going to write

    f(t) = a_0 + \sum_{n=1}^{\infty} \left( a_n \cos(nt) + b_n \sin(nt) \right)    (*)

then (if one is suitably cavalier about exchanging infinite sums and integrals, and uses standard trigonometric identities) it is not hard to show that we “must” have, for n > 0,

    a_n = \frac{1}{\pi} \int_0^{2\pi} f(x)\cos(nx)\,dx, \qquad b_n = \frac{1}{\pi} \int_0^{2\pi} f(x)\sin(nx)\,dx, \qquad a_0 = \frac{1}{2\pi} \int_0^{2\pi} f(x)\,dx.

One can then show various rigorous results on when the series on the right-hand side of (*) converges, and whether it converges to f(t): we will mostly avoid concerns of rigour in this article, but cannot resist mentioning a key (and hard) result of Carleson, proved as recently as 1966, that for “square-integrable” functions on [0,1] (a class which comfortably includes continuous functions) the partial sums converge almost everywhere (i.e. except on a negligible (measure zero) set) to the original function. Thus the rough idea is largely true, and in some sense reduces the study of the (possibly very complex) function to understanding the simpler set of coefficients a_n and b_n. (One can do better for some special functions: e.g. for a function which is twice differentiable with continuous second derivative, the Fourier series converges (uniformly) for all values of t, and we will work through this later.)
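The coefficient formulas are easy to experiment with numerically. The sketch below (the smooth periodic test signal e^{cos t} is our own illustrative choice, nothing canonical) computes a_n and b_n by a plain Riemann sum, which is extremely accurate for a smooth periodic integrand, and checks that a ten-term partial sum already reproduces the function very closely:

```python
import numpy as np

# sample the signal on [0, 2*pi)
M = 4096
x = np.arange(M) * 2 * np.pi / M
dx = 2 * np.pi / M
f = np.exp(np.cos(x))        # a smooth 2*pi-periodic test signal (our choice)

def fourier_coeffs(fx, N):
    """a_0 and a_n, b_n for n = 1..N, straight from the integral formulas."""
    a0 = fx.sum() * dx / (2 * np.pi)
    a = np.array([(fx * np.cos(n * x)).sum() * dx / np.pi for n in range(1, N + 1)])
    b = np.array([(fx * np.sin(n * x)).sum() * dx / np.pi for n in range(1, N + 1)])
    return a0, a, b

def partial_sum(a0, a, b, t):
    n = np.arange(1, len(a) + 1)
    return a0 + (a * np.cos(n * t)).sum() + (b * np.sin(n * t)).sum()

a0, a, b = fourier_coeffs(f, 10)
err = max(abs(partial_sum(a0, a, b, t) - np.exp(np.cos(t)))
          for t in np.linspace(0.0, 2 * np.pi, 7))
print(a0, err)    # ten terms already give a very small error
```

The rapid decay of the coefficients of a smooth function (which we will meet again in the Problems Corner) is exactly what makes the ten-term sum so accurate here.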
The next thing one normally looks at is the Fourier transform for a general (not necessarily periodic) signal f(t). This is (ignoring detailed convergence questions, and up to small tweakings of sign conventions and constant factors; of course i = \sqrt{-1})

    (\mathcal{F}f)(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\,dt.

This is the same idea: it is changing round to representing the frequency content of the function f. Note that if f is only non-zero on [0, 2π] then there is a very simple relationship between the numbers (\mathcal{F}f)(n) for integers n and the a_n and b_n in the Fourier series before.

A feature of Fourier analysis is that the Fourier transform of a highly peaked function tends to be flat, and vice versa. For example, if f(x) = e^{-ax^2}, which is (a constant times) the probability density function of a normal with mean 0 and variance 1/(2a), a function which gets more and more sharply peaked as a → ∞, then we can easily check that

    (\mathcal{F}f)(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-at^2} e^{-i\omega t}\,dt = \frac{1}{\sqrt{2\pi}}\, e^{-\omega^2/(4a)} \int_{-\infty}^{\infty} e^{-a(t + i\omega/(2a))^2}\,dt = \frac{1}{\sqrt{2a}}\, e^{-\omega^2/(4a)},

where you are urged to check the details of the argument; familiarity with simple properties of normals should see you through. This of course will be very flat as a → ∞. This feature can be problematic: it is not easy to read off information about “blips” in a graph from the Fourier coefficients. (A blip is obvious in the graph of the original function/signal, but since the transform of the blip will be nearly flat, it will be hard to detect if you are looking at the coefficients.) Partly, this is dealt with by “time-localisation”: “windowing” the function by instead choosing a “window function” g and calculating the windowed Fourier transform

    (\mathcal{F}_{\mathrm{win}} f)(\omega, t) = \int_{-\infty}^{\infty} f(s)\, g(s - t)\, e^{-i\omega s}\,ds.
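The Gaussian computation, and the peaked-versus-flat moral, can be checked numerically; the sketch below approximates the transform by a plain Riemann sum on a long interval (the interval length and grid size are arbitrary choices of ours):

```python
import numpy as np

def ft(f, omega, L=20.0, M=40001):
    """(1/sqrt(2 pi)) * integral_{-L}^{L} f(t) e^{-i omega t} dt, by Riemann sum."""
    t = np.linspace(-L, L, M)
    dt = t[1] - t[0]
    return (f(t) * np.exp(-1j * omega * t)).sum() * dt / np.sqrt(2 * np.pi)

for a in (1.0, 25.0):
    g = lambda t, a=a: np.exp(-a * t * t)
    for w in (0.0, 1.0, 2.0):
        exact = np.exp(-w * w / (4 * a)) / np.sqrt(2 * a)
        assert abs(ft(g, w) - exact) < 1e-8

# with a = 25 the transform varies very little over omega in [0, 2]:
# the more sharply peaked the function, the flatter its transform
print(abs(ft(lambda t: np.exp(-25 * t * t), 0.0)),
      abs(ft(lambda t: np.exp(-25 * t * t), 2.0)))
```

The two printed values for a = 25 are nearly identical, which is the "very flat" behaviour described in the text.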
The naïve guess for the window function g might well be a function which is (say) one on some interval [a,b] and zero elsewhere, but this upsets the smoothness of the functions, and so smoother window functions are desirable. Certainly it is desirable that both g and ĝ should be concentrated near zero, as then one can (informally) say that (\mathcal{F}_{\mathrm{win}} f)(\omega, t) will provide a description of the function near time t and frequency ω.

We now turn (at last…) to the wavelet transform. The absolutely basic summary of a wavelet transform is that it allows one to cut up data into different frequency components, and then analyse each component according to its scale. In mathematical terms, we will have

    (\mathcal{F}_{\mathrm{wav}} f)(a, t) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} f(s)\, \psi\!\left(\frac{s - t}{a}\right) ds

for a well-chosen function ψ(t) with the property that \int_{-\infty}^{\infty} \psi(t)\,dt = 0. We deliberately obfuscate for now the choice of ψ, which rejoices in the (sexist) title of the mother wavelet, but the flexibility will be part of the point. The functions

    \psi_{a,b}(s) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{s - b}{a}\right)

are the eponymous wavelets. The rough idea is that the time analysis is carried out with a contracted, high-frequency version of the wavelet and the frequency analysis with a dilated, low-frequency version of the wavelet. Crudely, one can see both the wood and the trees.

There are both similarities and differences between the wavelet transform and the Fourier transform. Both involve simplifying (in some sense) the description of a function by turning it into a simpler (in some ways) object. Both are, for those of a pure-mathematical bent, inner products of the function f being analysed with a family of functions with two labels. Both have well-defined, and well-understood, inversion formulae. Very importantly from the point of view of applications, both have discretized versions (the “discrete Fourier transform” and “discrete wavelet transform”) to use in practice, and even more importantly, these can be computed quickly and efficiently.
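To see the transform pick out a “blip” in exactly the way the windowing discussion promised, one can compute the defining integral directly. The so-called Mexican-hat function ψ(t) = (1 − t²)e^{−t²/2} used below is just one convenient mother wavelet (it does integrate to zero); the signal, the blip and the scales are our own illustrative choices:

```python
import numpy as np

def psi(t):
    """Mexican-hat mother wavelet: integrates to zero, localised near 0."""
    return (1 - t ** 2) * np.exp(-t ** 2 / 2)

s = np.linspace(0.0, 10.0, 4001)
ds = s[1] - s[0]
signal = np.sin(2 * np.pi * s)                             # slow background tone
signal = signal + 2.0 * np.exp(-((s - 5.0) / 0.05) ** 2)   # short "blip" at s = 5

def wav(a, b):
    """(1/sqrt(a)) * integral f(s) psi((s - b)/a) ds, by Riemann sum."""
    return (signal * psi((s - b) / a)).sum() * ds / np.sqrt(a)

# at a fine scale the transform is essentially zero except near the blip
print({b: round(abs(wav(0.05, b)), 5) for b in (2.0, 5.0, 8.0)})
```

A global Fourier coefficient would smear the blip across all frequencies; here the fine-scale coefficient at b = 5 dominates the others by a large factor, locating the blip in both time and scale.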
(In the case of the Fourier transform, this is the Fast Fourier transform, which could easily fill an edition of Square2 all by itself: for now, we merely note that the key idea in speeding up the process, due to Cooley and Tukey in the 1960s, of expressing the discrete Fourier transform of “size” N = N_1 N_2 (we omit the detailed definition of size) in terms of discrete Fourier transforms of “size” N_1 and N_2, was really little more than a trick used by Gauss in 1805 to interpolate the orbits of the asteroids Pallas and Juno. In fact the idea had also been discovered before Cooley and Tukey by Good and Yates in the context of experimental design and by Danielson and Lanczos in the context of X-ray scattering: see http://www.wisdom.weizmann.ac.il/~naor/COURSE/fft-lecture.pdf. We are digressing…)

There are, however, also key differences. A principal one is that wavelet functions are localized in space, whereas the sine and cosine functions used in Fourier analysis are not. This means that often, when we take the wavelet transform, we end up with functions which are, in some sense, “sparse”; this in turn leads to the applications we shall discuss below. Historically, the first occurrence of a wavelet was by Haar in (an appendix to) his thesis in 1909. (Yes, the same Haar, for those in the know, as the invariant measure on locally compact topological groups: see http://www-groups.dcs.st-and.ac.uk/~history/Biographies/Haar.html). By about the 1980s the mathematician Yves Meyer was interested in the problem, as was the geophysicist Jean Morlet (who was trying to model seismic phenomena, an archetypal home of “sudden shock” behaviour). Indeed Meyer constructed a wavelet with two good properties: smoothness (the mother function is basically differentiable infinitely often) and orthogonality (i.e. the actual wavelets are orthogonal functions).
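The orthogonality property is easiest to see for Haar's original wavelet: take ψ to be 1 on [0, 1/2), −1 on [1/2, 1) and 0 elsewhere; then the rescaled and shifted copies ψ_{j,k}(t) = 2^{j/2} ψ(2^j t − k) are orthonormal. A small sketch checking a few inner products (the fine dyadic grid makes the sums exact for these piecewise-constant functions):

```python
import numpy as np

J = 8                            # resolve everything on 2^J equal subintervals of [0,1)
N = 2 ** J
t = (np.arange(N) + 0.5) / N     # midpoints; every psi_{j,k} below is constant there

def haar(j, k):
    """psi_{j,k}(t) = 2^{j/2} psi(2^j t - k), sampled on the grid."""
    u = 2 ** j * t - k
    return 2 ** (j / 2) * (((0 <= u) & (u < 0.5)).astype(float)
                           - ((0.5 <= u) & (u < 1)).astype(float))

def inner(f, g):
    return (f * g).sum() / N     # exact for piecewise-constant integrands

pairs = [((0, 0), (0, 0)), ((0, 0), (1, 0)), ((1, 0), (1, 1)), ((2, 1), (2, 1))]
for (j1, k1), (j2, k2) in pairs:
    print((j1, k1), (j2, k2), round(inner(haar(j1, k1), haar(j2, k2)), 10))
```

Distinct wavelets give inner product 0 (either their supports are disjoint, or the finer one sees the coarser one as constant and averages it away), while each wavelet has norm 1.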
Eventually the Belgian mathematician Ingrid Daubechies extended the ideas of Meyer to get a family of wavelets which were not only smooth and orthogonal, but also had compact support. These points are very helpful in getting the theory to work in a slick way. What does one do with these things? One absolutely basic idea is to use them to “get rid of noise”. This follows on from the point above about being able to look at things on the scale appropriate to them. Take a wavelet transform of your original signal (function). Take the attitude that wavelet coefficients (the discretization referred to above is tacitly being used here) less than a certain magic number, the “threshold”, are just “noise” and should be set equal to zero. Take the inverse wavelet transform: hopefully you now have a “cleaned-up” version of the signal/function which you are better placed to look at. This applies in particular in music: for example, researchers at Yale University took a very old recording (made by the inventor Thomas Edison) of Brahms playing, in 1889, one of his Hungarian Dances [1]. It was recorded on a wax cylinder which partially melted: this, plus the primitive technology available at the time, made it extremely hard to hear Brahms. With the wavelet techniques, it became much clearer. Similarly, one can remove noise from visual images: the FBI in the United States is using a wavelet-based standard for computerising its fingerprint files (a difficult process, as obviously the detail in the images cannot be lost, but at the same time the image has to be compressed if it is to be sent by computer). An interesting recent idea is an attempt to distinguish the paintings of Van Gogh from those of other famous painters by examining the wavelet transforms of the paintings.
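The threshold-and-invert recipe just described can be sketched end to end in a few lines. The Haar transform below is the simplest possible choice of discrete wavelet transform, and the test signal, noise level and threshold are all illustrative choices of ours, not anybody's recommended settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_fwd(x):
    """Full Haar wavelet transform (length of x a power of two).
    Returns the overall average plus detail coefficients, finest level first."""
    x = np.asarray(x, dtype=float).copy()
    details, n = [], len(x)
    while n > 1:
        a = (x[0:n:2] + x[1:n:2]) / np.sqrt(2)   # local averages (coarse part)
        d = (x[0:n:2] - x[1:n:2]) / np.sqrt(2)   # local differences (details)
        details.append(d)
        x[:n // 2] = a
        n //= 2
    return x[:1], details

def haar_inv(avg, details):
    x = avg.copy()
    for d in reversed(details):                  # rebuild from the coarsest level up
        y = np.empty(2 * len(x))
        y[0::2] = (x + d) / np.sqrt(2)
        y[1::2] = (x - d) / np.sqrt(2)
        x = y
    return x

n = 1024
t = np.linspace(0.0, 1.0, n)
clean = np.where(t < 0.5, 0.0, 1.0) + 0.3 * np.sin(4 * np.pi * t)
noisy = clean + 0.1 * rng.standard_normal(n)

avg, det = haar_fwd(noisy)
det = [np.where(np.abs(d) > 0.3, d, 0.0) for d in det]   # kill sub-threshold coefficients
denoised = haar_inv(avg, det)

rmse = lambda u, v: float(np.sqrt(np.mean((u - v) ** 2)))
print(rmse(noisy, clean), rmse(denoised, clean))
```

With this (arbitrary) threshold the error against the clean signal drops noticeably; in practice the threshold would be chosen from an estimate of the noise level rather than plucked from the air.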
[1] In general, it is striking how many of the great composers of the late 19th and very early 20th centuries did leave some, albeit usually rather rudimentary, recordings in which they play, or conduct, their own works. The list includes, in addition to Brahms, (in no particular order) Mahler, Fauré, Verdi… Slightly later, there are various recordings by Debussy and Richard Strauss; Elgar conducted some of his pieces more than once (with notably different interpretations on the two occasions…). Some of the famous performers from early recording technology include the violinist Joseph Joachim (for whom Brahms wrote his Violin Concerto). You can apparently hear the Brahms recording we talk about in the main article at http://www.youtube.com/watch?v=BZXL3I7GPCY Note that crackly old recordings also extend to (e.g.) Browning and Tennyson reading their own poetry: see http://www.poetryarchive.org/poetryarchive/historicRecordings.do We are digressing…

(A painting is of course a 2-dimensional signal, if one measures each colour by (say) its frequency.) It has been a cliché amongst many people interested in art history for some time that Van Gogh’s paintings “feel different” from those of most other painters. See http://www.pacm.princeton.edu/~ingrid/VG_swirling_movie/ for this: there does seem to be some evidence that Van Gogh’s paintings are revealed as “different” from those of a number of other painters. Quite how far this idea will run is not yet clear, but again the basic idea of removing unimportant noise to find “hard-core structure” seems, at least in this case, to yield some suggestive results. Another key area is in quantum mechanics: indeed, one of the original impetuses for the development of the theory was the effort to understand coherent states.
For a comparatively straightforward example in this direction, look at http://arxiv.org/abs/cond-mat?papernum=9511063

Applications of any particular piece of theory, of course, include applications in other parts of mathematics. We indicate only two areas where wavelet ideas have been used. One of these is in so-called discrepancy theory. Suppose we have a collection of n points \{u_i\}_{i=1}^{n} in the unit square in 2 dimensions, so that u_i = (u_i^{(1)}, u_i^{(2)}) with u_i^{(j)} \in [0,1]. How well spread out can these points be: or, essentially equivalently, how far can the number of them in a given rectangle vary from the number one would expect to find in that rectangle? For \alpha = (\alpha_1, \alpha_2) \in [0,1] \times [0,1] define

    D(\alpha) = \left|\, \{u_i\}_{i=1}^{n} \cap [0, \alpha_1] \times [0, \alpha_2] \,\right| - n \alpha_1 \alpha_2,

i.e. the actual number in the bottom left-hand corner rectangle minus the number one would expect to be in that rectangle if they were evenly spread out. The basic lower bound estimate on this problem was proved by K. F. Roth: he showed that

    \log(n) = O\!\left( \int\!\!\int D(\alpha)^2 \, d\alpha_1 \, d\alpha_2 \right).

Montgomery (http://www.nato-us.org/analysis2000/papers/montgomery.pdf) states that the proof uses “a construction reminiscent of wavelets”, and also refers to work on related problems by Pollington which does use wavelets. A second area of application to other parts of mathematics is in numerical analysis, especially solving (partial) differential equations. To some extent the idea of looking at things on several scales (“multiscale methods”) had already taken hold in the numerical analysis community, and perhaps the progress here has been less striking than in the fields of image processing discussed earlier, but the fundamental idea of approximating a function by a small number of coefficients is of course a staple in approximation theory and numerical analysis, and there do seem to be some cases where the wavelet mentality helps.

Problems Corner

Recall our problems from last time.

Problem 1.
(This arises from the article on Borsuk’s problem in Edition 16.) Show carefully (using sups etc. correctly) that a 1-dimensional set of diameter 1 can be written as the union of 2 sets of diameter strictly less than 1. Not hard, but try to make sure you get all the details in.

Solution. Since the diameter is finite, the set (S, say) must in particular be non-empty. (Otherwise the supremum in the definition of diameter would be over an empty set, and by convention the supremum of an empty set is minus infinity.) Let s \in S: as the diameter is 1, we have d(z, s) \le \sup_{x, y \in S} d(x, y) = 1 for all z \in S. (Here of course d is just the normal distance between two points on the line.) In particular S \subseteq [s - 1, s + 1], so S is bounded. Thus (as S is non-empty and bounded) it has a (finite) infimum m and a (finite) supremum M. The idea is now basically that we will partition S into the two sets S \cap [m, (m + M)/2) and S \cap [(m + M)/2, M] (it is obvious that this is indeed a partition of S) and show that both these sets are indeed of smaller diameter. Note that m < M, as otherwise S would consist only of the common value m = M and so would not have diameter 1. Next note that, given \varepsilon > 0, there are x, y \in S such that x \le m + \varepsilon and y \ge M - \varepsilon. Since m < M, we can, by taking \varepsilon small enough, ensure that x < y. Also x - \varepsilon \le m and y + \varepsilon \ge M. Thus

    d(m, M) \le d(x - \varepsilon, y + \varepsilon) = (y + \varepsilon) - (x - \varepsilon) = y - x + 2\varepsilon = d(x, y) + 2\varepsilon \le 1 + 2\varepsilon,

where in the last step we used the fact that S has diameter 1. This holds for all \varepsilon > 0, so we conclude that d(m, M) \le 1. Hence

    d\!\left(m, \frac{m + M}{2}\right) = \frac{M - m}{2} \le \frac{1}{2} \quad \text{and} \quad d\!\left(\frac{m + M}{2}, M\right) = \frac{M - m}{2} \le \frac{1}{2},

so each of the two sets in the partition has diameter at most 1/2 < 1. All claims have now been proved.

Problem 2. (Again, arising from the article.) Prove that, for non-negative integers k \le n, we have

    \binom{n}{k} \le \left(\frac{ne}{k}\right)^k.

[Hint: First prove (easily) that \binom{n}{k} \le \frac{n^k}{k!}. Then think about the series for the exponential and how it might be relevant.]

Solution.
For the first step, note that, just crudely bounding each n - j above by n, we have

    \binom{n}{k} = \frac{n!}{k!(n-k)!} = \frac{n(n-1)(n-2)\cdots(n-k+1)}{k!} \le \frac{n^k}{k!}.

Thus it remains to prove that \frac{n^k}{k!} \le \left(\frac{ne}{k}\right)^k, or equivalently (noting that n and k are non-negative) that \frac{k^k}{k!} \le e^k. Now recall that e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}. Thus, taking x = k, one of the terms in the series for e^k is \frac{k^k}{k!}. Since all the terms in the series are non-negative, we deduce \frac{k^k}{k!} \le e^k as required.

Problem 3. (Unsolved problem.) This is, to the best of my knowledge, still unsolved.

New Problems

As usual, we do not want to know about your solutions to Problems 1 and 2, which are standard (in a sense). Your valid solutions to Problem 3 are very welcome, but the Editor will not be holding his breath.

Problem 1. (This arises from the article about Fourier analysis/wavelets earlier.) Show that a twice continuously differentiable function (on [-\pi, \pi], say; once one has the result for this interval, it is obviously very easy to move it to any other interval of your choice) has a Fourier series which is convergent everywhere. Somewhat more precisely, show that if f is a function on [-\pi, \pi] and

    a_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(t)\cos(kt)\,dt \ \text{ for } k \ge 0, \qquad b_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(t)\sin(kt)\,dt \ \text{ for } k \ge 1,

then we have that, for any x \in [-\pi, \pi], the series

    \frac{a_0}{2} + a_1\cos(x) + a_2\cos(2x) + \cdots + b_1\sin(x) + b_2\sin(2x) + \cdots

is convergent and that it converges to the original function f. You should adapt the level of rigour of your proof to the level you can cope with, and if you know about uniform convergence you should show that the convergence is also uniform. Hint: Use integration by parts (those acquainted with rigour: why is this justified?)
to show that having a continuous derivative implies

    \pi a_k = \int_{-\pi}^{\pi} f(t)\cos(kt)\,dt = -\frac{1}{k} \int_{-\pi}^{\pi} f'(t)\sin(kt)\,dt

and that having a continuous second derivative implies

    \int_{-\pi}^{\pi} f'(t)\sin(kt)\,dt = \frac{1}{k} \int_{-\pi}^{\pi} f''(t)\cos(kt)\,dt

(check that the boundary terms vanish: for the second step you will want f'(-\pi) = f'(\pi), which holds when f is the restriction of a 2\pi-periodic function). Now let M = \max_{[-\pi, \pi]} |f''(t)|. (Why does this maximum exist?) Deduce an upper bound on \left| \int_{-\pi}^{\pi} f''(t)\cos(kt)\,dt \right|, and so show that |a_k| \le \frac{2M}{k^2}. A similar bound holds for |b_k|: now how do we deduce that the series is convergent? (Rigourists: why is it uniformly convergent?)

You should note that we have not yet proven that it converges to the original function: for this, the usual approach is to take the function

    g(x) = \frac{a_0}{2} + a_1\cos(x) + a_2\cos(2x) + \cdots + b_1\sin(x) + b_2\sin(2x) + \cdots,

which we have just shown is (well-defined and) convergent, and show that its Fourier coefficients are the same as those of the original f. Now think why this should (with the above restrictions on the functions) lead to the two functions being equal. Those doing it rigorously will have a number of things to justify en route…

Problem 2. This idea is due to the Hungarian mathematician György Pólya (usually anglicised as George Pólya: http://www-gap.dcs.st-and.ac.uk/~history/Biographies/Polya.html). We are going to give an alternative proof of the fact that there are infinitely many primes, using Fermat numbers. Recall that the Fermat numbers are defined by F_n = 2^{2^n} + 1. Readers of Square2 should not need reminding that this is, so to speak, 2^{(2^n)} + 1 rather than the much smaller number (2^2)^n + 1. For example, F_0 = 3, F_1 = 5, F_2 = 17, F_3 = 257, F_4 = 65537. These numbers are (with varying degrees of ease) all checked to be prime numbers, and Fermat seems to have conjectured that all these numbers are prime. However Euler observed that

    F_5 = 2^{32} + 1 = 4294967297 = 641 \times 6700417,

and it turns out that the next few Fermat numbers after that are also known not to be prime.
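All the numerical facts just quoted about the small Fermat numbers are easy to confirm by machine (trial division is perfectly adequate at this size):

```python
from math import gcd

def fermat(n):
    """F_n = 2^(2^n) + 1."""
    return 2 ** (2 ** n) + 1

def is_prime(m):
    """Naive trial division: fine for numbers of this size."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

print([fermat(n) for n in range(5)])   # [3, 5, 17, 257, 65537]
assert all(is_prime(fermat(n)) for n in range(5))
assert not is_prime(fermat(5))
assert fermat(5) == 641 * 6700417      # Euler's factorisation

# distinct Fermat numbers turn out to be pairwise coprime
assert all(gcd(fermat(m), fermat(n)) == 1
           for m in range(8) for n in range(m + 1, 8))
```

The final check, that distinct Fermat numbers share no common factor, is exactly the observation the problem turns on: try to prove it before reading the hints.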
It remains an open problem whether there are infinitely many Fermat numbers which are prime, or indeed whether there are infinitely many Fermat numbers which are composite, and both questions seem very hard. However, our problem for today is much simpler, namely to use the Fermat numbers to show that there are infinitely many primes. Try doing this one without the hints first. Hints: explain first why it is enough to show that the numbers F_n and F_m, for m \ne n, are coprime. Without loss of generality, n > m. Consider F_n - 2. Factorise (hard). What small number would any common divisor of F_n and F_m have to divide? Finish off. For bonus marks, those who know a little more number theory might like to upgrade their proof to a proof that there are infinitely many prime numbers congruent to 1 mod 4. You will need to know about the quadratic character of -1 (including what this means…). (There are many proofs that there are infinitely many primes: see http://primes.utm.edu/notes/proofs/infinite/ for a list of some of them, and the book by Ribenboim for more.)

Problem 3 (unsolved problem). If you, like many of us, are frequently tempted to believe that everything about finite-dimensional linear algebra is already known, consider the following simple-to-state conjecture due to G. C. Rota.

Problem. Suppose V is an n-dimensional vector space, and suppose B_1, B_2, \ldots, B_n are n disjoint bases of V. Then there are n disjoint bases C_1, C_2, \ldots, C_n of V such that |C_i \cap B_j| = 1 for all i and j.

An equivalent formulation is: suppose we are given n^2 vectors in V, and that one can arrange them as an n \times n matrix in such a way that each column is a basis. Then the entries inside the columns can be permuted in such a way that each row is also a basis. There are slightly more general forms of the conjecture too. Those who have the word “transversal” in their vocabulary can reformulate the conjecture in that language.
Those who have the word “matroid” in their vocabulary can ask the more general question in that language. However the problem can be savoured and thought about even without these refinements. The conjecture is true for n = 1, and even you lot should be able to do that one. The case n = 2 is also quite easy. Let the first basis be \{e_1, e_2\} and the second \{f_1, f_2\}. The basic observation to bear in mind is that in a 2-dimensional vector space, given any non-zero vector v, any vector which is not a scalar multiple of v can be used to extend v to a basis. Now if f_1 is a scalar multiple of e_1, then f_2 cannot be a scalar multiple of e_1 (otherwise the span of \{f_1, f_2\} would just be 1-dimensional), so \{e_1, f_2\} is a basis; moreover f_1, being a non-zero multiple of e_1, is not a scalar multiple of e_2, so \{e_2, f_1\} is also a basis, and we may take these two as our transversal bases. The other case is when f_1 is not a scalar multiple of e_1: then \{e_1, f_1\} is a basis, and the only way in which \{e_2, f_2\} could fail to be a basis is if f_2 is a scalar multiple of e_2. In that case f_2 is not a scalar multiple of e_1, so \{e_1, f_2\} is a basis, and f_1 is not a scalar multiple of e_2 (if it were, f_1 and f_2 would both be multiples of e_2, contradicting their independence), so \{e_2, f_1\} is a basis, and we can take \{e_1, f_2\} and \{e_2, f_1\} as before. However even the case n = 3 appears to be non-trivial: see http://www-math.mit.edu/~tchow/dinitz.pdf

In general, it is known (thinking in terms of permuting the vectors in the columns to make as many rows as possible bases too) that one can obtain roughly \sqrt{n} of the rows being bases. The case where n is even and the field over which the vector space is defined has characteristic zero (i.e. no finite subfield) is a consequence of the following conjecture. For a positive even integer n, we consider Latin squares of order n, i.e. n \times n matrices all of whose entries are from \{1, 2, \ldots, n\} with, for each 1 \le i \le n, precisely one occurrence of i in each row and precisely one occurrence of i in each column.
A Latin square is either even or odd. To make clear which is which, consider each row of the Latin square as a permutation (we are using here notation and ideas discussed in Square 2 Edition 11): it is then a product of disjoint cycles, the sign of a permutation is the product of the signs of its disjoint cycles, and, cutting a long story short, the sign of a cycle (a_1, \ldots, a_r) is -1 if r is even and 1 if r is odd. If we then take the product of the signs of all the rows of our Latin square, we will clearly get 1 or -1. If we get 1, the Latin square is even; if -1, the Latin square is odd. We then have the following

Conjecture. For even positive n, the number of even Latin squares of order n and the number of odd Latin squares of order n are not equal.

As just noted, this conjecture would imply that the case n even and characteristic zero of Rota’s conjecture is true. The problem turns out to be closely linked with certain combinatorial problems related to the so-called Combinatorial Nullstellensatz developed by Noga Alon and co-workers (another topic which could easily fill an edition of Square 2…), and this link allows one to prove the conjecture whenever n = 2^r p or 2^r (p + 1) for an odd prime number p. See http://garden.irmacs.sfu.ca/?q=op/rotas_basis_conjecture by Matt de Vos for more information on this.
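The definitions above are easy to put to work for the smallest interesting even case, n = 4: the sketch below brute-forces all 4 × 4 Latin squares (with symbols 0 to 3) and sorts them into even and odd using the row-sign convention just described. This is an illustrative check of the definitions, not an attack on the conjecture:

```python
from itertools import permutations

def sign(p):
    """Sign of a permutation of {0,...,n-1}, from its disjoint cycles:
    a cycle of even length contributes -1, a cycle of odd length +1."""
    seen, s = [False] * len(p), 1
    for i in range(len(p)):
        if seen[i]:
            continue
        length, j = 0, i
        while not seen[j]:
            seen[j] = True
            j = p[j]
            length += 1
        if length % 2 == 0:
            s = -s
    return s

n = 4
perms = list(permutations(range(n)))
clash = lambda p, q: any(a == b for a, b in zip(p, q))  # repeated entry in some column

even = odd = 0
for r0 in perms:                    # each row of a Latin square is a permutation;
    for r1 in perms:                # the column condition says rows never "clash"
        if clash(r0, r1):
            continue
        for r2 in perms:
            if clash(r2, r0) or clash(r2, r1):
                continue
            for r3 in perms:
                if clash(r3, r0) or clash(r3, r1) or clash(r3, r2):
                    continue
                if sign(r0) * sign(r1) * sign(r2) * sign(r3) == 1:
                    even += 1
                else:
                    odd += 1

print(even, odd)
```

The two counts sum to 576, the well-known total number of 4 × 4 Latin squares, and they are certainly not equal, so the conjecture holds for n = 4 under this sign convention.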