Smoothness of Densities on Compact Lie Groups

David Applebaum,
School of Mathematics and Statistics, University of Sheffield, Hicks Building, Hounsfield Road, Sheffield, England, S3 7RH
e-mail: D.Applebaum@sheffield.ac.uk

Abstract

We give necessary and sufficient conditions for both square integrability and smoothness for densities of a probability measure on a compact connected Lie group.

Keywords and Phrases. Lie group, Haar measure, unitary dual, Fourier transform, convolution operator, Lie algebra, weight, Sugiura space, smooth density, central measure, infinitely divisible, deconvolution density estimator.

AMS 2000 subject classification. 60B15, 43A05, 43A25, 43A77, 60E07

1 Introduction

The study of probability measures on groups provides a mathematical framework for describing the interaction of chance with symmetry. This subject is broad and interacts with many other areas of mathematics and its applications such as analysis on groups [19], stochastic differential geometry [6], statistics [5] and engineering [4].

In this paper we focus on the important question concerning when a probability measure on a compact group has a regular density with respect to Haar measure. We begin by reviewing work from [1] where Peter-Weyl theory is used to find a necessary and sufficient condition for such a measure to have a square-integrable density. This condition requires the convergence of an infinite series of terms that are formed from the (non-commutative) Fourier transform of the measure in question. We also describe a related result from [2] where it is shown that square-integrability of the measure is a necessary and sufficient condition for the associated convolution operator to be Hilbert-Schmidt (and hence compact) on the $L^2$-space of Haar measure.

In the second part of our paper we turn our attention to measures with smooth densities.
A key element of our approach is the important insight of Hermann Weyl that the unitary dual $\widehat{G}$ of the group $G$ can be parameterised by the space of highest weights. This effectively opens up $\widehat{G}$ to investigation by standard analytical methods. We introduce Sugiura's space of rapidly decreasing functions of weights, which was shown in [18] to be topologically isomorphic to $C^{\infty}(G)$. We are then able to prove that a probability measure has a smooth density if and only if its Fourier transform lives in Sugiura's space. This improves on results of [3] where the Sobolev embedding theorem was used to find sufficient conditions for such a density to exist.

In the last part of the paper we give a brief application to statistical inference. In [13], Kim and Richards introduced an estimator for the density of a signal on the group based on i.i.d. (i.e. independent and identically distributed) observations of the signal after it has interacted with an independent noise. To obtain fast rates of convergence to the true density, the noise should be in a suitable "smoothness class", where smoothness is here measured in terms of the decay of the Fourier transform of the measure. We show that the "super-smooth" class is smooth in the usual mathematical sense.

2 Fourier Transforms of Measures on Groups

Throughout this paper $G$ is a compact connected Lie group with neutral element $e$ and dimension $d$, $\mathcal{B}(G)$ is the Borel $\sigma$-algebra of $G$ and $\mathcal{P}(G)$ is the space of probability measures on $(G, \mathcal{B}(G))$, equipped with the topology of weak convergence. The role of the uniform distribution on $G$ is played by normalised Haar measure $m \in \mathcal{P}(G)$ and we recall that this is a bi-invariant measure in that

$$m(A\sigma) = m(\sigma A) = m(A),$$

for all $A \in \mathcal{B}(G)$, $\sigma \in G$. We will generally write $m(d\sigma) = d\sigma$ within integrals.

Our main focus in this paper is those $\rho \in \mathcal{P}(G)$ that are absolutely continuous with respect to $m$ and so they have densities $f \in L^1(G, m)$ satisfying

$$\rho(A) = \int_A f(\sigma)\,d\sigma,$$

for all $A \in \mathcal{B}(G)$.
A key tool which we will use to study these measures is the non-commutative Fourier transform, which is defined using representation theory. We recall some key facts that we need. A good reference for the material below about group representations, the Peter-Weyl theorem and Fourier analysis of square-integrable functions is Faraut [7].

If $H$ is a complex separable Hilbert space then $\mathcal{U}(H)$ is the group of all unitary operators on $H$. A unitary representation of $G$ is a strongly continuous homomorphism $\pi$ from $G$ to $\mathcal{U}(V_\pi)$ for some such Hilbert space $V_\pi$. So we have for all $g, h \in G$:

• $\pi(gh) = \pi(g)\pi(h)$,
• $\pi(e) = I_\pi$ (where $I_\pi$ is the identity operator on $V_\pi$),
• $\pi(g^{-1}) = \pi(g)^{-1} = \pi(g)^*$.

$\pi$ is irreducible if it has no non-trivial invariant closed subspace. Every group has a trivial representation $\delta$ acting on $V_\delta = \mathbb{C}$ by $\delta(g) = 1$ for all $g \in G$ and it is clearly irreducible. The unitary dual of $G$, $\widehat{G}$, is defined to be the set of equivalence classes of all irreducible representations of $G$ with respect to unitary conjugation. We will as usual identify each equivalence class with a typical representative element. As $G$ is compact, for all $\pi \in \widehat{G}$, $d_\pi := \dim(V_\pi) < \infty$ so that each $\pi(g)$ is a unitary matrix. Furthermore in this case $\widehat{G}$ is countable.

For each $\pi \in \widehat{G}$, we define co-ordinate functions $\pi_{ij}(\sigma) = \pi(\sigma)_{ij}$ with respect to some orthonormal basis in $V_\pi$.

Theorem 2.1 (Peter-Weyl) The set $\{\sqrt{d_\pi}\,\pi_{ij},\ 1 \le i, j \le d_\pi,\ \pi \in \widehat{G}\}$ is a complete orthonormal basis for $L^2(G, \mathbb{C})$.

The following consequences of Theorem 2.1 are straightforward to derive using Hilbert space arguments.

Corollary 2.1 For $f, g \in L^2(G, \mathbb{C})$:

• Fourier expansion.
$$f = \sum_{\pi \in \widehat{G}} d_\pi\, \mathrm{tr}(\widehat{f}(\pi)\pi),$$
where $\widehat{f}(\pi) := \int_G f(\sigma^{-1})\pi(\sigma)\,d\sigma$ is the Fourier transform of $f$.

• The Plancherel theorem.
$$||f||^2 = \sum_{\pi \in \widehat{G}} d_\pi\, |||\widehat{f}(\pi)|||^2,$$
where $|||\cdot|||$ is the Hilbert-Schmidt norm $|||T||| := \mathrm{tr}(TT^*)^{\frac{1}{2}}$.

• The Parseval identity.
$$\langle f, g \rangle = \sum_{\pi \in \widehat{G}} d_\pi\, \mathrm{tr}(\widehat{f}(\pi)\widehat{g}(\pi)^*).$$

If $\mu \in \mathcal{P}(G)$ we define its Fourier transform $\widehat{\mu}$ to be

$$\widehat{\mu}(\pi) = \int_G \pi(\sigma^{-1})\,\mu(d\sigma),$$

for each $\pi \in \widehat{G}$. For example if $\epsilon_e$ is a Dirac mass at $e$ then $\widehat{\epsilon_e}(\pi) = I_\pi$ and

$$\widehat{m}(\pi) = \begin{cases} 0 & \text{if } \pi \neq \delta, \\ 1 & \text{if } \pi = \delta. \end{cases}$$

If $\mu$ has a density $f$ then $\widehat{\mu} = \widehat{f}$ as defined in Corollary 2.1. If we take $G$ to be the $d$-torus $\mathbb{T}^d$ then $\widehat{G}$ is the dual group $\mathbb{Z}^d$ and the Fourier transform is precisely the usual characteristic function of the measure $\mu$ defined by $\widehat{\mu}(n) = \int_{\mathbb{T}^d} e^{-in\cdot x}\,\mu(dx)$ for $n \in \mathbb{Z}^d$, where $\cdot$ is the scalar product. Note that any compact connected abelian Lie group is isomorphic to $\mathbb{T}^d$.

Fourier transforms of measures on groups have been studied by many authors, see e.g. [12, 10, 9, 16] where proofs of the following basic properties can be found. For all $\mu, \mu_1, \mu_2 \in \mathcal{P}(G)$, $\pi \in \widehat{G}$,

1. $\widehat{\mu_1 * \mu_2}(\pi) = \widehat{\mu_2}(\pi)\widehat{\mu_1}(\pi)$,
2. $\widehat{\mu}$ determines $\mu$ uniquely,
3. $||\widehat{\mu}(\pi)||_\infty \le 1$, where $||\cdot||_\infty$ denotes the operator norm in $V_\pi$,
4. Let $(\mu_n, n \in \mathbb{N})$ be a sequence in $\mathcal{P}(G)$. Then $\mu_n \to \mu$ (weakly) if and only if $\widehat{\mu_n}(\pi) \to \widehat{\mu}(\pi)$ for all $\pi \in \widehat{G}$.

Remark. Most authors define $\widehat{\mu}(\pi) = \int_G \pi(\sigma)\,\mu(d\sigma)$. This has the advantage that Property 1 above will then read $\widehat{\mu_1 * \mu_2}(\pi) = \widehat{\mu_1}(\pi)\widehat{\mu_2}(\pi)$ but the disadvantage that if $\mu$ has density $f$ then $\widehat{\mu}(\pi) = \widehat{f}(\pi)^*$. It is also worth pointing out that the Fourier transform continues to make sense and is a valuable probabilistic tool in the case where $G$ is a general locally compact group (see e.g. [10, 9, 16]).

3 Measures With Square-Integrable Densities

In this section we examine the case where $\mu$ has a square-integrable density. The following result can be found in [1] and so we only sketch the proof here.

Theorem 3.1 $\mu$ has an $L^2$-density $f$ if and only if

$$\sum_{\pi \in \widehat{G}} d_\pi\, |||\widehat{\mu}(\pi)|||^2 < \infty.$$

In this case

$$f = \sum_{\pi \in \widehat{G}} d_\pi\, \mathrm{tr}(\widehat{\mu}(\pi)\pi(\cdot)).$$

Proof. Necessity is straightforward. For sufficiency define $g := \sum_{\pi \in \widehat{G}} d_\pi\, \mathrm{tr}(\widehat{\mu}(\pi)\pi)$. Then $g \in L^2(G, \mathbb{C})$ and by uniqueness of Fourier coefficients $\widehat{g}(\pi) = \widehat{\mu}(\pi)$.
Using Parseval's identity, Fubini's theorem and Fourier expansion, we find that for each $h \in C(G, \mathbb{C})$:

$$\int_G h(\sigma)g(\sigma)\,d\sigma = \sum_{\pi \in \widehat{G}} d_\pi\, \mathrm{tr}(\widehat{h}(\pi)\widehat{\mu}(\pi)^*) = \int_G h(\sigma)\,\mu(d\sigma).$$

This together with the Riesz representation theorem implies that $g$ is real valued and $g(\sigma)\,d\sigma = \mu(d\sigma)$. The fact that $g$ is non-negative then follows from the Jordan decomposition for signed measures.

See [1] for specific examples. We will examine some of these in the next section from the finer point of view of smoothness.

To study random walks and Lévy processes in $G$ we need the convolution operator $T_\mu$ in $L^2(G, \mathbb{C})$ associated to $\mu \in \mathcal{P}(G)$ by

$$(T_\mu f)(\sigma) := \int_G f(\sigma\tau)\,\mu(d\tau),$$

for $f \in L^2(G, \mathbb{C})$, $\sigma \in G$. For example $T_\mu$ is the transition operator corresponding to the random walk $(\mu^{*n}, n \in \mathbb{N})$. The following properties are fairly easy to establish.

• $T_\mu$ is a contraction.
• $T_\mu$ is self-adjoint if and only if $\mu$ is symmetric, i.e. $\mu(A) = \mu(A^{-1})$ for all $A \in \mathcal{B}(G)$.

The next result is established in [2].

Theorem 3.2 The operator $T_\mu$ is Hilbert-Schmidt if and only if $\mu$ has a square-integrable density.

Proof. Sufficiency is obvious by the Hilbert-Schmidt theorem: if $\mu$ has density $f \in L^2(G)$ then, by invariance of Haar measure, $T_\mu$ is an integral operator with kernel $k_\mu(\sigma, \tau) = f(\sigma^{-1}\tau)$, which lies in $L^2(G \times G)$. For necessity, suppose that $T_\mu$ is Hilbert-Schmidt. Then it has a kernel $k_\mu \in L^2(G \times G)$ and

$$(T_\mu f)(\sigma) = \int_G f(\tau)k_\mu(\sigma, \tau)\,d\tau.$$

In particular for each $A \in \mathcal{B}(G)$,

$$\mu(A) = T_\mu 1_A(e) = \int_A k_\mu(e, \tau)\,d\tau.$$

It follows that $\mu$ is absolutely continuous with respect to $m$ with density $f = k_\mu(e, \cdot)$.

Let $(\mu_t, t \ge 0)$ be a weakly continuous convolution semigroup in $\mathcal{P}(G)$ and write $T_t := T_{\mu_t}$. Then $(T_t, t \ge 0)$ is a strongly continuous contraction semigroup on $L^2(G, \mathbb{C})$ (see e.g. [11, 10, 14, 2]).

Corollary 3.1 The linear operator $T_t$ is trace-class for all $t > 0$ if and only if $\mu_t$ has a square-integrable density for all $t > 0$.

Proof. For each $t > 0$, if $\mu_t$ has a square-integrable density then $T_t = T_{\frac{t}{2}}T_{\frac{t}{2}}$ is the product of two Hilbert-Schmidt operators and hence is trace class. The converse follows from the fact that every trace-class operator is Hilbert-Schmidt.
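As an illustration of Theorem 3.1 and Theorem 3.2 in the simplest case, the following Python sketch takes $G$ to be the circle, so that $\widehat{G} = \mathbb{Z}$, $d_\pi = 1$ and $\widehat{\mu}(n) = \int e^{-inx}\mu(dx)$. It is only a numerical sanity check under stated assumptions, not part of the argument: the Brownian measure is assumed to have generator $\frac{1}{2}\frac{d^2}{dx^2}$ (so $\widehat{\mu_t}(n) = e^{-tn^2/2}$), and the function names `heat_coeff`, `l2_series` and `density` are hypothetical helpers introduced here.

```python
import numpy as np

def heat_coeff(n, t):
    """Fourier coefficient of the Brownian measure on the circle at time t,
    assuming generator (1/2) d^2/dx^2, i.e. kappa_n = n^2 (an assumption)."""
    return np.exp(-0.5 * t * n ** 2)

def l2_series(coeff, n_max):
    """Partial sum of sum_n |mu_hat(n)|^2 over |n| <= n_max (d_pi = 1 on the circle)."""
    n = np.arange(-n_max, n_max + 1)
    return float(np.sum(np.abs(coeff(n)) ** 2))

def density(coeff, n_max, x):
    """Truncated Fourier expansion f(x) = sum_n mu_hat(n) e^{inx}."""
    n = np.arange(-n_max, n_max + 1)
    return np.real(np.exp(1j * np.outer(x, n)) @ coeff(n))

t = 0.5
x = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)
f = density(lambda n: heat_coeff(n, t), 60, x)

# The series of Theorem 3.1 versus the squared L^2 norm of the density,
# integrated against normalised Haar measure dx / (2 pi).
series = l2_series(lambda n: heat_coeff(n, t), 60)
l2_norm_sq = float(np.mean(f ** 2))

# A Dirac mass at the identity has mu_hat(n) = 1 for all n: the partial sums
# grow like 2N + 1, so the series diverges and there is no L^2 density.
dirac_partial = [l2_series(lambda n: np.ones_like(n, dtype=float), N)
                 for N in (10, 100, 1000)]
```

On the circle the characters $e^{inx}$ diagonalise $T_\mu$, so the same series is the squared Hilbert-Schmidt norm of $T_\mu$; the comparison of `series` with `l2_norm_sq` is then just the Plancherel theorem for the truncated density.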
If for $t > 0$, $\mu_t$ has a square-integrable density and is symmetric, then by Theorem 3.2, $T_t$ is a compact self-adjoint operator and so has a discrete spectrum of positive eigenvalues

$$1 = e^{-t\beta_1} > e^{-t\beta_2} > \cdots > e^{-t\beta_n} \to 0 \quad \text{as } n \to \infty.$$

Furthermore by Corollary 3.1, $T_t$ is trace class and

$$\mathrm{Tr}(T_t) = \sum_{n=1}^{\infty} e^{-t\beta_n} < \infty.$$

Further consequences of these facts, including the application to small time asymptotics of densities, can be found in [2, 3].

4 Sugiura Space and Smoothness

In this section we will review key results due to Sugiura [18] which we will apply to densities in the next section. In order to do this we need to know about weights on Lie algebras and we will briefly review the necessary theory.

4.1 Weights

Let $\mathfrak{g}$ be the Lie algebra of $G$ and $\exp : \mathfrak{g} \to G$ be the exponential map. For each finite dimensional unitary representation $\pi$ of $G$ we obtain a Lie algebra representation $d\pi$ by

$$\pi(\exp(tX)) = e^{t\,d\pi(X)} \quad \text{for all } t \in \mathbb{R}.$$

Each $d\pi(X)$ is a skew-adjoint matrix on $V_\pi$ and

$$d\pi([X, Y]) = [d\pi(X), d\pi(Y)],$$

for all $X, Y \in \mathfrak{g}$. A maximal torus $\mathbb{T}$ in $G$ is a maximal commutative subgroup of $G$. Its dimension $r$ is called the rank of $G$. Here are some key facts about maximal tori.

• Any $\sigma \in G$ lies on some maximal torus.
• Any two maximal tori are conjugate.

Let $\mathfrak{t}$ be the Lie algebra of $\mathbb{T}$. Then it is a maximal abelian subalgebra of $\mathfrak{g}$. The matrices $\{d\pi(X), X \in \mathfrak{t}\}$ are mutually commuting and so simultaneously diagonalisable, i.e. there exists a non-singular matrix $Q$ such that

$$Q\,d\pi(X)\,Q^{-1} = \mathrm{diag}(i\lambda_1(X), \ldots, i\lambda_{d_\pi}(X)).$$

The distinct linear functionals $\lambda_j$ are called the weights of $\pi$.

Let Ad be the adjoint representation of $G$ on $\mathfrak{g}$. We can and will choose an Ad-invariant inner product $(\cdot, \cdot)$ on $\mathfrak{g}$. This induces an inner product on $\mathfrak{t}^*$, the algebraic dual of $\mathfrak{t}$, which we also write as $(\cdot, \cdot)$. We denote the corresponding norm by $|\cdot|$. The weights of the adjoint representation acting on $\mathfrak{g}$ equipped with $(\cdot, \cdot)$ are called the roots of $G$. Let $P$ be the set of all roots of $G$.
We choose a convention for positivity of roots as follows. Pick $v \in \mathfrak{t}$ such that $P \cap \{\eta \in \mathfrak{t}^*;\ \eta(v) = 0\} = \emptyset$. Now define $P_+ = \{\alpha \in P;\ \alpha(v) > 0\}$. We can always find a subset $Q \subset P_+$ so that $Q$ forms a basis for $\mathfrak{t}^*$ and every $\alpha \in P$ is a linear combination of elements of $Q$ with integer coefficients, all of which are either nonnegative or nonpositive. The elements of $Q$ are called fundamental roots.

It can be shown that every weight of $\pi$ is of the form

$$\mu_\pi = \lambda_\pi - \sum_{\alpha \in Q} n_\alpha \alpha,$$

where each $n_\alpha$ is a non-negative integer and $\lambda_\pi$ is a weight of $\pi$ called the highest weight. Indeed if $\mu_\pi$ is any other weight of $\pi$ then $|\mu_\pi| \le |\lambda_\pi|$. The highest weight of a representation is invariant under unitary conjugation of the latter and so there is a one-to-one correspondence between $\widehat{G}$ and the space of highest weights $D$ of all irreducible representations of $G$. We can thus parameterise $\widehat{G}$ by $D$ and this is a key step for Fourier analysis on nonabelian compact Lie groups. In fact $D$ can be given a nice geometrical description as the intersection of the weight lattice with the dominant Weyl chamber, but in order to save space we won't pursue that line of reasoning here. From now on we will use the notation $d_\lambda$ interchangeably with $d_\pi$ to denote the dimension of the space $V_\pi$ where $\pi \in \widehat{G}$ has highest weight $\lambda$. For a more comprehensive discussion of roots and weights, see e.g. [8] and [17].

4.2 Sugiura Theory

The main result of this subsection is Theorem 4.1 which is proved in [18]. Let $M_n(\mathbb{C})$ denote the space of all $n \times n$ matrices with complex entries and $\mathcal{M}(G) := \bigcup_{\lambda \in D} M_{d_\lambda}(\mathbb{C})$. We define the Sugiura space of rapid decrease $\mathcal{S}(D)$ to be the set of all $F : D \to \mathcal{M}(G)$ such that

(i) $F(\lambda) \in M_{d_\lambda}(\mathbb{C})$ for all $\lambda \in D$,
(ii) $\lim_{|\lambda| \to \infty} |\lambda|^k\, |||F(\lambda)||| = 0$ for all $k \in \mathbb{N}$.

$\mathcal{S}(D)$ is a locally convex topological vector space with respect to the seminorms $||F||_s = \sup_{\lambda \in D} |\lambda|^s\, |||F(\lambda)|||$, where $s \ge 0$.
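The rapid-decrease condition (ii) and the seminorms $||F||_s$ can be explored numerically. The Python sketch below is a hedged illustration only: it assumes $G = SU(2)$ with $D$ labelled by $n = 0, 1, 2, \ldots$, normalised so that $|\lambda| = n$ and $d_\lambda = n + 1$ (a choice of normalisation made here, not taken from the text), truncates the supremum at a finite `n_max`, and uses two made-up test families `fast` and `slow` of scalar multiples of the identity.

```python
import numpy as np

def hs_norm(a):
    """Hilbert-Schmidt norm |||A||| = tr(A A^*)^{1/2}."""
    return float(np.sqrt(np.trace(a @ a.conj().T).real))

def seminorm(F, s, n_max):
    """Truncated seminorm sup_{1 <= n <= n_max} n^s |||F(n)|||.
    Assumes |lambda| = n and d_lambda = n + 1 (SU(2), normalisation chosen here)."""
    return max(n ** s * hs_norm(F(n)) for n in range(1, n_max + 1))

# Hypothetical test families, both of the form c_n * I_{n+1}:
fast = lambda n: np.exp(-n) * np.eye(n + 1)        # decays faster than any power of n
slow = lambda n: np.eye(n + 1) / (1.0 + n ** 2)    # decays only polynomially

fast_50, fast_200 = seminorm(fast, 3, 50), seminorm(fast, 3, 200)
slow_50, slow_200 = seminorm(slow, 3, 50), seminorm(slow, 3, 200)
```

For `fast` the truncated seminorm stops changing once `n_max` passes the location of the maximum, which is what membership of $\mathcal{S}(D)$ looks like numerically; for `slow` it keeps growing with `n_max`, since $n^3 \sqrt{n+1}/(1+n^2)$ is unbounded.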
We also note that $C^\infty(G)$ is a locally convex topological vector space with respect to the seminorms $||f||_U = \sup_{\sigma \in G} |Uf(\sigma)|$ where $U \in \mathcal{U}(\mathfrak{g})$, which is the universal enveloping algebra of $\mathfrak{g}$ acting on $C^\infty(G)$ as polynomials in left-invariant vector fields on $G$, as described by the celebrated Poincaré-Birkhoff-Witt theorem.

Theorem 4.1 [Sugiura] There is a topological isomorphism between $C^\infty(G)$ and $\mathcal{S}(D)$ which maps each $f \in C^\infty(G)$ to its Fourier transform $\widehat{f}$.

We list three useful facts that we will need in the next section. All can be found in [18].

• Weyl's dimension formula states that
$$d_\lambda = \frac{\prod_{\alpha \in P_+} (\lambda + \rho, \alpha)}{\prod_{\alpha \in P_+} (\rho, \alpha)},$$
where $\rho := \frac{1}{2}\sum_{\alpha \in P_+} \alpha$ is the celebrated "half-sum of positive roots". From here we can deduce a highly useful inequality. Namely there exists $N > 0$ such that
$$d_\lambda \le N|\lambda|^m \qquad (4.1)$$
where $m := \#P_+ = \frac{1}{2}(d - r)$.

• Sugiura's zeta function is defined by
$$\zeta(s) = \sum_{\lambda \in D - \{0\}} \frac{1}{|\lambda|^s}$$
and it converges if $s > r$.

• Let $(X_1, \ldots, X_d)$ be a basis for $\mathfrak{g}$ and let $\Delta \in \mathcal{U}(\mathfrak{g})$ be the usual Laplacian on $G$ so that
$$\Delta = \sum_{i,j=1}^{d} g^{ij} X_i X_j$$
where $(g^{ij})$ is the inverse of the matrix whose $(i, j)$th component is $(X_i, X_j)$. We may consider $\Delta$ as a linear operator on $L^2(G)$ with domain $C^\infty(G)$. It is essentially self-adjoint and
$$\Delta \pi_{ij} = -\kappa_\pi \pi_{ij}$$
for all $1 \le i, j \le d_\pi$, $\pi \in \widehat{G}$, where $\pi \neq \delta \Rightarrow \kappa_\pi > 0$. The numbers $(\kappa_\pi, \pi \in \widehat{G})$ are called the Casimir spectrum and if $\lambda_\pi$ is the highest weight corresponding to $\pi \in \widehat{G}$ then
$$\kappa_\pi = (\lambda_\pi, \lambda_\pi + 2\rho).$$
From here we deduce that there exists $C > 0$ such that
$$|\lambda_\pi|^2 \le \kappa_\pi \le C(1 + |\lambda_\pi|^2). \qquad (4.2)$$

4.3 Smoothness of Densities

We can now establish our main theorem.

Theorem 4.2 $\mu \in \mathcal{P}(G)$ has a $C^\infty$ density if and only if $\widehat{\mu} \in \mathcal{S}(D)$.

Proof. Necessity is obvious. For sufficiency it is enough to show that $\mu$ has an $L^2$-density $f$, since then $\widehat{f} = \widehat{\mu} \in \mathcal{S}(D)$ and smoothness follows from Theorem 4.1. Choose $s > r$ so that Sugiura's zeta function converges. Then using Theorem 3.1 and (4.1) we have

$$\sum_{\lambda \in D - \{0\}} d_\lambda\, |||\widehat{\mu}_\lambda|||^2 \le N \sum_{\lambda \in D - \{0\}} |\lambda|^m\, |||\widehat{\mu}_\lambda|||^2 \le N \sup_{\lambda \in D - \{0\}} |\lambda|^{m+s}\, |||\widehat{\mu}_\lambda|||^2 \sum_{\lambda \in D - \{0\}} \frac{1}{|\lambda|^s} < \infty.$$

We now investigate some classes of examples. We say that $\mu \in \mathcal{P}(G)$ is central if for all $\sigma \in G$, $A \in \mathcal{B}(G)$,

$$\mu(\sigma A \sigma^{-1}) = \mu(A).$$

By Schur's lemma $\mu$ is central if and only if for each $\pi \in \widehat{G}$ there exists $c_\pi \in \mathbb{C}$ such that $\widehat{\mu}(\pi) = c_\pi I_\pi$. Clearly $m$ is a central measure.

A standard Gaussian measure on $G$ is central, where we say that a measure $\mu$ on $G$ is a standard Gaussian if it can be realised as $\mu_1^{(B)}$ in the convolution semigroup $(\mu_t^{(B)}, t \ge 0)$ corresponding to Brownian motion on $G$ (i.e. the associated Markov semigroup of operators is generated by $\frac{1}{2}\sigma^2\Delta$ where $\sigma > 0$). For a more general notion of Gaussianity see e.g. [10], section 6.2. To verify centrality, take Fourier transforms of the heat equation to obtain $\widehat{\mu}(\pi) = e^{-\frac{1}{2}\sigma^2\kappa_\pi} I_\pi$ for each $\pi \in \widehat{G}$.

Following [3] we introduce a class of central probability measures on $G$ which we call the $\mathrm{CID}_{\mathbb{R}}(G)$ class as they are central and are induced by infinitely divisible measures on $\mathbb{R}$. Let $\rho$ be a symmetric infinitely divisible probability measure on $\mathbb{R}$ so we have the Lévy-Khintchine formula

$$\int_{\mathbb{R}} e^{iux}\rho(dx) = e^{-\eta(u)} \quad \text{for all } u \in \mathbb{R},$$

where

$$\eta(u) = \frac{1}{2}\sigma^2 u^2 + \int_{\mathbb{R} - \{0\}} (1 - \cos(uy))\,\nu(dy),$$

with $\sigma \ge 0$ and $\nu$ a Lévy measure, i.e. $\int_{\mathbb{R} - \{0\}} (1 \wedge y^2)\,\nu(dy) < \infty$ (see e.g. [15]). We say $\mu \in \mathrm{CID}_{\mathbb{R}}(G)$ if there exists $\eta$ as above such that

$$\widehat{\mu}(\pi) = e^{-\eta(\kappa_\pi^{1/2})} I_\pi \quad \text{for each } \pi \in \widehat{G}.$$

Examples of such measures are obtained by subordination [15]. So let $(\gamma_t^f, t \ge 0)$ be a subordinator with Bernstein function $f$ so that for all $u \ge 0$

$$\int_0^\infty e^{-us}\,\gamma_t^f(ds) = e^{-tf(u)}.$$

Let $(\mu_t^{(B)}, t \ge 0)$ be a Brownian convolution semigroup on $G$ (with $\sigma = \sqrt{2}$) so that for each $\pi \in \widehat{G}$, $\widehat{\mu_t^{(B)}}(\pi) = e^{-t\kappa_\pi} I_\pi$. Then we obtain a convolution semigroup of measures $(\mu_t^f, t \ge 0)$ in $\mathrm{CID}_{\mathbb{R}}(G)$ by

$$\mu_t^f(A) = \int_0^\infty \mu_s^{(B)}(A)\,\gamma_t^f(ds)$$

for each $A \in \mathcal{B}(G)$ and we have

$$\widehat{\mu_t^f}(\pi) = e^{-tf(\kappa_\pi)} I_\pi.$$

Examples (where we have taken $t = 1$):

• Laplace Distribution. $f(u) = \log(1 + \beta^2 u)$,
$$\widehat{\mu}(\pi) = (1 + \beta^2\kappa_\pi)^{-1} I_\pi.$$
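• Stable-like distribution. $f(u) = b^\alpha u^{\frac{\alpha}{2}}$ $(0 < \alpha < 2)$,
$$\widehat{\mu}(\pi) = e^{-b^\alpha \kappa_\pi^{\alpha/2}} I_\pi.$$

We now apply Theorem 4.2 to present some examples of measures in the $\mathrm{CID}_{\mathbb{R}}(G)$ class which have smooth densities (and one that doesn't).

Example 1. $\eta$ general with $\sigma \neq 0$ (i.e. non-vanishing Gaussian part).

Since $|||c I_\pi||| = |c|\,d_\pi^{\frac{1}{2}}$ and $\eta(u) \ge \frac{1}{2}\sigma^2 u^2$, using (4.1) and (4.2) we obtain

$$\lim_{|\lambda| \to \infty} |\lambda|^k\, |||\widehat{\mu}(\lambda)||| = \lim_{|\lambda| \to \infty} |\lambda|^k e^{-\eta(\kappa_\lambda^{1/2})} d_\lambda^{\frac{1}{2}} \le \lim_{|\lambda| \to \infty} |\lambda|^k e^{-\frac{\sigma^2}{2}\kappa_\lambda} d_\lambda^{\frac{1}{2}} \le N^{\frac{1}{2}} \lim_{|\lambda| \to \infty} |\lambda|^{k + \frac{m}{2}} e^{-\frac{\sigma^2}{2}|\lambda|^2} = 0.$$

Example 2. Stable-like laws are all $C^\infty$ by a similar argument.

Example 3. The Laplace distribution is not $C^\infty$. But it is $L^2$ if $r = 1$ (e.g. $SO(3)$, $SU(2)$, $Sp(1)$).

5 Deconvolution Density Estimation

We begin by reviewing the work of Kim and Richards in [13]. Let $X$, $Y$ and $\epsilon$ be $G$-valued random variables with $Y = X\epsilon$. Here we interpret $X$ as a signal, $Y$ as the observations and $\epsilon$ as the noise, which is independent of $X$. If all three random variables have densities, then with an obvious notation we have $f_Y = f_X * f_\epsilon$. The statistical problem of interest is to estimate $f_X$ based on i.i.d. observations $Y_1, \ldots, Y_n$ of the random variable $Y$. We assume that the matrix $\widehat{f_\epsilon}(\pi)$ is invertible for all $\pi \in \widehat{G}$. Our key tool is the empirical characteristic function

$$\widehat{f_Y^{(n)}}(\pi) := \frac{1}{n}\sum_{i=1}^{n} \pi(Y_i^{-1}).$$

We then define the non-parametric density estimator (with smoothing parameters $T_n \to \infty$ as $n \to \infty$) for $\sigma \in G$, $n \in \mathbb{N}$:

$$f_X^{(n)}(\sigma) := \sum_{\pi \in \widehat{G}:\ \kappa_\pi < T_n} d_\pi\, \mathrm{tr}\big(\pi(\sigma)\widehat{f_Y^{(n)}}(\pi)\widehat{f_\epsilon}(\pi)^{-1}\big).$$

The noise $\epsilon$ is said to be super-smooth of order $\beta > 0$ if there exist $\gamma > 0$ and $a_1, a_2 \ge 0$ such that

$$||\widehat{f_\epsilon}(\pi)^{-1}||_\infty = O(\kappa_\pi^{-a_1}\exp(\gamma\kappa_\pi^\beta)) \quad \text{and} \quad ||\widehat{f_\epsilon}(\pi)||_\infty = O(\kappa_\pi^{a_2}\exp(-\gamma\kappa_\pi^\beta))$$

as $\kappa_\pi \to \infty$. For example a standard Gaussian is super-smooth with $a_i = 0$ $(i = 1, 2)$.

For $p > 0$, the Sobolev space is $H_p(G) := \{f \in L^2(G);\ ||f||_p < \infty\}$ where $||f||_p^2 = \sum_{\pi \in \widehat{G}} d_\pi (1 + \kappa_\pi)^p\, |||\widehat{f}(\pi)|||^2$.

Theorem 5.1 (Kim, Richards) If $f_\epsilon$ is super-smooth of order $\beta$ and $||f_X||_{H_s(G)} \le K$ for some $s > \frac{d}{2}$ where $K > 1$ then the optimal rate of convergence of $f_X^{(n)}$ to $f_X$ is $(\log(n))^{-\frac{s}{2\beta}}$.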
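To make the estimator concrete, here is a minimal Python sketch for the abelian case where $G$ is the circle, written additively modulo $2\pi$, so that $\pi_k(\theta) = e^{ik\theta}$, $d_\pi = 1$ and $\kappa_k = k^2$. It is an illustration under assumptions that are not in the text: the noise is taken to be a wrapped Gaussian with variance `sigma**2`, so $\widehat{f_\epsilon}(k) = e^{-\sigma^2 k^2/2}$, the signal is simulated from a von Mises law, and the function names are hypothetical.

```python
import numpy as np

def empirical_cf(y, k):
    """Empirical characteristic function (1/n) sum_j pi_k(Y_j^{-1}),
    with pi_k(theta) = exp(i k theta) on the circle."""
    return np.exp(-1j * np.outer(k, y)).mean(axis=1)

def noise_cf(k, sigma):
    """Fourier transform of wrapped Gaussian noise with variance sigma^2
    (an assumed noise model, chosen because it is super-smooth)."""
    return np.exp(-0.5 * sigma ** 2 * k ** 2)

def deconvolution_estimate(theta, y, sigma, T):
    """Sum over kappa_k = k^2 < T of exp(i k theta) * cf_Y(k) / cf_eps(k).
    The result is a density with respect to normalised Haar measure dtheta / (2 pi)."""
    k_max = int(np.ceil(np.sqrt(T)))
    k = np.arange(-k_max, k_max + 1)
    k = k[k ** 2 < T]                      # keep only representations with kappa_k < T
    coeff = empirical_cf(y, k) / noise_cf(k, sigma)
    return np.real(np.exp(1j * np.outer(theta, k)) @ coeff)

# Simulated data: signal X, independent noise eps, observations Y = X + eps mod 2 pi.
rng = np.random.default_rng(0)
sigma = 0.3
x_signal = rng.vonmises(mu=1.0, kappa=4.0, size=2000)
y_obs = np.mod(x_signal + rng.normal(0.0, sigma, size=2000), 2.0 * np.pi)

theta = np.linspace(0.0, 2.0 * np.pi, 512, endpoint=False)
estimate = deconvolution_estimate(theta, y_obs, sigma, T=30.0)
```

Dividing by `noise_cf` amplifies high frequencies by $e^{\sigma^2 k^2/2}$, which is why the cutoff $T_n$ can only grow slowly with $n$ and why logarithmic rates appear for super-smooth noise.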
A natural question to ask is "how smooth is super-smooth?" and we answer this as follows:

Proposition 5.1 If $f$ is super-smooth then it is smooth.

Proof. For sufficiently large $\kappa_\pi$ and using (4.1) and (4.2) we find that there exist $C, K > 0$ such that

$$|||\widehat{f}(\pi)||| \le ||\widehat{f}(\pi)||_\infty\, |||I_\pi||| = d_\pi^{\frac{1}{2}}\, ||\widehat{f}(\pi)||_\infty \le N^{\frac{1}{2}}|\lambda_\pi|^{\frac{m}{2}} \cdot C\kappa_\pi^{a_2}\exp(-\gamma\kappa_\pi^\beta) \le K|\lambda_\pi|^{\frac{m}{2}}(1 + |\lambda_\pi|^2)^{a_2}\exp(-\gamma|\lambda_\pi|^{2\beta}),$$

from which it follows that $\widehat{f} \in \mathcal{S}(D)$ and the result follows by Theorem 4.2.

References

[1] D. Applebaum, Probability measures on compact groups which have square-integrable densities, Bull. Lond. Math. Soc. 40, 1038-44 (2008), Corrigendum 42, 948 (2010)

[2] D. Applebaum, Some $L^2$ properties of semigroups of measures on Lie groups, Semigroup Forum 79, 217-28 (2009)

[3] D. Applebaum, Infinitely divisible central probability measures on compact Lie groups - regularity, semigroups and transition kernels, Annals of Prob. 39, 2474-96 (2011)

[4] G.S. Chirikjian, A.B. Kyatkin, Engineering Applications of Noncommutative Harmonic Analysis, CRC Press LLC (2001)

[5] P. Diaconis, Group Representations in Probability and Statistics, Lecture Notes-Monograph Series Volume 11, Institute of Mathematical Statistics, Hayward, California (1988)

[6] K.D. Elworthy, Geometric aspects of diffusions on manifolds, in École d'Été de Probabilités de Saint-Flour XV-XVII, 1985-87, 277-425, Lecture Notes in Math. 1362, Springer, Berlin (1988)

[7] J. Faraut, Analysis on Lie Groups, Cambridge University Press (2008)

[8] H.D. Fegan, Introduction to Compact Lie Groups, World Scientific (1991)

[9] H. Heyer, L'analyse de Fourier non-commutative et applications à la théorie des probabilités, Ann. Inst. Henri Poincaré (Prob. Stat.) 4, 143-68 (1968)

[10] H. Heyer, Probability Measures on Locally Compact Groups, Springer-Verlag, Berlin-Heidelberg (1977)

[11] G.A. Hunt, Semigroups of measures on Lie groups, Trans. Amer. Math. Soc. 81, 264-93 (1956)

[12] Y. Kawada, K. Itô, On the probability distribution on a compact group I, Proc. Phys.-Math. Soc. Japan 22, 977-98 (1940)

[13] P.T. Kim, D.S. Richards, Deconvolution density estimators on compact Lie groups, Contemp. Math. 287, 155-71 (2001)

[14] M. Liao, Lévy Processes in Lie Groups, Cambridge University Press (2004)

[15] K.-I. Sato, Lévy Processes and Infinite Divisibility, Cambridge University Press (1999)

[16] E. Siebert, Fourier analysis and limit theorems for convolution semigroups on a locally compact group, Advances in Math. 39, 111-54 (1981)

[17] B. Simon, Representations of Finite and Compact Groups, Graduate Studies in Math. Vol. 10, Amer. Math. Soc. (1996)

[18] M. Sugiura, Fourier series of smooth functions on compact Lie groups, Osaka J. Math. 8, 33-47 (1971)

[19] N.Th. Varopoulos, L. Saloff-Coste, T. Coulhon, Analysis and Geometry on Groups, Cambridge University Press (1992)