
Directional Statistics
Thomas Verdebout, ULB Bruxelles, 2013-2014

Introduction
• The theory of errors was first developed by Gauss in relation to the needs of astronomers.
• At that time, everybody found it natural to assume that errors take values in a Euclidean space.
• Of course, the actual topological framework of such errors is the surface of the Earth.
• Directional (or spherical) statistics is concerned mainly with observations which are unit vectors in the plane or in three-dimensional space.
• The sample space is a circle, a sphere and sometimes a hypersphere.

Directional statistics can be a tool for practitioners in many different fields:
• Astronomy, Earth sciences: the surface of the Earth is approximately a sphere, so spherical data arise readily in the Earth sciences and in astronomy.
• Meteorology: wind directions constitute natural circular data.
• Biology: study of animal movements. Do the animals tend to take a particular direction, or are the directions uniformly distributed?
• Also in physics, psychology, medicine and the social sciences.

Outline
1 The uniform distribution on hyperspheres
2 Other distributions on hyperspheres
3 Inference for the location based on the spherical mean
4 Depth
5 Depth on hyperspheres

Books, References
• Watson G., Statistics on Spheres (The University of Arkansas Lecture Notes in the Mathematical Sciences), John Wiley, NY, 1983
• Fisher N.I., Statistical Analysis of Circular Data, Cambridge University Press, 1993
• Mardia K.V. and Jupp P., Directional Statistics (2nd edition), John Wiley and Sons Ltd., 2000

The uniform distribution on hyperspheres
• One of the most important probability distributions in multivariate statistics is certainly the uniform distribution on $S^{k-1}$.
• The first use of the uniform distribution on $S^2$ can be traced back to Bernoulli (1734), who won a prize from the French Academy for an essay on the orbits of the planets.
• Later, Lord Rayleigh (1880) was interested in the intensity of a superposition of a large number of vibrations of the same frequency but with i.i.d. $\mathrm{Unif}(0, 2\pi)$ phases.
• Considering a sample of n i.i.d. $\mathrm{Unif}(0, 2\pi)$ phases $\theta_1, \dots, \theta_n$, he was interested in the distribution of the norm of the partial sum $S_n := X_1 + \dots + X_n$, where $X_i := (\cos\theta_i, \sin\theta_i)'$.
• Translated to "our world", he used the facts that $E[\cos\theta_i] = E[\sin\theta_i] = 0$, $E[\cos^2\theta_i] = E[\sin^2\theta_i] = 1/2$ and $E[\cos\theta_i \sin\theta_i] = 0$, together with the central limit theorem, to obtain that $(2/n)\|S_n\|^2 \xrightarrow{d} \chi^2_2$.
• Definition. A k-random vector X has a uniform distribution on $S^{k-1}$ ($X \sim \mathrm{Unif}(S^{k-1})$) iff $X \stackrel{d}{=} U/\|U\|$ with $U \sim \mathrm{Unif}(B^k)$, where $B^k$ stands for the unit ball in $\mathbb{R}^k$.
• The uniform distribution on the hypersphere $S^{k-1}$ has density (with respect to the surface measure) $f_{\mathrm{unif},k}(x) := \frac{\Gamma(k/2)}{2\pi^{k/2}} I[x \in S^{k-1}]$, where $\Gamma(\cdot)$ is the well-known Gamma function defined by $\Gamma(z) := \int_0^\infty t^{z-1} e^{-t}\,dt$.
• For $S^1$ and $S^2$, we recover the well-known $f_{\mathrm{unif},2}(x) = 1/(2\pi)$ and $f_{\mathrm{unif},3}(x) = 1/(4\pi)$.
• Definition. A k-random vector X is rotationally invariant iff $X \stackrel{d}{=} OX$ for all $O \in SO_k := \{V : \det(V) = 1 \text{ and } V'V = I_k\}$.
• We have the following characterizations of rotationally invariant distributions:
• Proposition. A k-random vector $X = (X_1, \dots, X_k)'$ is rotationally invariant iff $u'X \stackrel{d}{=} X_1$ for all $u \in S^{k-1}$.
• Proposition. A k-random vector X taking values in $S^{k-1}$ is rotationally invariant iff $X \sim \mathrm{Unif}(S^{k-1})$.
• Let $X \sim \mathrm{Unif}(S^{k-1})$, $\theta := E[X]$ and $\Sigma := E[(X - \theta)(X - \theta)']$.
• Since X is rotationally invariant, $\theta = O\theta$ and $\Sigma = O\Sigma O'$ for any rotation O.
• This implies that $\theta = 0$ and that Σ is proportional to the identity matrix $I_k$.
• Furthermore, $\mathrm{tr}(\Sigma) = \mathrm{tr}(E[XX']) = E[\mathrm{tr}(XX')] = E[\mathrm{tr}(X'X)] = 1$.
• This directly entails that $\Sigma = I_k/k$.
• The CLT implies that, for a sequence $X_1, \dots, X_n$ of i.i.d. $\mathrm{Unif}(S^{k-1})$ unit random vectors, $n^{-1/2}\sum_{i=1}^n X_i$ converges weakly to a centered multivariate Gaussian distribution with covariance matrix $I_k/k$.
• Therefore,
$$T^{(n)} := \frac{k}{n}\Big\|\sum_{i=1}^n X_i\Big\|^2$$
converges to a central chi-square distribution with k degrees of freedom. This is a multivariate version of Lord Rayleigh's result.
• The statistic $T^{(n)}$ can be used to test the null hypothesis of uniformity on the sphere. More precisely, one rejects the null hypothesis at level α when $T^{(n)}$ exceeds the upper α-quantile of the chi-square distribution with k degrees of freedom.
• Nevertheless, there exist alternatives under which the asymptotic test based on $T^{(n)}$ has no power, typically non-uniform alternatives with $E[X] = 0$ (for instance, antipodally symmetric distributions).
• The uniform distribution on $S^{k-1}$ plays an important role in multivariate statistics.
• In particular, when $X \sim N_k(0, I_k)$, $X/\|X\| \sim \mathrm{Unif}(S^{k-1})$.
• It also arises in the context of elliptical distributions.
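The asymptotic test based on $T^{(n)}$ described above takes only a few lines to sketch. The Python below is a minimal illustration under stated assumptions, not code from the course: the function names are hypothetical, uniform samples are generated by normalizing standard Gaussian vectors (as in the bullet on $X/\|X\|$), and the chi-square p-value uses the closed-form upper tail for an integer number of degrees of freedom, so that no external statistics library is needed.

```python
import math
import random


def uniform_sphere(n, k, rng):
    """Draw n points uniformly on S^{k-1} as Z / ||Z|| with Z ~ N_k(0, I_k)."""
    points = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        norm = math.sqrt(sum(c * c for c in z))
        points.append([c / norm for c in z])
    return points


def rayleigh_statistic(xs):
    """T^(n) = (k / n) * ||X_1 + ... + X_n||^2 for a list of n unit vectors in R^k."""
    n, k = len(xs), len(xs[0])
    resultant = [sum(x[j] for x in xs) for j in range(k)]
    return k / n * sum(c * c for c in resultant)


def chi2_sf(t, k):
    """Upper tail P(chi2_k > t) for an integer k >= 1 (closed forms for even and odd k)."""
    y = t / 2.0
    if k % 2 == 0:
        return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(k // 2))
    tail = sum(y ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (k - 1) // 2 + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def rayleigh_test(xs, alpha=0.05):
    """Asymptotic test of uniformity: reject when the chi2_k p-value of T^(n) is below alpha."""
    t = rayleigh_statistic(xs)
    p_value = chi2_sf(t, len(xs[0]))
    return t, p_value, p_value < alpha
```

For example, `rayleigh_test(uniform_sphere(500, 3, random.Random(1)))` tests a uniform sample on $S^2$. A sample made of antipodal pairs $\pm x$ always gives $T^{(n)} = 0$, which illustrates the lack of power against non-uniform alternatives with $E[X] = 0$.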
• Definition. A k-random vector X has an elliptical distribution with location θ and scatter matrix Σ if it can be written as $X \stackrel{d}{=} \theta + d\,\Sigma^{1/2}U$, where $U \sim \mathrm{Unif}(S^{k-1})$ and $d > 0$ is a random variable independent of U.

Other distributions on hyperspheres
• Non-uniform distributions came to the attention of mathematicians and statisticians in the 20th century.
• The analysis of spherical data essentially started with R.A. Fisher (1953). From the mid-1950s, Watson further developed methodologies for spherical (and circular) statistics.
• Two different statistical/probabilistic routes are possible to describe distributions on hyperspheres.
• First, one can look for distributions which are relatively easy to handle mathematically and which reasonably "fit the data at hand".
• Another way is to fix an estimator and ask for which distribution this estimator is always the MLE.
• For example, we may ask: for which distributions on $\mathbb{R}$ with density of the form $f(x - \theta)$ is the sample mean $\bar{x} := n^{-1}\sum_{i=1}^n x_i$ always the MLE of θ?
• By differentiation, the problem can be reformulated. Letting $\varphi_f := f'/f$, an equivalent question is to ask which densities f are such that, for every sample,
$$\sum_{i=1}^n \varphi_f(x_i - \bar{x}) = 0.$$
• Gauss showed that the only density for which this equation holds is the Gaussian density.
• An analog of Gauss's result can be discussed on the sphere. Consider densities on the hypersphere $S^{k-1}$ of the form $f(\theta'x)$ for some $\theta \in S^{k-1}$.
• A natural question is: for which distributions on $S^{k-1}$ is the MLE of θ given by the spherical mean $\sum_{i=1}^n x_i / \|\sum_{i=1}^n x_i\|$?
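Whatever the answer, the spherical mean itself is simple to compute. The sketch below (pure Python, hypothetical helper names) also checks numerically a consequence of the Cauchy-Schwarz inequality: over unit vectors θ, the sum $\sum_{i=1}^n \theta'x_i = \theta'\sum_{i=1}^n x_i$ is maximized at the spherical mean. Up to a factor κ > 0 and an additive constant, this sum is the log-likelihood of a density of the form $f(\theta'x) \propto \exp(\kappa\theta'x)$.

```python
import math


def spherical_mean(xs):
    """Spherical mean direction (x_1 + ... + x_n) / ||x_1 + ... + x_n|| of unit vectors."""
    k = len(xs[0])
    resultant = [sum(x[j] for x in xs) for j in range(k)]
    norm = math.sqrt(sum(c * c for c in resultant))
    if norm == 0.0:
        raise ValueError("zero resultant vector: the spherical mean is undefined")
    return [c / norm for c in resultant]


def linear_score(theta, xs):
    """sum_i theta' x_i, i.e. theta' times the resultant vector."""
    return sum(sum(t * c for t, c in zip(theta, x)) for x in xs)


# Check on the circle: no direction on a 1-degree grid beats the spherical mean.
xs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
m = spherical_mean(xs)
grid = [[math.cos(2 * math.pi * j / 360), math.sin(2 * math.pi * j / 360)] for j in range(360)]
assert all(linear_score(u, xs) <= linear_score(m, xs) + 1e-12 for u in grid)
```

The spherical mean is undefined when the resultant vector vanishes, for instance for a sample of antipodal pairs, which is why the helper raises an error in that case.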
• Wrapping up, the objective is to find the density f such that $\hat\theta = \sum_{i=1}^n x_i / \|\sum_{i=1}^n x_i\|$ maximizes (in θ) the likelihood $\prod_{i=1}^n f(\theta'x_i)$.
• Consider the MLE $\hat\theta_f$ based on $f(\theta'x)$. Using a Lagrange multiplier λ for this constrained maximization problem ($\|\theta\| = 1$), one directly obtains that $\hat\theta_f$ must satisfy
$$\sum_{i=1}^n x_i\,\varphi_f(x_i'\hat\theta_f) = 2\lambda\hat\theta_f, \qquad \hat\theta_f'\hat\theta_f = 1.$$
• From the system above, we deduce
$$\hat\theta_f = \frac{\sum_{i=1}^n x_i\,\varphi_f(x_i'\hat\theta_f)}{\|\sum_{i=1}^n x_i\,\varphi_f(x_i'\hat\theta_f)\|}.$$
• Now, consider another density $g(\theta'x)$ and the corresponding MLE $\hat\theta_g$. Duerinckx and Ley (2012) recently obtained that the equality
$$\frac{\sum_{i=1}^n x_i\,\varphi_f(x_i'\hat\theta)}{\|\sum_{i=1}^n x_i\,\varphi_f(x_i'\hat\theta)\|} = \hat\theta_f = \hat\theta_g = \frac{\sum_{i=1}^n x_i\,\varphi_g(x_i'\hat\theta)}{\|\sum_{i=1}^n x_i\,\varphi_g(x_i'\hat\theta)\|}, \qquad \hat\theta := \hat\theta_f = \hat\theta_g,$$
holds for all samples of size n ≥ 3 if and only if $\varphi_f = \kappa\varphi_g$ for some positive constant κ.
• In particular, since the spherical mean corresponds to a constant $\varphi$, the function f corresponding to the spherical mean is such that $\varphi_f = \kappa$, that is, $f(u) \propto \exp(\kappa u)$.
• One of the most famous distributions on the sphere is the Fisher-von Mises-Langevin (FVML) distribution.
• Its density is of the form $x \mapsto c_\kappa \exp[\kappa x'\theta]$.
• In fact, this density had arisen in Langevin's (1905) statistical mechanics discussion of magnetism. Von Mises (1918) suggested it in a problem related to atomic weights.
It was then clearly introduced in the seminal paper of Fisher (1953), which justifies the name FVML.
• The parameter κ > 0 is called the concentration parameter of the FVML distribution.
• When κ is large, many observations are expected in the vicinity of the modal direction θ.
• On the contrary, when κ is close to zero, the data are less concentrated around θ.
• When κ tends to zero, we get closer to the uniform case.
• Another useful density is given by $x \mapsto c_\kappa \exp[-\kappa \cos^{-1}(x'\theta)]$, κ > 0.
• It was first introduced by Purkayastha (1991), who showed that it is characterized by the property that the MLE of θ is the so-called sample median direction.
• There exist many other distributions. Sometimes the observations are not directions but rather axes: the vectors x and −x are indistinguishable, so that either +x or −x is observed.
In this context, it is natural to consider probability density functions which are antipodally symmetric in the sense that $f(x) = f(-x)$.
• A typical example of antipodally symmetric distributions is the family of Watson distributions $x \mapsto c_\kappa \exp[\kappa (x'\theta)^2]$.
• A very important property of the FVML distributions is that they are rotationally symmetric about their modal direction θ.
• Saw (1978) abstracted this property by considering general distributions with densities of the form $f(\theta'x)$.
• Rotationally symmetric distributions enjoy many attractive mathematical properties.

Inference for the location based on the spherical mean
• Consider first the tangent-normal decomposition
$$X = (X'\theta)\theta + (I_k - \theta\theta')X = (X'\theta)\theta + \|(I_k - \theta\theta')X\|\,S_\theta(X), \quad \text{where } S_\theta(X) := \frac{(I_k - \theta\theta')X}{\|(I_k - \theta\theta')X\|}.$$
• Under any rotationally symmetric distribution, $S_\theta(X)$ is uniformly distributed on $S^{k-2}_{\theta^\perp} := \{v \in \mathbb{R}^k : \|v\| = 1,\ v'\theta = 0\}$.
• If X has a density $f(\theta'x)$, the density of $X'\theta$ is given by $t \mapsto c\, f(t)(1-t^2)^{(k-3)/2}$, $-1 < t < 1$.
• Lemma. Let $U = (U_1, \dots, U_k)' \sim \mathrm{Unif}(S^{k-1})$. Then the density of $U_1$ is given by
$$u \mapsto \frac{\Gamma(k/2)}{\pi^{1/2}\,\Gamma((k-1)/2)}(1-u^2)^{(k-3)/2}, \quad -1 < u < 1.$$
• Proof. First, we give a definition of the multivariate Dirichlet distribution (or multivariate beta distribution).
• Definition. We say that X has a Dirichlet distribution $D_k(p; p_{k+1})$, $p = (p_1, \dots, p_k)$, on $T^k := \{x \in \mathbb{R}^k : x_i > 0, \sum_{i=1}^k x_i < 1\}$ iff $X \stackrel{d}{=} Z/T$, where $Z_1, \dots, Z_{k+1}$ are independent with $Z_i \sim \mathrm{Gamma}(p_i)$, $Z := (Z_1, \dots, Z_k)'$ and $T := \sum_{i=1}^{k+1} Z_i$.
• Recall that a random variable Z has a gamma distribution with parameter p, $Z \sim \mathrm{Gamma}(p)$, iff $f_Z(z) = \Gamma(p)^{-1} z^{p-1} e^{-z} I[z > 0]$.
• To obtain the density of $X = (X_1, \dots, X_k)' \sim D_k(p; p_{k+1})$, we first consider the joint density of $Z_1, \dots, Z_{k+1}$, which in view of independence is given by
$$f_{Z_1,\dots,Z_{k+1}}(z_1,\dots,z_{k+1}) = \Big(\prod_{i=1}^{k+1}\Gamma(p_i)\Big)^{-1}\Big(\prod_{i=1}^{k+1} z_i^{p_i-1}\Big)\exp\Big[-\sum_{i=1}^{k+1} z_i\Big], \quad z_i > 0\ \forall i.$$
• Then, we can obtain the joint distribution of $(X_1, \dots, X_k)$ and $T = \sum_{i=1}^{k+1} Z_i$. Simply use the transformation from $(z_1, \dots, z_{k+1})$ onto $(x_1, \dots, x_k, t)$, where $z_i = t x_i$, $i = 1, \dots, k$, and $z_{k+1} = t(1 - \sum_{i=1}^k x_i)$.
• Computing the Jacobian of this transformation (adding the first k rows to the last one), we obtain (in absolute value)
$$\det\begin{pmatrix} t & & 0 & x_1 \\ & \ddots & & \vdots \\ 0 & & t & x_k \\ -t & \cdots & -t & 1 - \sum_{i=1}^k x_i \end{pmatrix} = \det\begin{pmatrix} t & & 0 & x_1 \\ & \ddots & & \vdots \\ 0 & & t & x_k \\ 0 & \cdots & 0 & 1 \end{pmatrix} = t^k.$$
• We obtain that the joint pdf of $(X_1, \dots, X_k)$ and $T = \sum_{i=1}^{k+1} Z_i$ is given by
$$\Big(\prod_{i=1}^{k+1}\Gamma(p_i)\Big)^{-1}\Big(\prod_{i=1}^{k} x_i^{p_i-1}\Big)\Big(1 - \sum_{i=1}^k x_i\Big)^{p_{k+1}-1} t^{\sum_{i=1}^{k+1} p_i - 1} e^{-t}.$$
• Finally, integrating with respect to t, we obtain
$$f_{X_1,\dots,X_k}(x_1,\dots,x_k) = \frac{\Gamma(\sum_{i=1}^{k+1} p_i)}{\prod_{i=1}^{k+1}\Gamma(p_i)}\Big(\prod_{i=1}^{k} x_i^{p_i-1}\Big)\Big(1 - \sum_{i=1}^k x_i\Big)^{p_{k+1}-1}.$$
• Note that the Dirichlet distribution $D_k(\mathbf{1}; 1)$ is the uniform distribution on $T^k$.
• Furthermore, when $X = (X_1, \dots, X_k)' \sim D_k(p; p_{k+1})$, the marginals are also Dirichlet. More precisely, $(X_{i_1}, \dots, X_{i_l})' \sim D_l(p_{\mathbf{i}}; q)$ with $p_{\mathbf{i}} = (p_{i_1}, \dots, p_{i_l})'$ and $q = \sum_{i=1}^{k+1} p_i - \sum_{j=1}^{l} p_{i_j}$.
• Now, we get back to the objective, which is to show that the density of $U_1$ is given by
$$u \mapsto \frac{\Gamma(k/2)}{\pi^{1/2}\,\Gamma((k-1)/2)}(1-u^2)^{(k-3)/2}.$$
• The univariate gamma distribution is such that $y \sim \chi^2_m$ iff $y \stackrel{d}{=} 2z$ where $z \sim \mathrm{Gamma}(m/2)$.
• Consider $X_1, \dots, X_{k+1}$ i.i.d. $N(0, 1)$. Then $X_1^2/2, \dots, X_{k+1}^2/2$ are i.i.d. $\mathrm{Gamma}(1/2)$. As a direct consequence, the vector $Z = (X_1^2/\sum_{i=1}^{k+1} X_i^2, \dots, X_k^2/\sum_{i=1}^{k+1} X_i^2)'$ is $D_k(\frac{1}{2}\mathbf{1}; \frac{1}{2})$.
• But Z is a k-dimensional "sub-vector" of $(X_1^2/\sum_{i=1}^{k+1} X_i^2, \dots, X_{k+1}^2/\sum_{i=1}^{k+1} X_i^2)'$, which has the same distribution as $(U_1^2, \dots, U_{k+1}^2)'$ with $U = (U_1, \dots, U_{k+1})' \sim \mathrm{Unif}(S^k)$.
• Since a sub-vector of a Dirichlet vector is again a Dirichlet vector, this directly entails that any "sub-vector" of the vector of squared components of a random vector uniformly distributed on the unit sphere is Dirichlet.
• In particular, the square of the first component of $U = (U_1, \dots, U_k)' \sim \mathrm{Unif}(S^{k-1})$ is $D_1(\frac{1}{2}; \frac{1}{2}(k-1))$.
• Its density is given by
$$x \mapsto \frac{\Gamma(k/2)}{\Gamma(1/2)\,\Gamma((k-1)/2)} x^{-1/2}(1-x)^{(k-3)/2} = \frac{\Gamma(k/2)}{\pi^{1/2}\,\Gamma((k-1)/2)} x^{-1/2}(1-x)^{(k-3)/2}, \quad 0 < x < 1.$$
• Finally, the result is obtained by considering the transformations $u \mapsto \sqrt{u}$ and $u \mapsto -\sqrt{u}$: since $U_1 \stackrel{d}{=} -U_1$, the density of $U_1$ is $u \mapsto |u|\, f_{U_1^2}(u^2)$, which is exactly the claimed expression.
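The Lemma lends itself to a quick numerical sanity check. The sketch below (pure Python, hypothetical names) integrates against the density of $U_1$ after the substitution $u = \cos\phi$, which turns $\int_{-1}^1 g(u) f_{U_1}(u)\,du$ into the smooth integral $\int_0^\pi g(\cos\phi)\,c_k \sin^{k-2}\phi\,d\phi$ with $c_k := \Gamma(k/2)/(\pi^{1/2}\Gamma((k-1)/2))$. The density should integrate to one and, consistently with $\Sigma = I_k/k$ obtained earlier, give $E[U_1^2] = 1/k$; a Monte Carlo estimate based on normalized Gaussian vectors can be compared as well.

```python
import math
import random


def u1_constant(k):
    """Normalizing constant Gamma(k/2) / (pi^{1/2} Gamma((k-1)/2)) of the Lemma (k >= 2)."""
    return math.gamma(k / 2) / (math.sqrt(math.pi) * math.gamma((k - 1) / 2))


def u1_density(u, k):
    """Density of U_1 for U ~ Unif(S^{k-1}), evaluated at -1 < u < 1."""
    return u1_constant(k) * (1.0 - u * u) ** ((k - 3) / 2)


def expect_under_u1_density(g, k, m=20000):
    """Midpoint-rule approximation of E[g(U_1)] after the substitution u = cos(phi).

    The integrand becomes g(cos phi) * c_k * sin(phi)^(k-2), which is smooth on [0, pi].
    """
    h = math.pi / m
    c = u1_constant(k)
    total = sum(g(math.cos((i + 0.5) * h)) * c * math.sin((i + 0.5) * h) ** (k - 2)
                for i in range(m))
    return total * h


def monte_carlo_u1(g, k, n, rng):
    """Monte Carlo estimate of E[g(U_1)] with U = Z / ||Z|| and Z ~ N_k(0, I_k)."""
    acc = 0.0
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        acc += g(z[0] / math.sqrt(sum(c * c for c in z)))
    return acc / n
```

For k = 3 the density is constant, equal to 1/2 on (−1, 1), which recovers the classical fact that $X'\theta$ is uniform on [−1, 1] under the uniform distribution on $S^2$.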
• Recall that for a many-to-one transformation such as $y = x^2$, with the two preimages $x_L = -\sqrt{y}$ and $x_R = \sqrt{y}$,
$$f_Y(y) = f_X(x_L)\Big|\frac{dx_L}{dy}\Big| + f_X(x_R)\Big|\frac{dx_R}{dy}\Big|.$$
• Then, using the fact that $z'U \stackrel{d}{=} U_1$ for $U \sim \mathrm{Unif}(S^{k-1})$ and any $z \in S^{k-1}$, and letting $\omega_k := 2\pi^{k/2}/\Gamma(k/2)$ be the "area" of $S^{k-1}$, we have
$$\int_{S^{k-1}} \omega_k^{-1} f(x'\theta)\,dx = E[f(U'\theta)] = E[f(U_1)].$$
• Using the density of $U_1$, we obtain
$$\int_{S^{k-1}} \omega_k^{-1} f(x'\theta)\,dx = E[f(U_1)] = \int_{-1}^{1} \frac{\Gamma(k/2)}{\pi^{1/2}\,\Gamma((k-1)/2)}(1-u^2)^{(k-3)/2} f(u)\,du.$$
Applying this identity to $x \mapsto f(x'\theta)\,I[x'\theta \in A]$ for a Borel set A shows that the constant in the density of $X'\theta$ is $c = \omega_k\Gamma(k/2)/(\pi^{1/2}\Gamma((k-1)/2)) = \omega_{k-1}$.
• Wrapping up, we have the following results in the rotationally symmetric case:
• Under any rotationally symmetric distribution, $S_\theta(X) := (I_k - \theta\theta')X/\|(I_k - \theta\theta')X\|$ is uniformly distributed on $S^{k-2}_{\theta^\perp} := \{v \in \mathbb{R}^k : \|v\| = 1,\ v'\theta = 0\}$.
• If X has a density $f(\theta'x)$, the density of $X'\theta$ is given by $t \mapsto c\, f(t)(1-t^2)^{(k-3)/2}$.
• The sign $S_\theta(X)$ and $X'\theta$ are independent.
• Recall the tangent-normal decomposition $X = (X'\theta)\theta + (I_k - \theta\theta')X = (X'\theta)\theta + \|(I_k - \theta\theta')X\|\,S_\theta(X)$.
• Using the results just obtained (in the rotationally symmetric case), we have $E[X] = E[X'\theta]\,\theta$, since $S_\theta(X)$ has mean zero and is independent of $X'\theta$, hence of $\|(I_k - \theta\theta')X\| = (1 - (X'\theta)^2)^{1/2}$.
• Then, the CLT implies that $n^{1/2}(\bar{X} - E[X'\theta]\,\theta)$ converges weakly to a Gaussian distribution with mean 0 and covariance matrix
$$V_f := E[XX'] - E[X]E[X]' = E[(X'\theta)^2]\,\theta\theta' - E^2[X'\theta]\,\theta\theta' + \frac{E[1 - (X'\theta)^2]}{k-1}(I_k - \theta\theta') = \mathrm{Var}(X'\theta)\,\theta\theta' + \frac{E[1 - (X'\theta)^2]}{k-1}(I_k - \theta\theta').$$
• The asymptotic distribution of the spherical mean $\bar{X}/\|\bar{X}\|$ can be obtained directly using the delta method.
• We have that $n^{1/2}(\bar{X}/\|\bar{X}\| - \theta)$ converges weakly to a Gaussian distribution with mean 0 and covariance matrix $c_f(I_k - \theta\theta')$.
• The constant $c_f := \frac{E[1 - (X'\theta)^2]}{(k-1)E^2[X'\theta]}$ can be estimated consistently.
• One can use this result to construct asymptotic tests for $H_0: \theta = \theta_0$.
• More precisely, it directly follows that, under the null,
$$T_n(\theta_0) := \frac{n}{\hat{c}_f}\Big(\frac{\bar{X}}{\|\bar{X}\|} - \theta_0\Big)'(I_k - \theta_0\theta_0')\Big(\frac{\bar{X}}{\|\bar{X}\|} - \theta_0\Big)$$
is asymptotically chi-square with k − 1 degrees of freedom.
• One can use $T_n(\theta)$ to construct confidence regions for θ.
• Using the previously presented result, we can also perform ANOVA...

Depth
• Depth functions represent a recently emerging, powerful methodology in nonparametric multivariate inference. They provide multivariate notions of order statistics and generate quantile contours, outlyingness functions, and sign and rank functions.
• Univariate nonparametric analysis relies heavily on signs and ranks, order statistics, quantiles, etc.
• In $\mathbb{R}^k$, there is no natural order and therefore no straightforward extension of the above concepts.
• For example, whereas the median of a univariate data set represents a notion of center, the k-vector of coordinatewise medians can lie outside the convex hull of the data.
• Depth functions constructively solve this problem by introducing a notion of center as the maximal-depth point and providing a center-outward ordering of points x in $\mathbb{R}^k$.
• Many interesting approaches toward the construction of suitable depth functions have been put forth, beginning with the seminal paper of Tukey (1975).
• Let F be a cdf and $x \in \mathbb{R}^k$.
• The halfspace depth (Tukey 1975): for $x \in \mathbb{R}^k$, $D_H(x, F) := \inf\{P_F(H) : H \text{ a closed halfspace with } x \in H\}$, the minimal probability attached to any closed halfspace with x on the boundary.
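For a sample in $\mathbb{R}^2$, the halfspace depth just defined can be computed exactly by a simple angular sweep. The sketch below is one straightforward approach under stated assumptions (pure Python, hypothetical function name, quadratic cost rather than an optimized algorithm): the number of data points in a closed halfplane whose boundary passes through x is piecewise constant in the direction of the normal, and it changes only at directions perpendicular to some $X_i - x$, so it suffices to evaluate the count at midpoints between consecutive critical directions.

```python
import math


def halfspace_depth_2d(x, data):
    """Sample halfspace (Tukey) depth of x = (x0, x1) w.r.t. a list of points in R^2.

    Minimum fraction of data points in a closed halfplane whose boundary passes through x.
    The count only changes at normal directions perpendicular to some X_i - x, so it is
    evaluated at midpoints between consecutive critical directions.
    """
    n = len(data)
    coincident = 0  # points equal to x belong to every such halfplane
    angles = []
    for px, py in data:
        dx, dy = px - x[0], py - x[1]
        if dx == 0 and dy == 0:
            coincident += 1
        else:
            angles.append(math.atan2(dy, dx))
    if not angles:
        return 1.0
    two_pi = 2 * math.pi
    crit = sorted({(a + s * math.pi / 2) % two_pi for a in angles for s in (1, -1)})
    best = len(angles)
    for j, c in enumerate(crit):
        nxt = crit[j + 1] if j + 1 < len(crit) else crit[0] + two_pi
        beta = 0.5 * (c + nxt)  # a normal direction strictly between critical directions
        # Closed halfplane {y : u'(y - x) >= 0} with u = (cos beta, sin beta); the small
        # tolerance keeps boundary points counted if two critical directions nearly coincide.
        count = sum(1 for a in angles if math.cos(a - beta) >= -1e-12)
        best = min(best, count)
    return (coincident + best) / n
```

For the four corners of a square, for instance, the center has depth 1/2 and each corner has depth 1/4.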
• In particular, the sample halfspace depth of x is the minimum fraction of data points in any closed halfspace containing x.
• Simplicial depth: the wide potential scope of depth functions became clear with the introduction of an important second one, the simplicial depth (Liu 1988): for $x \in \mathbb{R}^k$, $D_S(x, F) := P(x \in S[X_1, \dots, X_{k+1}])$, where $X_1, \dots, X_{k+1}$ are independent observations from F and $S[X_1, \dots, X_{k+1}]$ denotes the simplex in $\mathbb{R}^k$ with vertices $X_1, \dots, X_{k+1}$.
• For a data set in $\mathbb{R}^2$, the sample simplicial depth of a point x is obtained by considering all triangles formed with three data points as vertices and taking the fraction of them that cover x.
• Some basic properties are desired of any depth function. For example, affine invariance requires that a depth function D(x, F) be independent of the coordinate system.
• When F is symmetric about θ in some sense, D(x, F) should also be symmetric about θ as well as maximal at this point.
• It is also desirable that D(x, F) decrease along each ray outward from the deepest point.
• Not only the underlying pointwise functions D(x, F), but also the associated contours, or equivalence classes of points of equal depth, play special roles.
• Linked with the contours are useful central regions $\{x : D(x, F) \ge \alpha\}$, α > 0.
• Thus, for example, the univariate boxplot may be extended to F on $\mathbb{R}^k$ by using central regions to describe the middle half or the middle 75% of the population.
• Beyond a center-outward ordering, a quantile function and a rank function can also be associated with any depth function as discussed above.
• For D(x, F) possessing nested contours enclosing the median $\theta_{\mathrm{Med}}$ and bounding central regions $\{x : D(x, F) \ge \alpha\}$, α > 0, the depth contours induce a quantile representation.
• The median (the point with maximal depth) is denoted Q(0, F).
• Any other point $x \ne \theta_{\mathrm{Med}}$ is denoted Q(u, F) with u = pv, where p is the probability weight of the central region with x on its boundary and v is the unit vector pointing from $\theta_{\mathrm{Med}}$ toward x.
• In this case, the rank u = R(x, F) indicates the direction toward x = Q(u, F) from $\theta_{\mathrm{Med}}$.

Depth on hyperspheres
• Regina Liu and Kesar Singh (Annals of Statistics, 1992) introduced concepts of data depth on circles and spheres.
• They proposed three different concepts: angular simplicial depth, angular Tukey depth and arc distance depth.
• Three medians are derived from these depth concepts.
• The concept of depth on spheres leads to a proper notion of "center" and "center-outward ranking" of directional data.
• The ranking induced by a notion of depth can be used, for example, in classification problems.
• Suppose that you have two training samples $(X_1, \dots, X_m)$ and $(Y_1, \dots, Y_n)$ from two different populations on the sphere. The problem is to classify a new vector Z into one of these two populations.
• The proposed rule is to classify Z into the X population if $r_X/m < r_Y/n$, where $r_X$ and $r_Y$ denote respectively the center-outward ranks of Z among the $X_i$'s and the $Y_i$'s.
• In general, the depth of a point $x \in \mathbb{R}^k$ is a measure of "how central" the point x is relative to a probability measure.
• In the general multivariate case, different concepts of depth are well known. They have different properties; in general, a "nice" depth function is (i) monotone relative to any deepest point, (ii) vanishing at infinity, (iii) maximal at the center, etc.
• In this course, I recall the notions of Tukey halfspace depth and simplicial depth.
• We start with the angular simplicial depth, an analog for directional data of the simplicial depth for data in Euclidean spaces.
• In $\mathbb{R}^k$, a simplex $S(x_1, \dots, x_{k+1})$ with k + 1 vertices is defined as the closed convex hull of these points.
• Let F be a cdf and $x \in \mathbb{R}^k$. The simplicial depth of x with respect to F is then defined to be the probability that x belongs to a simplex $S(X_1, \dots, X_{k+1})$, where $X_1, \dots, X_{k+1}$ are i.i.d. F.
• The edges of a simplex in $\mathbb{R}^k$ are the line segments connecting its vertices.
• Moving to spheres, the idea is to replace "line segments" by the "shortest curve" joining a pair of points on the sphere.
• The shortest curve joining a pair of points $x_1, x_2$ on the sphere is the short arc joining $x_1$ and $x_2$ on the circle which passes through $x_1$ and $x_2$ and has the same center as the sphere (a great circle).
• Definition. Let $x \in S^1$. The angular simplicial depth of x with respect to a cdf F is defined by $\mathrm{ASD}(x) := P[x \in \mathrm{arc}(X_1, X_2)]$, where $X_1$ and $X_2$ are i.i.d. F.
• Definition. Let $x \in S^2$. The angular simplicial depth of x with respect to a cdf F is defined by $\mathrm{ASD}(x) := P[x \in \Delta(X_1, X_2, X_3)]$, where $\Delta(x_1, x_2, x_3)$ stands for the spherical triangle bounded by the short arcs $\mathrm{arc}(x_1, x_2)$, $\mathrm{arc}(x_1, x_3)$ and $\mathrm{arc}(x_2, x_3)$, and where $X_1$, $X_2$ and $X_3$ are i.i.d. F.
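On the circle, the empirical angular simplicial depth can be computed directly from this definition by counting the pairs of observations whose short arc covers a given point. In the sketch below (pure Python, hypothetical names, angles in radians), an exactly antipodal pair has no unique short arc, so the counterclockwise half is taken arbitrarily; this is immaterial under a continuous F, since such pairs have probability zero. Because this empirical version is the ingredient of the test of antipodal symmetry discussed below, the sketch also approximates $\sup_\theta |\mathrm{ASD}^{(n)}(\theta) - 1/4|$ over a finite grid of θ values; the grid size is an arbitrary choice.

```python
import math
from itertools import combinations

TWO_PI = 2 * math.pi


def in_short_arc(theta, a, b):
    """True if the angle theta lies on the shorter arc joining angles a and b (radians).

    For an exactly antipodal pair the short arc is not unique; the counterclockwise
    half starting at a is used.
    """
    d = (b - a) % TWO_PI
    if d <= math.pi:
        return (theta - a) % TWO_PI <= d
    return (theta - b) % TWO_PI <= TWO_PI - d


def empirical_asd(theta, angles):
    """ASD^(n)(theta): fraction of the n(n-1)/2 pairs whose short arc contains theta."""
    n = len(angles)
    hits = sum(in_short_arc(theta, a, b) for a, b in combinations(angles, 2))
    return hits / (n * (n - 1) / 2)


def antipodal_sup_statistic(angles, grid_size=360):
    """Approximate sup_theta |ASD^(n)(theta) - 1/4| over an equispaced grid of angles."""
    grid = (TWO_PI * j / grid_size for j in range(grid_size))
    return max(abs(empirical_asd(t, angles) - 0.25) for t in grid)
```

For a sample from the uniform distribution, which is antipodally symmetric, the values of `empirical_asd` should hover around 1/4; for four observations at angles 0.1, 0.2, 0.3 and 0.4, the empirical depth is 2/3 at 0.25 and 0 at π.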
• The generalization to the hypersphere case is straightforward.
• A maximum point of ASD is called an angular simplicial median.
• Let F be a cdf on the unit circle $S^1$ and f the corresponding density.
• Proposition (Monotonicity of ASD). Suppose that f is symmetric about $\theta \in S^1$ and decreases monotonically on both sides of θ until the antipodal point −θ. Then ASD is also monotonically nonincreasing in both directions from θ to −θ.
• Definition. Let F be the cdf of a random vector X taking values in $S^{k-1}$. F is antipodally symmetric if $X \stackrel{d}{=} -X$. If F has a continuous density f, then $f(x) = f(-x)$.
• ASD provides a characterization of antipodally symmetric distributions.
• Proposition. Assume that f is continuous. Then ASD(x) = c for all x, for some constant c, if and only if $f(x) = f(-x)$ for all x. Moreover,
1 on $S^1$, the constant c must be 1/4;
2 on $S^2$, the constant c must be 1/8.
• Proof of 1 (circular case). First, note that ASD admits a "differential formula". Let $\theta \in [0, 2\pi)$ (with the abuse of notation ASD(θ), and with −θ denoting the antipodal point θ + π); then
$$\frac{d}{d\theta}\mathrm{ASD}(\theta) = 2(A_\theta - C_\theta)f(\theta),$$
where $A_\theta$ and $C_\theta$ stand for the probabilities of the semicircles joining θ and −θ in the counterclockwise and clockwise directions respectively. This comes from the fact that, for a small perturbation δθ,
$$\mathrm{ASD}(\theta + \delta\theta) - \mathrm{ASD}(\theta) = 2(A_\theta - C_\theta)\int_\theta^{\theta + \delta\theta} f(u)\,du + o(\delta\theta).$$
• To obtain the formula above, one can use the identity $P(E_1) - P(E_2) = P(E_1 \setminus E_2) - P(E_2 \setminus E_1)$, which gives $\mathrm{ASD}(\theta + \delta\theta) - \mathrm{ASD}(\theta) = P(A) - P(B)$, where $A := \{\theta \notin \mathrm{arc}(X_1, X_2),\ \theta + \delta\theta \in \mathrm{arc}(X_1, X_2)\}$ and $B := \{\theta \in \mathrm{arc}(X_1, X_2),\ \theta + \delta\theta \notin \mathrm{arc}(X_1, X_2)\}$.
• Then, to prove "⇒", assume that ASD(θ) = c for a positive constant c and for all $\theta \in [0, 2\pi)$. Then the differential formula directly entails that $A_\theta = C_\theta$, that is, $A_\theta = 1/2$, for all θ; since $\frac{d}{d\theta}A_\theta = f(\theta + \pi) - f(\theta)$, this gives $f(\theta) = f(-\theta)$ for all θ.
• Now, for "⇐", since $f(\theta) = f(-\theta)$ for all θ, we have $A_\theta = C_\theta$ and therefore ASD(θ) = c for a positive constant c and all θ. To show that c = 1/4, we show that ASD(0) = 1/4. Antipodal symmetry ($f(\theta) = f(-\theta)$ for all θ) implies that
$$\mathrm{ASD}(0) = 2\int_0^\pi \Big(\frac{1}{2} - F(a)\Big) f(a)\,da.$$
• The result follows easily using the fact that $\int_0^\pi F(a) f(a)\,da = F(\pi)^2/2 = 1/8$, since $F(\pi) = 1/2$ under antipodal symmetry.
• The latter result has immediate statistical implications.
• To test whether the underlying distribution is antipodally symmetric or not, one can compare an "empirical version" of ASD with 1/4.
• More precisely, such an empirical ASD is given by
$$\mathrm{ASD}^{(n)}(\theta) := \binom{n}{2}^{-1}\sum\nolimits_{*} I[\theta \in \mathrm{arc}(X_{i_1}, X_{i_2})],$$
where $\sum_*$ stands for the sum over all possible pairs $(X_{i_1}, X_{i_2})$, $i_1 < i_2$.
• A test can then be built, for example, on
$$(S^1)\qquad T_n := \sup_\theta |\mathrm{ASD}^{(n)}(\theta) - 1/4|,$$
rejecting the null hypothesis for large values of $T_n$.
• Needless to say, a reasonable test has to be built using the fixed-n or the large-sample distribution of $T_n$. Unfortunately, we know neither. So one can bootstrap, or do something else, or cry.
• Of course, the equivalent test on the sphere is based on
$$(S^2)\qquad T_n := \sup_\theta |\mathrm{ASD}^{(n)}(\theta) - 1/8|,$$
where $\mathrm{ASD}^{(n)}(\theta)$ is here constructed using spherical triangles.
• There exists a link between the "traditional" notion of simplicial depth and ASD.
• Let F be a cdf and θ some fixed "point" on the unit circle.
• There is a natural "length-preserving" mapping $g_\theta$ from $[-\pi, \pi]$ to the tangent line $L_\theta$.
• For a "point" φ ≠ −θ, $|g_\theta(\varphi)|$ is the length of arc(θ, φ), and the sign of $g_\theta(\varphi)$ is − or + depending on whether the direction in going from θ to φ is counterclockwise or clockwise.
• Then, let $F_\theta$ denote the cdf of the resulting distribution on the tangent line $L_\theta$.
• The ASD and the simplicial depth ($\mathrm{SD}_\theta$) on the tangent line $L_\theta$ with respect to $F_\theta$ are linked as follows.
• Proposition. Let F be a continuous cdf on $S^1$ and $\theta \in [0, 2\pi)$. Then $\mathrm{ASD}(\theta) + \mathrm{ASD}(-\theta) = \mathrm{SD}_\theta(0)$.
• Proof. The following events are clearly equivalent (up to a null set):
$E_1 := \{0 \in \mathrm{segment}(g_\theta(\varphi_1), g_\theta(\varphi_2))\}$,
$E_2 := \{\varphi_1 \text{ and } \varphi_2 \text{ are on two different sides of } \mathrm{line}(\theta, -\theta)\}$,
$E_3 := E_3^{(1)} \cup E_3^{(2)} := \{\theta \in \mathrm{arc}(\varphi_1, \varphi_2)\} \cup \{-\theta \in \mathrm{arc}(\varphi_1, \varphi_2)\}$.
The result follows from the fact that $E_3^{(1)} \cap E_3^{(2)}$ has probability 0, so that $\mathrm{SD}_\theta(0) := P(E_1) = P(E_3) = \mathrm{ASD}(\theta) + \mathrm{ASD}(-\theta)$.
• A similar result exists on $S^2$.
• We now turn to the angular Tukey depth.
• Definition. The angular Tukey depth (ATD) for a given distribution F on the hypersphere is given by $\mathrm{ATD}_F(x) := \inf_{S :\, x \in S} P_F(S)$, where the infimum is taken over the set of all closed hemispheres S containing x.
• A maximum point is an angular Tukey median.
• On the circle as well as on the sphere, ATD(·) is bounded above by 1/2. The value 1/2 is achieved at a point θ if and only if each hemisphere containing θ has probability greater than or equal to 1/2.
• For a discussion of the robustness of the medians associated with these depth concepts, see Liu and Singh (Annals of Statistics, 1992).
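On the circle, closed hemispheres are closed semicircles, so the sample angular Tukey depth can be computed by a direct scan. The sketch below is a minimal illustration with hypothetical names (pure Python, angles in radians), not taken from Liu and Singh (1992): the semicircles $[s, s + \pi]$ (counterclockwise) containing θ are those with starting angle s in $[\theta - \pi, \theta]$, and the number of observations they contain is piecewise constant in s, so it is enough to evaluate it at both ends of that range, at the breakpoints where an observation sits on an endpoint, and at midpoints between consecutive breakpoints.

```python
import math

TWO_PI = 2 * math.pi


def in_closed_semicircle(a, s):
    """True if angle a lies in the closed semicircle running counterclockwise from s to s + pi."""
    return (a - s) % TWO_PI <= math.pi


def angular_tukey_depth(theta, angles):
    """Sample angular Tukey depth on S^1: minimum fraction of observations in a closed
    semicircle [s, s + pi] containing theta, i.e. with start s in [theta - pi, theta].

    The count is piecewise constant in s; it is evaluated at both ends of that range,
    at every breakpoint (an observation on an endpoint) and at midpoints between breakpoints.
    """
    lo = theta - math.pi
    offsets = {0.0, math.pi}  # s = theta - pi and s = theta
    for a in angles:
        for b in (a, a - math.pi):  # observation a on the left or on the right endpoint
            off = (b - lo) % TWO_PI
            if off <= math.pi:
                offsets.add(off)
    offs = sorted(offsets)
    candidates = offs + [0.5 * (u + v) for u, v in zip(offs, offs[1:])]
    best = min(sum(in_closed_semicircle(a, lo + c) for a in angles) for c in candidates)
    return best / len(angles)
```

Maximizing this depth over a grid of θ values gives an approximate angular Tukey median. For four observations at angles 0.1, 0.2, 0.3 and 0.4, the depth is 1/2 at 0.25 and 0 at π; for a continuous sample the computed depth never exceeds 1/2, in line with the bound above.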
