Evolutionary Inference for Function-valued Traits: Gaussian Process Regression on Phylogenies

Nick S. Jones (1) and John Moriarty (2)

(1) Department of Mathematics, Imperial College London, SW7 2AZ
(2) School of Mathematics, University of Manchester, Oxford Road, Manchester M13 9PL, UK

Biological data objects often have both of the following features: (i) they are functions rather than single numbers or vectors, and (ii) they are correlated due to phylogenetic relationships. In this paper we give a flexible statistical model for such data, by combining assumptions from phylogenetics with Gaussian processes. We describe its use as a nonparametric Bayesian prior distribution, both for prediction (placing posterior distributions on ancestral functions) and model selection (comparing rates of evolution across a phylogeny, or identifying the most likely phylogenies consistent with the observed data). Our work is integrative, extending the popular phylogenetic Brownian Motion and Ornstein-Uhlenbeck models to functional data and Bayesian inference, and extending Gaussian Process regression to phylogenies. We provide a brief illustration of the application of our method.

I. INTRODUCTION

In this paper we consider statistical inference for function-valued data which are correlated due to phylogenetic relationships. A schematic example is given in Figure 1A: in this case, given functional data observed at the tips of a phylogeny, the task is to perform inference on the (unobserved) functional data at the root of the phylogeny. Alternatively, if the phylogeny is uncertain we may wish to perform phylogenetic inference, or our interest may be inferring the dynamics of the evolutionary process which produced the data.
The term 'function-valued' is meant in the sense of [1], where a datum is a continuous function $f(x)$ of a variable $x$, such as time or temperature: examples are therefore curves for ambient temperature versus growth rate for caterpillars, a heart rhythm time series [2], or a spectrogram of audio data. Our approach is to combine the theory of Gaussian processes with assumptions from phylogenetics, to obtain a flexible nonparametric model for such data. Since this model effectively specifies the evolutionary dynamics of the data through the phylogeny, we note that (i) our approach generalises the Brownian Motion and Ornstein-Uhlenbeck models of continuous-time character evolution from quantitative genetics [3], and (ii) the phylogenetic tree will play the role of evolutionary time: to avoid confusion, we therefore refer to the indexing variable $x$ above as 'space'. The model may be used as a prior for Bayesian inference, which opens up phylogenetically aware approaches to the prediction of ancestral functions and to model selection.

Because of their generality, flexibility and mathematical simplicity, there has been substantial recent interest in the use of Gaussian process priors for Bayesian nonparametric regression [4-6]. This paper may be viewed as an extension of Gaussian process regression to take account of functional data and a tree topology. Our work relates to the field of spatial statistics, in that it involves multidimensional index sets with both distance and topology: however, it is the tree topology and the conditional independence of siblings given their common ancestors that makes our approach particularly suited to the study of phylogenetically correlated data.

Functional representations of data have been in use for at least 30 years [1, 7]. Their use in (non-phylogenetic) evolutionary studies was proposed for quantitative genetics in [8].
While the wider debate concerning classical versus functional data types is outside the scope of the current report, in [1] it is argued that statistical challenges for classical data have corresponding dual statistical challenges for functional data, so we may ask: in the application of our models to statistical inference, which classical approaches are dual to our functional approach? Classical data types investigated in the phylogenetic context include sequences of discrete symbols (e.g. encoding the presence or absence of certain characters [10], or alternatively in models for the evolution of genetic sequences), the spatial locations of a number of fixed landmark points in geometric morphometrics [11], and multivariate vectors of continuous characters or summary statistics [12]. Within the evolutionary study of multivariate vectors, the relative effects of phylogenetic and (species-)specific variation have been studied [13], which relates closely to our equation (14) below, and the challenges of ancestor prediction and model selection have been addressed and explored statistically [14-16]. We address model selection in the supplement to this report and more fully in a companion paper [17], which also contains a discussion of statistical issues and performance. Beyond this duality, however, our model is suitable for use as a prior in Bayesian analysis, and in the remainder of this paper we derive our models and provide details for their use in Bayesian regression.

II. PHYLOGENETIC GAUSSIAN PROCESSES

A. Gaussian Processes and Regression

A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution. Examples are the Wiener (Brownian Motion) and Ornstein-Uhlenbeck processes which have received considerable recent attention in the study of evolution (see, for example, [12, 14, 15, 18]).
Gaussian processes are characterised by their first two (cross-)moments (although, unless otherwise stated, the mean will be assumed to be zero, a common assumption for Gaussian process priors, see [4]). The covariance function $\sigma$, which specifies how the sample values (co-)vary, typically depends on a vector $\theta$ of parameters. Suppose that a Gaussian process $f$ is observed at a vector of co-ordinates $L$. Then the resulting vector of sample values $f(L)$ has a multivariate Gaussian distribution of dimension equal to $|L|$, the number of points of measurement: $f(L) \sim N(0, \sigma(L, L, \theta))$. Here $N$ represents the Gaussian distribution, its two arguments being the mean vector and the covariance matrix $\sigma(L, L, \theta)$, which is the matrix of the covariances between all pairs $(\ell_i, \ell_j)$ of observation co-ordinates in $L$ (where $\ell_i \in L$), so that

$$[\sigma(L, L, \theta)]_{ij} = \sigma(\ell_i, \ell_j, \theta) = E[f(\ell_i) f(\ell_j) \mid \theta]. \tag{1}$$

In subsection II C below we derive the structure of some particular covariance functions, including the phylogenetic Ornstein-Uhlenbeck process. The log-likelihood of the sample $f(L)$ is then [4]

$$\log p(f(L) \mid \theta) = -\frac{1}{2} f(L)^T \sigma(L, L, \theta)^{-1} f(L) - \frac{1}{2} \log\det(\sigma(L, L, \theta)) - \frac{|L|}{2} \log 2\pi. \tag{2}$$

We might be interested in making inferences about the unobserved values of our random function $f$ at a vector $M$ of co-ordinates, given samples at the co-ordinates $L$. The posterior distribution of the vector $f(M)$ given $f(L)$ is also Gaussian and of the form [4]:

$$f(M) \mid f(L) \sim N(A, B) \tag{3}$$

where

$$A = \sigma(M, L, \theta)\, \sigma(L, L, \theta)^{-1} f(L), \tag{4}$$

$$B = \sigma(M, M, \theta) - \sigma(M, L, \theta)\, \sigma(L, L, \theta)^{-1} \sigma(M, L, \theta)^T \tag{5}$$

and $\sigma(M, L, \theta)$ denotes the $|M| \times |L|$ matrix of the covariance function evaluated at all pairs $m_i \in M$, $\ell_j \in L$. From Eq. (4) the posterior mean vector $A$ consists of linear combinations of the observations while the posterior covariance matrix $B$, given by (5), is independent of the observations.
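As a minimal numerical sketch of Eqs. (2), (4) and (5), the following Python fragment evaluates the log-likelihood and the posterior mean and covariance for a one-dimensional index. The squared exponential covariance, the co-ordinates `L` and `M`, the sample values `f_L`, and the small diagonal `jitter` are all illustrative assumptions rather than choices made in this paper.

```python
import numpy as np

def sq_exp_cov(a, b, length_scale=1.0):
    """Assumed covariance function: squared exponential on 1-D co-ordinates."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * length_scale**2))

def log_likelihood(f_L, cov_LL):
    """Eq. (2): log-density of a zero-mean Gaussian vector f(L)."""
    chol = np.linalg.cholesky(cov_LL)
    alpha = np.linalg.solve(cov_LL, f_L)            # sigma(L,L)^{-1} f(L)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))   # log det via Cholesky factor
    n = f_L.shape[0]
    return -0.5 * f_L @ alpha - 0.5 * log_det - 0.5 * n * np.log(2.0 * np.pi)

def posterior(f_L, cov_LL, cov_ML, cov_MM):
    """Eqs. (4)-(5): posterior mean A and covariance B of f(M) given f(L)."""
    weights = np.linalg.solve(cov_LL, cov_ML.T).T   # sigma(M,L) sigma(L,L)^{-1}
    mean = weights @ f_L
    cov = cov_MM - weights @ cov_ML.T
    return mean, cov

# Hypothetical observation and prediction co-ordinates and sample values.
L = np.array([0.0, 1.0, 2.5])
M = np.array([1.0, 1.75])
f_L = np.array([0.3, -0.1, 0.8])

jitter = 1e-9 * np.eye(len(L))                      # numerical stabiliser (assumption)
cov_LL = sq_exp_cov(L, L) + jitter
A, B = posterior(f_L, cov_LL, sq_exp_cov(M, L), sq_exp_cov(M, M))
```

Because the first prediction point coincides with an observed co-ordinate, the posterior mean there reproduces the observation and the posterior variance is (numerically) zero, which illustrates that $A$ depends linearly on the data while $B$ does not depend on the data at all.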
Gaussian process regression is nonparametric in the sense that no assumption is made about the structure of the model: the more data gathered, the longer the vector $f(L)$, and the more intricate the posterior model for $f(M)$.

We are able to combine evolutionary dynamics with functional data because the index variable $\ell$ introduced above can be a point in a space of arbitrary dimension. Therefore, if we wish to model the time evolution of a function-valued trait $f(x)$ with indexing variable $x$, we could consider each point of observation $\ell$ as corresponding to a point $(x, t)$ in both space and evolutionary time. Then $f(L)$ would represent the values of a random space-time surface at various space-time co-ordinates $L$ (analogously, the values $f(L)$ are like a set of altimeter recordings recorded at different locations $L$ on a map). A cross-section through this space-time surface at a fixed value of time $t$ yields a random curve which can be viewed as a single function-valued trait at the fixed time $t$ in its evolution. In the next section we will extend this view to processes indexed by phylogenies.

B. Phylogenetic covariance function

Our aim in this subsection is to build a Gaussian process model for the evolution of a function-valued trait along a phylogenetic tree $\mathbf{T}$ by allowing $\mathbf{T}$ to play the role of evolutionary time, rather than the linear time variable $t$ used above. Suppose, therefore, that each observation $\ell$ corresponds to a point $(x, \mathbf{t})$ in the space-phylogeny $S \times \mathbf{T}$: that is, $x \in S$ is the value under consideration of the spatial (indexing) variable, and $\mathbf{t} \in \mathbf{T}$ is the point under consideration on the phylogeny ($\mathbf{t}$ is not just a time co-ordinate but also indicates a branch of the phylogeny). We will do this by constructing a covariance function $\Sigma_{\mathbf{T}}(\ell_i, \ell_j)$ when the $\ell_i, \ell_j$ are points in $S \times \mathbf{T}$, calling it the phylogenetic covariance function.
In order to obtain a unique phylogenetic covariance function $\Sigma_{\mathbf{T}}$ we will make two assumptions which are natural in the context of evolution (see, for example, [12]):

Assumption II.1. Conditional on their common ancestors in the phylogenetic tree $\mathbf{T}$, any two function-valued traits are statistically independent.

Assumption II.2. The statistical relationship between a function-valued trait and any of its descendants in $\mathbf{T}$ is independent of the topology of $\mathbf{T}$.

Assumption II.2 means that our statistical model of the evolutionary process is identical along paths through $\mathbf{T}$ from the root to any tip, and we call this the marginal process (this assumption could be generalised, for example to model unequal rates of evolution along different branches of $\mathbf{T}$). As in [12] and related work, we need the assumption that $\mathbf{T}$ is a chronogram, having both a tree topology and a distance metric. We will call the distance between a point $\mathbf{t} \in \mathbf{T}$ and the root of $\mathbf{T}$ the 'depth' of $\mathbf{t}$, and denote it by the plain typeface symbol $t$.

The marginal process is of course Gaussian, and so it is sufficient to specify its covariance function $\Sigma$ on $S \times T$, where $T$ is the set of all dates in $\mathbf{T}$. In fact, each different marginal covariance function specifies a different phylogenetic covariance function, and so a wide class of phylogenetic covariance functions may be constructed. In turn, these models offer an equally wide choice of priors for Bayesian evolutionary inference with function-valued traits. In the following Proposition we present our mathematical results in the simplest case when the marginal covariance function is space-time separable: that is, when there exist a space-only covariance function $K(x_1, x_2)$ and a time-only covariance function $k(t_1, t_2)$ such that

$$\Sigma((x_1, t_1), (x_2, t_2)) = K(x_1, x_2)\, k(t_1, t_2). \tag{6}$$

Proposition II.1.

1. If the marginal covariance function $\Sigma$ is space-time separable then the phylogenetic covariance function $\Sigma_{\mathbf{T}}$ is also space-time separable, i.e.

$$\Sigma_{\mathbf{T}}((x_1, \mathbf{t}_1), (x_2, \mathbf{t}_2)) = K(x_1, x_2)\, k_{\mathbf{T}}(\mathbf{t}_1, \mathbf{t}_2) \tag{7}$$

where $k_{\mathbf{T}}(\mathbf{t}_1, \mathbf{t}_2)$ is the phylogenetic covariance function constructed from the time-dependent component $k$ in (6) and $K(x_1, x_2)$ is the space-dependent component.

2. When the time-dependent component $k$ of (6) specifies a process that is Markovian in time, we have the simple expression

$$k_{\mathbf{T}}(\mathbf{t}_1, \mathbf{t}_2) = k(t_1, t_{12})\, k(t_{12}, t_{12})^{-1}\, k(t_2, t_{12}) \tag{8}$$

where $\mathbf{t}_{12}$ is the most recent common ancestor of $\mathbf{t}_1$ and $\mathbf{t}_2$ (and $t_{12}$ is its depth in $\mathbf{T}$). In particular, we have the following corollaries:

(a) if $k(t_1, t_2) = \min(t_1, t_2)$, so that we have a Wiener process in evolutionary time as in [12], then $k_{\mathbf{T}}(\mathbf{t}_1, \mathbf{t}_2) = t_{12}$, which is the variance of the evolutionary time component of variation evaluated at the most recent common ancestor $\mathbf{t}_{12}$.

(b) if $k$ is isotropic, so that $k(t_1, t_2)$ is a function of $|t_1 - t_2|$ only, it does not necessarily follow that $k_{\mathbf{T}}(\mathbf{t}_1, \mathbf{t}_2)$ is isotropic (meaning a function of the patristic distance between $\mathbf{t}_1$ and $\mathbf{t}_2$ only). In fact, $k_{\mathbf{T}}$ is only isotropic when $k$ is the Ornstein-Uhlenbeck covariance.

3. Let $Y$ be a phylogenetic Gaussian process with a space-time separable covariance function $\Sigma_{\mathbf{T}}$ which factorises as in (7). If $K$ is a continuous degenerate Mercer kernel then there exist an integer $n$, deterministic functions $\phi_i : S \to \mathbb{R}$ and univariate Gaussian processes $X_i$, for $i = 1, \dots, n$, such that the Gaussian process given by

$$f(x, \mathbf{t}) = \sum_{i=1}^{n} \phi_i(x)\, X_i(\mathbf{t}) \tag{9}$$

has the same distribution as $Y$.

Further mathematical detail can be found in the supplement. Part 2b of Proposition II.1 helps clarify the relationship between phylogenetic Gaussian process models and studies such as [19] based on autocorrelation functions of patristic distance, namely that the two are compatible only when the phylogenetic Ornstein-Uhlenbeck process is assumed.
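To make part 2 of Proposition II.1 concrete, the following Python sketch evaluates Eq. (8) on a small chronogram and checks both the Wiener corollary and the isotropy of the Ornstein-Uhlenbeck case numerically. The tree, its node names, the depths and the value of the length scale `theta2` are hypothetical, and points on the phylogeny are restricted to nodes for simplicity.

```python
import math

# Hypothetical chronogram: node -> (parent, depth), depth = distance from the root.
TREE = {
    "root": (None, 0.0),
    "u": ("root", 1.0),
    "a": ("u", 3.0),
    "b": ("u", 3.0),
    "c": ("root", 3.0),
}

def ancestors(node):
    """Path from a node up to the root, starting with the node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = TREE[node][0]
    return path

def mrca_depth(n1, n2):
    """Depth t12 of the most recent common ancestor of two nodes."""
    seen = set(ancestors(n1))
    for node in ancestors(n2):
        if node in seen:
            return TREE[node][1]
    raise ValueError("nodes share no ancestor")

def patristic(n1, n2):
    """Patristic distance: path length between two nodes through their MRCA."""
    return TREE[n1][1] + TREE[n2][1] - 2.0 * mrca_depth(n1, n2)

def phylo_cov(k, n1, n2):
    """Eq. (8): k(t1, t12) k(t12, t12)^{-1} k(t2, t12) for a Markovian k."""
    t1, t2 = TREE[n1][1], TREE[n2][1]
    t12 = mrca_depth(n1, n2)
    var = k(t12, t12)
    if var == 0.0:
        # A zero-variance common ancestor carries no shared variation.
        return 0.0
    return k(t1, t12) * k(t2, t12) / var

def wiener(s, t):
    """Wiener covariance of corollary (a)."""
    return min(s, t)

theta2 = 2.0  # assumed length scale

def ou(s, t):
    """Stationary Ornstein-Uhlenbeck covariance."""
    return math.exp(-abs(s - t) / theta2)

wiener_ab = phylo_cov(wiener, "a", "b")   # depth of the MRCA "u"
ou_ab = phylo_cov(ou, "a", "b")
ou_ac = phylo_cov(ou, "a", "c")
```

In this example the Wiener value for the tips `a` and `b` equals the depth of their common ancestor `u`, and the Ornstein-Uhlenbeck values agree with $\exp(-d/\theta_2)$ evaluated at the patristic distance $d$, which is the isotropy described in part 2(b).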
Part 3 establishes that a convenient expansion using basis functions can be used for a wide range of phylogenetic Gaussian processes with space-time separable covariance functions. This expansion is useful for statistical inference, as it justifies the use of dimension reduction techniques (see companion paper [17]).

C. Examples

We now illustrate two ways in which Proposition II.1 may be used to make priors for Bayesian inference on phylogenetically related function-valued traits (henceforth referred to simply as 'traits'). They differ in their approach to modelling spatial variation, and may be called the spatially inhomogeneous (Figure 1A) and spatially homogeneous (Figure 1B) models respectively.

FIG. 1: Schematic illustration for Bayesian inference on function-valued traits related by a phylogeny, giving a posterior distribution for the function at the root. The red curve at the root indicates the predicted mean surface given only the data from the tips, the blue curves represent one standard deviation uncertainties at each point. The three curves at the top right of each panel are notional samples from the Gaussian process at the root. Panel A: spatially inhomogeneous modelling, where the prior is fitted to observed data (see [17] for detail). Panel B: spatially homogeneous modelling (see the supplement for detail).

In both cases we assume the simplest possible structure for the marginal covariance function $\Sigma$: that it is space-time separable as in (6). We also assume that, conditional on any given trait, its ancestor and descendant traits are statistically independent: this corresponds to choosing a temporal component which is Markovian. The only Markovian Gaussian processes are the class of Ornstein-Uhlenbeck processes, and the stationary examples have the covariance function $k(t_1, t_2) = \exp(-|t_1 - t_2| / \theta_2)$, where the hyperparameter $\theta_2$ specifies the characteristic length scale for the evolutionary dynamics.
From Proposition II.1 we obtain

$$k_{\mathbf{T}}(\mathbf{t}_1, \mathbf{t}_2) = \exp(-d_{\mathbf{T}}(\mathbf{t}_1, \mathbf{t}_2) / \theta_2) \tag{10}$$

where $d_{\mathbf{T}}(\mathbf{t}_1, \mathbf{t}_2)$ denotes the patristic distance between $\mathbf{t}_1$ and $\mathbf{t}_2$ (note that the phylogenetic Ornstein-Uhlenbeck model is isotropic, see part 2(b) of Proposition II.1).

1. Spatially homogeneous model

If our prior belief is that variation in the trait is homogeneous over all values of $x \in S$ then we should choose the spatial covariance $K$ to be stationary, and if the traits are typically smooth we should choose $K$ to generate smooth random functions. An example is the squared exponential covariance function:

$$K(x_1, x_2) = \exp\left(-(x_1 - x_2)^2 / 2\theta_1^2\right). \tag{11}$$

Here the hyperparameter $\theta_1$ fixes the characteristic length scale of the random functions. This choice of $K$ gives

$$\Sigma_{\mathbf{T}}((x_1, \mathbf{t}_1), (x_2, \mathbf{t}_2)) = K(x_1, x_2)\, k_{\mathbf{T}}(\mathbf{t}_1, \mathbf{t}_2) \tag{12}$$

$$= \exp\left(-(x_1 - x_2)^2 / 2\theta_1^2 - d_{\mathbf{T}}(\mathbf{t}_1, \mathbf{t}_2) / \theta_2\right). \tag{13}$$

This simple example may be combined by summation to construct other spatially homogeneous phylogenetic covariance functions, although the separability property is typically then lost. The following phylogenetic covariance function, $\Sigma'_{\mathbf{T}}$, contains an uncorrelated noise term whose influence is controlled by the choice of the parameter $\sigma_n^2$:

$$\Sigma'_{\mathbf{T}}((x_1, \mathbf{t}_1), (x_2, \mathbf{t}_2)) = (1 - \sigma_n^2) \exp\left(-(x_1 - x_2)^2 / 2\theta_1^2 - d_{\mathbf{T}}(\mathbf{t}_1, \mathbf{t}_2) / \theta_2\right) + \sigma_n^2\, \delta_{\mathbf{t}_1, \mathbf{t}_2}\, \delta_{x_1, x_2} \tag{14}$$

where $\delta$ is the Kronecker delta. When functional data are sampled at a finite set of space-time points $L$, the phylogenetic covariance function (13) may be used to specify a prior for Bayesian regression as described in subsection II A. Figure 1B gives a schematic representation of functional data and the posterior distribution for the root function, when all functions are discretely sampled on a regular lattice. An illustrative example developing Figure 1B is provided in the supplement.

2. Spatially inhomogeneous model

We may alternatively construct a prior using the representation (9). This involves choosing deterministic spatial basis functions $\phi_1, \dots, \phi_n : S \to \mathbb{R}$ and univariate phylogenetic Gaussian processes $X_1, \dots, X_n$. The spatial basis may be specified a priori, or alternatively obtained by functional decomposition of observed data: in the companion paper [17] we present a practical methodology for this empirical Bayesian approach. If we assume that the $X_i$ are independent phylogenetic Ornstein-Uhlenbeck processes on $\mathbf{T}$ with noise, then we have

$$k^i_{\mathbf{T}}(\mathbf{t}_1, \mathbf{t}_2) = \left(1 - (\sigma^i_n)^2\right) \exp(-d_{\mathbf{T}}(\mathbf{t}_1, \mathbf{t}_2) / \theta_2) + (\sigma^i_n)^2\, \delta_{\mathbf{t}_1, \mathbf{t}_2}. \tag{15}$$

The full covariance function of our prior distribution is then

$$\Sigma^{\mathrm{inhom}}_{\mathbf{T}}((x_1, \mathbf{t}_1), (x_2, \mathbf{t}_2)) = E[f(x_1, \mathbf{t}_1) f(x_2, \mathbf{t}_2)] = \sum_{i=1}^{n} k^i_{\mathbf{T}}(\mathbf{t}_1, \mathbf{t}_2)\, \phi_i(x_1)\, \phi_i(x_2). \tag{16}$$

Figure 1A gives a schematic representation of functional data and the posterior distribution for the root function, when all functions are constructed from the spatial basis $\phi_1, \dots, \phi_n$.

III. DISCUSSION

In this paper we have exploited the powerful Bayesian inference architecture provided by Gaussian processes to address evolutionary questions for function-valued data. The closed-form posterior distribution (3) offers a straightforward approach to the prediction of unobserved function-valued traits, and the associated explicit likelihood function is an attractive feature for Bayesian inference in the presence of uncertainty over the phylogenetic tree or the evolutionary process. The approach is suitable both for complete (or dense) observations of function-valued traits and for sparsely and even irregularly sampled traits, with missing observations.

[1] Ramsay, J.O. 1982 When the data are functions. Psychometrika 47(4), 379-396.
[2] Ramsay, J.O. & Silverman, B.W. 2005 Functional data analysis. Springer.
[3] Lande, R. 1976 Natural Selection and Random Genetic Drift in Phenotypic Evolution. Evolution 30(2), 314-334.
[4] Rasmussen, C.E. & Williams, C.K.I. 2006 Gaussian processes for machine learning. MIT Press.
[5] Stein, M.L. 1999 Interpolation of spatial data: some theory for kriging. Springer.
[6] Patil, G.P. & Rao, C.R. 1993 Multivariate environmental statistics. North-Holland.
[7] Kingsolver, J.G., Gomulkiewicz, R. & Carter, P.A. 2001 Variation, selection and evolution of function-valued traits. Genetica 112-113, 87-104. (DOI 10.1023/A:1013323318612.)
[8] Kirkpatrick, M. & Heckman, N. 1989 A quantitative genetic model for growth, shape, reaction norms, and other infinite-dimensional characters. Journal of Mathematical Biology 27(4), 429-450.
[9] Aston, J.A.D., Buck, D., Coleman, J., Cotter, C.J., Jones, N.S., Macaulay, V., MacLeod, N., Moriarty, J. & Nevins, A. 2012 Phylogenetic inference for function-valued traits: speech sound evolution. Trends in Ecology and Evolution 27(3), 160-166.
[10] Wiens, J.J. 2001 Character analysis in morphological phylogenetics: problems and solutions. Syst. Biol. 50(5), 689-699.
[11] Catalano, S.A., Goloboff, P.A. & Giannini, N.P. 2010 Phylogenetic morphometrics (I): the use of landmark data in a phylogenetic framework. Cladistics 26(5), 539-549. (DOI 10.1111/j.1096-0031.2010.00302.x.)
[12] Felsenstein, J. 1985 Phylogenies and the Comparative Method. Am. Nat. 125(1), 1-15.
[13] Cheverud, J.M., Dow, M.M. & Leutenegger, W. 1985 The Quantitative Assessment of Phylogenetic Constraints in Comparative Analyses: Sexual Dimorphism in Body Weight Among Primates. Evolution 39(6), 1335-1351.
[14] Hansen, T.F. & Martins, E.P. 1996 Translating Between Microevolutionary Process and Macroevolutionary Patterns: The Correlation Structure of Interspecific Data. Evolution 50(4), 1404-1417.
[15] Butler, M.A. & King, A.A. 2004 Phylogenetic Comparative Analysis: A Modeling Approach for Adaptive Evolution. The American Naturalist 164(6), 683-695.
[16] Martins, E.P. & Hansen, T.F. 1997 Phylogenies and the Comparative Method: A General Approach to Incorporating Phylogenetic Information into the Analysis of Interspecific Data. Am. Nat. 149(4), 646-667.
[17] Hadjipantelis, P.Z., Jones, N.S., Moriarty, J., Springate, D. & Knight, C.G. Ancestral Inference from Functional Data: Statistical Methods and Numerical Examples. arXiv 2012.
[18] Diniz-Filho, J.A.F. 2001 Phylogenetic autocorrelation under distinct evolutionary processes. Evolution 55, 1104-1109.
[19] Gittleman, J.L. & Kot, M. 1990 Adaptation: Statistics and a Null Model for Estimating Phylogenetic Effects. Systematic Zoology 39(3), 227-241.
[20] Grafen, A. 1989 The Phylogenetic Regression. Proc. R. Soc. B 326(1233), 119-157.

Supplementary Material for Evolutionary Inference for Function-valued Traits

Nick S. Jones (1) and John Moriarty (2)

(1) Department of Mathematics, Imperial College London, SW7 2AZ
(2) School of Mathematics, University of Manchester, Oxford Road, Manchester M13 9PL, UK

I. SUPPLEMENTARY MATERIAL A: EXPERIMENTS

To provide a worked example of the use of our model in Bayesian regression, we performed a public engagement exercise at a secondary school in which empirical data were generated from a version of the game 'telephone' (where players are typically nodes in a directed cycle graph and spoken messages are passed between successive neighbours to comic effect) with two differences: the players memorise and reproduce a drawing between neighbours, and the players are arranged as nodes in a tree graph and pass the message from the root to the tips. These data were created by students from The Swinton High School and have the virtue that we know the true phylogeny and have functional data at internal nodes of the tree (including the root). The data are displayed in Fig. 1. The challenge is to make inferences about the phylogeny and the curve at the root, given only the curves at the tips. Note, however, that our aim here is only to outline our method, and not to make a scientific claim about this particular dataset: a discussion of statistical issues is reserved for the companion paper [1] and later papers.
The experiment was designed to produce variation which was spatially homogeneous (as the writing surface used is uniform) and both Markovian and stationary in time (as the writing surface used has a fixed size), and the drawing process was assumed to produce independent, identically distributed errors at each point of observation: we may therefore model the evolution with Eq. (14) from the main text.

For thirty distinct points of observation (ten measurements per curve, for the three curves at the tips of Figure 1 of the main text) we use Eq. (14) of the main text with knowledge of the phylogeny $\mathbf{T}$ and an appropriate choice of hyperparameters $\theta = (\theta_1, \theta_2, \sigma_n)$ to construct the thirty-by-thirty matrix of covariances $\Sigma'_{\mathbf{T}}(L, L, \theta)$. For our hyperparameters, we simply chose values which seemed reasonable given the experimental design. Note that hyperparameters may also be fitted to data (which is the approach taken in the companion paper [1]) in an empirical Bayesian approach, or alternatively be assigned (hyper)prior distributions in a fully Bayesian treatment.

We choose the model selection task of inferring the correct assignment of curves to tips (which is equivalent to inferring the correct tree topology, given knowledge of its branch lengths). Up to equivalences, there are three possible assignments, and each choice results in a different phylogenetic covariance matrix $\Sigma'_{\mathbf{T}}(L, L, \theta)$ for use in model selection. One can use Eq. (2) of the main text to calculate Bayes factors for the three assignments. (With our 'reasonable' choice of hyperparameters $\theta = (\theta_1, \theta_2, \sigma_n) = (0.33, 30, 0.05)$, the Bayes factors suggested that the true assignment is a decisive choice over the other two possible assignments.) Given the correct assignment one can also calculate the posterior mean and variance of an ancestral trait using Eqs. (3-5) of the main text. This requires the construction of $\Sigma'_{\mathbf{T}}(M, L, \theta)$ and $\Sigma'_{\mathbf{T}}(M, M, \theta)$.
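The construction of the thirty-by-thirty covariance matrix and the comparison of the three assignments can be sketched in Python as follows. This is an illustration under stated assumptions, not the computation performed for the experiment: the patristic distances `d_sister` and `d_far`, the lattice `x` on $[0, 1]$ and the data vector `f` (drawn from the model itself with a fixed seed) are hypothetical, and only the hyperparameter values are taken from the text. With fixed hyperparameters and equal prior weight on each assignment, the log Bayes factor between two assignments is the difference of their log-likelihoods from Eq. (2) of the main text.

```python
import numpy as np

def phylo_cov_matrix(x, patristic, theta1, theta2, sigma_n):
    """Eq. (14) of the main text on the lattice {tips} x {x}, ordered tip by tip."""
    k_space = np.exp(-(x[:, None] - x[None, :])**2 / (2.0 * theta1**2))
    k_tree = np.exp(-patristic / theta2)
    signal = np.kron(k_tree, k_space)               # separable part, Eq. (13)
    noise = sigma_n**2 * np.eye(signal.shape[0])    # Kronecker-delta noise term
    return (1.0 - sigma_n**2) * signal + noise

def log_likelihood(f, cov):
    """Eq. (2) of the main text for a zero-mean Gaussian vector."""
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, f)                    # z @ z = f^T cov^{-1} f
    half_log_det = np.sum(np.log(np.diag(chol)))
    return -0.5 * z @ z - half_log_det - 0.5 * len(f) * np.log(2.0 * np.pi)

def patristic_for_outgroup(outgroup, d_sister=20.0, d_far=40.0):
    """Hypothetical 3x3 patristic distances when tip `outgroup` is the outgroup."""
    d = np.full((3, 3), d_far)
    s0, s1 = [i for i in range(3) if i != outgroup]
    d[s0, s1] = d[s1, s0] = d_sister
    np.fill_diagonal(d, 0.0)
    return d

theta1, theta2, sigma_n = 0.33, 30.0, 0.05          # values quoted in the text
x = np.linspace(0.0, 1.0, 10)                       # assumed lattice, ten points per curve
covs = [phylo_cov_matrix(x, patristic_for_outgroup(g), theta1, theta2, sigma_n)
        for g in range(3)]

# Synthetic data (not the classroom data): one draw with tip 2 as the outgroup.
rng = np.random.default_rng(0)
f = np.linalg.cholesky(covs[2]) @ rng.standard_normal(30)

log_liks = np.array([log_likelihood(f, c) for c in covs])
log_bayes_factors = log_liks[2] - log_liks          # relative to the generating assignment
```

The same function `phylo_cov_matrix` could be adapted to build the cross-covariance blocks needed for prediction, provided the patristic distances between the root and the tips are supplied and the noise term is applied only to coincident points.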
The posterior mean and variance are shown, for a set of points $M$ at the root of the phylogeny, by the red and blue curves in Fig. 1B of the main text. We study this task in detail for a synthetic dataset in the companion paper [1], using the spatially inhomogeneous approach described in section II C 2 of the main text.

This experiment was part of the EPSRC funded public engagement project "Mutating Messages - A Public Experiment" EP/I017615/1 http://mutatingmessages.blogspot.com/. We would like to thank especially both the students of Class 11S1 and their teacher Ellen Pope. We would also like to thank Liam Moriarty and Karen Bultitude for their assistance with this exercise.

FIG. 1: Sample data generated by experiments with volunteers from Class 11S1 (panels labelled Root, Leaf1, Leaf2 and Leaf3). Each piece of functional data was generated by a different student. The seating plan of the students was arranged to be in the form of a tree; in this figure the data generated by each student is positioned to correspond to the seating plan. Essentially this was a game of 'telephone' but with participants passing on drawings, not sounds, and with the participants sitting in positions along a tree graph, not a directed cycle graph.

II. SUPPLEMENTARY MATERIAL B: MATHEMATICAL EXPRESSIONS FOR THE PHYLOGENETIC COVARIANCE FUNCTION (NON-MARKOVIAN CASE)

In this section we derive some mathematical expressions for the phylogenetic covariance function. Using the terminology of the main text, we first assume that the marginal covariance function is space-time separable. We do not, however, assume that the time-dependent component of the marginal covariance function is Markovian. Our main result is a convenient expression for the phylogenetic covariance function:

Proposition II.1. For $\mathbf{t}_1, \mathbf{t}_2 \in \mathbf{T}$ we have

$$k_{\mathbf{T}}(\mathbf{t}_1, \mathbf{t}_2) = \langle k_{t_1}, k_{t_2} \rangle_{12} \tag{1}$$

where $\langle \cdot, \cdot \rangle_{12}$ denotes a Reproducing Kernel Hilbert Space (RKHS) inner product which is defined below along with $k_{t_1}$ and $k_{t_2}$.
Although the technical level of this section is higher than the rest of this report, RKHS are an important topic in Gaussian process regression and necessary background is given in [2], sections 4.3 and 6.1.

1. Notation.

For convenience we collect here the various definitions that will be used. Again using the terminology of the main text, let

• $S$ be the co-ordinate space of the function-valued trait

• $\mathbf{T}$ be a rooted phylogenetic tree whose branch lengths represent time (a chronogram), and for each $\mathbf{t} \in \mathbf{T}$ define its date $t$ as the patristic distance between $\mathbf{t}$ and the root. Let $\mathbf{t}_1, \mathbf{t}_2$ be points in $\mathbf{T}$, and let their most recent common ancestor in $\mathbf{T}$ be $\mathbf{t}_{12}$, with date $t_{12}$. Let $t_a, t_b$ be respectively the earliest and latest dates of points in $\mathbf{T}$

• $k$ be a continuous covariance function on $[t_a, t_b]$ with only finitely many nonzero eigenvalues (that is, a degenerate Mercer kernel). Let $k^{12}$ be its restriction to $[t_a, t_{12}]$, and define

$$k_{t_1}(t_2) = k(t_1, t_2) \quad \text{and} \quad k^{12}_{t_1}(t_2) = k^{12}(t_1, t_2). \tag{2}$$

Let $H_{12}$ be the unique reproducing kernel Hilbert space defined by $k^{12}$, with inner product $\langle \cdot, \cdot \rangle_{12}$. Let $\nu^{12}$ be a Borel measure on $[t_a, t_{12}]$, and let $e^{12}_1, \dots, e^{12}_r$ and $\lambda^{12}_1, \dots, \lambda^{12}_r$ be respectively the eigenfunctions and (strictly positive) eigenvalues of $k^{12}$ with respect to the measure $\nu^{12}$, that is

$$\int_{t_a}^{t_{12}} k^{12}(s, w)\, e^{12}_i(w)\, d\nu^{12}(w) = \lambda^{12}_i e^{12}_i(s) \tag{3}$$

where

$$\int_{t_a}^{t_{12}} e^{12}_i(w)\, e^{12}_j(w)\, d\nu^{12}(w) = \delta(i, j) \tag{4}$$

and $\delta$ is the Kronecker delta. Then by Mercer's Theorem,

$$k^{12}(s, w) = \sum_{j=1}^{r} \lambda^{12}_j e^{12}_j(s)\, e^{12}_j(w). \tag{5}$$

Let $Y$ be a Gaussian process on $[t_a, t_b]$ with covariance function $k$ and $Y^{12}$ be a Gaussian process on $[t_a, t_{12}]$ defined by

$$Y^{12}(w) = \sum_{i=1}^{r} \sqrt{\lambda^{12}_i}\, Z_i\, e^{12}_i(w) \tag{6}$$

where the $Z_i$ are independent standard Normal random variables. It is easy to check that the covariance function of $Y^{12}$ is $k^{12}$. The Gaussian processes $Y$ and $Y^{12}$ therefore have the same distribution when restricted to $[t_a, t_{12}]$. Recalling (2), for $t_{12} < t < t_b$, let $\mu^{12}_1(t), \dots, \mu^{12}_r(t)$ be the Fourier coefficients of $k_t$ in the basis $e^{12}_1, \dots, e^{12}_r$:

$$\mu^{12}_i(t) = \int_{t_a}^{t_{12}} k_t(s)\, e^{12}_i(s)\, d\nu^{12}(s) \quad \text{so that} \quad k_t(s) = \sum_{i=1}^{r} \mu^{12}_i(t)\, e^{12}_i(s) \tag{7}$$

• $k_{\mathbf{T}}(\mathbf{t}_1, \mathbf{t}_2)$ be the covariance function of the univariate phylogenetic Gaussian process on the phylogeny $\mathbf{T}$ with marginal covariance function $k(t_1, t_2)$

• $K$ be a continuous covariance function on $S$ with only finitely many nonzero eigenvalues $\lambda^S_1, \dots, \lambda^S_n$, and corresponding eigenfunctions $e^S_1, \dots, e^S_n$ with respect to a Borel measure $\nu^S$.

To make this presentation self-contained, we restate here from the main text the two assumptions we use to construct a Gaussian process of function-valued traits indexed by $\mathbf{T}$:

Assumption II.1. Conditional on their common ancestors in the phylogenetic tree $\mathbf{T}$, any two traits are statistically independent.

Assumption II.2. The statistical relationship between a trait and any of its descendants in $\mathbf{T}$ is independent of the topology of $\mathbf{T}$.

Proof. Conditioning on common ancestry. By the definition of $k_{\mathbf{T}}$ and the Law of Iterated Expectations we have

$$k_{\mathbf{T}}(\mathbf{t}_1, \mathbf{t}_2) = E[Y(\mathbf{t}_1) Y(\mathbf{t}_2)] = E\big[E[Y(\mathbf{t}_1) Y(\mathbf{t}_2) \mid Y(\mathbf{t}) : \mathbf{t} \text{ is an ancestor of } \mathbf{t}_{12}]\big]. \tag{8}$$

Applying conditional independence. Then by Assumption II.1,

$$k_{\mathbf{T}}(\mathbf{t}_1, \mathbf{t}_2) = E\big[E[Y(\mathbf{t}_1) \mid Y(\mathbf{t}) : \mathbf{t} \text{ is an ancestor of } \mathbf{t}_{12}] \tag{9}$$

$$\times\; E[Y(\mathbf{t}_2) \mid Y(\mathbf{t}) : \mathbf{t} \text{ is an ancestor of } \mathbf{t}_{12}]\big]. \tag{10}$$

Conditioning the marginal process. We continue to develop (10) by obtaining an expression for the conditional expectation $E[Y(\mathbf{t}_1) \mid Y(\mathbf{t}) : \mathbf{t} \text{ is an ancestor of } \mathbf{t}_{12}]$.

Lemma II.1. With the notation of section II 1 we have

$$E[Y(\mathbf{t}_1) \mid Y(\mathbf{t}) : \mathbf{t} \text{ is an ancestor of } \mathbf{t}_{12}] = \langle k_{t_1}, Y \rangle_{12} = \sum_{i=1}^{r} \frac{\mu^{12}_i(t_1)\, Z_i}{\sqrt{\lambda^{12}_i}} \tag{11}$$

where the final equality holds in distribution.

Proof. Because of Assumption II.2 and the tree structure of $\mathbf{T}$, the statement (11) is a claim about the marginal covariance function $k$.
Then for $s \in [t_a, t_{12}]$ and $t_1 \in [t_{12}, t_b]$, since $k^{12}_s$ is a reproducing kernel we have

$$\langle k_{t_1}, k^{12}_s \rangle_{12} = k_{t_1}(s) = k(t_1, s). \tag{12}$$

It follows from Fubini's Theorem that

$$E[\langle k_{t_1}, Y \rangle_{12}\, Y(s)] = E[Y(t_1) Y(s)] \tag{13}$$

and hence that the conditional expectation of $Y(t_1)$ given $\{Y(w) : w \in [t_a, t_{12}]\}$ is $\langle k_{t_1}, Y \rangle_{12}$, as required. Taking the inner product of (6) and (7) (see [2], equation (6.1)) establishes the lemma. $\square$

To finish the proof, observe that by (9) and (11) we have

$$k_{\mathbf{T}}(\mathbf{t}_1, \mathbf{t}_2) = \sum_{i=1}^{r} \frac{\mu^{12}_i(t_1)\, \mu^{12}_i(t_2)}{\lambda^{12}_i} = \langle k_{t_1}, k_{t_2} \rangle_{12} \tag{14}$$

as required. $\square$

A. Proof of Proposition II.1, part 1 (main text)

Let $k_{\mathbf{T}}$ be the phylogenetic covariance function defined above, which depends only on time, and let $X_i$, $i = 1, \dots, n$, be independent phylogenetic Gaussian processes on $\mathbf{T}$ each with mean zero and covariance function $k_{\mathbf{T}}$. Define a Gaussian process by

$$f(x, \mathbf{t}) = \sum_{i=1}^{n} \sqrt{\lambda^S_i}\, X_i(\mathbf{t})\, e^S_i(x). \tag{15}$$

In the terminology of the main text, it is easy to check that $f$ is then a phylogenetic Gaussian process, indexed by both space and time, with separable covariance function

$$\Sigma_{\mathbf{T}}((x_1, \mathbf{t}_1), (x_2, \mathbf{t}_2)) = E[f(x_1, \mathbf{t}_1) f(x_2, \mathbf{t}_2)] = k_{\mathbf{T}}(\mathbf{t}_1, \mathbf{t}_2) \sum_{i=1}^{n} \lambda^S_i\, e^S_i(x_1)\, e^S_i(x_2) \tag{16}$$

$$= k_{\mathbf{T}}(\mathbf{t}_1, \mathbf{t}_2)\, K(x_1, x_2). \tag{17}$$

It remains only to check that $\Sigma_{\mathbf{T}}$ has the correct marginal covariance. This follows by choosing $\mathbf{t}_1$ to be an ancestor of $\mathbf{t}_2$ in $\mathbf{T}$, since then by Assumption II.2 we have $k_{\mathbf{T}}(\mathbf{t}_1, \mathbf{t}_2) = k(t_1, t_2)$ and so

$$\Sigma_{\mathbf{T}}((x_1, \mathbf{t}_1), (x_2, \mathbf{t}_2)) = k(t_1, t_2)\, K(x_1, x_2) \tag{18}$$

$$= \Sigma((x_1, t_1), (x_2, t_2)) \tag{19}$$

as required. $\square$

B. Markovian case

We say that $\Sigma$ (or, equivalently, $\Sigma_{\mathbf{T}}$) is Markovian in evolutionary time if, given any $\mathbf{t} \in \mathbf{T}$, the function-valued traits at all ancestors of $\mathbf{t}$ are statistically independent of those at all descendants of $\mathbf{t}$ given the value of the trait $f_{\mathbf{t}}$.
If $\Sigma$ is time-space separable and its time-dependent component $k(t_1, t_2)$ is Markovian, formula (16) simplifies further to give the explicit expression

$$\Sigma_{\mathbf{T}}((x_1, \mathbf{t}_1), (x_2, \mathbf{t}_2)) = K(x_1, x_2)\, k(t_1, t_{12})\, k(t_{12}, t_{12})^{-1}\, k(t_2, t_{12}). \tag{20}$$

A proof may be given similar to the above (with $\nu^{12}$ the Dirac mass at $t_{12}$), but a simpler proof is the following. Let $g$ be a phylogenetic Gaussian process with marginal covariance function $k$. Then by the Law of Iterated Expectations we have

$$k_{\mathbf{T}}(\mathbf{t}_1, \mathbf{t}_2) = E\big[E[g(\mathbf{t}_1) g(\mathbf{t}_2) \mid g(\mathbf{t}_{12})]\big]. \tag{21}$$

By the Markov property, the functional traits at times $\mathbf{t}_1$ and $\mathbf{t}_2$ depend on their common ancestry only through the normal random variable $g(\mathbf{t}_{12})$. Assumption II.1 therefore gives that

$$k_{\mathbf{T}}(\mathbf{t}_1, \mathbf{t}_2) = E\big[E[g(\mathbf{t}_1) \mid g(\mathbf{t}_{12})]\, E[g(\mathbf{t}_2) \mid g(\mathbf{t}_{12})]\big]. \tag{22}$$

By Assumption II.2 and standard properties of Normal random variables we have

$$E[g(\mathbf{t}_1) \mid g(\mathbf{t}_{12})] = k(t_1, t_{12})\, k(t_{12}, t_{12})^{-1}\, g(\mathbf{t}_{12}) \tag{23}$$

from which the required result follows, after substitution into (22). $\square$

[1] Hadjipantelis, P.Z., Jones, N.S., Moriarty, J., Springate, D. & Knight, C.G. Ancestral Inference from Functional Data: Statistical Methods and Numerical Examples. arXiv 2012.
[2] Rasmussen, C.E. & Williams, C.K.I. 2006 Gaussian processes for machine learning. MIT Press.