Statistical cosmology in retrospect

Peter Coles looks at the past, present and future association of astronomy and cosmology with statistical theory.

Abstract: Over the past decade unprecedented improvements in observational technology have revolutionized astronomy. The effects of this data explosion have been felt particularly strongly in the field of cosmology. Enormous new datasets, such as those recently produced by the WMAP satellite, have at last placed our understanding of the universe on a firm empirical footing. The new surveys have also spawned sophisticated statistical approaches that can do justice to the quantity and quality of the information. In this paper I look at some of the new developments in statistical cosmology in their historical context, and show that they are just the latest example of a deep connection between the fields of astronomy and statistics that goes back at least as far as Galileo.

It has become a bit of a cliché to refer to the modern era of observational cosmology as a “golden age”, but clichés are clichés because they are often perfectly apt. When I started my graduate studies in cosmology in 1985, the largest redshift survey in circulation was the CfA slice (De Lapparent, Geller and Huchra 1986), containing just over a thousand galaxy positions. Now we have the Anglo-Australian 2dF Galaxy Redshift Survey (2dFGRS), in which the combination of wide-field (two-degree) optics and multifibre spectroscopy has enabled redshifts of about a quarter of a million galaxies to be obtained in just a few years (Colless et al. 2001). Back in 1985, no experiment had been able to detect the variations in the temperature of the cosmic microwave background (CMB) radiation across the sky predicted by theories for the formation of the structures seen in galaxy surveys. This changed in 1992 with the discovery, by the COBE team (Smoot et al. 1992), of large-scale “ripples” in the CMB. This, in turn, triggered immense theoretical and observational interest in the potential use of finer-scale structure in the CMB as a cosmological diagnostic. The stunning high-resolution maps produced by the WMAP satellite (Bennett et al. 2003) are to COBE what the 2dFGRS is to the CfA survey.

1: The huge improvement in angular resolution from degree to arcminute scales reveals unprecedented detail in the temperature structure of the CMB, but also poses challenges. Millions rather than thousands of pixels mean that even simple statistical covariances require more substantial number-crunching for WMAP (below) than COBE (above). More importantly, the high signal-to-noise of WMAP means that systematic errors must be understood even more carefully than in the COBE map.

These huge new datasets, and the others that will soon follow them, have led to a recognition of the central role of statistics in the new era of cosmology. The new maps may be pretty to look at, but if they are to play a role in the empirical science that cosmology aspires to be, they must be analysed using objective techniques. This requires a statistical approach. Patterns in the distribution of galaxies and in the distribution of hot and cold spots in the microwave sky must be quantified using statistical descriptors that can be compared with theories.
In their search for useful descriptors, cosmologists have borrowed ideas from diverse fields of mathematics, such as graph theory and differential topology, as well as adapting more traditional statistical devices to meet the particular demands of cosmological data (Martinez and Saar 2002). The power spectrum, for example, a technique usually applied to the analysis of periodicities in time series, has been deployed with great success in the analysis of both the 2dFGRS (Percival et al. 2001) and WMAP (Hinshaw et al. 2003). This urgent demand for a statistical approach to cosmology is not new. It is merely the latest manifestation of a fascinating and deep connection between the fields of astronomy and statistics.

Astronomy and the history of statistics

The attitude of many physicists to statistics is summarized by two quotations: “There are lies, damned lies, and statistics” (Benjamin Disraeli); and “If your experiment needs statistics, you ought to have done a better experiment” (Ernest Rutherford). It was certainly my attitude when I was an undergraduate physics student that statistics was something done by biologists and the like, not by real scientists. I skipped all the undergraduate lectures on statistics that were offered, and only saw the error of my ways when I started as a graduate student and realised how fundamentally important it is to understand statistics if you want to do science with observations that are necessarily imperfect. Astronomy is about using data to test hypotheses in the presence of error, it is about making inferences in the face of uncertainty, and it is about extracting model parameters from noisy data. In short, it is about statistics. This argument has even greater strength when applied to cosmology, which has all the peculiarities of astronomy and some of its own. In mainstream astronomy one can survey populations of similar objects, count frequencies and employ arguments based on large numbers. In cosmology one is trying to make inferences about a unique system, the universe. As we shall see, history has a lesson to teach us on this matter too. The connection between astronomy and statistics is by no means a one-way street.

2: Great astronomers who also developed statistical methods: Galileo, Bernoulli, Bessel, Gauss, Jeffreys, Laplace and Quételet.

The development of statistics from the 17th century to the modern era is a fascinating story, the details of which are described by Hald (1998). To cut a long story short, it makes sense to think of three revolutionary (but not necessarily chronological) stages. The first involved the formulation of the laws of probability from studies of gambling and games of chance. The second involved applying these basic notions of probability to topics in natural philosophy, including astronomy (particularly astrometry and celestial mechanics). The third, and most recent, stage saw the rise of statistical thinking in the life sciences, including sociology and anthropology. It is the second of these stages that bears the most telling witness to the crucial role astronomy has played in the development of mainstream statistics: an astonishing number of the most basic concepts in modern statistics were invented by astronomers. Here are just a few highlights:

● Galileo Galilei (1632) was, as far as I am aware, the first to find a systematic way of taking errors into account when fitting observations, i.e. assigning lower weights to observations with larger errors. He used absolute errors, rather than the squared error favoured by modern approaches.
● Daniel Bernoulli (1735) performed a pioneering analysis of the inclinations of the planetary orbits, with the aim of determining whether they were consistent with a random distribution on the sky.
● John Michell (1767), although probably more famous for his discussion of what are now known as black holes, engaged in a statistical analysis of the positions of the “fixed stars” on the celestial sphere to see if they were consistent with a random pattern.
● Laplace (1776) studied the problem of determining the mean inclination of planetary orbits.
● Carl Friedrich Gauss (1809) developed the principles of least squares in fitting observations.
● Laplace (1810) proved a special case of the Central Limit Theorem, namely that the distribution of the sum of a large number of independent variables drawn from some distribution tends towards the Gaussian distribution regardless of the original distribution.
● Friedrich Wilhelm Bessel (1838), best known for his functions, provided a general proof of the Central Limit Theorem.
● John Herschel (1850) developed a theory of errors based on the “normal”, i.e. Gaussian, distribution. The phrase “normal distribution” is now standard.
● Sir Harold Jeffreys (1939) published a book, The Theory of Probability, and so rekindled a long-standing debate on the meaning of probability, ignited by Laplace and Thomas Bayes, to which I shall return later on.

All these pioneering studies influenced modern-day statistical methods and terminology, and all were done by scientists who were, first and foremost, astronomers. It is only comparatively recently that there has been significant traffic in the reverse direction. The Berkeley statisticians Jerzy Neyman and Elizabeth Scott (1952), for example, made an important contribution to the study of the spatial distribution of galaxy clustering.

A forensic connection

Before resuming the thread of my argument, I can’t resist taking a detour in order to visit another surprising point at which the history of statistics meets that of astronomy. When I give popular talks, I often try to draw the analogy between cosmology and forensic science. We have one universe, our one crime scene. We can collect trace evidence, look for fingerprints, establish alibis and so on. But we can’t study a population of similar crime scenes, nor can we perform a series of experimental crimes under slightly different physical conditions. What we have to do is make inferences and deductions within the framework of a hypothesis that we continually subject to empirical test. This process carries on until reasonable doubt is exhausted. Detectives call this “developing a suspect”; cosmologists call it “constraining a theory”.

I admit that I have stretched this analogy to breaking point, but at least it provides an excuse for discussing another famous astronomer-cum-statistician. Lambert Adolphe Jacques Quételet (1796–1874) was a Belgian astronomer whose main interest was in celestial mechanics. He was also an expert in statistics. As we have seen, this tendency was not unusual for astronomers in the 19th century, but Quételet was a particularly notable example. He has been called “the father of modern statistics” and, among other things, was responsible for organizing the first international conference on statistics in 1853.
His fame as a statistician owed less to his astronomy, however, than to the fact that in 1835 he had written a book called On Man. Quételet had been struck not only by the regular motions displayed by heavenly bodies, but also by regularities in social phenomena, such as the occurrence of suicides and crime. His book was an attempt to apply statistical methods to the development of man’s physical and intellectual faculties. His later work Anthropometry, or the Measurement of Different Faculties in Man (1871) carried these ideas further. This foray into “social physics” was controversial, and led to a number of unsavoury developments in pseudoscience, such as the eugenics movement, but it did inspire two important developments in forensics.

Inspired by Quételet’s work on anthropometry, Alphonse Bertillon hit upon the idea of solving crime using a database of measurements of the physical characteristics of convicted criminals (length of head, width of head, length of fingers, and so on). On their own, none of these could possibly prove a given individual to be guilty, but it ought to be possible, using a large number of measurements, to establish identity with a high probability. “Bertillonage”, as the system he invented came to be known, was cumbersome but proved successful in several high-profile criminal cases in Paris. By 1892 Bertillon was famous, but nowadays bertillonage is a word found only in the more difficult Sunday newspaper crosswords. The reason for its demise was the development of fingerprinting. It’s a curious coincidence that the first fingerprinting scheme in practical use was implemented in India, by a civil servant called William Herschel. But that is another story.

The great debate

A particularly interesting aspect of modern statistical cosmology is the attention being given to Bayesian methods. This has reopened a debate that goes back as far as the early days of statistics, indeed to the first of the three revolutionary periods I referred to above. This debate concerns the very meaning of the probabilistic reasoning on which the field of statistics is based. We make probabilistic statements of various types in various contexts: “Arsenal will probably win the Premiership”, “There’s a one-in-six chance of rolling a six”, and so on. But what do we actually mean when we make such statements? Roughly speaking, there are two competing views. Probably the most common is the frequentist interpretation, which considers a large ensemble of repeated trials. The probability of an event or outcome is then given by the fraction of times that particular outcome arises in the ensemble. In other words, probability is identified with some kind of proportion. The alternative “Bayesian” view does not concern itself with ensembles and frequencies of events, but is instead a kind of generalization of the rules of logic. While ordinary logic, Boolean algebra, relates to propositions that are either true (with a value of 1) or false (value 0), Bayesian probability represents the intermediate state in which one does not have sufficient information to determine truth or falsehood with certainty. Notice that Bayesian probability is applied to logical propositions, rather than events or outcomes, and it represents the degree of reasonable belief one can attach to them. Moreover, all probabilities in this case are dependent on the assumption of a particular model.
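Stated compactly, the rule on which the whole Bayesian approach rests is Bayes’ theorem; this is the standard formulation, written with notation chosen to match the parameter-fitting discussion that follows:

\[ P(A|D) = \frac{P(D|A)\,P(A)}{P(D)}, \qquad P(D) = \int P(D|A)\,P(A)\,\mathrm{d}A , \]

where A is a proposition (for instance, that a parameter takes a particular value), D is the data, P(A) is the prior probability, P(D|A) is the likelihood and P(A|D) is the posterior; the normalizing factor P(D) is obtained by summing, or integrating, P(D|A)P(A) over all admissible propositions A.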
The name “Bayesian” derives from the Revd Thomas Bayes (1702–61), who found a way of “inverting” the probability involved in a particular problem involving the binomial distribution. The general result of this procedure, now known as Bayes’ theorem, was actually first obtained by Laplace. The idea is relatively simple. Suppose I have an urn containing N balls, n of which are white and N–n of which are black. Given values of n and N, one can write down the probability of drawing m white balls in a sequence of M draws. The problem Bayes solved was to find the probability of n given a particular value of m, thus inverting the probability P(m|n) into P(n|m). This particular problem is only of academic interest (a short numerical sketch of the inversion is given at the end of this section), but the general result is important.

To give an illustration, suppose we have a parameter A that we wish to extract from some set of data D that we assume is noisy. A standard frequentist approach would proceed via the construction of a likelihood function P(D|A), which gives the probability of obtaining the data D for each particular value of the parameter A, given the properties of the experimental noise. The maximum of this likelihood would be quoted as the “best” value of the parameter: it is the value of the parameter that makes the measured data “most likely”. The alternative, Bayesian, approach would be to “invert” the likelihood P(D|A) to obtain P(A|D) using Bayes’ theorem. The maximum of P(A|D) then gives the most probable value of A. Clearly these two approaches can be equivalent if P(A|D) and P(D|A) are identical, but in order to invert the likelihood correctly we need to multiply it by P(A), a “prior probability” for A. If this prior is taken to be constant then the two approaches are equivalent, but consistent Bayesian reasoning can involve non-uniform priors. To a frequentist, a prior probability means model-dependence. To a Bayesian, any statement about probability involves a model. It has to be admitted, however, that the issue of prior probabilities and how to assign them remains a challenge for Bayesians.

This is not a sterile debate about the meaning of words. The interpretation of probability is important for many reasons, not least because of the implications it holds for the scientific method. The frequentist interpretation, the one favoured by many of the classical statistical texts, sits most comfortably with an approach to the philosophy of science like that of Karl Popper. In such a view, data are there to refute theories, and repeated failure to refute a theory provides no reason for stronger belief in it. On the other hand, Bayesians advocate “inductive” reasoning: data can either support a model, making it more probable, or contradict it, making it less probable. Falsifiability is no longer the sole criterion by which theories are judged. There is not space here to go over the pros and cons of these two competing approaches, but it is interesting to remark that the early historical development of mathematical statistics owes much more to the Bayesian approach than to the frequentist one, although Bayesian methods later fell into disrepute. Most standard statistical texts, such as the compendious three-volume Kendall and Stuart (1977), are rigorously frequentist. When Harold Jeffreys (1939) advocated a Bayesian position he was regarded as something of a heretic. The book by Ed Jaynes (2003), published posthumously, is a must for anyone interested in these issues.

The Bayesian interpretation holds many attractions for cosmology. One is that the universe is, by definition, unique.
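Purely by way of illustration, the urn inversion promised above can be carried out numerically in a few lines. This is a minimal sketch, assuming draws with replacement and a uniform prior on n; the function name posterior_n is ours and not drawn from any real analysis package.

from math import comb

def posterior_n(N, M, m):
    """Return P(n|m) for n = 0..N: a uniform prior on n multiplied by
    the binomial likelihood of drawing m white balls in M draws."""
    # Each draw is white with probability n/N (draws with replacement).
    like = [comb(M, m) * (n / N) ** m * (1 - n / N) ** (M - m)
            for n in range(N + 1)]
    norm = sum(like)  # the flat prior cancels, so simply normalize the likelihood
    return [x / norm for x in like]

post = posterior_n(N=10, M=20, m=14)
print(max(range(len(post)), key=lambda n: post[n]))  # most probable n; prints 7

With N = 10 balls and 14 white balls drawn in M = 20 draws, the posterior peaks at n = 7, exactly as intuition suggests.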
It is not impossible to apply frequentist arguments to this unique system by constructing an ensemble of universes, but one has to worry about whether such an ensemble is meant to be real or simply a construct to allow probabilities to become proportions. Bayesians, on the other hand, can deal with unique entities quite straightforwardly. Another reason for cosmologists to be interested in Bayesian reasoning is that they are quite used to the idea that things are model-dependent. For example, one of the important tasks set during the era of precision cosmology has been the determination of a host of cosmological parameters that can’t be predicted a priori in the standard Big Bang model. These parameters include the expansion rate (Hubble constant) H0, the spatial curvature k, the mean cosmic density in various forms of material Ωm (cold dark matter), Ων (neutrinos), Ωb (baryons), and so on. In all, it is possible to infer joint values of 13 or more such parameters from measurements of the fluctuations of the cosmic microwave background. But such parameters only make sense within the overall model within which they are defined. This massive inference problem also demonstrates another virtue of the Bayesian approach: the result (posterior probability) of one analysis can be fed into another as a prior probability.

Probable worlds

As well as its practical implications for data analysis, a Bayesian approach also leads to useful insights into more esoteric topics; see Garrett and Coles (1993). Among these is the Anthropic Principle, which is, roughly speaking, the assertion that there is a connection between the existence of life in the universe and the fundamental physics that governs large-scale cosmological behaviour. This expression was first used by Carter (1974), who suggested adding the word “Anthropic” to the usual Cosmological Principle to stress that our universe is “special”, at least to the extent that it has permitted intelligent life to evolve. Barrow and Tipler (1986) give a complete discussion. There are many otherwise viable cosmological models that are not compatible with the observation that human observers exist. For example, heavy elements such as carbon and oxygen are vital to the complex chemistry required for terrestrial life to have developed. And it takes around 10 billion years of stellar evolution for generations of stars to synthesize significant quantities of these elements from the primordial gas of hydrogen and helium that exists in the early stages of a Big Bang model. Therefore, we could not inhabit a universe younger than about 10 billion years old. Since the size of the universe is related to its age if it is expanding, this line of reasoning, originally due to Dicke (1961), sheds some light on the question of why the universe is as big as it is. It has to be big, because it has to be old if there has been time for us to evolve within it. Should we be surprised that the universe is so big? No, its size is not improbable if one takes into account the conditioning information. This form of reasoning is usually called the Weak Anthropic Principle (WAP), and is essentially a corrective to the Copernican Principle that we do not inhabit a special place in the universe. According to WAP, we should remember that we can only inhabit those parts of spacetime compatible with human life, away from the centre of a massive black hole, for example.
By the argument given above, we just could not exist at a much earlier epoch than we do. This kind of argument is relatively uncontroversial, and can lead to useful insights. From a Bayesian perspective, what WAP means is that our existence in the universe in itself contains information about the universe, and it must be included when conditional probabilities are formed. One example of a useful insight gleaned in this way relates to the Dirac Cosmology. Dirac (1937) was perplexed by a number of apparent coincidences between large dimensionless ratios of physical constants. He found no way to explain these coincidences using standard theories, so he decided that they had to be a consequence of a deep underlying principle. He therefore constructed an entire theoretical edifice of time-varying fundamental constants on the so-called Large Number Hypothesis. The simple argument by Dicke outlined above, however, dispenses with the need to explain these coincidences in this way. For example, the ratio between the present size of the cosmological horizon and the radius of an electron is roughly the same as the ratio between the strengths of the gravitational and electromagnetic forces binding protons and electrons (both ratios are huge: of order 10^40). This does indeed seem like a coincidence, but remember that the size of the horizon depends on the time: it gets bigger as time goes on. And the lifetime of a star is determined by the interplay between electromagnetic and gravitational effects. It turns out that both these ratios reduce to the same value precisely because they both depend on the lifetime of stellar evolution: the former through our existence as observers, and the latter through the fundamental physics describing the structure of a star. The “coincidence” is therefore not as improbable as one might have thought when one takes into account all the available conditioning information.

3: The 2dFGRS cone showing more than 200 000 galaxies with recession velocities up to 100 000 km/s, selected from the APM survey. The CfA redshift survey, state of the art in 1986, contained only two thousand galaxies with a top recession velocity of 15 000 km/s. The 2dFGRS is the closest to a fair sample of the galaxy distribution that has ever been produced.

Some cosmologists, however, have sought to extend the Anthropic Principle into deeper waters. While the weak version applies to physical properties of our universe such as its age, density or temperature, the “Strong” Anthropic Principle (SAP) is an argument about the laws of physics according to which these properties evolve. It appears that these fundamental laws are finely tuned to permit complex chemistry, which, in turn, permits the development of biology and ultimately human life. If the laws of electromagnetism and nuclear physics were only slightly different, chemistry and biology would be impossible. On the face of it, the fact that the laws of Nature do appear to be tuned in this way seems to be a coincidence, in that there is nothing in our present understanding of fundamental physics that requires the laws to be conducive to life in this way. This is therefore something we should seek to explain. In some versions of the SAP, the reasoning is essentially teleological (i.e. an argument from design): the laws of physics are as they are because they must be like that for life to develop.
This is tantamount to requiring that the existence of life is itself a law of Nature, and that the more familiar laws of physics are subordinate to it. This kind of reasoning appeals to some with a religious frame of mind, but its status among scientists is rightly controversial, as it suggests that the universe was designed specifically in order to accommodate human life. An alternative construction of the SAP involves the idea that our universe may consist of an ensemble of mini-universes, each one having different laws of physics to the others. We can only have evolved in one of the mini-universes compatible with the development of organic chemistry and biology, so we should not be surprised to be in one where the underlying laws of physics appear to have special properties. This provides some kind of explanation for the apparently surprising properties of the laws of Nature mentioned above. This latter form of the SAP is not an argument from design, since the laws of physics could vary haphazardly from mini-universe to mini-universe, and is in some respects logically similar to the WAP, except with a frequentist slant.

Fingerprinting the universe

I hope I have demonstrated that the relationship between statistics and cosmology is an intimate and longstanding one. This does not mean that the modern era does not pose new challenges, some of them resulting from the sheer scale of the data flows being generated. To understand some of these new problems it is perhaps helpful to step back about 20 years and look at the sort of methods that could be applied to cosmological data and the sort of questions that were being asked. In the 1980s there was no universally accepted “standard” cosmological model. The curvature and expansion rate of the universe had not yet been measured, and the average density of matter was uncertain by a factor of at least 10. Some basic ideas about the origin of cosmic structure had been established, such as that it formed by gravitational condensation from small initial fluctuations, perhaps generated during an inflationary epoch in the early universe, but the form of the initial fluctuation spectrum was uncertain. The largest redshift surveys covered a volume less than 100 Mpc deep and contained on the order of a thousand galaxies. The only forms of statistical analysis that could be performed at this time had to be simple in order to cope with the sparseness of the data.

The most commonly used statistical analysis of galaxy clustering observations involved the two-point correlation function (e.g. Peebles 1980). This measures the statistical tendency for galaxies to occur in pairs rather than individually. The correlation function, usually written ξ(r), measures the excess number of pairs of galaxies found at a separation r compared to how many such pairs would be found if the distribution of galaxies in space were completely random. In other words, ξ(r) measures the excess probability of finding a galaxy at a distance r from a given galaxy, compared with what one would expect if galaxies were distributed independently throughout space. A positive value of ξ(r) indicates that there are more pairs of galaxies with a separation r than chance alone would produce; galaxies are then said to be clustered on the scale r. A negative value indicates that galaxies tend to avoid each other, so they are anticlustered. A completely random distribution, usually called a Poisson distribution, has ξ(r) = 0 for all r.
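In symbols (this is the standard definition, following Peebles 1980), the joint probability of finding one galaxy in each of two small volume elements dV1 and dV2 separated by a distance r is

\[ \mathrm{d}P = \bar{n}^2 \left[ 1 + \xi(r) \right] \mathrm{d}V_1 \, \mathrm{d}V_2 , \]

where \(\bar{n}\) is the mean number density of galaxies; setting ξ(r) = 0 recovers the Poisson case. One simple, though by no means optimal, estimator compares the number of galaxy–galaxy pairs DD(r) at separation r in the data with the number RR(r) found in a comparable random catalogue: \(\hat{\xi}(r) = DD(r)/RR(r) - 1\). The estimators used in modern survey analyses are refined variants of this basic idea.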
Estimates of the correlation function of galaxies indicate that ξ(r) is a power-law function of the form ξ(r) ≈ (r/r0)^–1.8, where the constant r0 is usually called the correlation length. The value of r0 depends on the type of galaxy used, but is around 5 Mpc for bright galaxies. On larger scales than this, however, ξ(r) becomes negative, indicating the presence of large voids.

4: The 2dFGRS power spectrum. Note the small error bars resulting from the large sample size. There is some suggestion of “wiggles” in the spectrum that may be the remnants of the acoustic peaks in the WMAP angular spectrum (figure 5). The power spectrum is a useful diagnostic of clustering pattern; a complete statistical description of the filaments and voids in figure 3 will require more sophisticated measures. (2dFGRS Team/Will Percival.)

5: The CMB angular power spectrum for WMAP, COBE, ACBAR and CBI data. Note that the WMAP results are consistent with the other experiments and that the combined WMAP and CBI points constrain the angular power spectrum of our sky with great accuracy and precision. The agreement between the measured spectrum and the model shown not only suggests the basic cosmological framework is correct, but also allows parameters of the model to be estimated with unprecedented accuracy. The polarization spectrum shown below is less certain, but does show a perhaps surprising peak at very low spherical harmonics, usually attributed to the astrophysical effect of cosmic reionization. (WMAP Science Team images.)

The correlation function is mathematically related to the power spectrum P(k) by a Fourier transformation; the power spectrum is in many ways a better descriptor of clustering on large scales than the correlation function, and it has played a central role in the analysis of the 2dFGRS I referred to earlier. Since they form a transform pair, ξ(r) and P(k) contain the same information about the clustering pattern. A slightly different version of the power spectrum, based not on Fourier modes but on spherical harmonics, is the principal tool for the analysis of CMB fluctuations, especially for the purpose of extracting cosmological parameters.

The correlation function and power spectrum are examples of what are called second-order clustering statistics (the first-order statistic is simply the mean density). Their success in pinning down cosmological parameters relies on the fact that, according to standard cosmological models, the initial fluctuations giving rise to large-scale structure are Gaussian in character, meaning that second-order quantities furnish a complete description of their spatial statistics. The microwave background provides a snapshot of these initial fluctuations, so the spherical harmonic spectrum is essentially a complete description of the statistics of the primordial universe. The universe of galaxies is not quite such a pristine reflection of the initial cosmic fluctuations, but the power spectrum is such a simple and robust measure of clumping that one can correct for nonlinear evolution, bias and distortions produced by peculiar motions in a relatively straightforward manner, at least on large scales (Percival et al. 2001).
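For reference, the Fourier and spherical-harmonic machinery invoked in the last two paragraphs can be written down explicitly (in one common Fourier convention; factors of 2π are distributed differently in different texts):

\[ \xi(r) = \frac{1}{2\pi^2} \int_0^\infty P(k)\, \frac{\sin kr}{kr}\, k^2\, \mathrm{d}k, \qquad P(k) = 4\pi \int_0^\infty \xi(r)\, \frac{\sin kr}{kr}\, r^2\, \mathrm{d}r . \]

The spherical-harmonic analogue used for the CMB expands the temperature pattern as \(\Delta T/T(\theta,\phi) = \sum_{\ell m} a_{\ell m} Y_{\ell m}(\theta,\phi)\); the angular power spectrum is then \(C_\ell = \langle |a_{\ell m}|^2 \rangle\), which is the quantity plotted in figure 5.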
This robustness is why second-order statistics have been so fruitful in the analysis of the latest surveys and in establishing the basic parameters of the standard cosmology. One has to bear in mind, however, that second-order statistics are relatively blunt instruments. There is much more information in the 2dFGRS and WMAP than is revealed by their power spectra, but this information is not so much about cosmological parameters as about deviations from the standard model. The broad-brush description afforded by a handful of numbers will have to be replaced by measurements of the fine details left by the complex astrophysical processes involved in structure formation, and by possible signatures of unknown physics. The strength of the second-order statistics is that they are insensitive to these properties, so their detection will require higher-order approaches tuned to probe particular departures from the standard model. Does the CMB display primordial non-Gaussianity (Komatsu et al. 2003), for example? What can one learn about galaxy formation from the clustering of galaxies of different types in different environments (Madgwick et al. 2003)? The recent stunning discoveries are not the end of the line for statistical cosmology, but are signposts pointing in new directions.

● Peter Coles is Professor of Astrophysics at the University of Nottingham.

References
Barrow J D and Tipler F J 1986 The Anthropic Cosmological Principle (Oxford University Press, Oxford).
Bennett C L et al. 2003 ApJ submitted, astro-ph/0302207.
Colless M M et al. 2001 MNRAS 328 1039.
De Lapparent V, Geller M J and Huchra J P 1986 ApJ 302 L1.
Garrett A J M and Coles P 1993 Comments on Astrophys. 17 23.
Hald A 1998 A History of Mathematical Statistics from 1750 to 1930 (John Wiley & Sons).
Hinshaw G et al. 2003 ApJ submitted, astro-ph/0302217.
Jaynes E T 2003 Probability Theory: The Logic of Science (CUP, Cambridge).
Jeffreys H 1939 The Theory of Probability (OUP, Oxford).
Kendall M G and Stuart A 1977 The Advanced Theory of Statistics, 3 volumes (Charles Griffin & Co, London).
Komatsu E et al. 2003 ApJ submitted, astro-ph/0302223.
Madgwick D S et al. 2003 MNRAS submitted, astro-ph/0303668.
Martinez V J and Saar E 2002 The Statistics of the Galaxy Distribution (Chapman & Hall).
Neyman J and Scott E 1952 ApJ 116 144.
Peebles P J E 1980 The Large-scale Structure of the Universe (Princeton University Press, Princeton).
Percival W J et al. 2001 MNRAS 327 1297.
Smoot G F et al. 1992 ApJ 396 L1.