Variance makes the difference

Patrick Simon and David Kitz Krämer

1 Introduction

In this letter, we would like to stress the importance of variances in statistics, which – in the opinion of the authors – is sometimes overlooked. The following simple example demonstrates that the averages of probability distributions alone are not always sufficient to give a definite answer to a statistics-related problem (frankly, almost never).

Consider the following situation. Two different groups, 100 persons in total and 50 persons in each group, participate in an assessment test. It is known that members of one group score, on average, 100 points in this test. Members of the other group, on the other hand, achieve a higher average result of 110 points. The test results of all persons together are ranked, and the persons with the 10 best results are selected regardless of the group they belong to (the best 10 percent). How many persons of the selected sample belong to group one and how many to group two?

One might expect that more persons of group two belong to this sample – after all, they are on average better than members of group one, right? As we will see, with the minimum of information given above this question cannot be answered. Essential for a quantitative answer is knowledge of the full probability distributions, p_1(x) and p_2(x), of the individual groups' test results x, or at least of the widths (standard deviations) σ_1 and σ_2 of the distributions – the latter if we resort to a frequency distribution of test results that depends only on the average and the variance. Such a distribution is the Gaussian:¹

    p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,\exp\left(-\frac{(x - \bar{x})^2}{2\sigma^2}\right) ,    (1)

where \bar{x} is the mean of the distribution.

2 The general solution

In this section, we give a quite general solution to the class of problems to which the example of the foregoing section belongs. We will return to the concrete example in the next section.
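Eq. (1) is easy to sketch in a few lines of Python. The function below is our own illustration (not part of the letter); the numerical check simply confirms that the density is normalised:

```python
import math

def gauss_pdf(x, mean, sigma):
    """Gaussian frequency distribution of Eq. (1): mean x-bar, width sigma."""
    return math.exp(-(x - mean) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# The density integrates to one; check with a simple Riemann sum
# (step 0.1, range +/- 8 sigma) for the example values mean=100, sigma=25.
step = 0.1
total = sum(gauss_pdf(100 + k * step, 100, 25) * step for k in range(-2000, 2001))
print(round(total, 4))  # ≈ 1.0
```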
Say we have N different groups, each of which has its own probability distribution (PD) of, say, test results x. We call these PDs p_i(x), where i is a group index ranging between 1 and N. Now, we randomly put together n_1 persons from group i = 1, n_2 persons from group i = 2, etc., into a new mixed group of

    n_{\rm total} = \sum_{i=1}^{N} n_i    (2)

persons. To decide whether a person of group i belongs to the best p_best percent of the mixed group, we need to know the test result limit x_best which divides the best from the rest. For that purpose, we have to work out the PD of the mixed group, p_total(x), because

    p_{\rm best} = \int_{x_{\rm best}}^{\infty} {\rm d}x\, p_{\rm total}(x) = 1 - C_{\rm total}(x_{\rm best}) .    (3)

This means that the (normalised) area under the total test result distribution, from x_best up to the largest result (infinity), has to equal the fraction p_best. This area contains the top results of the persons we seek to select. By

    C_{\rm total}(x) \equiv \int_0^x {\rm d}x'\, p_{\rm total}(x')    (4)

we denote the so-called cumulative distribution of the frequency distribution of all test results. With this definition we can formally write down the result limit that divides best from rest:

    x_{\rm best} = C_{\rm total}^{-1}(1 - p_{\rm best}) .    (5)

It just means that we have to set the limit x_best such that exactly p_best percent of the total distribution of results lies beyond that limit. The function C_total^{-1}(p) is the inverse cumulative distribution; it tells you the range of test results, starting from zero, within which p percent of all results can be found. For example, C_total^{-1}(0.5) defines the median of the total distribution because it returns the test result below which half of the results are located. But what, now, is the total distribution?

¹ This can at most be an approximation because test results are usually always positive, whereas a Gaussian distribution is symmetric about its mean and stretches infinitely in both directions.
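Eqs. (3)–(5) translate directly into code. The sketch below is our own illustration: it inverts the cumulative distribution numerically by bisection (a closed form is rarely available), using the two-group parameters of the example in Section 4; the function names are ours:

```python
import math

def gauss_cdf(x, mean, sigma):
    # Cumulative Gaussian distribution C_i(x), via the error function.
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def inverse_cdf(cdf, p, lo=0.0, hi=400.0, tol=1e-9):
    # Bisection: find x with cdf(x) = p, assuming cdf is monotone on [lo, hi].
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Mixed group as in Section 4: 50 persons each from N(100, 25) and N(110, 20).
groups = [(50, 100.0, 25.0), (50, 110.0, 20.0)]
n_total = sum(n for n, _, _ in groups)

def mixed_cdf(x):
    # Eq. (7): weighted average of the individual cumulative distributions.
    return sum(n * gauss_cdf(x, m, s) for n, m, s in groups) / n_total

p_best = 0.10
x_best = inverse_cdf(mixed_cdf, 1.0 - p_best)  # Eq. (5)
print(round(x_best, 1))
```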
If the persons in the new mixed group perform exactly the same way as in their individual groups² (they are statistically independent), then the PD of the mixed group is just the weighted average of all individual PDs, namely

    p_{\rm total}(x) = \frac{1}{n_{\rm total}} \sum_{i=1}^{N} n_i\, p_i(x) .    (6)

This is because (a) the probability that one particular person in the mixed group belongs to group i is n_i/n_total, and (b) if the person belongs to group i, they have a test result x with likelihood p_i(x). The cumulative distribution – the probability to obtain a result between zero and x – is therefore

    C_{\rm total}(x) = \frac{1}{n_{\rm total}} \sum_{i=1}^{N} n_i\, C_i(x) ,    (7)

where the C_i(x) are the cumulative distributions of the individual groups,

    C_i(x) = \int_0^x {\rm d}x'\, p_i(x') .    (8)

Unfortunately, in many circumstances it is quite hard or even impossible to find the inverse cumulative distribution of Eq. (7) analytically. In these cases we have to find an (approximate) numerical solution for the concrete problem at hand. This is what has been done for the example in the final section.

The actual question has not been answered yet: how many persons of group i do we expect, on average (!), inside the top sample? This boils down to the probability to find (a) a person of group i, (b) with a test result beyond x_best. Following the reasoning of the previous paragraphs, this number is

    n_i \left[1 - C_i(x_{\rm best})\right] .    (9)

This number in relation to the total number of persons in the top sample is

    \frac{n_i \left[1 - C_i(x_{\rm best})\right]}{n_{\rm total}\, p_{\rm best}} ,    (10)

where the denominator corresponds to the total size of the top sample, which we fixed initially. We can see in the final result, Eq. (9), that the answer to the initially raised question does not simply depend on the averages of the PDs but on the full shape of the distributions, encoded within C_i(x).

² For example, it does not matter whether we assemble all people in one room while they are doing their test.
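Eqs. (5) and (9) can be bundled into one routine. This is again our own sketch (the helper name `expected_top_counts` is ours, the group parameters are those of the example in Section 4); note that the expected counts are averages and generally non-integer:

```python
import math

def gauss_cdf(x, mean, sigma):
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def expected_top_counts(groups, p_best):
    """groups: list of (n_i, mean_i, sigma_i) for Gaussian PDs.
    Returns the expected number of members of each group inside the
    best p_best fraction, following Eqs. (5) and (9)."""
    n_total = sum(n for n, _, _ in groups)
    cdf = lambda x: sum(n * gauss_cdf(x, m, s) for n, m, s in groups) / n_total
    lo, hi = 0.0, 1000.0                    # bracket for the bisection, Eq. (5)
    while hi - lo > 1e-9:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cdf(mid) < 1.0 - p_best else (lo, mid)
    x_best = 0.5 * (lo + hi)
    return [n * (1.0 - gauss_cdf(x_best, m, s)) for n, m, s in groups]  # Eq. (9)

counts = expected_top_counts([(50, 100.0, 25.0), (50, 110.0, 20.0)], 0.10)
print([round(c, 2) for c in counts])
```

By construction the counts sum to n_total * p_best, the fixed size of the top sample.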
3 Graphical solution

The result, exact yet often not solvable analytically, can be pictured graphically. We did this for one example with three different groups in Fig. 1. To find the solution you proceed in basically three steps:

1. Draw the cumulative PD, n_i [1 - C_i(x)], of each group into the same diagram. In words, this distribution tells you how many individuals of a particular group have (on average) scores better than x. In the figure, the groups have in total 40 (one), 30 (two) and 30 (three) members. As a consistency check: for the lowest possible x (here x = 0) the PD has to be identical to the total number of members of the group, and the PD has to decline (or stay constant) for increasingly larger x.

2. Now compute the sum of all individual PDs. This yields the PD of the mixed group, which does not discriminate between the individual group members. This is expressed by Eq. (7). Again, at x = 0, or the lowest possible test result, you must find the size of the mixed group (here: 100).

3. Finally, decide how large the top sample should be (here: 9). Employing the PD of the mixed group, you can work out the lower result limit x_best that has to be achieved by a member of the top sample (intersection point of #persons = 9 with the mixed PD). This corresponds to Eq. (5). At the intersection points of x = x_best with the PDs of the individual groups you can read off the number of persons of each group contributing to the top sample. This is analogous to Eq. (9).

Figure 1: Visualisation of how Eq. (9) works, using one particular example with three different groups containing 40, 30 and 30 persons, respectively. (Panel annotations: x_best for the top sample with 9 persons is 133; the top sample contains 5 persons of group one, 3 of group two and 1 of group three.)
The PDs are Gaussians with (x̄_1 = 100, σ_1 = 25), (x̄_2 = 110, σ_2 = 20) and (x̄_3 = 60, σ_3 = 40). The upper left panel shows the cumulative distributions of the group-specific results (number of persons better than a given x); the upper right panel shows the cumulative distribution of the mixed group (just the sum of the individual ones), which is thus made up of 100 persons. The nine best persons have on average a score better than x_best = 133 (lower left panel). The top sample contains on average five persons of group one, three from group two and one belonging to group three. See text for details.

Figure 2: PDs of two groups (solid and dashed lines) and the total PD (yellow area in the background) obtained by combining the two groups (50:50). The 10% best results are located in the blue shaded area beyond about x_best ≈ 150; this region is shown in more detail in the small inset panel in the upper right.

4 Example

Going back to the beginning, we can now give an answer for a particular example. According to the discussion in the last section, a definite answer can only be given once the PDs of the frequencies of results for both groups are known. We assume here that group one has an average of x̄_1 = 100 and a width of σ_1 = 25, while group two has x̄_2 = 110 and σ_2 = 20, hence a larger average but a somewhat smaller spread. Both groups are modelled by a Gaussian distribution, Eq. (1). We combine these two groups into a mixed group with a fraction of 50% group one persons and 50% group two persons (100 persons in total). The individual PDs and the total PD can be seen in Fig. 2.

If we now focus on the 10% best (10 persons) in the total distribution, we can see that the ratio between group one and group two in the top sample is pretty balanced (50:50; 5 from group one, 5 from group two), even though there is, apparently, only a slight difference in the variances of the distributions, and even though group two is clearly better than group one – on average.
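How balanced the split comes out hinges on the widths. The following sketch (our own illustration, not from the letter; the helper `group_one_in_top` is a hypothetical name) varies σ_1 while keeping everything else fixed and shows that the expected number of group one members in the top 10% grows with group one's spread:

```python
import math

def gauss_cdf(x, mean, sigma):
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def group_one_in_top(sigma1, p_best=0.10):
    # 50 persons from N(100, sigma1) mixed with 50 from N(110, 20); returns
    # the expected number of group one members in the best p_best fraction.
    groups = [(50, 100.0, sigma1), (50, 110.0, 20.0)]
    cdf = lambda x: sum(n * gauss_cdf(x, m, s) for n, m, s in groups) / 100.0
    lo, hi = 0.0, 1000.0
    while hi - lo > 1e-9:                  # bisection for x_best, Eq. (5)
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cdf(mid) < 1.0 - p_best else (lo, mid)
    x_best = 0.5 * (lo + hi)
    return 50.0 * (1.0 - gauss_cdf(x_best, 100.0, sigma1))   # Eq. (9)

for s in (20.0, 25.0, 35.0):
    print(s, round(group_one_in_top(s), 2))
```

With σ_1 = σ_2 = 20 the higher mean of group two dominates; widening group one's distribution pulls more and more of its members into the top sample.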
If we get even more restrictive and choose as top sample the 1% best (one person: the best person), then it is even more likely to get a member of group one than of group two (65% against 35%; in 65 out of 100 such tests this single candidate is somebody from group one). So, why is that? Is there some sort of discrimination at work? Should the fraction of group two members, who on average score better in the test, not be larger? In this example, the answer is no – and no conspiracy is happening.

The key is that group one has a larger spread in possible results (variance) than group two. A larger variance not only means that we have a larger probability to obtain worse results but also a better chance to get excellent results. This is why we find more group one than group two persons in the extreme tail of the mixed sample. Therefore a larger variance in a sample can make up for an apparent inferiority due to a lower average. This shows that, when using statistical arguments, it can be fatal to reduce the properties of a sample to just the "average".
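This tail effect can be made quantitative. The sketch below (ours, using the same Gaussian model as in the example; `group_one_share` is a hypothetical helper name) evaluates Eq. (10) for ever stricter selections and confirms that group one's share of the top sample grows as p_best shrinks:

```python
import math

def gauss_cdf(x, mean, sigma):
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def group_one_share(p_best):
    # Fraction of group one (N(100, 25)) within the best p_best of the 50:50
    # mixture with group two (N(110, 20)), following Eq. (10).
    groups = [(50, 100.0, 25.0), (50, 110.0, 20.0)]
    cdf = lambda x: sum(n * gauss_cdf(x, m, s) for n, m, s in groups) / 100.0
    lo, hi = 0.0, 1000.0
    while hi - lo > 1e-9:                  # bisection for x_best, Eq. (5)
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cdf(mid) < 1.0 - p_best else (lo, mid)
    x_best = 0.5 * (lo + hi)
    top_one = 50.0 * (1.0 - gauss_cdf(x_best, 100.0, 25.0))  # Eq. (9)
    return top_one / (100.0 * p_best)      # Eq. (10)

shares = [group_one_share(p) for p in (0.5, 0.1, 0.01)]
print([round(s, 3) for s in shares])
```

The stricter the cut, the deeper we sample the tail, and the more the group with the larger variance profits.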