Causation, Equivalence, and Group-Selection
Abstract: We defend the view that group selection is operative whenever demographic
variables causally influence fitness against the challenges recently posed by Godfrey-Smith
[2008]. We outline a precise form of the causal theory of group selection, and then use it
to show that group level models are possible when neighborhood variables causally
influence individual fitness. We further challenge the equivalence condition imposed by
Godfrey-Smith, and show why cases of pure soft selection may correctly be regarded as
instances of group selection.
Word Count: 9154
1. Introduction.
Heisler and Damuth [1987] and [1988] introduced a novel conception of group
selection. Following Lande and Arnold [1983], they take the strength of selection on a
trait T to be measured by the partial regression coefficient of fitness on T controlling for
other traits correlated with T. Such coefficients make sense as measures of selection if
one takes selection on T to occur just in case T has non-zero variance in the population
and is a cause of fitness. Under certain conditions, the partial regression coefficient will
reliably measure the degree to which changes in the value of T produce changes in
fitness, and in this sense measure the strength of selection on T.
Heisler and Damuth note that hierarchical regression allows the use of group
‘surrogate’ variables in place of phenotypic variables. If we map individuals to
collections of individuals, we can define a demographic variable D(i) as some moment of
the distribution of T over the collection to which the individual i is mapped. We can then
Draft: Bruce Glymour and Chris French
determine the strength of group selection in the population. We regress fitness on D,
controlling for T. If the regression coefficient is significantly different from 0, we infer
group selection (of ‘type 1’) is acting, with a strength measured by the regression
coefficient.
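The regression step just described can be sketched in a few lines. This is a minimal illustration, not the procedure of Heisler and Damuth: the data, the noise-free fitness function, and the helper names (`residuals`, `partial_coefficient`) are all assumptions made for the example, and no significance test is performed. The partial coefficient is computed by residualizing on the control variable (the Frisch-Waugh-Lovell approach).

```python
def mean(values):
    return sum(values) / len(values)

def residuals(y, x):
    """Residuals of y after a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [(b - my) - slope * (a - mx) for a, b in zip(x, y)]

def partial_coefficient(y, x, control):
    """Partial regression coefficient of y on x, controlling for `control`."""
    ry = residuals(y, control)
    rx = residuals(x, control)
    return sum(a * b for a, b in zip(rx, ry)) / sum(a * a for a in rx)

# Hypothetical data: four groups of four individuals with trait values T.
groups = [[1, 1, 2, 3], [2, 3, 3, 3], [1, 1, 1, 2], [1, 2, 3, 3]]
T = [t for g in groups for t in g]
# D(i): mean of T in the group to which individual i is mapped.
D = [mean(g) for g in groups for _ in g]
# Hypothetical, noise-free fitness function, chosen only for illustration.
W = [1.0 + 0.5 * t + 2.0 * d for t, d in zip(T, D)]

beta_D = partial_coefficient(W, D, T)  # coefficient on the demographic variable
beta_T = partial_coefficient(W, T, D)  # coefficient on the individual trait
print(round(beta_D, 6), round(beta_T, 6))
```

Because the illustrative fitness function is exactly linear, the two coefficients recover the weights placed on D and T; with real data the estimates would carry sampling error and would need the significance test mentioned in the text.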
The methods suggested by Lande and Arnold are now routine in population
biology, and their application to group selection is commonplace. Goodnight et al.
provide a succinct statement of the underlying theory of group selection: “We define
multilevel selection to be variation in the fitness of individuals that is due to both
properties of the individuals and properties of the group or groups of which they are
members.” [1992, p. 745] According to this view, the presence of higher level selection
processes is not determined by the need to attribute fitnesses to aggregates of lower level
units, but by the causal relevance of demographic variables to the fitness of lower level
units. We will henceforth refer to this basic idea, that group selection is operative when,
and only when, the fitness of individuals is causally influenced by one or more
demographic variables, as the causal theory of group selection.
In a recent paper Peter Godfrey-Smith [2008] develops a critique of the causal
theory of group selection. In this paper, we show that Godfrey-Smith’s argument against
the causal theory is in fact no more than an appeal to tradition, albeit an intuitive
tradition. In the following sections we first develop the causal theory of group selection
with some care, and then rehearse Godfrey-Smith’s challenge, which turns on a particular
kind of demographic variable (so called neighborhoods). We will then show that
neighborhood selection may be correctly modeled as group selection. We extend our
critique to a principle, which we will call the equivalence condition, which Godfrey-Smith
employs to reject group models of neighborhood selection. We argue that the
equivalence condition is motivated only by the intuition that group selection must involve
selection of groups, and we argue further that given certain explanatory or predictive
aims, theoretical parsimony provides good, if not compelling, reason to reject the
intuition, and with it the equivalence condition.
2. The Causal Theory of Group Selection.
In what follows we offer an idiosyncratic presentation of a set of ideas that are
either explicitly or implicitly developed by a number of biologists going back in some
respects to Sewall Wright [1934], and including especially Lande and Arnold [1983],
Heisler and Damuth [1987] and [1988] and Goodnight et al. [1992]. The formalization
we offer departs in various ways from the work of these authors; none we think would
endorse all of what we say here. Since we do not wish to belabor exegetical details, we
simply note that though we inherit the ideas from the authors cited above, we take
responsibility for the presentation.
2.1. Preconditions. The causal theory of group selection is a version of Multi-Level Selection Theory. As such, it is indifferent to the units which are taken to comprise
a population. These may be particular locations on particular DNA molecules, whole
chromosomes, pairs of chromosomes, whole genomes, individual organisms or small
groups of organisms, e.g. colonies. It is essential only that the units be identified as
tokens, i.e. individual items, and that these units aggregate into collections. In what
follows the units are individual organisms unless we explicitly state otherwise, but
nothing hinges on this.
2.2. Variables. Let the variable i index the units in the population, and let
variable T(i) be a measure of some phenotypic trait over which units in the population
vary. Let G(i) be a (possibly arbitrary) function from units to aggregates or collections of
units (i.e. subsets of the total population of units). Aggregates can be agent-centered or
not. An agent-centered aggregate specifies the set G(i) by reference to some relation R to
the focal unit: if G(i)=g then a unit j is in g just in case j bears R to i. Non agent-centered
aggregates are defined without reference to such relations.
Whether agent-centered or not, aggregates can be inclusive or exclusive. An
aggregate is inclusive if every unit is in the group to which it is mapped: i.e. if G(i)=g
then i∈g. An aggregate is exclusive if this is not so. By and large, biologists use
aggregates that are either non-agent centered and inclusive, as e.g. groups delimited by
spatial location, or aggregates that are agent centered and exclusive, as e.g.
neighborhoods (usually the collection of individuals other than the focal individual within
some given distance of the focal individual). Henceforth we denote aggregates of the
first kind as groups, and call demographic variables defined relative to them group
variables. We denote aggregates of the second kind as neighborhoods, and call
demographic variables defined relative to them neighborhood variables.
Given a population of units, a function G(i) from units to aggregates, and
measurements of T for each unit, we can find for each unit i the frequency distribution of
T in the aggregate G(i). Let G(i)=g and let M(g) be some moment of the distribution over
T in g; for explicative ease we will henceforth take M to be the arithmetic mean, but
nothing hinges on this. The values m of M record properties of a distribution in a
collection of units and hence cannot be taken to denote some attribute of any unit i in a
collection on which M is measured. However, we can define a surrogate variable,
measured on units: D(i)=M(G(i)). D(i)=m does record a property of the unit i: the
property of being mapped by the function G to an aggregate in which the distribution of T
has the mean m. Henceforth we will call variables such as T individual variables and
variables such as D demographic variables, but the reader is enjoined to remember that
both kinds of variables are measured on units, and hence take values that denote
properties of the units; they differ in that demographic variables track the properties of
the aggregates to which units are mapped, while individual variables do not.
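The construction of the surrogate variable D(i)=M(G(i)) can be made concrete with a short sketch. The six units, their trait values, the partition, and the function names below are hypothetical; the neighborhood relation used here (sharing a block with the focal unit) is only one possible choice of R.

```python
from statistics import mean

# Hypothetical trait measurements T(i) for six units.
T = {"a": 1, "b": 2, "c": 3, "d": 3, "e": 1, "f": 2}

# Hypothetical partition of the population into two spatial groups.
PARTITION = [{"a", "b", "c"}, {"d", "e", "f"}]

def group_of(i):
    """Inclusive, non-agent-centered aggregate: the block containing i."""
    return next(block for block in PARTITION if i in block)

def neighborhood_of(i):
    """Exclusive, agent-centered aggregate: units sharing a block with i, minus i."""
    return group_of(i) - {i}

def demographic(i, G, M=mean):
    """Surrogate variable D(i) = M(G(i)), measured on the unit i."""
    return M([T[j] for j in G(i)])

print(demographic("a", group_of))         # group variable for unit a
print(demographic("a", neighborhood_of))  # neighborhood variable for unit a
```

Both calls return a value attached to the unit "a", which is the sense in which demographic variables are measured on units even though they track properties of aggregates.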
2.3. Selection and Causation. A precise formulation of the causal theory of
group selection depends on a clear conception of causation. We adopt the interventionist
conception used in graphical causal modeling (Spirtes, Glymour and Scheines [2000],
Pearl [2000]). Direct causation is an asymmetric dependence relation between variables,
relative to some set S of variables under consideration. The relation of direct causation
holds between two variables in S whenever without directly changing the value of other
variables in S, there is an intervention changing the first variable which changes the
probability distribution over the second variable. Systems of causal dependencies, or
causal structures, can be represented graphically by treating variables as nodes in a
directed graph, with arrows from direct causes to their immediate effects. Given such a
graph, one variable is a cause, simpliciter, of another if there is a directed path of any
length from the first to the second. According to the causal theory of group selection,
higher level selection acts on a population of units relative to a set of variables S if and
only if S includes absolute fitness, W, and some demographic variable D such that D
causes W relative to S. Group selection is the special case in which the units are
individual organisms.
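The diagnostic criterion can be sketched as a reachability check on a directed graph. The two example graphs and the function names are assumptions made for illustration; the sketch takes the graph as given and says nothing about how a correct graph is discovered.

```python
def causes(graph, cause, effect):
    """True if there is a directed path of length >= 1 from cause to effect."""
    stack = list(graph.get(cause, ()))
    seen = set()
    while stack:
        node = stack.pop()
        if node == effect:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return False

def higher_level_selection(graph, demographic_vars, fitness="W"):
    """Causal-theory test: some demographic variable is a cause of fitness."""
    return any(causes(graph, d, fitness) for d in demographic_vars)

# Hypothetical graph in which D is not a cause of W.
no_group_selection = {"Num": ["T"], "T": ["W"], "D": []}
# Hypothetical graph in which D is a direct cause of W.
group_selection = {"T": ["W"], "D": ["W"]}

print(higher_level_selection(no_group_selection, ["D"]))
print(higher_level_selection(group_selection, ["D"]))
```

The check is relative to the variable set encoded in the graph, mirroring the relativity to S in the definition above.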
2.4. Models. Group selection and other forms of higher level selection, so
understood, can be usefully modeled in at least three ways: by graphical causal models,
by structural equation models (SEMs) and by population genetics models. First, a
graphical causal model of the causal structure governing fitness is a model of higher level
selection whenever some demographic variable in the model is a cause, direct or distal, of
fitness. For a given variable set S and a correct graph over S, one can by direct
inspection determine whether higher level selection acts on the population relative to that
variable set.
The graph alone, however, is insufficient to determine the strength of selection.
To do this, one needs a structural equation model. Such a model consists in a set of
equations, each of which writes an endogenous variable as some function of its
immediate causes in S: V(i)=F(parents(V))+ε, where parents(V) denotes the set of
immediate causes of V in the causal graphical model.1 According to the causal theory,
the strength of selection on a trait is measured counter-factually by the amount of change
in fitness one would induce by an intervention that changes the value of the trait by some
standardized unit.2
Together, the graphical causal model and the SEM provide the most fundamental
description of the selection process possible, relative to the set of variables S. Given
measurements of the variables in S for units in the population, the fitnesses of those units
can be calculated. If units are then assigned to classes, e.g. genotypes, absolute and
relative fitnesses for the classes can be calculated and used in a population genetics
model. Population genetics models enable predictions of future frequency distributions
from the information contained in the causal graphical and SEM models.
1 In non-linear systems it is sometimes necessary to write the equations in the form V(i)=F(parents(V), ε).
2 In non-linear systems an alternative representation is required, e.g. the logit regression coefficient.
But because population genetics models are statistical summaries of the net
influence of selection on various trait variables, those models are generally insufficiently
precise to provide a causally informative description of selection. In particular, such
models do not identify particular variables which causally influence fitness, much less
enable one to estimate the influence of any such variable on fitness. Hence, population
genetics models cannot alone be used to determine whether or not group selection acts on
the population; they cannot be used to estimate the absolute strength of group selection;
and they cannot be used to estimate the relative importance of group and individual
selection. They are in this sense less fundamental, by the lights of the causal theory of
group selection.
2.5. Pluralism. The causal theory of group selection imposes no prior constraints
on the choice of unit, beyond the requirement that units be tokens rather than types, and
the theory imposes no prior constraints on the choice of variables in terms of which to
model a population. As a consequence, the causal theory of group selection
countenances ontologically distinct models of the same population as equally legitimate.
We illustrate with a sequence of examples.
Generative Model. Population P is composed of sexually reproducing diploid
organisms with alleles A and a segregating and assorting independently at a locus L. The
genotypes AA, Aa and aa causally determine strictly distinct phenotypes T=1, 2 and 3,
according to the equation T(i)=1+Num(i), where Num(i) is the number of a alleles in an
individual’s genome, and the phenotype T causally determines the reproductive success
of individual organisms according to the function W(i)=T(i)^2. Further, individuals are
located in spatially defined groups, each with 10 members, and the function G(i) maps
individuals to groups. The population can then be modeled as follows. Model 1: S1={T,
W}, the causal graphical model is T→W and the SEM is W(i)=T(i)^2. Model 1 correctly
diagnoses phenotypic selection, i.e. selection on trait T. But the population can be
equally well described by Model 2: S2={Num, W}, Num→W with W(i)=(1+Num(i))^2.
Model 2 correctly diagnoses genotypic selection, i.e. selection on Num. Selection on
both individual phenotype and genotype is correctly diagnosed by Model 3: S3={T, Num,
W}, the causal model is Num→T→W, and the SEM consists in a pair of equations,
T(i)=1+Num(i) and W(i)=T(i)^2. It is worth considering one more individual level model:
Model 4: S4={D, Num, T, W}, where D(i) is the mean of T in the group to which i is
mapped by G. We then have the causal model
[Figure: causal graph for Model 4, Num→T→W, with no arrow from D to W]
and the SEM T(i)=1+Num(i), W(i)=T(i)^2. Model 4 correctly diagnoses selection on the
phenotypic trait variable T and the genotypic trait variable Num, and it correctly
diagnoses the absence of group selection on D, since D is not a cause of W. The causal
theory of group selection regards all 4 individual level models as equally legitimate.
Each expresses a true set of claims about which variables do and do not cause other
variables, relative to different choices of the set S with respect to which the causal claims
are made.
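The interventionist reading of Model 4 can be illustrated with a toy version of its structural equations. The function names and the baseline values are hypothetical; the sketch simply sets one input to two different values while holding the other fixed and reports the change in W.

```python
def generative_sem(num, d):
    """Structural equations of the generative model; d (the value of D) is not used."""
    t = 1 + num   # T(i) = 1 + Num(i)
    w = t ** 2    # W(i) = T(i)^2
    return {"Num": num, "T": t, "D": d, "W": w}

def effect_of_intervention(variable, old, new, baseline):
    """Change in W when `variable` is set from old to new, other inputs held fixed."""
    before = generative_sem(**{**baseline, variable: old})
    after = generative_sem(**{**baseline, variable: new})
    return after["W"] - before["W"]

baseline = {"num": 1, "d": 2.0}  # hypothetical starting values

print(effect_of_intervention("num", 0, 2, baseline))    # Num is a cause of W
print(effect_of_intervention("d", 1.0, 3.0, baseline))  # D is not a cause of W
```

Changing Num alters W through T, while changing D leaves W untouched, which is the pattern Model 4 records as selection on Num and T and no group selection on D.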
Models in which the units are higher or lower level entities can also be
constructed. The causal graphical models are straightforward; the associated SEMs are
not, and we omit exact versions of them. Model 5 is a group level model, i.e. the
variables are all measured on groups, which are now the units for the model: S5={M,
W’}, where M(g) is the mean value of T in the group g, and W’(g) is group fitness. The
corresponding causal model is M(g)→W’(g). Model 6 is a chromosomal model; let c
index chromosomes among all zygotes at a given generation, and let I(c) map
chromosomes to individual zygotes, i.e. groups of 2 chromosomes. Define A(c)= 1 if c
carries a and 0 otherwise. Define the demographic variable N(c)=1 if c is paired with a
chromosome that carries a and 0 otherwise, and let W’’(c) be the number of copies
produced by the chromosome c. Then S6={A, N, W’’} and the causal model is
A→W’’←N. On model 6, higher level selection acts on the population of (zygotic)
chromosomes, because N is a neighborhood variable. In both models 5 and 6, the
equations predicting fitness must be stochastic. We illustrate one of the reasons why
below, using model 5 as an example; here we write for model 6 the less complex
equation for the expectation over W’’(c), assuming that chromosomal identities and
variable-values are preserved under crossing over: Exp(W’’(c))=.5(1+A(c)+N(c))^2.
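One way to check the expectation for Model 6 against the generative model is to sum it over the two chromosomes of a zygote and compare the result with the individual-level fitness. The reading of the factor .5 as an equal split of the individual's output between its two chromosomes is our assumption, as are the function names.

```python
from itertools import product

def individual_fitness(num_a):
    """Generative model: W(i) = T(i)^2 with T(i) = 1 + Num(i)."""
    return (1 + num_a) ** 2

def expected_chromosome_fitness(a, n):
    """Model 6: expected copies of a chromosome with A(c)=a paired with N(c)=n."""
    return 0.5 * (1 + a + n) ** 2

# Enumerate the four possible zygotes; each entry is 1 if the chromosome carries a.
for c1, c2 in product((0, 1), repeat=2):
    pair_total = (expected_chromosome_fitness(c1, c2)
                  + expected_chromosome_fitness(c2, c1))
    print((c1, c2), pair_total, individual_fitness(c1 + c2))
```

For every zygote the two chromosomal expectations add up to the fitness the generative model assigns to the individual, so under this reading the chromosomal model and the individual-level models agree on expected output.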
Both models 5 and 6 are, from the perspective of the causal theory of group
selection, perfectly legitimate models of selection in P. Model 6 correctly diagnoses the
occurrence of higher level selection on chromosomes. Further, while model 5 implies
that groups are selected for, in respect of their M values, it does not imply group selection
occurs on units of any type. Recollect that a variable is demographic with respect to a
unit if it tracks properties of the aggregates to which the units belong. Because the units
in model 5 are groups, M(g) is not a demographic variable. Hence, there are no
demographic variables in model 5, and so it cannot support any claim at all about group
selection. This illustrates a crucial difference between the causal theory and alternative
theories of group selection. On the causal theory, given the decision to use objects of a
given kind as the units, the presence of higher level selection on those objects is
determined by the causal relevance of demographic variables. It is entirely irrelevant
whether or not there are available models using groups of these objects as the units, and it
is equally irrelevant whether or not there are available models using components of these
objects as the units. It is similarly irrelevant whether groups of units exhibit differential
reproductive success, and if so, whether such differences are caused by differences in
group properties.3
3 Model 5 also illustrates one of the counter-intuitive consequences of theories of group selection on which
group selection requires merely that group properties cause differential reproductive success of groups: as
model 5 shows, groups of individuals will exhibit differential reproductive success, caused by differences
in the group property M(g). If M(g) evolves in the population of groups, it then will appear to be a group
adaptation, even though, as is clear from the generative model itself, this is a classic case of intuitively
individual selection. Conversely, the causal theory allows for the evolution of etiologically defined group
adaptations even when groups do not differentially reproduce. Fully accommodating pre-theoretic
intuitions about group adaptations is difficult on either theory, and deserves an extensive treatment.
However, Godfrey-Smith’s critique of the causal theory is independent of questions about the nature of
group adaptation; because our aim is to rebut these criticisms, we hereafter omit discussion of group
adaptation.
We should note, however, that while the causal theory is permissive with respect
to the choice of unit and variable set, it does not thereby endorse every such choice as
equally useful. Certain choices of unit and/or variable set will ill-serve certain predictive
ends, in various contexts. For example, model 5 above is a disastrous choice if one
wishes to predict the consequences of interventions on the population. To see this,
consider what happens when we intervene to change the value of M(g) for some
particular group g. Recollect that groups have 10 members. Suppose we intervene on a
particular group g to set the value of M(g) to 2. The resulting value of W’(g) depends on
just how we do this. If we force all 10 members of g to take the value T=2, then W’(g)
will be 40; but if we force 5 members of g to take the value T=1 while the other 5 take the
value T=3, W’(g) will be 50. The SEM will therefore have to take W’ to be a
stochastic function of M(g). The ‘group level’ model will thus require a certain level of
imprecision when predicting the fate of the population under interventions, which
imprecision could be avoided by choosing different units.
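The arithmetic behind this example is easy to check. The sketch below assumes, as the figures 40 and 50 suggest, that W’(g) is the summed reproductive output of the group's members under W(i)=T(i)^2; the function names are ours.

```python
def group_mean(traits):
    """M(g): the mean of T over the members of the group."""
    return sum(traits) / len(traits)

def group_fitness(traits):
    """W'(g), read here as summed individual output with W(i) = T(i)^2."""
    return sum(t ** 2 for t in traits)

uniform = [2] * 10         # all ten members forced to T=2
split = [1] * 5 + [3] * 5  # five members at T=1, five at T=3

print(group_mean(uniform), group_fitness(uniform))
print(group_mean(split), group_fitness(split))
```

The two interventions agree on M(g) but disagree on W’(g), so no deterministic function of M(g) alone can return both outputs.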
We emphasize here, because it will be relevant later, that reliable prediction is
possible only with a model that correctly identifies the causal system governing fitness,
and does so using well chosen units. Models that fail to correctly describe the causal
structure will fail with respect to some predictive tasks, and so too will models that
correctly describe the causal structure over a poorly chosen set of variables or units. The
causal theory thus recognizes models with distinct choices of unit as ontologically
legitimate, so long as each model correctly describes the causal relations among included
variables. In so doing, the causal theory offers no endorsement of legitimate models as
well suited to any given predictive or explanatory aim; rather it excludes as illegitimate
models that fail to represent the correct causal structure among included variables,
because those models will necessarily fail to predict well under interventions, and will
not underwrite causal explanations of interest.
3. The Argument.
Godfrey-Smith [2008] argues for two conclusions. The first, and most
fundamental, is this: it is not in general true that higher level (group) selection occurs
when population structure (i.e. demographic variables) influences the fitness of
individuals. If he is correct, the causal theory of group selection is a mistake. He defends
the conclusion by means of an intermediate lemma, which is a second conclusion of
interest in its own right. The lemma is this: higher levels of selection occur only when
real biological relations partition a population of individuals into equivalence classes
(henceforth, the equivalence condition). The equivalence condition is in turn defended
by appeal to the intuition that group selection requires selection of groups. A sketch of
the argument follows.
Collections of units are legitimate fitness bearing entities only if these collections
partition a population into a set of equivalence classes, where the equivalence relation
implied by this partition has “some basis in” (i.e. is approximately extensionally
equivalent on the population to) “…some real biological relation between…” units,
which must itself therefore be (approximately) an equivalence relation (cf. Godfrey-Smith, pp. 29-30). Sometimes the fitness of individuals is influenced by neighborhood
variables. In such cases population structure influences the fitness of individuals in the
population. However, neighborhoods do not partition the population into equivalence
classes (and so violate the equivalence condition), and therefore do not count as stable
collections which might compete with one another. Partitions of the population into
groups will be arbitrary, since it is neighborhood rather than group variables that
influence fitness, and so group models also violate the equivalence condition.
Consequently there are no higher level collections of units that satisfy the equivalence
condition, and hence no higher level units which can be said to compete with one
another. Thus, there is no higher level selection, even though population structure does
influence individual fitness. Therefore, according to Godfrey-Smith, at least some cases
in which demographic variables cause fitness admit only of lower-level representations,
representations on which fitness is attributed to individuals.
The equivalence condition plays a crucial role here, and is subject to challenge.
One obvious objection concerns the use of contextual analysis (a particular form of
hierarchical regression) to identify group selection. By substituting demographic
variables tracking properties of neighborhoods rather than groups, contextual analysis can
(sometimes) be used to identify the influence of these properties on the reproductive
success of individuals. Godfrey-Smith allows that contextual analysis does identify the
fact that population structure influences individual fitness.4 However, he notes that in a
special class of cases, so called soft selection, contextual analysis wrongly diagnoses
group selection.
In soft selection there is frequency dependent selection within groups, but the
fitness of each group is fixed over time, and hence does not co-vary with group
composition. Consequently, in soft selection there is no selection of groups: groups do
not vary in fitness. Absent a redefinition of group selection contrary to the quite intuitive
and traditional idea that group selection requires the differential reproductive success of
groups, contextual analysis will wrongly identify cases of soft selection as cases of group
selection. Such false positives show that contextual analysis is not an apt procedure for
identifying group selection in general. Hence, concludes Godfrey-Smith, the fact that
contextual analysis does not presuppose the equivalence condition cannot be used to
sustain objections to that condition. If the condition is right, then it is possible for
demographic variables to influence fitness even when there is no higher level selection.
The causal theory of group selection is therefore mistaken.
4 We are not so sanguine. Contextual analysis is as reliable as any regression method, but regression
methods are in general not very reliable. Fortunately, alternatives are available, though discussion of them
is beyond the scope of this paper (see e.g. Shipley, 2000).
We think Godfrey-Smith’s treatment of so called neighborhood selection underrepresents the resources available on the causal theory of group selection, in ways that
lend a kind of misleading support to his later defense of the equivalence condition. There
is an argument against the causal theory of group selection, but that argument is
essentially a naked appeal to the traditional intuition that group selection requires
selection of groups. Below, we first reconsider a specific case of neighborhood selection
in order to make clear what can, and cannot, be said about such selection on the causal
theory of group selection. We then explore the merits of the equivalence condition and
the associated objection to the causal theory of group selection.
4. A Central Example, Reanalyzed.
4.1. Model 7. Consider a population of individuals of two types, A and B. The
types reproduce non-sexually with perfect heritability. Individuals are arranged on a
10x10 grid. The number of offspring produced by an individual is a function of its type,
and the distribution of types in its neighborhood. We define the phenotypic variable T(i),
with T(i)=1 if i is of type A and 2 if i is of type B. Let N(i) be a function from individuals
to sets of individuals, with j∈N(i) if and only if j is immediately above, below, to the left
or to the right of i (this is the Von Neumann neighborhood). We define the demographic
variable D(i), which takes a value equal to the number of A types in the neighborhood
N(i). Reproductive output is given by the function W(i)=T(i)+2D(i). After reproduction,
parents die and the grid is recolonized by the offspring, sampled at random and without
replacement; offspring that do not colonize suffer 100% mortality, offspring that do
suffer 0% mortality. Since the probability of a given offspring colonizing the grid is not
affected by its type, we can take W(i) as a measure of fitness.
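The fitness function of Model 7 can be sketched directly. The text does not say how the edges of the grid are handled; the sketch assumes hard edges, so that border cells have fewer than four neighbors, and it uses a small 3x3 grid rather than the 10x10 grid of the model. The function names and the example grid are ours.

```python
def von_neumann(r, c, size):
    """Cells immediately above, below, left and right of (r, c); hard edges assumed."""
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < size and 0 <= j < size]

def fitness_grid(grid):
    """W(i) = T(i) + 2 D(i) for every cell, with T=1 for type A and T=2 for type B."""
    size = len(grid)
    fitness = {}
    for r in range(size):
        for c in range(size):
            T = 1 if grid[r][c] == "A" else 2
            D = sum(grid[i][j] == "A" for i, j in von_neumann(r, c, size))
            fitness[(r, c)] = T + 2 * D
    return fitness

# Hypothetical 3x3 arrangement of types, one string per row.
grid = ["AAB",
        "ABB",
        "BBB"]
fitness = fitness_grid(grid)
print(fitness[(0, 0)], fitness[(1, 1)], fitness[(2, 2)])
```

A wrap-around (toroidal) grid would change only `von_neumann`; the fitness function itself would be unaffected.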
We now ask how this population can be modeled using the causal theory of group
selection. Clearly, if we take units to be individuals, and use the variables T(i) and D(i), a
causal model is possible. The generative model itself includes a SEM, and this in turn
implies a causal graphical model: T(i)→W(i)←D(i). Since the units, here individuals,
vary with respect to a demographic variable D and D is a cause of W, according to the
causal theory of group selection we should say that group selection acts on this
population.
This result is prima-facie counter-intuitive, because, as Godfrey-Smith argues, it
is altogether unclear whether there is selection on collections of individuals, i.e.
neighborhoods. If one requires that group selection operates only when there is selection
on groups of individuals, as Godfrey-Smith implicitly does, then to show that group
selection acts on our notional population, one must produce a model in which the units
are collections of individuals, and show that selection acts on these higher level units.
The demand for such higher level models is both traditional and intuitive, and for this
reason we consider what can be said in response to it. But we note that as a test of group
selection, one imposes this requirement only at the risk of begging the question against
the causal theory of group selection, since, as pointed out in section 2, the availability of
higher or lower level models is simply irrelevant from the perspective of that theory.
We could build a model in which the units are neighborhoods, but the result is, as
Godfrey-Smith notes, a mess. It is not that such models are impossible; it is just that they
are unnecessarily complex and yield accurate predictions only when those predictions are
woefully imprecise. And, what is really more pressing for Godfrey-Smith,
neighborhoods are not the kind of discrete entity that can engage in competition, and
hence it doesn’t really make sense to talk of selection on neighborhoods, at least for those
who take seriously the idea that selection requires competition. Fortunately, even though
the causally relevant demographic variable is defined over neighborhoods, it is possible
to produce a model in which the units are groups.
4.2. Model 8. Starting from the top, left corner of the grid, divide the grid into
non-overlapping 2x2 sub-grids and let G(i)=g be a function from individuals to sets of
individuals occupying the same sub-grid. Each value g then denotes a spatially defined
inclusive and non-agent-centered group of four individuals, and our population comprises
25 non-overlapping groups. Define C(g) as the number of As in a group, and DA’(g) as
the mean D, as defined above, among type As in the group, and DB’(g) as the mean D
among Bs in the group. Let W’(g) be the total offspring produced by all members of a
group. Then the number of A offspring produced by the group is given by
W’A(g)=C(g)(1+2DA’(g)), the number of B offspring is given by
W’B(g)=(4-C(g))(2+2DB’(g)),5 and W’(g)=W’A(g)+W’B(g). We have again an associated
graphical model:
[Figure: causal graph for Model 8, with arrows from C, DA’ and DB’ into W’]
5 The number of Bs in a group is 4-C(g), and on average each has 2+2DB’(g) offspring.
For each triplet of values for C, DA’ and DB’, we can use the above equations to
compute the number of A and B type offspring produced by any group characterized by
that triplet of C, DA’, and DB’ values. For example, there are 5 possible values for C, each
individuating a class of group types: C=0, 1, 2, 3, or 4. For groups of C-type 0, DA’ is 0,
and DB’ ranges from 0 to 2 in steps of .25. Groups with C=0, DA’=0 and DB’=0 (groups
of all Bs, for each of which D(i)=0), produce 0 A offspring and 8 B offspring; groups
with C=0, DA’=0 and DB’=.25 produce 0 A offspring and 10 B offspring, and so on. In
this way, each group-type can be paired with rates for which that group type produces A
offspring and B offspring. The resulting model predicts group reproductive success,
type-reproductive success, and next generation type frequencies without error (see the
appendix for a proof). Further, predictions about the frequencies of contextualized
individual types, group types and contextualized group types depend on the sampling
function by which offspring are assigned locations on the grid. This function is common
to both group and individual level models, and so models will not differ about these
predictions so long as they do not differ with respect to predicted type frequencies and
numbers among the offspring. And there is no such difference.
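The group-level bookkeeping of Model 8 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `group_offspring` and its argument names are our own labels, while the two formulas are the ones given for W’A(g) and W’B(g) above.

```python
GROUP_SIZE = 4  # each 2x2 sub-grid holds four individuals

def group_offspring(c, d_a_mean, d_b_mean):
    """Return (A offspring, B offspring, total) for a group with c As.

    d_a_mean and d_b_mean stand for DA'(g) and DB'(g), the mean D among
    the As and among the Bs of the group.
    """
    w_a = c * (1 + 2 * d_a_mean)                 # W'A(g) = C(g)(1 + 2DA'(g))
    w_b = (GROUP_SIZE - c) * (2 + 2 * d_b_mean)  # W'B(g) = (4 - C(g))(2 + 2DB'(g))
    return w_a, w_b, w_a + w_b

# The two all-B group types worked through in the text.
print(group_offspring(0, 0, 0))     # (0, 8, 8)
print(group_offspring(0, 0, 0.25))  # (0, 10.0, 10.0)
```

Calling the function once for every admissible triplet of C, DA’ and DB’ values yields the pairing of group types with offspring numbers described above.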
Hence, it is true that the causal theory of group selection diagnoses the occurrence
of group selection when neighborhood properties influence the reproductive success of
individual organisms. It is also true that models of the population in which
neighborhoods are the unit are suspect, insofar as it is unclear whether neighborhoods can
be said to compete. On the other hand, in all but a special class of cases (considered
below), when neighborhood properties influence individual reproductive success, there
will be selection of groups, i.e. groups will vary with respect to traits that causally
influence group reproductive success, and it is possible to model the fate of such
populations at the group level. We must emphasize here that we are not recommending
the use of the group level Model 8 (or the test of group selection which forces its
consideration). Our point is not that Model 8 is somehow superior to the individual level
Model 7 (this depends entirely on the explanatory jobs one wants the model to do).
Rather, our point is this: the fact that the causal theory of group selection implies that a
population evolving in accordance with Model 7 is subject to group selection should not
be seen as objectionable or counter-intuitive in itself. In this population, group properties
cause (differential) group reproductive success, so even those who think that group
selection requires selection of groups prima facie need not reject the idea that group
selection is operative when neighborhood variables causally influence individual fitness.
5. The equivalence condition.
Groups, as defined above in Model 8, are equivalence classes, and same-group-membership
is an equivalence relation. But using the function G to structure the
population may well violate Godfrey-Smith’s equivalence condition. Godfrey-Smith
explicitly calls groups such as those above ‘arbitrary’, and hence discounts such models
as violations of the equivalence condition. Whether or not groups constructed using a
given function G violate the equivalence condition depends on whether the specific
function G in question reflects some ‘real biological’ relation. Since our models and
Godfrey-Smith’s are abstract, there can be no such relation to reflect, of course. In real
world cases, it may perhaps be sufficient that groups are identified with particular
physical locations and individuated by more or less precise geographic boundaries, but
perhaps not. Godfrey-Smith is not altogether clear about what should count, here, as a
suitable physical or biological relation. Whatever precise standards may be suggested,
we think the imposition of them on demographic variables amounts to special pleading,
motivated only by a desire to save a particular intuition about the nature of group
selection. To see why special pleading is required, we need to recognize the nature of the
constraints the equivalence condition imposes on legitimate models.
Variables are functions, from some domain of units on which they are measured,
to some range of values. In most of science, the domain is some class of objects, e.g.
organisms, DNA molecules and cells in biology, sub-atomic particles and atoms in
physics, molecules in chemistry, people in psychology, and so on. The range of such
variables is commonly some restricted subset of the real or natural numbers. Valence, for
example, maps atoms to a subset of the integers while beak length maps individual birds
to a subset of the reals. But some variables map units to sets of units. Models employing
demographic variables, whether group or neighborhood variables, implicitly rely on an
indicator variable that maps units to their respective groups or neighborhoods, as for
example G in Model 8 and N in Model 7. The range of such an indicator variable is sets
of units. Call variables like this set-valued. The equivalence condition imposes
constraints on set-valued variables. In particular, these: when measured on a population
of units, the images of those units under a set-valued variable must be 1) each an
equivalence class of the population in which the set-valued variable is measured, such
that 2) each equivalence class has more than one member, and 3) each such class is
united by some underlying real physical relation R, where (approximately) all members
of an image bear R to one another, and none bear R to members of other sets which are in
the range of the set-valued variable.
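Constraints 1) and 2) are purely formal, so they can be checked mechanically. The sketch below is illustrative only: the eight-unit population, the block variable `G` and the ring-shaped neighborhood variable `N` are hypothetical stand-ins, not the variables of Models 7 and 8. Constraint 3), which concerns a real physical relation, is not something code of this kind can test.

```python
def is_partition_variable(image, population):
    """Check constraints 1) and 2) for a set-valued variable.

    image maps each unit to a frozenset of units. The variable passes if
    every unit lies in its own image, every image has more than one
    member, and all members of an image are mapped to that same image.
    """
    for i in population:
        members = image(i)
        if i not in members or len(members) < 2:
            return False
        if any(image(j) != members for j in members):
            return False
    return True

population = range(8)

def G(i):
    # Hypothetical group variable: the blocks {0,1,2,3} and {4,5,6,7}.
    start = (i // 4) * 4
    return frozenset(range(start, start + 4))

def N(i):
    # Hypothetical agent-centered neighborhood: a unit and its two
    # neighbors on a ring, so adjacent neighborhoods overlap.
    return frozenset({(i - 1) % 8, i, (i + 1) % 8})

print(is_partition_variable(G, population))  # True
print(is_partition_variable(N, population))  # False
```

The neighborhood variable fails because overlapping images cannot all be equivalence classes: unit 1 belongs to the image of unit 0, yet its own image is a different set.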
Note that 2 and 3 are cogent demands only if 1 is a cogent demand, and 1 is
violated by any variable whose images are neighborhoods. So the equivalence condition
forbids the use of neighborhood variables in models. However, neighborhood variables
are used, and to all appearances used well, across the sciences. Outside of biology,
neighborhood variables have been used to good effect in sociology (e.g. centrality in
social networks, Freeman [1977], and the status of alter in a social pair, Christakis and
Fowler [2007], as well as any number of neighborhood regression models). In the
physical sciences, such variables as coordination number and Miller indices are used in
crystallography and parts of chemistry (see e.g. Hartshorn et al. [2007] and Soltzberg et
al. [1998]), while other neighborhood variables are sometimes used in predicting
molecular activity (e.g. Guha et al. [2006]). Inside biology, neighborhood variables play
an important role in models of population dynamics (e.g. Pacala and Silander [1987] and
Dixon et al. [1999]) and ecology (e.g. Chou et al. [1993] on fire dynamics). Set-valued
variables that violate the equivalence condition may not be common, but they are routine,
perfectly acceptable variables, employed in domains which span the range of science.
It may be that there is some methodological problem with models employing
neighborhood variables in these other domains. But Godfrey-Smith has said exactly
nothing about what these flaws may be: not a word about how such variables induce
inferential difficulties or measurement problems, or fail for reasonable explanatory or
predictive ends tout court. Godfrey-Smith has only one, quite local, complaint. When
the equivalence condition is violated by models of selection, and therefore when
neighborhood variables are employed by models of selection, it is possible to diagnose
higher-level selection even when groups cannot be said to compete with one another.
Said otherwise, the whole of Godfrey-Smith’s argument for the equivalence condition
rests on the appeal of the intuition that group selection requires selection of groups. If
one denies that group selection requires selection of groups, as those who endorse the
causal theory of group selection do, the equivalence condition is an unmotivated
imposition.
Much depends, therefore, on what we say about cases in which demographic
variables really do cause fitness, and yet there clearly is no selection of groups, as e.g. in
pure soft selection. If we stand by tradition, and endorse the intuition that group selection
requires selection of groups, the causal theory of group selection must be rejected. If, on
the other hand, there are good reasons for rejecting the intuition, the causal theory may
yet be saved.
6. Soft Selection, Group Selection and Intuitions. ‘Soft selection’ was introduced by
Wallace in 1967 to describe certain cases of frequency or density dependent selection
within local groups (see Wallace [1975]). Though the term has come to be used in a
variety of ways, it is still most commonly applied to frequency or density dependent
selection within groups, when group reproductive success exhibits relatively little
variance. Our concern is with cases of ‘pure’ soft selection, in which the variance of
group fitness is strictly zero: it is only under this condition that the causal theory of group
selection delivers counter-intuitive verdicts about the presence of group selection.
According to the causal theory of group selection, group selection acts on
populations subject to pure soft selection, even though in such cases there is no variance
in the fitness of groups. That result is counter-intuitive, because the formalisms of
population genetics require that selection acts only when there is non-zero variance in
fitness among units. When groups are the units, and the population is subject to pure soft
selection, there is no such variance. Hence, there is no selection of groups. And, it must
be admitted, group selection without selection of groups just sounds wrong. Nonetheless,
there are reasons to disdain the intuition.
The first thing to note is that, outside of the experimenter’s lab and the breeder’s
farm, pure soft selection is a measure zero event with a vanishingly small frequency. In
the wild, groups exhibit variation in reproductive success nearly always. This is just a
straightforward consequence of the myriad causes of survival and reproductive success,
and the stochastic nature of the causal dependencies. So if, when adopting the causal
theory of group selection, we must say intuitively silly things about particular cases, we
will not need to do so very frequently.
That cuts no ice unless there are positive advantages to be gained by adopting the
causal theory of group selection. And so there are. To see them, we should ask when
and why it is thought to be important to produce group level population genetic models.
The when is easily answered: it is commonly thought to be important to use group level
models when individuals exhibit both within and between group variance in fitness, and
the between group variance in fitness is non-spuriously associated with group
composition. Though it is perfectly possible in such cases to model the population by
attributing fitnesses only to individuals, it is widely held by those who endorse group
selection that doing so is a mistake (e.g. Sober and Wilson [1998]). Why? Because, it is
said, doing so misrepresents the causal structure in predictively and/or explanatorily
important ways.
How do individual level population genetics models misrepresent the causal
structure in such cases? Exactly in that the only causal claim even implicitly made by a
population genetics model is that the types to which fitnesses are attributed causally
influence fitness. Since the types in question are phenotypic or genotypic types of
individual, or worse, types of allele, such models are commonly read as implying that
only individual phenotypic or genotypic traits (or allelic types) causally influence fitness.
Individual level models are a mistake when this is not true.
So we need to ask, when is it that non-phenotypic/genotypic properties can
causally influence individual fitnesses so as to produce between group variance in fitness
in such a way that group fitness is associated with group composition? One possibility is
that phenotypes are non-randomly distributed over some environmental cline. In such
cases the association between group fitness and group composition is generally regarded
as spurious, resulting from accident or a common cause, e.g. biased migration rates.
Such cases are generally not regarded as instances of group selection.6 The other
possibility is that demographic variables cause fitness. Whenever it is the case that both
i) demographic variables causally influence individual reproductive success and ii) group
fitness causally depends on group fecundity, groups of different composition will exhibit
different fitnesses, and they will do so non-accidentally.
6
Special cases in which migration is non-random warrant further consideration, but this is beyond the
scope of the present paper.
So standard methods diagnose the presence of group selection when there is
between group variance in fitness which is non-spuriously associated with group
composition. The underlying causal mechanism generating such variance is the causal
influence of demographic variables on individual fitness. Exclusively individual level
population genetics models are misleading when demographic variables cause individual
fitness because such models suggest, wrongly, that only the genotypic and phenotypic
traits of individuals influence fitness. When the failure to recognize the relevance of
demographic variables leads to predictive or explanatory error, such models are more
than misleading, they are mistaken.
For example, one way to conceal the relevance of demographic variables is to
treat them as a special kind of environmental variable; Godfrey-Smith in fact suggests
such a strategy. One danger threatened by so treating demographic variables is that
environmental variables are only very rarely explicitly represented in population genetics
models. However, when demographic variables cause fitness, they generally produce
between group variance in individual fitness. Failure to control for this variance will lead
to incorrect estimations of selection coefficients, and hence to mistaken predictions,
especially about the consequences of interventions on the population. But one can
statistically control for the influence of a given cause only if it is represented. The
predictive and explanatory error consequent to such failure to represent the causes of
individual fitness is exactly the worry which excites those who think it important to test
for and represent group selection. And traditional ANOVA methods for detecting group
selection, not surprisingly, seek to identify exactly this kind of misrepresentation.
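The estimation error at issue can be illustrated with a toy calculation. The sketch below assumes the fitness function W(i)=T(i)+2D(i) of Model 7 and a hypothetical six-unit sample in which type A (T=1) tends to occur in A-rich neighborhoods; the sample and the variable names are ours. It compares the simple regression of W on T with the partial regression coefficients obtained when D is also represented.

```python
from fractions import Fraction as F

# Hypothetical (T, D) pairs: type A (T=1) tends to sit in A-rich neighborhoods.
units = [(1, 2), (1, 2), (1, 1), (2, 0), (2, 0), (2, 1)]
T = [F(t) for t, _ in units]
D = [F(d) for _, d in units]
W = [t + 2 * d for t, d in zip(T, D)]  # W(i) = T(i) + 2D(i)

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Population covariance; cov(xs, xs) is the variance."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

# Simple regression of W on T, with D left unrepresented.
naive_slope = cov(W, T) / cov(T, T)

# Partial regression coefficients of W on T and D (two-predictor least squares).
det = cov(T, T) * cov(D, D) - cov(T, D) ** 2
partial_T = (cov(W, T) * cov(D, D) - cov(W, D) * cov(T, D)) / det
partial_D = (cov(W, D) * cov(T, T) - cov(W, T) * cov(T, D)) / det

print(naive_slope, partial_T, partial_D)  # -5/3 1 2
```

In this sample the unconditioned slope on T has the wrong sign, while the partial coefficients recover the direct contributions of T and D that were built into the fitness function.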
We are now in a position to see the merits of the way in which the causal theory
of group selection treats pure soft selection. The essential thing about models of
selection is that they represent the causes of fitness precisely enough to enable reliable
predictions and correct explanations. Individual level population genetics models
relevantly misrepresent the causes whenever demographic variables causally influence
fitness. The presence of both within and between group variance in individual fitness is a
(reasonably) reliable diagnostic for the causal relevance of one or more demographic
variables. However, the absence of between group variance in fitness, and hence the
absence of variance in group fitness, is not a reliable diagnostic for the causal irrelevance
of demographic traits. Such diagnostic failure is exactly what happens in pure soft
selection: demographic variables cause individual fitness, but their influence does not
generate fitness differences among groups, because group fitness does not depend on
group fecundity—condition ii) above is violated. The failure to represent the causal role
of demographic variables in such cases is, for all that, no less a misrepresentation of the
causes of individual fitness, and the implications for prediction under intervention no less
grievous. In this sense, pure soft selection is group selection: prototypical cases of group
selection and cases of pure soft selection are generated by the same kinds of causal
systems governing individual reproductive success.7
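A toy case may make the diagnostic failure concrete. Everything in the sketch below is a hypothetical construction of ours rather than a model from the literature: groups of four, a fixed output of eight offspring per group (which is what makes the soft selection pure), and a competitive score that rises with the number of As in the group.

```python
GROUP_SIZE = 4
GROUP_OUTPUT = 8.0  # assumed: every group leaves exactly this many offspring

def score(is_a, c):
    # Hypothetical competitive score that rises with the number of As, c.
    return (1.0 if is_a else 2.0) + 0.5 * c

def individual_fitness(is_a, c):
    """Expected offspring of one individual in a group containing c As."""
    total = c * score(True, c) + (GROUP_SIZE - c) * score(False, c)
    return GROUP_OUTPUT * score(is_a, c) / total

def group_fitness(c):
    """Summed offspring of all four members of a group containing c As."""
    return (c * individual_fitness(True, c)
            + (GROUP_SIZE - c) * individual_fitness(False, c))

# Fitness of an A individual, and of its whole group, as composition changes.
for c in range(1, GROUP_SIZE + 1):
    print(c, round(individual_fitness(True, c), 3), round(group_fitness(c), 3))
```

Under these assumptions the fitness of an A individual changes with group composition, so the demographic variable is a cause of individual fitness, yet every group has the same total output and the variance in group fitness is zero.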
In essence, the causal theory of group selection replaces a phenomenological
conception of group selection, on which the presence of group selection is defined by a
statistical phenomenon (within and between group variance in fitness) commonly
generated by a certain kind of causal system (systems in which demographic variables
cause fitness), with a generative conception, on which group selection is defined by the
presence of a certain kind of causal system, which sometimes but not always produces
the once definitional but now diagnostic phenomenon. Our intuitions have been tutored
by the phenomenological definition; hence we are loath to see group selection when the
phenomenon is absent.
7
The causal mechanisms governing group reproductive success are of course different: in prototypical
cases of group selection, group fecundity influences group fitness, while in pure soft selection this is not
true.
But the phenomenon is important for explanatory or predictive reasons only
because it is generated by a particular kind of causal system. Since the causal systems in
which demographic variables influence individual fitness do not always produce the
diagnostic phenomenon, both phenomenological and generative conceptions of group
selection will have to reject some deep-seated intuitions. Accept the causal theory of
group selection, and one is forced to allow that pure soft selection is group selection,
even though there is no variance in group fitness. Contrary to our intuitions, group
selection is not selection of groups, or anyway, it is not always selection of groups. On
the other hand, if one endorses a phenomenological conception, one must explain the
relevance of the phenomenon, namely between group variance in fitness, by appeal to the
importance of representing the causes of fitness, and then disavow the importance of
doing so in other cases indistinguishable in causal structure or degree of causal influence.
Neither option is entirely happy; but the former serves important explanatory and
predictive ends, namely those of correct causal explanation and reliable prediction under
intervention.
7. Conclusions.
The most important desideratum for biological models is that they enable correct
explanation and reliable prediction. There are a variety of explanatory and predictive
aims which require of models of selection that they correctly specify and identify the
causal structure governing survival and reproductive success, as for example correctly
identifying the traits causally responsible for change in population phenotype or
predicting the evolutionary consequences of interventions on a population. For those
who, with us, privilege such aims, it is essential that our theory of selection, at any level,
use terms and endorse models that enable precise and reliable descriptions of the causal
systems governing survival and reproductive success. Given our aims, we have very
good reasons for endorsing generative over phenomenological definitions of selection
processes, even when doing so forces us to say counter-intuitive things about some
particular cases. It is better, of course, if the number of such cases is small, and so it is
fortunate that pure soft selection is rare. But even were this not so, treating soft selection
as group selection would be sufficiently well motivated if doing so preserved principles
that focus on generative mechanisms. The causal theory of group selection does exactly
this, and at what we regard as a quite minimal cost in intuition.
There are other ways to enforce attention to demographic variables when they are
causally relevant. For example, Godfrey-Smith suggests treating them as relevant
environmental variables and explicitly representing them by contextualizing fitness on
them. There is nothing intrinsically wrong with doing so, but models employing
contextualized fitnesses are less useful, and in various ways less reliable, than SEMs
and causal graphical models. For example, one can contextualize on any variable that is
associated with fitness, whether or not it is a cause of fitness, and contextualization does
not offer or require the use of any procedure for separating associations due to common
causes from those due to direct or indirect causal influence. Contextualization therefore
invites errors in both specification and identification. This can be remedied by carefully
attending to the variables on which one contextualizes, but such care requires that one
first have specified and identified the causal system. That is, one can reliably
contextualize in non-misleading ways only if one has already produced a causal graphical
model and associated SEM by one or another reliable method. We prefer to avoid the
epicycle: the demands of theoretical parsimony favor the causal theory of group
selection over alternatives, given our predictive and explanatory aims.
It is not incumbent on everyone to share these explanatory and predictive aims.
Those who do not may well have good cause to reach a quite different reflective
equilibrium. After all, a phenomenological definition of group selection unifies a class of
statistical phenomena. So long as those who employ such definitions respect the facts about
what causes what in their explanatory and predictive practices there is no reason to
object. If one is clear about the differences in purpose served by the different
conceptions of group selection, the dispute is merely terminological.
But the requisite clarity depends on recognizing certain facts, whatever
terminology one uses to express them. In some populations demographic variables cause
individual fitness. These variables are sometimes neighborhood variables. Such
populations can in fact be modeled at the group level. While neighborhood variables do
not partition the population into equivalence classes, and group variables may do so only
arbitrarily, such variables are in these respects no different from many variables in many
sciences, including other areas of biology. Special pleading for non-arbitrary equivalence
classes in models of selection is motivated by the idea that group selection requires
selection of groups, but the motivation is compelling only if one does not require a
generative conception of group selection. Those who endorse the causal theory of group
selection do so because they do require a generative conception of group selection in
order to pursue their explanatory and predictive aims. Their requirements are well met
by the causal theory of group selection, and are not met by any alternative presently
available.
References.
Brandon, Robert [2008]: “Natural Selection”, Stanford Encyclopedia of Philosophy.
Chou, Y., R. Minnich and R. Chase [1993]: “Mapping Probability of Fire Occurrence in
San Jacinto Mountains, California, USA”, Environmental Management, 17,
pp.129-140.
Christakis, N. and James Fowler, [2007]: “The Spread of Obesity in a Large Social
Network over 32 Years”, New England Journal of Medicine, 357, pp. 370-379.
Damuth, J. and I. L. Heisler [1988]: “Alternative formulations of multilevel selection.”
Biology and Philosophy, 3, pp. 407-430.
Darwin, C. [1859]: The Origin of Species.
Dixon, P., M. Milicich, and G. Sugihara, [1999]: “Episodic Fluctuations in Larval
Supply”, Science, 283, pp. 1528-1530.
Freeman, Linton [1977]: “A Set of Measures of Centrality Based on Betweenness”,
Sociometry, 40, pp. 35-41.
Godfrey-Smith, Peter [2008]: “Varieties of Population Structure and the Levels of
Selection”, British Journal for the Philosophy of Science, 59, pp. 25-50.
Goodnight, Charles, James Schwartz and Lori Stevens [1992]: “Contextual Analysis of
Models of Group Selection, Soft Selection, Hard Selection and the Evolution of
Altruism”, The American Naturalist, 140, pp. 743-761.
Guha, R., D. Dutta, P. Jurs, and T. Chen [2006]: “Local Lazy Regression: Making Use of
the Neighborhood to Improve QSAR Predictions”, Journal of Chemical
Information and Modeling, 46, pp. 1836-1847.
Hartshorn, R., E. Hey-Hawkins, R. Kalio, and G. Leigh [2007]: “Representation of
Configuration in Coordination Polyhedra and the Extension of Current
Methodology to Coordination Numbers Greater than Six”, Pure and Applied
Chemistry, 79, pp. 1779-1799.
Heisler, I.L. and J. Damuth [1987]: “A Method for analyzing selection in hierarchically
structured populations.” The American Naturalist, 130, pp. 582-602.
Lande, R. and S. Arnold [1983]: “The Measurement of Selection on Correlated
Characters”, Evolution, 37, pp. 1210-1226.
Pacala, S. and J. Silander Jr. [1987]: “Neighborhood Interference among Velvet Leaf,
Abutilon theophrasti, and Pigweed, Amaranthus retroflexus”, Oikos, 48, pp. 217-224.
Pearl, Judea. [2000]: Causality, Cambridge: Cambridge University Press.
Sober, Elliott [1984]: The Nature of Selection, Cambridge, MA: MIT Press.
Sober, Elliott and D. S. Wilson [1998]: Unto Others, Cambridge, MA: Harvard University
Press.
Soltzberg, L., O. Carneiro, G. Joseph, Z. Khan, T. Meretsky, M. Ng and S. Ofek [1998]:
“Prediction of Early- and Late Growth Morphologies of Ionic Crystals”, Acta
Cryst., B54, pp. 384-390.
Shipley, William [2000]: Cause and Correlation in Biology, Cambridge: Cambridge
University Press.
Spirtes, Peter, Glymour, C. and Scheines, R. [2000]: Causation, Prediction and Search,
2nd Edition, Cambridge, MA: MIT Press.
Wallace, Bruce [1975]: “Hard and Soft Selection Revisited”, Evolution, 29, pp. 465-473.
Wright, Sewall [1934]: “The Method of Path Coefficients”, Annals of Mathematical
Statistics, 5, pp. 161-215.
Appendix.
In this appendix we show that Model 7 and Model 8 predict identical pre-assignment
rates of reproductive success for types A and B individuals. Post-assignment rates of
success depend only on these rates and the sampling function by which offspring are
assigned to positions on the grid. This function is identical whether one uses Model 7 or
Model 8. Similarly, predictions of next-generation frequencies for contextualized
individual types, for group types and for contextualized group types depend, and depend
only, on this sampling function and the number of pre-assignment offspring in each type.
Hence, the predictive equivalence of Models 7 and 8 follows immediately from the fact
that they predict identical pre-assignment rates of reproductive success.
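The claimed equality of pre-assignment totals can also be checked numerically. The sketch below is a spot check, not a substitute for the proof. It assumes the fitness function W(i)=T(i)+2D(i) of Model 7 and the group formulas of Model 8; because the neighborhood structure is not needed for the identity, the D values are simply drawn at random from {0, 1, 2}, which is our simplification, and all function names are ours.

```python
import random
from fractions import Fraction

SIDE = 10  # 10x10 grid, 100 individuals, 25 groups of 2x2 cells

def random_population(seed):
    """Hypothetical data: random types, and D values drawn from {0, 1, 2}."""
    rng = random.Random(seed)
    types = [[rng.choice("AB") for _ in range(SIDE)] for _ in range(SIDE)]
    d = [[rng.choice((0, 1, 2)) for _ in range(SIDE)] for _ in range(SIDE)]
    return types, d

def individual_totals(types, d):
    """Model 7: sum W(i) = T(i) + 2D(i) within each type (T=1 for A, 2 for B)."""
    w_a = w_b = 0
    for r in range(SIDE):
        for c in range(SIDE):
            if types[r][c] == "A":
                w_a += 1 + 2 * d[r][c]
            else:
                w_b += 2 + 2 * d[r][c]
    return w_a, w_b

def group_totals(types, d):
    """Model 8: sum W'A(g) and W'B(g) over the 25 groups."""
    w_a = w_b = Fraction(0)
    for top in range(0, SIDE, 2):
        for left in range(0, SIDE, 2):
            cells = [(top + i, left + j) for i in (0, 1) for j in (0, 1)]
            d_as = [d[r][c] for r, c in cells if types[r][c] == "A"]
            d_bs = [d[r][c] for r, c in cells if types[r][c] == "B"]
            count_a = len(d_as)
            # Exact means; a mean over an empty class is taken to be 0.
            mean_a = Fraction(sum(d_as), count_a) if d_as else Fraction(0)
            mean_b = Fraction(sum(d_bs), len(d_bs)) if d_bs else Fraction(0)
            w_a += count_a * (1 + 2 * mean_a)
            w_b += (4 - count_a) * (2 + 2 * mean_b)
    return w_a, w_b

types, d = random_population(seed=0)
print(individual_totals(types, d) == group_totals(types, d))  # True
```

Exact rational arithmetic is used for the group means so that the comparison is an identity rather than an approximate floating-point match.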
We have a population of 100 individuals; let the first n of these be A-type individuals
with the remaining 100-n individuals being B-type. We define WA and WB to be absolute
rates of reproductive success for types A and B respectively. Recollect that D(i) is a
neighborhood variable indicating the number of A-type individuals in the neighborhood
N(i) of the ith individual, and that T(i) is a variable indicating the type to which the ith
individual belongs, with T(i)=1 if i is type A and T(i)=2 if i is type B. An individual’s
pre-assignment reproductive success is given by W(i)=T(i)+2D(i). Then:
$$W_A = \sum_{i=1}^{n} \bigl(1 + 2D(i)\bigr) = n + 2\sum_{i=1}^{n} D(i)$$
and
$$W_B = \sum_{i=n+1}^{100} \bigl(2 + 2D(i)\bigr) = 2(100-n) + 2\sum_{i=n+1}^{100} D(i)$$
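Let g index groups, and let W’A(g) be the number of A-type offspring produced
(pre-assignment) by group g; similarly, W’B(g) is the number of B-type offspring produced
(pre-assignment) by group g. The population is divided into 25 non-overlapping groups
of 4 individuals, with C(g) giving the number of A-type individuals in group g,
DA’(g) giving the mean D(i) for A-type individuals in the group, and DB’(g) giving the
mean D(i) for B-type individuals in the group. Then
$$W'_A(g) = C(g)\bigl(1 + 2D'_A(g)\bigr)$$
and
$$W'_B(g) = \bigl(4 - C(g)\bigr)\bigl(2 + 2D'_B(g)\bigr).$$
Let
$$W'_A = \sum_{g=1}^{25} W'_A(g)$$
and
$$W'_B = \sum_{g=1}^{25} W'_B(g).$$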
We wish to show that W’A = WA and that W’B=WB. We now define indicator functions
IA(g,i) and IB(g,i) such that IA(g,i)=1 if i∈g and is type A and 0 otherwise, and IB(g,i)=1 if
i ∈ g and is type B and 0 otherwise. Note that the first two of the following lemmas are
immediate consequences of the definition of the indicator functions and a group size of 4;
the third requires also that individuals are indexed so that the first n individuals are type
A and the remaining individuals are type B.
Lemma 1: […]
Lemma 2: […]
Lemma 3: […]
Then to show that W’A = WA we take: […]
Taking the sum in the first term and substituting […] for […] in the second we have: […]
For W’B=WB we take: […]
Taking the sums in the first term and substituting […] for […] in the second we have: […]
Q.E.D.
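The chain of equalities behind the result can be sketched as follows. This is a reconstruction under stated assumptions rather than the original derivation: it uses only the definitions of C(g), DA’(g), DB’(g) and the indicator functions given in this appendix, together with the facts that C(g) is the sum of IA(g,i) over i, that C(g)DA’(g) is the sum of IA(g,i)D(i) over i (and likewise for B), and that each individual belongs to exactly one group.

```latex
\begin{align*}
W'_A &= \sum_{g=1}^{25} C(g)\bigl(1 + 2D'_A(g)\bigr)
      = \sum_{g=1}^{25} C(g) + 2\sum_{g=1}^{25} C(g)\,D'_A(g) \\
     &= n + 2\sum_{g=1}^{25}\sum_{i=1}^{100} I_A(g,i)\,D(i)
      = n + 2\sum_{i=1}^{n} D(i) = W_A, \\[6pt]
W'_B &= \sum_{g=1}^{25} \bigl(4 - C(g)\bigr)\bigl(2 + 2D'_B(g)\bigr)
      = 2\sum_{g=1}^{25} \bigl(4 - C(g)\bigr)
        + 2\sum_{g=1}^{25} \bigl(4 - C(g)\bigr)\,D'_B(g) \\
     &= 2(100 - n) + 2\sum_{g=1}^{25}\sum_{i=1}^{100} I_B(g,i)\,D(i)
      = 2(100 - n) + 2\sum_{i=n+1}^{100} D(i) = W_B.
\end{align*}
```

The step from the double sum to the single sum uses the indexing convention that the first n individuals are type A and the remaining 100-n are type B.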