SIMULATION OF POPULATION GENETICS MODELS WITH SAS
Edward L. Spitznagel, Jr.
Washington University

ABSTRACT

One of the most important problems of population genetics is the
understanding of multi-locus natural selection in the presence of all
the realistic conditions that make it so complicated: truncation
selection, linkage, and small numbers of major loci superimposed on a
background of multi-locus minor effects. SAS has proved to be an
excellent vehicle for exploratory simulations, with successive work
data sets representing the generations of an evolving population. The
saving of programmer time is substantial, and the SAS program forms an
excellent outline for the writing of production programs in Fortran.

INTRODUCTION

Much of the early work in population genetics has focused on models
involving single loci or small numbers of linked loci (usually two)
[1]. However, there is mounting evidence for high degrees of
polymorphism [1,2], and it has been established that synergistic
effects "totally unpredictable from two-locus theory" [1] can occur in
such complex systems [2,3]. The situation is further complicated by
the fact that many quantitative characters seem to be skewed,
multi-modal, or in other ways indicative of one or a few major-effect
genes superimposed on a background of multi-locus minor effects [4].
When some form of discontinuous selection function such as truncation
selection [5,6] is used, the problem becomes so complex that no exact
solution seems possible.

An alternative is computer simulation, and SAS offers a number of
advantages in the simulation process. One advantage is that successive
generations of a population can be stored in a very natural way, as
successive SAS data sets. Another advantage is that the sort and merge
capabilities can be used in a variety of ways to model the mating
process, with or without assortment for phenotype. Still another
advantage is that procedures like MEANS, SUMMARY, FREQ, and CHART can
be used to "snapshot" succeeding generations of the evolving
population.

METHODOLOGY

The following simple model will be used to illustrate the techniques:

1) "Infinite" population.
2) All involved loci lie on a single chromosome.
3) One major-effect locus, positioned at the end of a string of five
   minor loci, with equal recombination distance between adjacent
   loci.
4) All effects additive.
5) Truncation selection, with individuals above threshold exhibiting
   decreased but non-zero fitness.

Step one is to define the various parameters of the model. This is
conveniently done by use of macros:

    MACRO N_ 4000%     *POPULATION SIZE;
    MACRO LA_ .95%     *PROBABILITY OF NONCROSSOVER BETWEEN MAJOR
                        LOCUS AND ADJACENT MINOR LOCUS;
    MACRO LB_ .95%     *PROBABILITY OF NONCROSSOVER BETWEEN ADJACENT
                        MINOR LOCI;
    MACRO PA_ .05%     *PROBABILITY OF "DOMINANT" GENE AT MAJOR LOCUS;
    MACRO PB_ .05%     *PROBABILITY OF "DOMINANT" GENE AT MINOR LOCUS;
    MACRO HETA_ 1%     *DIFFERENTIAL EFFECT OF HETEROZYGOTE AT MAJOR
                        LOCUS;
    MACRO HETB_ .1%    *DIFFERENTIAL EFFECT OF HETEROZYGOTE AT MINOR
                        LOCUS;
    MACRO HOMA_ 2%     *DIFFERENTIAL EFFECT OF DOMINANT HOMOZYGOTE AT
                        MAJOR LOCUS;
    MACRO HOMB_ .2%    *DIFFERENTIAL EFFECT OF DOMINANT HOMOZYGOTE AT
                        MINOR LOCUS;
    MACRO THRESH_ 2%   *THRESHOLD VALUE;
    MACRO AFF_ .8%     *PROBABILITY THAT AN INDIVIDUAL ABOVE THRESHOLD
                        WILL NOT REPRODUCE;
    MACRO U_ UNIFORM(0)%  *ABBREVIATION FOR UNIFORM RANDOM NUMBER
                        GENERATOR;
    MACRO ENVIRON_ .75*NORMAL(0)%  *NORMAL ENVIRONMENTAL CONTRIBUTION
                        TO PHENOTYPE;

Step two is to initialize the population, prior to selection. The most
convenient way is to generate gametes, then "mate" them. The code to
generate the gametes is as follows:

    DATA GAMETE;
      ARRAY B(I) B1-B5;
      * ONE OBSERVATION PER GAMETE, TWO GAMETES PER INDIVIDUAL;
      DO N=1 TO 2*N_;
        A = (U_<PA_);
        DO I=1 TO 5;
          B = (U_<PB_);
        END;
        OUTPUT;
      END;
      KEEP A B1-B5;
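
For readers who want to check the logic of steps one and two outside
SAS, the initialization can be sketched in Python. This is a minimal
sketch and not the author's program: the function names make_gamete and
initial_gametes, the seed argument, and the use of Python's random
module in place of the SAS uniform generator are all assumptions made
for illustration. Only the population size and the two gene
probabilities are taken from step one.

```python
import random

# Values taken from the step-one parameter definitions (4000, .05, .05).
# The Python names are illustrative assumptions, not part of the paper.
N = 4000      # population size
PA = 0.05     # probability of the "dominant" gene at the major locus
PB = 0.05     # probability of the "dominant" gene at each minor locus
N_MINOR = 5   # number of minor loci in the model


def make_gamete(rng):
    """Return one gamete as (a, [b1, ..., b5]) of 0/1 gene indicators."""
    a = int(rng.random() < PA)
    b = [int(rng.random() < PB) for _ in range(N_MINOR)]
    return a, b


def initial_gametes(n=N, seed=None):
    """Generate 2*n independent gametes, two per future individual."""
    rng = random.Random(seed)
    return [make_gamete(rng) for _ in range(2 * n)]


gametes = initial_gametes(seed=1)
freq_a = sum(a for a, _ in gametes) / len(gametes)
print(len(gametes), round(freq_a, 3))
```

Each gamete pairs a major-locus indicator with five minor-locus
indicators, mirroring the variables A and B1-B5 kept by the SAS step.
Passing a fixed seed makes a run repeatable, which is convenient when
comparing successive generations.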

Steps three, five, seven, and so on are to mate gametes to produce
members of the population subject to selection. We also compute the
inherited character, which we will call PHENOTYP. The following code
accomplishes these goals:

    DATA MATINGS;
      * PAIR EACH ODD-NUMBERED GAMETE WITH THE ONE THAT FOLLOWS IT;
      MERGE GAMETE(RENAME=(A=Q B1=R1 B2=R2 B3=R3 B4=R4 B5=R5))
            GAMETE(RENAME=(A=V B1=W1 B2=W2 B3=W3 B4=W4 B5=W5)
                   FIRSTOBS=2);
      N+1; IF MOD(N,2)=1;
      ARRAY R(I) R1-R5; ARRAY W(I) W1-W5;
      PHENOTYP = HETA_*(Q^=V) + HOMA_*(Q&V);
      DO I=1 TO 5;
        PHENOTYP = PHENOTYP + HETB_*(R^=W) + HOMB_*(R&W);
      END;
      KEEP Q V R1-R5 W1-W5 PHENOTYP;

Steps four, six, eight, and so on are to combine the inherited
character with a random normal disturbance due to environment, and
then to determine whether the individual is "fit" to reproduce. If the
individual is determined to be fit, we produce a number of gametes
from each such individual, allowing for the possibility of
recombination, shuffle them, and select 2 times N_ of them. At this
last stage, one full cycle of the process is completed. The code is as
follows:

    DATA GAMETE; SET MATINGS;
      ARRAY B(I) B1-B5;
      ARRAY R(I) R1-R5;
      ARRAY W(I) W1-W5;
      TOTAL = PHENOTYP + ENVIRON_;
      * NEXT LINE IS MISSING FROM THE SCANNED LISTING AND IS ASSUMED
        FROM THE MODEL DESCRIPTION: LIABLE MEANS ABOVE THRESHOLD;
      LIABLE = (TOTAL>THRESH_);
      AFFECTED = LIABLE * (U_<AFF_);
      IF NOT AFFECTED;
      DO GAMETES=1 TO 3;
        * CHR SELECTS WHICH PARENTAL CHROMOSOME IS BEING COPIED;
        CHR = (U_<.5);
        A = Q*(CHR=0) + V*(CHR=1);
        CHR = MOD(CHR+(U_>LA_),2);
        B1 = R1*(CHR=0) + W1*(CHR=1);
        DO I=2 TO 5;
          CHR = MOD(CHR+(U_>LB_),2);
          B = R*(CHR=0) + W*(CHR=1);
        END;
        STIR = U_;
        OUTPUT;
      END;
      KEEP STIR A B1-B5;
    PROC SORT; BY STIR;
    DATA GAMETE; SET GAMETE;
      N+1; IF N>2*N_ THEN STOP;
      DROP N;

SAS procedures to "snapshot" the selection process can be inserted at
appropriate points. For example, we could insert the line

    PROC CHART; VBAR PHENOTYP;

between the last two blocks of code to study the changing distribution
of phenotype under selection pressure. Also, we could insert the line

    PROC FREQ; TABLES A B1-B5;

immediately after the last block of code to observe the gene
frequencies in the population.

The last two blocks of code (steps 3 and 4) represent one complete
cycle in the selection process and hence must be repeated for as many
generations as we wish to study. The easiest way to repeat them is to
convert them into a macro, then list the macro name for as many
generations as desired.

The techniques illustrated above can be varied and generalized in a
number of ways: gene extinction in small populations, assortative
mating, monogamous mating behavior, etc. While eventually one would
choose to do more extensive computations in a language such as
Fortran, the simplicity of writing in SAS makes it possible to explore
uncharted territory much more quickly and easily.

ACKNOWLEDGEMENTS

The author is indebted to Dr. T. Reich for much information regarding
the important questions in population genetics, especially in relation
to disease. This work was performed with the support of USPHS Grant
MH-31302.

REFERENCES

[1] Lewontin, R.C. (1974) Population genetics. Annual Review of
    Genetics 7, 1-17.
[2] Lewontin, R.C. and Franklin, I. (1970) Is the gene the unit of
    selection? Genetics 65, 707-734.
[3] Slatkin, M. (1972) On treating the chromosome as the unit of
    selection. Genetics 67, 157-168.
[4] Karlin, S. and Carmelli, D. (1978) Evolutionary aspects and
    sensitivity studies of some major gene models. J. Theor. Biol. 75,
    197-222.
[5] Kimura, M. and Crow, J.F. (1978) Effect of overall phenotypic
    selection on genetic change at individual loci. Proc. Natl. Acad.
    Sci. USA 75, 6168-6171.
[6] Crow, J.F. and Kimura, M. (1979) Efficiency of truncation
    selection. Proc. Natl. Acad. Sci. USA 76, 396-399.