SIMULATION OF POPULATION GENETICS MODELS WITH SAS

Edward L. Spitznagel, Jr.
Washington University

ABSTRACT

One of the most important problems of population genetics is understanding multi-locus natural selection under all the realistic conditions that make it so complicated: truncation selection, linkage, and small numbers of major loci superimposed on a background of multi-locus minor effects. SAS has proved to be an excellent vehicle for exploratory simulations, with successive work data sets representing the generations of an evolving population. The saving of programmer time is substantial, and the SAS program forms an excellent outline for writing production programs in Fortran.

INTRODUCTION

Much of the early work in population genetics has focused on models involving single loci or small numbers of linked loci (usually two) [1]. However, there is mounting evidence for high degrees of polymorphism [1,2], and it has been established that synergistic effects "totally unpredictable from two-locus theory" [1] can occur in such complex systems [2,3]. The situation is further complicated by the fact that many quantitative characters seem to be skewed, multi-modal, or in other ways indicative of one or a few major-effect genes superimposed on a background of multi-locus minor effects [4]. When some form of discontinuous selection function such as truncation selection [5,6] is used, the problem becomes so complex that no exact solution seems possible.

An alternative is computer simulation, and SAS offers a number of advantages in the simulation process. One advantage is that successive generations of a population can be stored in a very natural way, as successive SAS data sets. Another advantage is that the sort and merge capabilities can be used in a variety of ways to model the mating process, with or without assortment for phenotype. Still another advantage is that procedures like MEANS, SUMMARY, FREQ, and CHART can be used to "snapshot" succeeding generations of the evolving population.

METHODOLOGY

The following simple model will be used to illustrate the techniques:

1) "Infinite" population.
2) All involved loci lie on a single chromosome.
3) One major-effect locus, positioned at the end of a string of five minor loci, with equal recombination distance between adjacent loci.
4) All effects additive.
5) Truncation selection, with individuals above threshold exhibiting decreased but non-zero fitness.

Step one is to define the various parameters of the model. This is conveniently done by use of macros. Macro names end in an underscore so that they cannot collide with data-step variables such as the counter N:

    MACRO N_ 4000% *POPULATION SIZE;
    MACRO LA_ .95% *PROBABILITY OF NONCROSSOVER BETWEEN MAJOR LOCUS AND ADJACENT MINOR LOCUS;
    MACRO LB_ .95% *PROBABILITY OF NONCROSSOVER BETWEEN ADJACENT MINOR LOCI;
    MACRO PA_ .05% *PROBABILITY OF "DOMINANT" GENE AT MAJOR LOCUS;
    MACRO PB_ .05% *PROBABILITY OF "DOMINANT" GENE AT MINOR LOCUS;
    MACRO HETA_ 1% *DIFFERENTIAL EFFECT OF HETEROZYGOTE AT MAJOR LOCUS;
    MACRO HETB_ .1% *DIFFERENTIAL EFFECT OF HETEROZYGOTE AT MINOR LOCUS;
    MACRO HOMA_ 2% *DIFFERENTIAL EFFECT OF DOMINANT HOMOZYGOTE AT MAJOR LOCUS;
    MACRO HOMB_ .2% *DIFFERENTIAL EFFECT OF DOMINANT HOMOZYGOTE AT MINOR LOCUS;
    MACRO THRESH_ 2% *THRESHOLD VALUE;
    MACRO AFF_ .8% *PROBABILITY THAT AN INDIVIDUAL ABOVE THRESHOLD WILL NOT REPRODUCE;
    MACRO U_ UNIFORM(0)% *ABBREVIATION FOR UNIFORM RANDOM NUMBER GENERATOR;
    MACRO ENVIRON_ .75*NORMAL(0)% *NORMAL ENVIRONMENTAL CONTRIBUTION TO PHENOTYPE;

Step two is to initialize the population, prior to selection. The most convenient way is to generate gametes, then "mate" them. The code to generate the gametes is as follows:

    DATA GAMETE;
       ARRAY B(I) B1-B5;
       DO N=1 TO 2*N_;
          A = (U_ < PA_);           * MAJOR LOCUS: 1 = "DOMINANT" ALLELE;
          DO I=1 TO 5;
             B = (U_ < PB_);        * MINOR LOCI B1-B5;
          END;
          OUTPUT;
       END;
       KEEP A B1-B5;

Steps three, five, seven, and so on are to produce members of the population subject to selection. We also compute the inherited character, which we will call PHENOTYP. The following code accomplishes these goals:

    DATA MATINGS;
       * MATE GAMETES 1-2, 3-4, ... BY MERGING GAMETE WITH ITSELF SHIFTED BY ONE OBSERVATION;
       MERGE GAMETE(RENAME=(A=Q B1=R1 B2=R2 B3=R3 B4=R4 B5=R5))
             GAMETE(RENAME=(A=V B1=W1 B2=W2 B3=W3 B4=W4 B5=W5) FIRSTOBS=2);
       N+1;
       IF MOD(N,2)=1;
       ARRAY R(I) R1-R5;
       ARRAY W(I) W1-W5;
       PHENOTYP = HETA_*(Q NE V) + HOMA_*(Q & V);
       DO I=1 TO 5;
          PHENOTYP = PHENOTYP + HETB_*(R NE W) + HOMB_*(R & W);
       END;
       KEEP Q V R1-R5 W1-W5 PHENOTYP;

Steps four, six, eight, and so on are to combine the inherited character with a random normal disturbance due to environment, and then to determine whether the individual is "fit" to reproduce. If the individual is determined to be fit, we produce a number of gametes from each such individual, allowing for the possibility of recombination, shuffle them, and select 2 times N of them. At this last stage, one full cycle of the process is completed. The code is as follows:

    DATA GAMETE;
       SET MATINGS;
       ARRAY B(I) B1-B5;
       ARRAY R(I) R1-R5;
       ARRAY W(I) W1-W5;
       TOTAL = PHENOTYP + ENVIRON_;
       LIABLE = (TOTAL > THRESH_);          * ABOVE THE TRUNCATION THRESHOLD;
       AFFECTED = LIABLE * (U_ < AFF_);
       IF NOT AFFECTED;
       DO GAMETES=1 TO 3;
          CHR = (U_ < .5);                  * PARENTAL CHROMOSOME AT THE MAJOR LOCUS;
          A = Q*(CHR=0) + V*(CHR=1);
          CHR = MOD(CHR + (U_ > LA_), 2);   * CROSSOVER BETWEEN A AND B1;
          B1 = R1*(CHR=0) + W1*(CHR=1);
          DO I=2 TO 5;
             CHR = MOD(CHR + (U_ > LB_), 2);
             B = R*(CHR=0) + W*(CHR=1);
          END;
          STIR = U_;                        * RANDOM SORT KEY FOR SHUFFLING;
          OUTPUT;
       END;
       KEEP STIR A B1-B5;
    PROC SORT;
       BY STIR;
    DATA GAMETE;
       SET GAMETE;
       N+1;
       IF N > 2*N_ THEN STOP;
       DROP N;

The last two blocks of code (steps 3 and 4) represent one complete cycle in the selection process and hence must be repeated for as many generations as we wish to study. The easiest way to repeat them is to convert them into a macro, then list the macro name for as many generations as desired.

SAS procedures to "snapshot" the selection process can be inserted at appropriate points. For example, we could insert the line

    PROC CHART; VBAR PHENOTYP;

between the last two blocks of code to study the changing distribution of phenotype under selection pressure. Also, we could insert the line

    PROC FREQ; TABLES A B1-B5;

immediately after the last block of code to observe the gene frequencies in the population.

The techniques illustrated above can be varied and generalized in a number of ways: gene extinction in small populations, assortative mating, monogamous mating behavior, etc. While eventually one would choose to do more extensive computations in a language such as Fortran, the simplicity of writing in SAS makes it possible to explore uncharted territory much more quickly and easily.

ACKNOWLEDGEMENTS

The author is indebted to Dr. T. Reich for much information regarding the important questions in population genetics, especially in relation to disease. This work was performed with the support of USPHS Grant MH-31302.

REFERENCES

[1] Lewontin, R.C. (1974) Population genetics. Annual Review of Genetics 7, 1-17.
[2] Lewontin, R.C. and Franklin, I. (1970) Is the gene the unit of selection? Genetics 65, 707-734.
[3] Slatkin, M. (1972) On treating the chromosome as the unit of selection. Genetics 67, 157-168.
[4] Karlin, S. and Carmelli, D. (1978) Evolutionary aspects and sensitivity studies of some major gene models. J. Theor. Biol. 75, 197-222.
[5] Kimura, M. and Crow, J.F. (1978) Effect of overall phenotypic selection on genetic change at individual loci. Proc. Natl. Acad. Sci. USA 75, 6168-6171.
[6] Crow, J.F. and Kimura, M. (1979) Efficiency of truncation selection. Proc. Natl. Acad. Sci. USA 76, 396-399.
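The truncation-selection model (assumptions 4 and 5) can be summarized by the probability w(g) that an individual with inherited value g (PHENOTYP, computed in the mating step) reproduces. This derivation is added for illustration and is not a formula from the paper. It assumes, as the selection code implies, three things:

- An individual is above threshold when its inherited value plus the environmental term exceeds the threshold value, 2.
- The environmental term is 0.75 times a standard normal deviate.
- Individuals above threshold fail to reproduce with probability .8.

Φ denotes the standard normal distribution function.

```latex
w(g) = 1 - 0.8\,\Pr\left(g + 0.75\,Z > 2\right)
     = 1 - 0.8\left[1 - \Phi\!\left(\frac{2 - g}{0.75}\right)\right],
\qquad Z \sim N(0, 1)
```

With these values, g = 0, 1, 2 give w of roughly 0.997, 0.927, and 0.6. Because w is a nonlinear function of g, the fitness effect of a minor-locus allele (.1 or .2) depends on the genetic background it sits in. This is part of why such models resist exact analysis.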
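To check the initialization step (step two, gamete generation) outside SAS, here is a minimal Python sketch. It is not the author's program. The function name `make_gametes` is hypothetical, and Python's `random` module stands in for the UNIFORM generator. Each gamete is a tuple of 0/1 alleles at the major locus A followed by B1 to B5, drawn independently with the step-one probabilities (.05 at every locus).

```python
import random

# Values taken from the step-one parameter list.
N = 4000    # population size (individuals); 2*N gametes are generated
PA = 0.05   # probability of the "dominant" allele at the major locus A
PB = 0.05   # probability of the "dominant" allele at each minor locus B1-B5


def make_gametes(n_gametes, pa=PA, pb=PB, rng=random):
    """Return n_gametes gametes, each a tuple (A, B1, B2, B3, B4, B5) of 0/1 alleles.

    Like the DATA GAMETE step, every locus is an independent Bernoulli draw,
    so the starting population is in linkage equilibrium in expectation.
    """
    gametes = []
    for _ in range(n_gametes):
        a = int(rng.random() < pa)                                # major locus
        minors = tuple(int(rng.random() < pb) for _ in range(5))  # B1-B5
        gametes.append((a,) + minors)
    return gametes


if __name__ == "__main__":
    rng = random.Random(1979)  # fixed seed so runs are repeatable
    initial = make_gametes(2 * N, rng=rng)
    freq_a = sum(g[0] for g in initial) / len(initial)
    print(len(initial), round(freq_a, 3))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which plays the role of the seed argument to UNIFORM.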
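The mating step (steps three, five, seven, ...) pairs consecutive gametes and adds up the additive effects. A Python sketch of the same logic follows, assuming gametes are 0/1 tuples ordered A, B1 to B5. The names `genotypic_value` and `mate` are hypothetical. Slicing the list into even and odd positions replaces the self-MERGE with FIRSTOBS=2 and the MOD(N,2)=1 filter. In this sketch an odd leftover gamete is simply dropped.

```python
# Effect sizes taken from the step-one parameter list.
HETA, HOMA = 1.0, 2.0   # major locus: heterozygote, "dominant" homozygote
HETB, HOMB = 0.1, 0.2   # each minor locus: heterozygote, "dominant" homozygote


def genotypic_value(g1, g2):
    """Additive inherited value (PHENOTYP) of the zygote formed from gametes g1 and g2.

    Locus 0 is the major locus A; loci 1-5 are B1-B5. A heterozygote scores
    the HET effect and a homozygote for the "dominant" allele scores the HOM
    effect at that locus; a recessive homozygote scores 0.
    """
    total = 0.0
    for locus, (x, y) in enumerate(zip(g1, g2)):
        het, hom = (HETA, HOMA) if locus == 0 else (HETB, HOMB)
        total += het * (x != y) + hom * (x == 1 and y == 1)
    return total


def mate(gametes):
    """Pair gametes 1-2, 3-4, ... and return (gamete1, gamete2, value) triples."""
    return [(g1, g2, genotypic_value(g1, g2))
            for g1, g2 in zip(gametes[0::2], gametes[1::2])]


if __name__ == "__main__":
    pop = mate([(1, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 0),
                (1, 1, 0, 0, 0, 0), (1, 1, 0, 0, 0, 0)])
    print([round(v, 2) for _, _, v in pop])  # prints [1.0, 2.2]
```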
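The selection and recombination step (steps four, six, eight, ...) is the most intricate part of the cycle. Below is a Python sketch under the same assumptions as the SAS code: a normal environmental term with standard deviation .75, a threshold of 2, and three gametes per surviving parent. It also assumes crossover with probability .05 in each interval along the chromosome A, B1, ..., B5. The names `recombinant_gamete` and `next_generation` are hypothetical. Shuffling the list stands in for sorting on the random key STIR; both yield a random ordering.

```python
import random

# Values taken from the step-one parameter list.
LA = 0.95        # P(no crossover) between the major locus A and B1
LB = 0.95        # P(no crossover) between adjacent minor loci
THRESH = 2.0     # truncation threshold on total phenotype
AFF = 0.8        # P(an individual above threshold does not reproduce)
ENV_SD = 0.75    # environmental term is .75 times a standard normal deviate
GAMETES_PER_PARENT = 3


def recombinant_gamete(g1, g2, rng):
    """Walk along A, B1..B5, switching parental strand whenever a crossover occurs."""
    parents = (g1, g2)
    strand = int(rng.random() < 0.5)     # CHR: start on a random parental chromosome
    out = [parents[strand][0]]           # allele at the major locus
    for locus in range(1, 6):
        p_no_cross = LA if locus == 1 else LB
        if rng.random() > p_no_cross:    # crossover in the interval before this locus
            strand = 1 - strand
        out.append(parents[strand][locus])
    return tuple(out)


def next_generation(matings, n, rng, aff=AFF):
    """Select among (g1, g2, value) triples and return up to 2*n shuffled gametes."""
    pool = []
    for g1, g2, value in matings:
        total = value + rng.gauss(0.0, ENV_SD)
        affected = total > THRESH and rng.random() < aff
        if affected:
            continue                     # counterpart of IF NOT AFFECTED;
        for _ in range(GAMETES_PER_PARENT):
            pool.append(recombinant_gamete(g1, g2, rng))
    rng.shuffle(pool)                    # stands in for PROC SORT BY STIR
    return pool[:2 * n]                  # shorter than 2*n if fewer than 2*n/3 parents survive
```

As in the SAS version, the population size stays at N only if at least 2N/3 individuals escape selection each generation. Otherwise the gamete pool, and hence the next generation, shrinks.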
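The gene-frequency "snapshot" that PROC FREQ gives after each cycle can be approximated in Python by counting the allele coded 1 at each locus. This is a minimal sketch: the function name `allele_frequencies` is hypothetical, and it reports only the frequency of the "dominant" allele rather than a full frequency table.

```python
LOCI = ("A", "B1", "B2", "B3", "B4", "B5")


def allele_frequencies(gametes):
    """Frequency of the "dominant" allele (coded 1) at each locus in a list of gametes."""
    if not gametes:
        raise ValueError("empty gamete pool: the population has died out")
    n = len(gametes)
    return {name: sum(g[i] for g in gametes) / n for i, name in enumerate(LOCI)}


if __name__ == "__main__":
    print(allele_frequencies([(1, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 1)]))
    # prints {'A': 0.5, 'B1': 0.0, 'B2': 0.0, 'B3': 0.0, 'B4': 0.0, 'B5': 0.5}
```

Calling it once per generation on the gamete list shows the major-locus allele frequency changing under selection, which is the same trajectory the PROC FREQ snapshot reveals.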