Download SUPPLEMENTAL METHODS AND RESULTS METHODS Sampling

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Tag SNP wikipedia , lookup

Human genetic variation wikipedia , lookup

Microevolution wikipedia , lookup

Population genetics wikipedia , lookup

Transcript
SUPPLEMENTAL METHODS AND RESULTS
METHODS
Sampling and DNA extraction
During two previous studies [1,2] we sampled individual white-footed mice between
2009 and 2013 from 23 separate localities that were used to generate the genomic data used in
this study. Sites were chosen to represent a rural to urban gradient (Fig. 1). Rural sites were
defined as large tracts of relatively undisturbed natural habitat, and urban sites were fragmented
habitat surrounded by urban infrastructure and impervious surface. Urbanization was also
quantified by determining the percent impervious surface and human population size in a 2 km
buffer surrounding each park (See Table 1 and Figure 1 in [2]). For all sampling locations, we
trapped individuals over a period of 1-3 nights each. At each site, we set between one and four
7x7 m transects of Sherman live traps (7.62 cm x 7.62 cm x 22.86 cm), depending on the total
area of each sampling site. We weighed, sexed, and took morphological (ear length, tail length,
hind foot length, total body length) measurements for all individual mice. At all sites except
Central Park, Flushing Meadow, New York Botanical Garden, Brook Haven Park & Wild Wood
Park, High Point Park, and Clarence Fahnestock Park, we collected tissue by taking 1 cm tail
clips, placing in 80% ethanol, and storing at -20º C in the laboratory. Individual mice were then
released. For these other six sites, we used previously-collected liver samples stored in
RNAlater (Ambion Inc., Austin, TX) at -80º C. We extracted genomic DNA using standard
extraction protocols, quantified the yield, and checked quality before genomic sequencing library
preparation. See methods in (Munshi-South et al. 2016) for full details. All animal handling
procedures were approved by the Institutional Animal Care and Use Committee at Brooklyn
College, CUNY (Protocol Nos. 247 and 266) and by Fordham University’s Institutional Animal
Care and Use Committee (Protocol No. JMS-13-03).
RAD sequencing and SNP calling
We initially sequenced 233 individual white-footed mice but retained 191 P. leucopus
individuals from 23 sampling sites for the genome-wide SNP dataset after filtering out close
relatives and low-quality samples [2]. Briefly, we followed standard protocols for ddRADseq
presented in Peterson et al. (2012), starting with DNA extraction using Qiagen DNEasy kits with
an RNAse treatment. Next we used a combination of the enzymes, SphI-HF and MluCI to
generate similarly sized DNA fragments. Using two restriction enzymes increases the
probability of generating the same RAD loci across all samples. We specifically chose SphI-HF
and MluCI to generate ~50,000 RAD loci; this estimate was based on the number of fragments
produced using the same REs on a related species (Mus musculus). We cleaned the digested
DNA with Ampure XP beads and then ligated unique barcodes to each individual sample. We
then pooled samples in groups of 48 and used a Pippin Prep for precise DNA fragment size
excision from gels and Phusion high-fidelity PCR to amplify fragments and add Illumina indexes
and sequencing primers. The resulting fragments were sent to the NYU Center for Genomics
and Systems Biology for 2x100 bp paired-end sequencing in three lanes of an Illumina HiSeq
2000. We checked initial quality of the raw reads using FastQC. Subsequent primer removal,
low-quality nucleotide trimming, and de novo SNP calling was conducted using the Stacks 1.21
pipeline [4]. We called and filtered SNPs in Stacks using default setting except for requiring that
loci occur in 22 / 23 sampling sites, and within each site, occur in at least 50% of individuals.
We chose a random SNP from each RAD tag to avoid linkage between loci. Additionally, we
removed individuals if they had too few reads resulting in extremely small SNP datasets or if
they showed high levels of relatedness to other white-footed mice sampled. These filters
resulted in 14,990 SNPs in the final dataset that we used for demographic modeling.
Population structure and migration
We investigated observed patterns of genetic diversity to define evolutionary clusters that
could be used to inform demographic modeling of P. leucopus populations in the NYC region.
We examined population structure and evidence of migration among all 23 sampling sites. The
program TreeMix [5] was used to build population trees and identify migration events. TreeMix
infers populations splitting and mixing using allele frequencies from large genomic datasets.
Using a composite likelihood approach given allele frequency data, TreeMix returns the most
likely population tree and admixture events given a user-specified number of admixture events.
The number of admixture events tested ranged from 0 - 12 while the rest of the parameters used
default settings. P-values were generated for each admixture event and comparisons made
between all trees. We confirmed admixture between populations by running f3 three-population
analyses in Treemix. These statistics assess admixture between populations by identifying
correlations between allele frequencies that do not fit the evolutionary history for that group of
three populations. We used 500 bootstrap replicates to assess significance of f3 statistics.
We also used sNMF version 0.5 [6] to examine population structure sNMF explores
patterns of genetic structure by assigning individual ancestry coefficients using sparse nonnegative matrix factorization. sNMF does not make any model assumptions like requiring
populations to be in Hardy-Weinberg and linkage equilibrium [6], as opposed to other likelihood
models like STRUCTURE [7]. For the number of putative ancestral populations tested, we
chose a range from K = 1 to K = 11 using default parameters, with 10 replicate runs for each
value of K. We chose 11 as an initial maximum because there were at least nine urban parks
without any vegetated corridors between them (parks in Queens, NY have a small greenway
connecting them) plus rural LI and rural Mainland. There was not a pattern supporting a higher
number of clusters so we did not analyze K > 11. We ran sNMF on the full 14,990 SNP dataset
(≤ 50% of SNPs missing per population) and on a more conservative dataset with only ≤ 15% of
SNPs missing per population. sNMF imputes missing genotypes by resampling from the
empirical frequency at each locus [6], and using fewer missing data ensured any inferred
population structure was not due to incorrectly imputed genotypes (Fig. S1). To infer the most
likely number of ancestral populations, each model run generates a cross-entropy estimation
based on ancestry assignment error when using masked genotypes. The model with the smallest
cross-entropy score implies it is the best prediction of the true number of K ancestral populations
[6].
Demographic inference from genome-wide site frequency spectra
To reduce model complexity for demographic inference, we grouped individuals into the
minimum number of populations representing unique evolutionary clusters. Global analyses in
TreeMix and sNMF showed the highest support for two populations split by the East River, and
hierarchical analyses using discriminate analysis of principal components [2] showed support for
isolated urban populations. Collectively, results suggested a minimum of seven putative
populations captured most of the genetic variation between populations (Mainland & Manhattan:
MM, Long Island: LI, Central Park: CP, Van Cortlandt Park: VC, Inwood Hill Park: IP, Jamaica
Bay: JB, Fort Tilden: FT, Fig. 1). Along with hierarchical population structure results, we chose
several of the urban populations to include in demographic modeling based on the size of the
park, the relative isolation of the park due to urbanization, and the population density of whitefooted mice in the park. We generated the multi-population site frequency spectrum (MSFS) for
subsets of populations to test specific demographic history scenarios. We used the
dadi.Spectrum.from_data_dict command implemented in dadi [8] to generate the MSFS. When
we created the SNP dataset, we required a SNP to occur in ≥ 50% of individuals from each
population, so the MSFS was down-projected to 50% to ensure the same number of individuals
for all loci [8]. Once the MSFSs were generated, we used the software program fastsimcoal2 ver
2.5.1 [9] for demographic inference. Fastsimcoal2 (fsc2) uses a composite multinomial
likelihood approach to infer demographic histories from the site frequency spectrum generated
from genomic scale SNP datasets. The expected SFS under user defined demographic scenarios
is obtained using coalescent simulations. Fastsimcoal2 ver. 2.5.1 contained a bug that
miscalculated Max Observed Likelihood values if the SFS contained non-integers, leading to
Maximum Estimated Likelihoods that are higher than Max Observed Likelihoods. Our data did
not display this symptom and when point estimates were re-run using the latest version (fsc25),
there was no impact on the values used here and on the final conclusions presented in the main
text.
We tested demographic histories under a scenario of population isolation with migration
(IM). We compared inferred parameters between six hierarchical IM models (Fig. S3) using the
same dataset. There was one two-population IM model (seven free parameters) to test older
divergence patterns between MM and LI suggested from the geologic record. The remaining
five models were three-population IM models (15 free parameters each) testing for recent urban
population divergence. We chose to run separate models investigating each urban population
separately in order to avoid inconsistencies from over-parameterization. For these remaining
models we considered an ancestral population that split at time Tdiv1 and then an urban population
that split more recently at time Tdiv2. For Tdiv1 we included a range of divergence times based on
the LGM of the Wisconsin glacier, ~18,000 ybp. For Tdiv2 we considered divergence times
incorporating the timeframe of urbanization in NYC, ~400 ybp. We allowed for migration
between all populations, and tested occurrences of population bottlenecks when urban isolation
was incorporated into the model. During likelihood calculation, a conditional maximization
algorithm (ECM) is used to maximize the likelihood of each parameter while keeping the others
stabilized. This ECM procedure runs through 40 cycles where each composite-likelihood was
calculated using 100,000 coalescent simulations. While increasing the number of simulations
can increase precision, accuracy does not significantly increase past 100,000 simulations [9].
Additionally, in order to avoid likelihood estimates that oversample parameter values at local
maxima across the composite likelihood surface, we ran 50 replicates with each starting from
different initial conditions. We chose the replicate with the highest estimated maximum
likelihood score for each model. Using parametric bootstrapping, we generated confidence
intervals for the most likely inferred demographic parameters generated. The SFS was simulated
with the parameter values from the highest likelihood model and then new parameter values reestimated from the simulated SFS. We ran 100 parametric bootstraps. To find consistent signals
of divergence that could be attributed to urbanization, we compared parameter values and
overlapping confidence intervals between models.
DEMOGRAPHIC INFERENCE
Parameters were allowed to vary in demographic modeling using fastsimcoal2, but all six
models converged on similar parameter values estimated from the observed MSFS. Parameter
estimates with the highest likelihood generally fell within the upper and lower bounds generated
from parametric bootstrapping (Fig. S2, Table 1). The first two-population model tested
divergence time, effective population size, and migration rates between MM and LI populations
(LI_MM, Fig. S3). The divergence time for the MM and LI split was inferred to be 13,599 ybp
and the effective density of white-footed mice in MM (NE / size, 0.069 mice/ha) 23x larger than
LI (NE / size, 0.003 mice/ha) (Table 1). Divergence times are based on a generation time of 0.5
years for Peromyscus leucopus. Migration was also inferred to be low (< 1 individual per
generation) between MM and LI (Table 1).
The inferred demography for the more complex three-population models generally
supported results from the two-population model. The first two complex models both estimated
the divergence between MM and LI, but one model tested for divergence of JB and LI after the
MM and LI split (LI_JB_MM, Fig. S3) while the other model tested divergence between FT and
LI after the MM and LI split (LI_FT_MM, Fig. S3). This model also tested the likelihood of a
bottleneck event when FT and JB, both urban populations, diverged. We set up the other three
complex models in an identical fashion, except we tested the urban populations of CP
(LI_MM_CP, Fig. S3), VC (LI_MM_VC, Fig. S3), or IP (LI_MM_IP, Fig. S3) for divergence
from MM after the MM and LI split. Point estimates for demographic parameters converged on
similar values and generally fell within the 95% confidence limits from parametric bootstrapping
(Fig. S2, Table 1). The average divergence time for MM and LI was 14,679 ybp, SD = 956.19.
Similar to the two-population model, the MM NE was larger than the LI NE. While the effective
population sizes for the urban parks were smaller than NE for MM or LI, the urban populations
contained much higher effective population densities (CP: 28.7 mice / ha, IP: 79.21 mice / ha,
JB: 23.8 mice / ha, FT: 24.3 mice / ha, VC: 0.36 mice / ha) than either MM (0.069) or LI (0.003).
The individual urban populations all had NE values 10x smaller than MM, but often similar to LI.
The divergence time for the five tested urban populations, even with variation in number of
generations per year, was consistent with the timeframe of urbanization (mean divergence = 233
ybp; SD = 164.5). Our demographic models proved to be rather robust in returning reasonable
parameter values with consistent convergence to similar values across replicates. Although wide
confidence intervals on many parameters suggest low resolution in inferring parameter values
given the model and data, they are likely a consequence of the complexity of the model given the
number of parameters and wide parameter ranges. The narrow confidence intervals on other
parameters suggest that these inferences reliably capture important aspects of the true
demographic history of white-footed mice in NYC, especially given the often biologically
unrealistic parameter search space (See est files). One limitation in using Radseq data for
demographic analysis is the effect that allelic dropout has on genetic variation. Mutations can
accumulate in RE cut sites causing the loss of loci with potentially higher mutation rates [10].
Thus, ddRADSeq may underestimate genetic diversity [11] and inflate sequence divergence
[12,13]. We addressed this limitation by setting a minimum coverage limit and minimizing
missing data that might be caused by allelic dropout [11]. We also used multiple other methods
including discriminant analysis of principle components, sparse non-negative matrix
factorization, classical population genetics statistics, and composite likelihood for population tree
inference, to confirm our demographic analysis findings and inform our demographic modeling.
REFERENCES
1.
Harris, S. E., O’Neill, R. J. & Munshi-South, J. 2015 Transcriptome resources for the
white-footed mouse (Peromyscus leucopus): new genomic tools for investigating
ecologically divergent urban and rural populations. Mol. Ecol. Resour. 15, 382–394.
(doi:10.1111/1755-0998.12301)
2.
Munshi-South, J., Zolnik, C. P. & Harris, S. E. 2016 Population genomics of the
Anthropocene: urbani zation is negatively associated with genome-wide variation in white
-footed mouse populations. Evol. Appl. , doi:10.1111/eva.12357. (doi:10.1111/eva.12357)
3.
Peterson, B. K., Weber, J. N., Kay, E. H., Fisher, H. S. & Hoekstra, H. E. 2012 Double
Digest RADseq: An Inexpensive Method for De Novo SNP Discovery and Genotyping in
Model and Non-Model Species. PLoS One 7, e37135. (doi:10.1371/journal.pone.0037135)
4.
Catchen, J., Hohenlohe, P. A., Bassham, S., Amores, A. & Cresko, W. A. 2013 Stacks: an
analysis tool set for population genomics. Mol. Ecol. 22, 3124–3140.
(doi:10.1111/mec.12354)
5.
Pickrell, J. & Pritchard, J. 2012 Inference of population splits and mixtures from genomewide allele frequency data. PLoS Genet. 8, e1002967. (doi:10.1371/journal.pgen.1002967)
6.
Frichot, E., Mathieu, F., Trouillon, T., Bouchard, G. & François, O. 2014 Fast and
Efficient Estimation of Individual Ancestry Coefficients. Genetics 4, 973–983.
(doi:10.1534/genetics.113.160572)
7.
Pritchard, J. K., Stephens, M. & Donnelly, P. 2000 Inference of population structure using
multilocus genotype data. Genetics 155, 945–959.
8.
Gutenkunst, R. N., Hernandez, R. D., Williamson, S. H. & Bustamante, C. D. 2009
Inferring the joint demographic history of multiple populations from multidimensional
SNP frequency data. PLoS Genet. 5, e1000695. (doi:10.1371/journal.pgen.1000695)
9.
Excoffier, L., Dupanloup, I., Huerta-Sanchez, E., Sousa, V. C. & Foll, M. 2013 Robust
Demographic Inference from Genomic and SNP Data. PLoS Genet. 9, e1003905.
(doi:10.1371/journal.pgen.1003905)
10.
Gautier, M., Gharbi, K., Cezard, T., Foucaud, J., Kerdelhué, C., Pudlo, P., Cornuet, J. M.
& Estoup, A. 2013 The effect of RAD allele dropout on the estimation of genetic variation
within and between populations. Mol. Ecol. 22, 3165–3178. (doi:10.1111/mec.12089)
11.
Fraser, B. a., Kunstner, A., Reznick, D. N., Dreyer, C. & Weigel, D. 2015 Population
genomics of natural and experimental populations of guppies (Poecilia reticulata). Mol.
Ecol. 24, 389–408. (doi:10.1111/mec.13022)
12.
Arnold, B., Corbett-Detig, R. B., Hartl, D. & Bomblies, K. 2013 RADseq underestimates
diversity and introduces genealogical biases due to nonrandom haplotype sampling. Mol.
Ecol. , n/a–n/a. (doi:10.1111/mec.12276)
13.
Leach, A. D., Chavez, A. S., Jones, L. N., Grummer, J. a., Gottscho, A. D. & Linkem, C.
W. 2015 Phylogenomics of phrynosomatid lizards: Conflicting signals from sequence
capture versus restriction site associated DNA sequencing. Genome Biol. Evol. 7, 706–
719. (doi:10.1093/gbe/evv026)
TPL file and EST file (input files for fastsimcoal2) showing setup for demographic
modeling and parameter size ranges.
#Figure S3
#SFS2LINLI.est
// Priors and rules file
// *********************
[PARAMETERS]
//#isInt? #name #dist.#min #max
//all N are in number of haploid individuals
1 ANCSIZE unif 100 100000 output
1 NPOP1
unif 100 100000 output
1 NPOP2 unif 100 100000 output
0 N1M1
logunif 1e-2 20
hide
0 N2M2
logunif 1e-2 20
hide
1 TDIV
unif 100 40000 output
[RULES]
[COMPLEX PARAMETERS]
0
0
0
0
0
0
2NM1 = 2*N1M1 hide
2NM2 = 2*N2M2 hide
MANGO = NPOP1+NPOP2
hide
RESIZE = ANCSIZE/MANGO output
MIG1 = 2NM1/NPOP1 output
MIG2 = 2NM2/NPOP2 output
#SFS2LINLI.tpl
//Number of population samples (demes)
2
//Population effective sizes (number of genes)
NPOP1
NPOP2
//Samples sizes and samples age
67
126
//Growth rates: negative growth implies population expansion
0
0
//Number of migration matrices : 0 implies no migration between demes
2
//Migration matrix 0
0 MIG1
MIG2 0
//Migration matrix 1
00
00
//historical event: time, source, sink, migrants, new deme size, growth rate, migr mat index 1
historical event
1 historical events
TDIV 0 1 1 RESIZE 0 1
//Number of independent loci [chromosome]
10
//Per chromosome: Number of contiguous linkage Block: a block is a set of contiguous loci
1
//per Block:data type, number of loci, per gen recomb and mut rates
FREQ 1 0 2.0e-8
#SFS3LIFTNLI.est
// Priors and rules file
// *********************
[PARAMETERS]
//#isInt? #name #dist.#min #max
//all N are in number of haploid individuals
1
ANCSIZE
unif 100
100000
1
NPOP1
unif 20
100000
1
NPOP2
unif 20
100000
1
NPOP3
unif 20
100000
0 N1M1
logunif 0.01 20
hide
0 N2M2
logunif 0.01 20
hide
0 N3M3
logunif 0.01 20
hide
0 N4M4
logunif 0.01 20
hide
0 N5M5
logunif 0.01 20
hide
0 N6M6
logunif 0.01 20
hide
1 TISLAND1
unif 4 1000 output
1 TISLAND2
unif 4 1000 output
1 TDIV
unif 4 30000 output bounded
0
RESIZE1
logunif 1e-10 1
output
[RULES]
TISLAND2 > TISLAND1
TISLAND2 < TDIV
NPOP3 > NPOP2
NPOP3 > NPOP1
NPOP2 > NPOP1
output
output
output
output
[COMPLEX PARAMETERS]
0
0
0
0
0
0
0
0
0
0
0
0
0
2NM1 = 2*N1M1 hide
2NM2 = 2*N2M2 hide
2NM3 = 2*N3M3 hide
2NM4 = 2*N4M4 hide
2NM5 = 2*N5M5 hide
2NM6 = 2*N6M6 hide
MIG1 = 2NM1/NPOP1 output
MIG2 = 2NM2/NPOP1 output
MIG3 = 2NM3/NPOP2 output
MIG4 = 2NM4/NPOP2 output
MIG5 = 2NM5/NPOP3 output
MIG6 = 2NM6/NPOP3 output
RESIZE2 = ANCSIZE/NPOP3 output
#SFS3LIFTNLI.tpl
//Number of population samples (demes)
3
//Population effective sizes (number of genes)
NPOP1
NPOP2
NPOP3
//Samples sizes and samples age
4
62
125
//Growth rates: negative growth implies population expansion
0
0
0
//Number of migration matrices : 0 implies no migration between demes
3
//Migration matrix 0
0 MIG1 MIG2
MIG3 0 MIG4
MIG5 MIG6 0
//Migration matrix 1
000
0 0 MIG4
0 MIG6 0
//Migration matrix 2
000
000
000
//historical event: time, source, sink, migrants, new deme size, new growth rate, migration matrix
index
3 historical events
TISLAND1 0 0 0 RESIZE1 0 0
TISLAND2 0 1 1 1 0 1
TDIV 2 1 1 RESIZE2 0 2
//Number of independent loci [chromosome]
10
//Per chromosome: Number of contiguous linkage Block: a block is a set of contiguous loci
1
//per Block:data type, number of loci, per gen recomb and mut rates
FREQ 1 0 2.0e-8
#SFS3LIJBNLI.est
// Priors and rules file
// *********************
[PARAMETERS]
//#isInt? #name #dist.#min #max
//all N are in number of haploid individuals
1
ANCSIZE
unif 100
100000
1
NPOP1
unif 20
100000
1
NPOP2
unif 20
100000
1
NPOP3
unif 20
100000
0 N1M1
logunif 0.01 20
hide
0 N2M2
logunif 0.01 20
hide
0 N3M3
logunif 0.01 20
hide
0 N4M4
logunif 0.01 20
hide
0 N5M5
logunif 0.01 20
hide
0 N6M6
logunif 0.01 20
hide
1 TISLAND1
unif 4 1000 output
1 TISLAND2
unif 4 1000 output
1 TDIV
unif 4 30000 output bounded
0
RESIZE1
logunif 1e-10 1
output
[RULES]
TISLAND2 > TISLAND1
TISLAND2 < TDIV
NPOP3 > NPOP2
NPOP3 > NPOP1
NPOP2 > NPOP1
output
output
output
output
[COMPLEX PARAMETERS]
0
0
0
0
0
0
0
0
0
0
0
0
0
2NM1 = 2*N1M1 hide
2NM2 = 2*N2M2 hide
2NM3 = 2*N3M3 hide
2NM4 = 2*N4M4 hide
2NM5 = 2*N5M5 hide
2NM6 = 2*N6M6 hide
MIG1 = 2NM1/NPOP1 output
MIG2 = 2NM2/NPOP1 output
MIG3 = 2NM3/NPOP2 output
MIG4 = 2NM4/NPOP2 output
MIG5 = 2NM5/NPOP3 output
MIG6 = 2NM6/NPOP3 output
RESIZE2 = ANCSIZE/NPOP3 output
#SFS3LIJBNLI.tpl
//Number of population samples (demes)
3
//Population effective sizes (number of genes)
NPOP1
NPOP2
NPOP3
//Samples sizes and samples age
5
61
125
//Growth rates: negative growth implies population expansion
0
0
0
//Number of migration matrices : 0 implies no migration between demes
3
//Migration matrix 0
0 MIG1 MIG2
MIG3 0 MIG4
MIG5 MIG6 0
//Migration matrix 1
000
0 0 MIG4
0 MIG6 0
//Migration matrix 2
000
000
000
//historical event: time, source, sink, migrants, new deme size, new growth rate, migration matrix
index
3 historical events
TISLAND1 0 0 0 RESIZE1 0 0
TISLAND2 0 1 1 1 0 1
TDIV 2 1 1 RESIZE2 0 2
//Number of independent loci [chromosome]
10
//Per chromosome: Number of contiguous linkage Block: a block is a set of contiguous loci
1
//per Block:data type, number of loci, per gen recomb and mut rates
FREQ 1 0 2.0e-8
#SFS3LINLICP.est
// Priors and rules file
// *********************
[PARAMETERS]
//#isInt? #name #dist.#min #max
//all N are in number of haploid individuals
1
ANCSIZE
unif 100
100000
1
NPOP1
unif 20
100000
1
NPOP2
unif 20
100000
1
NPOP3
unif 20
100000
0 N1M1
logunif 0.01 20
hide
0 N2M2
logunif 0.01 20
hide
0 N3M3
logunif 0.01 20
hide
0 N4M4
logunif 0.01 20
hide
0 N5M5
logunif 0.01 20
hide
0 N6M6
logunif 0.01 20
hide
1 TISLAND1
unif 4 1000 output
1 TISLAND2
unif 4 1000 output
1 TDIV
unif 4 30000 output bounded
0
RESIZE1
logunif 1e-10 1
output
[RULES]
TISLAND2 > TISLAND1
TISLAND2 < TDIV
NPOP3 > NPOP2
NPOP3 > NPOP1
NPOP2 > NPOP1
output
output
output
output
[COMPLEX PARAMETERS]
0
0
0
0
0
0
0
0
0
0
0
0
0
2NM1 = 2*N1M1 hide
2NM2 = 2*N2M2 hide
2NM3 = 2*N3M3 hide
2NM4 = 2*N4M4 hide
2NM5 = 2*N5M5 hide
2NM6 = 2*N6M6 hide
MIG1 = 2NM1/NPOP1 output
MIG2 = 2NM2/NPOP1 output
MIG3 = 2NM3/NPOP2 output
MIG4 = 2NM4/NPOP2 output
MIG5 = 2NM5/NPOP3 output
MIG6 = 2NM6/NPOP3 output
RESIZE2 = ANCSIZE/NPOP3 output
#SFS3LINLICP.tpl
//Number of population samples (demes)
3
//Population effective sizes (number of genes)
NPOP1
NPOP2
NPOP3
//Samples sizes and samples age
9
66
116
//Growth rates: negative growth implies population expansion
0
0
0
//Number of migration matrices : 0 implies no migration between demes
3
//Migration matrix 0
0 MIG1 MIG2
MIG3 0 MIG4
MIG5 MIG6 0
//Migration matrix 1
000
0 0 MIG4
0 MIG6 0
//Migration matrix 2
000
000
000
//historical event: time, source, sink, migrants, new deme size, new growth rate, migration matrix
index
3 historical events
TISLAND1 0 0 0 RESIZE1 0 0
TISLAND2 0 2 1 1 0 1
TDIV 2 1 1 RESIZE2 0 2
//Number of independent loci [chromosome]
10
//Per chromosome: Number of contiguous linkage Block: a block is a set of contiguous loci
1
//per Block:data type, number of loci, per gen recomb and mut rates
FREQ 1 0 2.0e-8
#SFS3LINLIIP.est
// Priors and rules file
// *********************
[PARAMETERS]
//#isInt? #name #dist.#min #max
//all N are in number of haploid individuals
1
ANCSIZE
unif 100
100000
1
NPOP1
unif 20
100000
1
NPOP2
unif 20
100000
1
NPOP3
unif 20
100000
0 N1M1
logunif 0.01 20
hide
0 N2M2
logunif 0.01 20
hide
0 N3M3
logunif 0.01 20
hide
0 N4M4
logunif 0.01 20
hide
0 N5M5
logunif 0.01 20
hide
0 N6M6
logunif 0.01 20
hide
1 TISLAND1
unif 4 1000 output
1 TISLAND2
unif 4 1000 output
1 TDIV
unif 4 30000 output bounded
0
RESIZE1
logunif 1e-10 1
output
[RULES]
TISLAND2 > TISLAND1
TISLAND2 < TDIV
NPOP3 > NPOP2
NPOP3 > NPOP1
NPOP2 > NPOP1
output
output
output
output
[COMPLEX PARAMETERS]
0
0
0
0
0
0
0
0
0
0
0
0
0
2NM1 = 2*N1M1 hide
2NM2 = 2*N2M2 hide
2NM3 = 2*N3M3 hide
2NM4 = 2*N4M4 hide
2NM5 = 2*N5M5 hide
2NM6 = 2*N6M6 hide
MIG1 = 2NM1/NPOP1 output
MIG2 = 2NM2/NPOP1 output
MIG3 = 2NM3/NPOP2 output
MIG4 = 2NM4/NPOP2 output
MIG5 = 2NM5/NPOP3 output
MIG6 = 2NM6/NPOP3 output
RESIZE2 = ANCSIZE/NPOP3 output
#SFS3LINLIIP.tpl
//Number of population samples (demes)
3
//Population effective sizes (number of genes)
NPOP1
NPOP2
NPOP3
//Samples sizes and samples age
6
66
119
//Growth rates: negative growth implies population expansion
0
0
0
//Number of migration matrices : 0 implies no migration between demes
3
//Migration matrix 0
0 MIG1 MIG2
MIG3 0 MIG4
MIG5 MIG6 0
//Migration matrix 1
000
0 0 MIG4
0 MIG6 0
//Migration matrix 2
000
000
000
//historical event: time, source, sink, migrants, new deme size, new growth rate, migration matrix
index
3 historical events
TISLAND1 0 0 0 RESIZE1 0 0
TISLAND2 0 2 1 1 0 1
TDIV 2 1 1 RESIZE2 0 2
//Number of independent loci [chromosome]
10
//Per chromosome: Number of contiguous linkage Block: a block is a set of contiguous loci
1
//per Block:data type, number of loci, per gen recomb and mut rates
FREQ 1 0 2.0e-8
#SFS3LINLIVC.est
// Priors and rules file
// *********************
[PARAMETERS]
//#isInt? #name #dist.#min #max
//all N are in number of haploid individuals
1
ANCSIZE
unif 100
100000
1
NPOP1
unif 20
100000
1
NPOP2
unif 20
100000
1
NPOP3
unif 20
100000
0 N1M1
logunif 0.01 20
hide
0 N2M2
logunif 0.01 20
hide
0 N3M3
logunif 0.01 20
hide
0 N4M4
logunif 0.01 20
hide
0 N5M5
logunif 0.01 20
hide
0 N6M6
logunif 0.01 20
hide
1 TISLAND1
unif 4 1000 output
1 TISLAND2
unif 4 1000 output
1 TDIV
unif 4 30000 output bounded
0
RESIZE1
logunif 1e-10 1
output
[RULES]
TISLAND2 > TISLAND1
TISLAND2 < TDIV
NPOP3 > NPOP2
NPOP3 > NPOP1
NPOP2 > NPOP1
output
output
output
output
[COMPLEX PARAMETERS]
0
0
0
0
0
0
0
0
0
0
0
0
0
2NM1 = 2*N1M1 hide
2NM2 = 2*N2M2 hide
2NM3 = 2*N3M3 hide
2NM4 = 2*N4M4 hide
2NM5 = 2*N5M5 hide
2NM6 = 2*N6M6 hide
MIG1 = 2NM1/NPOP1 output
MIG2 = 2NM2/NPOP1 output
MIG3 = 2NM3/NPOP2 output
MIG4 = 2NM4/NPOP2 output
MIG5 = 2NM5/NPOP3 output
MIG6 = 2NM6/NPOP3 output
RESIZE2 = ANCSIZE/NPOP3 output
#SFS3LINLIVC.tpl
//Number of population samples (demes)
3
//Population effective sizes (number of genes)
NPOP1
NPOP2
NPOP3
//Samples sizes and samples age
6
66
119
//Growth rates: negative growth implies population expansion
0
0
0
//Number of migration matrices : 0 implies no migration between demes
3
//Migration matrix 0
0 MIG1 MIG2
MIG3 0 MIG4
MIG5 MIG6 0
//Migration matrix 1
000
0 0 MIG4
0 MIG6 0
//Migration matrix 2
000
000
000
//historical event: time, source, sink, migrants, new deme size, new growth rate, migration matrix
index
3 historical events
TISLAND1 0 0 0 RESIZE1 0 0
TISLAND2 0 2 1 1 0 1
TDIV 2 1 1 RESIZE2 0 2
//Number of independent loci [chromosome]
10
//Per chromosome: Number of contiguous linkage Block: a block is a set of contiguous loci
1
//per Block:data type, number of loci, per gen recomb and mut rates
FREQ 1 0 2.0e-8
Table S1. Sampling localities and full name descriptions
Sampling location code
Full name
AP
Alley Pond & Cunningham
WW
Wildwood N. Shore Long Island
& Brookhaven State Park
FT
Fort Tilden
FM
Flushing Meadows
FP
Forest Park
KP
Kissena Park
RR
Ridgewood Reservoir
NYBG
NY Botanical Garden
VC
Van Cortlandt Park
IP
Inwood Hill Park
SW
Saxon Woods
LCC
Louis Calder Center
MRG
Mianus River Gorge
CFP
Clarence Fahnestock Park
CIE
Cary Institute of Ecosystem
Studies
CPV
Carter Road, Pleasant Valley,
NY
MH
Cabin, Cornwall CT
HIP
High Point State Park
HP
Harriman State Park
MR
Minnewaska SP Reserve
CP
Central Park
JB
Jamaica Bay
PB
Pelham Bay Park
Table S2. Inferred demographic parameters with 95% confidence values from parametric bootstrapping for remaining three
fastsimcoal2 models not displayed in main manuscript.
LI_MM_VC
Parameters
Ancestral Ne
Site(s)
(X)
-
Long island Ne
(LI)
Mainland &
Manhattan Ne
(MM)
Local park Ne
Van,Cortlandt
Time of divergence
(LI_MM)
(LI_MM)
Time of divergence
(Urban_LI or MM)
(VC_MM)
Ancestral resize
factor
(LI_MM)
Urban pop resize
factor
(VC)
Time of urban pop
resize
(VC)
Mig_(X)_to_MM
(LI)
Mig_(X)_to_(LI)
(MM)
LI_MM_IP
(Point Estimate)
(95 % CI)
90482
(71280,312469)
8991
(8180,74287)
16416
(13984,93388)
156
(27,36482)
29669
(17400,29663)
373
(459,8586)
5.5
(2.9,7.0)
0.6
,9
(3.8x10 ,0.66)
182
(251,3071)
2.4x10,6
,7
,5
(7.0x10 ,1.2x10 )
Site(s)
(X)
(LI)
(MM)
Inwood
Hill
(LI_MM)
(IP_MM)
(Point Estimate)
(95 % CI)
73627
(53734,283269)
8723
(6639,70959)
15138
(12371,80804)
6886
(32,41092)
28662
(23600,29647)
462
(545,14379)
4.8
(LI_MM)
(IP)
(IP)
(LI)
1.4x10,6
(7.9x10,7,6.5x10,5)
LI_JB_MM
(2.5,7.2)
4.8x10
,8
(LI)
(MM)
Jamaica
Bay
(LI_MM)
(JB_LI)
(85828,269490)
7354
(9849,74872)
17186
(18832,76281)
6279
(37,54167)
29666
(18312,29636)
327
(667,7749)
6.7
(LI_MM)
,1
455
(326,4058)
2.4x10,6
,6
(6.3x10 ,6.5x10 )
(JB)
(JB)
(LI)
2.1x10,6
(MM)
(Point Estimate)
(95 % CI)
114700
(3.2,7.2)
,5
(2.2x10 ,7.8x10 )
,7
Site(s)
(X)
(7.9x10,7,3.7x10,5)
1.86
,8
(1.8x10 ,0.83)
55
(331,3148)
2.9x10,6
7
(7.6x10 ,1.4x10,5)
1.3x10,6
(MM)
(7.8x10,7,2.9x10,4)
Mig_(X)_to LI
(VC)
Mig_(X)_to_MM
(VC)
Mig_LI_to_(X)
(VC)
Mig_MM_to_(X)
(VC)
0.16
(6.5x10,4,0.39)
0.07
(3.5x10,5,2.3x10,1)
1.4x10,3
(1.5x10,4,3.1x10,3)
5.1x10,3
(7.5x10,5,3.5x10,3)
(IP)
(IP)
(IP)
(IP)
5.1x10,3
(4.4x10,4,0.22)
1.4x10,5
(5.6x10,6,1.9x10,1)
5.4x10,4
(8.6x10,5,2.7x10,3)
5.9x10,3
(5.0x10,5,4.5x10,3)
(JB)
(JB)
(JB)
(JB)
3.8x10,3
(2.5x10,5,1.9x10,1)
1.2x10,3
(1.9x10,4,5.9x10,2)
5.3x10,3
(5.9x10,4,4.8x10,3)
4.4x10,3
(1.7x10,5,2.4x10,3)
Ne = effective population size. Time of divergence is in generations. Migration is reported as the coalescent m, proportion of individuals that move from one
population to another per generation.
0.6
0.4
0.0
0.2
Ancestry K=2
0.8
1.0
A
AP
CN
FM
FP FT JB
LI
0.6
0.4
.2
Ancestry K=2
0.8
1.0
B
KP
RR
WWP
CFP
CIE
CP
CPV HIP
HP
IP
LCC
MH
MM
MR MRG NYBG
PB
SW VC
0.0
AP
CN
FM
FP FT JB
KP
RR
WWP
CFP
CIE
CP
CPV HIP
HP
IP
LI
MH
MR MRG NYBG
PB
MR
PB
SW VC
MM
0.6
0.4
0.0
0.2
Ancestry K=2
0.8
1.0
B
LCC
AP
CN
FM
FP FT JB
KP
RR
WWP
CFP
CIE
CP
CPV
HIP
HP
LI
IP
LCC
MH
MRG NYBG
SW
VC
MM
Figure S1
(A) Population structure analysis of 14,990 SNPs analyzed in sNMF. Vertical bars represent individuals and are assigned to
populations by color. Results are shown for K=2 (highest support). Red = assigned to Long Island. Blue = assigned to Mainland &
Manhattan. Mixed colors represent admixture. (B) sNMF results for K=2 (highest support) of ≤ 15% missing SNP dataset (3,664
SNPs). Red = assigned to Long Island. Blue = assigned to Mainland & Manhattan.
JB
0
1000
40000
FT
E
VC
6000
D
IP
CP
7000
LI_CP
5000
LI_IP
4000
LI_VC
3000
50000
0
2e+04
40000
6e+04
MM NE
4e+04
20000
LI NE
8e+04
60000
1e+05
B
2000
Urban divergence
30000
LI_FT
40000
20000
LI_JB
30000
10000
C
0
Urban pop NE
A
LI
MM_JB
TDIV2_JB
MM_FT
MM_VC
TDIV2_FT
MM_IP
TDIV2_VC
MM_CP
TDIV2_IP
MM
TDIV2_CP
2000
200
1000
10000
0
0
TDIV2_FT
TDIV2_VC
TDIV2_IP
TDIV2_CP
40000
TDIV2_JB
30000
MM & LI divergence
E
CP
IP
VC
20000
FT
10000
JB
F
TDIV1_FT
TDIV1_VC
TDIV1_IP
TDIV1_CP TDIV1_LI_MM
Migration (m)
0
20
40
60
80
100
TDIV1_JB
JB_into_MM
Figure S2
MM_into_JB
FT_into_MM
MM_into_FT
VC_into_MM
MM_into_VC
IP_into_MM
MM_into_IP
CP_into_MM
MM_into_CP
Confidence Intervals from 100 parametric bootstraps for inferred demographic parameters. Red dots represent point estimates from
replicates with the highest likelihood. Results are from all six tested models. (A) Ne of LI inferred from each model. X-axis labeled
with the urban population being tested for divergence. (B) Ne of MM inferred from each model. X-axis labeled with the urban
population being tested for divergence. (C) Ne for each diverged urban population. (D) Time of divergence for urban populations from
LI or MM. (E) Time of divergence for LI and MM inferred from each model. (F) # of migrants per generation from urban populations
into MM and MM into urban populations.
1
A
B
Na
C
Na
TDIV1
TDIV1
M23
N1
N2
M12
M12
M21
MM
N2
M12
TDIV2
N3
M13
N1
M21
M21
M32
TDIV2
2
Na
TDIV1
N1
M31
LI
CP, VC, IP
MM
LI
MM
M13
M13
N2
M23
M32
N3
LI
JB, FT
3
4
5
Figure S3
6
fastsimcoal2 models. MM = Mainland & Manhattan. LI = Long Island. CP = Central Park. VC = Van Cortlandt Park. IP = Inwood
7
Hill Park. JB = Jamaica Bay. FT = Fort Tilden. TDIV = Time of Divergence. N = Effective Population Size. M = migration rate. (A)
8
Model 1 tests divergence between MM and LI. (B) Models 2, 3, and 4 test divergence of CP, VC, and IP from MM after the split. (C)
9
Models 5 and 6 test for divergence of JB and FT from LI after split.