Supplementary figures

Mitochondrial-nuclear interactions maintain geographic separation of deeply diverged mitochondrial lineages in the face of nuclear gene flow

Authors: Hernán E. Morales, Alexandra Pavlova, Nevil Amos, Richard Major, Andrzej Kilian, Chris Greening and Paul Sunnucks

Contents:
Figure S1 Eastern Yellow Robin (EYR) hypothesized evolutionary history.
Figure S2 FST histograms at fine spatial scales.
Figure S3 Principal Component Analysis of genome-wide nuclear variation.
Figure S4 Allelic frequency correlations between north and south transects.
Figure S5 Manhattan plot of FST analyses at fine spatial scales.
Figure S6 Manhattan plot of BayeScEnv analyses.
Figure S7 Manhattan plot of PCAdapt analyses.
Figure S8 Counts of nuclear-encoded genes with mitochondrial functions (N-mt genes).
Figure S9 Candidate genes for mitochondrial-nuclear interactions mapped to a three-dimensional model of OXPHOS complex I.
Figure S10 Shape and location of transects for cline analyses.
Figure S11A PCA analysis of neutral and outlier loci for northern transect.
Figure S11B PCA analysis of neutral and outlier loci for southern transect.
Figure S12 Spatial population structure in the northern transect.
Figure S13 Spatial population structure in the southern transect.
Figure S14A Linkage disequilibrium (LD, r²) decay for chromosomes 1-6.
Figure S14B Linkage disequilibrium (LD, r²) decay for chromosomes 7-14.
Figure S14C Linkage disequilibrium (LD, r²) decay for chromosomes 15-Z.
Figure S15 DArT-tags mapping comparison against two reference genomes.
Figure S16 Correlation of chromosome size and number of SNPs.
Figure S17 Results summary of outlier detection methods for loci that were not mapped to a reference genome.
Figure S18 Populations arbitrarily defined across each transect for the BayeScEnv analyses and the Hardy-Weinberg equilibrium tests.
Figure S19 Summary of results for PCAdapt analyses.
References

Figure S1 Eastern Yellow Robin (EYR) hypothesized evolutionary history. The background colour of the boxes represents the birds’ nuclear genomic background (red = north, blue = south). Inside the bird symbols, the circles represent mitochondrial DNA (white = mito-A, black = mito-B) and the chromosome symbols (crosses) represent nuclear-encoded genes with mitochondrial function (N-mt genes). Matching colours (white-white or black-black) represent functional mitonuclear interactions; mismatching colours (white-black) represent mitonuclear incompatibilities. Each panel represents a stage in EYR evolutionary history. (A) Initial differentiation with gene flow between northern and southern populations, as described in Morales et al. (2017). (B) Two independent events of mitonuclear co-introgression resulting in inland-coastal divergence (shades of blue and shades of red in the third and fourth panels). (C) Selection for mitonuclear interactions despite ongoing nuclear gene flow (shades of red and blue differing horizontally highlight the subtle genome-wide differentiation of each genomic background; see genome-wide PCAs in Fig. 1 of main text).
(D) Mitonuclear divergence in the face of gene flow: selection against the formation of mitonuclear genetic incompatibilities in hybrids (intrinsic barriers); selection for locally adapted mitonuclear interactions to maintain geographic separation at the climatic divide (extrinsic barriers); and/or evolution of genomic architecture for reduced recombination. The processes described by the last two panels likely occurred simultaneously.

Figure S2 FST histograms at fine spatial scales. Inland-coastal FST differentiation between different mitolineage-bearing populations in each transect. This analysis includes only individuals sampled within 40 km of the centre of the contact zone. The insets show the right tails of the distributions (FST > 0.4), where high peaks are barely visible on the plots with all markers.

[Figure S3 panels: Non-outlier loci (59,879 SNPs); Random 1 to Random 5 (565 SNPs each). Axes: PCA 1 and PCA 2.]

Figure S3 Principal Component Analysis of genome-wide nuclear variation. For Fig. 1C in the main text we depicted a PCA with genome-wide variation (all loci). Here we show that the same general pattern is recovered regardless of the number and type of loci used. The top-left panel shows a PCA with non-outlier loci only. The remaining panels show PCAs with five different random sets of non-outlier loci of the same size (565 SNPs) as the outlier analyses reported in the main text.

Figure S4 Allelic frequency correlations between north and south transects. (A) Allelic frequency correlation between north mito-A-bearing and south mito-A-bearing populations. (B) Allelic frequency correlation between north mito-B-bearing and south mito-B-bearing populations. Correlations of outlier loci are significantly higher than expected at random (P<0.001), showing that alleles within mitolineage-bearing populations segregate in the same direction in both nuclear backgrounds.
The genomic random distribution was obtained by calculating allelic frequency correlations of 140 sets of non-outlier loci.

Figure S5 Manhattan plot of FST analyses at fine spatial scales. Nuclear DNA differentiation between mitolineage-bearing populations mapped onto the zebra finch reference genome for individuals located within a 40-km radius of the midpoint of the contact zone between the two mitolineages in each transect. (A) Northern transect; (B) southern transect. The genomic position of each single nucleotide polymorphism (SNP), as mapped to the zebra finch genome, is shown on the x-axis; FST is shown on the y-axis.

Figure S6 Manhattan plot of BayeScEnv analyses. Nuclear DNA differentiation between populations correlated with a binomial variable of mitolineage membership, mapped onto the reference genome for all individuals across each transect. (A) Northern transect; (B) southern transect. The genomic position of each SNP, as mapped to the zebra finch genome, is shown on the x-axis. The -log10 posterior error probability (see Methods in main text), depicted on the y-axis, indicates the probability of each SNP being an outlier, with the False Discovery Rate threshold of 5% shown as a horizontal red line.

Figure S7 Manhattan plot of PCAdapt analyses. Nuclear DNA differentiation along the first axis of differentiation (K = 1, see Methods) for all individuals across each transect. (A) Northern transect; (B) southern transect. The genomic position of each SNP, as mapped to the zebra finch genome, is shown on the x-axis. The -log10 posterior error probability (see Methods in main text), depicted on the y-axis, indicates the probability of each SNP being an outlier, with the False Discovery Rate threshold of 5% shown as a horizontal red line.

Figure S8 Counts of nuclear-encoded genes with mitochondrial functions (N-mt genes).
Histograms of counts of nuclear-encoded genes with mitochondrial function (N-mt genes) in random genomic regions of the same size as the mtDNA-linked island of divergence represent the random expectation. The black arrows indicate the number of N-mt genes found within the chromosome 1A mtDNA-linked island of divergence. Two categories of N-mt genes were counted: (A) N-mt genes (GO term: 0005739) and (B) a subset of N-mt genes, the protein-coding OXPHOS genes (GO term: 0006119). The mtDNA-linked island of divergence on chromosome 1A is significantly enriched for N-mt genes overall (P<0.001), and also for the subset of OXPHOS genes (P<0.01).

Figure S9 Candidate genes for mitochondrial-nuclear interactions mapped to a three-dimensional model of OXPHOS complex I. Mitonuclear interactions in the OXPHOS enzymatic complexes are formed by mitochondrial-encoded genes (mtDNA genes) and nuclear-encoded genes with mitochondrial function (N-mt OXPHOS genes). The underlying structure represents the complete structure of the bovine mitochondrial complex I (Zhu et al. 2016). The modules responsible for NADH oxidation (E module), ubiquinone reduction (Q module), and proton-pumping (ND1, ND2, ND4, and ND5 modules) are represented with differentially shaded backgrounds. Arrows show the routes of electron transfer (thin, dotted, maroon), proton translocation (medium, dashed, navy), and the conformational changes that couple them (thick, solid, grey). EYR N-mt genes in the genomic island of divergence of chromosome 1A (NDUFA6, NDUFA12, NDUFB2; see Fig. 3 of main text) are mapped onto the protein structure (in yellow). EYR mitochondrially-encoded core subunits that show evidence of positive selection between mito-A and mito-B (ND4, ND4L and ND5) are highlighted in purple.
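The random-expectation test described for Figure S8 (counting N-mt genes in random genomic regions of the same size as the island of divergence) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes a single linear chromosome, represents each gene by one coordinate, and uses the common (k + 1)/(n + 1) empirical P-value. The function names, coordinates, and the number of random windows are all hypothetical.

```python
import bisect
import random


def count_genes_in_window(sorted_positions, start, end):
    """Count gene positions inside the half-open window [start, end)."""
    lo = bisect.bisect_left(sorted_positions, start)
    hi = bisect.bisect_left(sorted_positions, end)
    return hi - lo


def enrichment_p_value(gene_positions, chrom_length, window_start,
                       window_size, n_random=1000, seed=0):
    """Compare the gene count in one focal window with counts in random
    windows of the same size (assumption: one linear chromosome).

    Returns (observed_count, null_counts, empirical_p).
    """
    rng = random.Random(seed)
    genes = sorted(gene_positions)
    observed = count_genes_in_window(
        genes, window_start, window_start + window_size)

    # Null distribution: gene counts in uniformly placed random windows.
    null_counts = []
    for _ in range(n_random):
        start = rng.randrange(0, chrom_length - window_size + 1)
        null_counts.append(
            count_genes_in_window(genes, start, start + window_size))

    # One-sided empirical P-value with the +1 correction
    # (a common convention, assumed here rather than taken from the paper).
    n_extreme = sum(1 for c in null_counts if c >= observed)
    p_value = (n_extreme + 1) / (n_random + 1)
    return observed, null_counts, p_value


if __name__ == "__main__":
    # Toy data: a cluster of 10 genes inside the focal window, plus a
    # sparse background. These coordinates are invented for illustration.
    toy_genes = list(range(100, 110)) + [1000, 3000, 5000, 7000, 9000]
    obs, null, p = enrichment_p_value(toy_genes, 10_000, 100, 100)
    print(f"observed = {obs}, empirical P = {p:.4f}")
```

A histogram of `null_counts` with the observed count marked corresponds to the kind of plot the caption describes, with the arrow placed at the observed value.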
Figure S10 Shape and location of transects for cline analyses. For each transect, we projected the location of each individual sample along a unidimensional transect (dashed line) and calculated the distance of each sample to a common geographic point (triangle). The geographic locations from which samples were taken did not form ideal transects running perpendicular to the mitochondrial variation (black = mito-A and red = mito-B), particularly in the southern transect; this is likely the reason why the confidence intervals of nuclear clines were so wide (see Results).

Figure S11A PCA analysis of neutral and outlier loci for northern transect. A PCA was performed using two datasets: (A) 6,947 non-outliers located at least 100,000 bases away from any outliers and (B) 292 outliers located within the chromosome 1A island of differentiation. In each panel, the left side shows a biplot of PCA scores for the first two axes (mitolineages: blue = mito-B, red = mito-A); ellipses capture 80% of the variation for each group. The right side shows the spatial location of individuals, with PCA-axis-1 scores ranging from low negative values (dark blue) through zero (white) to large positive values (dark red). The outline of the squares represents the mitolineage membership of each individual. Individual locations are jittered in the latitudinal direction to allow visualization of overlapping individuals.

Figure S11B PCA analysis of neutral and outlier loci for southern transect. A PCA was performed using two datasets: (A) 6,947 non-outliers located at least 100,000 bases away from any outliers and (B) 292 outliers located within the chromosome 1A island of differentiation. In each panel, the left side shows a biplot of PCA scores for the first two axes (mitolineages: blue = mito-B, red = mito-A); ellipses capture 80% of the variation for each group.
In each panel, the right side shows the spatial location of individuals, with PCA-axis-1 scores ranging from low negative values (dark blue) through zero (white) to large positive values (dark red). The outline of the circles represents the mitolineage membership of each individual. Individual locations are jittered in the longitudinal direction to allow visualization of overlapping individuals.

Figure S12 Spatial population structure in the northern transect. (A) Actual spatial location of each individual across the transect; samples in the remaining panels are jittered along the latitudinal axis to facilitate visualization of overlapping samples. (B) Mitolineage membership for each individual (red = mito-A, blue = mito-B). (C-E) Pie charts show individual probability of assignment to two genetic clusters from the STRUCTURE (Pritchard et al. 2000) analysis; the circle outline shows mitolineage membership (red = mito-A, blue = mito-B). (C) Structure recovered from 12/20 replicates with genome-wide neutral loci (6,947 non-outliers located at least 100,000 bases away from any outliers); (D) structure recovered from 8/20 replicates with genome-wide neutral loci; (E) structure recovered from 20/20 replicates with 292 chromosome 1A outlier loci.

Figure S13 Spatial population structure in the southern transect. (A) Actual spatial location of each individual across the transect; samples in the remaining panels are jittered along the longitudinal axis to facilitate visualization of overlapping samples. (B) Mitolineage membership for each individual (red = mito-A, blue = mito-B). (C-D) Pie charts show individual probability of assignment to two genetic clusters from the STRUCTURE (Pritchard et al. 2000) analysis; the circle outline shows mitolineage membership (red = mito-A, blue = mito-B).
(C) Structure recovered from 20/20 replicates with genome-wide neutral loci (6,947 non-outliers located at least 100,000 bases away from any outliers); (D) structure recovered from 20/20 replicates with 292 chromosome 1A outlier loci.

Figure S14A Linkage disequilibrium (LD, r²) decay for chromosomes 1-6. The plots show the decay of LD (y-axis) with physical distance between pairs of loci relative to their position in the zebra finch genome (x-axis). Black dots indicate observed pairwise LD. Green lines show the expected average decay of LD per chromosome, with confidence intervals in red, estimated with a non-linear regression.

Figure S14B Linkage disequilibrium (LD, r²) decay for chromosomes 7-14. The plots show the decay of LD (y-axis) with physical distance between pairs of loci relative to their position in the zebra finch genome (x-axis). Black dots indicate observed pairwise LD. Green lines show the expected average decay of LD per chromosome, with confidence intervals in red, estimated with a non-linear regression.

Figure S14C Linkage disequilibrium (LD, r²) decay for chromosomes 15-Z. The plots show the decay of LD (y-axis) with physical distance between pairs of loci relative to their position in the zebra finch genome (x-axis). Black dots indicate observed pairwise LD. Green lines show the expected average decay of LD per chromosome, with confidence intervals in red, estimated with a non-linear regression.

Figure S15 DArT-tags mapping comparison against two reference genomes. The plots show the mapping position in base pairs on the zebra finch reference genome (Warren et al. 2010) on the x-axis against the difference in mapping position compared to the collared flycatcher reference genome (Ellegren et al. 2012) on the y-axis. DArT-tags that mapped to identical positions in both reference genomes are located along zero values on the y-axis (e.g. chromosome 6).
DArT-tags that mapped to different positions in the two reference genomes depart from zero on the y-axis (e.g. chromosome Z). Only macro-chromosomes are shown for simplicity.

Figure S16 Correlation of chromosome size and number of SNPs obtained.

Figure S17 Results summary of outlier detection methods for loci that were not mapped to a reference genome. Histograms of distributions of various summary statistics for the different outlier detection methods employed: (A) fine-scale FST estimates (including only individuals sampled within 40 km of the centre of the contact zone), (B) BayeScEnv significance levels and (C) PCAdapt significance levels. The inset in each histogram shows the tail of the distribution. (D) Venn diagrams show the overlap of significant outliers detected between methods (A, B and C) in each transect.

Figure S18 Populations arbitrarily defined across each transect for the BayeScEnv analyses and the Hardy-Weinberg equilibrium tests. Samples were organized into (A) 6 populations in the north transect (25 mito-A individuals, 27 mito-B), and (B) 11 populations in the south transect (71 mito-A, 32 mito-B). Each colour represents a different population. Populations contain at least two individuals that belong to the same mitogroup and are in close proximity.

Figure S19 Summary of results for PCAdapt analyses. (A) Scree plot of the amount of genetic variation explained by the first 76 K clusters (see Methods). (B) PCA of both transects (green = north mito-A-bearing population, red = north mito-B-bearing population, purple = south mito-A-bearing population, blue = south mito-B-bearing population). (C) PCA of north transect (blue = mito-A, red = mito-B). (D) PCA of south transect (red = mito-A, blue = mito-B).

References

Ellegren H, Smeds L, Burri R, et al. (2012) The genomic landscape of species divergence in Ficedula flycatchers. Nature 491, 756-760.
Morales HE, Sunnucks P, Joseph L, Pavlova A (2017) Perpendicular axes of differentiation generated by mitochondrial introgression. Molecular Ecology.
Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155, 945-959.
Warren WC, Clayton DF, Ellegren H, et al. (2010) The genome of a songbird. Nature 464, 757-762.
Zhu J, Vinothkumar KR, Hirst J (2016) Structure of mammalian respiratory complex I. Nature 536, 354-358.