Nature Genetics: doi:10.1038/ng.3304
Supplementary Figure 1 Distribution of sequencing coverage in the WGS500 project. Left, plots of the cumulative distribution, for each WGS500 sample, of coverage across the genome (top left) or exome (bottom left). Top right, a comparison of coverage between WGS500 samples (blue) and exomes (black) sequenced at the Oxford Biomedical Research Centre. Thicker lines are the medians across samples; dotted vertical lines are the global medians. Bottom right, the distribution, for each WGS500 sample, of the ratio of the number of reads with the alternate allele (ALT) to the total number of reads (TOTAL), for novel variants. We expect the mean to be 0.5. Individuals with mean <0.4 are shown with colored lines. These are likely to have sample contamination, which leads to a larger number of heterozygous calls for which there are few ALT reads. The sample HCM_2361 was removed from further analysis.

Supplementary Figure 2 Influence of coverage on concordance between sequence data and SNP arrays for multiple samples. Left, genotype concordance as a function of sequencing depth; note that concordance drops progressively when coverage drops below 15×. 95% confidence intervals, calculated by the Wald method, are indicated. Right, fraction of sites with a given level of coverage. Note that samples with higher coverage (e.g., LVNC_1.1.70, LVNC_1.2.83) have fewer SNPs in the lower-coverage bins, and the genotype concordance estimate therefore has larger confidence intervals.

Supplementary Figure 3 Effect of filtering variants by frequency in public databases and/or other WGS500 samples. Density plots of the distribution of the number of novel heterozygous (top) or rare homozygous (bottom) coding variants (ANNOVAR annotation) across all individuals, where frequency is defined in the control data sets indicated.
The individuals in the top 5th percentile are shown; all of these samples are known to have African or South Asian ancestry except for MR_6 and MR_8, which we suspect have some sample contamination (Supplementary Fig. 1). ESP, NHLBI Exome Sequencing Project.

Supplementary Figure 4 The burden of variants of unknown significance in candidate genes for craniosynostosis. Histograms of the number of potentially pathogenic, conserved coding variants in different candidate gene sets for craniosynostosis (CRS). The candidate genes were chosen by a combination of literature and high-throughput database searches, augmented by expert curation (Online Methods). Sample names in green text indicate that the variant is not likely to be pathogenic, as it does not fit a plausible inheritance model or is less functionally compelling than another candidate (Supplementary Table 6). SC, Saethre-Chotzen syndrome.

Supplementary Figure 5 The burden of putative regulatory variants. Distributions of the number of novel heterozygous (top) and rare homozygous (bottom) variants that alter conserved positions in regulatory regions within 5 kb (red) or 50 kb (black) of a gene. The fact that the number of variants does not substantially change if one considers only regulatory regions within 5 kb of genes (black line) reflects the fact that these regions tend to be close to genes. Note that most of the outliers are also outliers in Supplementary Figure 3, and these samples tend to be of African or Asian ancestry.

Supplementary Figure 6 The burden of putative regulatory variants around candidate genes for early-onset epilepsy or craniosynostosis. As shown for Supplementary Figure 4 but for variants at conserved positions in regulatory regions within 50 kb of candidate genes for early-onset epilepsy (top) or craniosynostosis (bottom).
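The Wald confidence intervals mentioned for Supplementary Figure 2 follow the standard normal approximation for a binomial proportion. A minimal Python sketch is given here for illustration only: it assumes that concordance in a coverage bin is summarized as the number of concordant sites out of the total number of sites compared, and the function name and toy counts are hypothetical rather than taken from the study.

```python
import math

def wald_interval(concordant, total, z=1.96):
    """Wald (normal-approximation) interval for a proportion.

    concordant: number of sites where sequencing and array genotypes agree
    total:      number of sites compared in this coverage bin
    z:          normal quantile (1.96 gives an approximate 95% interval)
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = concordant / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    # Clip to [0, 1], since a proportion cannot fall outside that range.
    return max(0.0, p - half_width), min(1.0, p + half_width)

# Toy counts (hypothetical): same concordance, different numbers of sites.
few_sites = wald_interval(95, 100)
many_sites = wald_interval(9500, 10000)
print(few_sites, many_sites)
```

With the same observed concordance, a bin containing fewer sites gives a wider interval, which is the effect noted in the caption for the higher-coverage samples.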
Supplementary Figure 7 Segregation of putative causal variants in UMOD and CASR. Top, the NM_001008389:c.410G>A UMOD variant was identified by WGS in individual III.2 in this family with familial juvenile hyperuricaemic nephropathy (FJHN). The G>A transition generates an AccI restriction endonuclease recognition site, and digestion of a 349-bp PCR product with AccI was used to confirm cosegregation of the variant with affected individuals in the family. Digestion of the mutant (mut) allele generated 93-bp and 256-bp fragments, with the wild-type (WT) allele remaining uncut. Bottom, the NM_000388:c.2299G>C CASR variant was identified by WGS in individuals I.2 and II.1 in a family with familial hypoparathyroidism (FH). The G>C transversion causes loss of a BssSI restriction endonuclease recognition site, and digestion of a 367-bp PCR product with BssSI was used to confirm cosegregation of the variant with affected individuals in the family. Digestion of the wild-type (WT) allele generated 180-bp and 187-bp fragments, with the mutant (mut) allele remaining uncut.

Supplementary Figure 8 Parental origin and sequence conservation of the HUWE1 mutation. Left, alignment of sequencing reads from the proband (CRS_4659), mother (CRS_4654) and father (CRS_4655) over 2 C/A polymorphisms (arrows; C shown in red and A shown in black). Allele-specific primers (AARev and CCRev) were designed with a common primer Intron6-For to amplify the HUWE1 mutation and polymorphisms in a single PCR product (top right). Red arrows indicate polymorphic sites, and nucleotides included in the primer sequences are underlined. The results of PCR are shown underneath. The products were digested with HpaII, which showed the presence of the mutation (white arrow) only in the CCRev amplification product from the proband (second panel from the bottom, right), indicating paternal origin.
Bottom right, an alignment of the DUF908 domain in the protein encoded by HUWE1, with the mutated residue indicated (red arrow).

Supplementary Figure 9 Identification of an inherited interstitial insertion involving chromosomes 2p25.3 and Xq27.1 associated with X-linked recessive hypoparathyroidism. The sequences of the proximal (top left) and distal (top right) insertion junctions are shown. Reference sequences on Xq27.1 and 2p25.3 are indicated in red and blue, respectively. A 3-bp microinsertion at the distal insertion boundary is indicated in yellow. Bottom, primers specific for chromosomes 2 (2SPF) and X (XSPF and XSPR) were designed for the DNA sequence at the distal boundary and used to further characterize the insertion. The sizes of the PCR products obtained with each primer pair are indicated. Chromosome X is shown in black, and the inserted sequence from chromosome 2p25.3 is shown in gray.

Supplementary Figure 10 Coverage in this study compared to a large-scale exome sequencing project. Coverage comparison of this study and a large-scale exome sequencing (WES) project for the variants given in Table 1 (top) and the causative variants identified in the WES project (bottom). For the WES project, for nondisclosure reasons, only the gene name is given. The WES coverage data (blue) were compiled from 141 whole-exome data sets that were sequenced using the Roche NimbleGen SeqCap EZ v.2.0 kit. Labels for variants located in regions targeted by this kit are in blue, those within 20 bp of the targeted regions are in green and those outside the targeted regions are in red. The WGS500 data (red) were compiled from all the whole-genome data sets used in this study. The horizontal green lines denote two exemplary coverage thresholds used in variant detection.
To improve readability, the plots were truncated above a coverage value of 100 (top) or 200 (bottom) and the box-plot whiskers were extended to the data extremes.

Supplementary Figure 11 Distribution of the lengths of the largest regions of homozygosity across all samples. Thirty-seven samples had at least one region of homozygosity >4 Mb in length (black bars), suggesting consanguinity. Note that the largest bin includes one sample with confirmed uniparental isodisomy. See the Online Methods for an explanation of how regions of homozygosity were identified.

Supplementary Note

Case studies

1. HUWE1 in craniosynostosis

1.1. Introduction

Background on the disease

Craniosynostosis, the premature fusion of the cranial sutures, is a serious disorder with a prevalence of ~1 in 2,200 children. There are over 30 known disease genes, with dominantly acting mutations in the FGFR2, FGFR3, TWIST1 and EFNB1 genes accounting for most of the 20-25% of cases with a single genetic aetiology1.

Case study

The female proband, CRS_4659, was noted to have microcephaly in utero, and craniosynostosis was suspected based on magnetic resonance imaging performed at 30 weeks’ gestation. She was born at term by planned Caesarean section, and did not require any resuscitation. On formal craniofacial assessment at the age of 7 weeks, a very tall skull with a marked transverse occipital constriction and multiple palpable soft spots was noted. She was dysmorphic with exorbitism, slightly upslanting palpebral fissures and arched eyebrows. She had a high arched palate and thin upper lip; the ears, hands and feet were normal. Three-dimensional computed tomographic analysis of the skull showed multiple widespread craniolacunae and synostosis of all sutures (Figure 3A). She underwent an occipital craniectomy and foramen magnum decompression at 7 months of age and a fronto-orbital advancement at 4 years of age.
Formal developmental assessment prior to the second procedure indicated that she was performing in the low average range of ability, with decreased attention and concentration and marked distractibility. Speech and language assessment using Clinical Evaluation of Language Fundamentals (CELF) Preschool gave scores in the 3-6 range (average 10).

Known genes/pathways and prior screens

The clinical picture was not reminiscent of craniosynostosis disorders caused by known disease mutations. Analysis of blood for disorders of bone biochemistry (Ca, P, Mg, alkaline phosphatase), a craniosynostosis disease gene screen (for mutation hotspots/deletions in FGFR1, FGFR2, FGFR3, TWIST1 and EFNB1), karyotyping and array comparative genomic hybridisation screen (Agilent 250k) were all normal.

Experimental design and strategy for identifying candidates

We sequenced the proband and her unaffected parents. They were nonconsanguineous, and there was no family history of craniosynostosis. We suspected a de novo, dominant mutation (because the majority of monogenic craniosynostosis exhibits dominant genetics), but also considered recessive mechanisms. We searched for de novo variants in the proband, absent in the parents, and prioritised exonic mutations predicted to cause alterations in the encoded protein.

1.2. Methods for follow-up studies

Parental origin of the mutation

We designed allele-specific primers to determine whether the child’s HUWE1 mutation was present on the allele of maternal or paternal origin. We exploited two closely adjacent intronic C/A polymorphisms at positions chrX:53,675,478 and chrX:53,675,488, ~1150 bp upstream of the HUWE1 mutation, that we had identified in the whole genome sequence. The proband and her mother (CRS_4654) were heterozygous (CC/AA), whilst the father (CRS_4655) was a CC hemizygote (Supplementary Figure 8 left).
Allele-specific primers CCRev (5'-CCAAGGTGGGTTTTTGTTTTGTTTTTTGTTTTGTTTTGTTTTG-3') and AARev (5'-CCAAGGTGGGTTTTTGTTTTGTTTTTTGTTTTTTTTTGTTTTT-3') were designed to amplify the CC and AA alleles respectively (bases differing between the two primers italicised). PCR was carried out using either CCRev or AARev and a common forward primer Intron6-For (5'-CCCATCAACCCTATGAAGGATAGTATCTATATCC-3'), so that the product spanned exons 5 and 6 (containing the HUWE1 mutation). The AARev primer did not generate a product using the paternal sample, confirming that amplification was allele-specific (Supplementary Figure 8, top gel pictures). The 1392 bp amplification product was digested with HpaII, for which a restriction site is ablated by the HUWE1 mutation. The 645 bp fragment characteristic of the mutant allele was observed only in DNA amplified from the child using the CCRev primer (Supplementary Figure 8, bottom gel picture, white arrow), indicating that the mutation resided on the paternal allele (Figure 3B).

X inactivation studies for HUWE1 mutation

Skewing of X inactivation was measured using the androgen receptor gene (AR) triplet repeat assay2,3. Briefly, 1 µg of genomic DNA was predigested with RsaI either in the presence (+) or absence (-) of the methylation-sensitive enzyme HpaII (20 U). PCR amplification was carried out using primers AR-For (5'-TCCAGAATCTGTTCCAGAGCGTGC-3') and AR-Rev (5'-FAM-GCTGTGAAGGTTGCTGTTCCTCAT-3'). Amplicons were analysed on an ABI 3130 sequencer and sized using GeneScan software. Differences in peak areas for the two alleles in the HpaII(+) assay were corrected for differences in amplification efficiency measured in the HpaII(-) assay, and the final results expressed as a percentage of the more inactivated allele. Of note, Platypus was unable to specify the correct AR triplet repeat genotypes in heterozygotes (not shown), highlighting the difficulties of accurately calling simple sequence repeats using 100 bp read data.
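The peak-area correction described for the AR assay can be illustrated with a short Python sketch. The exact formula used in the study is not stated; the version below assumes the normalization commonly used for this type of assay, in which each allele's HpaII(+) peak area is divided by its HpaII(-) peak area before the two alleles are compared. The function name, allele labels and peak areas are hypothetical.

```python
def skewing_percentage(digested, undigested):
    """Estimate X-inactivation skewing from AR repeat peak areas.

    digested:   allele -> peak area in the HpaII(+) reaction
    undigested: allele -> peak area in the HpaII(-) reaction
    Returns (allele, percentage) for the more inactivated allele.

    Assumed formula: HpaII cuts only the unmethylated (active) copy, so the
    HpaII(+) signal reflects the inactive X; dividing by the HpaII(-) signal
    corrects for allele-specific amplification efficiency.
    """
    if set(digested) != set(undigested) or len(digested) != 2:
        raise ValueError("expected the same two alleles in both assays")
    corrected = {a: digested[a] / undigested[a] for a in digested}
    total = sum(corrected.values())
    if total == 0:
        raise ValueError("no signal in the HpaII(+) assay")
    percentages = {a: 100.0 * v / total for a, v in corrected.items()}
    allele = max(percentages, key=percentages.get)
    return allele, percentages[allele]

# Toy peak areas (hypothetical): the shorter allele amplifies twice as well.
allele, pct = skewing_percentage(
    digested={"short": 800.0, "long": 200.0},
    undigested={"short": 1000.0, "long": 500.0},
)
print(allele, round(pct, 1))
```

In this toy example the uncorrected HpaII(+) areas would suggest 80% skewing, whereas correcting for amplification efficiency gives about 67%, which is why the HpaII(-) reaction is run alongside.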
Analysis of HUWE1 expression

RNA was extracted from EBV-transformed lymphoblastoid cells and scalp fibroblasts obtained from patient CRS_4659. cDNA was synthesised using the Fermentas RevertAid First-Strand Synthesis kit with random hexamer primers according to the manufacturer's instructions. PCR amplification was carried out using primers Ex6Rev (5'-CTGCCAGCACCACTTGCATATCAGAGGAAGCC-3') and Ex5For (5'-GTGCGAGTTATATCACTGGGTGGACCTGTTGG-3') to generate a product of 257 bp. HpaII digests the normal but not the mutant product, generating fragments of 186 and 71 bp.

1.3. Results

Amongst 94 de novo mutations identified in the proband, a mutation at chrX:53,674,333 in the HUWE1 gene (c.329G>A encoding p.Arg110Gln) was the only one predicted to alter protein coding. HUWE1 encodes a ubiquitin ligase, and the mutation resides at a highly conserved position (including across invertebrates and yeasts) of the DUF908 domain, the function of which is unknown (http://pfam.sanger.ac.uk/family/duf908) (Supplementary Figure 8, bottom right). Five missense mutations elsewhere in the protein were previously reported in pedigrees segregating X-linked mental retardation or autistic spectrum disorder4-6. In two of these pedigrees, female carriers were reported to be symptomatic and/or to have associated macrocephaly. The mutation was found to originate from the paternal X chromosome (Supplementary Figure 8, bottom gel picture). To seek additional support for an X-linked origin of the child’s disorder, we studied X-inactivation, because (owing to negative selection of cells lacking a functional gene copy) female carriers of many serious X-linked disorders exhibit preferential inactivation of the X chromosome bearing the mutant allele7. Indeed, we found that the proband exhibited extreme skewing of X-inactivation (Figure 3C), but to our surprise the maternal (and therefore non-mutant) X was preferentially inactivated.
Corroborating this conclusion, RNA expression studies in scalp fibroblasts and EBV-transformed lymphoblastoid cells showed that only mutant HUWE1 is expressed in the patient (Figure 3D). Neither the mother nor the paternal grandmother showed extreme skewing of X-inactivation (Figure 3C), ruling out the possibility that one of the proband’s X chromosomes was constitutionally susceptible or resistant to X-inactivation. We considered the possibility that a different de novo mutation, occurring on the maternally inherited X chromosome, could have led to selective inactivation of the maternal X. We therefore scrutinised the seven other X-encoded de novo mutations that had been detected by WGS (Figure 3E). Three variants were located within genes (all noncoding): two were shown to be present on the paternal allele whilst the third, for which parental origin could not be determined, was in a gene (CCDC160) that is in a region not subject to X-inactivation8. The other four de novo variants were all found in SINE or LINE repeats, and appeared unlikely to be functionally significant. In addition, we scrutinised the X chromosome for regions in which a maternal allele was apparently not transmitted, as this could indicate a de novo deletion on the maternal allele. We identified two such regions, one (chrX:5,055,376-5,057,467) in an intergenic region at Xp22.1, and the other (chrX:154,778,278-154,784,971) within intron 1 of TMLHE at Xq28. These copy number changes did not appear to be good candidates to cause the skewed X-inactivation. We used dideoxy-sequencing to screen 280 patients with craniosynostosis for mutations of HUWE1 located within the region encoding the DUF908 domain, but found no other significant mutations. In addition, we sequenced the entire HUWE1 gene in 47 patients with multisuture synostosis using Fluidigm Access Array multiplexing and Ion Torrent sequencing, but again found no likely pathogenic mutations.
However, we did subsequently identify, using exome sequencing, a different de novo hemizygous mutation altering the same amino acid of HUWE1 (c.328C>T encoding p.Arg110Trp) in a boy presenting with metopic craniosynostosis, moderate-severe learning disability and other dysmorphic features, making us confident that this mutation was pathogenic in both cases.

1.4. Clinical actions

There is strong evidence that the mutation is the cause of the child’s learning disability and craniosynostosis, because (1) HUWE1 is an established disease gene in X-linked mental retardation in males4,6, and (2) a de novo mutation at the same codon was found by exome sequencing in a male patient who presented with craniosynostosis. The parents requested help with obtaining special educational support for their daughter, so a letter outlining the genetic diagnosis and its likely contribution to the learning disability was written to the education authorities. A suitable educational plan was subsequently implemented and the parents considered the genetic information to be instrumental in this outcome.

2. EPO in erythrocytosis

2.1. Introduction

Background on the disease

Erythrocytosis is a clinical condition characterized by increased red cell mass and typically elevated haematocrit and haemoglobin (Hb) concentration9. It can be congenital (e.g. genetic) or acquired. In primary erythrocytosis, patients have an intrinsic defect in the erythroid cells of the bone marrow and typically have low levels of erythropoietin (Epo), the protein that promotes the survival, proliferation and differentiation of erythrocyte progenitor cells. In secondary erythrocytosis, the increased red cell production is driven by external factors (e.g. hypoxia or defects in oxygen sensing) through increased erythropoietin production, and patients typically have high or inappropriately normal Epo levels9.
Epo production is controlled at the transcriptional level in an oxygen-regulated manner. This control is mediated by hypoxia-inducible transcription factors (HIF), and mutations in genes in the HIF pathway are known to cause erythrocytosis10. Even after screening for all known mutations, there remains a considerable number of patients in whom no genetic cause has been found11. Of these patients with idiopathic erythrocytosis, about two thirds have inappropriately normal or elevated levels of erythropoietin (given their level of Hb), suggesting a high likelihood of a defect in their oxygen-sensing pathway. Furthermore, most of these patients have early-onset (childhood) disease and often have a family history (sometimes with clear patterns of Mendelian inheritance), suggesting a high probability of an underlying genetic aetiology.

Case study

Two unrelated families, Family M and Family S, were identified showing an autosomal dominant pattern of erythrocytosis inheritance (Figure 4B). The patients with erythrocytosis had raised Hb concentrations (>16.5 g/dl in females and >18.5 g/dl in males) and haematocrits (>0.5 l/l), had high Epo levels and were diagnosed at a young age, some as young as 2 years of age (e.g. PAR07 and PAR08).

Known genes/pathways and prior screens

Our investigations focused on patients with idiopathic erythrocytosis with either high or inappropriately normal Epo levels, suggesting a defect in oxygen sensing rather than primary erythrocytosis. Based on knowledge of the biological pathway for hypoxia-induced erythrocytosis, a list of a priori candidate genes included: HIF1A, EPAS1 (HIF2A), HIF3A, ARNT (HIF1B), HIF1AN (FIH), EGLN1 (PHD2), EGLN2 (PHD1), EGLN3 (PHD3), VHL, EPO, HBB, HBA1, HBA2, BPGM. We also considered JAK2 and EPOR as candidates, since these are involved in congenital erythrocytosis.
Known exonic mutations in VHL, PHD2, HIF2A, EPOR and JAK2 were screened for prior to genome sequencing, and found to be absent.

Experimental design and strategy for identifying candidates

In family S, DNA samples were available from six individuals, both affected and unaffected family members (Figure 4B). WGS was performed on two affected individuals (PAR09 and PAR07), and on one unaffected (PAR18), forming a trio. DNA from one affected (PAR08) and two unaffected members (PAR19 and PAR22) was used in follow-up genotyping and segregation analysis. We did not have any information on, nor could we collect DNA from, the father of PAR09. In family M, DNA samples were available for four individuals. WGS was performed on two affected individuals (PAR15 and PAR16), and DNA from one affected (PAR17) and one unaffected (PAR20) member was used in follow-up genotyping and segregation analysis. In both families, the disease appeared to follow a dominant inheritance pattern and so we focused on heterozygous variants shared between affected individuals, but absent from unaffected individuals, in each family individually. As all individuals have large numbers of these, we prioritized known candidate genes. We originally looked at coding candidates, but later extended this to noncoding exonic sequence.

2.2. Results

None of our candidate genes contained coding mutations that segregated with disease. However, we identified a single nucleotide variant (G>A) in the 5’UTR of EPO at chr7:100,318,468, shared by the affected individuals in the 2 families (PAR07 and PAR09 in family S, PAR15 and PAR16 in family M). The variant was not found in PAR18 nor in other WGS500 samples. Sanger sequencing confirmed the presence of the 5’UTR variant in the affected members of Family M (PAR17, PAR15, PAR16) and Family S (PAR09, PAR07, PAR08) and its absence in unaffected members of either family (Family S: PAR18, PAR19, PAR22; Family M: PAR20).
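The segregation filter applied within each family (heterozygous variants shared by all sequenced affected individuals and absent from sequenced unaffected relatives) can be sketched in Python as follows. This is an illustration under simple assumptions rather than the pipeline used in the study: genotypes are encoded as alternate-allele counts (0, 1 or 2), and the function name, sample labels and variant labels are hypothetical.

```python
def segregating_variants(genotypes, affected, unaffected):
    """Return variants consistent with dominant segregation in one family.

    genotypes:  sample -> {variant label -> alternate-allele count (0, 1, 2)}
    affected:   samples that must all be heterozygous for the variant
    unaffected: samples that must not carry the variant at all
    """
    shared = None
    for sample in affected:
        het = {v for v, count in genotypes[sample].items() if count == 1}
        shared = het if shared is None else shared & het
    if shared is None:  # no affected samples supplied
        return set()
    for sample in unaffected:
        carried = {v for v, count in genotypes[sample].items() if count > 0}
        shared = shared - carried
    return shared

# Toy genotypes (hypothetical labels, not real calls from the study).
toy = {
    "affected_1": {"varA": 1, "varB": 1, "varC": 1},
    "affected_2": {"varA": 1, "varB": 2, "varC": 1},
    "unaffected_1": {"varA": 0, "varC": 1},
}
print(segregating_variants(toy, ["affected_1", "affected_2"], ["unaffected_1"]))
```

Here only varA survives: varB is homozygous in one affected individual and varC is carried by the unaffected relative. In practice the surviving set is still large, which is why candidate genes were prioritized afterwards.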
The EPO variant is not listed in dbSNP, 1000 Genomes, or any other samples sequenced within the WGS500 project. In order to determine whether the variant had arisen independently in the two families, we analysed the surrounding SNVs in the family members for which genomic sequence was available. This revealed 47 rare variants (1000 Genomes frequency < 1%) that are found, uniquely (within the WGS500 project) in the four affected individuals, in an approximately 8 Mb region (chr7:93,533,251-100,993,241). This provides compelling evidence that the region is identical-by-descent in both families and that the 5’UTR variant had one common origin. The 5’UTR variant is the only exonic variant and EPO the only candidate gene in this 8 Mb region. No other genomic region in the two families shows a similar pattern of sharing of rare variants. This suggests that the haplotype carrying the variant is multiple generations old, and thus likely to be found in others. This finding will be followed up by screening larger cohorts of patients with idiopathic erythrocytosis for this 5’UTR variant and by functional molecular studies to investigate the mechanism of action of this mutation. The putative functional variant lies in a small conserved block within the EPO 5’UTR (Figure 4A), and so it seems likely to affect expression.

2.3. Clinical actions

WGS has played an important diagnostic role in clarifying the aetiology of the disease in these two families with erythrocytosis. These findings will be useful in screening and family planning and counselling.

3. SOX3 in hypoparathyroidism

3.1. Introduction

Background on the disease

Hypoparathyroidism is an endocrine disorder in which deficiency of parathyroid hormone (PTH) results in hypocalcaemia that may be associated with tetany, carpopedal spasms, seizures, laryngeal stridor, cataracts or ectopic calcification.
Treatment with oral vitamin D preparations and calcium supplements is effective at restoring normocalcaemia and ameliorating the neuro-muscular symptoms. Hypoparathyroidism may be congenital due to parathyroid gland agenesis (e.g. the DiGeorge syndrome) or acquired and due to destruction of the parathyroid glands (e.g. in autoimmune diseases). In addition, hypoparathyroidism may occur as part of a complex congenital syndrome (e.g. the DiGeorge syndrome or a pluriglandular autoimmune disorder) or as a nonsyndromic solitary endocrinopathy, which is referred to as isolated or idiopathic hypoparathyroidism. Familial occurrences of idiopathic hypoparathyroidism have been reported and autosomal dominant, autosomal recessive, and X-linked inheritances have been established. Genetic abnormalities of TBX1, AIRE1, GATA3, TBCE, GCMB, PTH, CASR, GNA11, SOX3 and the mitochondrial genome have been reported in patients with these syndromic and non-syndromic forms of hypoparathyroidism. The incidence of hypoparathyroidism has not been established, but DiGeorge syndrome is reported to occur in 1/3,000 live births and an autoimmune form of hypoparathyroidism has been reported to occur more often in the Finnish and Iranian-Jewish populations.

Case study

The proband, HPT_3, is the son of nonconsanguineous parents. He presented at age 9 months with seizures due to hypocalcemia (corrected serum calcium concentrations = 3.0 to 4.0 mg/dl; normal = 9.0 to 10.1 mg/dl) attributed to primary hypoparathyroidism. His grandfather, HPT_1, was also known to have had hypocalcemic seizures during childhood. Neither the proband nor his grandfather suffered from immunodeficiency, cardiac anomalies, craniofacial defects, developmental delay, deafness, or renal dysplasia. Thus, the findings seemed consistent with isolated hypoparathyroidism (HPT) that was likely inherited as an X-linked recessive trait. Treatment with vitamin D preparations and oral calcium supplements restored normocalcaemia.
Known genes/pathways and prior screens

Genetic abnormalities involving the coding regions of GCMB, PTH, CASR, SOX3 and AIRE1 had previously been excluded. A previous report of familial hypoparathyroidism implicated a deletion-insertion near SOX3 on chromosome X via linkage analysis12. The report identified a region from 2p25.3 inserted at Xq27.1 with simultaneous deletion of a region from chromosome X. It also demonstrated Sox3 expression in the developing parathyroid tissue of mouse embryos.

Experimental design and strategy for identifying candidates

The genomes of the proband, HPT_3, and his affected grandfather, HPT_1, were sequenced. DNA from the proband’s unaffected mother, HPT_2, and brother, HPT_4, was used in follow-up genotyping. The pattern of inheritance clearly suggested that the pathogenic variant was X-linked, and SOX3, previously linked to hypoparathyroidism, was the main candidate gene. Accordingly, we searched for mutations and larger structural variants in and around SOX3.

3.2. Results

Visual inspection of reads mapping downstream of SOX3 showed an apparent deletion of 1.4 kb of X chromosomal sequence, located approximately 81.5 kb downstream of the gene. The pairs of reads flanking this deletion consistently mapped to either end of an approximately 50 kb sequence on chromosome 2p (Figure 4C), suggesting a simultaneous deletion of an X chromosomal region with an insertion into the X chromosome of the region from 2p. We used PCR to confirm the chromosomal rearrangement in the affected individuals (Figure 4D; Supplementary Figure 9) and to show that the mother is a carrier and that the unaffected brother does not have the rearrangement.
The breakpoint on chromosome X is coincident with a palindromic sequence at the 5’ end (chrX:139,502,865-139,503,044), indicated in blue and red below, with the asterisk marking the breakpoint (Supplementary Figure 9): GGGTTCAGCTTCCCTCTAAGCCCCTAACATGTTTGTTCTAGTTTATTTCTGGTGACTTCAGTGCTTTTAAAAAGC AATATAT*AAGCTATATCTAGCTTATATATTGCTTTTTAAAAGCACTGAAGTCACCAGAAATAAACTAGAACAA ACATGTTAGGGGCTTAGAGGGAAGCTGAACCC The deletion does not include any of the conserved non-coding elements defined in UCSC, but it does occur approximately 500 bp upstream of a vertebrate conserved element (chrX:139,504,830139,505,088, lod=422). The rearrangement breakpoints are distinct from the previously reported kindred12, suggesting independent events, but they are in broadly similar regions, so the pathogenesis is likely to be similar. 3.3. Clinical actions WGS has helped to elucidate the aetiology of the disease in this family with hypoparathyroidism. This finding will be useful in counselling and screening of relatives. Supplementary references 1. 2. 3. Wilkie, A.O. et al. Prevalence and complications of single-gene and chromosomal disorders in craniosynostosis. Pediatrics 126, e391-400 (2010). Allen, R.C., Zoghbi, H.Y., Moseley, A.B., Rosenblatt, H.M. & Belmont, J.W. Methylation of HpaII and HhaI sites near the polymorphic CAG repeat in the human androgen-receptor gene correlates with X chromosome inactivation. Am J Hum Genet 51, 1229-39 (1992). Tilley, W.D., Marcelli, M., Wilson, J.D. & McPhaul, M.J. Characterization and expression of a cDNA encoding the human androgen receptor. Proc Natl Acad Sci U S A 86, 327-31 (1989). Nature Genetics: doi:10.1038/ng.3304 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. Froyen, G. et al. Submicroscopic duplications of the hydroxysteroid dehydrogenase HSD17B10 and the E3 ubiquitin ligase HUWE1 are associated with mental retardation. Am J Hum Genet 82, 432-43 (2008). Nava, C. et al. 
Analysis of the chromosome X exome in patients with autism spectrum disorders identified novel candidate genes, including TMLHE. Transl Psychiatry 2, e179 (2012).
6. Isrie, M. et al. HUWE1 mutation explains phenotypic severity in a case of familial idiopathic intellectual disability. Eur J Med Genet 56, 379-82 (2013).
7. Plenge, R.M., Stevenson, R.A., Lubs, H.A., Schwartz, C.E. & Willard, H.F. Skewed X-chromosome inactivation is a common feature of X-linked mental retardation disorders. Am J Hum Genet 71, 168-73 (2002).
8. Carrel, L. & Willard, H.F. X-inactivation profile reveals extensive variability in X-linked gene expression in females. Nature 434, 400-4 (2005).
9. McMullin, M.F. The classification and diagnosis of erythrocytosis. Int J Lab Hematol 30, 447-59 (2008).
10. Franke, K., Gassmann, M. & Wielockx, B. Erythrocytosis: the HIF pathway in control. Blood 122, 1122-8 (2013).
11. McMullin, M.F. Idiopathic erythrocytosis: a disappearing entity. Hematology Am Soc Hematol Educ Program, 629-35 (2009).
12. Bowl, M.R. et al. An interstitial deletion-insertion involving chromosomes 2p25.3 and Xq27.1, near SOX3, causes X-linked recessive hypoparathyroidism. J Clin Invest 115, 2822-31 (2005).
13. Qi, X.P. et al. RET germline mutations identified by exome sequencing in a Chinese multiple endocrine neoplasia type 2A/familial medullary thyroid carcinoma family. PLoS One 6, e20353 (2011).
14. Slimani, A. et al. Effect of mutations in LDLR and PCSK9 genes on phenotypic variability in Tunisian familial hypercholesterolemia patients. Atherosclerosis 222, 158-66 (2012).
15. Wiestner, A., Schlemper, R.J., van der Maas, A.P. & Skoda, R.C. An activating splice donor mutation in the thrombopoietin gene causes hereditary thrombocythaemia. Nat Genet 18, 49-52 (1998).
16. Bolze, A. et al. Ribosomal protein SA haploinsufficiency in humans with isolated congenital asplenia. Science 340, 976-8 (2013).
17. Elsayed, S.M. et al.
Autosomal dominant SCA5 and autosomal recessive infantile SCA are allelic conditions resulting from SPTBN2 mutations. Eur J Hum Genet 22, 286-8 (2014).
18. Wang, Y. et al. A Japanese SCA5 family with a novel three-nucleotide in-frame deletion mutation in the SPTBN2 gene: a clinical and genetic study. J Hum Genet 59, 569-73 (2014).
19. Lise, S. et al. Recessive mutations in SPTBN2 implicate beta-III spectrin in both cognitive and motor development. PLoS Genet 8, e1003074 (2012).
20. Babbs, C. et al. Homozygous mutations in a predicted endonuclease are a novel cause of congenital dyserythropoietic anemia type I. Haematologica 98, 1383-7 (2013).
21. Cossins, J. et al. Congenital myasthenic syndromes due to mutations in ALG2 and ALG14. Brain 136, 944-56 (2013).
22. Petousi, N. et al. Erythrocytosis associated with a novel missense mutation in the BPGM gene. Haematologica 99, e201-4 (2014).
23. Zajac, J.D. & Danks, J.A. The development of the parathyroid gland: from fish to human. Curr Opin Nephrol Hypertens 17, 353-6 (2008).
24. Uckun-Kitapci, A., Underwood, L.E., Zhang, J. & Moats-Staats, B. A novel mutation (E767K) in the second extracellular loop of the calcium sensing receptor in a family with autosomal dominant hypocalcemia. Am J Med Genet A 132A, 125-9 (2005).
25. Liu, M. et al. Novel UMOD mutations in familial juvenile hyperuricemic nephropathy lead to abnormal uromodulin intracellular trafficking. Gene 531, 363-9 (2013).
26. Smith, G.D. et al. Characterization of a recurrent in-frame UMOD indel mutation causing late-onset autosomal dominant end-stage renal failure. Clin J Am Soc Nephrol 6, 2766-74 (2011).
27. Lens, X.M., Banet, J.F., Outeda, P. & Barrio-Lucia, V.
A novel pattern of mutation in uromodulin disorders: autosomal dominant medullary cystic kidney disease type 2, familial juvenile hyperuricemic nephropathy, and autosomal dominant glomerulocystic kidney disease. Am J Kidney Dis 46, 52-7 (2005).
28. McNally, E.M., Golbus, J.R. & Puckelwartz, M.J. Genetic mutations and mechanisms in dilated cardiomyopathy. J Clin Invest 123, 19-26 (2013).
29. Kirby, A. et al. Mutations causing medullary cystic kidney disease type 1 lie in a large VNTR in MUC1 missed by massively parallel sequencing. Nat Genet 45, 299-303 (2013).
30. Bokil, N.J., Baisden, J.M., Radford, D.J. & Summers, K.M. Molecular genetics of long QT syndrome. Mol Genet Metab 101, 1-8 (2010).
31. Wu, Y. et al. Mutations in ionotropic AMPA receptor 3 alter channel properties and are associated with moderate cognitive impairment in humans. Proc Natl Acad Sci U S A 104, 18163-8 (2007).
32. Martin, H.C. et al. Clinical whole-genome sequencing in severe early-onset epilepsy reveals new genes and improves molecular diagnosis. Hum Mol Genet (2014).
33. Weckhuysen, S. et al. KCNQ2 encephalopathy: emerging phenotype of a neonatal epileptic encephalopathy. Ann Neurol 71, 15-25 (2012).
34. Nakamura, K. et al. Clinical spectrum of SCN2A mutations expanding to Ohtahara syndrome. Neurology 81, 992-8 (2013).
35. Palles, C. et al. Germline mutations affecting the proofreading domains of POLE and POLD1 predispose to colorectal adenomas and carcinomas. Nat Genet 45, 136-44 (2013).
36. Zhou, X.P. et al. Germline mutations in BMPR1A/ALK3 cause a subset of cases of juvenile polyposis syndrome and of Cowden and Bannayan-Riley-Ruvalcaba syndromes. Am J Hum Genet 69, 704-11 (2001).
37. Lipton, L. et al. Germline mutations in the TGF-beta and Wnt signalling pathways are a rare cause of the "multiple" adenoma phenotype. J Med Genet 40, e35 (2003).
38. Schwarzova, L. et al. Novel mutations of the APC gene and genetic consequences of splicing mutations in the Czech FAP families. Fam Cancer 12, 35-42 (2013).
39. Sharma, V.P. et al. Mutations in TCF12, encoding a basic helix-loop-helix partner of TWIST1, are a frequent cause of coronal craniosynostosis. Nat Genet 45, 304-7 (2013).
40. van der Zwaag, P.A. et al. A genetic variants database for arrhythmogenic right ventricular dysplasia/cardiomyopathy. Hum Mutat 30, 1278-83 (2009).
41. Posch, M.G. et al. A missense variant in desmoglein-2 predisposes to dilated cardiomyopathy. Mol Genet Metab 95, 74-80 (2008).
42. Caputo, S. et al. Description and analysis of genetic variants in French hereditary breast and ovarian cancer families recorded in the UMD-BRCA1/BRCA2 databases. Nucleic Acids Res 40, D992-1002 (2012).
43. Panizza, E. et al. Yeast model for evaluating the pathogenic significance of SDHB, SDHC and SDHD mutations in PHEO-PGL syndrome. Hum Mol Genet 22, 804-15 (2013).
44. Fokkema, I.F., den Dunnen, J.T. & Taschner, P.E. LOVD: easy creation of a locus-specific sequence variation database using an "LSDB-in-a-box" approach. Hum Mutat 26, 63-8 (2005).
45. Loeys, B.L. et al. Aneurysm syndromes caused by mutations in the TGF-beta receptor. N Engl J Med 355, 788-98 (2006).
46. Levano, S. et al. Increasing the number of diagnostic mutations in malignant hyperthermia. Hum Mutat 30, 590-8 (2009).
47. Castellone, M.D. et al. A novel de novo germ-line V292M mutation in the extracellular region of RET in a patient with phaeochromocytoma and medullary thyroid carcinoma: functional characterization. Clin Endocrinol (Oxf) 73, 529-34 (2010).
48. Ngo, D.N. et al. Screening of the RET gene of Vietnamese Hirschsprung patients identifies 2 novel missense mutations. J Pediatr Surg 47, 1859-64 (2012).
49. Pickup, M.J. & Pollanen, M.S. Traumatic subarachnoid hemorrhage and the COL3A1 gene: emergence of a potential causal link. Forensic Sci Med Pathol 7, 192-7 (2011).
50. Dandanell, M., Friis-Hansen, L., Sunde, L., Nielsen, F.C. & Hansen, T.V. Identification of 3 novel VHL germ-line mutations in Danish VHL patients.
BMC Med Genet 13, 54 (2012).
Supplementary Tables
Supplementary Table 1: Concordance between genotypes from WGS and cytoSNP12v1 array data.
Sample       Number of SNPs  % concordant genotypes  Mean coverage (concordant/discordant)
CAT_919B6    277,538         99.93                   23.16/15.85
LVNC_1.1.70  279,002         99.95                   47.91/39.06
LVNC_1.2.83  277,784         99.94                   46.22/39.99
RTA_2.1.47   277,612         99.64                   21.82/14.39
RTA_2.2.73   276,711         99.91                   22.90/16.19
DCM_3.2.55   279,271         99.93                   24.66/18.13
DCM_3.2.98   278,471         99.92                   25.27/20.78
CCM_4.2.69   277,472         99.93                   23.85/16.89
CCM_4.2.92   277,571         99.92                   23.77/16.06
CQT_8.1.61   279,619         99.93                   26.95/18.78
EOE_5        278,605         99.96                   35.14/26.96
ERY_PAR09    278,016         99.95                   25.87/17.82
Supplementary Table 2: Error profiles between genotypes from WGS and from array data. The results show the proportion of the total SNP comparisons (across all 12 samples) that fall into each error category.
              cytoSNP12v1 array genotype
WGS genotype  HomRef   Het      HomAlt
HomRef        50.618%  0.019%   0.004%
Het           0.018%   29.621%  0.012%
HomAlt        0.002%   0.011%   19.696%
Supplementary Table 3: The number of putative de novo protein-altering variants (ANNOVAR annotation) in children from trios or quartets under different filtering strategies. When segregation is ignored, this is simply the number of novel heterozygous coding variants as defined by absence from the control datasets indicated. Parental genotypes (either raw calls or jointly called genotypes) were checked to determine whether the variants were likely to be de novo. EOE: Early-onset epilepsy, CRS: craniosynostosis, SC: Saethre-Chotzen syndrome, ERY: erythrocytosis, HCM: hypertrophic cardiomyopathy. Note that all families were treated as trios with one affected child and two unaffected parents for this analysis, even though, for the ERY trio, both the mother and daughter were affected, and, for the MR and EOE quartets, both children were affected.
+Note that the mean excludes HCM_2361, MR_6 and MR_8 because the abnormally large number of novel heterozygous variants in these samples is probably due to contamination. Filtered against Segregation ignored 1000G, ESP WGS500 1000G, ESP, WGS500 de novo - raw calls de novo – joint calls ignored de novo - raw calls de novo – joint calls ignored de novo - raw calls de novo – joint calls Nature Genetics: doi:10.1038/ng.3304 CRS 4103 193 CRS 4447 265 CRS 4659 330 CRS 4917 273 EOE EOE EOE EOE 15 18 21 22 469 395 336 314 ERY PAR07c 285 HCM + 2361 1194 15 25 38 24 25 35 64 43 36 641 1 4 4 3 4 1 5 0 2 11 78 78 88 83 274 200 59 65 106 888 1 1 1 1 3 3 4 3 3 0 1 1 1 3 1 0 0 54 54 62 59 252 176 44 1 1 1 1 3 3 0 1 1 1 3 1 MR MR EOE EOE EOE EOE SC + + 6 8 12 2 5 7 2930 604 611 281 179 277 258 242 121 138 + Mean (reduction relative to row above) 292.6 20 21 51 31 21 32.1 (89%) 1 5 4 1 1 2.6 (92%) 109 124 147 57 97 86 89 107.6 529 31 41 2 2 1 3 1 2.1 (98%) 2 7 3 3 1 2 0 1 1 1 (52%) 50 81 851 102 114 93 42 78 63 63 83.6 4 3 3 521 30 41 2 2 1 3 1 2.1 (98%) 0 0 2 7 3 3 1 2 0 1 1 1 (52%) 3 3 Supplementary Table 4: The number of putative simple recessive (i.e. homozygous) protein-altering variants (ANNOVAR annotation) in trio and quartet children under different filtering strategies. When segregation is ignored, this is simply the number of rare homozygous coding variants as defined by their frequency in the control datasets indicated. In order to classify the variant as recessive, the parents were both required to be heterozygous (in the raw calls). Manual inspection revealed that many of the homozygous variants filtered out using the other WGS500 samples tended to be in low complexity regions which are prone to spurious or incorrect calls, or have very low coverage. +Note that the mean excludes HCM_2361, MR_6 and MR_8 because of suspected sample contamination, EOE_5 due to a uniparental disomy for chromosome 9, and EOE_12 due to consanguinity. 
Filtered against 1000G, ESP WGS500 1000G, ESP, WGS500 Segregation CRS 4103 CRS 4447 CRS 4659 CRS 4917 EOE 15 ignored 72 78 74 88 75 78 recessive 2 1 1 1 2 ignored 2 4 0 3 2 recessive 0 2 0 0 ignored 1 3 0 recessive 0 1 0 Nature Genetics: doi:10.1038/ng.3304 EOE EOE 18 21 EOE 22 ERY PAR07c HCM + 2361 79 85 82 93 5 3 2 4 4 3 1 2 0 6 1 1 0 0 0 1 3 1 3 1 1 0 0 0 1 0 0 0 MR MR + + 6 8 91 EOE + 12 EOE EOE EOE SC + 2 5 7 2930 + 76 87 87 Mean (reduction relative to row above) 80.8 3 2 7 3 2.8 (96.8%) 1 13 4 1 1.9 34 1 0 1 0 0.5 (73.9%) 7 9 1 11 3 1 1.5 1 7 1 0 0 0 0.25 (83.3%) 68 85 85 1 2 10 18 13 40 0 3 6 18 1 0 Supplementary Table 5: Search terms used to define candidate gene lists for early-onset epilepsy for analysis of the burden of variants of unknown significance. Databases were accessed in September 2012. HGMD, Human Gene Mutation Database, www.hgmd.cf.ac.uk/; MIPS, Mammalian Protein-Protein Interaction Database, http://mips.helmholtz-muenchen.de/genre/proj/corum; GO, Gene Ontology, www.geneontology.org. Tier Database 1 2 HGMD HGMD MIPS HGMD 3 GO Nature Genetics: doi:10.1038/ng.3304 Search term Ohtahara syndrome (disease/phenotype) epileptic encephalopathy (disease/phenotype) early infantile epileptic encephalopathy (disease/phenotype) epilepsy (disease/phenotype) seizure (all fields) gene names from tier 1 ion channel (all fields) ion channel (gene description) ion channel (gene ontology) brain development (gene ontology) brain development (all fields) ion channel complex brain development Number of genes 1 10 4 83 1 9 240 134 226 78 98 200 335 Total number of unique genes 10 82 679 Supplementary Table 6: The variants thought to be causal (classes A, B or C) or possibly causal (class D or E, still under investigation) in the EOE and CRS trios and the MR quartet. The EOE quartet and HCM trio are not listed because they currently have no good candidates, and the ERY trio (with parent and child affected) is described in the section on the noncoding variant in EPO. 
Sample      Gene            Class  Consequence                              Inheritance                        Candidate gene
CRS_4103    MNT             E      nonsense                                 de novo                            no
CRS_4447    ZIC1            A      nonsense                                 de novo                            no, but in EOE tier 3
CRS_4917    TECPR1; THNSL2  E      nonsynonymous; nonsynonymous             de novo; de novo                   no; no
CRS_4659    HUWE1           B      nonsynonymous                            de novo                            no, but in MR tier 1
SC_2930     CDC45           A      nonsynonymous and splicing (synonymous)  recessive (compound heterozygous)  no
EOE_0007    SCN2A           C      nonsynonymous                            de novo                            tier 2
EOE_0012    PIGQ            A      essential splice site                    recessive                          no
EOE_0015    CSNK1G1         D      nonsynonymous                            de novo                            no
EOE_0018    CBL             D      essential splice site                    de novo                            no
EOE_2       KCNQ2           C      nonsynonymous                            de novo                            tier 2
EOE_5       KCNT1           B      nonsynonymous                            recessive - UPD9                   tier 3
MR_6, MR_8  GRIA3           C      nonsynonymous                            X-linked                           tier 1
Supplementary Table 7: Results from two-sided Fisher’s exact tests for a difference in the frequency of pseudo-candidate variants in candidate genes between trio probands and controls. Note that the numbers in each row do not add to 216 (the total number of individuals included in Figure 2), since related unaffected individuals were excluded from the control set, as were the individuals in the EOE quartet, because they might be expected to carry variants in these genes too.
Disease  Variant genotype    Tier  Controls without variants  Controls with variants  Cases without variants  Cases with variants  Odds ratio  p value
EOE      novel heterozygous  1     146                        2                       6                       0                    0           1
EOE      novel heterozygous  2     98                         50                      2                       4                    3.88        0.18
EOE      novel heterozygous  3     12                         136                     0                       6                    NA          1
EOE      rare homozygous     1     147                        1                       6                       0                    0           1
EOE      rare homozygous     2     144                        4                       6                       0                    0           1
EOE      rare homozygous     3     126                        22                      3                       3                    5.63        0.05
CRS      novel heterozygous  1     114                        35                      4                       1                    0.82        1
CRS      novel heterozygous  2     102                        47                      5                       0                    0           0.32
CRS      novel heterozygous  3     48                         101                     1                       4                    1.89        1
CRS      rare homozygous     1     149                        0                       5                       0                    0           1
CRS      rare homozygous     2     145                        4                       5                       0                    0           1
CRS      rare homozygous     3     139                        10                      4                       1                    3.43        0.31
Supplementary Table 8: Summary of conditions for which pathogenic genes were identified (class A, B or C), with evidence for pathogenicity. aGene and/or variant will be reported in an independent publication.
bCausal variant discovered independently of WGS500. cReference to publications on WGS500 case studies. Disease Gene Class of mutation Evidence for pathogenicity WGS500 Referencec Acquired essential thrombocytosis THPO splicing Asplenia RPSAb splicing SPTBN2 nonsense Common variable immunodeficiency disorder a missense Congenital dyserythropoietic anaemia, type 1 C15ORF41 missense ALG2 missense Cerebellar ataxia Congenital myasthenic syndrome Nature Genetics: doi:10.1038/ng.3304 The same variant (NM_001177598: c.13+1G>C) has been described in two families before 13,14 and arose independently. Wiestner et al. showed that it leads to skipping of exon 3 in the 5' UTR, which increased thrombopoietin protein causing enhanced megakaryopoiesis and thus increased platelet count in the affected individual15. This variant was used clinically: additional family members were screened, provided with counselling and offered appropriate haematology follow up. Bolze et al. found that rare heterozygous RPSA mutations were significantly enriched in asplenia patients compared to controls (8/23 cases vs. 1/508 controls)16. This mutation (NM_002295.4:c.34+5G>C ) leads to an impaired splicing at the end of exon 1 of RPSA, producing an insertion of 70bp in 80% of the transcripts coming from that allele. Overall the mutation leads to a 40% reduction of the RPSA protein level. This is a homozygous stop mutation (NM_006946:c.1881C>A:p.C627X), and a mouse knock out has the same phenotype. Another family with a different homozygous stop mutation and identical phenotype and additional cases with truncating mutations have been published17,18. The family have requested consideration for using this for prenatal diagnosis. This gene is part of the TNFR superfamily and involved in B cell activation and proliferation. This variant is associated with B cell defects in its homozygous and heterozygous state. 
Although there is published evidence for a functional B cell defect in carriers, there seems to be incomplete penetrance and we therefore consider that it contributes to, rather than causes the phenotype. This variant (NM_001130010:c.533T>A:p.L178Q) was present in all 3 affected family members of this pedigree. A second homozygous missense change (Y94C) was later identified in the same gene in four further CDA-I patients from two unrelated pedigrees. C15ORF41 has been added to the genes resequenced in unexplained anaemia patients by the NHS diagnostic service and they have recently identified a further individual from an unrelated pedigree with CDA-I caused by the L178Q change identified in the WGS study. This makes a total of 8 individuals from 4 unrelated pedigrees with CDA-I caused by this gene and work is ongoing onto the pathogenic mechanism. The recessive variant (NM_033087:c.203T>G:p.V68G) segregates with disease within the family: the parents and unaffected brother are heterozygous, the patient homozygous. Expression of the protein derived from patient muscle biopsy was severely reduced versus control muscle biopsies. Expression from cDNA encoding the ALG2 mutation in HEK293 cells showed severely reduced ALG2 protein expression versus controls. Information was used in the clinic for genetic counselling, and to provide appropriate treatment: a cholinesterase inhibitor (pyridostigmine) in combination with either salbutamol or ephedrine. N/A N/A 19 N/A 20 21 Craniosynostosis Erythrocytosis Familial hypoparathyroidism ZIC1 nonsense HUWE1 missense EPO noncoding BPGM missense SOX3 noncoding Nature Genetics: doi:10.1038/ng.3304 This patient had bicoronal synostosis and severe learning disability and was found to have a de novo nonsense mutation in ZIC1 (NM_003412.3: c.1163C>A: p.S388*). We found three other similar patients with de novo nonsense mutations in this gene, one by exome sequencing and two by resequencing it in 342 patients. 
The transcript escapes nonsense-mediated decay and cDNA constructs showed altered activity in biological assays. This information was used clinically. This patient had multisuture craniosynostosis and mild learning disability. A de novo missense mutation (NM_031407.6: c.329G>A:p.R110Q) was found in HUWE1, which was previously reported for mental retardation with craniofacial features. The mutation affects a very highly conserved residue in a domain of unknown function (DUF908). The gene is large, spanning 154,641 bp and comprising 84 exons, and, because of extensive heterogeneity in CRS, the contribution to the disease is likely to be low, and thus it was not surprising that we did not find any other HUWE1 mutations in a cohort of 47 unrelated cases with complex CRS. The mutation was shown to have originated on the paternal X chromosome (Figure 3B and Supplementary Figure 8). Unexpectedly, cells from the patient show preferential inactivation of the maternally inherited, wild-type X (Figure 3C) and, consistent with these two observations, only the mutant allele was expressed in the two tissues (fibroblast and transformed lymphoblasts) available for analysis (Figure 3D). Whilst this work was under review we identified, using exome sequencing, a different de novo hemizygous mutation altering the same amino acid of HUWE1 (c.328C>T encoding p.R110W) in a boy presenting with metopic craniosynostosis, moderate-severe learning disability and other dysmorphic features. See the Supplementary Note for further details. The same variant at a highly conserved base (NM_000799.2:c.-136G>A; Figure 4A) within the 5’ UTR of the erythropoietin gene EPO was identified in two independent families with erythrocytosis and cosegregated with the disease. EPO is a strong candidate gene for erythrocytosis as erythropoietin is essential for red cell production and increased erythropoietin levels lead to increased red cell mass, the hallmark of erythrocytosis. 
The genetic evidence for causality of this EPO variant is strong: it is the only rare exonic variant found in an extended (8 Mb) region that is identical-by-descent in the affected individuals in these two unrelated families (the only such region), suggesting that it had a single mutational origin. See the Supplementary Note for further details. The patient inherited this mutation (NM_001724:c.269G>A:p.R90H) from his mother, who was asymptomatic but had hemoglobin levels on the upper end of the normal range. Both he and his mother had significantly lower levels of BPGM in red blood cells. BPGM deficiency affects the hemoglobin-oxygen dissociation curve, which leads to less available oxygen, stimulating red cell production. Other erythrocytosis patients with BPGM deficiency have been reported, and the same residue is mutated in other patients. A complex interstitial insertion-deletion leading to deletion of 1.4 kb of the X chromosome and insertion of 50 kb from chromosome 2p was discovered in a patient with X-linked hypoparathyroidism. This variant lies 81.5 kb downstream of SOX3, segregates with the disease and is similar to, but distinct N/A N/A N/A 22 N/A from, an event previously reported in an independent kindred12. SOX3 is a strong candidate since it is known to be involved in the development of the parathyroid gland23. See the Supplementary Note for further details. CASR UMOD Familial tubulointerstitial nephropathy Hypertrophic cardiomyopathy (sarcomere genenegative) Inflammatory bowel syndrome/colitis Interstitial nephritis UMOD MYBPC3c a MUC1 Nature Genetics: doi:10.1038/ng.3304 missense This variant (NM_000388:c.2299G>C:p.E767Q) co-segregates with disease in the family (Supplementary Figure 7), and a previously reported mutation at the same location (E767K) causes a similar phenotype (autosomal dominant hypocalcaemia)24. It was missed by prior sequencing in a UK research lab. 
missense This variant (NM_001008389:c.410G>A:p.C137Y) co-segregates with disease in the family (Supplementary Figure 7). The UMOD gene is well known in FJHN25, and this variant is located in a region in which multiple mutations have been observed before (cbEGF3). The variant was missed by prior sequencing in a non-UK research lab. missense (inframe This is a complex indel comprising chr16:20360333, CCTTCGGGGCAG > C insertion/deletion) (NM_001008389:c.279_289del:p.93_97del) and chr16:20360345, A > AGGAGGCGG (NM_001008389:c.278_279insCCGCCTCC:p.V93fs). Pathogenicity is strongly supported by a previously published study describing four kindreds with the same indel and phenotype26 and another paper describing this indel in a family with Autosomal Dominant Medullary Cystic Kidney Disease type 227. Some of the families in the paper by Smith, et al. 26 have the same haplotype as this one. The UMOD gene was not initially suspected due to late presentation and absence of gout. This discovery has been used clinically to improve diagnosis of other family members, and potentially identify suitable kidney donors within the family. nonsense We found a heterozygous nonsense mutation (NM_000256:c.1303C>T:p.Q435X) in one of two affected cousins, who had not had prior clinical genetic testing. Segregation data are not available in his family, but mutations in MYBPC3 and MYH3 cause 75% of HCM cases28, and so this variant seems highly likely to be causal. The cause for HCM in the other cousin and his brother are still unidentified. missense This gene is involved epithelial stress response and in the production of reactive oxygen species. The variant is extremely rare: absent from 1000 Genomes and from 4000 IBD patients. Plasmid data have demonstrated that the protein is defective, and data from biopsies and primary epithelial organoids from case and controls showed defects in protein function. 
Recent publications suggest that defects in gene function increase i29n colitis susceptibility in animal models. Further details will be disclosed in a subsequent publication. a Mutations in MUC1 cause medullary cystic kidney disease type 1. Variants in this gene are not amenable to identification by WGS due to segmental duplications, and this variant was found by another method. N/A N/A N/A N/A N/A N/A Long QT syndrome KCNQ1 frameshift Mental retardation GRIA3 missense Ohtahara syndrome and other earlyonset epilepsies PIGQ splicing KCNT1 missense KCNQ2 missense SCN2A missense POLD1 POLE missense missense MSH6 missense and nonsense BMPR1A frameshift APC splicing TCF12 nonsense Multiple adenoma Saethre-Chotzen syndrome (TWIST1 Nature Genetics: doi:10.1038/ng.3304 KCNQ1 is a well known gene in long QT syndrome30. This variant (NM_000218:c.1195_1196insC:p.A399fs), which leads to a frameshift and premature stop codon, segregates in the family and was missed in original HPLC clinical genetic testing. It is now being used for cascade testing. GRIA3, which encodes an ionotropic glutamate receptor, has previously been implicated in X-linked mental retardation31. Both affected brothers inherited the mutation from their heterozygous mother, who is phenotypically normal. It lies in the highly conserved channel region, and electrophysiology experiments showed that it affects gating of the channel. Functional studies are underway, and further details will be disclosed in a subsequent publication. We found a recessive mutation (NM_004204:c.690-2A>G) that affected splicing of PIGQ and led to defective glycophosphatidyl inositol (GPI) biosynthesis. Mutations in other GPI pathway genes, including PIGA, the binding partner of PIGQ, have been implicated in various syndromes that involve seizures. This patient had uniparental isodisomy for chromosome 9, which led to a missense variant in KCNT1 (NM_020822: c.2896G>A:p.A966T) becoming homozygous. 
This gene had previously been implicated in other types of epilepsy, and electrophysiology experiments demonstrated an effect on channel current. This patient had a de novo mutation in KCNQ2 (NM_004518:c.827C>T:p.T276I which falls in a highly conserved transmembrane segment of the channel that forms part of the pore and is two amino acids away from the T274M mutation recently described in another patient33. This patient had a de novo mutation in SCN2A (NM_001040143:c.5558A>G:p.H1853R). It falls in the cytosolic C-terminal region of the protein; other de novo mutations in the cytosolic domains were recently reported in patients with Ohtahara Syndrome34 The POLD1 (NM_002691:c.G1433A:p.S478N) and POLE (NM_006231:c.1270C>G:p.L424V) variants cosegregated in multiple families, were over-represented in cases versus controls, and functional assays in yeast showed they caused hypermutation This patient had possible compound heterozygote mutations in MSH6 (NM_000179:c.G2315A:p.R772Q and NM_000179:c. 2731C>T:p.R911*). Nonsense mutations in this are known to cause Lynch syndrome, which predisposes to colorectal cancer. N/A We found a rare frameshift mutation in BMPR1A (NM_004329.2:c.142_143insT:p.Thr49Asnfs*22), a known juvenile polyposis 36and multiple adenoma gene37. This mutation (NM_001127511:c.251-2A>G) affects a canonical splice site, consistent with knowledge that early APC exon mutations and splice mutations cause attenuated polyposis38. One patient had a nonsense mutation (NM_207037.1:c.1283T>G; p.L428*) and another a splicing mutation (NM_207037.1:c.1035+3G>C; called intronic by Annovar on RefSeq transcripts) which was N/A N/A 32 35 N/A N/A 39 negative) TCF12 splicing CDC45 synonymous (splicing) and missense Nature Genetics: doi:10.1038/ng.3304 shown to lead to the skipping of exon 12. Mutations in TCF12 were found in four other patients by exome sequencing, and 32/341 patients in whom these genes was resequenced; all had coronal synostosis. 
This patient was compound heterozygous for one missense (NM_001178010.2:c.773A>G;p.D258G) and one synonymous (NM_001178010.2:c.318C>T;p.V106=) variant in CDC45. The synonymous variant was found to cause skipping of exon 4. Two other coronal synostosis patients with compound heterozygous missense mutations were found: one by exome sequencing and the other by resequencing 427 cases. N/A
Supplementary Table 9: Breakdown of results by project category for all 156 projects, with the percentage of the total for that project category indicated in parentheses. See Online Methods for an explanation of results class and of project category. The totals for the broader project categories are shaded grey.
                  Result class
Project category  A          B         C           D           E           Total
1.1               4          0         5           9           9           27
1.2               1          1         0           5           2           9
1.3               0          0         1           0           0           1
1.4               0          0         2           0           8           10
1                 5 (10.6%)  1 (2.1%)  8 (17%)     14 (29.8%)  19 (40.4%)  47
2.1               3          2         3           2           4           14
2.2               0          0         1           2           4           7
2                 3 (14.3%)  2 (9.5%)  4 (19.1%)   4 (19.1%)   8 (38.1%)   21
3                 2 (3.7%)   1 (1.9%)  2 (3.7%)    12 (22.2%)  37 (68.5%)  54
4                 2 (5.9%)   0         3 (8.8%)    4 (11.8%)   25 (73.5%)  34
Total             12 (7.7%)  4 (2.6%)  17 (10.9%)  34 (21.8%)  89 (57%)    156
Supplementary Table 10: Incidental findings deemed not to be significant. The frequencies in the UK10K twins cohort and in the Exome Variant Server European American (EVS_EA) cohort are shown. VUS: variant of unknown significance; NS: nonsynonymous; ARVC: Arrhythmogenic right ventricular cardiomyopathy; LOVD: Leiden Open Variation Database; UMD: Universal Mutation Database; GSDB: Genome Sequence Database.
Columns, within each potential incidental finding condition: Gene | Variant | Effect | UK10K Twins | EVS_EA | Comments on pathogenicity

Arrhythmogenic right ventricular cardiomyopathy
DSG2 | NM_001943:c.473T>G:p.V158G | NS | 0.0071 | 0.0079 | Classified as variant of ‘no known pathogenicity’ in ARVC database (ref. 40) based on 9 independent reports.
DSG2 | NM_001943:c.2759T>G:p.V920G | NS | 0.0057 | 0.0050 | Classified as variant of ‘no known pathogenicity’ in ARVC database (ref. 40) based on 8 independent reports. Lack of segregation reported by ref. 41.
DSG2 | NM_001943:c.1174G>A:p.V392I | NS | 0.0017 | 0.0021 | Classified as variant of ‘no known pathogenicity’ in ARVC database (ref. 40) based on 14 independent reports.
DSP | NM_004415:c.4372C>G:p.R1458G | NS | 0.0020 | 0.0021 | Classified as variant of ‘no known pathogenicity’ in ARVC database (ref. 40) based on 2 independent reports.
DSP | NM_001008844:c.88G>A:p.V30M | NS | 0.0006 | 0.0019 | Classified as variant of ‘no known pathogenicity’ in ARVC database (ref. 40) based on 10 independent reports.
DSP | NM_001008844.1:c.2815G>A:p.G939S | NS | | | Classified as VUS in ARVC database (ref. 40).

Breast cancer
BRCA2 | NM_000059:c.9586A>G:p.K3196E | NS | | | Classified as VUS in UMD for BRCA2 (ref. 42).
BRCA2 | NM_000059:c.8182G>A:p.V2728I | NS | | | Classified as neutral in UMD for BRCA2 (ref. 42).
BRCA2 | NM_000059:c.9976A>T:p.K3326* | nonsense | | | Classified as neutral in UMD for BRCA2 (ref. 42).
BRCA2 | NM_000059:c.1151C>T:p.S384F | NS | | | Classified as neutral in UMD for BRCA2 (ref. 42).
BRCA2 | NM_000059:c.223G>C:p.A75P | NS | | | Classified as neutral in UMD for BRCA2 (ref. 42).

Familial hypercholesterolaemia
LDLR | NM_001195800:c.1371C>T:p.N457= | spliceacceptor | | | Codon AAC to AAT, synonymous, not near splice site.
LDLR | NM_001195800:c.1372G>A:p.E458K | NS | | | Not listed in UK or Dutch LDLR databases. http://www.ucl.ac.uk/ldlr/Current/search.php?select_db=LDLR&srch=all&page=8
PCSK9 | NM_174936:c.520C>T:p.P174S | NS | | | Report indicating this may be protective of (not risk factor for) hypercholesterolaemia (ref. 14).

Hereditary paraganglioma pheochromocytoma syndrome
SDHD | NM_003002:c.158C>T:p.P53L | NS | | | Normal in a yeast assay (ref. 43).

Inherited colorectal cancer
MLH1 | NC_000003.11:g37056045A>G | splicedonor | | | +10 bp into intron and not obviously affecting splicing.
MSH2 | NM_000251.1:c.1886A>G:p.G629R | NS | | | Not listed in LOVD (ref. 44) for MSH2. http://chromium.liacs.nl/LOVD2/colon_cancer/variants.php?select_db=MSH2&action=view_all
MSH6 | NM_000179:c.1526T>C:p.V509A | NS | | | Likely neutral in Universal Mutation Database for MSH6. http://www.umd.be/MSH6/

Loeys-Dietz syndrome (aortic aneurysm)
TGFBR1 | NM_001130916:c.1202A>G:p.N401S | NS | absent | 0.0007 | Single report of Loeys-Dietz syndrome with non-penetrant parent (ref. 45), no corroborative functional data. 6 samples in EVS with variant.

Malignant hyperthermia
RYR1 | NM_000540:c.4055C>G:p.A1352G | NS | 0.0080 | 0.0000 | Not listed as causative in European Malignant Hyperthermia Group database https://emhg.org/genetics/mutations-in-ryr1/. Not segregating according to ref. 46.
RYR1 | NM_000540:c.4178A>G:p.K1393R | NS | 0.0040 | 0.0058 | Not listed as causative in European Malignant Hyperthermia Group database https://emhg.org/genetics/mutations-in-ryr1/.
RYR1 | NM_000540:c.7025A>G:p.N2342S | NS | 0.0011 | 0.0013 | Not listed as causative in European Malignant Hyperthermia Group database https://emhg.org/genetics/mutations-in-ryr1/.

Phaeochromocytoma & medullary thyroid carcinoma (MTC)
RET | NM_020630:c.874G>A:p.V292M | NS | absent | absent | Gain of function mutation in two independent MTC reports, in vitro functional data indicating weakly transforming on transfection (phosphotyrosine activity and proliferation rates) (refs. 13,47). Mutation also described as loss-of-function in two patients with Hirschsprung disease (ref. 48).

Vascular Ehlers-Danlos syndrome (subarachnoid haemorrhage) (EDS)
COL3A1 | NM_000090:c.812G>A:p.R271Q | NS | 0.0017 | 0.0038 | Listed as probably pathogenic in EDS database https://eds.gene.le.ac.uk/variants.php?select_db=COL3A1&action=view_all. Report of potential pathogenic association between COL3A1 and subarachnoid haemorrhage or arteriopathy in 4 patients including R271Q (ref. 49). Present in 37 EVS samples, suggesting absolute risk low or absent.
COL3A1 | NM_000090:c.3938A>G:p.K1313R | NS | 0.0017 | 0.0026 | Listed as probably pathogenic in EDS database. K1313R reported for one patient in ref. 49. Present in 23 EVS samples, suggesting absolute risk low or absent.

Von Hippel-Lindau disease
VHL | NM_000551.2:c.340+5G>C | splicedonor | | | 340+5G>C, reported as benign in ref. 50.

UK10K Twins / EVS_EA values that the source gives without a recoverable row or column assignment, in source order: 0.0031, 0.0100, 0.0014, 0.0003, 0.0006, 0.0001, 0.0001; 0.0045, 0.0084, 0.0015, 0.0005, 0.0002; 0.0003, 0.0001; 0.0022; 0.0048, 0.0013; 0.0009 (given next to the VHL variant).
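
The parenthesised percentages in Supplementary Table 9 are each result class's share of the projects in a broader project category. The following is a minimal sketch of that arithmetic, not the authors' code: the counts are copied from the table, while the names `counts`, `broad_category` and `summarise` are hypothetical and introduced only for illustration. Recomputed values can differ from the printed ones in the last digit because of rounding.

```python
# Counts per result class (A, B, C, D, E) for each project sub-category,
# copied from Supplementary Table 9. Variable and function names are
# illustrative assumptions, not taken from the paper.
RESULT_CLASSES = ["A", "B", "C", "D", "E"]

counts = {
    "1.1": [4, 0, 5, 9, 9],
    "1.2": [1, 1, 0, 5, 2],
    "1.3": [0, 0, 1, 0, 0],
    "1.4": [0, 0, 2, 0, 8],
    "2.1": [3, 2, 3, 2, 4],
    "2.2": [0, 0, 1, 2, 4],
    "3":   [2, 1, 2, 12, 37],
    "4":   [2, 0, 3, 4, 25],
}


def broad_category(sub_category):
    """Map a sub-category such as '1.3' to its broader category '1'."""
    return sub_category.split(".")[0]


def summarise(table):
    """Return totals and per-class percentages for each broader category
    and for all projects combined (key 'Total')."""
    grouped = {}
    for sub_category, row in table.items():
        acc = grouped.setdefault(broad_category(sub_category), [0] * len(row))
        for i, n in enumerate(row):
            acc[i] += n
    # Column-wise sum over every sub-category gives the overall row.
    grouped["Total"] = [sum(col) for col in zip(*table.values())]

    summary = {}
    for key, row in grouped.items():
        total = sum(row)
        summary[key] = {
            "total": total,
            "percent": {
                cls: round(100 * n / total, 1)
                for cls, n in zip(RESULT_CLASSES, row)
            },
        }
    return summary


if __name__ == "__main__":
    for category, info in summarise(counts).items():
        print(category, info["total"], info["percent"])
```

Summing the sub-category columns first and dividing afterwards keeps the percentages tied to the column totals (47, 21, 54, 34 and 156), which is how the table's caption describes them.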