Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Graduate Theses and Dissertations Graduate College 2012 Gene mapping of monogenic disorders and complex diseases via genome wide association studies Xia Zhao Iowa State University Follow this and additional works at: http://lib.dr.iastate.edu/etd Part of the Animal Diseases Commons, and the Molecular Biology Commons Recommended Citation Zhao, Xia, "Gene mapping of monogenic disorders and complex diseases via genome wide association studies" (2012). Graduate Theses and Dissertations. Paper 12878. This Dissertation is brought to you for free and open access by the Graduate College at Digital Repository @ Iowa State University. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Digital Repository @ Iowa State University. For more information, please contact [email protected]. Gene mapping of monogenic disorders and complex diseases via genome wide association studies by Xia Zhao A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Major: Genetics Program of Study Committee: Max Rothschild, Co-major Professor Dorian Garrick, Co-major Professor Susan Lamont N. Matthew Ellinwood Peng Liu Iowa State University Ames, Iowa 2012 Copyright © Xia Zhao, 2012. All rights reserved. ii TABLE OF CONTENTS LIST OF FIGURES iv LIST OF TABLES vi ABSTRACT CHAPTER1. GENERAL INTRODUCTION INTRODUCTION RESEARCH OBJECTIVES DISSERTATION ORGANIZATION LITERATURE REVIEW REFERENCES vii 1 1 2 2 4 27 CHAPTER 2. A NOVEL NONSENSE MUTATION IN THE DMP1 GENE IDENTIFIED BY A GENOME-WIDE ASSOCIATION STUDY IS RESPONSIBLE FOR INHERITED RICKETS IN CORRIEDALE SHEEP ABSTRACT INTRODUCTION RESULTS DISCUSSION MATERIALS and METHODS REFERENCES CHAPTER 3. A MISSENSE MUTATION IN AGTPBP1 WAS IDENTIFIED IN SHEEP WITH A LOWER MOTOR NEURON DISEASE ABSTRACT INTRODUCTION METHODS RESULTS DISCUSSION ACKNOWLEDGEMENTS REFERENCES CHAPTER 4. IN A SHAKE OF A LAMB’S TAIL: USING GENOMICS TO UNRAVEL A CAUSE OF CHONDRODYSPLASIA IN TEXEL SHEEP SUMMARY INTRODUCTION MATERIALS AND METHODS RESULTS DISCUSSION REFERENCES 38 38 39 41 44 47 51 61 61 62 64 68 72 76 76 87 87 88 91 95 99 105 CHAPTER 5. BAYESIAN INFERENCE IDENTIFIES A CANDIDATE REGION ASSOCIATED WITH CANINE CRYPTORCHIDISM THAT INCLUDES THE AMHR2 GENE ABSTRACT INTRODUCTION 120 120 121 iii METHODS RESULTS DISCUSSION ACKNOWLEDGEMENTS REFERENCES CHAPTER 6. GENERAL CONCLUSIONS 125 129 133 138 138 157 FUTURE RESEARCH CONCLUSIONS 161 162 ACKNOWLEDGEMENTS 164 iv LIST OF FIGURES CHAPTER1. GENERAL INTRODUCTION Figure 1. Structures involved in testicular descent 1 20 CHAPTER 2. A NOVEL NONSENSE MUTATION IN THE DMP1 GENE IDENTIFIED BY A GENOME-WIDE ASSOCIATION STUDY IS RESPONSIBLE FOR INHERITED RICKETS IN CORRIEDALE SHEEP Figure 1. Corriedale sheep with rickets Figure 2. Work flow chart for discovering the mutation locus of inherited rickets in Corriedale sheep Figure 3. Mutation analyses Figure S1. Complete predicted DMP1 protein sequence alignments among rickets affected sheep, wild type sheep and cow 38 54 55 56 57 CHAPTER 3. A MISSENSE MUTATION IN AGTPBP1 WAS IDENTIFIED IN SHEEP WITH A LOWER MOTOR NEURON DISEASE Figure 1. The distribution of identical homozygous fragments in affected lambs Figure 2. Work flow chart for discovering the mutation locus of low motor neuron disease in Romney sheep Figure 3 The conserved domain structures of partial protein sequence of AGTPBP1 Figure S1. Conserved amino acids predicted by “CD-search” Figure S2. Characteristics of protein AGTPBP1 Figure S3. Secondary structure modeling of the AGTPBP1 protein indicates critical amino acid residues 61 79 80 81 84 85 86 CHAPTER 4. IN A SHAKE OF A LAMB’S TAIL: USING GENOMICS TO UNRAVEL A CAUSE OF CHONDRODYSPLASIA IN TEXEL SHEEP 87 Figure 1. Chondrodysplasia of Texel sheep 109 Figure 2. Work flow chart for discovering the mutation locus of chondrodysplasia in Texel sheep 110 Figure 3. The topology prediction of transmembrane domains (TMDs) of ovine SLC13A1 protein 111 Figure 4. The direct sequencing results of the causative mutation (g.25513delT) 112 Figure S1. The pedigree information for the 23 sheep genotyped by ovine SNP chip 113 Figure S2. The distribution of homozygous fragments commonly in affected lambs 114 Figure S3. The CNV region display 115 Figure S4. The comparison of the SLC13A1 protein sequences among 7 species 116 CHAPTER 5. BAYESIAN INFERENCE IDENTIFIES A CANDIDATE REGION ASSOCIATED WITH CANINE CRYPTORCHIDISM THAT INCLUDES THE AMHR2 GENE 120 Figure 1. Re-grouping dogs based on the value of cluster membership estimated from STRUCTURE 144 v Figure 2. Whole genome association analyses for subgroup 1 comprising 129 mostly USA sourced (60 cases and 69 controls) Siberian Huskies 145 Figure S1. Whole genome association analyses for subgroup 2 comprising 41 mixed sourced (26 cases and 15 controls) Siberian Huskies 146 Figure S2. Whole genome association analyses for subgroup 3 comprising 35 mostly UK sourced (20 cases and 15 controls) Siberian Huskies 147 Figure S3. Whole genome association analyses for a pooled sample of 204 Siberian Huskies in which 105 were diagnosed with cyptorchidism 148 vi LIST OF TABLES CHAPTER 2. A NOVEL NONSENSE MUTATION IN THE DMP1 GENE IDENTIFIED BY A GENOME-WIDE ASSOCIATION STUDY IS RESPONSIBLE FOR INHERITED RICKETS IN CORRIEDALE SHEEP Table 1. The genotyping results of the causative mutation (R145X) Table S1. Primer sequence, annealing temperature, pcr amplicon information and genetic variants identified by sequencing Table S2. The list of ovine positional candidate genes based on the bovine reference gene sequences 38 54 58 59 CHAPTER 3. A MISSENSE MUTATION IN AGTPBP1 WAS IDENTIFIED IN SHEEP WITH A LOWER MOTOR NEURON DISEASE Table 1. The genotyping results of three exonic SNPs Table S1 primer sequences, annealing temperatures, pcr amplicon information and targeted encoding regions of gene AGTPBP1 Table S2. The list of ovine positional candidate genes based on the bovine reference gene sequences 61 79 82 83 CHAPTER 4. IN A SHAKE OF A LAMB’S TAIL: USING GENOMICS TO UNRAVEL A CAUSE OF CHONDRODYSPLASIA IN TEXEL SHEEP 87 Table 1. The genotyping results of the causative mutation (g.25513del) 108 Table S1. A list of candidate gene symbol, primer sequence, annealing temperature, PCR amplicon information, genetic variants identified by sequencing and accessed template sequence 117 CHAPTER 5. BAYESIAN INFERENCE IDENTIFIES A CANDIDATE REGION ASSOCIATED WITH CANINE CRYPTORCHIDISM THAT INCLUDES THE AMHR2 GENE 120 Table1. The top 1 Mb windows with the high percentage of genetic variance obtained from bayesb analyses in different Siberian Husky subgroups 149 Table 2. The results of bayesb analyses on 17 previously reported genes being associated with cryptorchidism 150 Table S1. Re-grouping dogs based on the value of cluster membership (k=3) 151 Table S2. A list of genes in the putative regions for subgroup1 152 Table S3. A list of genes in the putative regions for subgroup2 and subgroup3 155 Table S4. A list of genes in the putative regions for all Siberian huskies (including subgroup1, 2 and 3) 156 vii ABSTRACT Inherited rickets, lower motor neuron disease and chondrodysplasia were recently reported genetic disorders that had been identified on sheep farms in New Zealand. Animals with rickets showed softening and weakening bone, and were characterized by decreased growth rate, thoracic lordosis and angular limb deformities. Sheep diagnosed with lower motor neuron disease were characterized by poor muscle tone and progressive weakness from about one week of age, leading to severe tetraparesis, recumbency and muscle atrophy. Chondrodysplasia in Texel sheep was characterized by dwarfism and angular deformities of the forelimbs. These three disorders were determined likely simple autosomal recessive by backcross breeding trials. One complex disease called cryptorchidism was found with high prevalence in Siberian Husky dogs. This disease is a congenital disorder characterized by failure of one or both testicles to descend into the scrotum. In order to map gene loci and thus the mutations involved, genome wide association studies were conducted using the Illumina OvineSNP50 BeadChip for the three monogenic disorders and the Illumina CanineHD BeadChip for cryptorchidism. Several mapping strategies were utilized including homozygosity mapping, Bayesian inference, case-control analysis and these were followed by fine mapping with a candidate gene approach. In results, a nonsense mutation 250C > T in exon 6 of the dentin matrix protein 1 gene (DMP1) was revealed as the putative cause of inherited rickets in Corriedale sheep. The missense mutation c.2909G > C on exon 21 of the ATP/GTP-binding protein 1 gene (AGTPBP1) was identified likely being responsible for lower motor neuron disease in Romney sheep. A 1-bp deletion (g.25513delT) on exon 3 in the solute carrier family 13, viii member 1 gene (SLC13A1) was discovered for chondrodysplasia seen in Texel sheep. The 100% concordant rate of genotypes for mutations in certain sheep populations with the recessive pattern of inheritance for the three disorders indicates a causative relationship between these mutations with their related monogenic disorders. Gene knockout mice and studies of similar diseases in humans provide more evidence to support this conclusion. In the cryptorchidism study, the anti-mullerian hormone type II receptor gene (AMHR2) was found located in a 1Mb window on dog chromosome CFA27 which is associated with cryptorchidism using a BayesB approach in a subgroup of Siberian Husky dogs. Previous human studies support AMHR2 is a very promising functional candidate gene for the development of cryptorchidism. In conclusions, we have successfully identified putative causative mutations responsible for the three monogenic disorders in sheep, and also found a promising candidate gene in a putative region associated with cryptorchidism in dogs. Our finding could shed light on further investigations of the role of involved genes and the affected animals could be used as animal models for human diseases with similar mechanisims. 1 CHAPTER1. GENERAL INTRODUCTION INTRODUCTION Genetic disorders in humans and animals include illnesses caused by abnormalities in genes or chromosomes, those abnormalities typically existing at birth. Disorders can be passed down from the parents’ genes, or caused by de novo mutations to the DNA loci. Disorders caused by a single mutated gene are called single or monogenic defects. Over 4,000 human diseases belong to this kind (Wikipedia, http://en.wikipedia.org/wiki/Genetic_disorder). Single gene disorders are relatively easy to be studied especially when a supportive pedigree exists. In this dissertation, we described detection of causative mutations involved in three sporadically occurring heritable disorders in sheep including inherited rickets, lower motor neuron disease and chondrodysplasia. These defects all show an inheritance pattern that reflects a simple gene autosomal recessive basis. Homozygosity mapping was applied to locate the genomic regions and then the positional candidate gene approach was carried out to search for the causative mutation responsible for each of these disorders. Genetic disorders might not be monogenic, but can be caused by complex, multifactorial or so-called polygenic effects. The effects of multiple genes and environmental factors would contribute to the development of complex diseases. Compared to monogenic disorders, the genetic basis underlying complex disorders are not simply deciphered. Different strategies implemented in the genetic analyses were utilized for this kind of diseases. Here we did a whole genome scan for a complex disorder called cryptorchidism in Siberian Husky dogs. Cryptorchidism is an abnormality whereby one or both testes fail to descend into the scrotum. Undescended testes could reduce fertility, and increase the risk of testicular tumors. For 2 certain purebred breeds of dogs, the incidence of cryptorchidism is as high as 14% of the population. Dogs with cryptorchidism will not be qualified by their breed organizations for either breeding purposes or most of the dog competitions or dog shows in the United States. It is essential to remove affected dogs and prevent carriers from contributing to the next generation of a dog breed. In this dissertation, we integrate information from a genome-wide association study with biological functions and comparative genomics to predict possible genomic regions and candidate genes that might be the cause of cryptorchidism. RESEARCH OBJECTIVES In this dissertation, the first goal is to explore the genetic basis for each of the disorders above. The causative mutations can be used as genetic markers for breeding purposes in order to exclude carriers to avoid at risk mating, and to improve animal welfare and decrease the economic losses. Second, due to the body size and relatively low cost of maintenance, sheep with rickets, lower motor neuron disease or chondrodysplasia could be used as animal models for studies of similar disorders in humans. Third, the results from the dog cryptorchidism study may provide valuable information for human research on cryptorchidism. DISSERTATION ORGANIZATION This dissertation was written by starting with a literature review (Chapter 1) followed by four manuscripts published in peer-reviewed journals (or prepared for submission). Chapter 2 is a manuscript that explores the causative mutation responsible for inherited rickets in Corriedale sheep. Dorian Garrick designed this experiment and assisted to uncover the identical by descent regions. The fine mapping in this research and manuscript was 3 conducted and written by Xia Zhao under the direction of Dorian Garrick and Max Rothschild. Keren Dittmer provided suggestions for good candidate genes and contributed a lot for manuscript preparation. This work has been published in PLoS One 2011;6:e21739. Chapter 3 is a manuscript that examines the genetic basis causing a lower motor neuron disease that occurred in an extended family of Romney lambs in New Zealand. Dorian Garrick designed this experiment and helped to uncover the identical by descent regions. Searching for priority candidate genes in the homozygous regions, mutation discovery and mutation analyses were performed by Xia Zhao. Suneel Onteru contributed initial sequencing during the screening of positional candidate genes. The manuscript was written by Xia Zhao under the supervision of Dorian Garrick and Max Rothschild. This work has been published in Heredity 2012;109:156-62. Chapter 4 is a manuscript that unravels the cause of chondrodysplasia in Texel sheep by a whole genome scan. With the help from Dorian Garrick, the identical by descent region was located. Fine mapping of positional candidate genes was carried out by Xia Zhao and Suneel Onteru. Eventually, the putative causal mutation was identified and the manuscript was written by Xia Zhao under the supervision of Dorian Garrick and Max Rothschild. This work has been published in a special issue of Animal Genetics 2012;43 Suppl 1:9-18. Chapter 5 is a manuscript that exams the associated genomic regions and candidate genes for cryptorchidism in Siberian Huskies using a whole genome association study built on a foundation of Bayesian inference. This research was conducted by Xia Zhao under the supervision of Max Rothschild and Dorian Garrick. Max Rothschild helped to design this experiment. Suneel Onteru and Mahdi Saatchi conducted the Bayes B analyses. The manuscript was written by Xia Zhao 4 under the supervision of Max Rothschild and Dorian Garrick. Chapter 6 discribed conclusions summarized from Chapters 2, 3, 4 and 5. LITERATURE REVIEW 1. Current Trends in Characterizing The Underlying Basis of Genetic Disorders The completed whole genome sequence in humans and many animal species, and subsequent generation of widespread markers of genetic variations have allowed investigators to obtain dramatic improvement in their ability to map gene loci associated with monogenic or polygenetic diseases. Monogenic disorders are rare at a population level and result from a single mutated gene. These disorders typically follow Mendelian patterns of transmission (autosomal dominant, autosomal recessive, X-linked dominant, X-linked recessive) or nonMendelian patterns of transmission (Y-linked and mitochondrial or maternal inherited). As a result, monogenic disorders are easily studied when supportive pedigrees are available with many affected members. The database OMIM (Online Mendelian Inheritance In Man, http://www.ncbi.nlm.nih.gov/Omim/mimstats.html) reported in September 21st, 2012, a total of 3571 human phenotypes with a known molecular basis. The database OMIA (Online Mendelian Inheritance in Animals, http://omia.angis.org.au/home/) also provides a list of 473 Mendelian traits with a known important mutation in animals, and 1,218 potential animal models for human diseases. The two databases may exhibit overlap with similar characterized disease phenotypes and this helps to establish the basis for comparative genomics to facilitate further disease studies in humans or animal species. 5 More recently, researchers have shifted their focus from rare monogenic to more common polygenic disorders. Unlike monogenic disorders, polygenic disorders do not usually follow Mendelian inheritance pattern, and appear to be caused by an unknown number of genes. Only a small fraction of affected individuals in some complex diseases such as cardiovascular defects, diabetes and several cancers, have been found to result from a single mutant gene transmitted with Mendelian inheritance (Scheuner et al., 2004). The effects of multiple genes and environmental factors might contribute together to the development of these kinds of diseases. The genetic mechanisms of many complex diseases are still unclear. Studying complex disorders for their genetic basis is more difficult than studying monogenic diseases. Genome wide association studies (GWAS) represent a major tool for unraveling complex traits (Stranger et al., 2011). Linkage mapping can be used in studies of complex multifactorial disease. Additionally, studies of monogenic diseases will contribute greatly to knowledge of polygenic forms of human diseases (Antonarakis and Beckmann, 2006). The following section describes the basic principles of a list of mapping methods and their applications for uncovering mutations responsible for monogenic disorders and complex diseases. 2. Gene Mapping Strategy Understanding the genetic basis of human diseases was extremely challenging for biomedical researchers before the 20th century. A big improvement has been obtained with the development of novel genetic tools and technologies, such as single nucleotide polymorphism (SNP) chips, microarrays and next generation sequencing (NGS). Several gene mapping strategies have been broadly performed in many studies for human and animal 6 diseases including linkage mapping, GWAS, homozygosity mapping, and whole genome sequencing. 2.1 Linkage Mapping Linkage mapping was widely applied as a traditional mapping strategy for qualitative and quantitative trait loci (QTL) mapping over the last few decades. This mapping strategy is based on the assumption that linked genes are on the same chromosome, whereas the chance of two gene loci located at different chromosomes or more than 50 centimorgan (cM) apart being co-inherited in one meiotic event is 0.5. According to this principle, detecting markers being inherited together with a given phenotype having a probability much larger than 0.5 will help discover phenotype related genes or QTL regions. Linkage mapping is usually conducted in certain families that can provide informative meioses for linkage detection. A meiosis is informative for linkage when one can identify whether or not the resulting gamete is a recombinant in a particular region. Different types of markers have been used for linkage map construction. Restriction fragment length polymorphisms (RFLPs) are differences between samples of homologous DNA molecules that come from differing locations of restriction enzyme sites. The differing locations of restriction enzyme sites could be caused by nucleotide substitutions, rearrangements, insertions, or deletions (Chang et al., 1988). Amplified fragment length polymorphisms (AFLPs) were improved markers produced by adding ligation of adaptors to the ends of digested restriction fragments and that were amplified by polymerase chain reaction (PCR) (Tan et al., 2001). These two types of markers (RFLPs and AFLPs) are less employed in practice now, compared to simple sequence repeats (SSR or microsatellites) and 7 SNPs for linkage mapping (Ball et al., 2010). Microsatellites or SSR are repeating sequences of 2-6 base pairs of DNA, with advantages for linkage mapping such as being PCR-based, highly reproducible and polymorphic, generally co-dominant and abundant in animal and plant genomes (Miao et al., 2005). Single nucleotide polymorphisms are single nucleotide variations in a DNA sequence. The high-resolution linkage map using high density SNP markers facilitates fine mapping of QTLs and enables the identification of genomic regions associated with high recombination rates (Groenen et al., 2009). More recently, a novel genetic marker called restriction-site associated DNA (RAD) was developed. This kind of marker is a short fragment of DNA nearby each side of a particular restriction enzyme recognition site (Miller et al., 2007). Integrating next-generation sequencing as a RAD genotyping platform, RAD has been rapidly applied for linkage map construction in many species since a high-density genetic map can be available and simultaneously sufficient sequence information will meet the demand of mutation analysis (Baird et al., 2008; Wang et al., 2012). The implementation of linkage mapping at genome wide scale using commercial SNP chips has been widely applied for many interesting traits. The density of SNP chips, haplotype frequency in the population and the amount of linkage disequilibrium (LD) in the genome will strongly affect the ability to detect genomic regions harboring the real causative mutations (Teo et al., 2010). Alleles that are in non-random association are said to be in LD. Linkage disequilibrium represents the chance of co-inheritance of alleles at different loci. It can be the result of physical linkage of genes or due to selection, mutation, migration, crossing or genetic drift. Owing to the longer LD structure and artificial selection, livestock are preferred study populations for linkage mapping for many economically important traits 8 such as reproduction traits, disease resistance, behavior traits, lactation (milk production) and meat quality (Georges, 2007; Rosendo et al., 2012; Heifetz et al., 2009; Glenske et al., 2011; Marques et al., 2011; Nones et al., 2012). On the other hand, mapping gene loci using linkage data is extremely inefficient in humans due to typically few progeny for calculating reliable map distance, traits being difficult to follow for many generations, and large distance between the known genes compared to livestock populations (Griffiths et al., 2000). 2.2 Homozygosity Mapping Homozygosity mapping was first proposed in 1987 (Lander et al., 1987) as a powerful method of mapping genes underlying recessive traits in humans. The principle of this approach is based on the assumption that the homozygous genotype of a mutation resulting from a recessive disease gene is “identical by descent (IBD)” by the transmission of the affected allele to the affected child from a common ancestor occuring through both parents. The adjacent chromosomal region surrounding the mutated locus is likely homozygous by descent in closely related offspring and this short segment of homozygosity can be detected in the affected group but not in carriers. This mapping strategy has been used successfully in both human and animal recessive diseases (Lander et al., 1987; Drielsma et al., 2012; Charlier et al., 2008). Traditional homozygosity mapping requires suitable consanguineous families with multiple affected offspring. However, the limited availability of suitable families has hampered the discovery of genes for many diseases. Moreover, too closely related family members provide large homozygous segments that contain too many candidate genes. This hinders efficient identification of causative genes. Applying homozygosity mapping to single affected individuals from outbred populations may aid in the discovery of 9 short candidate chromosome segments and accelerate the identification of novel disease genes (Hildebrandt et al., 2009). For many years, microsatellite markers were used in the analysis of homozygosity mapping. With the development of high throughput genotyping platforms like array-based genotyping, this mapping approach is mainly based on data generated by commercially available chips with hundreds of thousands of SNPs. The decreasing costs of NGS technique drives researchers to try homozygosity mapping using sequencing data on some diseases such as exome sequencing data in a study for a cohort of Crohn’s Disease cases. A new algorithm was implemented in this study by detecting IBD from exome sequencing data to find small slices of matching alleles between pairs of individuals and then to extend them into full IBD segments. However, the detection accuracy of this method is low in assessing the causative regions (Zhuang et al., 2012). There are several freely available online computational tools for homozygosity mapping to facilitate the analysis of large datasets for gene discovery, for example HomozygosityMapper (Seelow et al., 2009, http://www.homozygositymapper.org/), IBDFinder (Carr et al., 2009, http://www.insilicase.co.uk/Desktop/Ibdfinder.aspx), and KinSNP (Amir et al., 2010, http://bioinfo.bgu.ac.il/bsu/software/KinSNP/). 2.3 Genome Wide Association Study The recent use of genome wide association studies has become a powerful tool for investigating the genetic basis of common diseases such as Crohn’s disease (Barrett et al., 2008), type 2 diabetes (Zeggini et al., 2008) and obesity (Wang et al., 2011). This approach searches for associations between DNA sequence variants (SNPs) across the whole genome 10 and phenotypes of interest. In an attempt to pinpoint genes that contribute to the etiology of a disease, scientists use GWAS to determine if one variant of an SNP is statistically more common in individuals with a given phenotype than in the ones without. This mapping strategy became available for human diseases since the completion of the Human Genome Project in 2003 and the HapMap in 2005. The map of genetic variants and a list of new technologies assist the accurate analysis of samples for genetic variations contributing to the onset of a disease. However, the associated variants themselves may not be the causative mutations, but may just be in LD with the actual causal genetic variants. A genome wide association analysis is usually followed by fine mapping with a larger number of recombinant individuals to narrow the region of interest, and may include targeted sequencing of genome libraries prepared for the candidate genomic regions. Genome wide association studies can be confounded by population stratification caused by systematic ancestry differences between cases and controls (Price et al., 2010). Population stratification will drive spurious associations when allele frequencies are different between subpopulations (Campbell et al., 2005). Therefore, correcting for stratification is critical in performing GWAS. 2.4 Gene Hunting for Diseases with Whole Genome Sequencing With the development of novel biomedical technologies and dramatically decreased prices for sequencing, whole genome sequencing is more affordable that in the past. Gene hunting for diseases with whole genome sequencing is a modern approach for revealing the underlying causal gene locus not only on Mendelian diseases but also on complex diseases. This approach avoids the need for routine molecular analysis in the lab which is often labor intensive and time consuming, and effectively provides genetic information for doctors to 11 counsel at risk patients. Whole genome sequencing may also quickly allow suitable patients the option of gene therapy. For example, remarkable advances would be obtained for understanding disease-specific genetic alterations associated with cancers by the application of whole genome sequencing, especially for the detection of structural variation (Mardis & Wilson, 2009). Artificial selection in livestock will also be easily conducted when genetic markers are quickly identified by whole genome sequencing (Larkin et al., 2012). Phenotypes of Mendelian diseases might not be genetically homogeneous. The causal type of mutation like point mutations or segmental variations may vary between disease affected individuals with the same syndromes. Additionally, different patterns of inheritance can be involved in the same phenotype. For instance, Charcot-Marie-Tooth disease (CMT) is an inherited peripheral neural disease but can be separated into an autosomal dominant, recessive, or X-linked causes and both SNPs and copy number variants confer susceptibility to CMT (Lupski et al. 2010). Different genetic variants in the same gene locus might cause similar phenotypes. For instance, cystic fibrosis (CF) is a recessive inherited disease. Over 1,000 mutations have been found in the cystic fibrosis transmembrane conductance regulator gene (CFTR) to be associated with CF (Jaffe and Bush, 2001). There is not usually a 100% success in discovering the causative mutation for a new CF patient even by screening all known mutation loci. To detect the genetic basis underlying these genetically heterogeneous or polymorphic disorders, performing whole genome sequencing within individual families who have well characterized pathologic records will be quick and efficient (Lupski et al., 2010). Most complex diseases result from the combined small effects from multiple genes. Mapping novel genes by whole genome sequencing for complex diseases is a powerful tool 12 (Casals et al., 2012). Whole genome sequencing could detect any alterations on the entire genome and provide new insights into the mutational spectra. For example, a whole genome sequencing analysis of a group of hepatocellular carcinoma (HCC) patients revealed the influence of etiological background of somatic mutation patterns, and also identified recurrent mutations in chromatin regulators in the development of HCC (Fujimoto et al., 2012). 3. Gene Mapping of Genetic diseases In order to understand the applications of different mapping strategies for detecting genetic components underlying heritable diseases, gene mapping of three monogenetic disorders including inherited rickets, chondrodysplasia, and lower motor neuron disease, as well as one of the polygenic complex diseases, cryptorchidism are reviewed here. 3.1 Gene Mapping of Inherited Rickets Rickets is a metabolic bone disease due to nutritional deficiencies of vitamin D, phosphorus, magnesium and/or calcium. The lack of those micronutrients will lead to defective mineralization of cartilage at sites of endochondral ossification. The weakening or softening bones will often result in fractures and limb deformities (Fitch, 1943; Dittmer et al., 2009; Ozkan, 2010). Inherited rickets is an inborn error of metabolism, resulting from mutations that disrupt the normal bone metabolism. Several different forms of inherited rickets have been described including X-linked hypophosphatemic rickets, autosomal dominant hypophosphatemic rickets (ADHR), autosomal recessive hypophosphatemic rickets (ARHR), hereditary hypophosphataemic rickets with hypercalciuria (HHRH) and two types of Vitamin D-dependent rickets (VDDR-I and VDDR-II). 13 X-linked hypophosphatemic rickets is commonly known as HYP or XLH. For this review, HYP will be the acronym used. Based on pedigree analysis, HYP was considered a dominant disorder and is the most common form of familial hypophosphatemic rickets. To map the causative locus, a multilocus linkage analysis using a series of X chromosome markers was conducted in eleven HYP relevant human families, and eventually located a region on the X chromosome (Read et al., 1986). The laboratory mouse mutation (Hyp) has been identified on a homologous segment of the mouse X chromosome and the mice exhibited similar symptoms to human HYP (Eicher et al., 1976). Using a positional cloning approach, the HYP consortium successfully isolated a new gene named phosphate regulating gene with homologies to endopeptidase on the X chromosome (PHEX; MIM 300550). The causative mutations responsible for HYP patients were then determined (HYP consortium, 1995). Similarly, a genome-wide multilocus linkage mapping was performed for ADHR using a large family and determined the gene responsible for this disorder was on human chromosome 12p13 in the 18-cM interval (Econs et al., 1997). Combining the results in this large study with another small ADHR pedigree, the disease locus was mapped to a reduced region of a 1.5-Mb region. The following direct sequencing of coding and promoter regions in the positional candidate genes detected missense mutations in fibroblast growth factor 23 gene (FGF23; MIM 605380), a member of the FGF family (ADHR consortium, 2000). Gene mapping of ARHR was carried out using a series of mapping tools. In one study, homozygosity mapping using a SNP array identified a 4.6 Mb candidate regions on human chromosome 4q21. Mutation screening by direct sequencing found loss-of-function mutations in a positional plausible candidate gene called DMP1 (dentin matrix protein 1; 14 MIM 600980). The elevated phosphaturic protein FGF23 suggested that dysfunctional DMP1 mis-regulated the expression of FGF23 to results in ARHR (Lorenz-Depierux et al., 2010). Similar methods were used in another study for ARHR human patients, but mutations in a different causative gene called ecto-nucleotide pyrophosphatase/phosphodiesterase 1 (ENPP1) were found. Biochemical experiments confirmed that the inactive mutations in ENPP1 reduce enzymatic activity in hypophosphatemic patients but may not affect phosphate (Pi) renal reabsorption (Levy-Litan et al., 2010). Another form of autosomal recessive hypophosphatemic rickets is HHRH (Tieder et al., 1985). The distinctive characteristic of HHRH compared to other forms of hypophosphatemic rickets is hypercalciuria due to increased serum 1,25-dihydroxyvitamin D levels and increased intestinal calcium absorption. A genome-wide linkage scan with homozygosity mapping, followed by the candidate gene method were performed to map a single nucleotide deletion in solute carrier family 34 (sodium phosphate), member 3 gene (SLC34A3; MIM 241530). This gene encodes the renal sodium-phosphate cotransporter NaPi-IIc (Bergwitz et al., 2006). Two other recessively inherited forms of rickets in humans were due to disruptions of either the synthesis or metabolism of vitamin D. Vitamin D-dependent rickets, type I (VDDR-I) is caused by mutations in the CYP27B1 gene, which encodes vitamin D 1-alpha-hydroxylase (Fu et al., 1997). The mutation in this gene was initially mapped by linkage mapping with big pedigrees followed by the candidate gene method which involved sequencing for causative genetic variants (Labuda et al., 1990; Fu et al., 1997). Vitamin D-dependent rickets, type II (VDDR-II), is called hereditary vitamin D - resistant rickets (HVDRR) (MIM 277440). Shortly after the successful cloning of a full-length cDNA sequence of the vitamin 15 D receptor gene (VDR) in 1988, researchers discovered many mutations in VDR that were found to be associated with HVDRR (Malloy et al., 1999). 3.2 Gene Mapping of Lower Motor Neuron Disease Motor neuron diseases (MND) are a group of progressive neurological disorders involved with selective degeneration of upper motor neurons (UMN) and/or lower motor neurons (LMN). Messages from neurons in the brain (UMN) are initially transmitted to neurons in the brain stem and spinal cord (LMN) and then are transmitted from LMN to particular muscles to produce movements such as walking, chewing or blinking. When there are disruptions in the signals between LMN and the muscle, it is commonly called LMN disease. The incidence of MND varies between different ethnic human populations from 0.6 to 2.4 individuals per 100,000 person-years (Lee, 2010). Common characteristics of MND are muscle weakness and/or spastic paralysis. Symptoms for UMN degeneration include spastic tone, hyperreflexia and upgoing plantar reflex. Signs of LMN degeneration are represented as muscle atrophy, fasciculations and weakness when associated LMN degenerate (Lee, 2010). Different types of MND have been classified based on their time of onset, type of motor neuron involvement and detailed clinical features. These MND include amyotrophic lateral sclerosis (ALS), hereditary spastic paraplegia (HSP), primary lateral sclerosis (PLS), spinal bulbar muscular atrophy (SBMA), spinal muscular atrophy (SMA) and lethal congenital contracture syndrome (LCCS). Among these diseases, both UMN and LMN are involved in the development of ALS and LCCS (Dion et al., 2009). Previous evidence has shown genetic variants within more than 30 genes might be responsible for MND (Dion et al., 2009). Amyotrophic lateral sclerosis is a fatal 16 degenerative disorder of motor neurons. Linkage mapping using single-strand conformational polymorphisms (SSCP) was applied to discover markers near the Cu/Znbinding superoxide dismutase (SOD1) gene being in significant linkage with ALS. Mutations in this gene were identified by direct sequencing to explain about 20% of patients with familial ALS (Rosen et al., 1993). Hereditary spastic paraplegia with an autosomal dominant inheritance pattern is characterized by progressive spasticity of the lower limbs. Pathogenic mutations in spastin gene (SPAST) were revealed by a positional cloning strategy involving construction of BAC (bacterial artificial chromosome) contigs for sequencing. The spastin gene was shown to be the most common causal gene locus of autosomal dominant HSP patients (Hazan et al., 1999). Spinal muscular atrophy is a common fatal autosomal recessive disorder, leading to progressive paralysis with muscular atrophy. Genomic regions linked with SMA were first determined by means of linkage analysis. Further analysis using yeast artificial chromosome (YAC) contigs were applied to narrow down the linked regions, and finally identified loss of function mutations or deletions on SMN1 (survival of motor neuron 1, telomeric) responsible for SMA (Lefebvre et al., 1995). Lethal congenital contractual syndrome type 2 is an autosomal recessive neurogenic form of arthrogryposis, characterized by multiple joint contractures and markedly distended urinary bladder. To map gene loci for this MND, linkage analyses and homozygosity mapping using microsatellite markers were used. One base pair substitution on exon 11 of ERBB3 (receptor tyrosineprotein kinase erbB-3) was identified as being associated with LCCS2 (Narkis et al, 2007). 3.3 Gene Mapping of Chondrodysplasia 17 Chondrodysplasia is a genetically diverse group of diseases characterized by abnormal development of cartilage that are present in humans and many animals. Defective genes disrupt normal chondrogenesis and cartilage development leading to abnormal shape and structure of the skeleton. The inheritance patterns of chondrodysplasia are diverse. Mutations in several genes related to the synthesis of collagen or proteoglycans, or the signal-transduction mechanisms have been reported to be involved in inherited chondrodysplasia. Collagen II is the major collagenous component of cartilage. Traditional genetic tools such as genomic cloning and sequencing indicated that several point mutations in human collagen, type II, alpha 1 gene (COL2A1) interrupt the normal formation of type II collagen and causes an autosomal dominant achondrogenesis type II (Vissing et al., 1989; Chan et al., 1995). Partially disrupted Col2a1 has been identified with a phenotype of chondrodysplasia in transgenic mice (Vandenberg et al., 1991). Successive genetic linkage and positional candidate cloning studies have shown that disruption of genes encoding other cartilage-specific collagen could cause different types of chondrodysplasia. Mutations in families with an autosomal dominantly inherited chondrodysplasia, multiple epiphyseal dysplasia (MED), have been reported in genes coding the α1, α2 and α3 chains of collagen IX. A mutation in intron 8 of COL9A1 (collagen, type IX, alpha 1) causing a splicing defect was found in one family with MED (Czarny-Ratajczak et al. 2001). Several mutations were described to cause the skipping of COL9A2 (collagen, type IX, alpha 2) exon 3 in families with MED (Muragaki et al., 1996; Spayde et al., 2000; Holden et al. 1999). Separate families with MED have been described to show that mutations in the acceptor splice site of intron 2 in COL9A3 (collagen, type IX, alpha 3) caused the skipping of exon 3, and resulted phenotypes of MED (Paassilta et al. 1999b, Bönnemann et al. 2000). Type X collagen is a 18 short chain collagen expressed by hypertrophic chondrocytes during endochondral ossification. A nonsense mutation of the collagen, type X, alpha 1 (COL10A1) gene has been found to be responsible for Schmid type metaphyseal chondrodysplasia (MCDS) in a human patient (Woelfle et al., 2011). Sulfated glycosaminoglycan is very important for maintaining the elasticity and waterbinding capacity in the cartilage matrix (Jerosch, 2011). Absorption and re-absorption of sulfate ions is very critical for the normal cartilage and bone development. Different forms of chondrodysplasia in humans including multiple epiphyseal dysplasia, atelosteogenesis type II and diastrophic dysplasia are due to mutations in the solute carrier family 26, member 2 gene (SLC26A2), which encodes a sulfate transporter protein (DTDST). These results were supported by genetic analyses with techniques of positional cloning and direct sequencing. The impaired sulfate uptake prevents the normal development of cartilage (Cho et al., 2010; Hästbacka et al., 1996; Hästbacka et al., 1999). Mutations in genes encoding a number of functionally osteogenic growth factors/signaling molecules have been also reported to be associated with different forms of chondrodysplasia. A 22-bp tandem duplication resulting in a frameshift of CDMP1 (designated cartilagederived morphogenetic protein 1) was found to be associated with chondrodysplasia in human patients (Thomas et al., 1996). A multi-breed association study in domestic dogs demonstrated that an evolutionarily recently acquired retrogene encoding fibroblast growth factor 4 (fgf4), an extra copy of the FGF4 gene, was strongly associated with chondrodysplasia (Parker et al., 2009). The most common inherited chondrodysplasia of sheep is “spider lamb syndrome” (SLS) in the Suffolk breed. By means of direct sequencing, 19 it was found that the mutation responsible for the disease was a non-synonymous transversion on the highly conserved tyrosine kinase II domain of the fibroblast growth factor receptor 3 gene (FGFR3) (Cockett et al., 1999; Beever et al., 2006). 3.4 Gene Mapping of Cryptorchidism Cryptorchidism, also known as undescended testicle, is the absence from the scrotum of one or both testes. It is the most common genital problem encountered in pediatrics. The incidence of this defect in human studies varies from 0.2% to 13% (Thonneau et al., 2003). Cryptorchidism has also been seen in domestic animals such as dogs (Cox et al., 1978), cats (Yates et al., 2003), horses (Cox et al., 1979), pigs (Rothschild et al., 1988), goats (Singh and Ezeasor, 1989), sheep (Smith et al., 2007) and cattle (St Jean et al., 1992), and wild animals such as maned wolves (Burton and Ramsay, 1986), black bears (Dunbar et al., 1996) and reindeer (Leader-Williams, 1979). An undescended testicle is considered a risk factor for testicular cancer and male infertility (Berkowitz et al., 1993; Carizza et al., 1990). In most mammals, the testis must descend from the abdomen to the scrotum to provide an environment with a lower temperature than deep body temperature which is too high for normal spermatogenesis. The elephant is an exception since its testes are completely inside the body (Short, 2005). Normal testicular descent involves a two-phase process including transabdominal and inguinoscrotal descents (Hutson, 1985; Figure 1). Key anatomical events to release the testes from their urogenital ridge location and to guide the gonad into the scrotum are the degeneration of the cranio-suspensory ligament, the swelling, outgrowth and regression of the gubernaculum and the elevation of the intro-abdominal pressure (Hutson and Hasthorpe, 2005). Insulin-like hormone 3 plays an important role for the normal 20 descent of the testes during the initial phase of transabdominal testicular descent, and androgen works on both phases. The gubernaculum is the principle structure in testicular androgen ogen play a role of controlling the changes descent and both insulin-like hormone 3 and andr of the gubernaculum (Nation et al., 2009). Other factors including hormones like mullerian inhibiting substance, estrogen and gonadotropin-releasing hormone, and proteins encoded by members of the HOX gene famil y also contribute to testicular descent. Abnormalities during the key processes of testicular descent have been recognized to relate to cryptorchidism. Figure 1. Structures involved in testicular descent. Two critical phases of testis descent transabdominal (left), and inguinoscrotal (right) travel are essential to move the testes into the scrotum. The intermediate stage is shown in the middle. The process of testicle descent depends on the degeneration of the cranio-suspensory ligament (CSL) and the swelling, outgrowth and regression of the gubernaculum. This figure was modified based on the reference Klonisch et al., 2004. Cryptorchidism is a genetic defect caused by the interaction of genetic, epigenetic and environmental factors (Hutson and Hasthorpe, 2005). Environmental factors with estrogenic or anti-androgenic effects play a role on the development of cryptorchidism (Fisher, 2004, 21 Gill et al., 1979). Various genes implicated in the regulation of testicular descent have been investigated for cryptorchidism. Previous reports have identified approximately 20 genes that might be associated with cryptorchidism (Klonisch et al., 2004; Amann and Veeramachaneni, 2007). Androgens might play the role on both transabdominal and inguinoscrotal processes via regulating the release of a calcitonin gene-related peptide by the genitofemoral nerve. Several androgen related genes have been investigated while studying cryptorchidism. Androgen receptor (AR) was found to be strongly expressed in the gubernaculums (George and Peterson, 1988) and a case-control study showed that the GGN trinucleotide repeat length variant in AR was associated with a risk of cryptorchidism and penile hypospadias in an Iranian population (Radpour et al., 2007). Calcitonin gene-related peptides are encoded by two genes including CGRP-alpha (CALCA) and CGRP-beta (CALCB). Exogenous calcitonin gene-related peptide injection into the scrotum can change the direction of gubernacular migration in the mutant trans-scrotal (TS) rat (Griffiths et al., 1993; Clarnette and Hutson 1999). However, no pathogenic sequence changes were identified in CALCA and CALCB for 90 selected human patients with unilateral or bilateral cryptorchidism by mutation screening performed through SSCP analyses (Zuccarello et al., 2004). According to the previous reports, a minority of affected cases have mutations either in insulin-like hormone 3 gene (INSL3) or affecting its receptor gene, called relaxin/insulinlike family peptide receptor 2 (RXFP2), the latter also known as leucine-rich repeatcontaining G protein-coupled receptor 8 gene (LGRF8). Insulin-like factor 3 is expressed in the pre-and postnatal Leydig cells of the testis and postnatal theca cells of the ovary, and is a member of the insulin-like hormone superfamily (Zimmermann et al., 1997). Targeted disruption of this gene prevents gubernaculum growth and causes bilateral cryptorchidism in 22 mice (Nef and Parada 1999; Zimmermann et al., 1999). The receptor gene LGR8 has been found highly expressed in the gubernaculum and is responsible for mediating hormonal signals that control normal testicular descent in humans (Gorlov et al., 2002). Knockout mice models for this gene exhibit bilateral intra-abdominal cryptorchidism (Bogatcheva et al., 2007). Disruptions of other hormone related genes also show associations with aberrant testicular descent. Müllerian inhibiting substance (MIS), or anti-Mullerian hormone (AMH), encoding a 140-kDa glycoprotein produced by Sertoli cells is critical for the mullerian duct during male sexual differentiation (Bashir and Wells, 1995) and is associated with the swelling reaction of gubernaculum occurring during the first phase of testicular descent (Kubota et al., 2002). In mice, the testicular feminization (Tfm) mutation with the deletion of the Mis gene was found with a phenotype of undescended testis suggesting a causal relationship between Mis and cryptorchidism (Behringer et al., 1994). The mutated gonadotropin-releasing hormone receptor gene (Gnrhr) has failed to initiate testicular descent in male mice as well (Pask et al., 2005). Estrogen receptor 1 (ESR1) is a ligand-activated transcription factor that is important in the process of testicular descent in children (Przewratil et al., 2004). Esr1 knockout male mice with retracted gonads suggest estrogens may play a role in the scrotal positioning of the testicle (Donaldson et al., 1996). Moreover, investigations were carried out on HOX genes for studying cryptorchidism. Homeobox A10 (Hoxa10) is highly expressed in the gubernaculum and this gene has been found to be essential for the gubernacular migration during testicular descent in mice. Disruption of this gene implicates its role in the development of cryptorchidism ((Nightingale et al., 2008; Rijli et al., 1995). A single SNP variant in HOXD13 (homeobox D13), has been identified to be 23 significantly associated with cryptorchidism in humans via a case-control analysis (Wang et al., 2007). 4. Animal Models for Human Diseases Using appropriate model organisms for human disease studies will help understand the genetic components behind disorders. Domestic animals have proven to be good research models due to their large body size and many anatomical and physiological similarities to humans. Numerous genetic defects have been identified in domestic animals, and inherited disorders in these animal models can provide valuable information to assist in the identification of human disease genes and the study of gene therapy. 4.1 The Sheep model Sheep (Ovis aries) were initially domesticated approximately 11,000 years ago for agricultural purposes including the production of fleece, skins, meat and milk (Colledge et al., 2005; Chessa et al., 2009). The top three sheep meat producing countries, in order of quantity, are China, Australia and New Zealand based on data from the Food and Agricultural Organization of the United Nations (http://faostat.fao.org/site/569/DesktopDefault.aspx?PageID=569#ancor). in Large 2010 animal models like sheep have received extensive clinical attention for human genetic diseases due to their advantages over rodent models, such as greater physiological similarity to human patients, a longer life span and larger body size. Sheep are considered as excellent models for studying respiratory physiology (Meeusen et al., 2009), renal physiology (Ramchandra et al., 2012), reproductive physiology (Vinet et al., 2012), cardiovascular physiology (Wylie et al., 2012) and the endocrine system (Recabarren et al., 2005). The body size of sheep allows 24 for frequent blood sampling and easy insertion of monitoring devices into organs. These advantages allow performing prenatal gene therapy to be performed in sheep prior to disease onset to achieve long-term expression of the exogenous gene without modifying the germ line of the recipient (Porada et al., 1998). The construction of the reference sheep genome sequence was started in 2009 by the International Sheep Genomics Consortium. Genome sequence will enable accelerated genetic gain and improved understanding of the etiology of diseases deleterious for the sheep breeding programs (The International Sheep Genomics Consortium et al., 2010). Most modern sheep breeds have maintained high levels of genetic diversity due to strong gene flow between individuals of different lines and breeds (Kijas et al., 2012). The initial analysis of LD structure in sheep was conducted across over 3,000 sheep from 74 diverse breeds. The results suggest substantially lower LD in sheep compared with cattle but higher when compared to humans. Moreover, substantial variation in LD structure exists between ovine breeds (Raadsma et al., 2010). The new reference sheep genome sequence would help biomedical researchers to overcome previous challenges of limited gene sequence availability. 4.2 The Dog Model The domestic dog (Canis lupus familiaris) may be the first animal to be domesticated approximately 15,000 years ago (Savolainen et al., 2002). The purposes of domestication of dogs at the beginning were for working or hunting. More recently, the dog has been used extensively as a model organism for complex human disease research because of the similarities between canine and human diseases including disease manifestation and genetic 25 predisposition of diseases such as cancers (Khanna et al., 2006), diabetes (Srinivasan and Ramarao, 2007), heart disease (Kohler et al., 1984), kidney disease (Aresu et al., 2008) (Karlsson and Lindblad-Toh, 2008). Additionally, many anatomic and physiological aspects, particularly in the cardiovascular, urogenital, nervous and musculoskeletal systems between canine and humans are similar (Khanna et al., 2006). The similar spectrum of diseases, specific dog population structure, the close social relationship and similar living environment between humans and dogs, drive the domestic dog as an ideal model for mapping human diseases. With the first version of Dog Genome Sequencing draft being completed in 2005, the dog genome was found to exhibit relatively small size (2.4 Gb compared to 3.2 Gb in humans), with less divergence from the human when compared to the mouse genome (Lindblad-Toh et al., 2005; International Human Genome Sequencing Consortium, 2004). This similarity allows the dog to be a suitable animal model for human diseases with a genetic component. Additionally, the dog genome has a favorable LD structure for investigating complex disorders. Two population bottlenecks have occurred during domestication of the dog. This created an advantageous linkage structure which is ideal for gene mapping. First, long-range LD, from hundreds of kilobases to several megabases, exists within a breed, unlike the tens of kilobases observed in the human genome. Second, short-range LD exists in the whole dog population due to its long domestication period (Lindblad-Toh, et al., 2005). By knowing the patterns of limited LD across whole dog populations and extensive LD within breeds, a twostage mapping including identification of the disease locus in a breed and then finding the narrowed genomic region using multiple breeds is a favored strategy (Karlsson and LindbladToh, 2008). An example of this occurred with the mapping of white spotting in dogs. 26 Initially, 19 boxer dogs were used to discover an 800kb region. A second breed, bull terriers, was used to confirm and narrow the regions to a 100kb region that contained a gene called microphthalmia-associated transcription factor (MITF). Candidate regulatory mutations were identified to be associated with the spotting trait (Karlsson et al., 2007). The most powerful and commonly used mapping approach in dogs for disease research is GWAS. Less than 50 individuals are sufficient to map a simple Mendelian recessive trait using GWAS, but many more dogs are needed to map complex diseases (Karlsson and Lindblad-Toh, 2008). There are many successful cases using GWAS to map Mendelian traits in dogs. Some examples include a missense mutation on SOD1 (superoxide dismutase 1) associated with canine degenerative myelopathy (DM) (Awano et al., 2009), a 23 kb deletion in the canine ADAM9 (a disintegrin and metalloprotease domain, family member 9) associated with a canine cone-rod dystrophy (crd3) (Goldstein et al., 2010), and a missense mutation in a conserved functional domain of β-glucuronidase (GUSB) confirmed to cause mucopolysaccharidosis VII (MPS VII) in a population of Brazilian terriers (Hytönen et al., 2012). Other mapping strategies including QTL mapping, across-breed mapping and selective sweep mapping have also been applied in dogs (Karlsson and Lindblad-Toh, 2008; Sutter et al., 2007). The selective sweep mapping is to identify the selective sweeps that are associated with a strongly selected phenotype. A selective sweep is the region where there is a reduction or elimination of genetic variations in DNA neighboring a mutation, due to recent and strong positive natural or artificial selection (Palaisa et al., 2004). Artificial selection for breeding purposes in dogs over hundreds of years causes big morphological differences 27 among modern breeds and uniform physical features within pure-bred dogs (Wayne and Ostrander, 2007). This special characteristic of dog populations provides suitable resources for QTL mapping and selective sweep mapping to discovery of genes underlying breedspecific characteristics. For example, a common haplotype in insulin-like growth factor 1 gene (IGF1) associated with body size in dogs was successfully identified by QTL mapping and selective sweep mapping using small and giant dog breeds (Sutter et al., 2007). To avoid the confounding of long haplotypes and extensive homozygosity within the breed, acrossbreed selection sweep mapping is helpful to find interesting loci. A study used this mapping strategy to analyze genetic variation in 46 dog breeds identified 44 chromosomal regions that are extremely variable between breeds and are likely to control many of the morphological and behavioral traits that vary between them (Vaysse et al., 2011). By contrast, the limitation of offspring from crosses in large numbers restricts the application of QTL mapping in dogs comparing to livestock animals. Genetic diseases consititute huge public healthy problems for humans, and also cause big economic losses in animal industry. Finding the causative mutations will help to improve this situation. Applying suitable gene mapping tools for scientists could make gene hunts faster, cheaper and practical for genetic diseases. Animal models are preferable for experimental disese research due to their advantages reviewed above. The gene discoveries in animal models will facilitate the study of human genetic diseases. REFERENCES ADHR Consortium. Autosomal dominant hypophosphataemic rickets is associated with mutations in FGF23. Nat Genet. 2000. 26:345-8. Amann RP, Veeramachaneni DNR. Cryptorchidism in common eutherian mammals. Reproduction. 2007. 133:541-1. 28 Amir el-AD, Bartal O, Morad E, Nagar T, Sheynin J, Parvari R, et al. KinSNP software for homozygosity mapping of disease genes using SNP microarrays. Hum Genomics. 2010. 4:394-40. Antonarakis SE, Beckmann, JS. Mendelian disorders deserve more attention. Nat Rev Genet. 2006. 7: 277- 82. Aresu L, Pregel P, Bollo E, Palmerini D, Sereno A, Valenza F. Immunofluorescence staining for the detection of immunoglobulins and complement (C3) in dogs with renal disease. Vet Rec. 2008. 163:679-82. Awano T, Johnson GS, Wade CM, Katz ML, Johnson GC, Taylor JF, et al. Genome-wide association analysis reveals a SOD1 mutation in canine degenerative myelopathy that resembles amyotrophic lateral sclerosis. Proc Natl Acad Sci U S A. 2009. 106: 2794-9. Baird NA, Etter PD, Atwood TS, Currey MC, Shiver AL, Lewis ZA, et al. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS One. 2008. 3:e3376. Ball AD, Stapley J, Dawson DA, Birkhead TR, Burke T, Slate J. A comparison of SNPs and microsatellites as linkage mapping markers: lessons from the zebra finch (Taeniopygia guttata). BMC Genomics. 2010. 11:218. Barrett JC, Hansoul S, Nicolae DL, Cho JH, Duerr RH, Rioux JD, et al. Genome-wide association defines more than 30 distinct susceptibility loci for Crohn's disease. Nat Genet. 2008. 40:955-62. Bashir MS, Wells M. Müllerian inhibiting substance. J Pathol. 1995. 176:109-10. Beever JE, Smit MA, Meyers SN, Hadfield TS, Bottema C, Albretsen J, et al. A single-base change in the tyrosine kinase II domain of ovine FGFR3 causes hereditary chondrodysplasia in sheep. Anim Genet. 2006. 37:66-71. Behringer RR, Finegold MJ, Cate RL. Müllerian-inhibiting substance function during mammalian sexual development. Cell. 1994. 79:415-25. Bergwitz C, Roslin NM, Tieder M, Loredo-Osti JC, Bastepe M, Abu-Zahra H, et al. SLC34A3 mutations in patients with hereditary hypophosphatemic rickets with hypercalciuria predict a key role for the sodium-phosphate cotransporter NaPi-IIc in maintaining phosphate homeostasis. Am J Hum Genet. 2006. 78: 179-92. Berkowitz GS, Lapinski RH, Dolgin SE, Gazella JG, Bodian CA, Holzman IR. Prevalence and natural history of cryptorchidism. Pediatrics. 1993. 92:44-9. Bogatcheva NV, Truong A, Feng S, Engel W, Adham IM, Agoulnik AI. GREAT/LGR8 is the only receptor for insulin-like 3 peptide. Mol Endocrinol. 2003. 17:2639-46. Bönnemann CG, Cox GF, Shapiro F, Wu J-J, Feener CA, Thompson TG, et al. A mutation in the α3 chain of type IX collagen causes autosomal dominant multiple epiphyseal dysplasia with mild myopathy. Proc Natl Acad Sci USA. 2000. 97:1212-17. Burton M, Ramsay E. Cryptorchidism in Maned Wolves. J Zoo An Med. 1986. 17:133-5. 29 Campbell CD, Ogburn EL, Lunetta KL, Lyon HN, Freedman ML, Groop LC, et al. Demonstrating stratification in a European American population. Nat Genet. 2005. 37:868-72. Carizza C, Antiba A, Palazzi J, Pistono C, Morana F, Alarcón M. Testicular maldescent and infertility. Andrologia. 1990. 22:285-8. Carr IM, Sheridan E, Hayward BE, Markham AF, Bonthron DT. IBDfinder and SNPsetter: tools for pedigree-independent identification of autozygous regions in individuals with recessive inherited disease. Hum Mutat. 2009. 30:960-7. Casals F, Idaghdour Y, Hussin J, Awadalla P. Next-generation sequencing approaches for genetic mapping of complex diseases. J Neuroimmunol. 2012. 248:10-22. Review. Chan D, Cole WG, Chow CW, Mundlos S, Bateman JF. A COL2A1 mutation in achondrogenesis type II results in the replacement of type II collagen by type I and III collagens in cartilage. J Biol Chem. 1995. 270:1747-53. Chang C, Bowman JL, DeJohn AW, Lander ES, Meyerowitz EM. Restriction fragment length polymorphism linkage map for Arabidopsis thaliana. Proc Natl Acad Sci U S A. 1988. 85:6856-60. Charlier C, Coppieters W, Rollin F, Desmecht D, Agerholm JS, Cambisano N, et al. Highly effective SNP-based association mapping and management of recessive defects in livestock. Nat Genet. 2008. 40:449-54. Chessa B, Pereira F, Arnaud F, Amorim A, Goyache F, Mainland I, et al. Revealing the History of Sheep Domestication Using Retrovirus Integrations. Science. 2009. 324: 532-6. Cho TJ, Kim OH, Lee HR, Shin SJ, Yoo WJ, Park WY, et al. Autosomal recessive multiple epiphyseal dysplasia in a Korean girl caused by novel compound heterozygous mutations in the DTDST (SLC26A2) gene. J Korean Med Sci. 2010. 25:1105-8. Clarnette TD, Hutson JM. 1999. Exogenous calcitonin gene-related peptide can change the direction of gubernacular migration in the mutant trans-scrotal rat. J Pediatr Surg. 34:120812. Cockett NE, Shay TL, Beever JE, Nielsen D, Albretsen J, Georges M, et al. Localization of the locus causing Spider Lamb Syndrome to the distal end of ovine Chromosome 6. Mamm Genome. 1999. 10:35-8. Colledge S, Conolly J, Shennan S. The evolution of Neolithic farming from SW Asian origins to NW European limits. Eur J Arch. 2005. 8:137-56. Cox JE, Edwards GB, Neal PA. An analysis of 500 cases of equine cryptorchidism. Equine Vet J. 1979. 11:113-6. Cox VS,Wallace LJ, Jessen CR. An anatomic and genetic study of canine cryptorchidism. Teratology. 1978.18:233-40. Dion PA, Daoud H, Rouleau GA. Genetics of motor neuron disorders: new insights into pathogenic mechanisms. Nat Rev Genet. 2009. 10:769-82. 30 Dittmer KE, Thompson KG, Blair HT. Pathology of inherited rickets in Corriedale sheep. J Comp Pathol. 2009. 141: 147-55. Donaldson KM, Tong SY, Washburn T, Lubahn DB, Eddy EM, Hutson JM, et al. Morphometric study of the gubernaculum in male estrogen receptor mutant mice. J Androl. 1996. 17:91-5. Drielsma A, Jalas C, Simonis N, Désir J, Simanovsky N, Pirson I, et al. Two novel CCDC88C mutations confirm the role of DAPLE in autosomal recessive congenital hydrocephalus. J Med Genet. 2012 [Epub ahead of print] Dunbar MR, Cunningham MW, Wooding JB, Roth RP. Cryptorchidism and delayed testicular descent in Florida black bears. J Wildl Dis. 1996. 32:661-4. Econs MJ, McEnery PT, Lennon F, Speer MC. Autosomal dominant hypophosphatemic rickets is linked to chromosome 12p13. J Clin Invest. 1997. 100:2653-7. Eicher EM, Southard JL, Scriver CR, Glorieux FH. Hypophosphatemia: mouse model for human familial hypophosphatemic (vitamin D-resistant) rickets. Proc Natl Acad Sci U S A. 1976. 73:4667-71. Fisher JS. Environmental anti-androgens and male reproductive health: focus on phthalates and testicular dysgenesis syndrome. Reproduction. 2004. 127:305-15. Review. Fitch LWN. Rickets in hoggets, with a note on the aetiology and definition of the disease. Australian Vet J. 1943.19:2. Fu GK, Lin D, Zhang MY, Bikle DD, Shackleton CH, Miller WL, et al. Cloning of human 25-hydroxyvitamin D-1 α-hydroxylase and mutations causing vitamin D-dependent rickets type 1. Mol Endocrinol.1997. 11:1961-70. Fujimoto A, Totoki Y, Abe T, Boroevich KA, Hosoda F, Nguyen HH, et al. Whole-genome sequencing of liver cancers identifies etiological influences on mutation patterns and recurrent mutations in chromatin regulators. Nat Genet. 2012. 44:760-4. George FW, Peterson KG. Partial characterization of the androgen receptor of the newborn rat gubernaculum. Biol Reprod. 1988. 39:536-9. Georges M. Mapping, Fine Mapping, and Molecular Dissection of Quantitative Trait Loci in Domestic Animals. Annu Rev Genomics Hum Genet. 2007. 8: 131-62. Gill WB, Schumacher GF, Bibbo M, Straus FH 2nd, Schoenberg HW. Association of diethylstilbestrol exposure in utero with cryptorchidism, testicular hypoplasia and semen abnormalities. J Urol. 1979. 122:36-9. Glenske K, Prinzenberg EM, Brandt H, Gauly M, Erhardt G. A chromosome-wide QTL study on BTA29 affecting temperament traits in German Angus beef cattle and mapping of DRD4. Animal. 2011. 5:195-7. Goldstein O, Mezey JG, Boyko AR, Gao C, Wang W, Bustamante CD, et al. An ADAM9 mutation in canine cone-rod dystrophy 3 establishes homology with human cone-rod dystrophy 9. Mol Vis. 2010. 16:1549-69. 31 Gorlov IP, Kamat A, Bogatcheva NV, Jones E, Lamb DJ, Truong A, et al. Mutations of the GREAT gene cause cryptorchidism. Hum Mol Genet. 2002. 11:2309-18. Griffiths AJF, Miller JH, Suzuki DT, Lewontin RC, Gelbart WM. An ntroduction to genetic analysis. 2000. 7th edition. New York: W. H. Freeman. Griffiths AL, Middlesworth W, Goh DW, Hutson JM. 1993. Exogenous calcitonin generelated peptide causes gubernacular development in neonatal (Tfm) mice with complete androgen resistance. J Pediatr Surg. 28:1028-30. Groenen MA, Wahlberg P, Foglio M, Cheng HH, Megens HJ, Crooijmans RP, et al. A highdensity SNP-based linkage map of the chicken genome reveals sequence features correlated with recombination rate. Genome Res. 2009. 19:510-9. Hästbacka J, Kerrebrock A, Mokkala K, Clines G, Lovett M, Kaitila I, et al. Identification of the Finnish founder mutation for diastrophic dysplasia (DTD). Eur J Hum Genet. 1999. 7:664-70. Hästbacka J, Superti-Furga A, Wilcox WR, Rimoin DL, Cohn DH, Lander ES. Atelosteogenesis type II is caused by mutations in the diastrophic dysplasia sulfatetransporter gene (DTDST): evidence for a phenotypic series involving three chondrodysplasias. Am J Hum Genet. 1996. 58:255-62. Hazan J, Fonknechten N, Mavel D, Paternotte C, Samson D, Artiguenave F, et al. Spastin, a new AAA protein, is altered in the most frequent form of autosomal dominant spastic paraplegia. Nat Genet. 1999. 23:296–303. Heifetz EM, Fulton JE, O'Sullivan NP, Arthur JA, Cheng H, Wang J, et al. Mapping QTL affecting resistance to Marek's disease in an F6 advanced intercross population of commercial layer chickens. BMC Genomics. 2009. 14:10-20. Hildebrandt F, Heeringa SF, Rüschendorf F, Attanasio M, Nürnberg G, Becker C, et al. A Systematic Approach to Mapping Recessive Disease Genes in Individuals from Outbred Populations. PLoS Genet. 2009. 5: e1000353. Holden P, Canty EG, Mortier GR, Zabel B, Spranger J, Carr A, et al. Identification of novel pro-α2(IX) collagen gene mutations in two families with distinctive oligo-epiphyseal forms of multiple epiphyseal dysplasia. Am J Hum Genet. 1999. 65:31-8. Hutson JM, Hasthorpe S. Abnormalities of testicular descent. Cell Tissue Res. 2005. 322:155-8. Review. Hutson JM. A biphasic model for the hormonal control of testicular descent. Lancet. 1985. 24:419-21. HYP Consortium. A gene (PEX) with homologies to endopeptidases is mutated in patients with X-linked hypophosphatemic rickets. Nat Genet. 1995. 11: 130-6. Hytönen MK, Arumilli M, Lappalainen AK, Kallio H, Snellman M, Sainio K, et al. A novel GUSB mutation in Brazilian terriers with severe skeletal abnormalities defines the disease as mucopolysaccharidosis VII. PLoS One. 2012. 7:e40281. 32 International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature. 2004. 431:931-45. International Sheep Genomics Consortium, Archibald AL, Cockett NE, Dalrymple BP, Faraut T, Kijas JW, et al. The sheep genome reference sequence: a work in progress. Anim Genet. 2010. 41:449-53. Review. Jaffé A, Bush A. Cystic fibrosis: review of the decade. Monaldi Arch Chest Dis. 2001. 56:240-7. Review. Jerosch J. Effects of Glucosamine and Chondroitin Sulfate on Cartilage Metabolism in OA: Outlook on Other Nutrient Partners Especially Omega-3 Fatty Acids. Int J Rheumatol. 2011:969012. Karlsson EK, Baranowska I, Wade CM, Salmon Hillbertz NH, Zody MC, Anderson N, et al. Efficient mapping of mendelian traits in dogs through genome-wide association. Nat Genet. 2007. 39:1321-8. Karlsson EK, Lindblad-Toh K. Leader of the pack: gene mapping in dogs and other model organisms. Nat Rev Genet. 2008. 9:713-25. Review. Khanna C, Lindblad-Toh K, Vail D, London C, Bergman P, Barber L, et al. The dog as a cancer model. Nat Biotechnol. 2006. 24: 1065- 6. Kijas JW, Lenstra JA, Hayes B, Boitard S, Porto Neto LR, San Cristobal M, et al. GenomeWide Analysis of the World's Sheep Breeds Reveals High Levels of Historic Mixture and Strong Recent Selection. PLoS Biol. 2012. 10: e1001258. Klonisch T, Fowler PA, Hombach-Klonisch S. Molecular and genetic regulation of testis descent and external genitalia development. Dev Bio. 2004. 270:1-18. Kohler J, Silverman NA, Levitsky S, Pavel DG, Eckner FA, Fang RB. A model of cyanotic heart disease: functional, pathological, and metabolic sequelae in the immature canine heart. J Surg Res. 1984. 37:309-13. Kubota Y, Temelcos C, Bathgate RA, Smith KJ, Scott D, Zhao C, et al. The role of insulin 3, testosterone, Müllerian inhibiting substance and relaxin in rat gubernacular growth. Mol Hum Reprod. 2002. 8:900-5. Labuda M, Morgan K, Glorieux FH. Mapping autosomal recessive vitamin D dependency type I to chromosomal 12q14 by linkage analysis. Am J Hum Genet. 1990. 47:28-36. Lander ES, Botstein D. Homozygosity mapping: a way to map human recessive traits with the DNA of inbred children. Science. 1987. 236:1567-70. Larkin DM, Daetwyler HD, Hernandez AG, Wright CL, Hetrick LA, Boucek L, et al. Whole-genome resequencing of two elite sires for the detection of haplotypes under selection in dairy cattle. Proc Natl Acad Sci U S A. 2012. 109:7693-8. 33 Leader-Williams N. Abnormal testes in reindeer, Rangifer tarandus. J Reprod Fertil. 1979. 57:127-30. Lee CN. Reviewing evidences on the management of patients with motor neuron disease. Hong Kong Med J. 2012. 18:48-55. Lefebvre S, Bürglen L, Reboullet S, Clermont O, Burlet P, Viollet L, et al. Identification and characterization of a spinal muscular atrophy-determining gene. Cell. 1995. 80:155–65. Levy-Litan V, Hershkovitz E, Avizov L, Leventhal N, Bercovich D, Chalifa-Caspi V, et al. Autosomal-recessive hypophosphatemic rickets is associated with an inactivation mutation in the ENPP1 gene. Am J Hum Genet. 2010. 86: 273-8. Lindblad-Toh K, Wade CM, Mikkelsen TS, Karlsson EK, Jaffe DB, Kamal M, et al. Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature. 2005. 438:803-19. Lorenz-Depiereux B, Schnabel D, Tiosano D, Hausler G, Strom TM. Loss-of-function ENPP1 mutations cause both generalised arterial calcification of infancy and autosomalrecessive hypophosphatemic rickets. Am J Hum Genet. 2010. 86: 267-72. Lupski JR, Reid JG, Gonzaga-Jauregui C, Rio Deiros D, Chen DC, Nazareth L, et al. Wholegenome sequencing in a patient with Charcot-Marie-Tooth neuropathy. N Engl J Med. 2010. 362:1181-91. Malloy PJ, Pike JW, Feldman D. The vitamin D receptor and the syndrome of hereditary 1,25-dihydroxyvitamin D-resistant rickets. Endo Rev 1999. 20: 156-88. Mardis ER, Wilson RK. Cancer genome sequencing: a review. Hum Mol Genet. 2009.18:R163-8. Marques E, Grant JR, Wang Z, Kolbehdari D, Stothard P, Plastow G, et al. Identification of candidate markers on bovine chromosome 14 (BTA14) under milk production trait quantitative trait loci in Holstein. J Anim Breed Genet. 2011. 128:305-13. Meeusen EN, Snibson KJ, Hirst SJ, Bischof RJ. Sheep as a model species for the study and treatment of human asthma and other respiratory diseases. Drug Discov Today Dis Models. 2009. 5: 101-6. Miao XX, Xub SJ, Li MH, Li MW, Huang JH, Dai FY, et al. Simple sequence repeat-based consensus linkage map of Bombyx mori. Proc Natl Acad Sci U S A. 2005. 102:16303-8. Miller MR, Dunham JP, Amores A, Cresko WA, Johnson EA. Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers. Genome Res. 2007. 17: 240-48. Muragaki Y, Mariman ECM, van Beersum SEC, Perälä M, van Mourik JBA, Warman ML, et al. A mutation in the gene encoding the α2 chain of the fibril-associated collagen IX, COL9A2, causes multiple epiphyseal dysplasia (EDM2). Nat Genet. 1996. 12:103-5. 34 Narkis G, Ofir R, Manor E, Landau D, Elbedour K, Birk OS. Lethal congenital contractural syndrome type 2 (LCCS2) is caused by a mutation in ERBB3 (Her3), a modulator of the phosphatidylinositol-3-kinase/Akt pathway. Am J Hum Genet. 2007. 81:589–95. Nation TR, Balic A, Southwell BR, Newgreen DF, Hutson JM. The hormonal control of testicular descent. Pediatr Endocrinol Rev. 2009. 7:22-31. Nef S, Parada LF. Cryptorchidism in mice mutant for Insl3. Nat Genet. 1999. 22:295-9. Nightingale SS, Western P, Hutson JM. The migrating gubernaculum grows like a "limb bud". J Pediatr Surg. 2008. 43:387-90. Nones K, Ledur MC, Zanella EL, Klein C, Pinto LF, Moura AS, et al. Quantitative trait loci associated with chemical composition of the chicken carcass. Anim Genet. 2012. 43:570-6. Ozkan B. Nutritional rickets. J Clin Res Pediatr Endocrinol. 2010. 2: 137- 43. Paassilta P, Lohiniva J, Annunen S, Bonaventure J, Le Merrer M, Pai L, et al. COL9A3: a third locus for multiple epiphyseal dysplasia. Am J Hum Genet. 1999. 64:1036-44. Palaisa K, Morgante M, Tingey S, Rafalski A. "Long-range patterns of diversity and linkage disequilibrium surrounding the maize Y1 gene are indicative of an asymmetric selective sweep". Proc Natl Acad Sci U.S.A. 2004. 101: 9885-90. Parker HG, VonHoldt BM, Quignon P, Margulies EH, Shao S, Mosher DS, et al. An expressed fgf4 retrogene is associated with breed-defining chondrodysplasia in domestic dogs. Science. 2009. 325:995-8. Pask AJ, Kanasaki H, Kaiser UB, Conn PM, Janovick JA, Stockton DW, et al. A novel mouse model of hypogonadotrophic hypogonadism: N-ethyl-N-nitrosourea-induced gonadotropin-releasing hormone receptor gene mutation. Mol Endocrinol. 2005. 19:972-81. Porada CD, Tran N, Eglitis M, Moen RC, Troutman L, Flake AW, et al. In utero gene therapy: transfer and long-term expression of the bacterial neo(r) gene in sheep after direct injection of retroviral vectors into preimmune fetuses. Hum Gene Ther. 1998. 9: 1571-85. Price AL, Zaitlen NA, Reich D, Patterson N. New approaches to population stratification in genome-wide association studies. Nat Rev Genet. 2010. 11:459-63. Review. Przewratil P, Paduch DA, Kobos J, Niedzielski J. Expression of estrogen receptor alpha and progesterone receptor in children with undescended testicle previously treated with human chorionic gonadotropin. J Urol. 2004. 172:1112-6. Raadsma HW, ISGC Int. Sheep Genomics Consortium. Linkage Disequilibrium In The Sheep Genome: Findings From The ISGC HapMap Initiative. Abstract:W134. Plant & Animal genomes XVIII Conference, Jan. 9-30, 2010. San Diego, CA. Radpour R, Rezaee M, Tavasoly A, Solati S, Saleki A. Association of long polyglycine tracts (GGN repeats) in exon 1 of the androgen receptor gene with cryptorchidism and penile hypospadias in Iranian patients. J Androl. 2007. 28:164-9. 35 Ramchandra R, Hood S, Frithiof R, McKinley MJ, May CN. The Role of the Paraventricular Nucleus of the Hypothalamus in the Regulation of Cardiac and Renal Sympathetic Nerve Activity in Conscious Normal and Heart Failure Sheep. J Physiol. 2012. Jul 2. Read AP, Thakker RV, Davies KE, Mountford RC, Brenton DP, Davies M, et al. Mapping of human X-linked hypophosphataemic rickets by multilocus linkage analysis. Hum Genet. 1986. 73:267-70. Recabarren SE, Padmanabhan V, Codner E, Lobos A, Durán C, Vidal M, et al. Postnatal developmental consequences of altered insulin sensitivity in female sheep treated prenatally with testosterone. Am J Physiol Endocrinol Metab. 2005. 289:E801-6. Rijli FM, Matyas R, Pellegrini M, Dierich A, Gruss P, Dollé P, et al. Cryptorchidism and homeotic transformations of spinal nerves and vertebrae in Hoxa-10 mutant mice. Proc Natl Acad Sci U S A. 1995. 92:8185-9. Rosen DR, Siddique T, Patterson D, Figlewicz DA, Sapp P, Hentati A, et al. Mutations in Cu/Zn superoxide dismutase gene are associated with familial amyotrophic lateral sclerosis. Nature. 1993. 362:59-62. Rosendo A, Iannuccelli N, Gilbert H, Riquet J, Billon Y, Amigues Y, et al. Microsatellite mapping of quantitative trait loci affecting female reproductive tract characteristics in Meishan x Large White F(2) pigs. J Anim Sci. 2012. 90:37-44. Rothschild MF, Christian LL, Blanchard W. Evidence for multigene control of cryptorchidism in swine. J Hered. 1988. 79:313-4. Savolainen P, Zhang YP, Luo J, Lundeberg J, Leitner T. "Genetic evidence for an East Asian origin of domestic dogs". Science. 2008. 298: 1610–3. Scheuner MT, Yoon PW, Khoury MJ. Contribution of Mendelian disorders to common chronic disease: opportunities for recognition, intervention, and prevention. Am J Med Genet C Semin Med Genet. 2004. 125C:50-65. Review. Seelow D, Schuelke M, Hildebrandt F, Nürnberg P. HomozygosityMapper--an interactive approach to homozygosity mapping. Nucleic Acids Res. 2009. 37(Web Server issue):W593-9. Short RW. "Why the elephant has intra-abdominal testes" (On-line). 2005. Accessed at http://www.publish.csiro.au/?act=view_file&file_id=SRB03Ab59.pdf (pdf). Singh A, Ezeasor D. Ultrastructure of Sertoli cells in scrotal testis of unilaterally cryptorchid goat. Prog Clin Biol Res. 1989. 296:159-64. Smith KC, Brown PJ, Morris J, Parkinson TJ. Cryptorchidism in North Ronaldsay sheep. Vet Rec. 2007. 161:658-9. Spayde EC, Joshi AP, Wilcox WR, Briggs M, Cohn DH, Olsen BR. Exon skipping mutation in the COL9A2 gene in a family with multiple epiphyseal dysplasia. Matrix Biol. 2000. 19:121-8. 36 Srinivasan K, Ramarao P. Animal models in type 2 diabetes research: an overview. Indian J Med Res. 2007. 125: 451-72. St Jean G, Gaughan EM, Constable PD. Cryptorchidism in North American cattle: breed predisposition and clinical findings. Theriogenology. 1992. 38:951-8. Stranger BE, Stahl EA, Raj T. Progress and promise of genome-wide association studies for human complex trait genetics. Genetics. 2011. 187:367-83. Review. Sutter NB, Bustamante CD, Chase K, Gray MM, Zhao K, Zhu L, et al. A single IGF1 allele is a major determinant of small size in dogs. Science. 2007. 316:112-5. Tan YD, Wan C, Zhu Y, Lu C, Xiang Z, Deng HW. An amplified fragment length polymorphism map of the silkworm. Genetics. 2001. 157:1277-84. Teo YY, Small KS, Kwiatkowski DP. Methodological challenges of genome-wide association analysis in Africa. Nat Rev Genet. 11:149-60. Thomas JT, Lin K, Nandedkar M, Camargo M, Cervenka J, Luyten FP. A human chondrodysplasia due to a mutation in a TGF-beta superfamily member. Nat Genet. 1996. 12:315-7. Thonneau PF, Gandia P, Mieusset R. Cryptorchidism: incidence, risk factors, and potential role of environment; an update. J Androl. 2003. 24:155-62. Review. Tieder M, Modai D, Samuel R, Arie R, Halabe A, Bab I, et al. Hereditary hypophosphatemic rickets with hypercalciuria. N Engl J Med. 1985. 312 : 611-7. Vandenberg P, Khillan JS, Prockop DJ, Helminen H, Kontusaari S, Ala-Kokko L. Expression of a partially deleted gene of human type II procollagen (COL2A1) in transgenic mice produces a chondrodysplasia. Proc Natl Acad Sci USA. 1991. 88:7640-4. Vaysse A, Ratnakumar A, Derrien T, Axelsson E, Rosengren Pielberg G, Sigurdsson S, et al. Identification of genomic regions associated with phenotypic variation between dog breeds using selection mapping. PLoS Genet. 2011. 7:e1002316. Vinet A, Drouilhet L, Bodin L, Mulsant P, Fabre S, Phocas F. Genetic control of multiple births in low ovulating mammalian species. Mamm Genome. 2012. Aug 8. [Epub ahead of print]. Vissing H, D'Alessio M, Lee B, Ramirez F, Godfrey M, Hollister DW. Glycine to serine substitution in the triple helical domain of pro-alpha 1 (II) collagen results in a lethal perinatal form of short-limbed dwarfism. J Biol Chem. 1989. 264:18265-7. Wang K, Li WD, Zhang CK, Wang Z, Glessner JT, Grant SF, et al. A genome-wide association study on obesity and obesity-related traits. PLoS One. 2011. 6:e18939. Wang N, Fang L, Xin H, Wang L, Li S. Construction of a high-density genetic map for grape using next generation restriction-site associated DNA sequencing. BMC Plant Biol. 2012. 12:148. 37 Wang Y, Barthold J, Kanetsky PA, Casalunovo T, Pearson E, Manson J. Allelic variants in HOX genes in cryptorchidism. Birth Defects Res A Clin Mol Teratol. 2007. 79:269-75. Wayne RK, Ostrander EA. Lessons learned from the dog genome. Trends Genet. 2007. 23:557-67. Woelfle JV, Brenner RE, Zabel B, Reichel H, Nelitz M. Schmid-type metaphyseal chondrodysplasia as the result of a collagen type X defect due to a novel COL10A1 nonsense mutation: A case report of a novel COL10A1 mutation. J Orthop Sci. 2011. 16:245-9. Wylie MC, Johnson VM, Carpino E, Mullen K, Hauser K, Nedder A, et al. Respiratory, neuromuscular, and cardiovascular effects of neosaxitoxin in isoflurane-anesthetized sheep. Reg Anesth Pain Med. 2012. 37:152-8. Yates D, Hayes G, Heffernan M, Beynon R. Incidence of cryptorchidism in dogs and cats. Vet Rec. 2003. 152:502-4. Zeggini E, Scott LJ, Saxena R, Voight BF, Marchini JL, Hu T, et al. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet. 2008. 40:638-45. Zhuang Z, Gusev A, Cho J, Pe'er I. Detecting identity by descent and homozygosity mapping in whole-exome sequencing data. PLoS One. 2012.7:e47618 Zimmermann S, Schöttler P, Engel W, Adham IM. Mouse Leydig insulin-like (Ley I-L) gene: structure and expression during testis and ovary development. Mol Reprod Dev. 1997. 47:308. Zimmermann S, Steding G, Emmen JM, Brinkmann AO, Nayernia K, Holstein AF, et al. Targeted disruption of the Insl3 gene causes bilateral cryptorchidism. Mol Endocrinol. 1999. 13:681-91. Zuccarello D, Morini E, Douzgou S, Ferlin A, Pizzuti A, Salpietro DC, et al. Preliminary data suggest that mutations in the CgRP pathway are not involved in human sporadic cryptorchidism. J Endocrinol Invest. 2004. 27:760-4. 38 CHAPTER 2. A NOVEL NONSENSE MUTATION IN THE DMP1 GENE IDENTIFIED BY A GENOME-WIDE ASSOCIATION STUDY IS RESPONSIBLE FOR INHERITED RICKETS IN CORRIEDALE SHEEP A paper published in PLoS One 2011;6:e21739. Xia Zhao1 #, Keren E Dittmer2 #, Hugh T Blair2, Keith G Thompson2, Max F Rothschild1, Dorian J Garrick1,2 * 1 Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University, Ames, IA 50011, USA 2 Institute of Veterinary, Animal & Biomedical Sciences, Massey University, Palmerston North 4474, New Zealand # These two authors contributed equally to this work. ABSTRACT Inherited rickets of Corriedale sheep is characterized by decreased growth rate, thoracic lordosis and angular limb deformities. Previous outcross and backcross studies implicate inheritance as a simple autosomal recessive disorder. A genome wide association study was conducted using the Illumina OvineSNP50 BeadChip on 20 related sheep comprising 17 affected and 3 carriers. A homozygous region of 125 consecutive single-nucleotide polymorphism (SNP) loci was identified in all affected sheep, covering a region of 6 Mb on ovine chromosome 6. Among 35 candidate genes in this region, the dentin matrix protein 1 gene (DMP1) was sequenced to reveal a nonsense mutation 250C/T on exon 6. This mutation introduced a stop codon (R145X) and could truncate C-terminal amino acids. 39 Genotyping by PCR-RFLP for this mutation showed all 17 affected sheep were “T T” genotypes; the 3 carriers were “C T”; 24 phenotypically normal related sheep were either “C T” or “C C”; and 46 unrelated normal control sheep from other breeds were all “C C”. The other SNPs in DMP1 were not concordant with the disease and can all be ruled out as candidates. Previous research has shown that mutations in the DMP1 gene are responsible for autosomal recessive hypophosphatemic rickets in humans. Dmp1_knockout mice exhibit rickets phenotypes. We believe the R145X mutation to be responsible for the inherited rickets found in Corriedale sheep. A simple diagnostic test can be designed to identify carriers with the defective “T” allele. Affected sheep could be used as animal models for this form of human rickets, and for further investigation of the role of DMP1 in phosphate homeostasis. Keywords: nonsense mutation; DMP1 gene; inherited rickets; sheep; genetic disease INTRODUCTION Rickets is a metabolic bone disease in humans and animals with most cases caused by a nutritional deficiency of either vitamin D or phosphorus. This disease leads to softening and weakening of bone caused by defective mineralization of cartilage at sites of endochondral ossification and potentially causes fractures and limb deformities [1], [2], [3]. Rickets may also result from genetic mutations that disrupt genes whose functions are critical for normal bone metabolism. Five types of rickets have been described in humans with common clinical characteristics of renal phosphate wasting, including X-linked hypophosphatemic rickets (XLH), autosomal dominant hypophosphatemic rickets (ADHR), autosomal recessive hypophosphatemic rickets (ARHR) type 1 and 2, and hereditary hypophosphataemic rickets 40 with hypercalciuria (HHRH). A large number of different mutations comprising single base pair deletions, nonsense and missense mutations have been identified in PHEX (phosphate regulation gene with homologies to endopeptidases on the X chromosome) as being responsible for XLH [4], [5], [6]. ADHR is caused by gain of function mutation leading to increased activity of fibroblast growth factor 23 encoded by gene FGF23 [7], and mutations in the dentin matrix protein 1 gene (DMP1) and ecto-nucleotide pyrophosphatase/phosphodiesterase 1 (ENPP1) have been identified in autosomal recessive hypophosphataemic rickets type 1 and 2, respectively [8], [9], [10]. The mutation for HHRH occurs in the gene for the NaPi-IIc protein (or SLC34A3, human chromosome 9q34), a renal Na-P co-transporter [11]. Two other recessively inherited forms of rickets in humans are due to the disruptions of either the synthesis or metabolism of vitamin D. Vitamin D-dependent rickets type I (VDDR-I) is caused by mutations in the CYP27B1 gene, which encodes vitamin D 1-alpha-hydroxylase [12]. Loss of function mutations in the vitamin D receptor gene (VDR) are the genetic basis for vitamin D-dependent rickets type II (VDDR-II), which is also called hereditary vitamin D - resistant rickets (HVDRR) [13], [14]. Inherited forms of rickets have been uncovered frequently in humans but were rare in domestic animals. Recently, an inherited form of rickets has been described in purebred Corriedale sheep from a commercial flock in New Zealand, with an incidence of up to 20 lambs out of 1,600 over a 2-year period. Affected sheep were characterized by decreased growth rate, thoracic lordosis and angular limb deformities (Figure 1) [15], with low serum calcium and phosphate concentrations and normal 25 hydroxyvitamin D and 1,25 dihydroxyvitamin D3 concentrations [16]. Embryo transfer and backcross breeding trials have determined that this disease is likely a simple autosomal recessive disorder [16]. 41 Large animal models of human genetic diseases have received extensive clinical attention due to their advantages over rodent models, including greater physiological similarity to human patients, a longer life span and larger body size. Sheep have been used as an animal model for gene therapy (GT) of human diseases, such as post-traumatic osteoarthritis [17] and steroid-induced ocular hypertension [18]. The convenience of performing in utero GT prior to disease onset will help to achieve long-term expression of the exogenous gene without modifying the germ line of the recipient [19]. Based on the hypothesis that inherited rickets-affected Corriedale sheep could be a potential model for human rickets; and a genetic marker could help sheep breeders to avoid at-risk matings, we performed a genome-wide association study by genotyping 54,241 evenly distributed single nucleotide polymorphisms (SNPs) for 17 affected Corriedale sheep and 3 carriers using the Illumina Ovine SNP50 BeadChip. It has been shown that the analysis of genome-wide high-density SNPs is an effective strategy to map and identify causal mutations for recessive diseases [20]. RESULTS Homozygosity Mapping The genotyping data using a SNP BeadChip for this study was submitted to the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo) under Super Series accession no GSE26310. In our study, exactly 3 homozygous regions of more than 10 consecutive SNPs were identified in all 17 affected sheep whereas the 3 carriers exhibited heterozygosity in those regions. The largest homozygous segment consisted of 125 consecutive SNP loci, whereas the other two segments had only 19 or 11 consecutive SNPs. The largest 125-SNP region started from SNP OAR6_109334543.1 and extended to SNP s03175.1, covering a region of 5.95 Mb (109,334,543 to 115,285,275 bp) on the long arm of sheep chromosome 6 42 (OAR 6). There were 35 genes located in this region based on the bovine reference sequence (Figures 2a, 2b and Table S2). The size of the second homozygous segment was about 0.70 Mb (OAR6: 118,884,834 to 119,587,883 bp) and had only 3.2 Mb separation from the first homozygous segment on the same chromosome (OAR 6). The third homozygous segment covered a 0.66 Mb region (OAR15: 1,131,166 to 1,654,496 bp) on the short arm of OAR 15. Within the second and third homozygous regions there were 8 and 6 positional candidate genes, respectively (Table S2). One gene called DMP1 in the largest 125-SNP region was considered the most plausible candidate due to its biological functions involved in mineralization and phosphate homeostasis and previous reports of its involvement in human rickets [8], [21]. By comparing with the bovine genomic reference sequence it was found that this gene was at 112.20 Mb on OAR6q (Table S2), spanned over 16 kb of size and contained 6 exons. Candidate Gene Sequencing Fine mapping was carried out by individual PCR amplification and sequencing of all the exons of the DMP1 gene to identify the mutation for this disease (Table S1). After blasting the sheep genomic reference sequence with bovine DMP1 coding sequence (ENSBTAT00000025468), there were several alignments for exon 6 with high identity rates (91%-96%). As exon 6 is the last and also the largest exon covering 88% of the open reading frame of DMP1, it was prioritized for re-sequencing. Direct sequencing from 3 carriers and 1 normal control revealed 1 published SNP (s18093.1, OAR6: 112,213,812 bp) and 8 novel ones in this exon (Figure 2e). The complete genomic sequence of exon 6 for sheep DMP1 gene was submitted to GenBank (http://www.ncbi.nlm.nih.gov/genbank/) under accession no. HQ600592. 43 Moreover, we successfully amplified exon 1, exon 3, exon 4 and exon 5 of sheep DMP1 gene using exon flanked intronic sequences of cow reference for each exon. The amplifications of exon 2 failed after using several pairs of primers. Intronic SNPs (SNP1 and SNP2) and a 1bp insertion/deletion (indel: A) were identified respectively in intron 1, intron 5 and intron 3 by direct sequencing (Table S1). Mutation Analysis Seventeen base pairs upstream of SNP s18093.1, a SNP (C/T, OAR6:112,213,795 bp) located at +250bp of exon 6 caused a non-synonymous amino acid change in the DMP1 protein. This C -> T transition introduced a stop codon (R145X) that led to a truncated DMP1 protein at the 145th amino acid (Figure 3a, Figure 3b and Figure S1). Genotyping of this mutation by PCR-RFLP (Polymerase Chain Reaction-Restriction Fragment Length Polymorphism) (Figure 3c) and direct sequencing showed all 17 affected animals had the “T T” genotype; the 3 carriers were “C T”; 24 phenotypically normal related sheep were either “C T” or “C C”; and 46 unrelated control sheep of other breeds were only “C C” genotypes (Table 1). This mutation was 100% concordant with the recessive pattern of inheritance in affected, carrier and normal individuals. Moreover, genotyping of the other 8 SNPs in exon 6 which surrounded the R145X mutation in DMP1, found incomplete linkage disequilibrium for these markers with either the disease or the mutant SNP (Figure 2f). The 3 genomic variants in introns 1, 3 and 5 identified were not located at either the splicing receptor or acceptor sites. Therefore, those mutations could be ruled out as candidate mutation loci. RTPCR/RFLP (Figure 3d) was performed for the R145X mutation and showed the same digestion pattern as PCR-RFLP. 44 Taken together, our results strongly suggest that the R145X substitution is causative for inherited rickets in Corriedale sheep, disrupting the function of the DMP1 protein, preventing successful bone mineralization and phosphate homeostasis. DISCUSSION Homozygosity mapping is a powerful method to map simple recessive traits since the regions immediately flanking the mutation locus will likely be homozygous by descent in closely related offspring. The affected animals in this study resulted from consanguineous mating between pairs of offspring descended from one common male ancestor in the embryo transfer and backcross studies used to characterize the inheritance of the disease [15], [16]. The region containing the mutation should in these circumstances span a large chromosome segment that would be progressively diminished in subsequent meioses when recombination events occur near to the mutation. This mapping strategy has been used successfully for recessive diseases in both animal and human studies [20], [22]. The homozygosity mapping in this study was conducted from knowledge of the SNP genotypes in affected individuals and their genomic location, and used command line UNIX tools including awk, sort and join to identify any regions of at least 10 consecutive homozygous SNPs, equivalent to about 0.5 cM or more. It was presumed that one such region would contain the mutation associated with inherited rickets in Corriedale sheep. In this study, 3 homozygous regions met the criteria of exceeding 10 consecutive homozygous SNP markers. Previously reported rickets genes including PHEX, FGF23, ENPP1, SLC34A3, CYP27B1 and VDR did not show up in any of these homozygous segments based on the ovine genome assembly v1.0, whereas DMP1 was contained in the largest region. This is not surprising because PHEX or FGF23 related rickets have a distinctive inheritance mode that differs from the autosomal recessive 45 inheritance of the rickets studied here. The severe hypophosphatemia in affected sheep indicates disturbed phosphate balance within the body. Therefore, the DMP1 gene was considered to be a high priority candidate gene. The nonsense mutation we identified in the DMP1 gene was 100% concordant with the recessive pattern of inheritance for the disease studied. DMP1 encodes a serine-rich acidic protein and is a member of Small Integrin-Binding Ligand, N-linked, Glycoprotein family (SIBLING) which mineralizes tissue such as bones and teeth by binding strongly to hydroxyapatite using an Arg-Gly-Asp (RGD) tripeptide [23]. Dual functions of the DMP1 protein have been reported based on the in vitro studies. The nonphosphorylated cytoplasmic DMP1 protein translocates to the cell nucleus and initially acts as a transcriptional factor to enhance gene transcription of the osteoblast-specific genes such as alkaline phosphatase and osteocalcin, and then it moves out to the extracellular matrix during the osteoblast to osteocyte transition phase, to promote mineralization and phosphate homeostasis [24], [25]. Loss of function mutations in the human DMP1 gene have been shown to cause autosomal recessive hypophosphatemic rickets in different ethnic groups including Turkish, consanguineous Spanish, Lebanese and Japanese populations. These variant mutations in DMP1 include deletions in exon 6, nucleotide substitution in the splice acceptor sequence of intron 2, and missense mutations in exon 2 or exon 3 that introduce the premature termination codon [8], [21], [26]. The Dmp1-null mouse model also exhibits hypophosphatemic rickets, which indicates the important role of the DMP1 gene in the development of normal bone formation [21]. Since the majority of the dentin matrix protein 1 is encoded by the sixth and last exon of DMP1, typical for proteins in the SIBLING family [27], the novel nonsense mutation (250C/T, R145X) identified here in ovine DMP1 46 gene is a plausible mutation leading to loss of function following nonsense-mediated mRNA decay shown to occur in human patients. The nonsense-mediated mRNA decay is a posttranscriptional mechanism in eukaryotic species. It likely eliminates the abnormal transcripts produced by prematurely terminated translation [28]. The previous studies strongly support the nonsense mutation we identified in exon 6 as causing loss of function of the DMP1 protein and being responsible for ARHR type 1 in Corriedale sheep. Despite the wide variety of mutated genes that result in the different forms of hereditary hypophosphatemic rickets, the clinical features are relatively similar (with the exception of HHRH). In XLH, ADHR, ARHR type 1 and 2, the overriding common feature is increased serum concentration and activity of FGF23. FGF23 inhibits a NPT-2a co-transporter (Na-Pi co-transporter) in the kidneys leading to loss of phosphate from renal tubules into the urine (phosphaturia). FGF23 also inhibits the activity of CYP27B1, thereby inhibiting active vitamin D (1,25-dihydroxyvitamin D3) production [29]. As a result of the increased FGF23 concentrations patients with XLH, ADHR, ARHR type 1 and 2 present with rickets, hypophosphatemia, phosphaturia and inappropriately normal serum 1,25-dihydroxyvitamin D3 concentrations [9], [30]. Corriedale sheep with the mutation in DMP1 also developed rickets, were severely hypophosphataemic and had normal serum 1,25(OH)2D3 concentrations [1], [16]. The mechanism by which mutations in PHEX, DMP1 or ENPP1 result in upregulation of FGF23 is unknown. An unusual feature of ARHR type 2 is that mutations in ENPP1 are also associated with generalized arterial calcification of infancy, again the mechanism by which a mutation in ENPP1 can result in a disease characterised by hypomineralization and one characterised by hypermineralization is unknown [9]. In HHRH, 47 CYP27B1 is actually upregulated, resulting in increased 1,25(OH)2D3 concentrations, and perhaps explaining the hypercalciuria [11]. In conclusion, this is the first report of ARHR type 1 caused by a mutation in the DMP1 gene in a sheep population. In this study, we demonstrate successful use of applying high density SNP arrays for localization of a mutation locus causing a recessive inherited defect. The results can be rapidly applied as a selection marker to identify carriers with the defective “T” allele in order to avoid at-risk matings to improve animal welfare and decrease economic losses. At this stage, the incidence of the mutation in the Corriedale sheep population is unknown, however it is suspected to be widespread and testing of New Zealand Corriedale sheep is planned. More importantly, due to their size and relatively low cost of maintenance, Corriedale sheep with inherited rickets could be used as an animal model for this form of human rickets. Even though the human and sheep mutations are not exactly the same, both disrupt the function of the DMP1 protein. The sheep model can be used for further investigation of the role of DMP1 in phosphate homeostasis and to explore the potential for early medical intervention to reduce the development of the disease in homozygous individuals. MATERIALS and METHODS Ethics statement Approvals dealing with DNA or tissue collection from animals were obtained from the Massey University Animal Ethics Committee (APPROVAL NUMBER 05/123) and Iowa State University Animal Care & Use Committee did not require additional approvals. Animals 48 In total, 17 affected Corriedale sheep (9 males, 8 females) and 27 phenotypically normal crossbred Corriedale sheep (6 males, 21 females) as well as 46 normal controls (Texel and crossbred Perendale sheep) were examined. The age of examined animals ranged from a fetus of 133 days gestation to a 3 year-old animal. Eight affected sheep were from the original commercial property at Marlborough, New Zealand, and 9 affected sheep were their offspring. The 27 phenotypically normal crossbred Corriedale sheep were progeny from the mating of unrelated Romney ewes with the Corriedale carrier ram (F1 generation from outcross), or progeny of the F1 generation daughters mated back to the carrier ram (F2 generation from backcross). SNP genotypes were obtained for the 17 affected and 3 carrier Corriedale sheep, while fine mapping was conducted on all 44 pure or crossbred Corriedale sheep. The 46 normal control sheep used for validation of the mutation were from unrelated populations without evidence of rickets. SNP genotyping DNA was extracted from blood using a Roche MagNA Pure automated analyser. DNA samples were randomized on genotyping plates according to disease status. The genotyping was performed using the Ovine SNP50 BeadChip containing 54,241 SNPs (Illumina, San Diego, CA, USA) with standard procedures at GeneSeek Inc. Lincoln, NE, USA. The SNPs retained had to get a call rate > 80% and a minimum GenTrain score of 0.25. All genotypes with satisfactory call rate and GenTrain score were used regardless of minor allele frequency (MAF). SNP chip data analyses The IBD program used awk, join and sort UNIX commands to scan the whole genome of the genotyped affected cases, to produce a list of all the strings of consensus homozygous 49 markers with a length of more than 10 SNPs. The same process was repeated in the carriers to ensure the regions identified in the affected individuals were not homozygous in the carriers. The homozygosity mapping algorithm involved the following: first, SNPs in the Illumina ovine 50K map file were sorted by chromosome and position, and numerically coded in contiguous order. Second, the names of the “A/B” genotypes (provided by GeneSeek Inc.) for each individual were joined with the renumbered map file to identify the genotypes by their locations. The joined file was separated into three groups according to the disease status of affected, carrier and normal. Third, the number of occurrences of genotypes (“AA”, “AB”, “BB” and no call “--”) for each SNP was counted in the affected and carrier groups. Fourth, the genotype frequencies were calculated for each SNP. In the frequency calculation, the numerator was the count of genotypes being either “AA”, “AB” or “BB”, and the denominator was the total count excluding the category of no call. Finally, if the highest genotype frequency for a SNP was equal to 1 and the two alleles of this genotype were the same, such as “AA” or “BB”, this SNP was considered a homozygous locus. The span of contiguous homozygous loci were accumulated and reported if the region included more than 10 loci. Genes in these IBD regions were examined for potential involvement using comparative maps between the ovine and bovine genomes through the Ovine Genome Assembly v1.0 (http://www.livestockgenomics.csiro.au/perl/gbrowse.cgi/oar1.0/#search). DMP1 gene sequencing Primers were designed to amplify all 6 exons for sheep DMP1 (Table S1). Cow DMP1 genomic sequence was used as a template for primer designing in order to amplify the first 5 exons of sheep DMP1 gene. PCR was performed using a 10 µl system with GoTaq DNA polymerase (Promega, Madison, WI). The annealing temperatures were 58° C or 60° C. 50 PCR products were purified with ExoSAP-IT® (USB, Cleveland, OH), before pooling from 3 carriers and sequencing commercially using a 3730xl DNA Analyzer. Mutation analyses Sheep EST sequence (DY509762) encompassing exon 1 to exon 5 of sheep DMP1 along with re-sequencing results were used to predict the DMP1 protein sequence by applying the ExPASy translate tool (http://ca.expasy.org/tools/dna.html). The predicted sheep DMP1 protein was compared to the wild type (normal) of sheep and cow (AAI49017) using the CLUSTALW alignment tool (http://align.genome.jp/). Each SNP variant was analyzed to check whether it could induce a missense or nonsense mutation. The identified SNPs were genotyped on all 44 related Corriedale sheep and/or 46 normal controls by restriction enzyme digestion (PCR-RFLP) or direct sequencing. The genotyping results from all 9 SNPs distributed in exon 6 of the DMP1 gene were formatted and put into Haploview software (http://www.broadinstitute.org/haploview/haploview). The linkage disequilibrium (LD) between pairs of SNPs in exon 6 was analyzed by Haploview software. Under the LD tabs, r2, the correlation coefficient, was saved to show the LD between each pair of SNPs. The related LD color scheme was produced accordingly. RNA extraction and Reverse transcriptase-polymerase chain reaction (RT-PCR) RNA was extracted from fibroblast cells cultured in vitro from skin biopsies of affected and control sheep to confirm the mutation existed at the transcript level. The fibroblast cell culture is as previously described [16]. Total RNA was extracted using Tri Reagent as per the manufacturer’s instructions (Sigma-Aldrich Co., USA). RT-PCR was performed using the Superscript® one-step RT-PCR system with Platinum® Taq polymerase (Invitrogen 51 Corp., USA). The RT-PCR conditions were: 55ºC for 30 min, 94ºC for 2 min, 40 cycles of 30 s at 94ºC, 30 s at 58º and 30 s at 72ºC, and finally a 5 min elongation step at 72ºC. Restriction enzyme digestion The nonsense mutation (C -> T) created a new recognition site for the restriction enzyme NIaIII (New England Biolabs Inc., USA). DNA fragments from the PCR and RT-PCR reactions were subjected to restriction enzyme digestion with NlaIII. The reaction mix was incubated at 37ºC overnight, and the products were analyzed on a 2.5% (w/v) ultra-pure agarose gel. REFERENCES 1. Dittmer KE, Thompson KG, Blair HT (2009) Pathology of inherited rickets in Corriedale sheep. J Comp Pathol 141: 147-155. 2. Fitch LWN (1943) Rickets in hoggets, with a note on the aetiology and definition of the disease. Australian Vet J 19:2. 3. Ozkan B (2010) Nutritional rickets. J Clin Res Pediatr Endocrinol 2: 137-143. 4. HYP Consortium (1995) A gene (HYP) with homologies to endopeptidases is mutated in patients with X-linked hypophosphatemic rickets. Nat Genet 11: 130-136. 5. Sabbagh Y, Jones AO, Tenenhouse HS (2000) PHEXdb, a locus-specific database for mutations causing X-linked hypophosphatemia. Hum. Mutat 16: 1-6. 6. Ichikawa S, Traxler EA, Estwick SA, Curry LR, Johnson ML, et al. (2008) Mutational survey of the PHEX gene in patients with X-linked hypophosphatemic rickets. Bone 43: 663-666. 7. ADHR Consortium (2000) Autosomal dominant hypophosphataemic rickets is associated with mutations in FGF23. Nat Genet. 26:345-348. 8. Lorenz-Depiereux B, Bastepe M, Benet-Pages A, Amyere M, Wagenstaller J, et al. (2006) DMP1 mutations in autosomal recessive hypophosphatemia implicate a bone matrix protein in the regulation of phosphate homeostasis. Nat Genet 38: 1248-1250. 9. Lorenz-Depiereux B, Schnabel D, Tiosano D, Hausler G, Strom TM (2010) Loss-offunction ENPP1 mutations cause both generalised arterial calcification of infancy and autosomal-recessive hypophosphatemic rickets. Am J Hum Genet 86: 267-272. 10. Levy-Litan V, Hershkovitz E, Avizov L, Leventhal N, Bercovich D, et al. (2010) Autosomal-recessive hypophosphatemic rickets is associated with an inactivation mutation in the ENPP1 gene. Am J Hum Genet 86: 273-278. 52 11. Bergwitz C, Roslin NM, Tieder M, Loredo-Osti JC, Bastepe M, et al. (2006) SLC34A3 mutations in patients with hereditary hypophosphatemic rickets with hypercalciuria predict a key role for the sodium-phosphate cotransporter NaPi-IIc in maintaining phosphate homeostasis. Am J Hum Genet 78: 179-192. 12. Alzahrani AS, Zou M, Baitei EY, Alshaikh OM, Al-Rijjal RA, et al. (2010) A novel G102E mutation of CYP27B1 in a large family with vitamin D-dependent rickets type 1. Clin Endocrinol Metab 95: 4176-4183. 13. Malloy PJ, Pike JW, Feldman D (1999) The vitamin D receptor and the syndrome of hereditary 1,25-dihydroxyvitamin D-resistant rickets. Endo Rev 20: 156-188. 14. Malloy PJ, Xu R, Peng L, Peleg S, Al-Ashwal A, et al. (2004). Hereditary 1,25dihydroxyvitamin D resistant rickets due to a mutation causing multiple defects in vitamin D receptor function. Endocrine 145: 5106-5114. 15. Thompson KG, Dittmer KE, Blair HT, Fairley RA, Sim DF (2007) An outbreak of rickets in Corriedale sheep: evidence for a genetic aetiology. N Z Vet J 55: 137-142. 16. Dittmer KE, Howe L, Thompson KG, Stowell KM, Blair HT, et al. (2010) Normal vitamin D receptor function with increased expression of 25-hydroxyvitamin D(3)24-hydroxylase in Corriedale sheep with inherited rickets. Res Vet Sci 91:362-369. 17. Hurtig M, Frederic D, Darilyn F, Betina L, Antonio C, et al. (2008) BMP-7 Gene Therapy for Mitigation of Post-traumatic Osteoarthritis in Sheep. Eur Cell Mater 16: 52 18. Gerometta R, Spiga MG, Borras T, Candia OA (2010) Treatment of sheep steroidinduced ocular hypertension with a glucocorticoid-inducible MMP1 gene therapy virus. Invest Ophthalmol Vis Sci 51: 3042-3048. 19. Porada CD, Tran N, Eglitis M, Moen RC, Troutman L, et al. (1998) In utero gene therapy: transfer and long-term expression of the bacterial neo(r) gene in sheep after direct injection of retroviral vectors into preimmune fetuses. Hum Gene Ther 9: 1571-1585. 20. Charlier C, Coppieters W, Rollin F, Desmecht D, Agerholm JS, et al. (2008) Highly effective SNP-based association mapping and management of recessive defects in livestock. Nat Genet 40: 449-454. 21. Feng JQ, Ward LM, Liu S, Lu Y, Xie Y, et al. (2006) Loss of DMP1 causes rickets and osteomalacia and identifies a role for osteocytes in mineral metabolism. Nat Genet 38: 1310-1315. 22. Lander ES, Botstein D (1987) Homozygosity mapping: a way to map human recessive traits with the DNA of inbred children. Science 236: 1567-1570. 23. Fisher LW, Fedarko NS (2003) Six genes expressed in bones and teeth encode the current members of the SIBLING family of proteins. Connect Tissue Res 44 Suppl 1: 33-40. 24. Narayanan K, Ramachandran A, Hao J, He G, Park KW, et al. (2003) Dual functional roles of dentin matrix protein 1. Implications in biomineralization and gene transcription by activation of intracellular Ca2+ store. J Biol Chem 278: 1750017508. 25. Schiavi SC (2006) Bone talk. Nat Genet 38: 1230-1231. 53 26. Koshida R, Yamaguchi H, Yamasaki K, Tsuchimochi W, Yonekawa T, et al. (2010) A novel nonsense mutation in the DMP1 gene in a Japanese family with autosomal recessive hypophosphatemic rickets. J Bone Miner Metab 28: 585-590. 27. Fisher LW, Torchia DA, Fohr B, Young MF, Fedarko NS (2001) Flexible structures of SIBLING proteins, bone sialoprotein, and osteopontin. Biochem Biophys Res Commun 280: 460-465. 28. Maquat LE (2004) Nonsense-mediated mRNA decay: splicing, translation and mRNP dynamics. Nat Rev Mol Cell Biol 5: 89-99. 29. Bergwitz C, Jüppner H (2010) Regulation of Phosphate Homeostasis by PTH, Vitamin D, and FGF23. Annu Rev Med 61: 91-104. 30. Dittmer KE, Thompson KG (2011) Vitamin D metabolism and rickets in domestic animals: a review. Vet Pathol 48: 389-407. 54 Table 1. The genotyping results of the causative mutation (R145X) Sheep Breed Corriedale Phenotype Rickets Status Affected Known Carrier Phenotype Normala Genotype # Sheep 17 3 24 Genotype TT TC TC,CC # Sheep 17 3 24 Rate of Concordance 100% 100% 100% Texel, Normal 46 CC 46 100% Perendale a These sheep are phenotypically normal offspring from F1 generation and F2 backcross generations generated by out-crossing and back-crossing trials. Figure 1. Corriedale sheep with rickets. (a) 1½-year-old Corriedale sheep with inherited rickets showing angular limb deformities of the forelimbs and lordosis of the spine in the mid-thoracic region (white arrow). (b) 1½-year-old Corriedale sheep with inherited rickets showing combined varus and valgus or “windswept” deformity of the forelimbs. 55 Figure 2. Work flow chart for discovering the mutation locus of inherited rickets in Corriedale sheep. (a) The sheep chromosome 6 (OAR 6). (b) The IBD region containing the causative locus is located on OAR 6 between 109 and 115 Mb. (c) The black boxes show 35 cow reference genes encompassing the sheep IBD region based on the comparative genomic maps. (d) & (e) The boxes represent exons of the cow and the predicted sheep DMP1 genes. The solid parts of boxes demonstrate the coding region for DMP1. The vertical lines on exon 6 are representing the mutant SNP (R145X) with the other 8 SNPs. (f) The number and color in each diamond box shows r2 value for each pair of SNPs. Higher numbers and darker colors indicate SNPs are with higher LD. Four SNPs in block 1 show a complete mutual linkage disequilibrium (LD) but not with the causative mutation (R145X). 56 Figure 3. Mutation analyses (a) The chromatogram shows the location of the mutation (Y: C/T). (b) Protein sequence alignments of DMP1. The highlighted R145X substitution induces a stop codon that leads to premature termination of the protein. MUT, R145X mutant; WT, wild type. (c) PCR-RFLP genotyping results for the affected (3 bands) and carrier (4 bands) Corriedale sheep and normal (2 bands) control sheep. M, 1kb marker; A, affected; C, carrier; N, normal. (d) RT-PCR-RFLP results for 2 controls (N1 &N2) and 3 affected sheep (A1, A2 &A3). 57 Figure S1. Complete predicted DMP1 protein sequence alignments among rickets affected sheep, wild type sheep and cow. The amino acid (R145X) with a border is the position where the “C - > T” transition induced a stop codon and lead to a truncated DMP1 protein at the 145th amino acid. Table S1. Primer sequence, annealing temperature, PCR amplicon information and genetic variants identified by sequencing. Primer name Anealing Temp. (ºC) Size (bp) Fragment name Genetic variant (position in the fragment) SNP name SNP characteristic DMP1e1F DMP1e1R DMP1e3F DMP1e3R DMP1e4F DMP1e4R DMP1e5F DMP1e5R DMP1e6_2F a DMP1e6_2Ra 5’- CTTTTCCCATCCTTGGGTCT-3’ 5’- GCACTGTTCTCCCCATCTTT-3’ 5’- CAAAATGTTATCCCCAGACCA-3’ 5’- TTCAGCATTTGCAGTGAAGC-3’ 5’- GGCTAAACACAAGCCAAGGA-3’ 5’- GGAGATTGGGAGGGTCATGT-3’ 5’- CAGGCCATTTGGAAAGTCAT-3’ 5’- GAGGAACATTTAGGGCCACA-3’ 5’- ATGGAAAATGGGGTGACTTG-3’ 5’- CATCCCTTCATCGTCGAACT-3’ 60 560 DMP1e1 C/T (+ 453bp) SNP1 intronic 60 388 DMP1e3 no - no 60 659 DMP1e4 indel (A) (+284bp) - intronic 60 411 DMP1e5 C/G (+ 341bp) SNP2 intronic 58 668 DMP1e6_2 DMP1e6_5F DMP1e6_5R 5’- ATGAGTCCAGGGGTGACAAC-3’ 5’- CAACAATGGGCATCTTTCCT-3’ 58 703 DMP1e6_5 C/T (+455bp) C/T (+472bp) A/G (+493bp) C/G (+101bp) C/T (+134bp) C/T (+140bp) A/G (+167bp) A/C (+218bp) C/T (+602bp) R145X S18033.1 SNP3 SNP4 SNP5 SNP6 SNP7 SNP8 SNP9 exonic exonic exonic exonic exonic exonic exonic exonic 3’ UTR a These primer sets were used for PCR-RFLP genotyping of the mutation R145X. 58 Primer sequence Table S2. The list of ovine positional candidate genes based on the bovine reference gene sequences. Targeted Region (size) 1 (5.95 Mb) Ovine Chromosome Position Bovine Ref. Gene Symbol WDFY3 OAR6:110461394..110535578 OAR6:111226629..111449410 ARHGAP24 PTPN13 OAR6:111458495..111488852 OAR6:111512006..111515283 OAR6:111512014..111515294 OAR6:111512014..111515294 OAR6:111600842..111740902 OAR6:111756587..111811197 OAR6:111885205..111904838 OAR6:111920170..111969527 OAR6:112013056..112050582 OAR6:112063439..112116046 OAR6:112199546..112216300* OAR6:112365289..112411784 OAR6:112588875..112650749 SLC10A6 UBE2D3P LOC783264 LOC783114 AFF1 KLHL8 HSD17B13 HSD17B11 NUDT9 SPARCL1 DMP1* MAN2B2 PPP2R2C OAR6:112757862..112757903 OAR6:112759544..112781041 OAR6:112780837..112910375 OAR6:113024908..113058856 OAR6:113144679..113208484 LOC617028 LOC782334 JAKMIP1 LOC615433 CRMP1 OAR6:113213443..113312073 OAR6:113335352..113473188 OAR6:113451908..113473188 OAR6:113787927..114038860 OAR6:114257470..114261815 OAR6:114391388..114395645 OAR6:114700887..114900429 OAR6:114899716..114940430 OAR6:114899716..114901567 OAR6:115074361..115095979 OAR6:115101064..115121797 OAR6:115135963..115151278 OAR6:115207964..115221470 OAR6:115249664..115251638 EVC EVC2 LOC786485 STK32B CYTL1 MSX1 STX18 NSG1 LOC789276 ZNF509 LYAR TMEM128 LOC525643 DRD5 Similar to WD repeat and FYVE domain-containing protein 3 (Autophagy-linked FYVE protein) (Alfy) Rho GTPase activating protein 24 Protein tyrosine phosphatase, non-receptor type 13 (APO-1/CD95 (Fas)associated phosphatase) Solute carrier family 10 (sodium/bile acid cotransporter family), member 6 Similar to Putative ubiquitin-conjugating enzyme E2 D3-like protein Similar to ubiquitin-conjugating enzyme E2D 3 Similar to ubiquitin-conjugating enzyme E2D 3 Similar to AF4/FMR2 family, member 1 Similar to KIAA1378 protein, transcript variant 1 Hydroxysteroid (17-beta) dehydrogenase 13 Hydroxysteroid (17-beta) dehydrogenase 11 Nudix (nucleoside diphosphate linked moiety X)-type motif 9 SPARC-like 1 (hevin) Dentin matrix acidic phosphoprotein 1 Similar to mannosidase, alpha, class 2B, member 2 Similar to Serine/threonine-protein phosphatase 2A 55 kDa regulatory subunit B gamma isoform Hypothetical LOC617028 Hypothetical LOC782334 Janus kinase and microtubule interacting protein 1 Hypothetical LOC615433 Similar to Dihydropyrimidinase-related protein 1 (DRP-1) (Collapsin response mediator protein 1) Ellis van Creveld syndrome Ellis van Creveld syndrome 2 Hypothetical LOC786485 Similar to serine/threonine kinase 32B Similar to Cytokine-like protein 1 precursor (Protein C17) Msh homeobox 1 Syntaxin 18 Neuron specific gene family member 1 Hypothetical LOC789276 Similar to hCG2039195, transcript variant 1 Ly1 antibody reactive homolog (mouse) Transmembrane protein 128 Similar to otopetrin Dopamine receptor D5 Bovine Gene Accession No. XM_617252 NM_001102234 NM_174590 NM_001081738 NM_001075135 XM_001249763 XM_001250124 XM_001249488 XM_612186 NM_001046616 NM_001046286 NM_001101096 NM_001034302 NM_174038 XM_601803 XM_001250700 XM_882441 XM_001250977 NM_001102251 XM_867237 XM_580336 NM_174747 NM_173927 XM_001254147 XM_607574 XM_581416 NM_174798 NM_001099719 NM_001077957 XM_001256071 XM_583985 NM_001031769 NM_001034454 XM_603996 XM_604584 59 OAR6:109103268..109364732 Bovine Ref. Gene Name Table S2. (continued) Targeted Region (size) 2 (0.70 Mb) 3 (0.66 Mb) Bovine Ref. Gene Symbol OAR6:118851890..118883051 OAR6:118941179..118943474 OAR6:118976061..119063813 OAR6:119204026..119245137 OAR6:119417009..119417107 OAR6:119417009..119417587 OAR15:1220402..1223574 OAR15:1338164..1344393 OAR15:1418829..1442937 PDE6B GPI7 LOC614799 WDR1 LOC786062 ZNF518B KIAA1826 KBTBD3 AASDHPPT OAR15:1484534..1488246 OAR15:1503540..1575517 OAR15:1596294..1619791 OAR15:1220402..1223574 ANKRD49 MRE11A GPR83 KIAA1826 Bovine Ref. Gene Name Phosphodiesterase 6B, cGMP-specific, rod, beta GPI7 protein Similar to solute carrier family 2 (facilitated glucose transporter), member 9 WD repeat domain 1 (WDR1) Similar to Zinc finger protein 518B-like Similar to Zinc finger protein 518B Hypothetical protein LOC511226 Kelch repeat and BTB (POZ) domain containing 3 Similar to aminoadipate-semialdehyde dehydrogenase-phosphopantetheinyl transferase Ankyrin repeat domain 49 Double-strand break repair protein meiotic recombination 11 homolog A Similar to G protein-coupled receptor 83 Hypothetical protein LOC511226 Bovine Gene Accession No. NM_174418 NM_001024518 XM_866430 NM_001046346 XM_001788422 XM_584181 NM_001046078 XM_602528 XM_001250765 NM_001014965 XM_603439 XM_615580 NM_001046078 60 Ovine Chromosome Position 61 CHAPTER 3. A MISSENSE MUTATION IN AGTPBP1 WAS IDENTIFIED IN SHEEP WITH A LOWER MOTOR NEURON DISEASE A paper published in Heredity 2012;109:156-62 Xia Zhao1, Suneel K Onteru1, Keren E Dittmer2, Kathleen Parton2, Hugh T Blair2, Max F Rothschild1, Dorian J Garrick1,2 1 Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University, Ames, IA 50011, USA 2 Institute of Veterinary, Animal & Biomedical Sciences, Massey University, Palmerston North 4474, New Zealand Running title: A missense mutation in AGTPBP1 and sheep LMN disease Word count for main text: 22,172 characters Keywords: AGTPBP1; lower motor neuron disease; missense mutation; Purkinje cell degeneration; sheep ABSTRACT A type of lower motor neuron disease inherited as autosomal recessive in Romney sheep was characterized with normal appearance at birth, but with progressive weakness and tetraparesis after the first week of life. Here we carried out genome-wide homozygosity mapping using Illumina OvineSNP50 BeadChips on lambs descended from one carrier ram, including 19 sheep diagnosed as affected and 11 of their parents that were therefore known 62 carriers. A homozygous region of 136 consecutive single-nucleotide polymorphism (SNP) loci on chromosome 2 was common to all affected sheep and it was the basis for searching for the positional candidate genes. Other homozygous regions shared by all affected sheep spanned 8 or fewer SNP loci. The 136-SNP-region contained the sheep ATP/GTP-binding protein 1 (AGTPBP1) gene. Mutations in this gene have been shown to be related to Purkinje cell degeneration (pcd) phenotypes including ataxia in mice. One missense mutation c.2909G>C on exon 21 of AGTPBP1 was discovered which induces an Arg to Pro substitution (p.Arg970Pro) at amino acid 970, a conserved residue for the catalytic activity of AGTPBP1. Genotyping of this mutation showed 100% concordant rate with the recessive pattern of inheritance in affected, carrier, phenotypically normal and unrelated normal individuals. This is the first report showing a mutant AGTPBP1 is associated with a lower motor neuron disease in a large mammal animal model. Our finding raises the possibility of human patients with the same etiology caused by this gene or other genes in the same pathway of neuronal development. INTRODUCTION Motor neuron diseases (MND) are a group of disorders involved with selective degeneration of upper motor neurons (UMN), and/or lower motor neurons (LMN). The incidence of MND varied from 0.6 per 100,000 person-years to 2.4 per 100,000 person-years in different ethnic human populations (Lee, 2010). UMN originate in the cerebral cortex or the brain stem and provides indirect stimulation of the target muscles, while LMN connect the brain stem and spinal cord to muscle fibers, and provide nerve signals between the UMN and muscles. Common characteristics of MND are muscle weakness and/or spastic paralysis. Signs of LMN lesions are represented as muscle atrophy, fasciculations and weakness when 63 associated LMN degenerate. Symptoms like spastic tone, hyperreflexia and upgoing plantar reflex signs appear as UMN degenerate (Lee, 2012). The weakness of respiratory or bulbar muscle can even cause death of MND patients due to respiratory insufficiency (Borasio et al., 1998). Based on the time of onset, types of motor neuron involvement and detailed clinical features, MND in humans can be classified into six types of diseases including amyotrophic lateral sclerosis (ALS), hereditary spastic paraplegia (HSP), primary lateral sclerosis (PLS), spinal bulbar muscular atrophy (SBMA), spinal muscular atrophy (SMA) and lethal congenital contracture syndrome (LCCS). MND are considered incurable and no effective treatment is currently available. However, the gene therapy strategy of performing gene transfer to patients by utilizing adeno-associated virus (AAV) vectors is a promising therapeutic approach (Nizzardo et al., 2012). Previous evidence has shown genetic variants within more than 30 genes might be responsible for MND (Dion et al., 2009). For example, mutations in the Cu/Zn-binding superoxide dismutase (SOD1) gene have been identified in about 20% of patients with familial ALS (Rosen et al., 1993). However, genetic causes still remain unclear for more than half of human patients with MND. In 1999, an extended family of Romney lambs was found to have a suspected hereditary MND, characterized by poor muscle tone and progressive weakness from about 1 week of age, leading to severe tetraparesis, recumbency and muscle atrophy. Lambs remained alert up to four weeks of age when they were euthanized. The muscles were atrophied and pale, and histologically had variation in muscle fiber diameter and small angular fibers. Large foamy macrophages were present in cerebrospinal fluid, and there were degenerate neurons in the spinal cord leading to a decrease in the number of motor neurons, consistent with LMN diseases. No histological changes were seen in the cerebellum (Anderson et al., 1999). 64 Outcross and backcross breeding trials were consistent with this LMN disease having simple autosomal recessive inheritance. In order to find the causative locus responsible for this disease, we performed a genome-wide homozygosity mapping process and then fine mapped the mutation using a positional candidate gene approach within the largest homogygous genomic region found in the affected sheep. Our goals of this research were to discover the genetic basis of this disease, to help sheep breeders avoid matings between carrier animals, and to provide a potential animal model for gene therapy treatment of human LMN disease. METHODS Animals Approvals dealing with DNA or tissue collection from animals were obtained from the Massey University Animal Ethics Committee (Approval Number 99/135). Iowa State University Animal Care & Use Committee did not require additional approvals. A single Romney putative carrier ram was mated in New Zealand to 130 crossbred Finnish Landrace ewes to produce 115 daughters that were subsequently backcrossed to their sire. The backcrosses produced 5 affected sheep out of 47 lambs in the first lambing, and 11 out of 71 in the second lambing. The observed rate of affected sheep was consistent with the 1 in 8 expectation for a homozygous recessive disorder from this mating scheme. Any affected offspring and their dams now proven to be carriers, formed the basis of the experimental population. High density SNP genotypes were obtained from 19 affected, which included 3 affected lambs from the original Romney flock, and 11 known carrier sheep. Fine mapping using novel SNPs was conducted on 52 Romney or Romney/Finn crosses including the above mentioned 30 ones with clear disease statuses and 22 potential carriers which were 65 phenotypically normal but sired by the carrier ram. Moreover, 85 control sheep consisted of 14 normal Romney sheep, 25 crossbred Texel sheep and 46 crossbred Corriedale sheep were used for validation of the causative mutation. These sheep were produced from unrelated populations without evidence of LMN disease. SNP genotyping DNA was extracted from blood using the QIAamp DNA Blood Mini Kit (QIAGEN, CA, USA) and the Roche MagNA Pure automated analyser (Roche, USA). The genotyping of 54,241 evenly distributed polymorphic SNP markers across the genome was performed using the Ovine SNP50 BeadChip (Illumina, San Diego, CA, USA) at the commercial company GeneSeek Inc. Lincoln, NE, USA. Standard procedures were performed with a PCR and ligation-free protocol, which routinely achieves high average call rates and accuracy. Genome wide homozygosity mapping In order to define the molecular genetic basis for the LMN disease in these Romney sheep, a search for putative homozygous-by-descent (also named identical by descent, IBD) regions was initiated by performing a genome wide scan using an in-house UNIX script as described in our previous publication (Zhao et al., 2011). The SNP genotypes were ordered according to their genomic location as provided in the Illumina manifest, and used to identify regions of consecutive homozygous SNP loci that were common to all affected animals. The same procedure was applied to known carriers to ensure that the identified region was not simply fixed across all animals. A threshold of 10 SNPs defining a candidate consecutive homozygous region was chosen after calculating expected IBD fragment size for 19 affected lambs produced through the outcross and backcross experiments assuming exactly one crossover event occurred every meiosis on the chromosome carrying the causal mutation 66 with uniform probability along the chromosome expressed in linkage units (cM). The homozygous fragment containing the causative mutation would have exceeded 0.5 cM or about 10 consecutive SNP with probability > 95%. The expected size of the IBD region that contains the causative mutation depends in part on the number of meioses that separate the common ancestor from the affected lambs, and the number of affected individuals. The IBD region that included the causative mutation had 23% probability of being shorter than 2 cM and a 32% probability being longer than 5 cM. The mean expected length of the IBD region was about 4 cM. Genes in the identified IBD regions were examined for potential involvement using comparative homology between ovine and bovine genomes as provided through the Ovine Genome Assembly v1.0 (http://www.livestockgenomics.csiro.au/perl/gbrowse.cgi/oar1.0/#search). Positional candidate genes sequencing Primer3 software (version 0.4.0) and the reference sequences from Ovine Genome Assembly v1.0 were used to design PCR primers flanking the coding regions of ovine AGTPBP1 (Supplementary Table S1). Among the many genes in the identified homozygous regions, AGTPBP1 was considered to be the best candidate gene because of its role in neural functions as reported by several publications (Harris et al., 2000; Fernandez-Gonzalez, et al., 2002). PCR was performed using a 10 µl cocktail mixture and a standard program as previously described (Zhao et al., 2011), with an annealing temperature of 58ºC or 60ºC. Negative controls were performed to check for the presence of contaminants in all reactions. PCR products were verified by 1.5% agarose gel electrophoresis, and treated with ExoSAPIT® (Affymetrix, USA), before pooling them from 2 affected and 2 carrier sheep. The pooled PCR products were sequenced commercially using a 3730xl DNA Analyzer (Applied 67 Biosystems, USA). The Sequencher software (version 3.0) was used for sequence alignment and comparison. Mutation analyses Since annotation of the ovine genome has not yet been completed, the bovine AGTPBP1 reference mRNA sequence (GenBank: XM_001252536.3) was used as a query in crossspecies BLAST searches to identify sheep EST. Both the bovine AGTPBP1 mRNA and corresponding ovine EST served as references to search for the coding regions within ovine AGTPBP1 gene. The AGTPBP1 protein sequence was predicted using successfully amplified and re-sequenced exonic regions of the gene from the sheep population, the bovine AGTPBP1 reference mRNA sequence and ovine EST. In order to understand the functional annotation of this protein, the conserved domains within this sheep protein sequence were searched by blasting the complete protein sequence against the Conserved Domains database (database: CDD-40526 PSSMs) in NCBI (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) while setting an option of 500 as the maximum number of hits. Multiple sequence alignments provide a basis for conserved domain models. The specific hits and super families were given for the queried protein. Moreover, the conserved protein residues involved in conserved features such as binding or catalysis were annotated when the query sequence residues were in the Entrez Protein database. The sequence alignments of protein AGTPBP1 from multiple species including human, mouse, chicken, frog, purpuratus, sea squirt, trichoplax, zebrafish and sheep, were displayed to show the conserved feature sites. The fold structure of the conserved domain of AGTPBP1 in sheep (residues 850-1150) and mouse (residues 800-1110) were predicted using the 3D_PSSM program (http://www.sbg.bio.ic.ac.uk/3dpssm/index2.html). The known 68 crystal structures of carboxypeptidases in the PDB database help to generate the predicted structure of AGTPBP1. Sequence alignment was performed on the ClustalW2 server (http://www.ebi.ac.uk/Tools/msa/clustalw2/) among sheep and mouse AGTPBP1, human (PDB ID: 1DTD) carboxypeptidase A2 (CPA2) and cattle (PDB ID: 1HDU) carboxypeptidase A (CPA). This helped us to be more confident about the conserved features of AGTPBP1 secondary structure and the amino acids critical for carboxypeptidase catalysis. A three dimensional macromolecular structure of the conserved domain was searched and downloaded using homologous sequences with confirmed crystal structure (http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml). Each SNP variant was analyzed to check whether it could induce missense or nonsense mutations. The identified SNPs were then genotyped using restriction enzyme digestion (PCR-RFLP: Polymerase Chain Reaction-Restriction Fragment Length Polymorphism) on the affected and carrier sheep as well as on other normal controls. Restriction enzyme digestion (PCR-RFLP) The sequences of PCR primer “LMN_new” (Supplementary Table S1) were 5’ TGTTAGCACCTCCTCTTTGCT3’ and 5’ AGCAGACCCTTGGCATGATA3’. DNA fragments from the PCR reactions were subjected to restriction enzyme digestion with AciI (New England Biolabs Inc., USA). The missense mutation in AGTPBP1 removes a recognition site of the AciI enzyme. The 10 µl system reaction mix contained 1 µl of NEB buffer 3, 0.5 units of AciI and 2 µl of PCR product. The reaction mix was incubated at 37ºC overnight, and the products were analyzed on a 2.0% (w/v) ultra-pure agarose gel. RESULTS Detection of IBD regions 69 The whole genome scan discovered 209 homozygous fragments consisting of 3 or more consecutive SNPs shared by all 19 affected lambs. The size distribution of common homozygous fragments in these lambs is shown (Figure 1). Only one large homozygous segment exceeded the 10-SNP threshold and it consisted of 136 consecutive SNP loci identified in all 19 affected sheep, whereas the 11 carriers exhibited heterozygosity in this segment. The region started from SNP s56017.1 and extended to SNP OAR2_36428027.1. This segment is physically positioned on the small arm of ovine chromosome 2 (OAR 2) from 29,284,415 bp to 36,428,027 bp, covering a region of 7 Mb (Figure 2a and b). We found this segment likely encompassed 36 genes (Figure 2d and Supplementary Table S2) based on the bovine reference sequences. A gene called ATP/GTP binding protein, transcript variant 1 (AGTPBP1), also named Nna1 (Nervous system nuclear protein induced by axotomy protein 1 homolog) was the most plausible candidate due to its involvement in axonal regeneration and its causal relationship with mouse pcd phenotype (Harris et al., 2000; Fernandez-Gonzalez et al., 2002). The ovine AGTPBP1 gene spans about 0.17 Mb and encodes a 1,225-amino-acid intracellular protein that contains a predicted zinc carboxypepetidase domain near its C-terminus (Harris et al., 2000). This gene was initially cloned from regenerating spinal cord neurons of the mouse (supplied by OMIM 606830). Gene coding region re-sequencing and mutation analyses Functional mutations in candidate genes were examined by re-sequencing all available exons using primers based on the adjacent intronic sequences of these genes. The ovine AGTPBP1 gene was predicted to have 25 exons based on the bovine reference gene and protein sequences. All the exons except exons 11, 12 and 17 for this gene were successfully amplified and three exonic SNPs were identified. The first exonic SNP (called c.2909G>C) 70 was located at the 6th bp of exon 21 of the sheep AGTPBP1 gene. The other two were located at the 93rd and 99th bp of exon 22. An additional 17 SNPs were identified in either introns or the 3’ UTR on this gene. These were immediately excluded as causative loci since none were located either in the splicing donor/receptor regions or other important regulatory regions and were not in concordance with sheep phenotypes (data not shown). After comparing our findings with the bovine coding sequence of AGTPBP1, we observed that the first exonic SNP (c.2909G>C) introduced a missense transition (Arg>Pro) at amino acid 970 (p.Arg970Pro) (Figure 2e), and the third exonic SNP (c.3133T>C) also resulted as a missense mutation causing a Pro to Ser substitution (p.Pro1045Ser). The second exonic SNP (c.3126C>T) was a synonymous mutation and did not change any amino acid. A conserved domain named M14_Nna1 (CDD: cd06906) spanning from amino acid residue 859 to 1136 on AGTPBP1 in sheep was predicted in this study using the “CD-search” option available in NCBI. This domain is a zinc-binding carboxypeptidase domain, which hydrolyzes the peptide bonds at the C-terminal part of certain amino acids in polypeptide chains. The sequence alignment among multiple species demonstrated 2 specific feature hits including metallocarboxypeptidase activity and zinc binding. Nine conserved amino acid residues were involved in these two features hits. (Supplementary Figure S1 and Figure S2; GomisRṻth et al., 1999; Aloy et al., 2001). Three conserved residues (His920, Glu923 and His1017) expressed as a motif “HXXE…H” are responsible for zinc binding, and all 9 conserved residues (His920, Glu923, Arg970, Asn979, Arg980, His1017, Gly1018, Met1027 and Glu1102) are important for substrate binding and catalysis of this protein (Supplementary Figure S1 and Figure 3a). In terms of the size and relative positioning of βsheets and α-helices, the overall predicted secondary structures of AGTPBP1 in sheep and 71 mouse were the same as that of other metallocarboxypepdases in human and cattle by using the 3D-PSSM program. Features and conserved amino acids and motifs are located in similar positions. For example, the well-known zinc-binding HXXE…H motif lies between the second β-sheet and the first α-helix as reported (Wang et al., 2006). Six residues (His920, Glu923, Arg970, Asn979, Arg980 and Glu1102) remain conserved among these proteins (Supplementary Figure S3). The first exonic missense mutation (c.2909G>C) identified on exon 21 is the middle nucleotide within a codon “CGC” which was translated and substituted from Arg970 (codon “CGC”), one of the conserved residues responsible for substrate binding and catalysis, to Pro970 (codon “CCC”) (Figure 3b). Molecular modeling based on the crystal structure of the homologous protein, putative carboxypeptidase (PBDID: 3K2K), also demonstrated that Arg970 forms a part of a loop (the short yellow section of the middle loop in Figure 3c), which might be a key site for substrate binding or catalytic activity. The amino acid residue encoded by the exonic missense mutation on exon 22 (c.3133T>C; p.Pro1045Ser) varies considerably among different species (Figure 3b). It indicates that this missense mutation may not affect the function of the protein AGTPBP1 as it is neither involved in zinc binding nor in substrate binding and catalysis. The Arg970Pro mutation was genotyped by PCR-RFLP (Figure 2e). The 611 base pair length of the “LMN_new” PCR fragment amplified from normal sheep had one restriction site, which could be recognized by the restriction enzyme AciI and cut into 2 fragments with lengths of 519 bp, and 92 bp. However, the restriction site was removed due to the Arg970Pro mutation and the 611 bp fragment amplified from the affected lambs was intact after digestion. Therefore, the PCR fragment amplified from sheep with a genotype of “GG” was cut into 2 fragments with lengths of 519 bp, and 92 bp; the one with a genotype of “GC” 72 had 3 fragments with lengths of 611 bp, 519 bp, and 92 bp; the one with a genotype of “CC” had a single fragment with length 611bp. Our genotyping results showed all 19 affected sheep were “CC”, the 11 known carriers were “GC”, and the 22 phenotypically normal but possible carrier sheep were either “GC” or “GG” genotypes. In contrast, 85 control (unrelated) sheep were only “GG” genotypes (Table 1). This mutation was 100% concordant with the recessive pattern of inheritance in affected, carrier, phenotypically normal and unrelated normal individuals. The two exonic SNP (c.3126C>T and c.3133T>C) on exon 22 were genotyped by direct sequencing. However, there is no perfect concordant rate for these two SNPs based on the counts for each of the genotypes (Table 1). At this stage, we excluded these two SNP as causative mutations for this defect due to their genotyping results and mutation analyses. Taken together, our results strongly suggest that the Arg970Pro substitution is the causative locus for LMN disease in Romney sheep and it could act by decreasing the function of protein AGTPBP1 to generate LMN disease in these lambs. DISCUSSION The AGTPBP1 gene encodes a protein which belongs to a member of the cytosolic carboxypeptidase subfamily (M14D subfamily) (Kalinina et al., 2007; Rodriguez de la Vega et al., 2007). This protein plays a role in protein turnover by cleaving proteasome-generated peptides into amino acids (Berezniuk et al., 2010), enzymatically removing gene-encoded glutamic acids or tyrosine from the C terminus of proteins (Rogowski et al., 2010; Kalinina et al., 2007), and also plays a role in neuronal bioenergetics at least in mouse brains (Chakrabarti et al., 2010). Similar to other cytosolic carboxypeptidases, the mouse Nna1 protein is broadly expressed in brain, pituitary, eye, testis and other tissues (Kalinina et al., 2007). Mutations or abnormal expression of Nna1 have been found to cause a Purkinje cell 73 degeneration (pcd) phenotype in mice characterized by degeneration of Purkinje cells, mitral cells of the olfactory bulb, thalamic neurons and retinal photoreceptor cells as well as degeneration of sperm (Fernandez_Gonzalez et al., 2002; Mullen et al., 1976). Twelve alleles in mouse mutants related to Agtpbp1 have been summarized by the Jackson Laboratory (http://www.informatics.jax.org/searches/accession_report.cgi?id=MGI:2159437). Among which, loss-of-function of Agtpbp1 was indicated as being responsible for pcd3J and pcd5J mouse mutants (Fernandez-Gonzalez, et al., 2002; Wang and Morga, 2007; Chakrabarti et al., 2008). Expression of full-length Nna1 successfully rescued Purkinje cell loss and ataxic behavior in pcd3J, and retinal photoreceptor loss and Purkinje cell degeneration in pcd5J mice (Wang et al. 2006; Chakrabarti et al., 2008). In this study, the missense mutation c.2909G>C on exon 21 of AGTPBP1 induces an Arg to Pro substitution (p.Arg970Pro) at amino acid 970 (R970). Based on the crystal structure of human CPA2, it has been reported that arginine 127 (R127) in CPA2 is an electrophile site crucially responsible for the catalytic activity along with glutamate 270 (Mangani et al., 1994; Garcia-Saez et al., 1997; Christianson et al., 1989). Similarly, the homologous arginine residue R127 (numbered according to mature bovine carboxypeptides A1) in C. elegans AGBL4 (AGP/GTP binding like protein 4) has been predicted to stabilize the oxyanion hole in the S1’ site of the carboxypeptidase domain (Rodriguez de la Vega et al., 2007). Sequence comparison and structural modeling of sheep AGTPBP1 with other carboxypeptidases predicted R970 was located in a region known as the substrate-binding pocket and catalytic center of metallocarboxypeptidase as mouse Nna1 (Supplementary Figure S3; Wang et al., 2006). These results enforce the view that R970 in sheep is a very conserved site being critical for its function of catalytic activity. Therefore, we concluded 74 the Arg970Pro substitution could be a candidate causative mutation possibly altering the characteristic of the AGTPBP1 protein and very likely impairing its function of catalytic activity. Comparing to the phenotype of the original pcd mouse model, the affected sheep do not have apparent degeneration of Purkinje cells but show their specific characteristics including degeneration of neurons in ventral horns of the spinal cord and brain stem, Wallerian degeneration of motor nerve fibers and atrophy of skeletal muscles. The phenotype of these sheep is more similar to the symptoms in the Sod1 transgenetic mice (mSOD1), which were constructed as animal models for human ALS patients (Gurney et al., 1994). It is interesting to note that the Lox (lysyl oxidase) gene, encoding a protein playing an important role in formation and repairing of extracellular matrix, was found to be up-regulated in both mSOD1 mice and Agtpbppcd-sid mice (Li et al., 2004; Li et al., 2010; Lucero and Kagan, 2006). The Agtpbppcd-sid mutant was characterized as a deletion of Agtpbp1 exon 7 that produced a stop codon in exon 8. Absence of protein Agtpbp1 will decrease dendrite development of cerebellar Purkinje cells by changing the expression of downstream components, including Lox propeptide and the NF-κB signaling pathway in mice (Li et al., 2010). Therefore, we assume the p.Arg970Pro substitution might impair the catalytic activity of AGTPBP1 and decrease its ability of protein turnover of LOX. The overabundant protein LOX would cause abnormal development of dendrite in LMN and then develop phenotypes shown in these sheep. In order to ascertain that the Arg970Pro substitution was really a causative mutation, measuring mRNA expression of the genes AGTPBP1 and LOX and their enzymatic activities in the spinal cord, brain stem and cerebellar tissues would be desired. It could exclude the 75 possibility that the Arg970Pro mutation in protein AGTPBP1 is only a coincidental finding from the genetic mapping study or this mutation disrupts expression of another gene in the same region of the genome. Nevertheless, a point mutagenesis of mouse Nna1 gene at arginine 962 (R962), the homologous residue of R970 in sheep, to generate transgenic mice and then using the intact full-length Nna1 gene to rescue phenotypes in mice will provide further evidence for the Arg970Pro substitution being a causative mutation in sheep. The missense mutation c.2909G>C (p.Arg970Pro) in AGTPBP1 appears to be a spontaneous one arising from a single sheep family. In order to avoid spreading out this disease throughout the sheep population, a simple genetic test can be easily designed based on our results to identify and remove carriers with the defective c.[2902C] allele, and to decrease the risk of economic losses in the future . Regenerating motor neurons expressing protein NNA1 in human nervous tissues indicated that NNA1 was a key factor for motor neuronal degeneration (Harris et al., 2000). To date, no human cases have been reported with spontaneous mutations in AGTPBP1 that would implicate the gene as a cause for a LMN disease. Neither have they been reported to occur in other large mammalian species. Mice and hamsters are convenient animal models for studies of human neural diseases (Mashimo et al., 2009; Akita et al., 2007). However, there are still significant structural and functional differences between rodent and large mammalian species. Even among rodents themselves, the gene distributions could be different. Hamsters with undetectable Nna1 mRNA in the brain did not exhibit distinctive neurodegenerative features as serious as mice mutants, which could possibly be explained due to the different distributions of Nna1 and other Nna 1-like genes in the brains of mice and hamsters (Kalinina et al., 2007; Akita et al., 2007; Akita and Arai, 2009). Sheep have large brain size 76 and long life span relative to rodents and similar gestation period as humans. Our finding in sheep, a large mammalian species, raises the possibility that human patients with similar etiology may also result from this gene or other genes in the same pathway of neuronal development. Since there is no effective treatment available for motor neuron diseases, the identification of molecular pathogenetic targets will shed light on the development of gene therapy for this type of diseases (Nizzardo et al., 2011). Here we reported that LMN disease originating in a Romney ram was likely caused by a mutation in a gene locus AGTPBP1 (NNA1). The affected sheep could be used as a valuable animal model for gene therapy if human patients are successfully diagnosed in the future. ACKNOWLEDGEMENTS The authors appreciate the financial support provided by State of Iowa and Hatch Funds and the support from the College of Agriculture and Life Sciences at Iowa State University. Animal studies and diagnoses were funded by Massey University. H.T.B. was partially funded by the National Research Centre for Growth and Development. Conflict of interest The authors declare no conflict of interest. Supplementary information Supplementary information is available at Heredity’s website. Data archiving Data deposited in the Dryad repository: doi:10.5061/dryad.8fp2pc88 REFERENCES Akita K, Arai S (2009). The ataxic Syrian hamster: an animal model homologous to the pcd mutant mouse? Cerebellum 8: 202-210. 77 Akita K, Arai S, Ohta T, Hanaya T, Fukuda S (2007). Suppressed Nna1 gene expression in the brain of ataxic Syrian hamsters. J Neurogenet 21: 19-29. Aloy P, Companys V, Vendrell J, Aviles FX, Fricker LD, Coll M et al. (2001). The crystal structure of the inhibitor-complexed carboxypeptidase D domain II and the modeling of regulator carboxypidases. J Biol Chem 276:16177-16184. Anderson PD, Parton KH, Collett MG, Sargison ND, Jolly RD (1999). A lower motor neuron disease in newborn Romney lambs. N Z Vet J 47: 112-114. Berezniuk I, Sironi J, Callaway MB, Castro LM, Hirata IY, Ferro ES et al. (2010). CCP1/Nna1 functions in protein turnover in mouse brain: Implications for cell death in Purkinje cell degeneration mice. Faseb J 24: 1813-1823. Borasio GD, Gelinas DF, Mechanical YN (1998). Ventilation in amyotrophic lateral sclerosis: a cross-cultural perspective. J Neurol 245: Suppl 2:S7-12. Chakrabarti L, Eng J, Martinez RA, Jackson S, Huang J, Possin DE et al. (2008). The zincbinding domain of Nna1 is required to prevent retinal photoreceptor loss and cerebellar ataxia in Purkinje cell degeneration (pcd) mice. Vision Res 48: 1999-2005. Chakrabarti L, Zahra R, Jackson SM, Kazemi-Esfarjani P, Sopher BL, Mason AG et al. (2010). Mitochondrial dysfunction in NnaD mutant flies and Purkinje cell degeneration mice reveals a role for Nna proteins in neuronal bioenergetics. Neuron 66: 835-847. Christianson DW, Mangani S, Shoham G, Lipscomb WN (1989). Binding of Dphenylalanine and D-tyrosine to carboxypeptidase A. J Biol Chem 264: 12849-12853. Dion PA, Daoud H, Rouleau GA (2009). Genetics of motor neuron disorders: new insights into pathogenic mechanisms. Nat Rev Genet 10: 769-782. Fernandez-Gonzalez A, La Spada AR, Treadaway J, Higdon JC, Harris BS, Sidman RL et al. (2002). Purkinje cell degeneration (pcd) phenotypes caused by mutations in the axotomyinduced gene, Nna1. Science 295: 1904-1906. Garcia-Saez I, Reverter D, Vendrell J, Aviles FX, Coll M (1997). The three-dimensional structure of human procarboxypeptidase A2. Deciphering the basis of the inhibition, activation and intrinsic activity of the zymogen. Embo J 16: 6906-6913. Gomis-Rṻth FX, Companys V, Qian Y, Fricker LD, Vendrell J, Aviles FX et al. (1999). Crystal structure of avian carboxypeptidase D domain II: a prototype for the regulatory metallocarboxypeptidase subfamily. EMBO J 18:5817-5826 Gurney ME, Pu H, Chiu AY, Dal Canto MC, Polchow CY, Alexander DD et al. (1994). Motor neuron degeneration in mice that express a human Cu,Zn superoxide dismutase mutation. Science 264:1772-1775. Harris A, Morgan JI, Pecot M, Soumare A, Osborne A, Soares HD (2000). Regenerating motor neurons express Nna1, a novel ATP/GTP-binding protein related to zinc carboxypeptidases. Mol Cell Neurosci 16: 578-596. 78 Kalinina E, Biswas R, Berezniuk I, Hermoso A, Aviles FX, Fricker LD (2007). A novel subfamily of mouse cytosolic carboxypeptidases. Faseb J 21: 836-850. Lee CN (2012). Reviewing evidences on the management of patients with motor neuron disease. Hong Kong Med J 18:48-55. Li J, Gu X, Ma Y, Calicchio ML, Kong D, Teng YD et al. (2010). Nna1 mediates Purkinje cell dendritic development via lysyl oxidase propeptide and NF-kappaB signaling. Neuron 68: 45-60. Li PA, He Q, Cao T, Yong G, Szauter KM, Fong KS et al. (2004). Up-regulation and altered distribution of lysyl oxidase in the central nervous system of mutant SOD1 transgenic mouse model of amyotrophic lateral sclerosis. Brain Res Mol Brain Res 120:115-122. Lucero HA, Kagan HM (2006). Lysyl oxidase: an oxidative enzyme and effector of cell function. Cell Mol Life Sci 63:2304-2316. Mangani S, Ferraroni M, Orioli P (1994). Interaction of carboxypeptidase A with anions: crystal structure of the complex with the HPO42- anion. Inor Chem 33: 3421–3423. Mashimo T, Hadjebi O, Amair-Pinedo F, Tsurumi T, Langa F, Serikawa T et al. (2009). Progressive Purkinje cell degeneration in tambaleante mutant mice is a consequence of a missense mutation in HERC1E3 ubiquitin ligase. PloS Genet 5:e1000784. Mullen RJ, Eicher EM, Sidman RL (1976). Purkinje cell degeneration, a new neurological mutation in the mouse. Proc Natl Acad Sci USA 73: 208-212. Nizzardo M, Simone C, Falcone M, Riboldi G, Rizzo F, Magri F et al. (2012). Research advances in gene therapy approaches for the treatment of amyotrophic lateral sclerosis. Cell Mol Life Sci 69: 1641-1650. Rodriguez de la Vega M, Sevilla RG, Hermoso A, Lorenzo J, Tanco S, Diez A et al. (2007). Nna1-like proteins are active metallocarboxypeptidases of a new and diverse M14 subfamily. Faseb J 21: 851-865. Rogowski K, van Dijk J, Magiera MM, Bosc C, Deloulme JC, Bosson A et al. (2010). A family of protein-deglutamylating enzymes associated with neurodegeneration. Cell 143: 564-578. Rosen DR, Siddique T, Patterson D, Figlewicz DA, Sapp P, Hentati A et al. (1993). Mutations in Cu/Zn superoxide dismutase gene are associated with familial amyotrophic lateral sclerosis. Nature 362:59-62. Wang T, Morgan JI (2007). The Purkinje cell degeneration (pcd) mouse: an unexpected molecular link between neuronal degeneration and regeneration. Brain Res 1140: 26-40. Wang T, Parris J, Li L, Morgan JI (2006). The carboxypeptidase-like substrate-binding site in Nna1 is essential for the rescue of the Purkinje cell degeneration (pcd) phenotype. Mol Cell Neurosci 33: 200-213. Zhao X, Dittmer KE, Blair HT, Thompson KG, Rothschild MF, Garrick DJ (2011). A novel nonsense mutation in the DMP1 gene identified by a genome-wide association study is responsible for inherited rickets in Corriedale sheep. PLoS One 6: e21739. 79 Table 1. The genotyping results of three exonic SNPs Exon 21 Sheep Breed Romney and Romney-sired crossbreds Texel/Corrieda le a Exon 22 c.2909G>C (p.Arg970Pro) Rate of Concordan C C G ce C G G T T Affected Known carrier Phenotyp e normala Normal 19 0 0 100% 19 0 0 100% 19 0 0 100% 0 13 0 100% 3 10 0 76.9% 10 3 0 23.1% 0 13 9 100% 1 13 6 95.0% 18 2 0 10.0% 0 0 14 100% Normal 0 0 71 100% LMND Status c.3126C>T C T C C Rate of Concordan ce c.3133T>C (p.Pro1045Ser) Rate of Concordan C C T ce C T T These sheep are including phenotypically normal and suspected sheep which were died without clear reasons. Figure 1. The distribution of identical homozygous fragments in LMND affected lambs. Based on the different number of SNPs spanning the homozygous fragment, the counts of identical homozygous fragments in all affected lambs were shown. Only one homozygous fragment spanning 136 SNPs was identified to be common to all 19 affected lambs when the “larger than 10-SNP” threshold was applied. 80 Figure 2. Work flow chart for discovering the mutation locus of low motor neuron disease in Romney sheep. (a) The sheep chromosome 2 (OAR 2). (b) The IBD region containing the causative locus is located on OAR 2 between 29 and 36 Mb. (c) SNP locations on the Ovine SNP50 BeadChip in the IBD region (d) The black boxes show reference genes encompassing the sheep IBD region based on the comparative genomic maps. The gene AGTPBP1 was depicted by a red lined box. (e) Sequence analyses of AGTPBP1 showed a point mutation in exon 21, resulting in an Arg to Pro substitution. This missense mutation is located in a highly conserved residue for the catalytic activity of this protein. The PCR-RFLPs show different patterns for affected and carrier groups. 81 Figure 3 The conserved domain structures of partial protein sequence of AGTPBP1. (a) The conserved domains includes Zinc-binding sites and putative active sites shared by M14_Nna1 like superfamily genes (b) Alignment of predicted sheep protein AGTPBP1 with 8 most dissimilar protein sequences from species of human, mouse, chicken, frog, purpuratus, sea squirt, trichoplax and zebrafish. The residues labeled by hash-marks on the top were known to be involved in substrate binding or catalytic activity. The two missense mutations were labeled on the protein sequence. Upper case amino acids are aligned and a read to blue color scale represents the degree of conservation, with red showing highly conserved residues. Lower case, grey amino acids are unaligned. Dashes indicate variations in sequence length among multiple sequence alignment. (c) Three dimensional molecular modeling of the conserved domain based on the crystal structure of putative carboxypeptidase (PBDID: 3K2K). The short yellow section in the middle loops represents the position of the mutated amino acid residue. 82 Table S1. Primer sequences, annealing temperatures, PCR amplicon information and targeted encoding regions of gene AGTPBP1. Primer name Primer sequence sAGTe1bF sAGTe1bR sAGTe2F sAGTe2R sAGTe3F sAGTe3R sAGTe4F sAGTe4R sAGTe5F sAGTe5R sAGTe6F sAGTe6R sAGTe7F SAGTE7R sAGTe8F sAGTe8R sAGTe9F sAGTe9R sAGTe10F sAGTe10R sAGTe13_1F sAGTe13_1R sAGTe13_2F sAGTe13_2R sAGTe14F sAGTe14R sAGTe15_16F sAGTe15_16R sAGTe18F sAGTe18R sAGTe19F sAGTe19R sAGTe20F sAGTe20R sAGTe21F sAGTe21R LMN_newF LMN_newR sAGTe22F sAGTe22R sAGTe23F sAGTe23R sAGTe25_1F sAGTe25_1R sAGTe25_2F sAGTe25_2R 5’TCAAACATTCTCCTGTGGTTTC3’ 5’TGCTCAACACTCATTTTCTGC3’ 5’CAGGGTAGGATTTTGGCTTG3’ 5’ATGGGGAAAGGAGGACAATC3’ 5’AGCAACATCAGTGGAAGGAAC3’ 5’CTCCCATTCACTGGCTCTTC3’ 5’GCTCAAGTGATTTGCAAGAATTT3’ 5’GGTTTGAGGGACACTGGAAG3’ 5’CTTCCAGTGTCCCTCAAACC3’ 5’CTTCCTTTTTCCATCACATGC3’ 5’CCCGCCTCCTACTCTTTCTT3’ 5’TCACCTTTGAGGGAGTGGTT3’ 5’GTTGCTTTTGGGCAATTCAC3’ 5’AAGGCCCCAAATATCAAAGG3’ 5’TCCTTTGTGAACTCTGGCCTA3’ 5’CATCAATCAACTGCACTAAAAGA3’ 5’TGTTTTCAATTTGACTTTACCAGTG3’ 5’CCTATCATTAACAATAACAGGAAGAAA3’ 5’GCCATATGAGGTTCTTACAAAGAGA3’ 5’GAAGCCAGCAAGAAGCATGT3’ 5’GGTGTAATGAATTTTGTGCTGGT3’ 5’CAACGTAATTCGGTCCAAGG3’ 5’AGCAGCTCGGTGATGAAAGT3’ 5’TGTCCAATGGCCAGGTTTAT3’ 5’TTTACTTGGAGGAAAGTTGTGC3’ 5’AAACTGCAGCATTCATACAAAATC3’ 5’GTGAGCGAGCAATAATGGTG3’ 5’TCCTCTGGATGAACAGACTTCA3’ 5’ACAGAATTTTTGTGTGGATGAAT3’ 5’GAGATTCAATGCTTCTCATTTAAACA3’ 5’TGTTTGAGATCTGGATGCATTT3’ 5’GGCAGCTCCCTCATATTCAA3’ 5’ATGAAATCAAGGTAATGTTAAAGAACT3’ 5’ACACAGGACAGCCACAAAAA3’ 5’TTGGAGGGTCCAGATAGCAC3’ 5’TGCTTTTGAAACTAACATACAGCTT3’ 5’TGTTAGCACCTCCTCTTTGCT3’ 5’AGCAGACCCTTGGCATGATA3’ 5’ACCCTTGGGACAATGATGAA3’ 5’CAGAATAGCAAAATGGCAACA3’ 5’TTTTTATCTTTGCAGACTTTGCCTA3’ 5’TTCATAAGAACCCAAAGCAACC3’ 5’AACCCAAATTACTGCTTGCAT3’ 5’CCAAACAGGTCCAAGATGCT3’ 5’CTTGAACCCTTTGCCATCAC3’ 5’GGTTTCAAGGCAAACTCTGG3’ Annealing temperature (°C) Targeted exon 60 Exon 1 443 60 Exon 2 678 60 Exon 3 445 60 Exon 4 589 60 Exon 5 521 60 Exon 6 407 60 Exon 7 520 60 Exon 8 321 60 Exon 9 410 60 Exon 10 519 60 Exon 13 815 60 Exon 13 502 60 Exon 14 522 60 Exon 15 & 16 502 60 Exon 18 421 60 Exon 19 540 60 Exon 20 633 60 Exon 21 611 58 Exon 21 611 60 Exon 22 400 60 Exon 23 429 60 Exon 25 522 60 Exon 25 Size (bp) 783 Table S2. The list of ovine positional candidate genes based on the bovine reference gene sequences. Bovine Gene Accession No. XM_586318 NM_001038082 XM_590060 NM_001024527 XM_599340 NM_001034597 NM_173947 NM_001101069 NM_173946 XM_001253324 XM_580638 XM_001788193 NM_001076436 XM_877619 NM_001081523 XM_001788164 XM_613307 NM_001076439 NM_001082606 NM_001103310 XM_599250 NM_174316 XM_610259 NM_001046164 XM_613544 NM_001083686 XM_593294 Bovine Ref. Gene Symbol LOC509375 NINJ1 SUSD3 FGD3 IPPK ECM2 OMD IARS OGN NOL8 ZNF484 LOC616883 C8H9orf21 CDC14B HABP4 LOC100140121 ZNF367 HSD17B3 LOC508357 LOC789977 PTCH1 FANCC AP-O (LOC531757) FBP2 DAPK1 CTSL2 GAS1 OAR2:34660860..34718709 OAR2:34725688..34738850 OAR2:34778116..34782313 OAR2:34874490..34921202 XM_590860 NM_001034470 XM_001787825 XM_595216 ZCCHC6 ISCA1 LOC785863 GOLM1 OAR2:34929288..35016263 OAR2:34929660..35016263 OAR2:35074125..35075486 OAR2:35163069..35333039 OAR2:35990235..36232516 XM_868391 XR_042634 XR_028207 XM_001252536 NM_001075225 MAK10 LOC782560 LOC785119 AGTPBP1 NTRK2 Bovine Ref. Gene Name PREDICTED: Bos taurus similar to Ninjurin 1 Bos taurus ninjurin 1 PREDICTED: Bos taurus similar to sushi domain containing 3 Bos taurus FYVE, RhoGEF and PH domain containing 3 PREDICTED: Bos taurus similar to inositol 1,3,4,5,6-pentakisphosphate 2-kinase Bos taurus extracellular matrix protein 2, female organ and adipocyte specific Bos taurus osteomodulin Bos taurus isoleucyl-tRNA synthetase Bos taurus osteoglycin PREDICTED: Bos taurus nucleolar protein 8 PREDICTED: Bos taurus similar to zinc finger protein 484 PREDICTED: Bos taurus similar to zinc finger protein 782 Bos taurus chromosome 9 open reading frame 21 ortholog PREDICTED: Bos taurus similar to CDC14 homolog B, transcript variant 6 Bos taurus hyaluronan binding protein 4 PREDICTED: Bos taurus similar to rCG47065 PREDICTED: Bos taurus similar to zinc finger protein 367, transcript variant 1 Bos taurus hydroxysteroid (17-beta) dehydrogenase 3 Bos taurus chromosome 9 open reading frame 102 ortholog Bos taurus hypothetical LOC789977 PREDICTED: Bos taurus patched homolog 1 (Drosophila) Bos taurus Fanconi anemia, complementation group C PREDICTED: Bos taurus similar to Aminopeptidase O (AP-O) Description: Bos taurus fructose-1,6-bisphosphatase 2 PREDICTED: Bos taurus similar to death-associated protein kinase 1, transcript variant 1 Bos taurus cathepsin L2 Bos taurus similar to growth arrest-specific 1 PREDICTED: Bos taurus similar to zinc finger, CCHC domain containing 6, transcript variant 1 Bos taurus iron-sulfur cluster assembly 1 homolog (S. cerevisiae) PREDICTED: Bos taurus similar to chromosome 9 open reading frame 153 PREDICTED: Bos taurus similar to golgi membrane protein 1 PREDICTED: Bos taurus MAK10 homolog, amino-acid N-acetyltransferase subunit, (S. cerevisiae) PREDICTED: Bos taurus misc_RNA PREDICTED: Bos taurus misc_RNA PREDICTED: Bos taurus similar to AGTPBP1 protein, transcript variant 1 Bos taurus neurotrophic tyrosine kinase, receptor, type 2 83 Ovine Chromosome Position OAR2:29285324..29294958 OAR2:29285341..29296154 OAR2:29338280..29380982 OAR2:29382607..29413924 OAR2:29570653..29607558 OAR2:29648909..29688207 OAR2:29744485..29759383 OAR2:29878441..29959955 OAR2:29762217..29777791 OAR2:29845850..29871806 OAR2:29995325..30004700 OAR2:30316185..30318955 OAR2:30350587..30352681 OAR2:30476264..30592054 OAR2:30610577..30646729 OAR2:30685815..30689928 OAR2:30688798..30713324 OAR2:30790734..30845612 OAR2:31108137..31147487 OAR2:31292157..31295300 OAR2:31555711..31623497 OAR2:31850742..32286795 OAR2:32301777..32654570 OAR2:32877326..32920047 OAR2:33007170..33228793 OAR2:32942781..32949974 OAR2:33971137..33971720 84 Feature sheep_A sheep_N mice frog purpuratus zebrafish human chicken sea squirt trichoplax 859 859 793 853 694 785 859 876 902 218 # # LQKLESahn-PQQIYFRKDVLCETLSGNSCPLVTITAMPESNYYEHI cq----FRNRPYVFLSARVHPGETNASWVMKGT LQKLESahn-PQQIYFRKDVLCETLSGNSCPLVTITAMPESNYYEHI cq----FRNRPYVFLSARVHPGETNASWVMKGT LQKLESahn-PQQIYFRKDVLCETLSGNICPLVTITAMPESNYYEHI cq----FRTRPYIFLSARVHPGETNASWVMKGT LQKLESlhs-PQQIYFRQEVLCETLGGNGCPVITITAMPESNYYEHV yq----FRNRPYIFLTSRVHPGETNASWVMKGT LSKLQAslgeFSDIYYRKQSLCLSLGGNECPVLTITANPTTLDKEGV lq----FRCRRYLFLSARVHPGESNSSYIMKGT LQKLSAlc--TPEIYYRQEDLCETLGGNGCPLLTITAMPESSSDEHI sq----FRSRPVIFLSARVHPGETNSSWVMKGS LQKLESahn-PQQIYFRKDVLCETLSGNSCPLVTITAMPESNYYEHI ch----FRNRPYVFLSARVHPGETNASWVMKGT LQKLESmhn-PQQIYFRQDALCETLGGNICPIVTITAMPESNYYEHI cq----FRNRPYIFLSARVHPGETNASWVMKGT LQNLMSnid-TSTMYVRQQTLCNTLSGNSVPVLTITSMPISDNAGSA ieavheLRSRPYIFLSARVHPGETNASWTMRGT LEHIRNvad-NNSIYFKCQELCLTLNGNVCNLMTITNSPNKERLNDD ky----VPRRPYIFLSARVHPGESNSSWIMKGL 933 933 867 927 769 858 933 950 980 292 Feature sheep_A sheep_N mice frog purpuratus zebrafish human chicken sea squirt trichoplax 934 934 868 928 770 859 934 951 981 293 # ## LEYLMSn------------- nPTAQSLRESYIFKIVPMLNPDGVINGNH PCSLSGEDLNRQWQSPNPDLHPTIYHAKGLL LEYLMSn------------- nPTAQSLRESYIFKIVPMLNPDGV INGNHRCSLSGEDLNRQWQSPNPDLHPTIYHAKGLL LEYLMSn------------- sPTAQSLRESYIFKIVPMLNPDGVINGNHRCSL SGEGLNRQWQSPNPELHPTIYHAKGLL LEFLMGs------------- sATAQSLRESYIFKIVPMLNPDGVINGNHRCSL SGEDLNRQWQNPNSDLHPTIYHTKGLL LKFLMSt------------- hPTAQALREIFIFKIVPMLNPDGVINGSHRCSL SGEDLNRRWLQPIKELHPTIYHTKGLL LEFLMSc------------- sPQAQSLRQSYIFKIMPMLNPDGVINGNHRCSL SGEDLNRQWQNPNAELHPTIYHAKSLL LEYLMSn------------- nPTAQSLRESYIFKIVPMLNPDGVINGNHRCSL SGEDLNRQWQSPSPDLHPTIYHAKGLL LEYLMSs------------- nPSAQSLRESYIFKIIPMLNPDGVINGNHRCSL SGEDLNRQWQNPNPDLHPTIYHAKGLL LKLLLTppasqdqpediriia SIAEDLRKSYIFKIIPMLNPDGVINGHHRCSL SGEDLNRTWLDPNPQFFPTIYHTKGLL LDFITSd------------- dDCAIQLRESYIFKIIPMLNPDGVVNGCHRCSL SGHDLNRCWISPDPRIHPTIYHTKGLI 1000 1000 934 994 836 925 1000 1017 1060 359 Feature sheep_A sheep_N mice frog purpuratus zebrafish human chicken sea squirt trichoplax 1001 1001 935 995 837 926 1001 1018 1061 360 ## # QYLAAVKRLPLVYCDYHGHSRKKNVFMYGC Siketvwhtndnaapcdvvedv GYRTLPKILSHIAPAFCMSSCSFVVEKS QYLAAVKRLPLVYCDYHGHSRKKNVFMYGC Siketvwhtndnaapcdvvedv GYRTLPKILSHIAPAFCMSSCSFVVEKS QYLAAVKRLPLVYCDYHGHSRKKNVFMYGC Siketvwhthdnsascdivegm GYRTLPKILSHIAPAFCMSSCSFVVEKS QYLSAIKRVPLVYCDYHGHSRKKNVFMYGC Siketvwhtnanaascdmvees GYKTLPKVLNQIAPAFSMSSCSFVVEKS QYLKLIERSPLVYCDFHGHSRKKNVFMYGN Saamshspedl-ikfgipsgvsGAKTLPRILSSIAPAFSHSNCSFAVDKG QYLRATGRTPLVFCDYHGHSRKKNVFMYGC Siketmwqssvntstcdlnedl GYRTLPKLLSQM TPAFSLSSCSFVVERS QYLAAVKRLPLVYCDYHGHSRKKNVFMYGC Siketvwhtndnatscdvvedt GYRTLPKILSHIAPAFCMSSCSFVVEKS QYLAAIKRLPLVYCDYHGHSRKKNVFMYGC Aiketmwhtnvntascdlmedp GYRVLPKILSQTAPAFCMGSCSFVVEKS QYLQSIQKAPLVYCDYHGHSRKMNVFMYGC Shkdsaaaget-sttvggpddtGFKTLPKLLNHFAPSFSLKGCSYVVEKS QYMVTIGKSPLIFCDFHGHSRRKNIFTYAC Ypytntn---------hraedqMLRALPRALQEVSPVFSYQLCSFAVEKC 1080 1080 1014 1074 915 1005 1080 1097 1139 430 Feature sheep_A sheep_N mice frog purpuratus zebrafish human chicken sea squirt trichoplax 1081 1081 1015 1075 916 1006 1081 1098 1140 431 # KESTARVVVWREIGV QRSYTMESTLCGCDQGKYK QLQIGTRELEEMGAKFCVGLLR KESTARVVVWREIGV QRSYTMESTLCGCDQGKYK QLQIGTRELEEMGAKFCVGLLR KESTARVVVWREIGV QRSYTMESTLCGCDQGRYK QLQIGTRELEEMGAQFCVGLLR KESTARVVVWKEIGV QRSYTMESTLCGCDQGKYK DLQIGTKELEEMGAKFCVGLLR KESAARVVVWREIGV ARSYTMESTYCGFDQGKYK QLQVTTAMLEEMGQHFCEGLYK KETTARVVVWREIGV QRSYTMESTLCGCDQGKYK QLQIGTAELEEMGSQFCLALLR KESTARVVVWREIGV QRSYTMESTLCGCDQGKYK QLQIGTRELEEMGAKFCVGLLR KESTARVVVWREIGV QRSYTMESTLCGCDQGKYK QLQIGTKELEEMGAKFCVGLLR KVTTARVVVWKEIGV ARSYTMESTYCGCDQGRYR QLQVGTKELEEMGARFVEVLLH KESTARVVLWRELCI LRSYTMESTYCGMDQGVYK QYHINTTALEDMGKDFCRGLLK 1136 1136 1070 1130 971 1061 1136 1153 1195 486 Figure S1. Conserved amino acids predicted by “CD-search”. Alignment of the complete predicted sheep protein AGTPBP1 with 8 most dissimilar protein sequences from species of human, mouse, chicken, frog, purpuratus, sea squirt, trichoplax and zebrafish predicted 2 specific feature hits consisting of 9 conserved amino acid residues. His920, Glu923 and His1017 labeled by hash-marks on the top consist as the zinc binding motif” HXXE…H”, the conserved feature 1. The conserved feature 2 represents metallocarboxypeptidase active sites consisting of all 9 conserved residues including zinc binding residues. These residues include His920, Glu923, Arg970, Asn979, Arg980, His1017, Gly1018, Met1027 and Glu1102 which are labeled by hashmarks on the top. 85 sheep MSKLKVI PEKSLANNSRI VGLLAQLEKI NTESAESDTARYVTAKI LHLAQSQEKTRREMT 60 cat t l e MSKLKVI PEKSLTNNSRI VGLLAQLEKI NTESTESDTARYVTSKI LHLAQSQEKTRREMT 60 human MSKLKVI PEKSLTNNSRI VGLLAQLEKI NAEPSESDTARYVTSKI LHLAQSQEKTRREMT 60 mouse MSKLKVVGEKSLTNSSRVVGLLAQLEKI NTDSTESDTARYVTSKI LHLAQSQEKTRREMT 60 ** ** * *: *** *: * . ** : * ** ** *** ** *: : . : ** ** ** ** *: * *** *** *** *** ** * * sheep AKGSTGMEVLLSTLENTKDLQTTLNI LSI LVELVSAGGGRRASFLVSKGGSQI LLQLLMN 120 cat t l e AKGSTGMEVLLSTLENTKDLQTTLNI LSI LVELVSAGGGRRASFLVSKGGSQI LLQLLMN 120 human AKGSTGMEI LLSTLENTKDLQTTLNI LSI LVELVSAGGGRRVSFLVTKGGSQI LLQLLMN 120 mouse TKGSTGMEVLLSTLENTKDLQTVLNI LSI LI ELVSSGGGRRASFLVAKGGSQI LLQLLMN 120 : * ** * *** : ** ** * *** ** ** *. *** ** ** : * *** : * ** ** . ** **: *** *** *** ** * * sheep ASKESPLNEELMVQI HSI LAKI GPKDKKFGVKARI NGALNI TLNLVKQNLQNHRLVLPCL 180 cat t l e ASKESPLNEELMVQI HSI LAKI GPKDKKFGVKARI NGALNI TLNLVKQNLQNHRLVLPCL 180 human ASKESPPHEDLMVQI HSI LAKI GPKDKKFGVKARI NGALNI TLNLVKQNLQNHRLVLPCL 180 mouse ASKDSPPHEEVMVQTHSI LAKI GPKDKKFGVKARVNGALTVTLNLVKQHFQNYRLVLPCL 180 ** *: * * : *: : ** * ** ** ** ** *** ** ** ** **: ** ** . : *** *** *: : **: *** ** * * sheep QLLRVYSANSVNSVSLGKNGVVELMFKI I GPFSKKNSSLMKVALDTLAALLKSKTNARRA 240 cat t l e QLLRVYSANSVNSVSLGKNGVVELMFKI I GPFSKKNSNLMKVALDTLAALLKSKTNARRA 240 human QLLRVYSANSVNSVSLGKNGVVELMFKI I GPFSKKNSSLI KVALDTLAALLKSKTNARRA 240 mouse QLLRVYSTNSVNSVSLGKNGVVELMFKI I GPFSKKNSGLMKVALDTLAALLKSKTNARRA 240 ** ** * **: *** ** * *** ** ** ** *** ** ** ** *** ** . * : * *** * ** ** ** ** *** ** ** sheep VDRGYVQVLLTI YVDWHRHDNRHRNMLI RKGI LQSLKSVTNI KLGRKAFI DANGMKI LYN 300 cat t l e VDRGYVQVLLTI YVDWHRHDNRHRNMLI RKGI LQSLKSVTNI KLGRKAFI DANGMKI LYN 300 human VDRGYVQVLLTI YVDWHRHDNRHRNMLI RKGI LQSLKSVTNI KLGRKAFI DANGMKI LYN 300 mouse VDRGYVQVLLTI YVDWHRHDNRHRNMLI RKGI LQSLKSVTNI KLGRKAFI DANGMKI LYN 300 ** ** * *** *** ** * *** ** ** ** *** ** ** ** *** ** ** ** *** *** *** *** *** ** * * sheep TSQECLAVRTLDPLVNTSSLI MRKCFPKNRLPLPTI KSSFHFQLPVI PVTGPVAQLYSLP 360 cat t l e TSQECLAVRTLDPLVNTSSLI MRKCFPKNRLPLSTI KSSFHFQLPVI PVTGPVAQLYSLP 360 human TS- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - QLPVI PVTGPVAQLYSLP 320 mouse TSQECLAVRTLDPLVNTSSLI MRKCFPKNRLPLPTI KSSFHFQLPI I PVTGPVAQLYSLP 360 ** sheep ** *: * ** ** ** *** *** * PEVDDVVDESDDNDDI DLEAENETENEDDLDQNFKNDDI ETDI NKLKPQQEPGRTI EELK 420 cat t l e PEVDDVVDESDDNDDI DLEAENETENEDDLDQNFKNDDI ETDI NKLKPQQEPGRTI EELK 420 human PEVDDVVDESDDNDDI DVEAENETENEDDLDQNFKNDDI ETDI NKLKPQQEPGRTI EDLK 380 mouse PEVDDVVDESDDNDDI DLEVENELENEDDLDQSFKNDDI ETDI NKLRPQQVPGRTI EELK 420 ** ** * *** *** ** * *** : * . * ** ** ** ** ** . ** ** ** ** *** **: *** ** *** *: * * Figure S2. Characteristics of protein AGTPBP1. Letters highlighted in green denotes the aspartic acid rich domain; in pink corresponds to the ATP binding domain; in red for zinc-binding site; in blue for the carboxypeptidase catalytic site and substrate binding sites. The entire enzymatic domain is indicated by bold lettering. 86 AGTPBP1 200 400 600 800 1000 Agtpbp1_mouse AGTPBP1_sheep CPA2_human CPA_cattle ICPLVTITAMPESNYYEHICQFR-TRPYIFLSARVHPGETNASWVMKGTLEYLMS---NS SCPLVTITAMPESNYYEHICQFR-NRPYVFLSARVHPGETNASWVMKGTLEYLMS---NN PMNVLKF------------STGG-DKPAIWLDAGIHAREWVTQATALWTANKIVSDYGKD PIYVLKF------------STGGSNRPAIWIDLGIHSREWITQATGVWFAKKFTENYGQN Agtpbp1_mouse AGTPBP1_sheep CPA2_human CPA_cattle * PTAQSLRESYIFKIVPMLNPDGV-----INGNHRCSLS---------GEGLNRQW----Q PTAQSLRESYIFKIVPMLNPDGV-----INGNHRCSLS---------GEDLNRQW----Q PSITSILDALDIFLLPVTNPDGYVFSQTKNRMWRKTRSKVSAGSLCVGVDPNRNWDAGFG PSFTAILDSMDIFLEIVTNPNGFAFTHSENRLWRKTRSKVTSSSLCVGVDANRNWDAGFG Agtpbp1_mouse AGTPBP1_sheep CPA2_human CPA_cattle SP----NP---ELH-------PTIYHAKGLLQYLAAVKRLPLVYCDYHGHSRKKNVFMYG SP----NP---DLH-------PTIYHAKGLLQYLAAVKRLPLVYCDYHGHSRKKNVFMYG GPGASSNPCSDSYHGPSANSEVEVKSIVDFIKSHG----KVKAFIILH--S-YSQLLMFP KAGASSSPCSETYHGKYANSEVEVKSIVDFVKNHG----NFKAFLSIH--S-YSQLLLYP Agtpbp1_mouse AGTPBP1_sheep CPA2_human CPA_cattle CSIKETVWHTHDNSASCDIVEGMGYRTLPKILSHIAPAFCMSSCSFVVEKSKESTARVVV CSIKETVWHTNDNAASCDVVEDVGYRTLPKILSHIAPAFCMSSCSFVVEKSKESTARVVV YGYKCT--KLDD-FDELSEVAQKAAQSLSRLHGTKY--KVGPICSVIYQASGGSIDWSYD YGYT-T-QSIPD-KTELNQVAKSAVAALKSLYGTSY--KYGSIITTIYQASGGSIDWSYN Agtpbp1_mouse AGTPBP1_sheep CPA2_human CPA_cattle WREIGVQRSYTMESTLCGCDQGRYKGLQIGTRELEEMGAQFCVGLLRLKRLTSSLEYNLP WREIGVQRSYTMESTLCGCDQGKYKGLQIGTRELEEMGAKFCVGLLRLKRLTSPLEYNLP Y---GIKYSFAFELR----DTGRYG-FLLPARQILPTAEETWLGLKAIMEHVRDHPY--Q---GIKYSFTFELR----DTGRYG-FLLPASQIIPTAQETWLGVLTIMEHTVNN----- Figure S3. Secondary structure modeling of the AGTPBP1 protein indicates critical amino acid residues. The fold structure of the carboxypeptidase (CP) domain (orange box in upper panel) of the AGTPBP1 protein in sheep and mouse were predicted using the 3D_PSSM software. The sequences of CPA2-human (PDB ID: 1DTD) and CPA_cattle (PDB ID: 1HDU) which have known tertiary structures were also included in the alignment. Green highlighting indicates β-sheet and blue highlighting indicates α-helix. The conserved amino acids being critical for substrate binding and catalysis are in red color (His920, Glu923, Arg970, Asn979, Arg980 and Glu1102). The location of arginine 970 in sheep AGTPBP1 gene is marked with an asterisk on its top. The numbering is based on the sheep AGTPBP1 protein. 87 CHAPTER 4. IN A SHAKE OF A LAMB’S TAIL: USING GENOMICS TO UNRAVEL A CAUSE OF CHONDRODYSPLASIA IN TEXEL SHEEP A paper published in Animal Genetics 2012;43 Suppl 1:9-18. Xia Zhao1, Suneel K Onteru1, Susan Piripi2,3, Keith G Thompson2, Hugh T Blair2, Dorian J Garrick1,2, Max F Rothschild1 1 Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University, Ames, IA 50011, USA 2 Institute of Veterinary, Animal & Biomedical Sciences, Massey University, Palmerston North 4474, New Zealand 3 College of Veterinary Medicine, Oregon State University, Corvallis, OR 97331, USA Running title: the SLC13A1 gene responsible for Texel chondrodysplasia Keywords: deletion; SLC13A1 gene; chondrodysplasia; Texel sheep; genetic disease SUMMARY Chondrodysplasia in Texel sheep is a recessively inherited disorder characterized by dwarfism and angular deformities of the forelimbs. A genome-wide association study using the Illumina OvineSNP50 BeadChip on 15 sheep diagnosed as affected and 8 carriers descended from 3 affected rams was conducted to uncover the genetic cause. A homozygous region of 25 consecutive SNP loci was identified in all affected sheep, covering a region of 1Mbp on ovine chromosome 4. Seven positional candidate genes including solute carrier 88 family 13, member 1 (SLC13A1) were identified and used to search for new single nucleotide polymorphisms (SNPs) for fine mapping of the causal locus. The SLC13A1 gene, encoding a sodium/sulfate transporter, was the primary candidate gene due to similar phenotypes observed in the Slc13a1_knockout mouse model. We discovered a 1-bp deletion of T (g.25513delT) at 107 bp position of exon 3 in the SLC13A1 gene. Genotyping by direct sequencing and restriction fragment length polymorphism analysis for this mutation showed that, all 15 affected sheep were “g.25513delT/g.25513delT”; the 8 carriers were “g.25513delT/T” and 54 normal controls were “T/T”. The mutation g.25513delT shifts the open reading frame of SLC13A1 to introduce a stop codon and truncate C-terminal amino acids. It was concluded that the g.25513delT mutation in the SLC13A1 gene was responsible for the chondrodysplasia seen in these Texel sheep. The knowledge can be used to identify carriers with the defective g.[25513delT] allele in order to avoid at-risk matings to improve animal welfare and decrease economic losses. INTRODUCTION Nearly 30 years have passed since the seminal research of Soller and colleagues (Soller & Beckman 1983; Beckman & Soller 1983) which described the use of genetic markers for genetic improvement of crops and livestock. Considerable advancements have allowed animal geneticists to springboard from these original marker papers to the development of genetic maps, quantitative trait loci analyses and on to genome sequencing and resultant development of high-density panels of single nucleotide polymorphisms (SNPs). These recent tools greatly facilitate the exploration of genetic abnormalities in livestock species. 89 Chondrodysplasia is a general term for diseases of development of cartilage that are present in human beings and many animal species. Other terms like achondroplasia, achondrogenesis, hypochondrogenesis and chondrodystrophy have been used to describe various forms of chondrodysplasia in published clinical and research literature (Thompson et al. 2005; Martínez et al. 2000; Superti_Furga et al. 1999; Zankl et al. 2008; Hastbacka et al. 1999). Chondrodysplasia is primarily caused by defective genes affecting normal chondrogenesis and cartilage development; these display phenotypically as abnormal shape and structure of the skeleton. Mutations in several genes related to the synthesis of collagen or proteoglycans, or the signal-transduction mechanisms have been reported to be involved in inherited chondrodysplasia. Three forms of chondrodysplasia in humans including achondrodrogenesis 1B, atelosteogenesis type II and diastrophic dysplasia are due to mutations in the solute carrier family 26, member 2 (SLC26A2) gene, which encodes a sulfate transporter protein (DTDST). The impaired sulfate uptake affected the normal development of cartilage (Superti_Furga et al. 1999; Hastbacka et al. 1996; Hastbacka et al. 1999). A nonsense mutation of the collagen, type X, alpha 1 (COL10A1) gene has been found to be responsible for Schmid type metaphyseal chondrodysplasia (MCDS) in a human patient (Woelfle et al. 2011). A multibreed association study in domestic dogs demonstrated that an evolutionarily recently acquired fgf4 retrogene, an extra copy of the fibroblast growth factor 4 gene (FGF4), was strongly associated with chondrodysplasia (Parker et al. 2009). The most common inherited chondrodysplasia of sheep is the “spider lamb syndrome” (SLS) in the Suffolk breed. The mutation responsible for the disease is a non-synonymous T > A 90 transversion on the highly conserved tyrosine kinase II domain of the fibroblast growth factor receptor 3 (FGFR3) gene (Cockett et al. 1999; Beever et al. 2006). Chondrodysplasia in Texel sheep has been determined by outcross and backcross experiments to be a simple autosomal recessive disorder (Byrne et al. 2008). It was distinguished from SLS and other chondrodysplasias of sheep by gross phenotypic observations and microscopic analysis (Thompson et al. 2005; Piripi 2008). Chondrodysplasia in Texel sheep is phenotypically characterized by disproportionate dwarfism, a stocky appearance with shortened neck and limbs, bilateral varus deformities of the forelimbs, flexion of the carpal joints, a barrel-shaped chest and wide-based stance (Fig. 1A &1B). Consistent lesions were observed grossly and microscopically in many cartilaginous tissues especially in hyaline cartilage. The marked narrowing of the tracheal lumen (Fig. 1C) in the affected lambs could result in sudden death due to tracheal collapse. Immunohistochemistry of cartilage in Texel chondrodysplasia demonstrated that both types II and VI collagen were distributed abnormally in the matrix, suggesting that the extracellular assembly of these collagen types was disturbed. Moreover, a significantly reduced level of chondroitin 4- sulfate was observed using capillary electrophoresis in chondrodysplastic sheep compared to age-matched controls (Piripi et al. 2011). Most cases were able to be diagnosed at 14 days of age, but all those showing a characteristic chondrodysplastic appearance by approximately 4 months of age were diagnosed as affected. Due to the similar sulfation pattern with achondrogenesis 1B and diastrophic dysplasia in humans, the ovine SLC26A2 gene was previously considered as a prime candidate gene for this disease, but no mutations were identified in 85.4% of exonic DNA sequenced (Byrne, 91 2005). That initial investigation was followed by targeted linkage disequilibrium (LD) mapping to locate the causative mutation for this disorder. However, it again failed since the microsatellite markers were selected only from ovine chromosomes 1, 5, 6, 13, and 22 based on the limited known locations of functional candidate genes or their linkage to chondrodysplastic diseases in other species (Piripi 2008). In order to understand the genetic basis for chondrodysplasia in Texel sheep, we recently performed a homozygosity mapping through whole genome analysis using the Illumina OvineSNP50 BeadChip to identify the genomic region containing the mutation, and then identified the disease-causing mutation by direct examination of the positional candidate genes. MATERIALS AND METHODS Ethics statement Approvals dealing with DNA or tissue collection from animals were obtained from the Massey University Animal Ethics Committee (APPROVAL NUMBER 05/123). Iowa State University Animal Care & Use Committee did not require additional approvals. Animals Three related affected Texel rams were out-crossed to 221 unrelated phenotypically normal mixed-breed ewes. Then 123 F1 hogget daughters were backcrossed to the same 3 rams and these matings produced 46 surviving back-cross lambs. In another mating trial 22 mixedbreed ewes, which either had chondrodysplasia or had previously produced chondrodyplastic offspring, were mated to the 3 affected Texel rams and produced 26 lambs. The 3 affected rams and the 22 ewes with chondrodysplasia background were purchased from the Southland 92 property in New Zealand which was the original source for this disease. A total of 23 lambs were genotyped using the OvineSNP50 BeadChip for genome wide homozygosity mapping and the following fine mapping experiments. These lambs included 11 affected (5 males and 6 females) and 7 presumed carriers (4 males and 3 females) chosen from the back-cross lambs, and 4 affected females with 1 presumed male carrier chosen from the lambs produced in the second mating trial. The pedigree information for these 23 lambs is provided (Fig. S1). Fine mapping also included 54 Romney control sheep from an unrelated population without evidence of this inherited chondrodysplasia. DNA extraction DNA was extracted from samples of spleen collected during necropsy. Most samples were extracted using the Wizard genomic DNA extraction kit (Promega, Madison, WI, USA). DNA for an additional 2 samples were extracted with a GenRlute mammalian genomic DNA miniprep kit (Sigma_Aldrich Corp., St. Louis, MO, USA), and a high pure PCR template preparation kit (Roche Diagnostics, Mannheim, Germany). The DNA extractions were made following standard procedures for each of the kits. Genome wide homozygosity mapping DNA samples from the 23 Texel crossbred sheep were genotyped using the Ovine SNP50 BeadChip (Illumina, San Diego, CA, USA) with standard procedures at GeneSeek Inc. Lincoln, NE, USA. The homozygosity mapping was conducted using the same in-house IBD (identical by descent) approach implemented using command line UNIX tools as described previously (Zhao et al. 2011). This approach identified common homozygous segments in the affected lambs. Among those segments, it was presumed that one would contain the 93 mutation associated with chondrodysplasia in Texel sheep. A list of positional candidate genes were obtained when the targeted IBD segments were identified using comparative maps between ovine and bovine genomes accessed through the Ovine Genome Assembly v1.0. Copy number variation (CNV) analysis The CNV analysis based on the whole genome SNP chip genotyping data was carried out with the CNV partition software, a plug-in within the GenomeStudio program (V1.0 Illumina). The CNV partition software helped to count the copies of probes for each SNP on the ovineSNP50 BeadChip within each individual. Three files including the sample sheet, the intensity and the manifest files were required for data evaluations in this program. Software parameters were kept at the developers’ default values. The CNV value and confidence for chromosomal regions in each sample were computed. An estimated copy number for each allele was represented by the CNV value. Viewing the results of CNV analyses was accomplished by the heat map produced using the CNV partition software. Each column in the heat map represents the copy numbers of SNP loci across the genome for one individual animal. New genetic variants discovery In order to amplify exons and adjacent genomic sequences for the sheep positional candidate genes located in the homozygous region, primers (IDT, Coralville, IA, USA) were designed using the online Primer3 tool (Rozen & Skaletsky 2000) targeting the sheep genomic sequences which were downloaded from the Ovine Genome Assembly v1.0 (Table S1). PCR was performed using a 10 µl system, which included 1x GoTaq reaction buffer 94 containing 1.5 mM Mg2+ (Promega, Madison, WI, USA), 0.125 mM of each dNTPs, 2.5 pM of each primer, 12.5 ng of genomic DNA, and 0.25 units of GoTaq DNA polymerase (Promega, Madison, WI, USA). The PCR program was 35 cycles of 30 s at 94°, 30 s at the corresponding annealing temperature, and 30 s at 72°C with additional 3 min denaturation in the first cycle and 5 min extension in the last cycle. In all reactions, negative controls were performed to check for the presence of contaminants. PCR products were verified by 1.5% agarose gel electrophoresis, and treated with ExoSAP-IT®. The PCR pools from 2 affected sheep, 2 carriers and 2 normal controls, were used for commercial re-sequencing using the 3730xl DNA Analyzer. The sequences were aligned and compared to discover new genetic variants using Sequencher software version 3.0. Mutation analyses The cattle reference SLC13A1 mRNA (GenBank: NM_001193219) was used as a query in cross-species BLAST searches to identify sheep ESTs. The sheep re-sequencing results using templates from the affected, carrier and normal control sheep along with identified sheep ESTs were used to predict a complete ovine SLC13A1 mRNA sequence. The ovine mRNA sequence was then translated into a protein sequence by applying the ExPASy translate tool. The predicted sheep SLC13A1 protein sequence was compared to proteins in other species including cattle (Bos Taurus, GenBank: DAA30458), human (Homo sapiens, GenBank: NP_071889), mouse (Mus musculus, GenBank: AAH22672), rat (Rattus norvegicus, GenBank: NP_113839), dog (Canis lupus familiaris, GenBank: XP_539548) and Zebrafish (Danio rerio, GenBank: NP_954975) using the CLUSTALW alignment tool to check the completeness of the sheep protein sequence and the conserved regions. The 95 topology prediction of transmembrane domains (TMDs) was undertaken using the online tool TopPred 0.01. The upper and lower cutoff values were set as 1 and 0.6. Each genetic variant was analyzed to check whether it was a missense, or nonsense mutation or whether it could shift the open reading frame of the candidate gene. Every suspected genetic variant was genotyped on all the sheep samples to check whether it was concordant with the recessive inheritance pattern of the disease. Mutation “g.25513delT” genotyping Direct sequencing and PCR restriction enzyme digestion (PCR-RFLP) were carried out to genotype the suspected causative mutation “g.25513delT” on all Texel crossbred sheep with clear status of chondrodysplasia as well as 54 Romney control sheep. The restriction enzyme digestion procedures were as follows: DNA fragments from PCR reactions were subjected to restriction enzyme digestion with HpyAV (New England Biolabs Inc., USA). The 10 µl system reaction mix contained 1 µl of NEB buffer 4, 3 units of HpyAV (1.5 µl), 0.1 µl of BSA (100X) and 3 µl of PCR products. Then the reaction mix was incubated at 37º C overnight, and the products were analyzed on a 2.5% (w/v) ultra-pure agarose gel. RESULTS Identity by descent region The IBD analysis narrowed the defective region down to a single 1 Mb homozygous segment consisting of 25 consecutive SNP loci on sheep chromosome 4 (OAR 4) in all 15 affected sheep, by using a larger than 10-SNP threshold in our IBD approach (Fig. S2). This region was heterozygous for 7 presumed carriers but not for lamb 14-04. Lamb 14-04 was phenotypically normal but produced affected offspring. It was considered as a carrier, but 96 was homozygous in the 1 Mb segment with the same haplotype as affected sheep. The 25SNP homozygous segment was defined by SNP OAR4_92447165.1 to SNP OAR4_93438733.1, spanned a chromosomal region between 92.45 Mb and 93.44 Mb (Fig. 2A & 2B). It encompassed 7 genes based on the cow reference sequences including Ca2+ dependent activator protein for secretion 2 (CADPS2) (OAR4:92155977..92738612), solute carrier family 13 member 1 (SLC13A1) (OAR4:92888272..92993056), IQ motif and ubiquitin domain containing (IQUB) (OAR4:93209447..93265179), ring finger protein 148 (RNF148) (OAR4:92554935..92556138), ankyrin repeat and SOCS box-containing 15 (ASB15) (OAR4:93343952..93353409), (LMOD2)(OAR4:93370342..93377214) and leiomodin Wiskott-Aldrich 2 (cardiac) syndrome-like (WASL)(OAR4:93431017..93481693) (Fig. 2C). The SLC13A1 gene, spanning over 79 kb in length, was the most plausible candidate due to its biological functions and previous reports of its involvement with chondrodysplasia in other species. Most of the subsequent effort then focused on this gene in further mutation analyses. CNV analysis The heat map from CNV analyses showed copy numbers of SNP loci for each individual sheep. No CNVs were discovered that were associated with chondrodysplasia in Texel sheep. There were only two loci with CNV values from 0 to 0.5 on OAR1 and OAR3, respectively (the dark red horizontal bars in Fig. S3). However, the 2 CNVs with rare copy numbers showed up mostly among the first 12 animals on the heat map, and did not occur specifically in either the affected group or the carrier group (Fig. S3). Therefore, the CNV analysis was not pursued further. 97 Mutation analysis Two corresponding ovine EST entries (GenBank: FE030665.1, FE030666.1) were found using the cattle reference SLC13A1 mRNA as a query. The putative ovine genomic structure was determined by alignments of the 2 ovine ESTs, the complete bovine SLC13A1 mRNA and the human SLC13A1 mRNA. Even though the human SLC13A1 gene contains 15 exons and spanned 83 kb in length (Lee et al. 2000), the evidence from the alignments with the bovine homolog showed the presence of exon 16 in both bovine and ovine SLC13A1 mRNA, in which exon 16 only contained a long 3’ un-translated region. The “GT-AG” splice site groups in the ovine SLC13A1 gene support this genomic structure. The translated protein length ranged from 583 to 597 amino acids in the 7 species. In sheep and cattle, it is 597 amino acids long, 2 amino acids longer than in humans. There are many conserved regions in this protein shared by the 7 species (Fig. S4). The topology prediction showed the predicted SLC13A1 protein was comprised of 13 transmembrane domains (TMDs) in sheep. Each peak above the upper cut-off value represents a TMD (Fig. 3). Re-sequencing all available exons and the adjacent introns of the SLC13A1 gene identified a mutation potentially responsible for this disease, with one base pair deletion T (JN108880:g.25513delT) at the 107 bp position of exon 3 (148 bp on the “sRFLPdelT” fragment, Table S1) by comparing affected, carriers and normal controls. This 1-bp deletion was 3,751 bp far away from the nearest published SNP OAR4_92963794.1 on one side and 27,654 bp from SNP s11336.1 on another side. An additional 11 novel SNPs in this candidate gene and 5 genetic variants in the other 6 candidate genes were identified to be either intronic or synonymous, and were excluded as mutations for this defect (Table S1). 98 The g.25513delT deletion on SLC13A1 gene changed the open reading frame and induced a nonsense transition (p.Val113Stop) at amino acid 113on the predicted ovine SLC13A1 protein sequence (Fig 2D). Genotyping of mutation g.25513delT Genotyping all the sheep including unrelated controls by direct sequencing showed that all the affected Texel sheep were homozygous with the deletion of “T” (g.25513delT/g.25513delT) on the exon 3 of SLC13A1 gene, the 8 carriers including lamb 14-04 were heterozygous with a single “T” allele (g.25513delT/T) and the pooled normal controls were all homozygous with “T” allele (T/T) (Fig. 4). The g.25513delT mutation created a HpyAV restriction endonuclease recognizing site for specific cuttings on the PCR fragment “sRFLPdelT”, which was used to distinguish the affected, carriers and normal controls. In the PCR-RFLP experiment, PCR fragment “sRFLPdelT” with a “g.25513delT/g.25513delT” genotype was cut into 2 fragments with the lengths of 192bp and 156bp; the one with a “g.25513delT/T” genotype was cut into 4 fragments with lengths of 349bp, 348bp, 192bp, and 156bp; and the one with a “T/T” genotype could not be cut and kept the original fragment size with a length of 349bp. The PCR-RFLP genotyping results were consistent with the results deduced from direct sequencing. This g.25513delT mutation was the only polymorphism 100% concordant with the recessive pattern of inheritance in affected, carriers and unrelated normal individuals (Table 1). Taken together, our results strongly support the conclusion that the deletion “T” on exon 3 of SLC13A1 gene is the causative mutation for inherited chondrodysplasia in Texel sheep. 99 DISCUSSION The use of genetic markers to either develop a genetic map or for QTL analysis has come a long way since early reports suggesting their use (Soller & Beckmann, 1983). However, with the advent of the ovine genome sequence and a high density SNP chip, the tools for analyses of quantitative inherited traits and genetic disorders have improved markedly. These tools, combined with comparative genomic approaches to determine useful candidate genes (Rothschild & Soller, 1997), make it possible to find genetic causes of both quantitative traits and monogenic simple recessive disorders in a relatively short time. Previous reports showed that the mutations in the SLC26A2 gene were responsible for human achondrogenesis. The structural, microscopic, and biochemical analyses on chondrodysplasia in Texel sheep suggested a similar pathogenic mechanism involved in this defect. The genes called solute carriers, encoding membrane transport proteins, especially the sulfate transporters, are prime candidate genes. However, there are almost 300 members for solute carrier genes distributed throughout the human genome and these are organized into 51 gene families (Hediger et al. 2004). Both the low density microsatellite - based linkage mapping and the direct candidate gene method were not suitable to handle these data directly and were incapable of localizing the mutations underlying this Mendelian disease. The recent development of the ovine SNP chip containing more than 50,000 SNP markers well-distributed across the sheep genome has become the best and most efficient tool for the genome-wide dissection of likely mutant loci (Becker et al. 2010). The rationale for this is as follows. Provided a disease is recessive, accurately diagnosed when present, and the defective allele is non-existent in that part of the population not descended from the founder, then regardless of the level of penetrance the affected animals must all be identical by 100 descent at the causative mutation and most likely in the surrounding area. The expected size of the IBD region that contains the causative mutation will depend in part upon the number of meioses that separate the common ancestor from the affected individual, and the number of affected individuals. In our mapping strategy, a homozygous fragment spanning larger than 10 SNPs was chosen as a threshold for valid candidate regions that needed to be examined (Zhao et al. 2011). A 10-SNP interval on the sex-average map (Sheep Best Positions Linkage Map Version 4.7, http://rubens.its.unimelb.edu.au/~jillm/jill.htm) is about 0.64 cM. The probability that a 0.64 cM region would be inherited intact is 1-0.0064=0.9936, and that this would occur in all three meioses that separate a founder and a back-crossed affected individual is 0.9936^3=0.9810. The 10-SNP region will only be IBD if it was inherited intact in all 11 backcross offspring, that outcome having a probability of 0.9810^11 =0.81. This value is large enough to give us confidence that the causative mutation would be contained in regions of larger than 10 consecutive SNP and that we should examine those regions first as they had the highest likelihood of carrying the mutation. In this study, only one such IBD fragment was detected, which comprised 25 SNP loci and the causative mutation was right located on this fragment. The results from our current and previous studies (Zhao et al. 2011) show our genome wide homozygosity mapping strategy has enough power to detect causative loci for rare recessive genetic diseases where consanguineous matings are available. The candidate gene approach (Rothschild & Soller, 1997) can act like a bridge between the targeted genomic regions and the causative mutations. This helps to verify and find the mutation. Candidate gene analysis can be accomplished by direct sequencing or PCR-RFLP tests. The RFLP method was first described by Dr. Soller and colleagues in 1983 (Soller & Beckmann, 1983; Beckmann & Soller, 1983). It has been 101 successfully applied for about 30 years for a variety of polymorphism analyses including the SNP analysis associated with many interesting traits (Rothschild et al. 1997; Zhao et al. 2009; Zhao et al. 2010, etc.). The candidate gene approach using RFLP tests is usually more convenient and historically more economic than using direct sequencing. Our work provides a successful example for combining GWAS and the standard candidate gene analysis. In this study, the presumed carrier lamb 14-04 was not heterozygous for the consecutive 25 SNPs in this targeted IBD region as were the other carriers, but was homozygous with the same haplotype as the affected sheep. This was probably due to this lamb carrying the ancestral genomic region within which the g.25513delT mutation arose in some other ancestor. The genotyping results of g.25513delTfor lamb 14-04 demonstrated it was in fact heterozygous which was consistent with the presumed disease status. The SLC13A1 gene was considered as the most plausible functional candidate gene within the 1Mb-interval on OAR 4. This gene belongs to the solute-linked carrier gene family 13 (SLC13) and encodes a protein called renal Na+ - dependent inorganic sulfate transporter-1 (Sodium/sulfate cotransporter, NaS1). DNA sequencing revealed the 1-bp deletion of “T” was perfectly associated with the chondrodysplasia phenotype in Texel sheep. We also demonstrated that this deletion was not observed in any of the 54 normal Romney sheep controls. We confirmed the presence of this deletion as a mutation on the genomic DNA and believe that the ovine NaS1 protein in the affected sheep may undergo a nonsense mRNA mediated decay, a process which detects nonsense mutations and prevents the expression of truncated or erroneous proteins. The RT-PCR analysis on SLC13A1 mRNA could provide further evidence to support our conclusions. Unfortunately, most of the originally affected 102 animals either died due to tracheal collapse or were euthanized at approximately 5 months of age. New lambs from designed matings between carrier animals would need to be produced for any future experiments. Our study is the first report showing that the mutation in the SLC13A1 gene is responsible for inherited chondrodysplasia in a large mammal species. A previous research described that a homologous Slc13a1 knockout mouse model produced similar clinical phenotypes as our affected Texel sheep, including reduced serum sulfate concentration and a general growth retardation throughout adulthood (Dawson et al. 2003). The knockout mouse model further supports our findings. The NaS1 co-transporter locates on the brush border membranes of epithelial cells lining the renal proximal tubule and the gastrointestinal tract. It helps to absorb inorganic sulfate in the small intestine and to reabsorb in the kidney proximal tubule to maintain the concentration of serum sulfate (Markovich, 2001). Sulfate is the fourth most abundant anion in human plasma, and plays an essential role during growth, development, cellular metabolism, and bone/cartilage formation. For example, the sulfated proteoglycans as the largest group of sulfoconjugate in mammals, are required for the maintenance of normal structure of bone and cartilage (Fernandes, et al. 1997). The SLC13 family was believed to be distinct functionally and structurally from the SLC26 gene family which constitutes Na+ - independent inorganic sulfate transporter (Markovich, 2008). A recent study has indicated that the SLC26A2 protein expression was limited to the microvillus membrane of kidney proximal tubule (Chapman & Karniski, 2010). We believe NaS1 and SLC26A2 proteins would share similar 103 functions during the sulfate transportation due to their similar localizations. The exact roles of sulfate transportation for each of the two proteins are still uncertain. Copy number variation analysis is receiving more attention since the development of high density SNP chips as well as the dramatic improvements in new sequencing technologies. Several CNVs have been found associated with human diseases such as obesity (Chen et al. 2011), autoimmune disease (Mamtani et al. 2010) and autism (Sebat et al. 2007). Moreover, an extra copy of the FGF4 gene, as a form of evolutionarily acquired fgf4 retrogene, was identified strongly associated with chondrodysplasia in dogs (Parker et al. 2009). With the intention of determining whether CNVs were associated with this inherited chondrodysplasia in Texel sheep, we performed a CNV analysis based on the SNP chip genotyping data. However, there were no CNVs apparently associated only with this defect. The CNV variations detected on OAR1 and OAR3 using the partition software probably came from the SNP chips themselves rather than being due to the disease status themselves. In summary, our study indicates the 1-bp coding deletion of “T” in the ovine SLC13A1 gene is responsible for this form of inherited chondrodysplasia in Texel sheep. Our results suggest NaS1 is essential for maintaining sulfate homeostasis in sheep. The comparison of the gene expression of SLC13A1 or directly detecting the levels of the NaS1 protein using in situ hybridization in the kidney tissues among chondrodysplasia affected, carriers and normal Texel sheep should be done in the future. Using the ovine high density SNP array with a candidate gene approach was confirmed again by this study as an efficient way for localization of a mutation locus causing a simple recessive inherited defect in sheep. Similar methodologies have been carried out to identify mutations in other recessive inherited defects 104 in sheep, such as microphthalmia and inherited rickets (Becker et al. 2010; Zhao et al. 2011). The SLC13A1 gene can be rapidly applied as a selection marker to identify carriers with the defective g.[25513delT] allele in order to cull carriers, or avoid at-risk matings to improve animal welfare and decrease economic losses. Due to the moderate size and the relatively low cost of maintenance of carriers, sheep with chondrodysplasia could be used as a potential animal model for future human chondrodysplasia clinical drug development if any similar human cases are diagnosed in the future. ACKNOWLEDGEMENTS The authors appreciate the opportunity for this paper to be included in this journal’s special issue in honor of Dr. Moshe Soller and thank the editor and referees for their comments. The authors appreciate the financial support provided by State of Iowa and Hatch Funds and the support from the College of Agriculture and Life Sciences at Iowa State University. Animal studies and diagnoses were funded by Massey University. H.T.B. was partially funded by the National Research Centre for Growth and Development. Conflicts of Interest The authors have declared no potential conflicts. Web URLs Primer3: http://fokker.wi.mit.edu/primer3/ Ovine Genome Assembly v1.0: http://www.livestockgenomics.csiro.au/perl/gbrowse.cgi/oar1.0/#search ExPASy translate tool: http://ca.expasy.org/tools/dna.html 105 CLUSTALW: http://align.genome.jp/ TopPred 0.01: http://bioweb.pasteur.fr/seqanal/interfaces/toppred.html REFERENCES Becker D., Tetens J., Brunner A., Burstel D., Ganter M., Kijas J. & Drogemuller C. (2010) Microphthalmia in Texel sheep is associated with a missense mutation in the pairedlike homeodomain 3 (PITX3) gene. PLoS One 5, e8689. Beckmann J.S. & Soller M. (1983) Restriction fragment length polymorphisms in genetic improvement: methodologies, mapping and costs. Theoretical and Applied Genetics 67, 35-43. Beever J.E., Smit M.A., Meyers S.N., Hadfield T.S., Bottema C., Albretsen J. & Cockett N.E. (2006) A single-base change in the tyrosine kinase II domain of ovine FGFR3 causes hereditary chondrodysplasia in sheep. Animal Genetics 37, 66-71. Byrne T.J. (2005) Investigation into the inheritance and biochemistry of chondrodysplasia in Texel sheep. Master’s thesis. Massey University, New Zealand. Byrne T.J., Blair H.T., Thompson K.G., Stowell K.M. & Piripi S. (2008) Inheritance of chondrodysplasia in Texel sheep. Proceedings of the New Zealand Society of Animal Production 68, 20-2. Chapman J.M. & Karniski L.P. (2010) Protein localization of SLC26A2 (DTDST) in rat kidney. Histochemistry Cell Biology 133, 541-7. Chen Y., Liu Y.J., Pei Y.F., Yang T.L., Deng F.Y., Liu X.G., Li D.Y. & Deng H.W. (2011) Copy Number Variations at the Prader-Willi Syndrome Region on Chromosome 15 and associations with Obesity in Whites. Obesity (Silver Spring) 19, 1229-34. Cockett N.E., Shay T.L., Beever J.E., Nielsen D., Albretsen J., Georges M., Peterson K., Stephens A., Vernon W., Timofeevskaia O., South S., Mork J., Maciulis A. & Bunch T.D. (1999) Localization of the locus causing Spider Lamb Syndrome to the distal end of ovine Chromosome 6. Mammalian Genome 10, 35-8. Dawson P.A., Beck L. & Markovich D. (2003) Hyposulfatemia, growth retardation, reduced fertility, and seizures in mice lacking a functional NaSi-1 gene. Proceedings of the National Academy of Sciences USA 100, 13704-9. Fernandes I., Hampson G., Cahours X., Morin P., Coureau C., Couette S., Prie D., Biber J., Murer H., Friedlander G. & Silve C. (1997) Abnormal sulfate metabolism in vitamin D-deficient rats. The Journal of Clinic Investigation 100, 2196-203. Hastbacka J., Kerrebrock A., Mokkala K., Clines G., Lovett M., Kaitila I., de la Chapelle A. & Lander E.S. (1999) Identification of the Finnish founder mutation for diastrophic dysplasia (DTD). European Journal of Human Genetics 7, 664-70. Hastbacka J., Superti-Furga A., Wilcox W.R., Rimoin D.L., Cohn D.H. & Lander E.S. (1996) Atelosteogenesis type II is caused by mutations in the diastrophic dysplasia sulfatetransporter gene (DTDST): evidence for a phenotypic series involving three chondrodysplasias. The American Journal of Human Genetics 58, 255-62. 106 Hediger M.A., Romero M.F., Peng J.B., Rolfs A., Takanaga H. & Bruford E.A. (2004) The ABCs of solute carriers: physiological, pathological and therapeutic implications of human membrane transport proteins - Introduction. Pflugers Archive 447, 465-8. Lee A., Beck L. & Markovich D. (2000) The human renal sodium sulfate cotransporter (SLC13A1; hNaSi-1) cDNA and gene: organization, chromosomal localization, and functional characterization. Genomics 70, 354-63. Mamtani M., Anaya J.M., He W. & Ahuja S.K. (2010) Association of copy number variation in the FCGR3B gene with risk of autoimmune diseases. Genes Immunity 11, 155-60. Markovich D. (2001) Physiological roles and regulation of mammalian sulfate transporters. Physiological reviews 81, 1499-533. Markovich D., Romano A., Storelli C. & Verri T. (2008) Functional and structural characterization of the zebrafish Na+-sulfate cotransporter 1 (NaS1) cDNA and gene (slc13a1). Physiology Genomics 34, 256-64. Martínez S., Valdés J. & Alonso R.A. (2000) Achondroplastic dog breeds have no mutations in the transmembrane domain of the FGFR-3 gene. Canadian Journal of Veterinary research 64, 243-5. Parker H.G., VonHoldt B.M., Quignon P., Margulies E.H., Shao S., Mosher D.S., Spady T.C., Elkahloun A., Cargill M., Jones P.G., Maslen C.L., Acland G.M., Sutter N.B., Kuroki K., Bustamante C.D., Wayne R.K. & Ostrander E.A. (2009) An expressed fgf4 retrogene is associated with breed-defining chondrodysplasia in domestic dogs. Science 325, 995-8. Piripi S.A. (2008) Chondrodysplasia of Texel sheep. Doctorate thesis. Massey University, New Zealand. Piripi S.A., Williams M.A.K. & Thompon K.G. (2011) On the sulfation pattern of polysaccharides in the extracellular matrix of sheep with chondrodysplasia. Cartilage 2, 36-9. Rothschild M., Jacobson C., Vaske D., Tuggle C., Wang L., Short T., Eckardt G., Sasaki S., Vincent A., McLaren D., Southwood O., van der Steen H., Mileham A. & Plastow G. (1996) The estrogen receptor locus is associated with a major gene influencing litter size in pigs. Proceedings of the National Academy of Sciences USA 93, 201-5. Rothschild M.F. & Soller M. (1997) Candidate gene analysis to detect traits of economic importance in domestic livestock. Probe 8, 13-22. Rozen S. & Skaletsky H.J. (1999) Primer3 on the WWW for general users and for biologist programmers. In: Bioinformatics Methods and Protocols:Methods in Molecular Biology (eds. by Misener S & Krawetz SA), pp. 365-86. Humana Press Inc., Totowa, NJ. Sebat J., Lakshmi B., Malhotra D., Troge J., Lese-Martin C., Walsh T., Yamrom B., Yoon S., Krasnitz A., Kendall J., Leotta A., Pai D., Zhang R., Lee Y.H., Hicks J., Spence S.J., Lee A.T., Puura K., Lehtimaki T., Ledbetter D., Gregersen P.K., Bregman J., Sutcliffe J.S., Jobanputra V., Chung W., Warburton D., King M.C., Skuse D., Geschwind D.H., Gilliam T.C., Ye K. & Wigler M. (2007) Strong association of de novo copy number mutations with autism. Science 316, 445-9. Soller M. & Beckmann J.S. (1983) Genetic polymorphisms in varietal identification and genetic improvement. Theoretical and Applied Genetics 67:25-33. 107 Superti-Furga A., Hastbacka J., Wilcox W.R., Cohn D.H., van der Harten H.J., Rossi A., Blau N., Rimoin D.L., Steinmann B., Lander E.S. & Gitzelmann R. (1996) Achondrogenesis type IB is caused by mutations in the diastrophic dysplasia sulphate transporter gene. Nature Genetics 12, 100-2. Thompson K.G., Blair H.T., Linney L.E., West D.M. & Byrne T. (2005) Inherited chondrodysplasia in Texel sheep. New Zealand Veterinary Journal 53, 208-12. Woelfle J.V., Brenner R.E., Zabel B., Reichel H. & Nelitz M. (2011) Schmid-type metaphyseal chondrodysplasia as the result of a collagen type X defect due to a novel COL10A1 nonsense mutation : A case report of a novel COL10A1 mutation. Journal of Orthopaedic Science 16, 245-9. Zankl A., Mornet E. & Wong S. (2008) Specific ultrasonographic features of perinatal lethal hypophosphatasia. American Journal of Medical Genetics 146A, 1200-4. Zhao X., Dittmer K.E., Blair H.T., Thompson K.G., Rothschild M.F. & D.J. G. (2011) A novel nonsense mutation in the DMP1 gene identified by a genome-wide association study is responsible for inherited rickets in Corriedale sheep. PLoS ONE 6, e21739. doi:10.1371/journal.pone.0021739. Zhao X., Du Z.Q. & Rothschild M.F. (2010) An association study of 20 candidate genes with cryptorchidism in Siberian Husky dogs. Journal of Animal Breeding and Genetics 127, 327-31. Zhao X., Du Z.Q., Vukasinovic N., Rodriguez F., Clutter A.C. & Rothschild M.F. (2009) Association of HOXA10, ZFPM2, and MMP2 genes with scrotal hernias evaluated via biological candidate gene analyses in pigs. Journal of Animal Breeding and Genetics 70, 1006-12. 108 Table 1. The genotyping results of the causative mutation (g.25513delT) Sheep breed Texela Romney a Phenotype Genotype Rate of concordance Status No. of sheep Direct sequencing PCR-RFLP No. of Sheep Affected 15 g.25513delT/ g.25513delT g.25513delT/ g.25513delT 15 100% 8 g.25513delT/T g.25513delT/T 8 100% 54 T/T T/T 54 100% Presumed Carrier Normal These sheep include pure Texel sheep and Texel crossbred sheep. 109 Figure 1. Chondrodysplasia of Texel sheep (A) A pair of full sibling lambs at 56 days. The left lamb (lamb ID: 23-04) was phenotypically normal and its twin brother (lamb ID: 24-04) was characterized with stunted growth. (B) A thirty-two day-old chondrodysplasia lamb with characteristic short neck and wide-based stance. (C) Sections through the trachea of a chondrodysplastic lamb (bottom) and a normal lamb (top) illustrating the thickened, flattened cartilage ring in the affected lamb. 110 Figure 2. Work flow chart for discovering the mutation locus of chondrodysplasia in Texel sheep. (A) The sheep chromosome 4 (OAR 4). (B) The IBD region containing the causative locus is located on OAR 4 between 92.45 and 93.44 Mb. (C) Seven cow reference genes including SLC13A1 encompassing the sheep IBD region based on the comparative genomic maps. (D) The comparisons of the partial SLC13A1 mRNA sequences and the partial protein sequences between the normal (wide type) and chondrodysplasic sheep. The 3 underlined nucleotides in the mRNA sequence highlighted with yellow color represent the codon affected by the g.25513delT mutation. The corresponding amino acids were also underlined in the protein sequences. The next following codon in the protein sequence of affected sheep, highlighted in red color, introduced a stop codon represented by “X”. WT: wild type; A: affected sheep 111 Figure 3. The topology prediction of transmembrane domains (TMDs) of ovine SLC13A1 protein. The peaks above the cut-off value of 1 represent the TMDs. There are 13 putative TMDs in this protein. 112 Figure 4. The direct sequencing results of the causative mutation g.25513delT. All the affected sheep with two copies of deletion T; the presumed carriers show one copy of deletion T; the pooled normal controls show no copy of deletion T. 113 Figure S1. The pedigree information for the 23 sheep genotyped by Ovine SNP chip. The result was attained from microsatellite results (data not shown). The boxes below animal identifiers indicate the status of chondrodysplasia. The 3 terminal rams are at the top-left. They sired dams 3-17 in the out-cross trials, and F1 daughters in the backcross and other 4 ewes with chondrodysplasia background. Arrows indicate parentage results deduced by observation of maternal behavior soon after birth of lambs and microsatellite makers. M: male; F: female. 114 Figure S2. The distribution of homozygous fragments commonly in affected lambs. Based on the different number of SNPs spanning the homozygous fragment, the counts of homozygous fragments commonly in all affected lambs were shown. Only one homozygous fragment spanning 25 SNPs was identified commonly in all 15 affected lambs (pointed by the blue color arrow), when the “larger than 10-SNP” threshold was applied. 115 Figure S3. The CNV region display. Each column represents one individual sheep with the copy numbers of SNP loci across the whole genome. Different colour shows different ranges of copy numbers. The chondrodysplasia status for each sheep was listed on the top of the display map. A: affected; C: presumed carriers. 116 CLUSTALW ( 1. 81) Mouse Rat Sheep Cat t l e Human Dog Zebr af i sh mul t i pl e sequence al i gnment MKLLNYALVYRRFLLVVFTI LVFLPLPLI I RTKEAQCAYI LFVI AI FWI TEALPLSI TAL MKLLNYAFVYRRFLLVVFTVLVLLPLPLI I RSKEAECAYI LFVI ATFWI TEALPLSI TAL MKFFSYVLVYRRLLLVVFTLLI LLPLPLI LRSKEAECAYTLFVVAI FWLTEALPLSI TAL MKFFSYI LVYRRLLLI VFTLLI LLPLPLI LRSKEAECAYTLFVI AI FWLTEALPLSI TAL MKFFSYI LVYRRFLFVVFTVLVLLPLPI VLHTKEAECAYTLFVVATFWLTEALPLSVTAL MKVFSYI LVYRRFLLI VLPPLVLLPLPI I VRTKEAECAYTWAVVAALWLTEALPLPVTAL MRRLRCSWSALKPLI I I STPLAI LPLPLVI KTKEAECAYVLI LMAVYWMTEVLPLSVTAL *: : : *: : : . * : ** * * : : : : : * ** : * ** : : * *: * *. * ** . : ** * Mouse Rat Sheep Cat t l e Human Dog Zebr af i sh LPGLMFPMFGI MRSSQVAS- AYFKDFHLLLI GVI CLATSI EKWNLHKRI ALRMVMMVGVN LPGLMFPMFGI MSSTHVAS- AYFKDFHLLLI GVI CLATSI EKWNLHKRI ALRMVMMVGVN LPGLMFPMFGI MPSKEVAS- AYFKDFHLLLI GVI CLATSI EKWNLHKRI ALRMVMMVGVN LPGLMFPMFGI MTSKEVAS- AYFKDFHLLLI GVI CLATSI EKWNLHKRI ALRMVMMVGVN LPSLMLPMFGI MPSKKVAS- AYFKDFHLLLI GVI CLATSI EKWNLHKRI ALKMVMMVGVN LPGLLFPLFGI TASKEVVASAYFKDFHLLLI GVI CLATSI EKWNLHTRI ALRMVMTVGVN I PALLFPLFGI MKSSQVAS- QYFKDFHLLLLGVI CLATSI EKWGLHRRI ALRLVTGLGVN : * . *: : * : * * * *. . * . : * * ** * * ** * : * * ** * * ** * ** * . * * ** * *: : * : ** * Mouse Rat Sheep Cat t l e Human Dog Zebr af i sh PAWLTLGFMSSTAFLSMWLSNTSTAAMVMPI VEAVAQQI I SAEAEAEATQMTYFNESAAH PAWLTLGFMSSTAFLSMWLSNTSTAAMVMPI VEAVAQQI TSAEAEAEATQMTYFNESAAQ PAWLSLGFMSSTAFLSMWLSNTSTAAMVMPI VEAVAQQVI SAEAEVEATQMTYFSGSTNQ PAWLSLGFMSSTAFLSMWLSNTSTAAMVMPI VEAVAQQVI SAEAEVEATQMTYFSGSTNQ PAWLTLGFMSSTAFLSMWLSNTSTAAMVMPI AEAVVQQI I NAEAEVEATQMTYFNGSTNH PTWLTLGLMSSTAFLSMWLSNTSTAAMVMPI VEAVAQQI I SAQAEVEATQMTSLGGSTNH PGMLMLGFMVGCAFLSMWLSNTSTVAMVMPI VEAVI QQVLSASEEANKDHKGALNGI SNP * * * * : * . * ** * * ** * ** * *. * * ** * *. * ** * *: . * . * . : : :. : Mouse Rat Sheep Cat t l e Human Dog Zebr af i sh GLDI DETVI GQETNEKKEKTKPAPGSSHDKGKVSRKMETEKNAVTG- - AKYRSRKDHMMC GLEVDETI I GQETNERKEKTKPALGSSNDKGKVSSKMETEKNTVTG- - AKYRSKKDHMMC ALEI DESI NGQEANERKEKTKPVPGYSNDAGKI SSKMELEKNSGTNSGNKYRTKKDHMMC ALEI DESI NGQEANERKEKTKPVPGYSNDAGKI SSKMELEKNSGTNSGNKYRTKKDHMMC GLEI DESVNGHEI NERKEKTKPVPGYNNDTGKI SSKVELEKNSGMR- - TKYRTKKGHVTR GLEI DESVNGQETNERNEKTI PAPGYGNDAGKI SSKTEPEKNSDTG- - TKYRTKKDHMMC ALQLEDI EAQHDQSE- - - KI KSI EAGI TEPEPTSEEHETAKAPTPPYSGKYRTREDHMMC . *: : : : :: .* * . . : * : * * . * * *: : : . * : Mouse Rat Sheep Cat t l e KLMCLSVAYSSTI GGLTTI TGTSTNLI FSEHFNTRYPDCRCLNFGSWFLFSFPVALI LLL KLMCLCI AYSSTI GGLTTI TGTSTNLI FSEHFNTRYPDCRCLNFGSWFLFSFPVAVI LLL KLMCLCI AYSSTI GGLTTI TGTSTNLI FAEHFNMRYPDCRCI NFGSWFI MSFPAAI I I LL KLMCLCI AYSSTI GGLTTI TGTSTNLI FAEHFNMRYPDCQCI NFGSWFI MSFPAAI VI LL Figure S4. The comparison of the SLC13A1 protein sequences among 7 species. The comparison was conducted using the multiple sequence alignment tool (CLUSTALW). Table S1. A list of candidate gene symbol, primer sequence, annealing temperature, PCR amplicon information, genetic variants identified by sequencing and accessed template sequence. Gene symbol SLC13A1 Primer name Primer sequence 5’AGCCACCTGTGCAAGACAAT3’ 5’CGCATTGCTGCCTAGTCTC3’ 5’AATAACCATTTGCGGCCCTA3’ sSLCre2R sSLCdelrF sSLCdelrR sRFLPdelTFa sRFLPdelTRa sSLCre5_6F 5’AAGTCAAACATTCGCAGGAAA3’ 5’TGGATAAGGGATGTGGAACTG3’ 5’TGCCTGGATGTGCATTACTT3’ 5’TGAATTACACTGCCATACAAATG A3’ 5’GGTCAAAGTTAGGAGCAGCAA3’ 5’TCCACCTTCACACACACACA3’ sSLCre5_6R sSLCre8F sSLCre8R sSLCre9F sSLCre9R 5’TGCAAACTGCCATCACTATG3’ 5’CTGTTCCCAGGTCTGCTTGT3’ 5’AGAACACGGAACCCAGACAC3 5’CTCGCCACCATATCCCTCTA3’ 5’AACCAGAAACAAAGCCAGGA3’ sSLCre10F sSLCre10R sSLCre11F 5’CCAGGTAGAAGGATAAGAAGCTG A3’ 5’GCTCCAAGGTCTCTACTCCTGA3’ 5’AAGGGAACTTTTGGCCTAGC3’ sSLCre11R 5’AGAGGAAGGGTGCAGTCTGA3’ sSLCre12F 5’AAAATGCAAAGAAGTTTTCCAA3’ sSLCre12R sSLCre13F 5’GCTGGGACATCATAAAAATGG3’ 5’AAAAACCACAGGAAAATATGAAA GA3’ 5’CTGCTGCTTGAACATGGAAT3’ 5’CAATCACCATATAAACAACCGAA A3’ 5’GCACGTTTGCTCCAGGTTAT3’ 5’GAAACTTCTCTGATGCTCTCCAA3’ sSLCre13R sSLCre14F sSLCre14R sSLCre15F Variant name SNP characteristic Sequence access No. Annealing Temp. (ºC) Size (bp) Fragment name 60 472 sSLCe1 no - - JN108880 60 301 sSLCe2 no - - OAR4:9288600092995055 56 411 sSLCdel 349 sRFLPdelT g.25395C>T g.25513delTb g.25513delTb intronic exonic exonic JN108880 58 C > T (170bp) Deletion T (288bp) Deletion T (148bp) 64 510 sSLCe5_6 A > G (181bp) g.78757A>G intronic 64 585 sSLCe8 A > G (261bp) C > T (381bp) g.78677A>G g.59017C>T intronic intronic 58 711 sSLCre9 C > T (61bp) A > G (212bp) g.63875C>T g.64026A>G 58 609 sSLCre10 no - exonic, synonymous - 60 545 sSLCre11 G > T (169bp) g.39802G>T intronic g.39732A>C g.39527C>T - intronic intronic - (GenBank/Ovine 1.0) JN108880 OAR4:9288600092995055 JN108880 JN108880 JN108880 OAR4:9288600092995055 60 535 sSLCre12 A > C (239bp) C > T (444bp) no 60 505 sSLCre13 no - - OAR4:9288600092995055 60 507 sSLCre14 A > T (54bp) g.30951A>T intronic OAR4:9288600092995055 58 529 sSLCre15 A > G (27bp) g.29232A>G intronic OAR4:9288600092995055 OAR4:9288600092995055 117 sSLCre1F sSLCre1R sSLCre2F Genetic variant (position in the fragment) Table S1. (continued) Annealing Temp. (ºC) Size (bp) Fragment name 5’GGCACATAGATGCTCTTGGTT3’ 5’TGTTGAGTAGCCAGTTGTGGTT3’ 58 567 sSLCre16 no - - 5’TTGCAATTGACTTTCCTTGC3’ 5’TGGTGGGCAATTCACTTCTT3’ OAR4:9288600092995055 58 339 sCADPS-1 no - - OAR4:9215574592255745 58 405 sCADPS-2 no - - OAR4:9215574592255745 60 515 sCADPS2-3 no - - OAR4:9215574592255745 58 353 sCADPS2-4 no - - OAR4:9215574592255745 5’AGTTCCAGGGAGACCATTCC3’ 58 302 sIQUB-1 no - - sIQUB-1R sIQUB-2F 5’TGATTAAACCACCAGCACCA3’ 5’GAATACAGGGCAGGGAAAGG3’ OAR4:9320975293265179 58 320 sIQUB-2 no - - sIQUB-2R sIQUB-3R 5’GCAATGGTTCATTACTTTTGGA3’ 5’CCCCCAAAAGCAGAACTTTA3’ OAR4:9320975293265179 58 392 sIQUB-3 no - - sIQUB-3R sIQUB-4F 5’GCATATCTACTTCAGAGGGCTTT3’ 5’AAATGAAGCCAGTAGCGGAAT3’ OAR4:9320975293265179 58 404 sIQUB-4 no - - sIQUB-4R sASB15_1F sASB15_1R sASB15_2F sASB15_2R sASB15_3F sASB15_3R sASB15_4F 5’TTTGTTATCATGGAAGAGACTTGC3’ 5’ CTGTGTGAAGCGATGGTCAG3’ 5’GGCACTCGGATTTGTAGGTC3’ 5’GGGGACCTACAAATCCGAGT3’ 5’GGGGAAAGTGGTCTGTTTGA3’ 5’TTGAAAAGGCCAAGAACACA3’ 5’CTGTTTGGATTTTGGGAGACA3’ 5’TTGCCTGTGGCTCTCATACA3’ OAR4:9320975293265179 58 607 sASB15-1 A > G (150bp) g.150A>G exonic JN108881 58 519 sASB15-2 no - - JN108881 58 404 sASB15-3 no - - JN108881 58 343 sASB15-4 no - - sASB15_4R 5’TTACCAGCACAGCCATACGA3’ OAR4:9334394893353368 Primer name Primer sequence SLC13A1 sSLCre15R sSLCre16F sSLCre16R sCADPS21F sCADPS21F sCADPS22F sCADPS22R sCADPS23F sCADPS23R sCADPS24F sCADPS24R sIQUB-1F IQUB ASB15 Variant name SNP characteristic 5’CGGGTCAACTTAGCAGGTTT3’ 5’TTGCAGACACACCCTCAAAA3’ 5’GGACAGAGAAGCCTGGAGTG3’ 5’GAACCTTCCTGTTGATGCAAG3’ 5’CCATGTTTCCAGATAAATTCTGTTC3’ 5’AAATCATGGGCATTCAGACC3’ 5’CATGAAAGGCAGCATGAGAA3’ (GenBank/Ovine 1.0) 118 Gene symbol CADPS2 Sequence access No. Genetic variant (position in the fragment) Table S1. (continued) Annealing Temp. (ºC) Size (bp) Fragment name 5’TGCAGATGCCCTCATTGTTA3’ 58 307 sASB15-5 no - - sASB15_5R sASB15_6F 5’CCTGTTGAACGGAGCGTAA3’ 5’TCCAAACATGCAATCCAGAA3’ 58 372 sASB15-6 no - - sASB15_6R sLMOD2-1F 5’ACTGTATGGCACTGGGGAAG3’ 5’AGCTGGACGACATTGAACCT3’ OAR4:9334394893353368 58 434 sLMOD2-1 C > T (207bp) g.470C>T intronic sLMOD2-1R sLMOD2-2F 5’TCTAAACCCGTGCTTTGCTT3’ 5’GGGTGTCAACCCATTTCAAC3’ 58 444 sLMOD2-2 G > T (230bp) no g.493G>T - intronic - OAR4:9337023993377211 sLMOD2-2R sLMOD2-3F 5’GTGCCAACAGGACTTGGTTT3’ 5’TGTGAGTTCATCCCTGGTCA3’ 58 337 sLMOD2-3 no - - sLMOD2-3R sLMOD2-4F 5’TGCTCTGGAAAACCCTTGAT3’ 5’CAGCGAAGGAGAGGAAAGAA3’ OAR4:9337023993377211 63 430 sLMOD2-4 no - - sLMOD2-4R sLMOD2-5F 5’GATGTTGACGCTGGTGATGT3’ 5’CCTTTGGCTACAGGAGAGGA3’ OAR4:9337023993377211 58 306 sLMOD2-5 no - - sLMOD2-5R sWASL-1F 5’GGGATTGGGAATAGCTGAGG3’ 5’GGGGGTCTCGTCTTTTCTCT3’ OAR4:9337023993377211 63 466 sWASL-1 C > T (248bp) g.256C>T intronic sWASL-1R sWASL-2F 5’TTGTCCCTCTGCTGTCACTG3’ 5’TGTTTTGTGTAAAGGGCCAGA3’ OAR4: 9343101793481693 60 494 sWASL-2 no - - sWASL-2R sWASL-3F 5’TCCATGCTTTTTGCCTTCTT3’ 5’CGTTGCTTCCTATTAAGGCGTA3’ OAR4: 9343101793481693 58 381 sWASL-3 Deletion A (309bp) g.5692delA intronic sWASL-3R sWASL-4F 5’TGCTCTGTGTGTAAAGGCTGTT3’ 5’AACGCCCATGCTTCAATATC3’ OAR4: 9343101793481693 58 399 sWALS-4 no - - sWASL-4R 5’GGGGTTCTGGTTCAGTGTCT3’ OAR4: 9343101793481693 Gene symbol Primer name Primer sequence ASB15 sASB15_5F LMOD2 Variant name SNP characteristic (GenBank/Ovine 1.0) OAR4:9334394893353368 OAR4:9337023993377211 119 WASL Sequence access No. Genetic variant (position in the fragment) 120 CHAPTER 5. BAYESIAN INFERENCE IDENTIFIES A CANDIDATE REGION ASSOCIATED WITH CANINE CRYPTORCHIDISM THAT INCLUDES THE AMHR2 GENE A paper prepared to submit to the Journal of Heredity Xia Zhao, Suneel Onteru, Mahdi Saatchi, Dorian Garrick, Max Rothschild Department of Animal Science, Iowa State University, IA, 50011 USA. Key word: AMHR2, cryptorchidism, BayesB analysis ABSTRACT Cryptorchidism is a condition whereby one or both testes fail to descend into the scrotal sac. It is a common congenital disorder prevalent in many mammalian species. Cryptorchidism is a complex genetic disease considered to be caused by the interaction of genetic, epigenetic and environmental factors. No single mutated gene has been found to cause more than 10% of cryptorchids. Here we performed a genome wide association study (GWAS) using case-control analysis in PLINK software and the BayesB approach in GenSel software with a 1Mb window of SNPs or haplotypes. The haplotypes were constructed from a genealogical tree on the population of 204 Siberian Huskies and its 3 subgroups clustered by genomic relationships to account for population structure. No significant association was observed in the case-control analyses after FDR corrections in PLINK. In contrast, the BayesB analysis identified a 1Mb region on dog chromosome 27 121 (CFA27) containing the AMHR2 (anti-mullerian hormone type II receptor) gene that explained a high percentage of genetic variance in subgroup 1 (~ 10 fold greater than the expected value for an infinitesimal model in the SNP window analysis and ~ 9 fold greater in the haplotype window analysis). It is known that mutations in AMHR2 cause persistent mullerian duct syndrome (PMDS) in humans and the majority of such patients with PMDS exhibit cryptorchidism. In conclusion, we identified a putative candidate region at the 4th Mb on CFA27 (CanFam2.0) harboring a very promising functional candidate gene AMHR2 associated with the development of cryptorchidism in a subgroup of our Siberian Husky population. INTRODUCTION Testicular descent is an essential developmental stage for normal reproduction in male individuals of many mammalian species. Normally, testes descend from an intra- abdominal position down into the scrotal sac. That environment with 2 - 4 °C below normal body temperature is a prerequisite for spermatogenesis in humans and many other mammals (Klonisch et al. 2004; Thonneau et al. 1998). The approximate timing of testicular descent varies in different species. In mice, rats and especially dogs, testicular descent is normally completed postnatally, while in humans, rabbits, pigs and horses, this process is normally completed before birth (Amann and Veeramachaneni, 2007; Hughes and Acerini, 2008). The failure of a testis to descend into the scrotal sac is called cryptorchidism (Greek: hidden testicle). This defect is regarded as one of the most common congenital human defects. About 3% of full term and 30% of premature infant boys are born with at least one testicle undescended (Klonisch et al. 2004). Cryptorchidism has been observed in other mammalian species, such as dogs (Cox et al. 122 1978), cats (Yates et al. 2003), horses (Cox et al. 1979), pigs (Rothschild et al. 1988), goats (Singh and Ezeasor 1989), sheep (Smith et al. 2007) and cattle (St Jean et al. 1992), as well as in wild animals like maned wolves (Burton and Ramsay, 1986), black bears (Dunbar et al. 1996) and reindeer (Leader-Williams 1979). An undescended testicle is considered a risk factor for germ-cell tumors and bilateral cryptorchidism can result in male infertility (Berkowitz et al. 1993; Carizza et al. 1990). A general perception of causes for cryptorchidism is a multiplicity of factors including genetic, epigenetic and environmental components (Spencer et al. 1991; Klonisch et al. 2004; Thonneau et al. 2003; Hutson and Hasthorpe, 2005; Amann and Veeramachaneni 2007). A number of environmental factors have been recognized to be associated with an undescended testis in humans, and these factors include low birth weight, premature birth, maternal consumption of alcohol, smoking and caffeine, gestational diabetes and prenatal exposure to pesticides (Hughes and Acerini 2008; Wang and Wang 2002). Familial analyses show that cryptorchidism is a genetic disorder. Various genes implicated in the regulation of testicular descent have been investigated for cryptorchidism. The processes of testicular descent have been proposed as a biphasic model, with a “transabdominal” phase followed by an “inguinalscrotal” phase (Hutson 1985). Androgens might play a role in both phases. Impaired fetal androgen action can result in cryptorchidism. Genes related to androgen action have been investigated in the study of cryptorchidism. The androgen receptor gene (AR) is strongly expressed in the gubernaculums (George and Peterson 1988). The length variant of trinucleotide repeats like GGN (encoding glycine) or CAG (encoding glutamine) in AR was identified respectively to be associated with a risk of cryptorchidism in an Iranian population and in 123 Hispanic boys (Radpour et al. 2007; Davis_Dao et al. 2012). Mice with conditional inactivation of Ar developed cryptorchidism with testes located in a suprascrotal position (Kaftanovskaya et al. 2012). Androgens act on testicular descent via the release of a calcitonin gene-related peptide by the genitofemoral nerve. Calcitonin gene-related peptides are encoded by two separated genes named CALCA (CGRP-alpha) and CALCB (CGRP-beta). Exogenous calcitonin gene-related peptides injection into the scrotum can change the direction of gubernacular migration in the mutant trans-scrotal (TS) rat (Griffiths et al. 1993; Clarnette and Hutson 1999). However, the CGRP gene was not found to be a major factor for cryptorchidism based on a mutation association analysis in a human study (Zuccarello et al. 2004). The swelling and migration of male gubernaculum with relaxation of the cranial suspensory ligament in the process of testicular descent allow testes to move from the posterior abdominal wall to the internal inguinal ring (Hutson 1985). The gubernaculum is a continuous column of mesenchyme connecting the testis to the scrotal sac. Insulinlike factor 3 (INSL3) is expressed in the pre-and postnatal Leydig cells of the testis and postnatal theca cells of the ovary, and is a member of the insulin-like hormone superfamily (Zimmermann et al. 1997). Targeted disruption of this gene prevents gubernaculum growth and causes bilateral cryptorchidism in mice (Nef and Parada 1999; Zimmermann et al. 1999). The receptor of INSL3 is encoded by a gene called relaxin/insulin-like family peptide receptor 2 (RXFP2), also named the leucine-rich repeat-containing G protein-coupled receptor 8 gene (LGR8). LGR8 is highly expressed in the gubernaculum and is responsible for mediating hormonal signals that control normal testicular descent (Bogatcheva et al. 2003). Mutant mice for this gene exhibit 124 bilateral intra-abdominal cryptorchidism due to the undifferentiated gubernaculum (Gorlov et al. 2002). Summarized from previous reports, only a minority of affected cases have a mutation either in the INSL3 gene or its receptor gene LGR8. Disruptions of other hormone related genes also show associations with aberrant testicular descent. Mullerian inhibiting substance (MIS), or anti-Mullerian hormone (AMH), encoding a 140-kDa glycoprotein produced by Sertoli cells is critical for the mullerian duct regression during male sexual differentiation (Bashir and Wells 1995) and is associated with the swelling reaction of gubernaculum occurring during the first phase of testicular descent (Kubota et al. 2002). The mutated gonadotropin-releasing hormone receptor (Gnrhr) gene is associated with undescended testes in mice (Pask et al. 2005). In humans, maternal exposure to the synthetic nonsteroidal estrogen, diethylstilbestrol (DES), results in a high risk of offspring with cryptorchidism (Gill et al. 1979). The receptor of estrogen, estrogen receptor 1 (ESR1), is one of the ligand-activated transcription factors and is important in the process of testicular descent (Donaldson et al. 1996; Przewratil et al. 2004). Retracted gonads in knockout male mice of Esr1 suggest estrogens played a direct role in the scrotal positioning of the testis (Donaldson et al. 1996). Investigations have also been carried out on homeobox (Hox) genes in mice for cryptorchidism. Homeobox A10 (Hoxa10) is highly expressed in the gubernaculum and this gene has been found to be essential for the gubernacular migration during testicular descent (Nightingale et al. 2008). In mice, disruption of this gene and another Hox gene, called Hoxa11, results in the development of cryptorchidism (Satokata et al. 1995; HsiehLi et al. 1995). 125 There is a high prevalence (about ~14%) of cryptorchidism in Siberian Husky dogs (Zhao et al. 2010). Although several studies show association of some candidate genes with cryptorchidism, the genetic basis of the development of cryptorchidism is still controversial. Here we performed a genome wide association study in Siberian Husky dogs using the Illumina CanineHD BeadChip with more than 170,000 single nucleotide polymorphisms (SNPs) and tried to identify genomic regions and propose some positional candidate genes associated with variation in the incidence of cryptorchids. METHODS Animals and phenotypes A total of 205 male Siberian Huskies and 12 male Chinooks were included in this study. Among them, 43 Siberian Huskies were from the UK, 8 from Canada, and the remaining 154 Siberian Huskies and all Chinooks were from USA. Dogs were diagnosed at 6 months of age by veterinarians or dog owners as being either unilateral or bilateral cryptorchids, or as normal unaffected dogs. In this study, there are 106 dogs with cryptorchidism and 99 diagnosed normal in the Siberian Husky breed. Six affected and 6 control Chinooks were included as a control population for population stratification. Samples and DNA preparation Buccal samples were collected by owners from 186 Siberian Huskies and 12 male Chinooks using sterile cytology brushes (Fisher Scientific, Hampton, NH, USA). The Canine Health Information Center (CHIC) DNA repository provided 19 male Siberian Huskies DNA samples with an average concentration of 50 ng/ul. DNA was extracted from 2 to 3 cytology brushes per dog within a single reaction of the QIAamp DNA Mini Kit (QIAGEN, Inc., Valencia, CA, USA). The DNA concentrations ranged from 10.0 to 126 50.0 ng/µl. Samples with DNA concentration less than 20 ng/µl were subjected to wholegenome amplification (WGA) using the GenomiPhi (GE Healthcare, Buckinghamshire, England) kit to yield high-quality DNA suitable for the Illumina infinium canine genotyping platform. Data generation and analyses SNP array genotyping and quality control DNA samples with a total DNA amount of 700-1,000 ng were sent to GeneSeek, Inc. (Lincoln, NE, USA) for genotyping on the CanineHD BeadChip. The SNPs with call rate ≤ 40%, Gentrain score ≤ 40%, and minor allele frequency ≤ 0.01 were excluded from the data set. Analyses of population structure Genotypes of all 217 dogs including 205 Siberian Huskies and 12 Chinooks were imported into Haploview v4.2 software (Barrett et al. 2005). Pairwise linkage disequilibrium (LD) comparisons among markers of each chromosome were performed. Tagging SNPs were identified using the TAGGER routine implemented in Haploview. In order to remove the underlying group effects we stratified the population, using the model-based clustering method in STRUCTURE v2.3 software (Pritchard et al. 2000) for inferring population structure using multi-locus Illumina 170K CanineHD BeadChip genotypes for all animals. The admixture model was chosen in STRUCTURE on selected tagging SNP makers and most of the parameters were at their default values. The length of both the burn-in and post burn in Markov chain Monte Carlo (MCMC) interations were 5,000 wth 3 runs for each K (the number of presumed population 127 clusters). The presumed value of K was varied from 1 to 7. To detect the true value of K, a second order rate of changes of the likelihood function with respect to K represented as ∆K was applied. The highest value of the ∆K distribution was used as an indicator of the strength of the signal detected by STRUCTURE (Evanno et al., 2005). The dogs were assigned into different subgroups according to the constitution of Q values (estimated membership coefficient) in presumed clusters for each individual. Genome wide association studies Genome wide association studies were carried out on each of the subgroups assigned by STRUCTURE as well as on all male Siberian Huskies. Three approaches were applied for the data analyses. The first approach was the case-control analysis implemented in PLINK v1.06 software (Purcell et al. 2007) using single SNP markers. The linear step-up procedure of Benjamini and Hochberg FDR control in PLINK was utilized in order to correct for multiple testing. The second approach was a BayesB model averaging approach using the GenSel v4R software (http:// bigs.ansci.iastate.edu) with the categorical analysis option, to obtain the variance explained by SNPs in one megabase (1Mb) genomic windows (which we refer SNP model). The statistical model used for BayesB was as follows: 1 where is the phenotype of animal i, is an overall mean, is the total number of SNPs in the panel, is genotype at marker j in individual i, is a random substitution effect for marker with its own variance 0 ( with a probability 1- ) or 0 (with a 128 probability of ), and ei is the random residual on animal i that are assumed to be normally distributed N (0, . The parameter was assumed to be 0.9999 so as to fit 10-20 markers (0.0001x 170K) per iteration of the Markov chain in a mixture model for the estimation of individual SNP effects. A total of 51,050 MCMC iterations with burnin of 1,000 iterations and with an output frequency of 50 iterations were used for inference. There were 2,346 unique 1Mb windows across the dog genome based on the genotyped markers. The number of markers in each 1Mb window varied according to the genomic regions. It was assumed that under an infinitestimal model all windows would account for the same variance of about 0.04%. Hence, the 1Mb windows explaining at least 0.4% of genetic variance, which is 10 fold greater than expected, were considered as indicative candidate regions for a gene influencing the incidence of cryptorchids. In order to utilize haplotype information for identifying genes underlying complex traits, a tree-based association method that captures information about the evolutionary history (genealogy) of the genome were implemented in the analysis. The third approach, we refer to as the BayesB analysis with haplotype model, was carried out on each of the subgroups and their combined dataset to further verify regions showing associations with cryptorchidism in the BayesB analysis with SNP model. To perform this analysis, genotypes for each individual were phased on each chromosome (based on the CanFam 2.0 assembly) using BEAGLE v3.3 software in which a localized haplotype-cluster mode was fitted using an EM algorithm (Browning & Browning 2007). Subsequently, local phylogenetic trees were constructed for each SNP marker with default values in Blossoc software (Mailund et al. 2006) and were drawn as a rooted binary tree (Sahana et al. 2011). At the third level of each tree, there are eight possible nodes and all haplotypes 129 descending from each of the nodes (with the same phylogenetic lineage) were considered as a cluster. A new incidence matrix was developed to relate each of the genotyped animals to the detected haplotype clusters and its original SNP genotypes. The same BayesB model was applied to the new incidence matrix and the genetic variance explained by each 1 Mb genomic window was obtained using GenSel software. The windows passing the criteria (> 0.40% of genetic variance) in the SNP model and also explaining at least 0.20% of genetic variance, which is 5 fold greater than expected, were retained as candidate regions in this study. RESULTS There were 143,286 SNPs for GWAS analyses after quality control edits. One Siberian Husky dog was excluded due to its low genotype call rate and the remaining 105 affected and 99 controls were included for the GWAS analyses. A total of 47,672 tagging SNP markers were selected using Haploview (data not shown) and then used to run simulations in STRUCTURE. Under a model assuming admixture, the results recover 3 population clusters (K=3) according to the ∆K statistic which is based on the rate of change in the log probability of the data (Figure 1B). The estimated proportion of ancestry (Q value) derived from the 3 assumed population clusters varied among individual dogs. The 12 Chinook dogs were distinct from the Siberian Husky dogs (Figure 1A). Here we re-assigned dog samples into 4 subgroups according to their constitution of Q values in the 3 population clusters (Figure 1C, Supplementary Table S1). The first 3 subgroups are Siberian Huskies and subgroup 4 included all 12 Chinooks with extreme Q values of cluster 3 (0.928 to 1). Due to the small sample size, the Chinook dogs in subgroup 4 were used only as a control breed for the structure clustering, but not 130 included in subsequent GWAS. There were 129 dogs in subgroup 1 with a range of Q values from 0.709 to 1 for cluster 1, the majority of these dogs were collected from USA (95%) and 5% were from Canada. Sixty dogs are cryptorchids and 69 are controls in this subgroup. Subgroup 3 included 20 cryptorchids and 15 control dogs, with high Q values of cluster 2 (0.700-0.838). Most of these dogs were sampled from the UK (94%). Individuals in the subgroup 2 had intermediate Q values for both clusters 1 and 2. These dogs were of mixed origins, being sampled from USA (71%), the UK (24%) and Canada (5%). There were 41 individuals in subgroup 2, and 26 were cryptorchids. The sample sources and breeds for each dog subgroup are in Figure 1C and the supplementary Table S1. Twelve 1Mb SNP windows showed a high percentage of genetic variance (> 0.4%, 10 fold greater than the expected value) when the BayesB analyses were conducted in subgroup 1. These regions include one window on each of dog chromosomes CFA1, CFA3, CFA14 and CFA18, four windows on each of CFA5 and CFA27. Notably, the four windows on CFA27 explain 4.67% of genetic variance and one of them located at the 5th Mb explains 3.24% of genetic variance by itself (Table 1, Figure 2). Furthermore, three out of these four windows on CFA27 remain with high genetic variance compared to other regions (> 5 fold of the expected values, 0.34%, 0.53% and 0.22% for the 1Mb windows located at the 4th, 5th and 6th Mb on CFA27, respectively) when the window approach using haplotype model was conducted in BayesB analyses for subgroup 1 (Table 1). The windows on CFA1 and CFA14 were both at 50 Mb and were also detected by the haplotype analysis accounting for genetic variance over 0.2% (Table 1, Figure 2). By searching the database of CanFam 2.0 using Ensembl browser, 74 characterized genes 131 have been mapped between the 4th to the 6th Mb on CFA27, 8 on the 50th Mb of CFA1, and 5 on the 50th Mb of CFA14 (Supplementary Table S2). Among them, a candidate gene named anti-mullerian hormone type II receptor (AMHR2) was found to be located in the region of the 4th Mb of CFA27 (Figure 2, Supplementary Table S2). Human patients with mutations in this gene typically develop cryptorchidism (Clarnette et al. 1997). Moreover, a cluster of HOX-C genes including HOXC4, HOXC5, HOXC6 HOXC8, HOXC10, HOXC11, HOXC12 and HOXC13 were identified as being located within the 5th Mb window on CFA27 (Table S2). The results from BayesB analyses by SNP model for subgroup 2 and subgroup 3 identified six different regions. Among those six 1Mb regions, only two of the 1Mb windows were also identified as candidate regions with the haplotype model, those windows at 75 Mb on CFA4 for subgroup 2 (Table 1, Supplementary Figure S1) and 18 Mb on CFA31 for subgroup 3 (Table 1, Supplementary Figure S2). Eight characterized genes such as CAPSL (calcyphosine-like), IL7R (interleukin 7 receptor) and SPEF2 (sperm flagellar 2) etc. have been mapped to 75 Mb of CFA4 and two types of U6 spliceosomal RNA have been mapped to 18 Mb of CFA31 (Supplementary Table S3). No apparent genes in these regions have been reported so far to be associated with the processes of testicular descent or the development of cryptorchidism. The sample size for these two subgroups is relative small. More samples are needed in the future for discovering new loci if it is the fact that dogs with cryptorchidism in these two subgroups have distinct causative mutations from individuals in subgroup1. The results from GWAS analyses using all Siberian Huskies (105 affected and 99 controls) show that 1Mb SNP windows on CFA3, 6, 9, 24, 27, X explain a high 132 percentage of genetic variance (higher than 0.40%) (Table 1). Among those, four regions including the 31st and 33rd Mb windows on CFAX, 56th Mb on CFA9 and 13th Mb window on CFA27 also showed a high percentage of variance (higher than 0.20%) when the window were fitted using the haplotype model (Table 1, Supplementary Figure S3). Comparing the results of this pooled analysis with previous analyses from each of the 3 subgroups separately, we did not identify a common window explaining high percentage of genetic variance. It is to be noted that most of the SNPs with lowest raw P-values on each chromosome in the case-control analyses in PLINK were located within the windows detected from the BayesB analyses (Figure 2A, Supplementary Figure S1A, S2A and S3A), even though their significance disappeared after FDR corrections (data not shown). Additionally, we summarized the results from the BayesB analyses for genes that had previously been shown to be associated with cryptorchidism, including ESR1, NR5A1 (nuclear receptor subfamily 5, group a, member 1), GNRHR, HOXA10, HOXA11, FGFR1 (fibroblast growth factor receptor 1), SOS1 (son of sevenless, drosophila, homolog 1), WT-1 (Wilms tumor gene 1), INSL3, AMH, CALCA, PROKR2 (prokineticin receptor 2), LGR8, COL2A1, KAL1 (Kallmann syndrome interval gene 1) and AR (Table 2). There are no regions covering these genes passing the criteria set in this study to show apparent associations with cryptorchidism, neither in the three subgroup analyses nor in the pooled analysis of the Siberian Husky population. The causative loci for cryptorchidism in our study do not seem to coincide with those previously reported candidate genes. 133 DISCUSSION Genetic association studies relate differences in disease status reflected as case and control groups with differences in allele frequencies at SNPs. Undetected population substructure in an admixed population can lead to spurious allelic associations (Cardon and Palmer 2003). Diseases could be heterogeneous when the same condition is caused by different genes or different alleles at the same gene in different populations. The genetic basis for cryptorchidism is likely heterogeneous in different populations. Perhaps at least 20 genes might be associated with cryptorchidism, since no single mutated gene has been found to cause more than 10% of cryptorchids (Klonisch et al. 2004; Amann and Veeramachaneni 2007). Previous studies have shown that only 3-5% cryptorchid men exhibit gene aberrance for INSL3 or GREAT and < 16% for AR or ESR1 (Amann and Veeramachaneni 2007). In another instance, a specific haplotype of ESR1 has been found to be associated with cryptorchidism status in a Japanese population (Yoshida et al. 2005), but only being associated with the level of severity of cryptorchidism in other populations (Wang et al. 2008). In such cases, isolating a relatively homogeneous subpopulation from a heterogeneous population is necessary for cryptorchidism studies. Here we used STRUCTURE to infer the presence of distinct population clusters and then re-assigned individuals to 4 new subgroups in order to discover aberrant genes associated with the condition in each subsample. The re-assigned dog subgroups were consistent for the most part with their geographic distribution but the regions (windows) identified in each subgroups were distinctly different in each sub population. It demonstrates that different genes might contribute to cryptorchidism in different groups of dogs. 134 Comparing the results of the pooled analyses (all 204 Siberian Huskies) using BayesB approaches with results from the three subgroups separately, we did not identify a common window accounting for high genetic variance. This may be due to fact that dogs from different geographical backgrounds in the pooled population blended the allele frequencies of SNPs associated with the disease in any particular subgroup. In this study, the AMHR2 gene was considered a promising candidate gene for canine cryptorchidism in Siberian Husky subgroup 1 based on results from BayesB analyses using the window approaches. The SNP (BICF2G630137913, 4791435bp on CFA27) located within AMHR2 gene was not significantly associated with cryptorchidism from the results of single marker analysis in PLINK after FDR corrections (praw = 0.002, pgenome = 0.9971). This marker is about 0.59Mb away on CFA27 from SNP BICF2G6301380 which has the smallest raw P value in the analyses (praw = 2.39E-05). By checking the LD structure for SNPs on CFA27 using Haploview, BICF2G630137913 did not show strong LD with nearby SNPs (r2 = 0.2 with the left SNP BICF2G630137904 or the right SNP BICF2G630137922). The average LD for BICF2G630137913 with other markers on CFA27 was very low (r2 = 0.04). It is not surprising to find that the window containing BICF2G630137913 shows high percentage of genetic variance, but no significant association was observed between this SNP and cryptorchidism in the single marker analysis using PLINK. Supposing a true causative mutation in AMHR2 is not in high LD with SNP BICF2G630137913, the single SNP analysis will not help to capture the association with the studied disease. Therefore, the cumulative effects of 1Mb SNP windows or windows with integration of haplotype information would facilitate detection of regions harboring the causative loci that may not be apparent in single SNP analyses. 135 To confirm whether AMHR2 is a causative locus for cryptorchidism, expression array studies and/or fine mapping by sequencing the whole gene are necessary. The case-control analysis using single SNP tests in the PLINK program is a frequentist approach by calculating a p-value for the null hypothesis of no association. Multiple testing corrections like FDR and Bonferroni adjustment are usually performed (Noble 2009). The sample size and minor allele frequency will strongly affect the results using this approach (Stephens and Balding 2009). In this study, there is no significant association remaining after FDR corrections for SNPs on the BeadChip with the given disease in any data sets. The relatively small sample size would limit the power of detecting associated markers or regions for a complex disease. In this study, Bayesian approaches with SNP or haplotype windows were also used to assess evidence for an association between genomic regions containing multiple genetic variants and cryptorchidism. The 1Mb window approach accounts for linkage disequilibrium among adjacent SNP and accumulates the effects from multiple genetic variants in any particular region. This method reduces the chance of spurious associations that can easily occur from a single locus analysis (Balding, 2006). Incorporating biological information of genes with GWAS analyses may also help find causative gene loci. Here we found a potential gene AMHR2 that has a known biological function on sex differentiation and is plausibly related to the development of cryptorchidism. It is known that anti-mullerian hormone (AMH) is one of the hormones mediating male sex differentiation by regression of mullerian ducts which would otherwise differentiate into the uterus and fallopian tubes (Behringer 1994). Double-knockout mouse models for AMH and AMHR2 suggest that AMH is the only ligand of the AMHR2 receptor (Mishina 136 et al. 1996). Mutations in AMH and AMHR2 have been found to cause persistent mullerian duct syndrome (PMDS), a rare form of male pseudohermaphroditism (Imbeaud et al. 1995; Messika-Zeitoun et al. 2001). There are 60-70% patients with PMDS that have their testes located in the abdomen at a position analogous to the ovaries without descending into the normal scrotal destinations (Clarnette et al. 1997). Many PMDS patients have been found with the absence of gubernacular structures or feminized gubernaculum (Clarnette et al. 1997). The disruptions of AMH or AMHRs may cause human cryptorchidism by preventing transabdominal descent through failure of the gubernacular swelling reaction (Clarnette et al. 1997). It was reported that a boy carried a heterozygous WT1 nonsense mutation had bilateral cryptorchidism, nystagmu and Wilms’ tumor. The nonsense mutation leads to a truncated protein by losing three zincfinger domains for DNA binding (Terenziani et al. 2009). The cDNA microarray analyses on Wt1 (Wilms’ tumor suppressor gene) knockout mice have shown that inactivation of the Wt1 gene dramatically down-regulates gene expression of Amkr2. This shows that Amkr2 is a target gene of Wt1 when it plays the role in urogenital and gonadal development during sex differentiation (Klattig et al. 2007). Therefore, the WT1, AMHR2 and AMH genes might be within the same pathway to regulate sex differentiation. Gene disruptions of one of these genes could prompt the development of undescended testicles in males. It is known that HOX genes are a group of related genes determining the regional identity of an embryo along the anterior- posterior axia and the type of segment structure (Joseph et al. 2005). The gubernaculum is central to testicular descent, with evidence suggesting that it elongates to the scrotum like a limb bud (Huynh et al. 2007). The genes HOXA10 137 and HOXA11 involved in limb bud outgrowth are expressed within the gubernaculum (Yuan et al. 2006), and these two genes have been found to play a role in the developing genito-urinary tract. The Hoxa10-/- and Hoxa11-/- mutant male mice present cryptorchidism (Satokata et al. 1995; Hsieh-Li et al. 1995). Disrupted gene functions or ectopic expression of some of the Hoxc genes such as Hoxc4, Hoxc6 or Hoxc8 have been considered being responsible for the lack of hindlimb phenotypes in chickens (Nelson et al. 1996). However, there is no sufficient research evidence to show relationships between any of the HOX-C genes with cryptorchidism. In conclusion, the important findings in this present GWAS are (i) the AMHR2 gene was identified in a putative candidate region associated with canine cryptorchidism by using 1Mb window approach and Bayesian inference (ii) different chromosomal regions and thus different genes appear to be associated with the disease in dogs from pedigrees from different geographical regions. Population structure clustering will help correct for stratification; (iii) cryptorchidism is a complex disease and shows genetic heterogeneity; (iv) single SNP analyses may not capture the genomic regions harboring the causative loci when there is not enough LD between markers on the chip nearby the causative mutations. The 1Mb window approach along with BayesB analysis may help to resolve this problem; (v) whole genome sequencing for pairs of affected and unaffected dogs from the same family may facilitate finding causative mutations without needing to consider LD structure and disease heterogeneity. Software BEAGLE: http://faculty.washington.edu/browning/beagle/beagle.html Blossoc: http://www.daimi.au.dk/~mailund/Blossoc/index.html GenSel: http://bigs.ansci.iastate.edu 138 Haploview: http://www.broadinstitute.org/scientific-community/science/programs/medical-andpopulation-genetics/haploview/haploview PLINK: http://pngu.mgh.harvard.edu/~purcell/plink/ STRUCTURE: http://pritch.bsd.uchicago.edu/structure.html Funding This work was supported by the AKC Canine Health Foundation (Grant No. 1248) and State of Iowa Funds. ACKNOWLEDGEMENTS The help of sample collections provided by Sheila E. (Blanker) Morrissey (DVM), individual dog breeders and owners, Eddie Dziuk (Chief Operating Officer, CHIC), Caroline Kisko (secretary, Kennel Club in UK) and Dr. Nancy Bartol (Chinook Club of America) is appreciated. We also thank Ziqing Weng for helping in utilizing BEAGLE program. REFERENCES Amann RP, Veeramachaneni DN, 2007. Cryptorchidism in common eutherian mammals. Reproduction. 133:541-61.Balding DJ, 2006. A tutorial on statistical methods for population association studies. Nat Rev Genet 7:781–791. Barrett JC, Fry B, Maller J, Daly MJ, 2005. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 21:263-5. Bashir MS, Wells M, 1995. Müllerian inhibiting substance. J Pathol. 176:109-10. Behringer RR, Finegold MJ, Cate RL, 1994. Müllerian-inhibiting substance function during mammalian sexual development. Cell. 79:415-25. Behringer RR, 1994. The in vivo roles of müllerian-inhibiting substance. Curr Top Dev Biol. 29:171-87. Review. Berkowitz GS, Lapinski RH, Dolgin SE, Gazella JG, Bodian CA, Holzman IR. Prevalence and natural history of cryptorchidism, 1993. Pediatrics. 92:44-9. 139 Bogatcheva NV, Truong A, Feng S, Engel W, Adham IM, Agoulnik AI, 2003. GREAT/LGR8 is the only receptor for insulin-like 3 peptide. Mol Endocrinol. 17:263946. Browning SR, Browning BL, 2007. Rapid and accurate haplotype phasing and missing data inference for whole genome association studies using localized haplotype clustering. Am J Hum Genet. 81:1084-97. Burton M, Ramsay E, 1986. Cryptorchidism in Maned Wolves. J Zoo An Med. 17:133-5. Cardon LR, Palmer LJ, 2003. Population stratification and spurious allelic association. Lancet. 361:598-604. Review. Carizza C, Antiba A, Palazzi J, Pistono C, Morana F, Alarcón M, 1990. Testicular maldescent and infertility. Andrologia. 22:285-8. Clarnette TD, Hutson JM, 1999. Exogenous calcitonin gene-related peptide can change the direction of gubernacular migration in the mutant trans-scrotal rat. J Pediatr Surg. 34:1208-12. Clarnette TD, Sugita Y, Hutson JM, 1997. Genital anomalies in human and animal models reveal the mechanisms and hormones governing testicular descent. Br J Urol. 79:99-112. Cox JE, Edwards GB, Neal PA, 1979. An analysis of 500 cases of equine cryptorchidism. Equine Vet J. 11:113-6. Cox VS,Wallace LJ, Jessen CR, 1978. An anatomic and genetic study of canine cryptorchidism. Teratology. 18:233-40. Davis-Dao C, Koh CJ, Hardy BE, Chang A, Kim SS, De Filippo R, Hwang A, Pike MC, Carroll JD, Coetzee GA, 2012. Shorter androgen receptor CAG repeat lengths associated with cryptorchidism risk among Hispanic white boys. J Clin Endocrinol Metab. 97:E3939. Donaldson KM, Tong SY, Washburn T, Lubahn DB, Eddy EM, Hutson JM, Korach KS, 1996. Morphometric study of the gubernaculum in male estrogen receptor mutant mice. J Androl. 17:91-5. Dunbar MR, Cunningham MW, Wooding JB, Roth RP, 1996. Cryptorchidism and delayed testicular descent in Florida black bears. J Wildl Dis. 32:661-4. Evanno G, Regnaut S, Goudet J, 2005. Detection the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol. 14:2611-20. George FW, Peterson KG, 1988. Partial characterization of the androgen receptor of the newborn rat gubernaculum. Biol Reprod. 39:536-9. Gill WB, Schumacher GF, Bibbo M, Straus FH 2nd, Schoenberg HW, 1979. Association of diethylstilbestrol exposure in utero with cryptorchidism, testicular hypoplasia and semen abnormalities. J Urol. 122:36-9. Gorlov IP, Kamat A, Bogatcheva NV, Jones E, Lamb DJ, Truong A, Bishop CE, McElreavey K, Agoulnik AI, 2002. Mutations of the GREAT gene cause cryptorchidism. Hum Mol Genet. 11:2309-18. 140 Griffiths AL, Middlesworth W, Goh DW, Hutson JM, 1993. Exogenous calcitonin generelated peptide causes gubernacular development in neonatal (Tfm) mice with complete androgen resistance. J Pediatr Surg. 28:1028-30. Hadziselimovic NO, de Geyter Ch, Demougin P, Oakeley EJ, Hadziselimovic F, 2010. Decreased expression of FGFR1, SOS1, RAF1 genes in cryptorchidism. Urol Int. 84:35361. Hou JW, Tsai WY, Wang TR, 1999. Detection of KAL-1 gene deletion with fluorescence in situ hybridization. J Formos Med Assoc. 98:448-51. Hsieh-Li HM, Witte DP, Weinstein M, Branford W, Li H, Small K, Potter SS, 1995. Hoxa 11 structure, extensive antisense transcription, and function in male and female fertility. Development. 121:1373-85. Hughes IA, Acerini CL, 2008. Factors controlling testis descent. Eur J Endocrinol. 159 Suppl 1:S75-82. Hutson JM, Hasthorpe S, 2005. Abnormalities of testicular descent. Cell Tissue Res. 322:155-8. Review. Hutson JM, 1985. A biphasic model for the hormonal control of testicular descent. Lancet. 24:419-21. Huynh J, Shenker NS, Nightingale S, Hutson JM, 2007. Signalling molecules: clues from development of the limb bud for cryptorchidism? Pediatr Surg Int. 23:617-24. Imbeaud S, Faure E, Lamarre I, Mattéi MG, di Clemente N, Tizard R, Carré-Eusèbe D, Belville C, Tragethon L, Tonkin C, 1995. Insensitivity to anti-müllerian hormone due to a mutation in the human anti-müllerian hormone receptor. Nat Genet. 11:382-8. Kaftanovskaya EM, Huang Z, Barbara AM, De Gendt K, Verhoeven G, Gorlov IP, Agoulnik AI, 2012. Cryptorchidism in mice with an androgen receptor ablation in gubernaculum testis. Mol Endocrinol. 26:598-607. Klattig J, Sierig R, Kruspe D, Besenbeck B, Englert C, 2007. Wilms' tumor protein Wt1 is an activator of the anti-Müllerian hormone receptor gene Amhr2. Mol Cell Biol. 27:4355-64. Klonisch T, Fowler PA, Hombach-Klonisch S, 2004. Molecular and genetic regulation of testis descent and external genitalia development. Dev Biol. 270:1-18. Kubota Y, Temelcos C, Bathgate RA, Smith KJ, Scott D, Zhao C, Hutson JM, 2002. The role of insulin 3, testosterone, Müllerian inhibiting substance and relaxin in rat gubernacular growth. Mol Hum Reprod. 8:900-5. Leader-Williams N, 1979. Abnormal testes in reindeer, Rangifer tarandus. J Reprod Fertil. 57:127-30. Mailund T, Besenbacher S, Schierup MH, 2006. Whole genome association mapping by incompatibilities and local perfect phylogenies. BMC Bioinformatics. 7:454. Messika-Zeitoun L, Gouédard L, Belville C, Dutertre M, Lins L, Imbeaud S, Hughes IA, Picard JY, Josso N, di Clemente N, 2001. Autosomal recessive segregation of a truncating mutation of anti-Müllerian type II receptor in a family affected by the 141 persistent Müllerian duct syndrome contrasts with its dominant negative activity in vitro. J Clin Endocrinol Metab. 86:4390-7. Mishina Y, Rey R, Finegold MJ, Matzuk MM, Josso N, Cate RL, Behringer RR, 1996. Genetic analysis of the Müllerian-inhibiting substance signal transduction pathway in mammalian sexual differentiation. Genes Dev. 10:2577-87. Nef S, Parada LF, 1999. Cryptorchidism in mice mutant for Insl3. Nat Genet. 22:295-9. Nelson CE, Morgan BA, Burke AC, Laufer E, DiMambro E, Murtaugh LC, Gonzales E, Tessarollo L, Parada LF, Tabin C, 1996. Analysis of Hox gene expression in the chick limb bud. Development. 122:1449-66. Nightingale SS, Western P, Hutson JM, 2008. The migrating gubernaculum grows like a "limb bud". J Pediatr Surg. 43:387-90. Noble WS, 2009. How does multiple testing correction work? Nat Biotechnol. 27:1135-7. Pask AJ, Kanasaki H, Kaiser UB, Conn PM, Janovick JA, Stockton DW, Hess DL, Justice MJ, Behringer RR, 2005. A novel mouse model of hypogonadotrophic hypogonadism: N-ethyl-N-nitrosourea-induced gonadotropin-releasing hormone receptor gene mutation. Mol Endocrinol. 19:972-81. Pearson JC, Lemons D, McGinnis W, 2005. Modulating Hox gene functions during animal body patterning. Nat Rev Genet. 6:893-904. Review. Pritchard JK, Stephens M, Rosenberg NA, Donnelly P, 2000. Association mapping in structured populations. Am J Hum Genet. 67:170-81. Przewratil P, Paduch DA, Kobos J, Niedzielski J, 2004. Expression of estrogen receptor alpha and progesterone receptor in children with undescended testicle previously treated with human chorionic gonadotropin. J Urol. 172:1112-6. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 81:559-75. Radpour R, Rezaee M, Tavasoly A, Solati S, Saleki A, 2007. Association of long polyglycine tracts (GGN repeats) in exon 1 of the androgen receptor gene with cryptorchidism and penile hypospadias in Iranian patients. J Androl. 28:164-9. Rothschild MF, Christian LL, Blanchard W, 1988. Evidence for multigene control of cryptorchidism in swine. J Hered. 79:313-4. Sahana G, Mailund T, Lund MS, Guldbrandtsen B, 2011. Local genealogies in a linear mixed model for genome-wide association mapping in complex pedigreed populations. PLoS One.6:e27061. Sarfati J, Dodé C, Young J, 2010. Kallmann syndrome caused by mutations in the PROK2 and PROKR2 genes: pathophysiology and genotype-phenotype correlations. Front Horm Res. 39:121-32. Review 142 Satokata I, Benson G, Maas R, 1995. Sexually dimorphic sterility phenotypes in Hoxa10deficient mice. Nature. 374:460-3. Singh A, Ezeasor D, 1989. Ultrastructure of Sertoli cells in scrotal testis of unilaterally cryptorchid goat. Prog Clin Biol Res.296:159-64. Smith KC, Brown PJ, Morris J, Parkinson TJ, 2007. Cryptorchidism in North Ronaldsay sheep. Vet Rec. 161:658-9. Spencer JR, Torrado T, Sanchez RS, Vaughan ED Jr, Imperato-McGinley J, 1991.Effects of flutamide and finasteride on rat testicular descent. Endocrinology. 129:741-8. Stephens M, Balding DJ, 2009. Bayesian statistical methods for genetic association studies. Nat Rev Genet. 10:681-90. St Jean G, Gaughan EM, Constable PD, 1992. Cryptorchidism in North American cattle: breed predisposition and clinical findings. Theriogenology. 38:951-8. Terenziani M, Sardella M, Gamba B, Testi MA, Spreafico F, Ardissino G, Fedeli F, Fossati-Bellani F, Radice P, Perotti D, 2009. A novel WT1 mutation in a 46,XY boy with congenital bilateral cryptorchidism, nystagmus and Wilms tumor. Pediatr Nephrol. 24:1413-7. Thonneau P, Bujan L, Multingner L, Mieusset R, 1998. Occupational heat exposure and male fertility: a review. Hum Reprod. 13: 2122. Thonneau PF, Gandia P, Mieusset R, 2003. Cryptorchidism: incidence, risk factors, and potential role of environment; an update. J Androl. 24:155-62. Review. Wada Y, Okada M, Hasegawa T, Ogata T, 2005. Association of severe micropenis with Gly146Ala polymorphism in the gene for steroidogenic factor-1. Endocr J. 52:445-8. Wang J, Wang B, 2002. Study on risk factors of cryptorchidism Zhonghua Liu Xing Bing Xue Za Zhi. 23:190-3. Wang Y, Barthold J, Figueroa E, González R, Noh PH, Wang M, Manson J, 2008. Analysis of five single nucleotide polymorphisms in the ESR1 gene in cryptorchidism. Birth Defects Res A Clin Mol Teratol. 82:482-5. Wang Y, Barthold J, Kanetsky PA, Casalunovo T, Pearson E, Manson J, 2007. Allelic variants in HOX genes in cryptorchidism. Birth Defects Res A Clin Mol Teratol. Apr;79(4):269-75. Yates D, Hayes G, Heffernan M, Beynon R, 2003. Incidence of cryptorchidism in dogs and cats. Vet Rec. 152:502-4. Yoshida R, Fukami M, Sasagawa I, Hasegawa T, Kamatani N, Ogata T, 2005. Association of cryptorchidism with a specific haplotype of the estrogen receptor alpha gene: implication for the susceptibility to estrogenic environmental endocrine disruptors. J Clin Endocrinol Metab. 90:4716-21. Yuan FP, Lin DX, Rao CV, Lei ZM, 2006. Cryptorchidism in LhrKO animals and the effect of testosterone-replacement therapy. Hum Reprod. 21:936-42. Zhao X, Du ZQ, Rothschild MF, 2010. An association study of 20 candidate genes with cryptorchidism in Siberian Husky dogs. J Anim Breed Genet. 127:327-31. 143 Zimmermann S, Schöttler P, Engel W, Adham IM, 1997. Mouse Leydig insulin-like (Ley I-L) gene: structure and expression during testis and ovary development. Mol Reprod Dev. 47:30-8. Zimmermann S, Steding G, Emmen JM, Brinkmann AO, Nayernia K, Holstein AF, Engel W, Adham IM, 1999. Targeted disruption of the Insl3 gene causes bilateral cryptorchidism. Mol Endocrinol. 13:681-91. Zuccarello D, Morini E, Douzgou S, Ferlin A, Pizzuti A, Salpietro DC, Foresta C, Dallapiccola B, 2004. Preliminary data suggest that mutations in the CgRP pathway are not involved in human sporadic cryptorchidism. J Endocrinol Invest. 27:760-4. 144 Figure 1. Re-grouping dogs based on the value of cluster membership estimated from STRUCTURE. A. Summary plot of estimated Q values (cluster membership coefficients) for each individual dog collected from USA, UK and Canada. Each individual is represented by a single vertical line broken into K colored segments, with lengths proportional to each of the K inferred clusters. B. Identification of the most possible estimate of K calculated from 47,672 tagging SNPs. Delta K was calculated for K=1 through K= 6 under a model assuming admixture. K=3 is the best estimate based on ∆K statistics on the plot. C. Dogs were assigned into 4 subgroups based on Q values when K=3. The re-grouping of dogs is consistent with the dogs’ geographical distributions. SH: Siberian Husky; CHI: Chinook. 145 Figure 2. Whole genome association analyses for subgroup 1 comprising 129 mostly USA sourced (60 cases and 69 controls) Siberian Huskies. A. Shows results of single marker analyses in PLINK before false discovery rate (FDR) implementation. B. Depicts results from the GenSel 1Mb SNP window analyses (SNP model). Each spot is a 1Mb window containing a group of consecutive SNPs. C. Display results using haplotypes and SNP genotypes together in GenSel (SNP & haplotype models). Each spot is a 1Mb window containing a group of consecutive SNPs and haplotypes. The red breaking arrows link SNPs in PLINK to their 1Mb SNP windows with genetic variance being over 0.40% (10 folds than the expected value). The blue breaking arrows show windows repeatedly identified with high variance (> 0.20%, 5 folds than the expected value) when haplotypes were integrated into the analyses in The most promising putative regions for cryptorchidism in this GenSel. subpopulation are on CFA27, 1 and 14. 146 Figure S1. Whole genome association analyses for subgroup 2 comprising 41 mixed sourced (26 cases and 15 controls) Siberian Huskies. A. Shows results of single marker analyses in PLINK before false discov discovery ery rate (FDR) implementation. B. Depicts results from the GenSel 1Mb SNP window analyses (SNP model). Each spot is a 1Mb window containing a group of consecutive SNPs. C. Displays results using haplotypes and SNP genotypes together in GenSel (SNP & haplotype models). Each spot is a 1Mb window containing a group of consecutive SNPs and haplotypes. The red breaking arrows link SNPs in PLINK to their 1Mb SNP windows with genetic variance being over 0.40% (10 folds than the expected value). The blue breaking arrows show windows repeatedly identified with high variance (> 0.20%, 5 folds than the expected value) when haplotypes were integrated into the analyses in GenSel. The most promising putative regions for cryptorchidism are on CFA4 in subgroup 2. 147 Figure S2. Whole genome association analyses for subgroup 3 comprising 35 mostly UK sourced (20 cases and 15 controls) Siberian Huskies. A. Shows results of single marker analyses in PLINK before false discov discovery ery rate (FDR) implementation. B. Depicts results from the GenSel 1Mb SNP window analyses (SNP model). Each spot is a 1Mb window containing a group of consecutive SNPs. C. Displays results using haplotypes and SNP genotypes together in GenSel (SNP & haplotype models). Each spot is a 1Mb window containing a group of consecutive SNPs and haplotypes. The red breaking arrows link SNPs in PLINK to their 1Mb SNP windows with genetic variance being over 0.40% (10 folds than the expected value). The blue breaking arrows show windows repeatedly identified with high variance (> 0.20%, 5 folds than the expected value) when haplotypes were integrated into the analyses in GenSel. The most promising putative region for cryptorchidism is on CFA31 in subgroup 3. 148 Figure S3. Whole genome association analyses for a pooled sample of 204 Siberian Huskies in which 105 were diagnosed with cyptorchidism. A. Shows results of single marker analyses in PLINK before false discov discovery ery rate (FDR) implementation. B. Depicts results from the GenSel 1Mb SNP window analyses (SNP model). Each spot is a 1Mb window containing a group of consecutive SNPs. C. Displays results using haplotypes and SNP genotypes together in GenSel (SNP & haplotype models). Each spot is a 1Mb window containing a group of consecutive SNPs and haplotypes. The red breaking arrows link SNPs in PLINK to their 1Mb SNP windows with genetic variance being over 0.40% (10 folds than the expected value). The blue breaking arrows show windows repeatedly identified with high variance (> 0.20%, 5 folds than the expected value) when haplotypes were integrated into the analyses in GenSel. Windows on CFA9, 27 and X were identified as putative candidate regions for cryptorchidism in the joined population. 149 Table1. The top 1 Mb windows with the highest percentage of genetic variance obtained from BayesB analyses in different Siberian Husky subgroups 1Mb window Group SNP model SNP & haplotype (%) models (%) Subgroup 1 1_50 0.73 0.22 (60 A & 69 U) 3_86 0.44 0.08 5_8 0.54 0.16 5_10 0.47 0.07 5_12 0.68 0.06 5_16 1.04 0.05 14_50* 0.4 0.22 18_11 0.55 0.18 27_4* 0.42 0.34 27_5* 3.24 0.53 27_6* 0.59 0.22 27_18 0.42 0.07 Subgroup 2 4_75* 0.56 0.23 (26 A & 15 U) 27_13 0.58 0.16 35_26 0.4 0.02 Subgroup 3 24_37 0.39 0.04 (20 A & 15 U) 25_14 0.47 0.14 31_18* 0.52 0.33 Pooled 3_87 0.42 0.15 (subgroup1, 6_62 0.45 0.16 subgroup 2 and 9_56* 1.42 0.27 subgroup 3) 24_31 0.55 0.11 (106 A & 99 U) 24_36 0.55 0.16 27_13* 1.14 0.21 27_18 0.43 0.14 X_31* 0.71 0.31 X_33* 1.1 0.39 X_34 0.43 0.15 X_35 0.53 0.14 *These windows were considered as putative candidate regions for cryptorchidism with genetic variance over 0.40% (10 fold than the expected value) in the 1Mb SNP window analyses (SNP model) and repeatedly being with over 0.20% (5 fold than the expected value) for the 1Mb window analyses by integrating with haplotypes (SNP & haplotype models). The positional candidate genes were listed in the supplementary Table S2, S3 &S4. Window (Chr_Mb) Table 2. The results of BayesB analyses on 17 previously reported genes being associated with cryptorchidism Gene symbol Chromosom e (CFA) Gene name CFA1 Estrogen receptor 1 NR5A1 (SF1) CFA9 GNRHR CFA13 HOXA1 0 HOXA1 1 FGFR1 CFA14 nuclear receptor subfamily 5, group A, member 1 gonadotropinreleasing hormone receptor homeobox A10 CFA14 homeobox A11 CFA16 SOS1 CFA17 WT-1 CFA 18 fibroblast growth factor receptor 1 son of sevenless homolog 1 Wilms tumor 1 INSL3 CFA20 insulin-like factor 3 AMH (MIS) RAF1 CFA20 CALCA CFA21 anti-Mullerian hormone v-raf-1 murine leukemia viral oncogene homolog 1 Calcitonin/calcitonin -related polypeptide, alpha PROKR2 CFA24 LGR8 CFA 25 COL2A1 CFA27 KAL1 CFA X CFA8 prokineticin receptor 2 Relaxin receptor 2 collagen, type II, alpha 1 Kallmann syndrome 1 sequence window (chr_Mb) Ref. Subpop1 SNP Haplo. Haplo. SNP Haplo. Pooled population SNP Haplo. Subpop2 SNP Subpop3 4512999345409811 6180475561824397 1_45 Yoshida et al. 2005 0.01 0.05 0 0.04 0.03 0.07 0.03 0.03 9_61 Wada et al. 2005 0.05 0.01 0.03 0.05 0.08 0.09 0.08 0.06 6119058861204076 13_61 Pask et al. 2005 0 0.04 0.03 0.02 0.04 0.03 0.01 0.04 4330196443304348 4331298143315443 2999143430031766 3411260134186729 381195393167266 4807272148074397 5992629759928993 88990948975879 14_43 Satokata et al. 1996 0 0.07 0.08 0.09 0.02 0.03 0 0.05 14_43 Hsieh-Li et al. 1995 0 0.07 0.08 0.09 0.02 0.03 0 0.05 16_30 0.15 0.17 0.12 0.06 0.03 0.05 0.14 0.03 0 0.02 0.07 0.01 0.02 0.03 0.04 0.06 18_38 Hadziselimovic et al. 2010 Hadziselimovic et al. 2010 Terenziani et al. 2008 0.09 0.11 0.08 0.04 0.03 0.05 0.04 0.13 20_48 Nef and Parada 1999 0.12 0.1 0.04 0.04 0.1 0.03 0.09 0.07 20_59 Behringer et al. 1994 0.02 0.05 0.03 0.04 0 0.03 0.02 0.04 20_8 Hadziselimovic et al. 2010 0.02 0.04 0.04 0.05 0.06 0.05 0.1 0.06 4110482241108840 21_41 Griffiths et al. 1993; Clarnette and Hutson 1999) 0.03 0.02 0.04 0.04 0.06 0.04 0.14 0.03 1935153719359239 1129467111340203 97685119799248 53699415428059 24_19 Sarfati et al. 2010 0.01 0.03 0.07 0.09 0.02 0.06 0 0.04 25_11 Tomyama et al. 2003 0.01 0.03 0.02 0.12 0.13 0.05 0.03 0.03 27_9 Zhao et al. 2010 0.06 0.07 0.06 0.07 0.06 0.03 0.03 0.05 X_5 Hou et al. 1999 0 0.03 0 0.06 0 0.05 0 0.01 17_34 150 ESR1 Genome position (CanFam2.0) Percentage of Genetic Variance (%) (1Mb window with BayesB approach) Table S1. Re-grouping dogs based on the value of cluster membership (K=3) K=3 Breed Subgroup 1 Subgroup 2 Subgroup 3 Subgroup 4 SH SH SH CHI No. of cases 60 26 20 6 No. of controls 69 15 15 6 Cluster1 0.709-1 0.304-0.694 0.162-0.3 0.01-0.031 Q value Cluster2 0-0.291 0.306-0.696 0.7-0.838 0-0.041 Cluster3 0-0.036 0-0.092 0-0.024 0.928-1 USA_SH 0.95 0.71 0.06 0 Geographical distribution Canada_SH UK_SH 0.05 0 0.05 0.24 0 0.94 0 0 USA_CHI 0 0 0 1 The Q value represents the estimated proportion of ancestry when 3 assumed populations were given. SH: Siberian Husky; CHI: Chinook. 151 152 Table S2. A list of genes in the putative regions for subgroup1 Window (chr_Mb) 1_50th 14_50 th 27_4th CFA Gene End (bp) 50258876 Ensembl Gene ID Gene symbol Description 1 Gene Start (bp) 50145766 ENSCAFG00000000618 ZDHHC14 zinc finger, DHHC-type containing 14 1 50286593 50286847 ENSCAFG00000028358 7SK 7SK RNA 1 50340242 50341092 ENSCAFG00000000624 Novel gene pseudogene Ensembl genes 1 50436290 50498735 ENSCAFG00000000627 SNX9 sorting nexin 9 1 50550972 50622090 ENSCAFG00000000644 SYNJ2 synaptojanin 2 1 50642818 50692684 ENSCAFG00000000647 SERAC1 serine active site containing 1 1 50711829 50723881 ENSCAFG00000000648 GTF2H5 general transcription factor IIH, polypeptide 5 1 50919150 50993873 ENSCAFG00000000652 TULP4 tubby like protein 4 1 50935969 50936076 ENSCAFG00000028188 U6 U6 spliceosomal RNA 14 50199420 50200394 ENSCAFG00000003182 NOVEL pseudogene 14 50211571 50244090 ENSCAFG00000003194 HERPUD2 14 50219951 50220703 ENSCAFG00000003199 homocysteine-responsive endoplasmic reticulum-resident ubiquitin-like domain, member 2 protein NOVEL_pseudogene 14 50373100 50456616 ENSCAFG00000003216 SEPT7L septin 7-like 14 50573856 50682597 ENSCAFG00000003227 EEPD1 14 50708853 50745061 ENSCAFG00000003232 KIAA0895 endonuclease/exonuclease/phosphatase family domain containing 1 KIAA0895 14 50773901 50815300 ENSCAFG00000003243 ANLN Actin-binding protein anillin 14 50856685 50950463 ENSCAFG00000003252 AOAH Acyloxyacyl hydrolase (neutrophil) 27 4008479 4010926 ENSCAFG00000006572 NFE2 nuclear factor (erythroid-derived 2) 27 4016380 4020656 ENSCAFG00000006580 Novel gene Uncharacterized protein 27 4038247 4050570 ENSCAFG00000006593 CBX5 chromobox homolog 5 27 4090107 4092358 ENSCAFG00000006603 SMUG1 27 4211932 4213230 ENSCAFG00000006611 HOXC4 single-strand-selective monofunctional uracilDNA glycosylase 1 homeobox C4 27 4231278 4232665 ENSCAFG00000006618 HOXC5 homeobox C5 27 4231753 4231837 ENSCAFG00000028294 cfa-mir-615 cfa-mir-615 27 4235965 4237469 ENSCAFG00000006622 HOXC6 homeobox C6 27 4254284 4256431 ENSCAFG00000006626 HOXC8 homeobox C8 27 4272335 4272444 ENSCAFG00000020562 cfa-mir-196a-2 cfa-mir-196a-2 27 4274730 4278914 ENSCAFG00000006636 HOXC10 homeobox C10 27 4288980 4291010 ENSCAFG00000025599 HOXC11 homeobox C11 27 4307265 4308845 ENSCAFG00000006645 HOXC12 homeobox C12 27 4318372 4324857 ENSCAFG00000006648 HOXC13 homeobox C13 27 4408336 4408471 ENSCAFG00000025595 Novel gene Uncharacterized protein 27 4465522 4465593 ENSCAFG00000028208 Novel gene miRNA ncRNAs 27 4471215 4473053 ENSCAFG00000006653 Novel gene Uncharacterized protein 27 4526984 4543023 ENSCAFG00000006699 CALCOCO1 calcium binding and coiled-coil domain 1 27 4618776 4707507 ENSCAFG00000006726 ATF7 activating transcription factor 7 27 4716529 4721245 ENSCAFG00000006874 TARBP2 TAR (HIV-1) RNA binding protein 2 27 4730433 4736128 ENSCAFG00000006918 MAP3K12 mitogen-activated protein kinase kinase kinase 12 153 Table S2. (continued) Window (chr_Mb) 27_5 th CFA Gene End (bp) 4760860 Ensembl Gene ID Gene symbol Description 27 Gene Start (bp) 4737476 ENSCAFG00000006821 PCBP2 poly(rC) binding protein 2 27 4771276 4771641 ENSCAFG00000006927 PRR13 proline rich 13 27 4790871 4796699 ENSCAFG00000006934 AMHR2 anti-Mullerian hormone receptor, type II 27 4808070 4846934 ENSCAFG00000006950 SP1 Sp1 transcription factor 27 4852684 4913472 ENSCAFG00000007016 PFDN5 prefoldin subunit 5 27 4886171 4888554 ENSCAFG00000025559 SP7 Sp7 transcription factor 27 4893129 4903977 ENSCAFG00000006963 AAAS 27 4904239 4909825 ENSCAFG00000006981 C12orf10 achalasia, adrenocortical insufficiency, alacrimia chromosome 12 open reading frame 10 27 4915195 4937670 ENSCAFG00000007030 ESPL1 27 4945566 4947561 ENSCAFG00000007035 MFSD5 27 4962069 4981222 ENSCAFG00000007050 RARG extra spindle pole bodies homolog 1 (S. cerevisiae) major facilitator superfamily domain containing 5 retinoic acid receptor, gamma 27 4993303 4998519 ENSCAFG00000007065 ITGB7 integrin, beta 7 27 5002178 5004869 ENSCAFG00000007070 ZNF740 zinc finger protein 740 27 5028008 5039893 ENSCAFG00000007077 CSAD cysteine sulfinic acid decarboxylase 27 5056925 5066187 ENSCAFG00000007085 SOAT2 sterol O-acyltransferase 2 27 5067628 5071493 ENSCAFG00000007097 IGFBP6 insulin-like growth factor binding protein 6 27 5084602 5098690 ENSCAFG00000007101 SPRYD3 SPRY domain containing 3 27 5100630 5115952 ENSCAFG00000007105 TENC1 27 5123868 5154188 ENSCAFG00000007117 EIF4B tensin like C1 domain containing phosphatase (tensin 2) eukaryotic translation initiation factor 4B 27 5193231 5197229 ENSCAFG00000007154 Novel gene Uncharacterized protein 27 5214844 5215090 ENSCAFG00000025408 Novel gene Uncharacterized protein 27 5225567 5233056 ENSCAFG00000007199 Novel gene Uncharacterized protein 27 5261219 5268719 ENSCAFG00000025402 KRT78 keratin 78 27 5273786 5283987 ENSCAFG00000007204 KRT79 keratin 79 27 5294126 5300384 ENSCAFG00000007208 KRT4 keratin 4 27 5403819 5415836 ENSCAFG00000025382 KRT77 keratin 77 27 5425745 5458087 ENSCAFG00000007212 K2C1_CANFA Keratin, type II cytoskeletal 1 27 5481521 5490471 ENSCAFG00000007233 KRT73 keratin 73 27 5498594 5510056 ENSCAFG00000025358 KRT72 keratin 72 27 5523441 5530483 ENSCAFG00000025346 KRT74 keratin 74 27 5540426 5547762 ENSCAFG00000007278 D6BR72_CANFA keratin 71 27 5570885 5576777 ENSCAFG00000007223 KRT5 keratin 5 27 5595688 5601257 ENSCAFG00000007251 Novel gene Uncharacterized protein 27 5613256 5618136 ENSCAFG00000007209 Novel gene Uncharacterized protein 27 5625571 5627338 ENSCAFG00000025323 Novel gene Uncharacterized protein 27 5638886 5648810 ENSCAFG00000025322 KRT75 keratin 75 27 5650991 5663478 ENSCAFG00000025318 Novel gene Uncharacterized protein 27 5667080 5678048 ENSCAFG00000007281 KRT82 keratin 82 27 5686336 5686456 ENSCAFG00000028307 5S_rRNA 5S ribosomal RNA 154 Table S2. (continued) Window (chr_Mb) 27_6th CFA Gene End (bp) 5695036 Ensembl Gene ID Gene symbol Description 27 Gene Start (bp) 5687503 ENSCAFG00000007284 KRT84 keratin 84 27 5709947 5716497 ENSCAFG00000007286 KRT85 keratin 85 27 5722652 5735581 ENSCAFG00000007293 Novel gene Uncharacterized protein 27 5749848 5755811 ENSCAFG00000007311 Novel gene Uncharacterized protein 27 5764377 5770662 ENSCAFG00000007307 Novel gene Uncharacterized protein 27 5779873 5784692 ENSCAFG00000007300 Novel gene Uncharacterized protein 27 5790522 5794208 ENSCAFG00000007318 Novel gene Uncharacterized protein 27 5808380 5814164 ENSCAFG00000007320 Novel gene Uncharacterized protein 27 5816046 5830969 ENSCAFG00000007322 KRT7 keratin 7 27 5865148 5884620 ENSCAFG00000007328 KRT80 keratin 80 27 5931294 5931569 ENSCAFG00000014784 Novel gene Uncharacterized protein 27 5934133 5936981 ENSCAFG00000007331 C12orf44 chromosome 12 open reading frame 44 27 5949972 5957895 ENSCAFG00000007338 NR4A1_CANFA 27 5996708 6003640 ENSCAFG00000025231 GRASP 27 6014634 6032573 ENSCAFG00000007357 ACVR1B Nuclear receptor subfamily 4 group A member 1 GRP1 (general receptor for phosphoinositides 1)-associated scaffold protein activin A receptor, type IB 27 6079269 6086578 ENSCAFG00000025220 ACVRL1 activin A receptor type II-like 1 27 6104064 6106685 ENSCAFG00000007345 ANKRD33 ankyrin repeat domain 33 27 6179810 6298254 ENSCAFG00000007600 SCN8A 27 6242075 6242213 ENSCAFG00000027574 U2 sodium channel, voltage gated, type VIII, alpha subunit U2 spliceosomal RNA 27 6407456 6462367 ENSCAFG00000007728 Q9GK56_CANFA Uncharacterized protein 27 6517095 6538877 ENSCAFG00000007758 GALNT6 27 6544212 6733918 ENSCAFG00000007774 CELA1_CANFA UDP-N-acetyl-alpha-Dgalactosamine:polypeptide Nacetylgalactosaminyltransferase 6 (GalNAcT6) Chymotrypsin-like elastase family member 1 27 6565180 6596945 ENSCAFG00000007778 BIN2 bridging integrator 2 27 6619721 6620699 ENSCAFG00000007785 SMAGP small cell adhesion glycoprotein 27 6622959 6626272 ENSCAFG00000007787 DAZAP2 DAZ associated protein 2 27 6656427 6671622 ENSCAFG00000007790 POU6F1 POU class 6 homeobox 1 27 6682523 6741630 ENSCAFG00000007831 TFCP2 transcription factor CP2 27 6755279 6763644 ENSCAFG00000007860 CSRNP2 cysteine-serine-rich nuclear protein 2 27 6767612 6774873 ENSCAFG00000007872 LETMD1 LETM1 domain containing 1 27 6805741 6820814 ENSCAFG00000007902 SLC11A2 27 6831874 6871607 ENSCAFG00000007924 METTL7A solute carrier family 11 (proton-coupled divalent metal ion transporters), member 2 methyltransferase like 7A 27 6924625 6954071 ENSCAFG00000007928 ATF1 activating transcription factor 1 155 Table S3. A list of genes in the putative regions for subgroup2 and subgroup3 Group Subgroup 2 Subgroup 3 Window (chr_Mb) 4_75th 31_18 th CFA Gene Start (bp) Gene End (bp) Ensembl Gene ID Gene symbol Description 4 75800049 75809381 ENSCAFG00000018745 CAPSL calcyphosine-like 4 75832871 75860982 ENSCAFG00000018752 IL7R interleukin 7 receptor 4 75896215 76060543 ENSCAFG00000018767 4 75133068 75211061 ENSCAFG00000018704 SPEF2 Q9N280_CAN FA 4 75464205 75489129 ENSCAFG00000018711 RANBP3L 4 75494657 75536151 ENSCAFG00000018721 NADKD1 4 75551591 75571161 ENSCAFG00000018725 SKP2 4 75591977 75621896 ENSCAFG00000018731 LMBRD2 sperm flagellar 2 excitatory amino acid transporter 1 RAN binding protein 3like NAD kinase domain containing 1 S-phase kinaseassociated protein 2, E3 ubiquitin protein ligase LMBR1 domain containing 2 4 75658019 75676510 ENSCAFG00000023144 Novel gene Uncharacterized protein 4 75680288 75711566 ENSCAFG00000018738 Novel gene Uncharacterized protein 4 75751766 75776502 ENSCAFG00000018743 Novel gene Uncharacterized protein 31 18049292 18049396 ENSCAFG00000021702 U6 U6 spliceosomal RNA 31 18936469 18936573 ENSCAFG00000027925 U6 U6 spliceosomal RNA 156 Table S4. A list of genes in the putative regions for all Siberian Huskies (including subgroup1, 2 and 3) Window (chr_Mb) CFA Gene Start (bp) Gene End (bp) Ensembl Gene ID Gene symbol 9_56th 56770082 56817552 ENSCAFG00000019945 ASS1 argininosuccinate synthase 1 56765474 56765884 ENSCAFG00000019944 HBXIP hepatitis B virus x interacting protein 56458287 56458469 ENSCAFG00000024407 QRFP pyroglutamylated RFamide peptide 56831407 56978021 ENSCAFG00000019950 Novel gene Uncharacterized protein 55986530 56045429 ENSCAFG00000019908 PRRC2B 56125301 56139605 ENSCAFG00000019911 PPAPDC3 56151597 56164178 ENSCAFG00000019912 FAM78A proline-rich coiled-coil 2B phosphatidic acid phosphatase type 2 domain containing 3 family with sequence similarity 78, member A 56187199 56289132 ENSCAFG00000019920 NUP214 nucleoporin 214kDa 56292948 56418185 ENSCAFG00000019926 AIF1L allograft inflammatory factor 1-like 56316975 56360237 ENSCAFG00000019927 LAMC3 laminin, gamma 3 56419461 56448416 ENSCAFG00000019930 FIBCD1 fibrinogen C domain containing 1 27_13th X_31 st X_33rd 56465925 56487400 ENSCAFG00000019938 ABL1 c-abl oncogene 1, non-receptor tyrosine kinase 56495457 56496470 ENSCAFG00000023093 Novel gene Uncharacterized protein 56612649 56621679 ENSCAFG00000019941 EXOSC2 exosome component 2 56629556 56644738 ENSCAFG00000023076 PRDM12 PR domain containing 12 56662039 56688826 ENSCAFG00000019942 FUBP3 far upstream element (FUSE) binding protein 3 56235131 56235229 ENSCAFG00000021075 U6 U6 spliceosomal RNA 13293117 13305402 ENSCAFG00000009652 TWF1 twinfilin, actin-binding protein, homolog 1 (Drosophila) 13326156 13343992 ENSCAFG00000009700 IRAK4 interleukin-1 receptor-associated kinase 4 13356837 13379958 ENSCAFG00000009729 PUS7L 13504107 13665614 ENSCAFG00000009744 ADAMTS20 13189041 13189531 ENSCAFG00000025198 Novel gene pseudouridylate synthase 7 homolog (S. cerevisiae)-like ADAM metallopeptidase with thrombospondin type 1 motif, 20 pseudogene Ensembl genes 13159473 13160267 ENSCAFG00000024514 Novel gene protein coding Ensembl genes 31935641 31936435 ENSCAFG00000024521 Novel gene protein coding Ensembl gene 31622005 31623093 ENSCAFG00000013845 PCBP1 poly(rC) binding protein 1 31853711 31854505 ENSCAFG00000024210 Novel gene Uncharacterized protein 30996456 31056775 ENSCAFG00000013840 CXorf59 chromosome X open reading frame 59 31148254 31150659 ENSCAFG00000023630 Novel gene Uncharacterized protein 31277509 31342758 ENSCAFG00000023600 CXorf30 chromosome X open reading frame 30 31820228 31821353 ENSCAFG00000023558 Novel gene Uncharacterized protein 33390642 33413474 ENSCAFG00000014051 Novel gene Uncharacterized protein 33498639 33499190 ENSCAFG00000014056 MID1IP1 MID1 interacting protein 1 33792251 33794292 ENSCAFG00000014059 Novel gene Uncharacterized protein 33006703 33055120 ENSCAFG00000014003 RPGR_CANFA X-linked retinitis pigmentosa GTPase regulator 33080945 33149321 ENSCAFG00000014023 OTC ornithine carbamoyltransferase 157 CHAPTER 6. GENERAL CONCLUSIONS Mapping causative mutations responsible for genetic diseases is the primary goal for medical geneticists. Precisely mapped genetic variants can contribute to future use of gene therapy to cure patients. Therefore, understanding the inherited mode of a given disease and utilizing a suitable mapping strategy to identify causative mutations are very important. Moreover, given that animals and humans share many genetic diseases, animal models such as dogs and sheep are useful for investigating gene functions and human diseases. By examining gene alterations in these animal models, one can map the gene loci for a given disease to help understand the causal relationship between mutations in homologous genes and the corresponding human disease. The relationship between phenotypes and genotypes The classical forward genetics approach allows researchers to locate the gene on its chromosome associated with a trait of interest. The genotype is the genetic makeup of an individual that represents a specific genetic character under consideration. Measuring the genotype of several individuals will provide information on the differences among individuals in a population. Using various mapping strategies, genotype data will help to map the causative or major genes responsible for the traits being studied. In the past several decades, a lot of efforts have been done to improve genotyping techniques in order to obtain accurate genotypes with less expense. To gain a highly successful rate of mapping traits in a population, researchers have realized that improving only the quality of genotypes is not enough when the measurement and categorization of phenotypes is not precise for quantitative and categorical traits. In this dissertation, we have successfully mapped three 158 monogenic sheep disorders (Chapters 2, 3 and 4). The prerequisites for being successful in these studies were: the accurate diagnoses of disease statuses of each sheep; and the very clear inherited mode of each disorder. Without those accomplishments, we would not have so quickly located the candidate genes, performed the fine mapping processes and eventually found the putative causative mutations. Moreover, in the canine cryptorchidism study in this dissertation (Chapter 5), dogs without one or two testicles in the scrotal sac were diagnosed as cryptorchidism. It is known that disruptions of different regulatory factors during testicular descent will lead to gonads remain in various positions inside the body, such as the abdominal cavity close to the kidneys, the locations nearby the internal/external inguinal ring or the inguinal canal. Therefore, the undescended testicle staying at different body locations in affected dogs has likely increased the complexity and heterogeneity of this disease and has made gene mapping difficult. Since different hormones regulate testicular descent in different descending phases, it is important to know exact details of cryptorchidism in each dog in order to efficiently find the associated genes for this disorder. Therefore, improving the measurement and categorization of phenotypes becomes a new challenge for many studies involved in a forward genetics approach. Comparative genomics for monogenic disorders and complex diseases Finding the genetic variations for a disease is just a start. Providing evidence to confirm that the genetic variant is a causative mutation needs a lot of work. Studies in comparative genomics, for example, biochemical or genetic studies in other species for the homologous gene will provide detailed functions of the studied genes. Gene knock-out mouse models are likely to show similar phenotypes as in affected human patients with a loss of function of 159 genes. However, many knock-out mice exhibit apparently different or no phenotypes due to different physiological systems between humans and mice. Selecting a suitable animal model is important for studying of human diseases. Similar characteristics in specific disease affected dogs, sheep or other animal species will provide an alternative choice of selecting a good model system for human diseases rather than mouse models. Comparing expression panels of genes in the same pathway between different species will give some hints for key genes affecting phenotypes of interest. For example, testicular descent is a prerequisite for normal spermatogenesis in many mammalian species. Cryptorchidism is an abnormal condition of testicular descent. It is interesting to know that elephants do not perform the process of testicular descent. The comparisons of gene expression, gene coding sequence and translated protein sequence among elephants, cryptorchid dogs and normal dogs to produce a list of genes involved in testicular descent in dogs might show some interesting findings. From candidate gene approach to whole genome scan Scientists have applied the candidate gene approach to search for causative mutations or associated DNA markers on a given trait for several decades. Biological functions of candidate genes should be clear when performing this approach. Reverse genetics by utilizing multiple techniques such as mutagenesis, gene targeting, or targeting induced local lesions in genomes (TILLING) to engineer a change or disruption in the DNA can provide information of how a gene influences phenotypes. If there is not sufficient research evidence to show gene functions, it will be hard to decide where to start in the genome to map the causative gene. Whole genome scans using different forms of genetic markers may solve 160 this problem. A SNP array based whole genome association study has been widely applied for study of disease traits. The higher the density of markers, the higher the level of resolution that can be obtained for mapping associated genes. SNPs are DNA markers with the highest density in the genome. Due to their widely distributions in the genome, one can detect association signals for almost every chromosome. From DNA marker assisted mapping to whole genome sequencing Advanced sequencing technology has pushed the reality of direct sequencing of genomic or coding sequences in a genome wide scale into the fact. Compared to traditional mapping approaches by using linkage and linkage disequilibrium, whole genome sequencing is not as abstract to understand as linkage mapping but it is straight forward because the researcher could observe almost any changes in the DNA sequences. This will help many scientists quickly search for disease loci without devoting too many efforts for preparing suitable populations for a specific marker assisted mapping approach. However, DNA sequencing generates large volumes of genomic sequence data relatively rapidly. There are too many genetic changes across the genome to be identified and that will increase the difficulty in finding the real mutation. Monogenic disorders usually have a clear mode of inheritance and the corresponding phenotypes are mostly caused by mutations that disrupt protein functions. Therefore, whole genome sequencing can be an efficient method to discover genes causing monogenic disorders. To map causative mutations and risk alleles for common complex diseases, a larger sample size is still preferred to find the common genetic variants responsible for common disease. DNA marker assisted mapping and whole genome 161 sequencing can be applied together to efficiently and economically identify genes contributing to diseases. The genetic basis underlying phenotypes is not simple for many cases. The common mutations found are likely changes in DNA sequences. However, more and more evidence shows that epigenetic changes such as changes in the pattern of methylation, or changes of histone modification play important roles in the development of specific phenotypes. Therefore, it is necessary to screen epigenetic changes in the genome in disease studies. The robust and cheaper technologies are under developed in commercial companies. A new era is expected for genetic disease mapping when powerful tools become available by integrating genetic and epigenetic knowledge for discovering causative mutations responsible for genetic disorders in humans and other species. FUTURE RESEARCH Future uses for our findings in the sheep industry in relation to the three sheep disorders are to exclude affected individuals and carriers by using genetic tests based on the mutations we discovered in the sheep populations to avoid at risk mating. Additionally, the disease affected sheep and carriers can be retained as animal models for studying of related human diseases. Gene expression and protein analyses can be done for these causative gene loci in the sheep model to further understand the gene functions involved in the normal bone and cartilage metabolisms as well as motor neuron activities. Future research plans regarding cryptorchidism should include fine mapping of the promising candidate gene AMHR2 by screening its promoter region, exons and splicing regions to discover causative mutations responsible for cryptorchidism. Gene expression analysis 162 should also be done for the AMHR2 gene using gubernaculum tissues from dogs diagnosed as affected and normal to provide solid evidence for the causative relationship between this gene and cryptorchidism. Second, collecting more Siberian Husky samples is desirable to increase the power of the experiment for studying cryptorchidism. Multiple offspring from closely related dog families will help to decipher the genetic basis for cryptorchid dogs with a similar genetic background. Third, whole genome sequencing can be done with a small group of dogs. Whole genome sequencing for pairs of affected and unaffected dogs from the same family may facilitate finding causative mutations without needing to consider LD structure and disease heterogeneity. CONCLUSIONS In conclusion, we have successfully found putative causative mutations for the three sheep monogenic diseases including a nonsense mutation in DMP1 for inherited rickets, a missense mutation in AGTPBP1 for lower motor neuron disease, and 1 base pair deletion in SLC13A1 for chondrodysplasia. This conclusion is based on 100% concordant rate of genotypes with the recessive pattern of inheritance in affected, carrier and phenotypically normal individuals, as well as implication of gene functions in other species. Moreover, by applying Bayesian inference, we identified a candidate region at the 4th Mb window on dog chromosome CFA27 associated with the development of cryptorchidism in a subgroup of our Siberian Husky population. The candidate region harbors a very promising functional candidate gene AMHR2 based on the studies for human diseases. Once significantly associated markers or causative mutations are identified in AMHR2 for cryptorchidism, this information could be quickly applied to design a genetic test for the dog 163 breeding program to cull dogs carrying the deleterious allele and to test other breeds. Moreover, our finding implicates the promising role of the AMHR2 gene during the normal testicular descent. This result will shed light on the future studies of genes in the same pathway for development of human cryptorchidism. 164 ACKNOWLEDGEMENTS The completion of my dissertation and subsequent Ph. D. has been a long journey. The first three research projects (chapters 2, 3, and 4) in my dissertation were completed smoothly even though the breaking though of the first one took me more than half a year. A large amount of time and efforts were put on the last project (chapter 5). Enormous problems rise during this study. I felt deeply sad, lost confidence on this project many times and got writer’s block before starting to write this dissertation. Eventually, I have finished my dissertation, but not alone. I could not have succeeded without the invaluable support of many people. Without these supporters, especially the select few I’m about to mention, I may not have gotten to where I am today. Firstly, special thanks go to Dr. Max F. Rothschild, my major professor, for his gentle encouragement and help in pushing me through the first to the last chapter of my dissertation; for his guidance not only on my study but also on the real life aspect of being a strong and successful person. I would also like to give a heartfelt, special thanks to Dr. Dorian Garrick, my co-major professor. His patience, flexibility, knowledgeable insight and genuine caring enabled me to complete my four research projects. I am very grateful to the remaining members of my study committee, Dr. Sue Lamont, Dr. Matthew Ellinwood and Dr. Peng Liu. Their academic support and input and personal cheering are greatly appreciated. Thanks too to the members of the Rothschild lab group (Dr. Danielle Gorbach, Dr. Zhi_Qiang Du, Dr. Suneel K. Onteru, Dr. Fan Bin, Agatha Ampaire and Muhammed 165 Walugembe) for your kind concern and support. Thanks to Dr. Rothschild’s family, especially Denise for the countless times she hosted the members of the lab group at her home. Next I would like to thank several of my good friends: Dawn Koltes, James Koltes, Xinhua Xiao and Difei Shen. They always show their warmly caring, concern and support during these years, and encourage me with their best wishes. I would want to sincerely thank my parents-in laws for their love and support. Without their help in taking care of my daughter, I would not have been able to complete my study and this dissertation. Thanks also go to my parents and younger sister’s famiy for taking responsibilities of family affairs in China over the past several years. Finally, I must thank my husband Chuanhe for his unfailing soulful love that gave me the constant inspiration to fulfill my study. He was always there cheering me up and stood by me through the good and bad times. Many thanks go to my daughter Jasmine. She is such a sweet girl and her smile is my central motivation to finish my Ph.D. study. There is no doubt that numerous people were involved in making my Ph.D. study possible. To the countless others that have been with me and helped me along the way, I apologize for not specifically mentioning you, but please know that your friendship, help and guidance is just as greatly appreciated. Thank you all.