BIOINFORMATICS APPLICATIONS NOTE
Vol. 21 no. 8 2005, pages 1737–1738
doi:10.1093/bioinformatics/bti209
Genetics and population analysis
In silico fine-mapping: narrowing disease-associated loci by
intergenomics
Pablo Serrano-Fernández1,∗, Saleh M. Ibrahim1, Dirk Koczan1, Uwe K. Zettl2 and Steffen Möller1
1 Institute of Immunology and 2 Institute of Neurology, University of Rostock, Schillingallee 70, D-18057 Rostock, Germany
Received on September 17, 2004; revised on November 19, 2004; accepted on December 3, 2004
Advance Access publication December 10, 2004
ABSTRACT
Summary: Genetic linkage and association studies define quantitative trait loci (QTLs) and susceptibility loci (SLs) that influence the
phenotype of polygenic traits. A web-accessible application was created to identify intergenomic consensuses to fine map QTLs and SLs
in silico and select particularly promising candidate genes for such
traits. Furthermore, this approach offers an empirical evaluation of
animal models for their applicability to the study of human traits.
Availability: http://qtl.pzr.uni-rostock.de/qtlmix.php
Contact: [email protected]
MOTIVATION
Genetic linkage analyses and association studies identify quantitative trait loci (QTLs) and susceptibility loci (SLs). QTLs are
chromosomal regions containing genetic sequences that influence
a continuously distributed phenotypic trait. SLs differ from QTLs
in that the associated trait is dichotomous
(the disease either develops or does not). QTLs and SLs have helped
to identify pathways involved in the pathogenesis of polygenic
diseases (Doerge, 2002). Unfortunately, conventional approaches
for fine-mapping such QTLs/SLs are time- and cost-intensive and
experimentally complex (Rogner and Alvner, 2003; Lemon et al.,
2003). Here, we present a bioinformatic tool to prioritize particular QTL/SL subregions for the wet-lab fine-mapping process and to
preselect candidate genes jointly responsible for the trait.
If animal models are driven by the same genetic mechanisms as
those for human traits, we should expect to find conserved genetic
sequences shared by QTLs of the animal models and susceptibility regions of the human trait. This principle has already been
used to identify new SLs of certain human diseases, based on syntenic correspondence to QTLs of an animal model (Barton et al.,
2001; Xu et al., 2001). Instead, we propose a user-guided multiple
comparison of human traits and animal models of different species
at a genome-wide scale to fine map known QTLs/SLs in silico. Furthermore, the approach provides an estimation of the validity of the
selected animal models for a certain human trait or trait class. It has
already been applied to the in silico fine-mapping of SLs of multiple
sclerosis (MS) and its animal model, experimental autoimmune
encephalomyelitis (EAE) (Serrano-Fernández, 2004).
∗ To whom correspondence should be addressed.
SOFTWARE DESCRIPTION
The software is available as a web interface (http://qtl.pzr.uni-rostock.de/qtlmix.php), but for higher performance, independent
of our local computing power, a local installation is recommended. Hardware and software requirements (platform-independent)
for the latter are described in the help page (http://qtl.pzr.uni-rostock.de/qtlmix_help.htm).
Data sources
A local database was built with data collected from public databases
and recent genetic linkage studies of rodent QTLs for EAE, an animal
model for MS, and QTLs for collagen- and pristane-induced arthritis
(CIA, PIA), models for human rheumatoid arthritis (RA). The
database was complemented with human MS and RA predisposition
loci assembled from recent genetic association studies. References
to the original publications are shown for each locus in the help
page (http://qtl.pzr.uni-rostock.de/qtlmix_help.htm). Genetic linkage units are standardized to physical positions with the help of a
map conversion tool (Voigt et al., 2004). The boundaries of the susceptibility regions are defined by the user whenever the original data
refer only to a single peak marker. Additionally, QTLs available in
EnsEMBL (http://www.ensembl.org) are included, and users
may submit their own data. Multiple data sources may be
selected and freely combined for comparisons between species.
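To illustrate what standardizing genetic linkage units to physical positions involves, the following is a minimal sketch that interpolates piecewise-linearly between anchor markers whose genetic (cM) and physical (bp) positions are both known. This is an assumption made for illustration only: the function name `cm_to_bp`, the anchor format and the interpolation scheme are hypothetical and do not describe the method of the cited map conversion tool.

```python
from bisect import bisect_right
from typing import List, Tuple

Anchor = Tuple[float, int]  # (genetic position in cM, physical position in bp)


def cm_to_bp(cm: float, anchors: List[Anchor]) -> int:
    """Interpolate a physical position for a genetic position.

    Assumes at least two anchors on one chromosome, sorted by strictly
    increasing cM. Hypothetical helper, not the cited tool's algorithm.
    """
    cms = [a[0] for a in anchors]
    if not cms[0] <= cm <= cms[-1]:
        raise ValueError("position lies outside the anchored map")
    # Index of the right-hand anchor of the segment containing cm.
    i = min(bisect_right(cms, cm), len(anchors) - 1)
    (c0, b0), (c1, b1) = anchors[i - 1], anchors[i]
    fraction = (cm - c0) / (c1 - c0)
    return round(b0 + fraction * (b1 - b0))


# Toy map: recombination is denser in the first segment than in the second.
toy_anchors = [(0.0, 0), (10.0, 5_000_000), (20.0, 25_000_000)]
print(cm_to_bp(5.0, toy_anchors), cm_to_bp(15.0, toy_anchors))
```

Because the ratio of bp to cM differs between segments, the overall conversion is non-linear even though each segment is treated linearly.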
In silico fine-mapping
The links between organisms are queried from the EnsEMBL Compara database (Clamp et al., 2003), which comprises information
about syntenic chromosomal regions of different species. Starting
from QTL regions in one source species, any region in the second
source species (if one is used) whose sequence similarity exceeds a selectable
percentage of base pairs is regarded as a syntenic region. That similarity threshold determines the restrictiveness of the fine-mapping.
Overlapping sequences and sequences separated by gaps of up to a
user-selectable maximum size are merged in order to avoid
redundancy and to reduce complexity. We call those merged syntenic sequences ‘consensus regions’ and the genes within them
‘consensus genes’. In a final step, consensus regions between the
source species are checked analogously for syntenic QTL fractions
of the target species. The names and IDs of the consensus genes
displayed as output always refer to the species selected as target.
The number of consensus genes may alternatively be determined
on the basis of the EnsEMBL pairwise gene homology, where gene
products are compared for similarity. In that case the term ‘consensus’ refers to homologous genes instead of syntenic genes. Results
can be displayed as a table, summary or figure and further analysed
with conventional EnsEMBL tools.
© The Author 2004. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]
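The merging and intersection steps of this section can be sketched as plain interval arithmetic. The code below is a minimal illustration under stated assumptions, not the implementation of the web application: all regions are taken to be already mapped onto one chromosome of one coordinate system, and the names `merge_with_gap` and `intersect` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) in bp, inclusive, on one chromosome


def merge_with_gap(intervals: List[Interval], max_gap: int) -> List[Interval]:
    """Merge intervals that overlap or are separated by at most max_gap bp.

    The gap is measured as the next start minus the previous end.
    """
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= max_gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the parts of the intervals in a that are also covered by b."""
    shared = []
    for a_start, a_end in a:
        for b_start, b_end in b:
            lo, hi = max(a_start, b_start), min(a_end, b_end)
            if lo <= hi:
                shared.append((lo, hi))
    return sorted(shared)


# Toy syntenic fragments between two source species, merged with a 50 bp gap,
# then restricted to a (hypothetical) QTL of the target species.
consensus = merge_with_gap([(100, 200), (150, 300), (350, 400), (1000, 1100)], 50)
print(consensus)
print(intersect(consensus, [(300, 1050)]))
```

A larger `max_gap` yields fewer and longer consensus regions, which is the trade-off between reduced complexity and mapping resolution that the user-selectable gap size controls.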
Segregation of consensuses
The segregation pattern of the consensuses often yields insights into
the minimum number of genes in a QTL that presumably influence the trait. When a consensus region in one species is split into distinct
consensuses that lie in non-overlapping QTLs of another species,
this strongly suggests that those consensuses are independent of each
other; each of them most probably includes at least one gene
influencing the phenotype of the trait. Such cases are automatically
detected by the software.
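One way to picture this detection is to group the syntenic fragments of a single consensus region by the QTL of the other species in which they land. The sketch below assumes that the QTLs passed in are mutually non-overlapping and that fragment coordinates are already expressed in the other species' coordinates; the function names and QTL labels are hypothetical.

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end) in bp


def overlaps(a: Region, b: Region) -> bool:
    """True if both regions lie on the same chromosome and share a base pair."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def group_by_qtl(fragments: List[Region],
                 qtls: Dict[str, Region]) -> Dict[str, List[Region]]:
    """Group fragments of one consensus region by the QTL they fall into.

    Assumes the QTLs are mutually non-overlapping. Fragments outside every
    QTL are dropped.
    """
    groups: Dict[str, List[Region]] = {}
    for fragment in fragments:
        for name, qtl in qtls.items():
            if overlaps(fragment, qtl):
                groups.setdefault(name, []).append(fragment)
    return groups


def is_segregated(groups: Dict[str, List[Region]]) -> bool:
    """A consensus region segregates if its fragments hit two or more QTLs."""
    return len(groups) >= 2


fragments = [("chr1", 100, 200), ("chr1", 250, 300),
             ("chr7", 500, 600), ("chr9", 10, 20)]
qtls = {"qtlA": ("chr1", 50, 400), "qtlB": ("chr7", 450, 700)}
groups = group_by_qtl(fragments, qtls)
print(len(groups), is_segregated(groups))
```

Under the non-overlap assumption, the number of groups is a lower bound on the number of trait-influencing genes in the original region, in the sense described above.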
Validation of models
Consensuses may occur by chance with a certain probability. In order
to determine that probability, the software can be fed with randomly
located QTLs of the same size as the original ones. This process is
repeated as a permutation test up to a limit determined by the user.
As the number of iterations increases, the calculated distribution
approaches the true random distribution. That distribution yields the
probability of finding a given number of genes in consensuses by chance. Whenever the observed number of genes in consensuses is
above the 95th percentile (i.e. P < 0.05), it can be concluded that
the correspondence between the animal models and the human trait
is not likely to be the product of chance.
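The permutation test can be sketched on a toy model. The code below assumes a single linear genome, genes given as point positions, and syntenic regions of the other species already mapped to the same coordinates; these simplifications, the function names, and the add-one correction in the empirical P-value (a common convention, not something stated in this note) are all assumptions.

```python
import random
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # (start, end) in bp on a toy one-chromosome genome


def covered(position: int, intervals: Sequence[Interval]) -> bool:
    """True if the position lies inside at least one interval."""
    return any(start <= position <= end for start, end in intervals)


def genes_in_consensus(genes: Sequence[int], qtls: Sequence[Interval],
                       syntenic: Sequence[Interval]) -> int:
    """Count genes lying both in a QTL and in a syntenic region."""
    return sum(1 for g in genes if covered(g, qtls) and covered(g, syntenic))


def permutation_p_value(genes: Sequence[int], qtls: Sequence[Interval],
                        syntenic: Sequence[Interval], genome_length: int,
                        iterations: int, seed: int = 0) -> Tuple[int, float]:
    """Return the observed count and an empirical one-sided P-value."""
    rng = random.Random(seed)
    observed = genes_in_consensus(genes, qtls, syntenic)
    sizes = [end - start for start, end in qtls]
    at_least_as_extreme = 0
    for _ in range(iterations):
        # Place QTLs of the original sizes at random positions.
        shuffled: List[Interval] = []
        for size in sizes:
            start = rng.randint(0, genome_length - size)
            shuffled.append((start, start + size))
        if genes_in_consensus(genes, shuffled, syntenic) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least_as_extreme + 1) / (iterations + 1)


toy_genes = list(range(0, 1000, 10))  # one gene every 10 bp
observed, p = permutation_p_value(toy_genes, [(100, 200)], [(100, 200)], 1000, 200)
print(observed, p)
```

With the add-one correction, the smallest attainable P-value is 1/(iterations + 1), so the user-defined iteration limit also bounds the resolution of the test.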
The validation process is robust towards differential gene density.
The average gene density of the chromosomal regions that serve as
input is calculated in order to normalize the size of each QTL or susceptibility region according to the average gene density of the whole
genome before it is used in the permutation test. This is particularly
critical for the MHC locus, which is systematically analysed in association and genetic linkage studies and has an extraordinarily high gene
density.
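The exact normalization formula is not given here. One plausible reading, shown purely as an assumption, is to rescale each region to the length that would contain the same number of genes at genome-average density; the function name and the toy numbers below are hypothetical.

```python
import math


def normalized_size(genes_in_region: int, genome_length_bp: int,
                    genes_in_genome: int) -> float:
    """Length (bp) that holds genes_in_region genes at genome-average density.

    Equivalent to region_length * (region_density / genome_density), so the
    original region length cancels out. Assumed formula, for illustration only.
    """
    genome_density = genes_in_genome / genome_length_bp  # genes per bp
    return genes_in_region / genome_density


# Toy numbers: a gene-dense 1 Mb region with 100 genes in a 3 Gb genome
# carrying 30000 genes is treated as a 10 Mb region in the permutation test.
size = normalized_size(100, 3_000_000_000, 30_000)
print(size)
```

Under this reading, a gene-dense locus is enlarged before random placement, so that randomly located regions capture a comparable number of genes on average.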
Limitations
QTLs identified with a low density of genetic markers tend to be
very large. Sparsely studied animal models may thus blur the results
of the analysis of the better-known ones. Therefore, for analyses
involving three species, it is always advisable to check the
performance of the consensus search and its permutation test for all
three combinations of two-way comparisons.
CONCLUSIONS
The search for intergenomic consensuses within trait-associated loci
offers a systematic approach to objectively test the degree of similarity between a human disease and its animal model. It is directly
applicable to fine map in silico SLs of every trait for which animal
models are available and genetic linkage or association analyses have
been performed. The potential of this in silico fine-mapping technique will further increase with the inclusion of additional genomes
in the future.
ACKNOWLEDGEMENTS
The developers of EnsEMBL are thanked for their extraordinary
efforts. This work was supported by the BMBF projects FKZ
01GG9831, 01GG9841 and NBL3 (FKZ 01ZZ0108).
REFERENCES
Barton,A., Eyre,S., Myerscough,A., Brintnell,B., Ward,D., Ollier,W.E.R.,
Lorentzen,J.C., Klareskog,L., Silman,A., John,S. et al. The Arthritis Rheumatism
Campaign National Repository and Worthington,J. (2001) High resolution linkage
and association mapping identifies a novel rheumatoid arthritis susceptibility locus
homologous to one linked to two rat models of inflammatory arthritis. Hum. Mol.
Genet., 10, 1901–1906.
Clamp,M., Andrews,D., Barker,D., Bevan,P., Cameron,G., Chen,Y., Clark,L., Cox,T.,
Cuff,J., Curwen,V. et al. (2003) Ensembl 2002: accommodating comparative
genomics. Nucleic Acids Res., 31, 38–42.
Doerge,R.W. (2002) Mapping and analysis of quantitative trait loci in experimental
populations. Nat. Rev. Genet., 3, 43–52.
Lemon,W.J., Swinton,C.H., Wang,M., Berbari,N., Wang,Y. and You,M. (2003) Single
nucleotide polymorphism (SNP) analysis of mouse pulmonary adenoma susceptibility loci 1-4 for identification of candidate genes. J. Med. Genet., 40, e36.
Rogner,U.C. and Alvner,P. (2003) Congenic mice. Cutting tools for complex immune
disorders. Nat. Rev. Immunol., 3, 243–252.
Serrano-Fernández,P. (2004) Intergenomic consensus in multifactorial inheritance loci:
the case of multiple sclerosis. Genes Immun., 5, 615–620.
Voigt,C., Möller,S., Ibrahim,S.M. and Serrano-Fernández,P. (2004) Non-linear conversion between genetic and physical chromosomal distances. Bioinformatics, 20,
1966–1967.
Xu,C., Dai,Y., Lorentzen,J.C., Dahlman,I., Olsson,T. and Hillert,J. (2001) Linkage
analysis in multiple sclerosis of chromosomal regions syntenic to experimental
autoimmune disease loci. Eur. J. Hum. Genet., 9, 458–463.