A systematic, data-driven approach to the
combined analysis of microarray and QTL data
Rennie C1, Hulme H2, Fisher P2, Hall L3, Agaba M4, Noyes HA1, Kemp SJ1,4, Brass A2,5
1 School of Biological Sciences, BioSciences Building, University of Liverpool, Crown Street, Liverpool, L69 7ZB, UK
Abstract
High throughput technologies inevitably produce vast quantities of data. This presents challenges in terms of developing effective analysis methods, particularly where the analysis involves
combining data derived from different experimental technologies.
In this investigation, we applied a systematic approach to combine microarray gene expression data, QTL data and pathway analysis resources in order to identify functional candidate genes
underlying tolerance of Trypanosoma congolense infection in cattle (see Agaba et al poster at this conference). We automated much of the analysis using Taverna workflows previously
developed for the study of trypanotolerance in the mouse model.
We identified pathways represented by genes within the QTL regions, and subsequently ranked this list according to which pathways were over-represented in the set of genes that were
differentially expressed (over time or between tolerant N’dama and susceptible Boran breeds) at various timepoints after T. congolense infection. The genes within the QTL that played a role
in the highest-ranked pathways were flagged as strong candidates for experimental confirmation.
Background
African bovine trypanosomiasis is one of the most important diseases affecting African livestock production. West African
taurine cattle, such as the N'dama, are more resistant to the pathological consequences of trypanosomiasis (trypanotolerant)
than East African zebu cattle, such as the Boran.
A microarray timecourse experiment was carried out to investigate gene expression in N'dama and Boran cattle infected with
Trypanosoma congolense, in order to identify genes underlying trypanotolerance (see Agaba et al poster at this conference).
Trypanotolerance
Trypanotolerance is a complex phenotype involving several distinct components, likely to involve separate genetic control
mechanisms. Key features include the ability to control anaemia, control parasitaemia and maintain bodyweight. Data on
trypanotolerance QTL suggest that the phenotypic traits involved in trypanotolerance may be influenced by multiple genetic loci
and possibly by complex epistatic or environmental effects (Proc Natl Acad Sci USA 2003;100(13);7443-7448).
Microarray data
Microarray data for liver samples extracted from Boran and N'dama cattle at 0, 12, 15, 18, 21, 26, 29, 32 and 35 days post-infection were analysed. Outliers were identified using dChip and removed before the remaining hybridisations were
normalised using the Robust Multi-array Average (RMA) method. Principal Components Analysis (PCA) was used to check that the
hybridisations clustered as expected.
T-tests were used to identify genes that were differentially expressed (p<=0.01) between the two breeds at each timepoint
and paired T-tests (using data for the same individual animals at different timepoints) were used to identify genes that were
differentially expressed (p<=0.01) within breed at any timepoint compared to day 0.
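As an illustration of the within-breed comparison, the paired t statistic for one probeset can be sketched as follows. This is a minimal sketch, not the code used in the study: the function name and the expression values are hypothetical, and only the statistic and degrees of freedom are computed. Converting the statistic to a p-value requires the t distribution, for example via `scipy.stats.ttest_rel`.

```python
import math
import statistics


def paired_t_statistic(baseline, later):
    """Return (t, degrees_of_freedom) for a paired t-test on matched samples."""
    if len(baseline) != len(later) or len(baseline) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # Differences are taken within each animal (later timepoint minus day 0).
    diffs = [b - a for a, b in zip(baseline, later)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical log2 expression values for one probeset in four animals.
day0 = [7.1, 6.9, 7.4, 7.0]
day21 = [8.0, 7.5, 8.6, 7.7]
t_stat, dof = paired_t_statistic(day0, day21)
print(f"t = {t_stat:.3f}, df = {dof}")
```

The between-breed comparison at a single timepoint would use an unpaired test instead, because the Boran and N'dama samples come from different animals.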
QTL location    Phenotype
BTA2            Anaemia
BTA4            Parasitaemia
BTA7            Anaemia and parasitaemia
BTA16           Anaemia
BTA27           Anaemia
QTL data
Sixteen trypanotolerance QTL had been identified in a previous mapping
study (Proc Natl Acad Sci USA 2003;100(13);7443-7448). Five of
these QTL were selected based on the phenotypic trait involved, the
mapping resolution and the strength of the effect (see the table
above for a summary of the QTL and associated phenotypes).
The base-pair positions of these QTL relative to the EnsEMBL
bovine genome preliminary build Btau2.0 were determined
manually.
2 School of Computer Science, Kilburn Building, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
3 Roslin Institute, Roslin, Midlothian, EH25 9PS, Scotland, UK
4 ILRI, PO Box 30709, Nairobi, 00100, Kenya
5 Faculty of Life Sciences, University of Manchester, Smith Building, Oxford Road, Manchester, M13 9PT, UK
Combined analysis approach
The gene underlying a QTL is not assumed to be differentially expressed. However, it is expected to connect biologically with
differentially expressed genes. The rationale behind this approach is to establish the possible connections.
The analysis procedure is described in Figure 1 (right). In brief, it involves mapping QTL genes and Affymetrix microarray
probes to genes in the EnsEMBL bovine preliminary build Btau2.0 then identifying KEGG pathways that include the
EnsEMBL genes. The two resulting pathway lists are compared to generate a list of KEGG pathways that include at least one
differentially expressed gene and at least one gene in the QTL. The pathway list is then ranked according to the results of a
Fisher exact test performed on the microarray data using DAVID, and annotated using literature searches and various public
databases of gene and pathway information.
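The pathway comparison and ranking step described above can be sketched in a few lines. This is a hedged illustration only: the gene and pathway names are invented, the function names are hypothetical, and the p-value is a plain one-sided Fisher exact test (the hypergeometric upper tail), not DAVID's own implementation.

```python
from math import comb


def fisher_enrichment_p(hits_in_pathway, pathway_size, hits_total, background_size):
    """One-sided Fisher exact p-value for over-representation:
    P(X >= hits_in_pathway) under the hypergeometric distribution."""
    upper = min(pathway_size, hits_total)
    denom = comb(background_size, hits_total)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, hits_total - k)
        for k in range(hits_in_pathway, upper + 1)
    )
    return tail / denom


def rank_shared_pathways(pathway_genes, qtl_genes, de_genes, background_genes):
    """Keep pathways with at least one QTL gene and at least one differentially
    expressed (DE) gene, then sort by DE enrichment p-value (smallest first)."""
    de_in_background = de_genes & background_genes
    ranked = []
    for name, genes in pathway_genes.items():
        members = genes & background_genes
        de_hits = members & de_genes
        qtl_hits = members & qtl_genes
        if not de_hits or not qtl_hits:
            continue
        p = fisher_enrichment_p(
            len(de_hits), len(members), len(de_in_background), len(background_genes)
        )
        ranked.append((name, p, sorted(qtl_hits)))
    return sorted(ranked, key=lambda row: row[1])


# Hypothetical toy data: 10 background genes, 4 DE genes, 2 genes in a QTL.
background = {f"g{i}" for i in range(1, 11)}
de = {"g1", "g2", "g3", "g4"}
qtl = {"g3", "g9"}
pathways = {
    "pathway_A": {"g1", "g2", "g3"},        # DE-enriched, contains QTL gene g3
    "pathway_B": {"g4", "g8", "g9", "g10"},  # one DE gene, contains QTL gene g9
    "pathway_C": {"g5", "g6", "g9"},        # QTL gene but no DE gene: dropped
    "pathway_D": {"g1", "g2"},              # DE genes but no QTL gene: dropped
}
ranked = rank_shared_pathways(pathways, qtl, de, background)
for name, p, candidates in ranked:
    print(name, round(p, 4), candidates)
```

The QTL genes listed for the top-ranked pathways are the candidates; note that a QTL gene does not itself need to be differentially expressed to be retained, which matches the rationale stated above.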
Large sections of the analysis were automated (shown in blue in Figure 1) by adapting Taverna workflows previously
developed for the study of trypanosomiasis responses in mice (Nucl Acids Res 2007;35(16);5625-5633). The adaptations
required involved mapping genes to human homologues and using bovine IDs and human IDs in the analysis, rather than
murine IDs.
Results
The analysis procedure itself could be reused or adapted for studying another species or another phenotypic trait for which
QTL data are available.
In the case of the bovine trypanotolerance study, the result can be quantified in terms of the reduction of an enormous set of
potential targets for investigation to a manageable shortlist of the most likely targets. Out of 24128 probesets on the array,
12591 were significantly differentially expressed (p <= 0.01 in one or more T-tests comparing expression between breeds or
over time). 8342 of these probesets could be mapped to a known gene. In total they represented 7071 unique gene symbols.
In contrast, there were 127 genes in the QTL that were involved in pathways identified by the combined analysis protocol. If
we only include pathways with a significant (p<=0.05) score on the DAVID Fisher exact test, the list of targets is reduced to
only 51 genes (shown in the table below). Note that these results are based on an analysis with EnsEMBL bovine genome
preliminary build Btau2.0. A more recent preliminary build is available; the analysis will be repeated and key findings
discussed in a future publication.
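The scale of the reduction can be worked through directly from the counts quoted above. The variable names are illustrative, and the final ratio simply compares the sizes of the two lists; the 51 QTL genes are not assumed to be a subset of the differentially expressed genes.

```python
# Counts quoted in the Results text.
probesets_on_array = 24128
probesets_de = 12591            # p <= 0.01 in one or more t-tests
probesets_mapped = 8342         # DE probesets mapped to a known gene
unique_gene_symbols = 7071      # unique gene symbols among mapped probesets
qtl_pathway_genes = 127         # QTL genes in pathways from the combined analysis
qtl_significant_genes = 51      # QTL genes in pathways with p <= 0.05

fraction_de = probesets_de / probesets_on_array
fraction_kept = qtl_significant_genes / qtl_pathway_genes
list_size_ratio = unique_gene_symbols / qtl_significant_genes

print(f"DE probesets: {fraction_de:.1%} of the array")
print(f"QTL genes kept after the significance filter: {fraction_kept:.1%}")
print(f"DE gene list is about {list_size_ratio:.0f}x longer than the shortlist")
```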
Figure 1. Summary of the combined analysis procedure. Stages of the analysis that were automated using Taverna workflows are in blue.
Discussion
Automated approaches are becoming increasingly necessary to enable researchers to handle the output from modern high-throughput technologies. Data-driven methods are useful in
studying complex phenotypes where an analysis based solely on biological processes already known to be involved may be insufficient. Pathway-based approaches provide a means to link
microarray data to QTL data in a biologically meaningful way.
Pathway-based, data-driven, systematic, semi-automated analysis approaches provide an excellent means to triage data from high-throughput technologies, providing a shortlist of viable
targets for thorough manual investigation and experimental confirmation.
Acknowledgements: This work was wholly supported by The Wellcome Trust. The authors would also like to thank Dr Park based in Dr McHugh’s group at University College Dublin for
sharing bovine gene symbol information for Affymetrix probes.