Development of an interactive pipeline for Genome-wide association analysis
Falola Damilare & Adigun Taiwo – Covenant University Bioinformatics research –
Nigeria ([email protected] & [email protected])
WACREN e-Research Hackfest – Lagos (Nigeria)
This project has received funding from the European Union’s Horizon 2020
research and innovation programme under grant agreement n° 654237
Outline
• Background information
• Scientific Problem, Aim and benefits
• Computational & Data model
• Implementation strategy
• Typical user action workflow
• Summary and Conclusion
Background Information
• The need to tailor healthcare and treatment therapies to individual patients, based on their genetic make-up and other biological features, is becoming more essential in today’s clinical practice.
• Genome-Wide Association Studies (GWAS) have been applied extensively to uncover variations, known as Single Nucleotide Polymorphisms (SNPs), and genes related to different diseases, traits and clinical symptoms.
A genome-wide association study involves:
• collecting a number of unrelated individuals with and without a specific trait or disease;
• using high-throughput genotyping technologies to assay hundreds of thousands of single-nucleotide polymorphisms (SNPs) in those individuals;
• relating the genotyped SNPs to clinical conditions and measurable traits, using appropriate statistical techniques (e.g. chi-square test, logistic regression), to find which SNPs might be associated with the disease.
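The association step above can be illustrated with a Pearson chi-square test on a 2x2 table of allele counts (cases versus controls, alternate versus reference allele). This is a minimal Python sketch, not the project's own scripts: the function name `allelic_chi_square` and the counts in the usage lines are assumptions made for illustration, and a real analysis would normally rely on an established tool.

```python
from math import erfc, sqrt


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Hypothetical helper for illustration; returns (chi2, p_value).
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one allele")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected under the null hypothesis of no association.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts: 60/40 alternate/reference alleles in cases, 40/60 in controls.
chi2, p_value = allelic_chi_square(60, 40, 40, 60)
print(f"chi2={chi2:.2f} p={p_value:.4f}")
```

In a genome-wide setting a test of this kind is repeated for every SNP, so the resulting p-values need a multiple-testing correction before any SNP is reported as associated.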
Figure: Typical GWAS workflow
Scientific Problem, Aim and benefits
A typical GWAS analysis involves:
• the use of numerous complex commands from different languages, which complicates the work for researchers;
• the use of large computing and storage resources to perform state-of-the-art GWAS data analysis, which might not be available to most researchers in Africa or other developing countries.
Aim
• The aim of this project is to develop and implement an e-infrastructure that provides state-of-the-art GWAS analysis to local researchers. It will bundle all the tools the analysis requires.
Benefits
• Users can focus mainly on the research problem, because the analysis process is treated as a black box; this should lead to better and more accurate research results.
• The solution also adds user interactivity: better visualization of results, swift comparison of results from different types of analysis, and management of several projects.
Computational & Data model
Typical user action workflow:
• The main users of the system are public health or medical researchers, scientists, and bioinformaticians who have genotype and phenotype data to upload.
• Data can be uploaded as a raw-intensity file, for analysis starting at the first phase; in PLINK format, for analysis starting at the second phase; or as a list of significant SNPs, for the third phase.
• A typical GWAS analysis involves three main phases: SNP chip genotype calling, association testing, and post-GWAS analysis.
• Phase I includes four stages: initial quality control, genotype calling, post-calling quality control, and conversion to PLINK file format.
• Phase II includes four steps: quality control, population stratification correction, association testing, and result visualization.
• Phase III involves the annotation of the biologically significant markers that were associated with the disease phenotype in Phase II.
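The three phases and their entry points can be sketched as a small lookup that turns the kind of uploaded data into an ordered list of stages to run. This is a hedged Python illustration only: the keys `raw_intensity`, `plink` and `snp_list` and the function `plan_stages` are hypothetical names, and the sketch assumes that an analysis runs from its entry phase through to Phase III.

```python
# Phases and stages as listed in the workflow description.
PHASES = {
    1: ("SNP chip genotype calling", [
        "initial quality control",
        "genotype calling",
        "post-calling quality control",
        "conversion to PLINK format",
    ]),
    2: ("Association testing", [
        "quality control",
        "population stratification correction",
        "association testing",
        "result visualization",
    ]),
    3: ("Post-GWAS analysis", [
        "annotation of significant markers",
    ]),
}

# Hypothetical labels for the three kinds of upload the system accepts.
ENTRY_PHASE = {"raw_intensity": 1, "plink": 2, "snp_list": 3}


def plan_stages(input_kind):
    """Return (phase, stage) pairs to run, in order, for a given upload kind."""
    if input_kind not in ENTRY_PHASE:
        raise ValueError(f"unsupported input kind: {input_kind!r}")
    start = ENTRY_PHASE[input_kind]
    return [
        (phase, stage)
        for phase in sorted(PHASES)
        if phase >= start
        for stage in PHASES[phase][1]
    ]


# A PLINK upload skips Phase I and starts at the Phase II quality control.
for phase, stage in plan_stages("plink"):
    print(phase, stage)
```

Keeping the stage list as data instead of hard-coded calls makes it straightforward to show the user which steps a given upload will trigger.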
Implementation strategy
Back-end
• Each sub-stage of every phase has been implemented as standalone Bash, Perl and R scripts and Java source code.
• The business logic of the system will be implemented using Java technologies, including Servlets and JavaServer Pages.
• The scripts for each phase will be parallelized using Nextflow DSL (Domain Specific Language) processes with their input and output declarations. Complex stages, such as population stratification, will be put into separate Nextflow pipeline scripts.
• Java API for RESTful Web Services (JAX-RS) and JavaScript Object Notation (JSON) will be used to give developers programmatic access to the web application.
• FutureGateway will be used to provide access to distributed computing resources such as grid, cloud and HPC systems.
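To make the JSON-based programmatic access more concrete, the sketch below shows how a client might serialize an analysis request and how a service might validate it. It is written in Python purely for illustration (the planned service itself is Java with JAX-RS), and every field name (`project`, `input_kind`, `input_path`) and both function names are assumptions, not part of any published API.

```python
import json

# Hypothetical fields of an analysis request; the real API may differ.
REQUIRED_FIELDS = {"project", "input_kind", "input_path"}


def encode_job(project, input_kind, input_path):
    """Serialize an analysis request to a JSON string (client side)."""
    job = {
        "project": project,
        "input_kind": input_kind,
        "input_path": input_path,
    }
    return json.dumps(job, sort_keys=True)


def decode_job(body):
    """Parse and validate a JSON request body (service side)."""
    job = json.loads(body)
    if not isinstance(job, dict):
        raise ValueError("request body must be a JSON object")
    missing = REQUIRED_FIELDS - job.keys()
    if missing:
        raise ValueError("missing fields: " + ", ".join(sorted(missing)))
    return job


# Round trip with made-up values.
body = encode_job("demo-project", "plink", "uploads/demo")
print(decode_job(body)["input_kind"])
```

Validating the request before any pipeline is launched lets the service reject malformed submissions without consuming computing resources.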
Front-end
• Dataset upload into a storage element will be done via FTP or the Globus Online APIs for Java.
• gLibrary will be used to manage metadata about the data.
• HTML5 and JavaScript will be used for UI design.
• The interface will be styled using Cascading Style Sheets (CSS), and the system will be made mobile-responsive using CSS3 @media queries.
• The database will be built using the MySQL relational database management system (RDBMS).
Summary and conclusions
• This solution makes GWAS analysis easier to perform by requiring only a limited understanding of its computational needs from researchers. This allows them to focus mainly on the research problem and to give a better biological interpretation of the results.
Thank you!
Special Appreciation to Abayomi
Mosaku, Bruce Becker and Mario Torrisi
sci-gaia.eu
[email protected]