Improving Intergenic miRNA Target Genes Prediction
Rikky Wenang Purbojati
miRNA


MicroRNAs (miRNAs) are a class of RNA believed to play important roles in gene regulation.
They are short (21- to 23-nt) RNAs that bind to the 3′ untranslated regions (3′ UTRs) of target genes.
miRNA Characteristics



Short (22-25 nt)
miRNA plays a major role in the RNA-induced silencing complex (RISC).
miRNAs control the expression of large numbers of genes by:
  mRNA degradation
  Translational repression
Expression of a miRNA reduces the expression of its target genes.
An intergenic miRNA gene is located outside gene bodies.
Basic miRNA problem



Finding the true target genes of a miRNA is not a trivial task.
One approach is to make a computational prediction before validating it in wet-lab experiments.
One basic challenge of miRNA research:
  Given a miRNA sequence, what are its target genes?
miRNA sequence target prediction

Several requirements for matching:
  Strong Watson-Crick base pairing of the 5′ seed (nucleotides 2-8)
  Conservation of the miRNA binding site across species
  Local miRNA-mRNA interaction with a positive balance of minimum free energy
Available tools for target gene prediction: PicTar, TargetScan, miRanda, microT, etc.
The predictions of most tools do not agree with each other, because the tools use different criteria.
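The first requirement, seed pairing, can be illustrated with a minimal Python sketch. It assumes the seed is miRNA nucleotides 2-8 (1-based) and only looks for perfect Watson-Crick matches; the function names and the toy UTR are hypothetical, and real tools also weigh conservation and free energy, which this sketch ignores.

```python
# Sketch: locate perfect seed-match sites for a miRNA in a 3' UTR.
# Assumption: seed = miRNA nucleotides 2-8 (1-based), perfect pairing only.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str) -> str:
    """Return the mRNA sequence (5'->3') that pairs perfectly with the seed."""
    seed = mirna.upper().replace("T", "U")[1:8]  # nucleotides 2-8
    # The two strands pair antiparallel, so take the reverse complement.
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_matches(mirna: str, utr: str) -> list[int]:
    """Return 0-based start positions of perfect seed matches in the UTR."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    return [
        i
        for i in range(len(utr) - len(site) + 1)
        if utr[i:i + len(site)] == site
    ]


if __name__ == "__main__":
    mirna = "UGAGGUAGUAGGUUGUAUAGUU"        # illustrative miRNA sequence
    toy_utr = "AAACUACCUCAGGGCUACCUCUU"     # made-up UTR fragment
    print(seed_site(mirna))
    print(find_seed_matches(mirna, toy_utr))
```

A seed match alone is a weak filter, which is one reason purely sequence-based prediction yields many candidates.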
Problem and Opportunity

Problem:
  Pure computational target gene prediction produces a large number of candidates.
  Most of them are not validated.
  A common assumption is that most of them are false positives.
  Can we shorten the list to include only the strong candidates?
Opportunity:
  Many experimental datasets are publicly available, e.g. cDNA microarray and miRNA microarray data.
  Use these datasets to computationally invalidate some of the predicted target genes.
Assumptions


miRNA works by silencing target genes, so the expression of a miRNA gene and its target genes should be anti-correlated.
Intragenic miRNAs are expressed along with their host gene.
  A host gene should therefore be anti-correlated with a target gene.
An intergenic miRNA does not have a host gene, but its real target genes should be correlated with one another.
  The real target genes should be down-regulated whenever the intergenic miRNA is expressed.
How to invalidate a target gene prediction


A target gene prediction can be invalidated using a set of microarray datasets.
For an intragenic miRNA target gene:
  If the target gene's expression shows no correlation with the host gene's expression, we assume that the target gene is not influenced by the host gene.
For an intergenic miRNA target gene:
  If the target gene behaves inconsistently compared to the other target genes, we assume that it might not be affected by the miRNA gene.
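For the intragenic case, the check could be sketched as a Pearson correlation between host-gene and target-gene expression across microarrays. The function names and the cutoff `max_r` are hypothetical; the slides do not state which correlation measure or threshold was used.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long expression vectors.

    Raises ZeroDivisionError if either vector is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def is_invalidated(host_expr: list[float], target_expr: list[float],
                   max_r: float = -0.5) -> bool:
    """Flag a predicted target that is not clearly anti-correlated with the host.

    max_r is an assumed cutoff: correlations above it count as "no
    anti-correlation", so the prediction is treated as invalidated.
    """
    return pearson(host_expr, target_expr) > max_r


if __name__ == "__main__":
    host = [1.0, 2.0, 3.0, 4.0, 5.0]            # made-up host gene expression
    anti_target = [5.0, 4.0, 3.0, 2.0, 1.0]     # moves opposite to the host
    flat_target = [2.0, 3.0, 2.0, 3.0, 2.0]     # unrelated to the host
    print(is_invalidated(host, anti_target))
    print(is_invalidated(host, flat_target))
```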
Filtering Intergenic miRNA Target Gene Prediction



Use a combination of 8 prediction tools to produce the
initial predictions (union & intersection)
Use a collection of 190 microarray datasets to invalidate
some of the predictions
Use a greedy method to approximate the final subset of
high-confidence target genes
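The first step, combining the outputs of several tools, might look like the sketch below. The tool and gene names are placeholders and the data is made up; `min_tools=1` gives the union, and `min_tools` equal to the number of tools gives the intersection.

```python
from collections import Counter


def tool_support(predictions: dict[str, set[str]]) -> Counter:
    """Count how many tools predict each gene as a target."""
    support: Counter = Counter()
    for genes in predictions.values():
        support.update(genes)
    return support


def candidates(predictions: dict[str, set[str]], min_tools: int) -> set[str]:
    """Return genes predicted by at least min_tools tools."""
    support = tool_support(predictions)
    return {gene for gene, n in support.items() if n >= min_tools}


if __name__ == "__main__":
    # Placeholder predictions, not real tool output.
    predictions = {
        "tool_a": {"G1", "G2", "G3"},
        "tool_b": {"G2", "G3"},
        "tool_c": {"G3", "G4"},
    }
    print(sorted(candidates(predictions, 1)))                 # union
    print(sorted(candidates(predictions, len(predictions))))  # intersection
```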
Consistent Target Genes


We need to establish what "consistent target genes" means.
In this context, target gene A and target gene B are consistent if:
  In every microarray dataset in which gene A is down-regulated, gene B is also down-regulated.
Gene        M1  M2  M3  M4  …  Mk
DHX9        ↑   ↓   ↑   ↓   …  ↑
ASTE1       ↓   ↓   ↓   ↓   …  ↑
C20ORF133   ↓   ↓   ↑   ↓   …  ↓
PARP11      ↑   ↓   ↓   ↓   …  ↓
SLC32A1     ↓   ↓   ↓   ↓   …  ↑
PPAPDC2     ↓   ↓   ↑   ↓   …  ↓
SCHIP1      ↑   ↓   ↓   ↓   …  ↑
MPST        ↑   ↓   ↓   ↓   …  ↓
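The consistency definition can be tried out on the example genes with a short sketch. Each signature string lists the direction in M1, M2, M3, M4 and Mk ("U" for up, "D" for down), transcribed from the example; the string encoding and function names are assumptions made for illustration.

```python
from collections import defaultdict

# Directions in (M1, M2, M3, M4, Mk), transcribed from the example.
SIGNATURES = {
    "DHX9": "UDUDU",
    "ASTE1": "DDDDU",
    "C20ORF133": "DDUDD",
    "PARP11": "UDDDD",
    "SLC32A1": "DDDDU",
    "PPAPDC2": "DDUDD",
    "SCHIP1": "UDDDU",
    "MPST": "UDDDD",
}


def is_consistent(sig_a: str, sig_b: str) -> bool:
    """True if B is down-regulated in every dataset where A is down-regulated."""
    return all(b == "D" for a, b in zip(sig_a, sig_b) if a == "D")


def group_by_signature(signatures: dict[str, str]) -> dict[str, list[str]]:
    """Group genes whose signatures are identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for gene, sig in signatures.items():
        groups[sig].append(gene)
    return dict(groups)


if __name__ == "__main__":
    print(is_consistent(SIGNATURES["DHX9"], SIGNATURES["ASTE1"]))
    print(is_consistent(SIGNATURES["ASTE1"], SIGNATURES["DHX9"]))
    print(group_by_signature(SIGNATURES))
```

As written, the definition is directional: DHX9 is consistent with ASTE1 but not the other way round. Requiring identical signatures, as in `group_by_signature`, is a stricter, symmetric variant.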
Greedy Method

Given a set of target gene predictions and a collection of microarray datasets, we want to find:
  The largest subset of consistent target genes
  The highest number of down-regulated target genes in the subset
Reasoning

Why do we want to find:
The largest subset of consistent target genes?
  Target genes that are consistent across a large number of microarray datasets from different experiments might be affected by a common factor, which may be a microRNA.
  The largest subset gives a high probability of including the true target genes.
The highest number of down-regulated target genes in the subset?
  Since miRNA works by down-regulating target genes, it is desirable to find the largest subset of consistently down-regulated target genes.
Current Algorithm
Subset <- {}
downexpr_cnt <- 0
for i = 0 to K
    A <- G[i]                                  # pivot gene
    SigA <- signature(A)
    Temp_Subset <- {A}
    down <- countDownExpressedMicroarray(A)
    for j = 0 to K
        B <- G[j]
        SigB <- signature(B)
        if SigA == SigB                        # same signature as the pivot
            Temp_Subset <- Temp_Subset U {B}
        end if
    end for
    if (length(Temp_Subset) > length(Subset)) && (down > downexpr_cnt)
        Subset <- Temp_Subset
        downexpr_cnt <- down
    end if
end for
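A runnable Python sketch of the pseudocode above follows. It assumes that a signature is a string of "U"/"D" directions per microarray and that the subset collects gene names rather than signatures; the function name and the encoding are illustrative choices, not the author's implementation.

```python
def greedy_consistent_subset(signatures: dict[str, str]) -> tuple[list[str], int]:
    """Pick the pivot whose signature is shared by the most genes.

    A candidate subset replaces the current best only if it is strictly
    larger AND its pivot is down-regulated in strictly more microarrays,
    mirroring the condition in the pseudocode.
    """
    best_subset: list[str] = []
    best_down = 0
    for pivot_sig in signatures.values():
        down = pivot_sig.count("D")  # microarrays where the pivot is down
        subset = [gene for gene, sig in signatures.items() if sig == pivot_sig]
        if len(subset) > len(best_subset) and down > best_down:
            best_subset, best_down = subset, down
    return best_subset, best_down


if __name__ == "__main__":
    # Directions in (M1, M2, M3, M4, Mk) for the example genes.
    signatures = {
        "DHX9": "UDUDU",
        "ASTE1": "DDDDU",
        "C20ORF133": "DDUDD",
        "PARP11": "UDDDD",
        "SLC32A1": "DDDDU",
        "PPAPDC2": "DDUDD",
        "SCHIP1": "UDDDU",
        "MPST": "UDDDD",
    }
    print(greedy_consistent_subset(signatures))
```

On the example genes, three signatures are each shared by two genes with four down-regulated arrays, so the strict comparisons keep whichever of them is encountered first. The result therefore depends on the order in which pivots are visited.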
Algorithm Limitations

The result may be biased by the expression signature of the first pivot gene:
  The search might get stuck in a local maximum.
  This can be mitigated by prioritizing pivot genes, sorting target genes by down-expression value, or selecting the pivot gene at random.
The subset is an approximation of the high-confidence target genes, but it does not necessarily include all real target genes (because of limitations in the supporting data).
Benchmarking

Compare the performance with other prediction tools, based on:
  Number of correct predictions (based on validated target genes)
  Number of predictions
The algorithm will use initial target predictions supported by:
  2, 3, and 4 prediction tools
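Sensitivity and specificity for such a comparison could be computed as in the sketch below. It assumes a fixed universe of candidate genes and treats every gene that is not experimentally validated as a true negative, which is a strong simplification; the function name and the toy sets are hypothetical.

```python
def sensitivity_specificity(predicted: set[str], validated: set[str],
                            universe: set[str]) -> tuple[float, float]:
    """Return (sensitivity, specificity) of a prediction set.

    Assumption: genes in the universe that are not validated targets
    are counted as negatives.
    """
    negatives = universe - validated
    tp = len(predicted & validated)   # validated targets that were predicted
    fn = len(validated - predicted)   # validated targets that were missed
    fp = len(predicted & negatives)   # predictions that are not validated
    tn = len(negatives - predicted)   # non-targets correctly left out
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    universe = {f"G{i}" for i in range(1, 11)}   # made-up candidate genes
    validated = {"G1", "G2", "G3", "G4"}
    predicted = {"G1", "G2", "G3", "G5", "G6"}
    print(sensitivity_specificity(predicted, validated, universe))
```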
Performance Comparison
Sensitivity-Specificity Comparison
Conclusion


In general, the approximation method shows better sensitivity than the other prediction tools.
Specificity can be improved by including only target genes that are supported by more than 2 prediction tools.
Further Work


Adjusting the scoring function to find the optimum
balance between the length of the subset and the number
of down-regulated target genes
Implementing a threshold on target gene signaturing to
further reduce the specificity
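One possible shape for an adjusted scoring function is a weighted sum of the normalized subset size and the normalized number of down-regulated microarrays. This is purely a hypothetical sketch: the weight, the normalization, and the function name are assumptions, not something the slides specify.

```python
def subset_score(subset_size: int, down_count: int,
                 n_genes: int, n_arrays: int, weight: float = 0.5) -> float:
    """Combine subset size and down-regulation count into one score in [0, 1].

    weight = 1.0 scores by subset size only; weight = 0.0 scores by the
    number of down-regulated microarrays only.
    """
    size_term = subset_size / n_genes
    down_term = down_count / n_arrays
    return weight * size_term + (1 - weight) * down_term


if __name__ == "__main__":
    # Two made-up candidates among 8 genes and 5 microarrays.
    print(subset_score(subset_size=2, down_count=4, n_genes=8, n_arrays=5))
    print(subset_score(subset_size=3, down_count=2, n_genes=8, n_arrays=5))
```

A single score would replace the two strict comparisons in the greedy loop, so a slightly smaller subset with many more down-regulated arrays could still win.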