Improving Intergenic miRNA Target Genes Prediction
Rikky Wenang Purbojati
miRNA
MicroRNA (miRNA) is a class of RNA which is believed
to play important roles in gene regulation.
miRNAs are short (21- to 23-nt) RNAs that bind to the 3′
untranslated regions (3′ UTRs) of target genes.
miRNA Characteristics
Short (22-25 nt)
miRNA plays a major role in the RNA-induced silencing
complex (RISC).
miRNAs control the expression of large numbers of
genes by:
mRNA degradation
Translational repression
Expression of miRNA will reduce the expression of its
target genes
An intergenic miRNA gene is located outside gene bodies
Basic miRNA problem
Finding a miRNA's true target genes is not a trivial task
One approach is to make a computational prediction
before validating it in wet-lab experiments
One basic challenge of miRNA research:
Given a miRNA sequence, what are its target genes?
miRNA sequence target prediction
Several requirements for matching:
Strong Watson-Crick base pairing of the 5′ seed (nucleotides 2-8)
Conservation of the miRNA binding site across species
Local miRNA-mRNA interaction with positive balance of
minimum free energy
Available tools for target gene prediction: PicTar,
TargetScan, miRanda, microT, etc.
The predictions of most tools do not agree with each other,
because the tools use different criteria
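The seed-pairing requirement above can be illustrated with a minimal sketch. It only checks perfect Watson-Crick pairing of nucleotides 2-8 and ignores conservation and free energy; the function names `seed_site` and `find_seed_matches` are hypothetical and do not come from any of the tools listed.

```python
# Minimal sketch of the seed-pairing criterion only (no conservation, no free energy).
# Function names are illustrative assumptions, not part of PicTar, TargetScan, etc.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def seed_site(mirna: str, start: int = 2, end: int = 8) -> str:
    """Return the mRNA motif that pairs perfectly with the miRNA seed.

    start/end are 1-based nucleotide positions counted from the miRNA 5' end.
    """
    seed = mirna[start - 1:end]
    # The target site is the reverse complement of the seed.
    return "".join(COMPLEMENT[base] for base in reversed(seed))

def find_seed_matches(mirna: str, utr: str) -> list[int]:
    """Return 0-based positions in the 3' UTR where the seed site occurs."""
    site = seed_site(mirna)
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]
```

A call such as `find_seed_matches(mirna, utr)` returns every candidate site, which is one reason a purely sequence-based search yields long candidate lists.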
Problem and Opportunity
Problem:
Purely computational target gene prediction produces a
lot of candidates
Most of them are not validated
A common assumption is that most of them are false positives
Can we shorten the list to include only the strong candidates?
Opportunity:
Many experimental datasets are publicly available, e.g. cDNA
microarray, miRNA microarray, etc.
Use these datasets to computationally invalidate some of the target
genes
Assumptions
miRNA works by silencing target genes, so the expression of a miRNA
gene and of its target genes should be anti-correlated
Intragenic miRNAs are expressed along with their host
gene.
A host gene should therefore be anti-correlated with a target gene
An intergenic miRNA does not have a host gene, but its real
target genes should be correlated with one another
The real target genes should be down-regulated whenever the
intergenic miRNA is expressed.
How to invalidate a target gene prediction
A target gene prediction can be invalidated by using a set
of microarray datasets
For an intragenic miRNA target gene:
If a target gene's expression has no correlation with the host
gene's expression, we assume that the target gene is not
influenced by the host gene
For an intergenic miRNA target gene:
If a target gene behaves inconsistently compared to the other
target genes, we assume that it might not be affected by the
miRNA gene
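For the intragenic case, the check could look like the sketch below. It uses the Pearson correlation and a cut-off of -0.5; both the choice of statistic and the threshold are assumptions made for illustration, since the slides do not state how "no correlation" is decided.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Raises ZeroDivisionError if either profile is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def keep_target(host_expr, target_expr, threshold=-0.5):
    """Keep a predicted target only if it is clearly anti-correlated with the host gene.

    threshold is a hypothetical cut-off, not a value from the source.
    """
    return pearson(host_expr, target_expr) <= threshold
```

With a host profile `[1, 2, 3, 4, 5]`, a target profile `[5, 4, 3, 2, 1]` is kept (correlation -1), while `[2, 1, 3, 1, 2]` is dropped (correlation 0).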
Filtering Intergenic miRNA Target Gene Prediction
Use a combination of 8 prediction tools to produce the
initial predictions (union & intersection)
Use a collection of 190 microarray datasets to invalidate
some of the predictions
Use a greedy method to approximate the final subset of
high-confidence target genes
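The first step (combining tool outputs by union or intersection) can be sketched as a support count per gene. The tool names and gene assignments in the usage note are made up for illustration and are not real predictions.

```python
from collections import Counter

def support_counts(predictions):
    """Count, for each gene, how many tools predict it.

    predictions: dict mapping a tool name to the set of genes it predicts.
    """
    counts = Counter()
    for targets in predictions.values():
        counts.update(targets)
    return counts

def candidates(predictions, min_support=1):
    """Genes predicted by at least min_support tools.

    min_support=1 gives the union; min_support=len(predictions) gives the intersection.
    """
    counts = support_counts(predictions)
    return {gene for gene, n in counts.items() if n >= min_support}
```

For example, with `{"toolA": {"DHX9", "ASTE1"}, "toolB": {"ASTE1"}}`, `candidates(..., 1)` returns both genes and `candidates(..., 2)` returns only `ASTE1`.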
Consistent Target Genes
We need to establish the meaning of consistent target
genes
In this context, target gene A and target gene B are
consistent if:
In every microarray dataset in which gene A is down-regulated,
gene B is also down-regulated
Gene        M1   M2   M3   M4   …   Mk
DHX9        ↑    ↓    ↑    ↓    …   ↑
ASTE1       ↓    ↓    ↓    ↓    …   ↑
C20ORF133   ↓    ↓    ↑    ↓    …   ↓
PARP11      ↑    ↓    ↓    ↓    …   ↓
SLC32A1     ↓    ↓    ↓    ↓    …   ↑
PPAPDC2     ↓    ↓    ↑    ↓    …   ↓
SCHIP1      ↑    ↓    ↓    ↓    …   ↑
MPST        ↑    ↓    ↓    ↓    …   ↓
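The consistency definition can be checked directly on the up/down calls listed above. The sketch below encodes the five visible columns (M1, M2, M3, M4, Mk) and leaves out the elided ones. Requiring the condition in both directions is an assumption made here; with binary calls it amounts to the two genes having identical signatures.

```python
# Up/down calls for M1, M2, M3, M4, Mk from the example above.
# -1 = down-regulated, +1 = up-regulated; elided datasets are not represented.
TABLE = {
    "DHX9":      (+1, -1, +1, -1, +1),
    "ASTE1":     (-1, -1, -1, -1, +1),
    "C20ORF133": (-1, -1, +1, -1, -1),
    "PARP11":    (+1, -1, -1, -1, -1),
    "SLC32A1":   (-1, -1, -1, -1, +1),
    "PPAPDC2":   (-1, -1, +1, -1, -1),
    "SCHIP1":    (+1, -1, -1, -1, +1),
    "MPST":      (+1, -1, -1, -1, -1),
}

def implies_down(a, b):
    """True if b is down-regulated in every dataset where a is down-regulated."""
    return all(y == -1 for x, y in zip(a, b) if x == -1)

def consistent(a, b):
    """Symmetric reading of the definition: the condition holds in both directions."""
    return implies_down(a, b) and implies_down(b, a)
```

On these columns, ASTE1 and SLC32A1 are consistent, as are C20ORF133 and PPAPDC2, and PARP11 and MPST. DHX9 and ASTE1 satisfy the condition in one direction only.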
Greedy Method
Given a set of target gene predictions and a collection of
microarray datasets,
we want to find:
The largest subset of consistent target genes
The highest number of down-regulated target genes in the subset
Reasoning
Why do we want to find:
The largest subset of consistent target genes?
Target genes that are consistent across a large number of microarray
datasets from different experiments might be affected by a
common factor, which may be a microRNA
The largest subset gives a high probability of including the true
target genes
The highest number of down-regulated target genes in the
subset?
Since miRNA works by down-regulating target genes, it is desirable to
find the largest subset of consistently down-regulated target genes
Current Algorithm
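Subset <- {}                          // best subset found so far
downexpr_cnt <- 0                     // down-regulation count of its pivot
for i = 0 to K
    A <- G[i]                         // pivot gene
    SigA <- signature(A)
    Temp_Subset <- {A}
    down <- countDownExpressedMicroarray(A)
    for j = 0 to K
        B <- G[j]
        SigB <- signature(B)
        if SigA == SigB               // B has the same signature as the pivot
            Temp_Subset <- Temp_Subset U {B}
        end if
    end for
    if (length(Temp_Subset) > length(Subset)) && (down > downexpr_cnt)
        Subset <- Temp_Subset
        downexpr_cnt <- down
    end if
end for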
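A runnable sketch of the loop above follows. It assumes that `signature` is simply the tuple of per-microarray up/down calls and that the down count is the number of microarrays in which the pivot is down-regulated; the slides do not define either function, so both are assumptions.

```python
def signature(calls):
    """Assumed signature: the tuple of per-microarray calls (-1 = down, +1 = up)."""
    return tuple(calls)

def count_down(calls):
    """Number of microarrays in which the gene is down-regulated."""
    return sum(1 for c in calls if c == -1)

def greedy_subset(genes):
    """genes: dict of gene name -> per-microarray calls.

    Returns (subset, down count of the pivot that produced it).
    """
    best, best_down = [], 0
    for pivot_calls in genes.values():
        sig = signature(pivot_calls)
        # Collect every gene whose signature equals the pivot's (pivot included).
        temp = [g for g, calls in genes.items() if signature(calls) == sig]
        down = count_down(pivot_calls)
        # Same acceptance rule as the pseudocode: both criteria must improve.
        if len(temp) > len(best) and down > best_down:
            best, best_down = temp, down
    return best, best_down

# Calls for M1, M2, M3, M4, Mk from the consistency example.
example = {
    "DHX9":      (+1, -1, +1, -1, +1),
    "ASTE1":     (-1, -1, -1, -1, +1),
    "C20ORF133": (-1, -1, +1, -1, -1),
    "PARP11":    (+1, -1, -1, -1, -1),
    "SLC32A1":   (-1, -1, -1, -1, +1),
    "PPAPDC2":   (-1, -1, +1, -1, -1),
    "SCHIP1":    (+1, -1, -1, -1, +1),
    "MPST":      (+1, -1, -1, -1, -1),
}
print(greedy_subset(example))  # (['ASTE1', 'SLC32A1'], 4)
```

Three pairs tie at size 2 with 4 down-regulated microarrays each; the first one encountered is kept because later candidates must be strictly better on both criteria.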
Algorithm Limitations
The result of the algorithm might be biased by the expression
signature of the first pivot gene:
It might get stuck on a local maximum
This can be addressed by prioritizing or sorting target genes by their down-expression value, or by selecting the pivot gene at random
The subset is an approximation of the high-confidence target
genes, but it doesn't necessarily include all real target
genes (because of limitations in the supporting data)
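The pivot-order bias can be reproduced on a toy input. The sketch below reuses the acceptance rule of the algorithm (both subset size and down count must strictly improve) and shows that two pivot orders give different answers. The genes X, Y, Z and the helper `pivots_by_down_count` are hypothetical, and sorting in ascending order is only one possible reading of the "sorting" remedy.

```python
def greedy_subset(genes, order):
    """Greedy search visiting pivot genes in the given order.

    genes: dict of gene name -> tuple of calls (-1 = down, +1 = up).
    """
    best, best_down = [], 0
    for pivot in order:
        sig = genes[pivot]
        temp = sorted(g for g, calls in genes.items() if calls == sig)
        down = sum(1 for c in sig if c == -1)
        if len(temp) > len(best) and down > best_down:
            best, best_down = temp, down
    return best

def pivots_by_down_count(genes, reverse=False):
    """Hypothetical remedy: visit pivots sorted by their down-regulation count."""
    return sorted(genes, key=lambda g: sum(1 for c in genes[g] if c == -1),
                  reverse=reverse)

# Toy data: X is down everywhere but alone; Y and Z share a signature.
TOY = {
    "X": (-1, -1, -1),
    "Y": (-1, -1, +1),
    "Z": (-1, -1, +1),
}

print(greedy_subset(TOY, ["X", "Y", "Z"]))  # ['X']
print(greedy_subset(TOY, ["Y", "Z", "X"]))  # ['Y', 'Z']
```

Starting from X locks in a down count of 3, so the larger pair {Y, Z} with a down count of 2 can never replace it. Starting from Y finds the pair first.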
Benchmarking
Compare the performance with other prediction tools,
based on:
Number of correct predictions (based on validated target
genes)
Number of predictions
The algorithm will use initial target predictions supported by:
2, 3, and 4 prediction tools
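The comparison criteria can be turned into sensitivity and specificity as sketched below. The slides do not say how negatives were defined, so the sketch assumes a set of validated targets (positives) and a separate set of known non-targets (negatives); the gene labels in the usage note are placeholders.

```python
def sensitivity_specificity(predicted, positives, negatives):
    """Sensitivity and specificity of a prediction set.

    predicted: genes reported by a method.
    positives: validated target genes.
    negatives: genes assumed not to be targets (an assumption of this sketch).
    """
    tp = len(predicted & positives)   # validated targets that were predicted
    fn = len(positives - predicted)   # validated targets that were missed
    fp = len(predicted & negatives)   # non-targets that were predicted
    tn = len(negatives - predicted)   # non-targets correctly left out
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity
```

For instance, predicting `{"g1", "g2", "g3"}` against positives `{"g1", "g2", "g4"}` and negatives `{"g3", "g5", "g6", "g7"}` gives a sensitivity of 2/3 and a specificity of 0.75.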
Performance Comparison
Sensitivity-Specificity Comparison
Conclusion
In general, the approximation method shows better
sensitivity compared to other prediction tools
Specificity can be improved by including only target genes
that are supported by more than 2 prediction tools
Further Work
Adjusting the scoring function to find the optimum
balance between the length of the subset and the number
of down-regulated target genes
Implementing a threshold on target gene signature matching to
further improve the specificity