IDENTIFICATION OF THE POWER-LAW COMPONENT IN HUMAN TRANSCRIPTOME
Vasily V. Grinev
Associate Professor, Department of Genetics, Faculty of Biology, Belarusian State University, Minsk, Republic of Belarus

DIVERSITY OF SPLICE SITES IN THE HUMAN GENOME/TRANSCRIPTOME
Graphical representation of the organisation of the human RCAN3 gene: the traditional (linear) transcriptional model (A) and the splice-site (B) and exon (C) splicing-graph models.

DISCRETE POWER-LAW MODEL
Important equations
The probability mass function: $p(x) = C x^{-\alpha}$, for $x \ge x_{\min}$
Normalization constant: $C = 1 \big/ \sum_{x = x_{\min}}^{\infty} x^{-\alpha} = 1 / \zeta(\alpha, x_{\min})$
Hurwitz zeta function: $\zeta(\alpha, x_{\min}) = \sum_{n=0}^{\infty} (n + x_{\min})^{-\alpha}$
The cumulative distribution function: $\Pr(X < x) = 1 - \zeta(\alpha, x) / \zeta(\alpha, x_{\min})$
The complementary cumulative distribution function: $P(x) = \Pr(X \ge x) = \zeta(\alpha, x) / \zeta(\alpha, x_{\min})$
Determination of parameters
Scaling parameter $\alpha$ for $x_{\min} < 6$: direct numerical maximization of the log-likelihood function $\ln L(\alpha) = -n \ln \zeta(\alpha, x_{\min}) - \alpha \sum_{i=1}^{n} \ln x_i$
Scaling parameter $\alpha$ for $x_{\min} \ge 6$: approximate maximum likelihood estimator $\hat{\alpha} \simeq 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_i}{x_{\min} - 1/2} \right]^{-1}$
Lower bound $x_{\min}$: estimated with the Kolmogorov-Smirnov statistic (the value of $x_{\min}$ that minimizes it is selected)
Here $x_1, \dots, x_n$ are the $n$ observed values with $x_i \ge x_{\min}$.
Clauset,A., Shalizi,C.R., Newman,M.E.J. (2009) Power-law distributions in empirical data. SIAM Rev., 51, 661-703.
Newman,M.E.J. (2005) Power laws, Pareto distributions and Zipf's law. Contemp. Phys., 46, 323-351.
Goldstein,M.L., Morris,S.A., Yen,G.G. (2004) Problems with fitting to the power-law distribution. Eur. Phys. J. B, 41, 255-258.

COMPETITIVE STATISTICAL MODELS
The probability mass functions of competitive statistical models:
1) Power-law: $p(x) = C x^{-\alpha}$
2) Truncated power-law: $p(x) = C x^{-\alpha} e^{-\lambda x}$
3) Yule-Simon: $p(x) = C \, \Gamma(x) / \Gamma(x + \alpha)$
4) Exponential: $p(x) = C e^{-\lambda x}$
5) Stretched exponential: $p(x) = C x^{\beta - 1} e^{-\lambda x^{\beta}}$
6) Log-normal: $p(x) = \frac{C}{x} \exp\left[ -\frac{(\ln x - \mu)^2}{2 \sigma^2} \right]$
7) Poisson: $p(x) = C \mu^{x} / x!$
Comparison of alternative statistical models:
1) Log-likelihood ratio test: $\mathcal{R} = \sum_{i=1}^{n} \left[ \ln p_1(x_i) - \ln p_2(x_i) \right]$, where $p_1$ and $p_2$ are the probability mass functions of the two models being compared
Vuong,Q.H. (1989) Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57, 307-333.
2) Akaike information criterion: $\mathrm{AIC} = 2k - 2 \ln L$
Akaike,H. (1974) A new look at the statistical model identification. IEEE Trans. Automat. Control, 19, 716-723.
3) Bayesian information criterion: $\mathrm{BIC} = -2 \ln L + k \ln(n)$
Schwarz,G.E. (1978) Estimating the dimension of a model. Ann. Stat., 6, 461-464.
Here $L$ is the maximized likelihood of a model, $k$ is its number of parameters and $n$ is the number of observations.

STATISTICAL ANALYSIS CONFIRMS THE PRESENCE OF A POWER-LAW COMPONENT IN THE TRANSCRIPTOME OF KASUMI-1 CELLS

USAGE OF EXONS IN ALTERNATIVE SPLICING FOLLOWS A POWER LAW IN THE HUMAN TRANSCRIPTOME
Maximum values of splicing degrees from different models of human genes

ARE THERE ANY SPECIFIC FEATURES ASSOCIATED WITH DIFFERENT CLASSES OF SPLICE SITES?
Every splice site was annotated with sequence, sequence-related, functional and structural features extracted from four types of genomic/RNA elements.

RANDOM FOREST BASED DATA MINING
A small set of features is sufficient to distinguish between two classes of splice sites in Kasumi-1 cells.

RANDOM FOREST BASED DATA MINING
Iterative removal of misclassified splice sites leads to high classification accuracy.

RANDOM FOREST BASED DATA MINING
About half of the misclassified splice sites can be explained in several different ways.

MANY THANKS TO THE MEMBERS OF OUR TEAM:
Ilia M. Ilyushonak
Dr. Petr V. Nazarov
Dr. Laurent Vallar
Prof. Olaf Heidenreich (Northern Institute for Cancer Research)

THANK YOU FOR YOUR ATTENTION!
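The fitting procedure described on the DISCRETE POWER-LAW MODEL slide can be illustrated with a minimal pure-Python sketch. It is not the author's code and rests on several assumptions: the observations are positive integers; the Hurwitz zeta function is approximated by direct summation plus an Euler-Maclaurin tail (in practice `scipy.special.zeta(s, q)` is the usual library choice); the log-likelihood is maximized by golden-section search over an assumed bracket $1.01 \le \alpha \le 6$; and all function names and the `min_tail` cut-off are hypothetical. As on the slide, $\alpha$ comes from numerical maximization of $\ln L(\alpha)$ for $x_{\min} < 6$ and from the approximate estimator for $x_{\min} \ge 6$, and $x_{\min}$ is the candidate with the smallest Kolmogorov-Smirnov distance $D = \max |S(x) - F(x)|$ between the empirical CDF $S$ and the fitted CDF $F$ of the tail.

```python
import math
from collections import Counter


def hurwitz_zeta(s, q, n_terms=50):
    """Hurwitz zeta function zeta(s, q) = sum over n >= 0 of (n + q)**(-s), for s > 1, q > 0.

    The first n_terms terms are summed directly; the remaining tail is
    approximated with the Euler-Maclaurin formula (two Bernoulli corrections).
    """
    if s <= 1:
        raise ValueError("the series converges only for s > 1")
    head = sum((k + q) ** (-s) for k in range(n_terms))
    a = q + n_terms
    tail = (a ** (1 - s) / (s - 1)                         # integral of x**(-s) from a to infinity
            + 0.5 * a ** (-s)                               # half of the first tail term
            + s * a ** (-s - 1) / 12                        # B2 correction
            - s * (s + 1) * (s + 2) * a ** (-s - 3) / 720)  # B4 correction
    return head + tail


def log_likelihood(alpha, xmin, n, sum_log_x):
    """ln L(alpha) = -n ln zeta(alpha, xmin) - alpha * sum(ln x_i)."""
    return -n * math.log(hurwitz_zeta(alpha, xmin)) - alpha * sum_log_x


def fit_alpha_numeric(tail, xmin, lo=1.01, hi=6.0, tol=1e-6):
    """Maximize ln L(alpha) on [lo, hi] by golden-section search.

    ln L is concave in alpha, so the shrinking bracket converges onto its maximum.
    """
    n, sum_log_x = len(tail), sum(math.log(x) for x in tail)
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if log_likelihood(c, xmin, n, sum_log_x) > log_likelihood(d, xmin, n, sum_log_x):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2


def fit_alpha_approx(tail, xmin):
    """Approximate MLE: alpha ~ 1 + n / sum(ln(x_i / (xmin - 1/2)))."""
    return 1 + len(tail) / sum(math.log(x / (xmin - 0.5)) for x in tail)


def ks_distance(tail, xmin, alpha):
    """Maximum distance between the empirical CDF S and the fitted CDF F of the tail."""
    n, z_min = len(tail), hurwitz_zeta(alpha, xmin)
    counts, cum, d = Counter(tail), 0, 0.0
    for x in sorted(counts):
        # Just before the jump at x: S = cum / n, F(x - 1) = Pr(X < x).
        d = max(d, abs(cum / n - (1 - hurwitz_zeta(alpha, x) / z_min)))
        cum += counts[x]
        # At x: S = cum / n, F(x) = Pr(X <= x) = 1 - P(x + 1).
        d = max(d, abs(cum / n - (1 - hurwitz_zeta(alpha, x + 1) / z_min)))
    return d


def fit_power_law(data, min_tail=50):
    """Return (xmin, alpha, D) for the candidate xmin with the smallest KS distance D."""
    best = None
    for xmin in sorted(set(data)):
        tail = [x for x in data if x >= xmin]
        if len(tail) < min_tail:
            break  # tails only get shorter as xmin grows
        if xmin < 6:
            alpha = fit_alpha_numeric(tail, xmin)
        else:
            alpha = fit_alpha_approx(tail, xmin)
        d = ks_distance(tail, xmin, alpha)
        if best is None or d < best[2]:
            best = (xmin, alpha, d)
    return best
```

Called on a list of positive integer counts (for example, exon usage counts), `fit_power_law` returns the selected $x_{\min}$, the fitted $\alpha$ and the distance $D$, or `None` when no candidate tail holds at least `min_tail` observations. Because both $S$ and $F$ are step functions and $F$ is non-decreasing, checking $|S - F|$ just before and at each observed value is enough to find the maximum. Clauset et al. (2009) additionally assess goodness of fit with a semi-parametric bootstrap, which this sketch omits.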
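The three comparison criteria on the COMPETITIVE STATISTICAL MODELS slide work on the per-observation log-likelihoods of models that have already been fitted. The sketch below is a hedged illustration, not the author's implementation. It uses the normalized form of the log-likelihood ratio test as presented by Clauset et al. (2009), $p = \operatorname{erfc}\left( |\mathcal{R}| / \sqrt{2 n \sigma^2} \right)$, where $\sigma^2$ is the variance of the pointwise differences $\ln p_1(x_i) - \ln p_2(x_i)$. To keep it short, only two of the seven candidate models are fitted, both with closed-form maximum likelihood estimates: a discrete exponential on $x = 0, 1, 2, \dots$ (an assumed $x_{\min} = 0$, so $C = 1 - e^{-\lambda}$) and a Poisson model. The helper names `vuong_test`, `aic`, `bic`, `geometric_logpmf` and `poisson_logpmf` are hypothetical.

```python
import math


def vuong_test(loglik1, loglik2):
    """Normalized log-likelihood ratio test for two fitted models.

    loglik1, loglik2: per-observation values ln p1(x_i) and ln p2(x_i).
    Returns (R, p): R > 0 favours model 1, R < 0 favours model 2, and a small p
    means the sign of R is unlikely to be a chance fluctuation.
    """
    diffs = [a - b for a, b in zip(loglik1, loglik2)]
    n = len(diffs)
    r = sum(diffs)
    mean = r / n
    var = sum((d - mean) ** 2 for d in diffs) / n
    if var == 0:
        return r, 1.0  # identical fits: the test cannot discriminate
    return r, math.erfc(abs(r) / math.sqrt(2 * n * var))


def aic(loglik, k):
    """Akaike information criterion 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion -2 ln L + k ln(n) (lower is better)."""
    return -2 * loglik + k * math.log(n)


def geometric_logpmf(data):
    """Discrete exponential p(x) = (1 - q) q**x with q = exp(-lambda), fitted by MLE."""
    m = sum(data) / len(data)
    q = m / (1 + m)  # MLE of q for counts x = 0, 1, 2, ...
    return [math.log(1 - q) + x * math.log(q) for x in data]


def poisson_logpmf(data):
    """Poisson p(x) = exp(-mu) mu**x / x!, with the MLE mu = sample mean."""
    mu = sum(data) / len(data)
    return [x * math.log(mu) - mu - math.lgamma(x + 1) for x in data]


if __name__ == "__main__":
    import random

    random.seed(0)
    # Geometric counts with q = 0.8 drawn by inverse-transform sampling.
    counts = [int(math.log(1 - random.random()) / math.log(0.8)) for _ in range(2000)]
    ll_geo, ll_poi = geometric_logpmf(counts), poisson_logpmf(counts)
    r, p = vuong_test(ll_geo, ll_poi)
    n = len(counts)
    print(f"R = {r:.1f}, p = {p:.3g}")
    print("AIC (exponential, Poisson):", aic(sum(ll_geo), 1), aic(sum(ll_poi), 1))
    print("BIC (exponential, Poisson):", bic(sum(ll_geo), 1, n), bic(sum(ll_poi), 1, n))
```

On overdispersed counts such as these, the discrete exponential should come out ahead ($\mathcal{R} > 0$ and lower AIC and BIC). The other candidate models would plug into the same three functions once their per-observation log-likelihoods are available, with $k$ set to each model's number of fitted parameters.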