International Biometric Society

A NEW APPROACH FOR STATISTICAL CLASSIFICATION AND VISUALIZATION FOR LONGITUDINAL TEXT DATA

Shizue Izumi1 and Kenichi Satoh2

1 Department of Computer Science and Intelligent Systems, Oita University, Japan
2 Research Institute for Radiation Biology and Medicine, Hiroshima University, Japan

Longitudinally observed text data are one form of big data. Extracting the time-varying trends of keyword appearance and classifying those trends can summarize how the characteristics of longitudinal text data change over time. Satoh and Tonda (2013) proposed a method for estimating semiparametric varying coefficients using a mixed effects model. Here we propose a new approach to statistical classification that uses a semiparametric regression model for keyword appearance in longitudinally observed text data. We also suggest visualizing the time-varying trends of keyword appearance using a summary of the predictors.

Keywords: Text mining, Summarization, Semiparametric model, Multidimensional scaling.

References

[1] R. Agrawal, T. Imielinski and A. Swami. (1993). Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 207-216, Washington D.C.

[2] C. Borgelt and R. Kruse. (2002). Induction of Association Rules: Apriori Implementation. 15th Conference on Computational Statistics (COMPSTAT 2002, Berlin, Germany). Physica Verlag, Heidelberg, Germany.

[3] C. Borgelt. (2003). Efficient Implementations of Apriori and Eclat. Workshop of Frequent Item Set Mining Implementations (FIMI 2003, Melbourne, FL, USA).

[4] B. A. Brumback, D. Ruppert and M. P. Wand. (1999). Variable selection and function estimation in additive nonparametric regression using a data-based prior: Comment. Journal of the American Statistical Association, 94, 794-797.

[5] Y. Ishibashi, A. Hara, I. Okayasu and K. Kurihara. (2011). Development of histopathological information database system which enables image retrieval using image features and text information. Computational Statistics, 24, 3-21.

[6] S. Izumi, K. Satoh and N. Kawano. Statistical classification and visualization based on varying coefficients model for longitudinal text data. (submitted)

[7] D. Ruppert, M. P. Wand and R. J. Carroll. (2003). Semiparametric Regression. Cambridge University Press.

[8] K. Satoh and T. Tonda. (2013). Statistical inference of semiparametric varying coefficients using mixed effects model. Japanese Journal of Applied Statistics, 42(1), 1-10.

[9] P. H. A. Sneath. (1957). Some thoughts on bacterial classification. Journal of General Microbiology, 17, 184-200.

[10] H. Wakimori. (2013). Textmining technique and its applications for big data. UNISYS Technology Review, 32(4), 19-31.

International Biometric Conference, Florence, Italy, 6-11 July 2014