International Biometric Society
A NEW APPROACH FOR STATISTICAL CLASSIFICATION AND VISUALIZATION
FOR LONGITUDINAL TEXT DATA
Shizue Izumi1 and Kenichi Satoh2
1 Department of Computer Science and Intelligent Systems, Oita University, Japan
2 Research Institute for Radiation Biology and Medicine, Hiroshima University, Japan
Longitudinally observed text data are one form of big data. Extracting the time-varying
trends of keyword appearance and classifying them can summarize how the characteristics of
longitudinal text data change. Satoh and Tonda (2013) proposed a method for estimating
semiparametric varying coefficients using a mixed effects model. Here we propose a new
approach to statistical classification that uses a semiparametric regression model for
keyword appearance in longitudinally observed text data. We also suggest visualizing
the time-varying trends of keyword appearance using a summary of the predictors.
Keywords: Text mining, Summarization, Semiparametric model, Multidimensional scaling.
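The pipeline outlined in the abstract (estimate a smooth time trend for each keyword, then classify keywords by the similarity of their trends) can be illustrated with a rough sketch. This is not the authors' method: a Gaussian kernel smoother stands in for the semiparametric mixed-effects model, single-linkage clustering on Euclidean distances stands in for the classification step, and the keyword names, counts, and bandwidth are all hypothetical. The visualization step (for example, multidimensional scaling of the same distances) is omitted.

```python
import math

def smooth_trend(counts, totals, bandwidth=1.5):
    """Kernel-smoothed appearance rate of one keyword at each time point.

    counts[t] = documents containing the keyword in period t,
    totals[t] = all documents in period t.
    A Gaussian kernel smoother is an assumption made for this sketch.
    """
    n = len(counts)
    trend = []
    for t in range(n):
        weights = [math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in range(n)]
        num = sum(w * c for w, c in zip(weights, counts))
        den = sum(w * m for w, m in zip(weights, totals))
        trend.append(num / den)
    return trend

def distance(a, b):
    """Euclidean distance between two trend curves of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(names, curves, n_groups):
    """Merge the two closest groups repeatedly until n_groups remain."""
    groups = [[i] for i in range(len(names))]
    while len(groups) > n_groups:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                # Single linkage: group distance is the closest pair of members.
                d = min(distance(curves[a], curves[b])
                        for a in groups[i] for b in groups[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return [sorted(names[k] for k in g) for g in groups]

# Hypothetical data: 6 time periods, 100 documents per period.
totals = [100] * 6
counts = {
    "rising_a": [5, 10, 20, 30, 40, 50],
    "rising_b": [8, 12, 22, 33, 41, 48],
    "falling_a": [50, 40, 30, 20, 10, 5],
    "falling_b": [45, 42, 28, 18, 12, 6],
}

trends = {name: smooth_trend(c, totals) for name, c in counts.items()}
names = list(trends)
curves = [trends[name] for name in names]
groups = single_linkage(names, curves, n_groups=2)
print(groups)
```

In this toy setting the keywords with increasing appearance rates end up in one group and those with decreasing rates in the other. A fitted varying-coefficient curve could replace `smooth_trend` without changing the rest of the sketch.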
References
[1] R. Agrawal, T. Imielinski and A. Swami. (1993). Mining association rules between sets of
items in large databases. In Proceedings of the ACM SIGMOD International Conference
on Management of Data, 207-216, Washington D.C.
[2] C. Borgelt and R. Kruse. (2002). Induction of Association Rules: Apriori Implementation.
15th Conference on Computational Statistics (COMPSTAT 2002, Berlin, Germany).
Physica Verlag, Heidelberg, Germany.
[3] C. Borgelt. (2003). Efficient Implementations of Apriori and Eclat. Workshop of Frequent
Item Set Mining Implementations (FIMI 2003, Melbourne, FL, USA).
[4] B.A. Brumback, D. Ruppert and M.P. Wand. (1999). Variable selection and function
estimation in additive nonparametric regression using a data-based prior: Comment.
Journal of the American Statistical Association, 94, 794-797.
[5] Y. Ishibashi, A. Hara, I. Okayasu and K. Kurihara. (2011). Development of histopathological
information database system which enables image retrieval using image features and text
information. Computational Statistics, 24, 3-21.
[6] S. Izumi, K. Satoh and N. Kawano. Statistical classification and visualization based on
varying coefficients model for longitudinal text data. (submitted)
[7] D. Ruppert, M.P. Wand and R.J. Carroll. (2003). Semiparametric Regression. Cambridge
University Press.
[8] K. Satoh and T. Tonda. (2013). Statistical inference of semiparametric varying coefficients
using mixed effects model. Japanese Journal of Applied Statistics, 42(1), 1-10.
[9] P.H.A. Sneath. (1957). Some thoughts on bacterial classification. Journal of General
Microbiology, 17, 184-200.
[10] H. Wakimori. (2013). Textmining technique and its applications for big data. UNISYS
Technology Review, 32(4), 19-31.
International Biometric Conference, Florence, ITALY, 6 – 11 July 2014