Research Statement - Ian Davidson

Research Interests

My research allows computers to solve complex problems associated with inferring patterns from data. This places my work in the data mining (DM) and machine learning (ML) fields of artificial intelligence (AI). The larger-scale contribution of my research is at two levels. First, I am defining new problems, such as incorporating background knowledge into DM algorithms. Second, I am applying mathematically rigorous, novel computational techniques from probability, statistics, information theory and theoretical computer science to AI and DM problems.

Past Accomplishments

I have proposed a number of novel problems and computational techniques.

• My doctoral work looked at finding alternative (orthogonal) explanations in data, which I achieved using the minimum message length principle (MDL/MML) and Markov chain Monte Carlo (MCMC) sampling [2].
• At SGI, I looked at visualizing DM models and their effect on data points, for which I created a gravitational model paradigm that was the subject of patent applications.
• With my collaborators at IBM Watson laboratories, I have been exploring predictive mining without the usual assumption that the available data are plentiful and statistically identical to future data. That assumption often fails in areas such as the life sciences where, for instance, drug trial participants self-select.
• My most recent work on knowledge-enhanced mining via constraints focuses on the important open question of how to incorporate strong background/domain knowledge into pattern recognition algorithms so as to find novel patterns that are consistent with existing domain knowledge.

Short Term Future Plans

Some of my short-term plans are reflected in my three grant applications, which build on existing publication successes and collaborations. One of my immediate research plans is to continue working on knowledge-enhanced mining, which is the focus of my NSF CAREER grant.
I am also interested in high-impact computer science applications of social importance. With collaborators at Virginia Tech’s Bioinformatics Institute (VBI), I am looking at using DM to find insights in pandemic simulation data.

Long Term Future Plans

My longer-term plans are to continue work in the areas of AI and DM, as they have an enjoyable mix of theory and practice. In particular, I am interested in bridging the gap between computer science theory and application. In many fields, such as physics, there is much synergy between theoretical and applied scientists, which I believe is sometimes lacking in computer science. I would also like to explore whether the computational techniques I create for mining and AI (such as non-parametric bootstrapping [3], expected message length calculations [2] and efficient approaches to averaging over the space of trees [5]) can make a contribution to the general area of algorithm design.

As my basic research matures, I am interested in contributing to its applications in areas that have social benefit. I believe my current planned work on pandemic preparedness by mining simulation results could be expanded to the analysis of other micro-simulation results, such as bush fires, drug/molecule behavior and computer networks.

References

[1] Berg G., Davidson I., Duan M., Paul G., Searching For Hidden Messages: Automatic Detection of Steganography, 15th Innovative Applications of Artificial Intelligence Conference (IJCAI/IAAI), pp. 51-56, 2003.
[2] Davidson I., Minimum Message Length Clustering Using Gibbs Sampling, 16th International Conference on Uncertainty in Artificial Intelligence (UAI), pp. 160-167, 2000.
[3] Davidson I., Ensemble Approaches for Stable Learners with Convergence Bounds, 19th National Conference of the American Association for Artificial Intelligence Conference (AAAI), pp. 330-335, 2004.
[4] Davidson I. and Ravi S.S., Clustering with Constraints and the k-Means Algorithm, 5th SIAM Data Mining Conference (Winner Best Research Paper Award), 2005.
[5] Davidson I., Fan W., When Efficient Model Averaging Out-Performs Boosting and Bagging, To Appear, ECML/PKDD 2006.
[6] Davidson I. and Ravi S.S., The Complexity of Non-Hierarchical Clustering under Constraints, To Appear, Journal of Knowledge Discovery and DM, 41 pages.
[7] Davidson I., Ravi S.S., Identifying and Generating Easy Sets of Constraints For Clustering, 21st AAAI Conference, 2006.
[8] Davidson I., Wagstaff K., Basu S., Measuring Constraint-Set Utility for Partitional Clustering Algorithms, To Appear, ECML/PKDD 2006.
[9] Davidson I. and Paul G., Locating Secret Messages in Images, Research Track: 10th Knowledge Discovery and DM Conference (KDD), pp. 545-550, 2004. (82 / 400+)
[10] Zhang K., Fan W., Yuan X., Davidson I., and Li X., Forecasting Skewed Biased Stochastic Ozone Days, IEEE ICDM 2006 (Winner Best Application Paper Award).
[11] Jerrum M. and Sinclair A., The Markov chain Monte Carlo method: an approach to approximate counting and integration, in Approximation Algorithms for NP-hard Problems, D.S. Hochbaum ed., PWS Publishing, Boston, 1996.
[12] Satyanarayana A., Davidson I., A Dynamic Adaptive Sampling Algorithm (DASA) for Real World Applications: Finger Print Recognition and Face Recognition, ISMIS, pp. 631-640, 2005.
[13] Satyanarayana A., DM For Large Datasets: Intelligent Sampling and Filtering, Ph.D. Thesis, SUNY-Albany, 2006.