Research Statement - Ian Davidson
Research Interests My research enables computers to solve complex problems associated with inferring patterns from data. This places my work in the data mining (DM) and machine learning (ML) fields of artificial intelligence (AI). The larger-scale contribution of my research is at two levels. First, I define new problems, such as incorporating background knowledge into DM algorithms; second, I apply mathematically rigorous, novel computational techniques from probability, statistics, information theory and theoretical computer science to AI and DM problems.
Past Accomplishments I have proposed a number of novel problems and computational techniques.
• My doctoral work looked at finding alternative (orthogonal) explanations in data, which I achieved using the minimum message length (MML/MDL) principle and Markov chain Monte Carlo (MCMC) sampling [2].
• At SGI, I worked on visualizing DM models and their effect on data points, for which I created a gravitational-model paradigm that was the subject of patent applications.
• With my collaborators at IBM Watson laboratories, I have been exploring predictive mining without the usual assumption that the available data are plentiful and statistically identical to future data. This assumption often fails in areas such as the life sciences where, for instance, drug-trial participants self-select.
• My most recent work on knowledge-enhanced mining via constraints addresses the important open question of how to incorporate strong background/domain knowledge into pattern-recognition algorithms so as to find novel patterns that are consistent with existing domain knowledge (a small illustrative sketch of constraint-based clustering follows this list).
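To make the constraint-based clustering idea concrete, the following is a minimal sketch of k-means under instance-level must-link and cannot-link constraints, in the spirit of COP-k-means-style algorithms. The function names and interface are my own illustrative assumptions, not a reproduction of the specific algorithms studied in [4] or [7].

import numpy as np

def violates(i, c, assignments, must_link, cannot_link):
    # Would assigning point i to cluster c break a constraint with an
    # already-assigned point? (Sketch only; constraints are index pairs.)
    for a, b in must_link:
        other = b if a == i else (a if b == i else None)
        if other is not None and assignments[other] not in (-1, c):
            return True
    for a, b in cannot_link:
        other = b if a == i else (a if b == i else None)
        if other is not None and assignments[other] == c:
            return True
    return False

def constrained_kmeans(X, k, must_link, cannot_link, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        assignments = np.full(len(X), -1)
        for i in range(len(X)):
            # Try clusters by increasing distance; skip infeasible ones.
            for c in np.argsort(np.linalg.norm(X[i] - centers, axis=1)):
                if not violates(i, c, assignments, must_link, cannot_link):
                    assignments[i] = c
                    break
            if assignments[i] == -1:
                raise ValueError("no constraint-respecting assignment for point %d" % i)
        for c in range(k):  # standard k-means centroid update
            members = X[assignments == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return assignments, centers

For example, with must_link=[(0, 1)] and cannot_link=[(1, 2)], the assignment step simply skips any cluster choice that would separate points 0 and 1 or place points 1 and 2 together.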
Short Term Future Plans Some of my short-term plans are reflected in my three grant applications, which leverage existing publication successes and collaborations. One of my immediate research plans is to continue working on knowledge-enhanced mining, which is the focus of my NSF CAREER grant. I am also interested in high-impact computer science applications of social importance. With collaborators at Virginia Tech’s Bioinformatics Institute (VBI), we are looking at using DM to gain insights from pandemic simulation results.
Long Term Future Plans My longer-term plans are to continue work in the areas of AI and DM, as they offer an enjoyable mix of theory and practice. In particular, I am interested in bridging the gap between computer science theory and application. In many fields, such as physics, there is much synergy between theoretical and applied scientists, which I believe is sometimes lacking in computer science. I would also like to explore whether the computational techniques I create for mining and AI (such as non-parametric bootstrapping [3], expected message length calculations [2] and efficient approaches to averaging over the space of trees [5]) can contribute to the general area of algorithm design; a brief sketch of the first of these techniques appears below. As my basic research matures, I am interested in applying it in areas that have social benefit. I believe my planned work on pandemic preparedness by mining simulation results could be expanded to the analysis of other micro-simulation results, such as bush fires, drug/molecule behavior and computer networks.
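As an illustration of the general non-parametric bootstrapping technique mentioned above, the following is a minimal sketch: the training data are resampled with replacement and a model is fit to each resample so that prediction variability can be examined. It is a generic sketch under an interface I assume here (fit_fn returning an object with a predict method), not the specific convergence-bound construction of [3].

import numpy as np

def bootstrap_predictions(train_X, train_y, test_X, fit_fn, n_resamples=50, seed=0):
    # fit_fn(X, y) is assumed to return a fitted model exposing predict(X).
    rng = np.random.default_rng(seed)
    n = len(train_X)
    preds = []
    for _ in range(n_resamples):
        idx = rng.integers(0, n, size=n)  # sample n indices with replacement
        model = fit_fn(train_X[idx], train_y[idx])
        preds.append(model.predict(test_X))
    preds = np.stack(preds)  # shape: (n_resamples, n_test)
    # Aggregate across resamples: mean prediction and a per-point spread estimate.
    return preds.mean(axis=0), preds.std(axis=0)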
References
[1] Berg G., Davidson I., Duan M. and Paul G., Searching For Hidden Messages: Automatic Detection of Steganography, 15th Innovative Applications of Artificial Intelligence Conference (IJCAI/IAAI), pp. 51-56, 2003.
[2] Davidson I., Minimum Message Length Clustering Using Gibbs Sampling, 16th International Conference on Uncertainty in Artificial Intelligence (UAI), pp. 160-167, 2000.
[3] Davidson I., Ensemble Approaches for Stable Learners with Convergence Bounds, 19th National Conference of the American Association for Artificial Intelligence (AAAI), pp. 330-335, 2004.
[4] Davidson I. and Ravi S. S., Clustering with Constraints and the k-Means Algorithm, 5th SIAM Data Mining Conference, 2005 (Winner, Best Research Paper Award).
[5] Davidson I. and Fan W., When Efficient Model Averaging Out-Performs Boosting and Bagging, To Appear, ECML/PKDD, 2006.
[6] Davidson I. and Ravi S. S., The Complexity of Non-Hierarchical Clustering under Constraints, To Appear, Journal of Knowledge Discovery and DM, 41 pages.
[7] Davidson I. and Ravi S. S., Identifying and Generating Easy Sets of Constraints For Clustering, 21st AAAI Conference, 2006.
[8] Davidson I., Wagstaff K. and Basu S., Measuring Constraint-Set Utility for Partitional Clustering Algorithms, To Appear, ECML/PKDD, 2006.
[9] Davidson I. and Paul G., Locating Secret Messages in Images, Research Track: 10th Knowledge Discovery and DM Conference (KDD), pp. 545-550, 2004. (82 / 400+)
[10] Zhang K., Fan W., Yuan X., Davidson I. and Li X., Forecasting Skewed Biased Stochastic Ozone Days, IEEE ICDM, 2006 (Winner, Best Application Paper Award).
[11] Jerrum M. and Sinclair A., The Markov chain Monte Carlo method: an approach to approximate counting and integration, in Approximation Algorithms for NP-hard Problems, D. S. Hochbaum (ed.), PWS Publishing, Boston, 1996.
[12] Satyanarayana A. and Davidson I., A Dynamic Adaptive Sampling Algorithm (DASA) for Real World Applications: Finger Print Recognition and Face Recognition, ISMIS, pp. 631-640, 2005.
[13] Satyanarayana A., DM For Large Datasets: Intelligent Sampling and Filtering, Ph.D. Thesis, SUNY-Albany, 2006.