Dental Data Mining: Potential Pitfalls and Practical Issues

S.A. Gansky
Center for Health and Community, Center to Address Disparities in Children's Oral Health, Department of Preventive and Restorative Dental Sciences, Division of Oral Epidemiology and Dental Public Health, University of California, San Francisco, CA 94143-1361, USA; [email protected]

Adv Dent Res 17:109-114, December, 2003

Abstract — Knowledge Discovery and Data Mining (KDD) have become popular buzzwords. But what exactly is data mining? What are its strengths and limitations? Classic regression, artificial neural network (ANN), and classification and regression tree (CART) models are common KDD tools. Some recent reports (e.g., Kattan et al., 1998) show that ANN and CART models can perform better than classic regression models: CART models excel at covariate interactions, while ANN models excel at non-linear covariates. Model prediction performance is examined with validation procedures and by evaluating concordance, sensitivity, specificity, and likelihood ratio. To aid interpretation, various plots of predicted probabilities are utilized, such as lift charts, receiver operating characteristic curves, and cumulative captured-response plots. A dental caries study is used as an illustrative example: this paper compares the performance of logistic regression with the KDD methods of CART and ANN in analyzing data from the Rochester caries study. With careful analysis, such as validation with sufficient sample size and the use of proper competitors, the problems of naïve KDD analyses (Schwarzer et al., 2000) can be avoided.

Introduction

Informatics in general, and dental informatics in particular, are disciplines encompassing a variety of research areas, from molecular biology to library science to public health surveillance. Many dental informatics application areas utilize knowledge discovery and data mining (KDD)—semiautomatic discovery of patterns, associations, anomalies, and statistically significant structure in data (Fayyad et al., 1996, p. 6). KDD operates at the intersection of artificial intelligence, machine learning, computer science, engineering, and statistics. KDD has been named a Top Ten emerging technology that will change the world (Waldrop, 2001). However, KDD is not alchemy—it does not turn lead into gold (i.e., bad data or flawed study designs into incredible, novel insights)—rather, KDD is a discipline using modern computing tools to solve problems. In current business applications, KDD touches lives daily when customers swipe supermarket savings cards, sending buying habits to data warehouses. This gave retailers the (apocryphal?) data mining discovery of diapers and beer sharing men's late-night supermarket baskets. In the future, similar encounters in clinicians' offices might collect health information in data warehouses (subject to patient confidentiality protections), which can be mined to identify at-risk patients and better treatment modalities. Such possibilities are gradually becoming reality (e.g., Page et al., 2002). Some potential oral health applications for KDD include: large surveys (e.g., NHANES), longitudinal cohort studies (e.g., Veterans Administration Longitudinal Study on Aging), disease registries (e.g., National Cancer Institute's Surveillance, Epidemiology and End Results [SEER] program; birth defects registry; craniofacial treatment outcomes registry), health services research (e.g., claims data, fraud detection), provider and workforce databases, digital diagnostics (e.g., radiology, microbiology), and molecular biology (e.g., polymerase chain reactions, microarrays).

KDD Methods

KDD learning methods (see the Glossary in the Appendix) can be unsupervised (grouping into similar, heretofore undetermined, classes based on similarities) or supervised (prediction using already-determined classes, such as disease status).
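The unsupervised/supervised distinction can be sketched in a few lines of Python. This is a hedged illustration on invented data, assuming scikit-learn is available; the variable names and the two-group structure are fabrications for demonstration, not study data.

```python
# Sketch: unsupervised vs. supervised learning on synthetic "patient" data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Two invented groups, e.g. low vs. high bacterial counts (hypothetical).
low = rng.normal(loc=3.0, scale=0.5, size=(50, 2))
high = rng.normal(loc=7.0, scale=0.5, size=(50, 2))
X = np.vstack([low, high])
y = np.array([0] * 50 + [1] * 50)  # disease status, used only by the supervised model

# Unsupervised: discover heretofore undetermined groups without using y.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Supervised: predict the already-determined classes from the inputs.
clf = LogisticRegression().fit(X, y)
accuracy = clf.score(X, y)
```

With well-separated groups, the unsupervised clustering recovers roughly the same partition that the supervised model is trained to predict.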
Unsupervised methods include hierarchical cluster analysis and k-means; supervised methods include regression, tree models (e.g., classification and regression trees [CART], boosting, bagging, and ensemble methods), multivariate adaptive regression splines, artificial neural networks (ANNs), support vector machines, and random forests (Hastie et al., 2001). In oral health research, CART was used to predict caries (Stewart and Stamm, 1991). ANN was used in clinical decision-making for third molar extraction (Brickley and Shepherd, 1996, 1997; Brickley et al., 1998; Goodey et al., 2000), oral cancer risk assessment (Speight et al., 1995), predicting dental age from photomicrographs (Amariti et al., 2000), predicting growth classification from lateral cephalograms (Lux et al., 1998), and assessing correlations of tooth enamel chemical elements (Nilsson et al., 1996). However, KDD methods have not often been compared in oral health studies; compared with regression models, ANN might better identify non-linearities, while CART may better find interactions (Kattan et al., 1998).

Logistic regression models linear relationships between predictors (inputs) and a binary response (output) (e.g., Harrell, 2001). The binary logit model can be written as:

logit(π_i) = log_e[π_i / (1 − π_i)] = β_0 + β_1 x_i1 + ··· + β_P x_iP + e_i

where π_i is the probability of the i-th person having the response (y_i), the βs are the corresponding parameters for the P predictor variables, and e_i is the error for the i-th person. For example, if log10 MS and fluoride levels relate linearly to the probability of developing caries, this model would fit well. Logistic regression coefficients (βs) are easy to interpret (as natural logarithms of odds ratios), a very desirable property. If the actual likelihood surface is not a hyperplane, logistic regression will not fit well, since it misses bumps or non-linearities.
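The logit model and its odds-ratio interpretation can be sketched as follows. This is a minimal illustration, assuming scikit-learn; the predictors (a stand-in "log10 MS" and "fluoride") and their true coefficients are invented for the example, not taken from the study.

```python
# Sketch of the binary logit model: fit, then exponentiate coefficients
# to read them as odds ratios. All data here are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 1000
log10_ms = rng.normal(6.0, 1.0, n)   # hypothetical log10 mutans streptococci
fluoride = rng.normal(0.5, 0.2, n)   # hypothetical fluoride level (ppm)

# True (invented) model: logit(pi) = -3 + 0.4*log10_ms - 2.0*fluoride
logit = -3 + 0.4 * log10_ms - 2.0 * fluoride
p = 1 / (1 + np.exp(-logit))
y = rng.binomial(1, p)

X = np.column_stack([log10_ms, fluoride])
# Large C approximates an unpenalized (classic) logistic regression.
model = LogisticRegression(C=1e6, max_iter=5000).fit(X, y)
odds_ratios = np.exp(model.coef_[0])  # e^beta = odds ratio per unit increase
```

Since the betas are natural logarithms of odds ratios, exponentiating them recovers the multiplicative change in odds per unit increase in each predictor: here, greater than 1 for the bacterial count and less than 1 for fluoride.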
CART models (e.g., Stewart and Stamm, 1991; Hastie et al., 2001) adapt well to fit interactions, since they group individuals with similar probabilities of caries (to produce terminal nodes with the highest purity, or homogeneity, of outcome classes). Unlike logit models, CART models are robust to outliers and do not require specific data transformations or hierarchical interaction specification. CART models are step-function-type likelihood approximations (analogous to Riemann sums approximating integrals); these models are highly interpretable for easy clinician use.

Key Words: Models, statistical; decision support techniques; neural networks (computer); dental caries; oral health. Publication supported by Software of Excellence (Auckland, NZ). Presented at "Dental Informatics & Dental Research: Making the Connection", a conference held in Bethesda, MD, USA, June 12-13, 2003, sponsored by the University of Pittsburgh Center for Dental Informatics and supported in part by award 1R13DE014611-01 from the National Institute of Dental and Craniofacial Research/National Library of Medicine.

ANNs, extremely flexible weighted combinations of non-linear functions, use a hidden layer with hidden units/nodes/neurons and activation functions to link inputs to the hidden layer and the hidden layer to outputs. A feed-forward or multilayer perceptron ANN is:

g_0^(−1)(π_i) = w_0 + w_1 H_1 + w_2 H_2 + ··· + w_R H_R, where H_r = tanh(w_0r + w_1r x_1i + w_2r x_2i + ··· + w_Pr x_Pi)

with r = 1, 2, ..., R indexing the neurons, H_r denoting the r-th neuron, w_pr denoting the coefficient of the p-th input x_pi for the r-th neuron, and g_0^(−1) denoting the inverse activation function (in this case, tanh^(−1)). In ANN terminology (Schwarzer et al., 2000), a P-R-S model has P inputs (predictors), 1 hidden layer with R neurons, and S outputs (outcomes). Neurons are a function of weighted sums of inputs plus a constant ("bias"), w_0.
Similarly, outputs are a function of weighted sums of neurons plus a bias; logistic and hyperbolic tangent functions are common activation functions. (Logistic regression is a P-0-1 feed-forward ANN with a logistic activation function.) Weight decay, a model-complexity penalty term added to the optimization criterion, can be used to guard against overfitting. In a simulation study of a 1-15-1 ANN, weight decays of 0, 0.002, and 0.005 were used, with 0.005 stabilizing prediction (Schwarzer et al., 2000). Varying random seeds, R, and weight decays stabilizes global optimization (Ripley, 1996). ANNs are iteratively optimized with training data, and the final model is applied to validation data so that future performance can be assessed. Training estimates the weights, but they have no clear interpretation; thus, ANNs have very poor interpretability. Since ANNs with large R can fit any arbitrary surface, ANNs should not be overfitted to the training data. Common mistakes with ANNs are: too many parameters for the sample size, not using validation, not using a model-complexity penalty, incorrect misclassification estimation, implausible probability functions, incorrectly described network complexity, inadequately flexible statistical competitors (e.g., CART), and insufficient comparisons with statistical competitors (e.g., receiver operating characteristic curves) (Schwarzer et al., 2000).

KDD Process

Although KDD is probably best known for its analytic algorithms, KDD is an iterative process with the following steps regarding data: collect and store, pre-process, analyze, validate, and implement (Fig. 1). Data collection and storage include study design, sampling, merging, and warehousing. Pre-processing, where needed, includes cleaning, imputing missing values, transforming (e.g., logarithms for skewed bacterial count data), standardizing (centering at the mean and scaling by the standard deviation), and registering (e.g., aligning radiographs for digital subtraction radiography).
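The pre-processing steps just listed (imputation, log transformation, standardization) might look like the following minimal sketch. The colony counts are invented placeholder values, not study data; only numpy is assumed.

```python
# Minimal pre-processing sketch: impute, log-transform, standardize.
import numpy as np

# Hypothetical colony counts (CFU/mL) with one missing value; heavily skewed.
cfu = np.array([2e5, 8e6, 3e7, np.nan, 5e4, 1e8, 6e6])

# Impute the missing value with the median of the observed counts.
cfu_imputed = np.where(np.isnan(cfu), np.nanmedian(cfu), cfu)

# Log10 transform tames the skew (as done for bacterial counts in this paper).
log_cfu = np.log10(cfu_imputed)

# Standardize: center at the mean and scale by the standard deviation.
log_cfu_std = (log_cfu - log_cfu.mean()) / log_cfu.std()
```

After these steps, the variable is complete, roughly symmetric, and on a mean-0, SD-1 scale, which is the form many KDD algorithms (especially ANNs) prefer.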
Analysis can involve unsupervised and supervised learning techniques plus visualization methods. Ideally, validation will be both internal (these data) and external (separate data) (Altman and Royston, 2000). Internal validation can use a split sample (separate training and validation data, if sample size > 3000), cross-validation (if sample size < 3000), bootstrap (resampling with replacement), or jackknife (leave-one-out) methods. Finally, implementation could involve changes in the KDD process, new clinical interventions, or changes in health policy. Data quality and study design issues remain paramount: limitations inherent in study designs remain when KDD is used. For example, a tooth implant registry cannot examine bone quality's impact on implant failure if bone quality is not measured well (or at all). Similarly, a cross-sectional study of stress and temporomandibular joint disorders still assesses only associations, not causality.

The goals of this paper are to demystify knowledge discovery and data mining (KDD) by explaining the process, to identify possible pitfalls and practical issues, and to compare the performance of KDD methods (logit, CART, and ANN) in analyzing Rochester caries study data.

Materials and Methods

Rochester Caries Study

In upstate New York (the Rochester and Finger Lakes areas), first- and second-graders, caries-free at baseline, had stimulated saliva collected and dental exams without radiographs performed every six months for up to 6 years for a larger longitudinal caries risk assessment study (Billings et al., 2003); this was a follow-up to the same research team's earlier cross-sectional investigation (Leverett et al., 1993a) and two-year longitudinal study (Leverett et al., 1993b). This example analysis predicts primary tooth caries (output) from a subset of predictors (inputs) which may have non-linearity or interactions.
Salivary assays assessed mutans streptococci (MS) and lactobacillus (LB) levels (colony-forming units per milliliter, CFU/mL), and fluoride (F), calcium (Ca), and phosphate (P) levels. Data for 466 children with 2 years of follow-up were analyzed, with input variables selected based on published discriminant analysis models (Leverett et al., 1993b): log10 MS, log10 LB, F (parts per million, ppm), Ca (millimoles per liter, mmol/L), and P (mmol/L). The output (response) variable was caries incidence (at least one decayed or filled surface) on primary teeth at 24 months of follow-up. Earlier analyses showed 18-month measures to be more predictive of 24-month caries incidence than baseline, six-month, or 12-month measures.

Fig. 1 — Knowledge discovery and data mining (KDD) steps. KDD involves several iterative steps, beyond analytic algorithms, to process scientific information.

KDD methods

Logistic regression, CART, and ANN caries prediction methods were compared. Logistic regression used stepwise selection, with alpha = 0.05 to enter and 0.20 to stay, and the Akaike Information Criterion to judge the need for additional predictors. CART used the Gini index splitting criterion and the proportion correctly classified for pruning back the maximal tree. A 5-3-1 multilayer perceptron ANN model (22 degrees of freedom) with inverse hyperbolic tangent activation functions, Levenberg-Marquardt optimization, 5 preliminary runs, average error selection, and no weight decay function was used. Sensitivity analyses varied the random seed (5 different values), the number of hidden neurons (2, 3, 4), and the weight decay parameter (0, 0.001, 0.005, 0.010, 0.250). Training and validation were performed with a 70%/30% randomly split sample stratified on primary dentition caries. All methods used the same training data to develop the prediction models and the hold-out (not used to develop the models) validation data to score or validate the models.
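The comparison design above (same stratified 70%/30% split, three models trained on identical training data and scored on the hold-out set, plus five-fold cross-validation) can be sketched as follows. This is a hedged analogue, not the paper's actual software: the data are synthetic stand-ins for the five salivary predictors, the stepwise selection and Levenberg-Marquardt details are omitted, and scikit-learn estimators substitute for the original implementations.

```python
# Sketch of the modeling comparison: stratified 70/30 split, three methods,
# hold-out scoring, and a CV(5) Brier-score analogue. Data are synthetic.
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(4)
n = 466                           # sample size from the study; data are invented
X = rng.normal(size=(n, 5))       # stand-ins for log10 MS, log10 LB, F, Ca, P
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + X[:, 1] - 2))))

# 70%/30% split, stratified on the binary outcome.
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

models = {
    "logit": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(criterion="gini", max_depth=4, random_state=0),
    # 5-3-1 analogue: one hidden layer of 3 tanh neurons; alpha is weight decay.
    "ann": MLPClassifier(hidden_layer_sizes=(3,), activation="tanh",
                         alpha=0.0, max_iter=2000, random_state=0),
}
scores = {name: m.fit(X_tr, y_tr).score(X_va, y_va) for name, m in models.items()}

# CV(5) analogue of the paper's cross-validated Brier score (for one method).
cv_brier = -cross_val_score(LogisticRegression(max_iter=1000), X, y,
                            cv=5, scoring="neg_brier_score").mean()
```

Every method sees exactly the same training rows, so hold-out accuracy differences reflect the methods rather than the data split.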
Additionally, five-fold cross-validation [CV(5)] was performed, randomly forming 5 groups leading to 5 analyses, each with 4/5 of the total data (i.e., each fifth was left out of one analysis). Results were then aggregated to calculate the mean square error (MSE), also called the Brier score (B), between observed and expected output:

MSE = (1/n) Σ_{i=1}^{n} e_i² = B = (1/n) Σ_{i=1}^{n} (y_i − π̂_i)²

where n is the sample size.

Visualization

Area under the curve (AUC) from receiver operating characteristic (ROC) curves, plotting sensitivity vs. the false-positive fraction (1 − specificity), which is equivalent to concordance (c index), was also calculated. The (positive) likelihood ratio is sensitivity / (1 − specificity). ROC curves allow for balancing the sensitivity-specificity trade-off. Cumulative captured-response curves are similar to ROC curves, but graph sensitivity vs. the percent testing positive (identified as high-risk). Thus, sensitivity for the KDD methods can be compared at a specific percent-positive cut-off, which may be useful when resources for those labeled high-risk are limited. A related graph is the lift chart, which displays the gain each KDD method has over baseline vs. the percent testing positive. To visualize input contributions, ANN predicted probabilities were divided into quintiles (fifths), and the distributions of the standardized predictors in each quintile were shown via boxplots.

Results

Logistic regression yielded a model with only the two bacterial level variables as significant predictors: the log10 MS odds ratio (OR) was 1.27 (95% confidence interval, 1.10 - 1.46), while the log10 LB OR was 1.36 (95% CI, 1.19 - 1.57), meaning that each log10 increase in MS related to 27% greater odds of having primary tooth caries 6 months later, and each log10 increase in LB related to 36% greater odds of carious primary dentition.

The resultant training classification tree is presented in Fig. 2. In this example, the overall prevalence of caries in the primary dentition was 15% (root node). Each input variable was searched to partition the root node. All children with log10 MS less than 7.08 (~10 million CFU/mL) were in the left node, with the remainder in the right node. Node-specific prevalences were 15% and 91%, respectively. Circles identify nodes with prevalence less than or equal to the overall prevalence of 15%, while squares identify nodes with prevalence greater than the overall prevalence. Continuing, the left node was split into two nodes according to log10 LB. The node with log10 LB less than 3.05 was further split on log10 MS to identify a group with very low prevalence; this illustrates tree models' recursive nature, since predictors can be re-used. Next, the node with log10 LB greater than or equal to 3.05 was split on fluoride. Finally, the node with log10 MS greater than or equal to 7.08 was split on fluoride. There were 6 terminal nodes (3 high prevalence and 3 low prevalence); 1 high-prevalence node was very high, while 2 low-risk nodes were very low.

Fig. 2 — Classification and regression tree model to predict 24-month caries in the primary dentition from salivary bacterial and chemical measurements 6 months prior. Mutans streptococci counts (log10 colony-forming units per milliliter, CFU/mL), lactobacilli counts (log10 CFU/mL), and fluoride (F) levels (parts per million) were used to produce 6 terminal nodes—3 with risk higher than overall prevalence and 3 with risk lower than overall prevalence.

Cumulative captured-response curves (Fig. 3) compare the performance of the 3 methods. These curves show the percent of children with caries identified (vertical axis) as the percent classified as high-risk (horizontal axis) increases—i.e., sensitivity on the vertical axis and the percent with a positive diagnostic test on the horizontal axis. Like ROC curves, this graph shows performance vs. a 45-degree diagonal reference line, which identifies the same percent of cases as the percent labeled high-risk. For example, if 30% of the children are labeled high-risk, 30% of the cases would be expected to be identified (i.e., at the same prevalence as the overall prevalence). As with ROC curves, an ideal prediction method would identify all 100% of the cases with a small percent of the children being labeled high-risk. Here, if 10% of the children were labeled high-risk, all 3 methods identified about 30% of the cases; if 20% were denoted high-risk, logistic regression (logit) identified about 35% of the cases, CART (tree) 40%, and ANN 50%; if 30% were classified high-risk, logistic regression and CART each identified 55% of the cases, while ANN found more than 60%. Corresponding values in a lift chart (not shown) would be about 3 for all 3 methods if 10% of children are labeled high-risk, but about 1.75 for logit and tree and 2 for ANN if 30% of children are labeled high-risk. To be clinically useful, studies probably would not want to denote more than 30-40% of children as high-risk (Hausen, 1997). ANN may provide some advantages over logit and CART.

Fig. 3 — Percent of children identified with caries as the percent classified as high-risk changes for the different KDD models. Cumulative captured-response graph comparing baseline, 5-3-1 artificial neural network (ANN), logistic regression (logit), and classification and regression tree model performance in identifying children with caries in the primary dentition (% captured response) vs. the percent labeled as high-risk. Percent captured response is sensitivity, while percent high-risk is the percent testing positive.
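The evaluation quantities discussed above (Brier score, AUC/c index, and the captured response at a given percent labeled high-risk) can be computed directly. This sketch uses tiny invented outcome and probability vectors, assuming scikit-learn's metric functions; the helper `captured_response` is a hypothetical name for illustration.

```python
# Sketch: Brier score (MSE of probabilities), ROC AUC (c index), and the
# cumulative captured response at a chosen percent-positive cut-off.
import numpy as np
from sklearn.metrics import brier_score_loss, roc_auc_score

# Invented hold-out outcomes (1 = caries) and predicted probabilities.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])
y_prob = np.array([0.10, 0.20, 0.15, 0.50, 0.25, 0.40, 0.35, 0.80, 0.60, 0.45])

brier = brier_score_loss(y_true, y_prob)  # mean squared error of probabilities
auc = roc_auc_score(y_true, y_prob)       # concordance (c index)

def captured_response(y_true, y_prob, pct_high_risk):
    """Fraction of all cases found among the top pct_high_risk scorers."""
    k = int(round(pct_high_risk * len(y_prob)))
    top = np.argsort(y_prob)[::-1][:k]    # highest predicted risks first
    return y_true[top].sum() / y_true.sum()

captured_30 = captured_response(y_true, y_prob, 0.30)  # top 30% labeled high-risk
```

The lift at that cut-off is simply the captured response divided by the percent labeled high-risk (here 2/3 divided by 0.30, about 2.2), which is the quantity a lift chart plots against the percent testing positive.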
ROC curves (not shown) yield patterns very similar to those in Fig. 3; however, Fig. 3 provides a direct interpretation for the percent of children identified as high-risk, which may be more useful to public health and policy planners. Although ANN weights do not provide direct interpretation (e.g., ORs), the predicted probability from an ANN model can be categorized (binned), and the distribution of predictors in each category can be graphed. Fig. 4 shows the distribution of standardized log10 MS (standard normal distribution) with boxplots for quintiles of ANN model predicted probability. The lowest quintile (lowest predicted chance of caries in the primary dentition) has the lowest log10 MS levels. CV(5) results showed extremely similar root MSE values for the 3 methods: 0.365 for logit, 0.363 for CART, and 0.362 for ANN. AUC (c index) from ROC curves differed somewhat: 0.553 for CART, 0.680 for logit, and 0.707 for ANN. The c index is the probability that a randomly chosen child with caries receives a higher predicted risk than a randomly chosen child without caries.

Discussion

Limitations of the example presented include the relatively small number of predictors utilized. Moreover, other factors potentially related to caries, such as salivary flow rate and pH, were not included: sufficient salivary flow rate was an inclusion criterion for the study, and buffering capacity (pH) was not an important predictor in earlier work for the precursor study. However, investigators thought that the relationships between predictors and response might be non-linear and include interactions (earlier analyses showed interactions between bacterial counts and salivary chemistry measures). Additionally, the logistic regression models considered did not include interaction or non-linear terms, which might have produced logistic regression models approaching the predictive accuracy of artificial neural networks.
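The quintile-binning visualization described above (predicted probabilities cut into fifths, with the distribution of a standardized predictor summarized per bin) can be sketched as follows. This assumes pandas; the predictor and predicted probabilities are synthetic stand-ins, not the ANN output from the study.

```python
# Sketch of the Fig. 4-style view: bin predicted probabilities into quintiles
# and summarize a standardized predictor within each bin. Data are synthetic.
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n = 500
log10_ms = rng.normal(6.0, 1.0, n)               # hypothetical predictor
pred_prob = 1 / (1 + np.exp(-(log10_ms - 6.0)))  # stand-in for ANN output

df = pd.DataFrame({
    # Standardize: center at the mean, scale by the standard deviation.
    "ms_std": (log10_ms - log10_ms.mean()) / log10_ms.std(),
    "prob": pred_prob,
})
df["quintile"] = pd.qcut(df["prob"], q=5, labels=[1, 2, 3, 4, 5])
medians = df.groupby("quintile", observed=True)["ms_std"].median()
# df.boxplot(column="ms_std", by="quintile") would draw the boxplot version.
```

Because the invented probability rises with the predictor, the median standardized count climbs across quintiles, mirroring the pattern the paper reports for log10 MS.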
Boosting (re-weighting to emphasize misclassifications) or bootstrap aggregation (bagging) could have improved the performance of the tree models.

Conclusions

Knowledge discovery and data mining (KDD) is not a panacea but rather a process with useful tools; KDD does not obviate the need for careful monitoring of data quality and study design issues. Multiple methods should be used to assess sensitivity to any one particular method; prediction results from the various methods should be compared with receiver operating characteristic (ROC) curves, cumulative captured-response curves, or lift charts. Care should be taken to avoid common mistakes made with artificial neural networks (ANNs) (e.g., Schwarzer et al., 2000). Validation (internal and external) is essential: "The major cause of unreliable models is overfitting the data" (Harrell, 2001, p. 249). Graphic displays can greatly help interpretation and demystify the "black box" nature of some KDD methods, such as ANNs. KDD methods may provide advantages over traditional statistical methods in dental data.

Fig. 4 — Distribution of standardized log10 mutans streptococci (MS) counts for 5-3-1 artificial neural network (ANN) model predicted probability quintiles. Predicted probability of caries in the primary dentition is split into 5 equal-sized groups (quintiles). Log10 MS counts (standardized to a standard normal distribution by centering at the mean and scaling by the standard deviation) were much lower in the first quintile (i.e., the 20% of children who had the lowest predicted probability of caries in their primary dentition). Thus, even though ANN model weights are not interpretable for individual input (predictor) variables, visualization techniques can show how inputs relate to the predicted output (response).

Acknowledgments

The author is grateful to Dr. John Featherstone for his insights about caries risk, to Dr. Jin Whan Jung for his inspiration to utilize KDD tools, to Dr. Ronald Billings for generously providing the Rochester caries study data, and to Dr. Jane Weintraub for her helpful suggestions and comments on an earlier draft of the paper. Any ambiguities, omissions, or errors that remain are solely my own. This research was supported in part by cooperative agreement US DHHS/NIH/NIDCR, NCMHD U54DE14251-01. The Rochester caries study was performed with support from US DHHS/NIH/NIDCR R01DE08946 (R.J. Billings, Principal Investigator).

Appendix: Glossary of Knowledge Discovery and Data Mining Methods and Related Terms (italicized words are cross-referenced)

Artificial neural network (ANN) model — multilayer non-linear "black box" mathematical model to predict output from inputs

Bagging (bootstrap aggregation) — ensemble tree model method to reduce misclassification error using bootstrap (with replacement) resampling

Boosting (e.g., adaptive resampling and combining [ARCing] or adaptive boosting [AdaBoost]) — ensemble tree model method to reduce misclassification error using increased weights for misclassified observations to allow for better prediction in subsequent trees on those records

Bootstrap — drawing (resampling) a large number (e.g., 500 to 10,000) of new sets of data with the original sample size (n) from the original data with replacement and re-analyzing those bootstrap resamples to simulate variability and assess robustness

Classification and regression tree (CART) model — recursive partitioning method (re-assessing all inputs at each stage) to split the data into 2 groups at each stage based on the inputs that minimize the output class misclassification error

Cross-validation or K-fold cross-validation [CV(K)] — randomly dividing the data into K mutually exclusive and exhaustive subsets (e.g., 5 or 10), re-analyzing each subset, and aggregating across the K subsets to estimate robustness

Ensemble tree model or committee of trees — classifier using majority vote (modal) class
assignment or mean predicted probability from a group of tree models grown under different conditions to reduce classification error

Hierarchical clustering — groups records together based on closeness/similarity, starting with each record in its own cluster and ending with all records in one cluster (or vice versa), allowing the reader to choose a classification from those in between; formed in either a step-down (divisive) or step-up (agglomerative) direction

Input — predictor, explanatory, or independent variable

Jackknife — assessing analysis robustness by leaving out one observation (i.e., sample size is n − 1), analyzing the data again, repeating n times until each observation has been left out once, and then comparing with the original analysis of all the data; equivalent to n-fold cross-validation [CV(n)]

K-means clustering — iteratively determines K groups based on closeness/similarity to the group center (mean) and minimal within-group variability

k-nearest neighbor (knn) clustering — iteratively identifies groups based on the k closest neighbors to each point, assigning the modal or majority class among the k neighbors

Multivariate adaptive regression splines (MARS) — iterative modeling method using combinations of linear basis functions of inputs (predictors) to fit non-linear relationships smoothly

Output — response, outcome, or dependent variable

Random forests — ensemble tree method using randomly selected subsets of inputs while also providing interpretability through summary measures of input variable importance

Regression model (linear or logistic) — classic statistical model to predict output value or probability (linearly or log-linearly) from inputs

Split sample — randomly grouping data into training and testing samples, stratifying on the output, building prediction models with the training sample, and testing the resultant model with the hold-out testing sample to provide an unbiased error estimate and assess robustness

Supervised learning — modeling in which the output class is known

Support vector machines (SVM) — computationally intensive "black box" method to find the non-linear multidimensional boundary (hyperplane), transformed as a linear hyperplane, that best splits classes

Unsupervised learning — modeling in which the output class is not known; data are clustered according to similar input variables

References

Altman DG, Royston P (2000). What do we mean by validating a prognostic model? Stat Med 19:453-473.

Amariti ML, Restori M, De Ferrari F, Paganelli C, Faglia R, Legnani G (2000). A histological procedure to determine dental age. J Forensic Odontostomatol 18:1-5.

Billings RJ, Gansky SA, Mundorff-Shrestha SA, Leverett DH, Featherstone JDB (2003). Pathological and protective caries risk factors in a children's longitudinal study (abstract). Caries Res 37:277-278.

Brickley MR, Shepherd JP (1996). Performance of a neural network trained to make third-molar treatment-planning decisions. Med Decis Making 16:153-160.

Brickley MR, Shepherd JP (1997). Comparisons of the abilities of a neural network and three consultant oral surgeons to make decisions about third molar removal. Br Dent J 182(2):59-63.

Brickley MR, Shepherd JP, Armstrong RA (1998). Neural networks: a new technique for development of decision support systems in dentistry. J Dent 26:305-309.

Fayyad UM, Piatetsky-Shapiro G, Smyth P (1996). From data mining to knowledge discovery: an overview. In: Advances in knowledge discovery and data mining. Fayyad UM, Piatetsky-Shapiro G, Smyth P, Uthurusamy R, editors. Menlo Park, CA: AAAI Press, pp. 1-36.

Goodey RD, Brickley MR, Hill CM, Shepherd JP (2000). A controlled trial of three referral methods for patients with third molars. Br Dent J 189:556-560.

Harrell FE Jr (2001). Regression modeling strategies with applications to linear models, logistic regression and survival analysis. New York: Springer Verlag.

Hastie T, Tibshirani R, Friedman JH (2001).
The elements of statistical learning: data mining, inference, and prediction. New York: Springer Verlag.

Hausen H (1997). Caries prediction—state of the art. Community Dent Oral Epidemiol 25:87-96.

Kattan MW, Hess KR, Beck JR (1998). Experiments to determine whether recursive partitioning (CART) or an artificial neural network overcomes theoretical limitations of Cox proportional hazards regression. Comput Biomed Res 31:363-373.

Leverett DH, Featherstone JDB, Proskin HM, Adair SM, Eisenberg AD, Mundorff-Shrestha SA, et al. (1993a). Caries risk assessment by a cross-sectional discrimination model. J Dent Res 72:529-537.

Leverett DH, Proskin HM, Featherstone JDB, Adair SM, Eisenberg AD, Mundorff-Shrestha SA, et al. (1993b). Caries risk assessment in a longitudinal discrimination study. J Dent Res 72:538-543.

Lux CJ, Stellzig A, Volz D, Jager W, Richardson A, Komposch G (1998). A neural network approach to the analysis and classification of human craniofacial growth. Growth Dev Aging 62(3):95-106.

Nilsson T, Lundgren T, Odelius H, Sillen R, Noren JG (1996). A computerized induction analysis of possible co-variations among different elements in human tooth enamel. Artif Intell Med 8:515-526.

Page RC, Krall EA, Martin J, Mancl L, Garcia RI (2002). Validity and accuracy of a risk calculator in predicting periodontal disease. J Am Dent Assoc 133:569-576.

Ripley BD (1996). Pattern recognition and neural networks. New York: Cambridge University Press.

Schwarzer G, Vach W, Schumacher M (2000). On the misuses of artificial neural networks for prognostic and diagnostic classification in oncology. Stat Med 19:541-561.

Speight PM, Elliott AE, Jullien JA, Downer MC, Zakrzewska JM (1995). The use of artificial intelligence to identify people at risk of oral cancer and precancer. Br Dent J 179(10):382-387.

Stewart PW, Stamm JW (1991).
Classification tree prediction models for dental caries from clinical, microbiological, and interview data. J Dent Res 70:1239-1251.

Waldrop MM (2001). Data mining. MIT Technology Review—Ten emerging technologies that will change the world. January/February. Internet Web site accessed 23 March 2004. <http://www.technologyreview.com/articles/mag_toc_jan01.asp>