Extracting correlated sets using the chi-squared measure within n-ary relations: an implementation

A. Casali (1), C. Ernst (2), F. Gasnier (3), J. Stephan (2)
1: Université de la Méditerranée / LIF
2: École des Mines de St Étienne / CMP-GC
3: STMicroElectronics Rousset

8th European AEC/APC Conference - Dresden 2007

Motivations

The field of APC aims at highlighting correlations between production parameters. This study focuses on the device analysis of the principal trajectories impacting the yield.

The goal is to detect correlations between data measurements structured as n-ary relations and involving (at least) one target attribute. The method uses a levelwise data mining algorithm based on both the chi-squared and the support measures.

Methodology: a KDD approach

A complete data transformation, mining and interpretation model for correlation detection within data measurements.

Raw (Excel) Data Measurement Files → SELECTION → Selected File → PREPROCESSING → Preprocessed File → TRANSFORMATION → Transformed File → DATA MINING → Retrieved Patterns → INTERPRETATION → Report

Raw files: a vast number of numerical attributes (and often incomplete data), laid out as Attribute1, Attribute2, …, Target Attribute.

PREPROCESSING
Attribute removal. Criteria: attributes
- with too few distinct values
- having too many null values
- presenting duplicates (one is kept)
- with too small a standard deviation

TRANSFORMATION
- Normalization
- Interval discretization / Item encoding
- Elimination of attributes with no item having the required support

DATA MINING
IN  : ItemSet I, Fraction p%, Threshold mc (chi2), Threshold s (support), Target Attribute ta, Relation r
OUT : Set of minimal correlated patterns

    C2 := APrioriGen(I)                 // (2-pattern) candidates generation
    i := 2
    while Ci ≠ ∅ do
        Li := ∅
        for each X ∈ Ci do
            Build the contingency table of X
            if p% of the table's cells have a support ≥ s then
                if chi2(X) ≥ mc then Li := Li ∪ {X}
            endif
        end for
        Ci+1 := APrioriGen(Ci – Li)
        i := i + 1
    end while
    return ⋃i Li                        // limited to the patterns including one item of ta

INTERPRETATION
- Item decoding
- Presentation (processing) of correlations

Results

Retrieved Patterns

Item1   Item2   Item3   Item4   Chi2
…       …       …       …       …
3453    3489    -       -       6.29
964     1990    3489    -       15.96
1106    1990    3489    -       23.55
1767    1990    3489    -       15.75
1962    1990    3489    -       28.55
1990    2115    3489    -       46.57
…       …       …       …       …

Report

[Table: decoded patterns. Each item is shown as an attribute name (for example _4692_IMPT, _4690_ALIY, _9592_TRAN, PCTH) with its value interval (for example [0.3, 11.8], [328.5, 373.5]).]

Conclusions

This approach makes it possible for STMicroElectronics Rousset to highlight unknown correlations between various parameters, validated by electrical and/or physical analysis.

While the proposed mining method confirmed that levelwise algorithms do not provide results beyond four search levels, it proved its value for n-ary relations with a very large number of numerical attributes.

The study aims at supporting the development of effective R2R control loops.

Future Work

Current developments are focused on:
- the optimization of the procedure,
- the implementation of other search methods.

We plan to initiate a background procedure integrating different sets of methods, measurements and results.
→ Automatic generation of the most suitable result for each new analysis.

Acknowledgments

This work was initiated while the fourth author was at École des Mines de Saint-Étienne / CMP-GC, and was supported by Research Project "Rousset 2003-2008", financed by the Communauté du Pays d'Aix, Conseil Général des Bouches du Rhône and Conseil Régional Provence Alpes Côte d'Azur.
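
Illustration of the DATA MINING step

The levelwise search described in the DATA MINING step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: each row of the transformed relation is assumed to be a set of integer item codes, the contingency table of a k-pattern is assumed to have 2^k presence/absence cells, and the function names (chi2_and_support, apriori_gen, minimal_correlated_patterns) and the toy data are hypothetical.

```python
from itertools import combinations, product


def chi2_and_support(pattern, rows, s, p):
    """Return (chi2, supported) for `pattern` over `rows` (assumed 2^k-cell table)."""
    items = sorted(pattern)
    n = len(rows)
    # Observed count of every presence/absence combination of the items.
    observed = {cell: 0 for cell in product((True, False), repeat=len(items))}
    for row in rows:
        observed[tuple(item in row for item in items)] += 1
    # Marginal frequency of each item, used for the independence hypothesis.
    freq = [sum(item in row for row in rows) / n for item in items]
    chi2 = 0.0
    for cell, obs in observed.items():
        expected = float(n)
        for present, f in zip(cell, freq):
            expected *= f if present else 1.0 - f
        if expected > 0:  # a zero expectation implies a zero observation
            chi2 += (obs - expected) ** 2 / expected
    # Support test: at least a fraction p of the cells must hold >= s rows.
    cells_ok = sum(obs >= s for obs in observed.values())
    return chi2, cells_ok >= p * len(observed)


def apriori_gen(patterns):
    """Join k-patterns sharing k-1 items; keep a candidate only if all of
    its k-subsets belong to `patterns`."""
    patterns = set(patterns)
    candidates = set()
    for a, b in combinations(patterns, 2):
        union = a | b
        if len(union) == len(a) + 1 and all(
            frozenset(sub) in patterns for sub in combinations(union, len(a))
        ):
            candidates.add(union)
    return candidates


def minimal_correlated_patterns(items, rows, p, mc, s, target_items):
    """Levelwise search; returns (pattern, chi2) pairs touching the target."""
    candidates = apriori_gen(frozenset([i]) for i in items)  # 2-patterns
    found = []
    while candidates:
        level = set()
        for x in candidates:
            chi2, supported = chi2_and_support(x, rows, s, p)
            if supported and chi2 >= mc:
                level.add(x)
                found.append((x, chi2))
        # Next candidates are built only from the patterns not retained.
        candidates = apriori_gen(candidates - level)
    # Keep only the patterns including at least one item of the target attribute.
    return [(x, c) for x, c in found if x & target_items]


# Toy relation (hypothetical): items 1 and 2 always occur together,
# item 3 is independent of both.
rows = [{1, 2, 3}, {1, 2, 3}, {1, 2}, {1, 2}, {3}, {3}, set(), set()]
result = minimal_correlated_patterns(
    items={1, 2, 3}, rows=rows, p=0.5, mc=3.84, s=1, target_items={2}
)
```

On this toy relation the only retained pattern is {1, 2}. The candidate {1, 2, 3} is never generated, because one of its subsets was already retained at level 2, which is what keeps the returned patterns minimal. The threshold 3.84 is the usual 95% chi-squared cutoff for one degree of freedom, used here only as an example value for mc.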