Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
FROM DATA MINING TO KNOWLEDGE MINING: SYMBOLIC DATA ANALYSIS AND THE SODAS SOFTWARE E. Diday University of Paris IX Dauphine and INRIA AIM FROM HUDGE DATA IN AN ECONOMIC WAY -Extract new knowledge -Summarize -Concatenate -Solve confidentiality -Explain correlation HOW? By working on HIGHER LEVEL UNITS as “classes”, “categories” or ”concepts“ necessary described by more complex data extending Data Mining to Knowledge Mining. 1 OUTLINE 1) THE MAIN IDEA: FIRST AND SECOND ORDER OBJECTS. 2)THE INPUT OF A SYMBOLIC DATA ANALYSIS: SYMBOLIC DATA TABLE. 3) MAIN SOURCES OF SYMBOLIC DATA: FROM DATA BASES, FROM CATEGORICAL VARIABLES. 4) MAIN OUTPUT OF SYMBOLIC DATA ANALYSIS ALGORITHMS: SYMBOLIC DESCRIPTIONS AND SYMBOLIC OBJECTS. 5) THE MAIN STEPS OF A SDA. 6) SOME TOOLS OF SYMBOLIC DATA ANALYSIS 7) SYNTHETICAL VIEW OF THE SODAS SOFTWARE THE M AIN IDEA: FIRST AND SECOND ORDER OBJECTS THE ARISTOTLE ORGANON (IV B.C.) CLEARLY DISTINGUISHES "FIRST ORDER OBJECTS" (AS THIS HORSE OR THIS MAN) CONSIDERED AS A UNIT DESCRIBING AN INDIVIDUAL OF THE W ORLD , FROM "SECOND ORDER OBJECTS" (AS A HORSE OR A MAN) ALSO TAKEN AS A UNIT DESCRIBING A CLASS OF INDIVIDUALS. 2 Class, Category, Concept • A CLASS is a set of units (birds of an island) • A CATEGORY is a value of a categorical variable (“swallows” is a value of the variable: “species of bird”) • A CONCEPT is defined by an intent and an extent • Intent: characteristic properties of the swallows • Extent: the set of swallows •A concept is modeled by a “symbolic object” From standard units to concepts the statistic is not the same! On an island there are 400 swallows, 100 ostrich, 100 penguins : Standard Data Table Symbolic Data Table Oiseau Species Flying Size (cm) 1 Penguin No 80 2 swallows Yes 70 600 Ostrich No 125 Species Flying Color Size Migr Swallows yes 0.3b,0.7grey [60, 85] Yes Ostrich No 0.1noir,0.9g [85, 160] No Penguin No 0.5n,0.5gris [70, 95] Yes The species is a concept , it becomes the new statistical unit The variation due to the individuals is expressed by intervals or histograms Birds (individuals) The statistic of individuals is different from the statistic of concepts! 400 Species A « conceptual » variable is added: it applies to concepts (concepts) 2 1 200 Flying No flying Flying No flying 3 From Database to Concepts Individuals Columns: standard variables Relational Data Base QUERY Rows : individuals Individuals description Concepts Description Columns: symbolic variables Rows : concepts Symbolic Data Table Individuals versus Concepts Classical observations (Individuals) Symbolic observations (Concepts) Town (Paris, Lyon, ...) Region (North, South, ...) Bird (this cardinal, ...) Species (Cardinals, Wrens, ...) Football Player (Zidane, ...) Team (Spane, ...) Photo (taken today, ...) Types of Photos (Family, Mountains, ...) Item Sold (hammer, phone, ...) Types of Item (Electrical Goods, Auto,...) WEB trace (Jones, temperature today,.) WEB Usages (Medical Info, Weather, ...) Subscription (to Time, ...) Magazines Reservation of John to Fly AF Paris Ljubjana All Reservation to Fly AF Paris Ljubjana July 2006 Fly AF Paris Ljubjana N° 205 5 May 06 Fly AF Paris Ljubjana N° 205 in May 2006 Client Socio professional category Water consumer Type of consuming Geographical Position Type of environment 4 FOUR PRINCIPLES 1) ONLY TWO LEVELS OF UNITS: First level: Individuals Second level: concepts 2) THE CONCEPTS CAN BE CONSIDERED AS NEW INDIVIDUALS OF HIGHER LEVEL. 3) A CONCEPT IS DESCRIBED BY USING THE DESCRIPTION OF A CLASS OF INDIVIDUALS OF ITS EXTENT. 4) THE DESCRIPTION OF A CONCEPT MUST EXPRESSES THE VARIATION OF THE INDIVIDUALS OF ITS EXTENT FROM FIRST ORDER OBJECTS TO SECOND ORDER OBJECTS IN OFFICIAL STATISTICS Units Classes Case n° Region Bedroom DiningLiving Socio-Econ Group 11401 NorthernMetropolitan NorthernMetropolitan NorthernMetropolitan 2 1 1 2 1 3 1 3 3 12315 East-Anglia 12316 East-Anglia 14524 Greater-London 1 2 1 3 2 2 3 1 3 11402 11403 Descr. Var. of the Units 5 FROM FIRST ORDER OBJECTS TO SECOND ORDER OBJECTS IN OFFICIAL STATISTICS Classes Region Descriptive variable of the units Bedroom Dining-Liv Socio-Ec gr 2 1 1 NorthernMetropolitan Northern2 Metropolitan Northern1 Metropolitan 1 3 3 3 East-anglia East-anglia East-anglia 3 2 2 3 1 3 1 2 1 Classes Region Descriptive variables of the classes Bedroom Dining-Liv Socio-Ec gr (2\3) 2, (1\3) 1 (2\3) 1, (1\3) 3 (1\3) 1, (2\3) 3 NorthernMetropolitan East-anglia (2\3) 1, (1\3) 2 (2\3) 2, (1\3) 3 (2\3) 3, (1\3) 1 Keeping rules after the generalization process Individuals Concepts I1 C1 a Y1 2 Y2 I2 C1 b 1 I3 C1 c 2 I4 C2 b 1 I5 C2 b 3 I6 C2 a 2 Initial classical data table where individuals are described by three variables Y1 Y2 C1 {a, b, c } {1, 2} C2 {a, b } {1, 2, 3} Induced Symbolic Data Table with background knowledge defined by two rules: [Y1 = a] ⇒ [Y2 = 2] and [Y2 = 1] ⇒ [Y1 = b] 6 Keeping correlation after the generalization process Player Team Age Weight Height Nationality Fernandez Spain 29 85 1. 84 Spanish ____ World Cup yes PRodriguez Spain 23 90 1.92 Brazilian yes Mballo France 25 82 1.90 African yes Zedane France 27 78 1.85 French yes _ ____ _ ___ ___ ____ ____ _____ Renie XX 23 91 1. 75 Spanish no Olar XX 29 84 1.84 Brazilian no Mirce XXX 24 83 1.83 African no Rumbum XXX 30 81 1.81 French no ____ Classical data table describing players by numerical and categorical variables. World Cup Age Weight Height Cor(Weight, Height) yes [21, 26] [78, 90] [1.85, 1.98] 0.85 no [23, 30] [81, 92] [1.75, 1.85] 0.65 Symbolic Data Table obtained by generalisation for the variables age, weight and height and keeping back the correlations between the Weight and the Height: the correlation is higher for the world cup players. Schweitzer (1984): "Distributions are the numbers of the future” MORE GENERALY, WHAT IS THE INPUT OF A SYMBOLIC DATA ANALYSIS? SYMBOLIC DATA TABLE + BACKGROUND KNOWLEDGE SYMBOLIC DATA TABLE : THE CELLS CAN CONTAIN: - SEVERAL QUALITATIVE OR QUANTITATIVE WEIGHTED VALUES - INTERVALS - HISTOGRAMS 7 SOME BACKGROUND KNOWLEDGE CAN BE GIVEN VARIABLES CAN BE: -TAXONOMIC: « A SOCIO-ECONOMIC GROUP IS CONSIDERED TO BE "SELF-EMPLOYED" IF IT IS "PROFESSIONAL SELF-EMPLOYED" OR "OWN ACCOUNT NON-PROFESSIONAL". -HIERARCHICALLY DEPENDENT : THE VARIABLE: “DOES THE COMPANY HAS COMPUTERS? “ AND THE VARIABLE: “ KIND OF COMPUTER” ARE HIERARCHICALLY LINKED. - RULES KEEPING LOGICAL DEPENDENCIES: « IF AGE(W) IS LESS THAN 2 MONTHS THEN HEIGHT(W) IS LESS THAN 10 ». A TAXONOMIC VARIABLE Compagnies ________ Region Comp1 Paris Comp2 London Comp3 France Comp4 England Comp5 Glasgow Comp6 Europe _________ The taxonomic Europe tree associated with the variable « region ». United King Dom Scotland England London Glasgow Paris France Region Predecessor Paris France London England France Europe England Europe Glasgow Scotland Europe Europe Database relation 8 EXAMPLE OF SYMBOLIC DATA TABLE PRODUCT WEIGHT TOWN COLOUR PRODUCT 1 [5, 9] {Londres} {0.1red, 0.9white} PRODUCT 2 [ 3,8] {Paris, Londres } PRODUCT 3 PRODUCT 4 { 0.3 red, 0.7 green} [2,3] THE CELLS CAN CONTAIN: -SEVERAL QUALITATIVE OR QUANTITATIVE WEIGHTED VALUES -INTERVALS - HISTOGRAMS SYMBOLIC DATA ANALYSIS: 3 STEPS FIRST STEP: From individuals to categories SECOND STEP: From Categories to Concepts described by a Symbolic Data Table. THIRD STEP: Extract new knowledge from this symbolic data completed by some background knowledge. CONSEQUENCE: need of extending Standard Statistics, Exploratory Data Analysis, and Data Mining to Symbolic Data Tables. This is the aim of the SODAS software. 9 From Standard Versus Symbolic Data and methods THE DATA Standard Data • Numerical: Points de R (nbres réels) • Categorical Ordinal : Points of N (naturel numbers) • Categorical non ordered : categorical values Symbolic Data - Diagrams, Histograms ou Distributions - Sequences of numbers or categories - Sequences of weighted categories - Functions, curves - Rules - Taxonomies - Graphes Methods • Stat descriptive (Histos, Corrélations , biplots) •Typologie (hiérarchies, pyramids, K-means, Nuées dynamiques, Kohonen maps ,…) •Mixture decoimposition •Décision trees, boosting, baging, … •Dissimilarities and their Représentation •Rules inference and causal trees • Visualisation (points) •factorial Analysis (ACP, AFC, …) •Classical Regression, PLS •Neural Network, VSM (Vector Support Machine), Etc. • Galois Lattices (binary data) Any standard method can be generalized to concepts described by symbolic data (PLUS) + Specific Methods of Symbolic Data Analysis Class description, Haussdorf dissimilarities, symbolic prototypes, modeling concepts by symbolic objects… SOURCES OF SYMBOLIC DATA .FROM CATEGORICAL VARIABLES: - GIVEN (AS « TYPE OF EMPLOYMENT ») - OBTAINED BY CLUSTERING. .FROM DATA BASES: QUERY CREATING A NEW CATEGORICAL VARIABLE: cartesian prod .FROM EXPERT: NATIVE SYMBOLIC DATA: Scenario of road accidents, species of insects .FROM CONFIDENTIAL DATA IN ORDER TO HIDE THE INITIAL DATA BY LESS ACCURACY 10 .FROM STOCHASTIC DATA TABLE: THE PROBABILITY DISTRIBUTION , THE HISTOGRAM THE PERCENTILES OR THE RANGE OF ANY RANDOM VARIABLE ASSOCIATED TO EACH CELL OF A DATA TABLE EXAMPLE Tom M athem atics Physics Litterature XM XP XL Paul X M is the random variable which associates to each exam of TO M his m ark in m athem atics. From X M several kinds of sym bolic objects can be defined by using in each cell: - Probability distr. - H istogram s - Inter-quartile intervals .FROM TIME SERIES - IN DESCRIBING INTERVALS OF TIME: ( the variation of the values each week) - IN DESCRIBING A TIME SERIES BY THE HISTOGRAM OF ITS VALUES. 100 80 60 40 20 0 N or d E st O u e st Nord E st 1er tr im . 2e tr im . 3e tr im . 4e tr im . 11 FROM FUZZY DATA TO SYMBOLIC DATA Paul Jef Jim Bill height 1.60 1.85 0.65 1.95 weight 45 80 30 90 hair yellow yellow black black From Numerical to Fuzzy Data small average high 1 Initial Data Paul Jef Jim Bill small 0.70 0 0.50 0 height average 0.30 0.50 0 0 0.5 weight hair high 0 0.50 0 0.48 45 80 30 90 yellow yellow black black 0 1.50 1.80 1.90 JEF 0.65 1.60 1.85 1.95 Fuzzy Data small {Paul, Jef } {Jim, Bill} [0, 0.70] [0, 0.50] height average weight hair high [0.30, 0.50] 0 [0, 0.50] [45, 80] yellow [0, 0.48] [30, 90] black Symbolic Data MAIN OUTPUT OF SYMBOLIC DATA ANALYSIS ALGORITHMS: SYMBOLIC DESCRIPTIONS SYMBOLIC OBJECTS. SYMBOLIC DESCRIPTIONS Description AGE D1 {12,20,28} SPC {employee,worker} D2 {teacher,countryman} [5, 33] 12 CONCEPTS ARE MODELED BY SYMBOLIC OBJECTS WHATS A CONCEPT? A CONCEPT IS DEFINED BY AN * INTENT : ITS CHARACTERISTIC PROPERTIES * EXTENT:THE SET OF INDIVIDUALS WHICH SATISFY THESE PROPERTIES LIKE OUR MIND, SYMBOLIC OBJECTS MODEL CONCEPTS BY AN INTENT AND A WAY OF COMPUTING THE EXTENT SYMBOLIC OBJECT It’s an animal(w) = 0.99 yes dC d R y w S = (a, R, dC) a(w) = [y(w)RdC] 13 TWO KINDS OF SYMBOLIC OBJECTS BOOLEAN SYMBOLIC OBJECTS S = (a, R, d1) d1= {12, 20 ,28} x {employee, worker}] R = (⊆ ⊆ , ⊆ ), a(w) = [age(w) ⊆ {12, 20 ,28}] ∧ [SPC(w) ⊆ {employee, worker}] a(w) ∈ {TRUE, FALSE}. THE MEMBERSHIP FUNCTION« a » MODAL CASE S = (a, R, d): a(w) = [age(w) R1 {(0.2)12, (0.8) [20 ,28]}] ∧ [SPC(w) R2 {(0.4)employee, (0.6)worker}] a(w) ∈ [0,1]. First approach: simple or flexible matching R= (R1, R2 ): r Ri q = ∑j=1 ,k r j q j e (r j - min (r j, q j)) . Second approach: Probabilistic: if dependencies, copulas, derivation of the joint distribution, transforming the joint density in [0,1]. 14 EXTENT OF A SYMBOLIC OBJECT S: BOOLEAN CASE: EXT(s) = {W ∈ Ω / a(W) = TRUE}. MODAL CASE EXTα (S)= EXTENTα (a) = {W ∈ Ω / a( W ) ≥ α}. THE LEARNING PROCESS OF SYMBOLIC OBJECTS REAL WORLD MODELED WORLD DESCRIPTIONS INDIVIDUALS xw x x x Ω Ext(s/Ω Ω) x dw R x x x x T dC SYMBOLIC OBJECTS CONCEPTS x x x x s = (a, R,dC) Learning symbolic objects modeling concepts by reducing two kinds of errors: the individuals who are in the extent of the symbolic objects but not in the extent of the concept, the individuals which are not in the extent of the symbolic object but are in the extent of the concept. 15 THE FOUR CASES OF STATISTICAL OR DATA MINING ANALYSIS. Output Classical Analysis Input Case 1 Classical Data Case 3 Symbolic Data Symbolic Analysis Case 2 Case 4 Comparing standard data versus symbolic data WHY SYMBOLIC DATA ARE NOT STANDARD DATA? Symbolic Data Table Quality of Goal Weight Height Nationality Very Good [80, 95] [1.80, 1.95] {0.7 Eur, 0.3 Afr} Associated standard data Quality of Goal Weight Min Weight Max Height Min Height Max Eur Afr Very Good 80 95 1.80 1.95 0. 7 0.3 Consequences: . Initial variables are lost . Variation are lost 16 WHY IT IS ENHANCING TO WORK ON SYMBOLIC DATA THAN ON THEIR DECOMPOSITION IN STANDARD DATA TABLE? Town Nb of pupils Kind Level Paris [320, 450] (100%)Public {1, 3} Lyon [200, 380] (50%)Public , (50%)Private {2, 3} Toulouse [210, 290] (50%)Public , (50%)Private {1, 2} Town Min Nb of pupils Max Nb of pupils Public Private Level 1 Level 2 Level 3 1 Paris 320 450 100 0 1 0 Lyon 200 380 50 50 0 1 1 Toulouse 210 290 50 50 1 1 0 LOST INFORMATION BY STANDARD ENCODING OF SYMBOLIC DATA Example 1: in decision tree By standard encoding of symbolic data, the variable size desapears as it is replaced by the « Size Min » and « Size max » The symbolic approach allows the use of the variable « size » itself and not the variables « Size Min » and « Size max » Height ≥1.8O Very good goal players have generally more than 1,80m. Very good goal players Explanatory Variable < 1.80 Mean or bad goal players Here the Height is not a minimum height or a maximum height or a mean, but the value of the age which discriminate the very good players from the others. 17 Symbolic Histogram It is not a minimum or a maximum histogram! 26 33 [ [ ] ] 22 20 27 35 30 25 [ ] 27 36 Sum of the portion of the intervals Sum of the portion of the intervals Lost of information by using standard coding of symbolic data in Principal component analysis By standard coding each concept is represented by a point By symbolic coding Each concept is represented by a surface which expresses the variation of the individuals of the extent of the concept Each concept can be described by a conjunction of properties in term of the new variables (the factors) . Height, weight… Very Weak Height, weight… Weak Very weak Mean Very Good x x Means x Very Good Good Good x x Number of goals Number of goals … Weak Classical PCM Symbolic PCM 18 AN OVERVIEW ON THE SODAS SOFTWARE THE SODAS 2 SOFTWARE FROM ASSO 19 THE MAIN STEPS FOR A SYMBOLIC DATA ANALYSIS IN SODAS . PUT THE DATA IN A RELATIONAL DATA BASE (Oracle, Acces, ...) .DEFINE A CONTEXT BY GIVING * The Units (Individuals, Households,...) * The Classes (Regions, Socioeconomics groups,...) * The descriptive variables of the units . BUILD A SYMBOLIC DATA TABLE WHERE * The units are the preceding classes * The descriptions of each class is obtained by a generalization process applied to its members APPLY SYMBOLIC DATA ANALYSIS TOOLS - Correlation, Mean, Mean Square - Histogram of a symbolic variable - Dissimilarities between symbolic descriptions - Clustering of symbolic descriptions - Principal component Analysis - Decision Tree - Graphical visualisation of Symbolic Objects 20 SODAS Software Menu Sds file Methods Report Graphs Graph Chaining Concatenation of summarized data from several populations SODAS Join two or more sets of SO based on different underlyingpopulations household DataBases school 25 SO 25 SO Concatenation 25 SO 21 DISSIMILARITY MEASURES A straightforward extension of similarity measures for classical data matrices with nominal variables. Agreement Disagreement Total Agreement Disagreement Total α=µ µ(Aj∩Bj) χ=µ µ(c(Aj)∩ ∩Bj) µ(Bj) β=µ µ(Aj∩c(Bj)) µ(Aj) δ=µ µ(c(Aj)∩ ∩c(Bj)) µ(c(Aj)) µ(c(Bj)) µ(Yi) where µ(Vj) is either the cardinality of the set Vj (if Yj is a nominal variable) or the length of the interval Vj (if Yj is a continuous variable). DE CARVALHO’S DISSIMILARITY MEASURES Five different similarity measures si, i = 1, ..., 5, are defined: si C o m p a riso n F u n c tio n R ange P ro p e rty s1 s2 s3 s4 S5 α/ (α+ β + χ) 2 α / (2 α + β + χ ) α/ (α+ 2 β + 2 χ) ½ [ α / ( α + β )+ α / ( α + χ )] α / [( α + β ) ( α + χ )] ½ [0 ,1 ] [0 ,1 ] [0 ,1 ] [0 ,1 ] [0 ,1 ] m e tric s e m i m e tric m e tric s e m i m e tric s e m i m e tric The corresponding dissimilarities are di = 1 − si. The di are aggregated on p variables by the generalised Minkowski metric, thus obtaining: SO_1 p d i ( a ,b ) = q ∑ [w d i( A j, B j) ] q j 1≤ i≤ 5 j =1 22 Graphical Representation of a dissimilarity SOE Software Graphic Representation of the symbolic description of a concept Figure21: Examples of 2D and 3D ZOOM STAR: each star describes a concept 23 Pyramid Each level is a class of concepts modeled by a symbolic description. PRODUCTS WEIGHT COST AGE PROFIT PRODUCT1 [2,4] [3,5] [4,6] [0,3] PRODUCT2 [4,5] [3,4] [1,6] [2,7] PRODUCT3 [1,6] [2,7] [5,8] [6,9] AGE PRODUCT1 PRODUCT2 6* 4* 3 * * 2 5 * COST * 4 WEIGHT 24 Symbolic Principal Component Analysis xx x xx Symbolic correlation x Principal Component Analysis 25 SYMBOLIC DESCRIPTION OF CLASSES THE SODAS 2 SOFTWARE FROM ASSO SS YY MMB B OO L IL C IC DDAATTAA mi p l o t HH i s it so t go r agmr ,a B S t a r s , G r a p h ic s D eecci si isoino ntt rreeee D D is s im ila r it ie s D is c r im in a t io n C lu s t e r in g F aFc at co tr oi ar li aAl n a l y s i s R eFgar ce tsos ri oi an l A n a l y s i s T TS S S Y M BSOYLMI B C. OOB BJ JE EC C S o ff b b eesst t S eel el ec tc ito ino n o SSy y mm b ob c O b bj jeecc tt ss .l i O NNe e wSwy Sm y mbb o l i c D a t a T a b le S y mG br oa lpi ch i cD Ssay tm oafb A n a ly s is E x p o r t a t io n t o a new D atab ase S y m b o lic O b j e c t s P r o p a g a t io n 26 NEW PROBLEMS APPEAR .QUALITY, ROBUSTNESS RELIABILITY OF THE APPROXIMATION OF A CONCEPT BY A SYMBOLIC OBJECT, .THE SYMBOLIC DESCRIPTION OF A CLASS, .THE CONSENSUS BETWEEN SYMBOLIC DESCRIPTIONS ETC.. MUCH HAS TO BE DONE: SYMBOLIC .REGRESSION, FACTORIAL ANAL. .MULTIDIMENSIONAL SCALING, .MIXTURE DECOMPOSITION, .NEURAL NETWORK, KOHONEN MAP .CONCEPT PROPAGATION between Databases…... Some recent advances: - Mixture decomposition of Distributions of distributions (by Copulas, Dirichlet and Kraft stochastic process) - Stochastic Symbolic Conceptual lattices using capacity theory - Symbolic class description -Symbolic Regression -NEXT FUTUR -- Spatial symbolic clustering by pyramids - Symbolic time series. - Consensus between different description of the same set of units 27 AIM ATTAINED FROM HUDGE DATA BASES IN AN ECONOMIC WAY WE ARE ABLE TO: -Extract new knowledge -Summarize -Concatenate -Solve confidentiality -Explain Correlation HOW? By working on HIGHER LEVEL UNITS extending Data Mining to Knowledge Mining. CONCLUSION Symbolic Data Analysis is an extension of standard data analysis therefore First principle: any Symbolic Data Mining method must have as a special case method of Data Mining on standard data. Second principle : the output must be a symbolic description or a symbolic object New problems appear as the quality, robustness and reliability of the modelling of a concept by a symbolic object, the symbolic description of a class, the consensus between symbolic descriptions etc.. Due to the intensive development of the information technology the great chapters of the standard statistics will have to be think in these new terms. 28 References SPRINGER, 2000 : “Analysis of Symbolic Data” H.H., Bock, E. Diday, Editors . 450 pages. JASA (Journal of the American Statistical Association) “From the Statistic of Data to the Statistic of Knowledge: Symbolic Data Analysis” Billard, Diday June, 2003 . Electronic Journal of S. D. A.: ESDA E. Diday, R. Verde, Y. Lechevallier editors WILEY 2006: To be published SYMBOLIC DATA ANALYSIS AND THE SODAS SOFTWARE Diday, Noirhomme (editors) SYMBOLIC DATA ANALYSIS : conceptual statistics and data mining Billard , Diday Download SODAS with documentation and examples : www.ceremade.dauphine.fr then LISE, then SODAS 29