Core Data Analysis

Boris Mirkin / Борис Григорьевич Миркин
• Professor, Data Analysis and AI, NRU HSE, Moscow
• Professor Emeritus, Computer Science, UL, London

Presentation:
1. Intro
2. Course philosophy and contents
3. A most successful example of data analysis
4. Differences in approaches of:
   • Mathematical statistics
   • Machine Learning
   • Data Mining
   • Data Analysis

Main text (1):
From Computing Reviews of the ACM, June 27, 2011:
"There is an unforgettable scene in the film Lawrence of Arabia where T.E. Lawrence is asked why he is obsessed with the desert. His reply: 'It's clean.' Core Concepts in Data Analysis is clean and devoid of any fuzziness. The author presents his theses with a refreshing clarity seldom seen in a text of this sophistication. The entire text is rich in solved examples, case studies, projects, and introspective questions."

Data Analysis: Methods for enhancing knowledge (2)
Core Data Analysis: Methods for structural knowledge enhancing
Elements of structural knowledge:
◦ concepts
◦ statements of relation among concepts:
    Ohm's law in physics - quantitative
    Rule AB - categorical

CoDA contents: Structural knowledge enhancing (2)
Generic: two pathways, two formats
◦ Summarization (concept) methods:
    Quantitative: Principal component analysis (PCA)
    Categorical: Cluster analysis
◦ Correlation (relation) methods:
    Quantitative: Regression
    Categorical: Classifier
Preliminary:
    1D Data Analysis: Histograms
    2D Data Analysis: Correlation/Association

Example of Data Analysis (3)
Laws for planetary motion: J. Kepler (circa 1605), using data of Tycho Brahe (1546-1601):
1st Law: Planets revolve around the Sun in ellipses
2nd Law: The further away from the Sun, the slower the speed (equal sectors in equal time)
Does either

Example of Data Analysis: 3rd Law (3.1)

Planet     Period (years)   Distance (average, relative to that of Earth)
Mercury    0.241            0.39
Venus      0.615            0.72
Earth      1.00             1.00
Mars       1.88             1.52
Jupiter    11.8             5.20
Saturn     29.5             9.54
Uranus     84.0             19.18
Neptune    165              30.06
Pluto      248              39.44

Is there any relation between speed/period and distance?

Example of Data Analysis: 3rd Law (3.2)
Kepler's 3rd Law: Is there any relation between speed/period and distance?
Fits no line…

Example of Data Analysis: 3rd Law (3.3)
Kepler's 3rd Law (1619): [J. Napier invented the logarithm (1614)]
Transform data: $\log P = \tfrac{3}{2} \log D$, that is, $P^2 = D^3$

Example of Data Analysis: 3rd Law. So what? (3.4)
Three Kepler's Laws: What is so grand about them?
Substantiated theoretically by R. Hooke (1635-1703) and I. Newton (1642-1727)
UNIVERSAL GRAVITATION LAW!
Mathematical equation, CORNERSTONE of science

Data Analysis differs from
- Math Statistics,
- Data Mining,
- Machine Learning
Mathematical methods for data processing are the same
BUT
Different questions are asked

Math Statistics Approach
Kepler's 3rd Law: Is there any relation between speed/period and distance?
Needs a probabilistic model: Period = f(Distance, Error).
Proof: Statistical criteria

Data Mining Approach
Kepler's 3rd Law: Is there any relation between speed/period and distance?
Take many Fs: Period = F(Distance)
- F(x)=log(xa)
- F(x)= ax+b
- F(x)=ax
…………..
Which one is most interesting?
Proof: Usage

Machine Learning Approach
Kepler's 3rd Law: Relation between speed/period and distance?
Needs a function f to predict: Period = f(Distance, Error) for Uranus, Pluto.
Deep network for f? - ok.
Proof: Small error.

Data Analysis Approach
Kepler's 3rd Law: Relation between speed/period and distance?
Needs a function f to add to the theory: Period = f(Distance)
$P^2 = D^3$
Proof: Good interpretation (like
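
The log transform in the Kepler example can be checked numerically against the planetary table. The sketch below is only an illustration, not the method used in the slides: it assumes ordinary least squares as the fitting criterion, and the helper names `fit_line` and `r_squared` are hypothetical. It fits a straight line twice, once to the raw (Distance, Period) pairs and once to their logarithms.

```python
import math

# (planet, period in years, average distance relative to Earth's),
# copied from the table in the Kepler example.
PLANETS = [
    ("Mercury", 0.241, 0.39),
    ("Venus", 0.615, 0.72),
    ("Earth", 1.00, 1.00),
    ("Mars", 1.88, 1.52),
    ("Jupiter", 11.8, 5.20),
    ("Saturn", 29.5, 9.54),
    ("Uranus", 84.0, 19.18),
    ("Neptune", 165.0, 30.06),
    ("Pluto", 248.0, 39.44),
]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Share of the variance of ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


periods = [p for _, p, _ in PLANETS]
distances = [d for _, _, d in PLANETS]

# Straight line through the raw data: Period = a * Distance + b.
raw_slope, raw_intercept = fit_line(distances, periods)
mercury_raw = raw_slope * distances[0] + raw_intercept

# Straight line through the log-transformed data: log P = slope * log D + intercept.
log_d = [math.log(d) for d in distances]
log_p = [math.log(p) for p in periods]
slope, intercept = fit_line(log_d, log_p)
r2 = r_squared(log_d, log_p, slope, intercept)

print(f"raw fit:     Period = {raw_slope:.3f} * Distance + {raw_intercept:.3f}")
print(f"             predicted period for Mercury: {mercury_raw:.3f} years")
print(f"log-log fit: log P = {slope:.4f} * log D + {intercept:.4f}  (R^2 = {r2:.6f})")
```

On this table the log-log slope comes out close to 1.5 with an intercept near zero, which is the relation $P^2 = D^3$. The straight line through the raw values, by contrast, predicts a negative period for Mercury, one concrete sense in which the raw data "fits no line".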