Advanced Methodologies for Predictive Data-Analytic Modeling
Vladimir Cherkassky, Electrical & Computer Engineering, University of Minnesota – Twin Cities, [email protected]
Presented at Chicago Chapter ASA, May 6, 2016
Electrical and Computer Engineering 1
Part 1: Motivation & Background
• Background
- Big Data and Scientific Discovery
- Philosophical Connections
- Modeling Complex Systems
• Two Data-Analytic Methodologies
• Basics of VC-theory
• Summary 2
Growth of (biological) Data
from http://www.dna.affrc.go.jp/growth/D-daily.html 3
Practical and Societal Implications
• Personalized medicine
• Genetic Testing: already available at $300-1K 4
Typical Applications
• Genomics
• Medical imaging (i.e., sMRI, fMRI)
• Financial
• Process Control
• Marketing ……
• Sparse High-Dimensional Data: number of samples n << d (number of features)
• Complex systems: underlying first-principle mechanism is unknown
• Ill-posed nature of such problems → only approximate, non-deterministic models 5
What is Big Data?
• Traditional IT infrastructure: data storage, access, connectivity etc.
• Making sense of / acting on this data: Data → Knowledge → Decision making (always predictive by nature)
• Focus of my presentation: methodological aspects of data-analytic knowledge discovery 6
Scientific Discovery
• Combines ideas/models and facts/data
• First-principle knowledge: hypothesis → experiment → theory ~ deterministic, causal, intelligible models
• Modern data-driven discovery: s/w program + DATA → knowledge ~ statistical, complex systems
• Two different philosophies 7
History of Scientific Knowledge
• Ancient Greece: logic + deductive reasoning
• Middle Ages: deductive (scholasticism)
• Renaissance, Enlightenment: (1) First-Principles (Laws of Nature); (2) Experimental science (empirical data); combining (1) + (2) → problem of induction
• Digital Age: the problem of induction attains practical importance in many fields 8
Induction and Predictive Learning
• Induction: aka inductive step, standard inductive inference
• Deduction: aka Prediction 9
Problem of Induction in Philosophy
• Francis Bacon: advocated empirical (inductive) knowledge vs scholastic
• David Hume: what right do we have to assume that the future will be like the past?
• Philosophy of Science tries to resolve this dilemma/contradiction between deterministic logic and the uncertain nature of empirical data
• Digital Age: with the growth of empirical data, this dilemma becomes important in practice 10
Cultural and Psychological Aspects
• All men by nature desire knowledge
• Man has an intense desire for assured knowledge
• Assured Knowledge ~ belief in:
- religion (much of human history)
- reason (causal determinism)
- science / pseudoscience
- data-analytic models (~ Big Data)
- genetic risk factors … 11
Knowledge Discovery in Digital Age
• Most information is in the form of digital data
• Can we get assured knowledge from data?
• Big Data ~ technological nirvana: data + connectivity → more knowledge
• Wired Magazine, 16/07: We can stop looking for (scientific) models.
We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot. 12
• More examples… Duke biologists discovered an unusual link btwn the popular singer (Lady Gaga) and a new species of fern, i.e.:
- bisexual reproductive stage of the ferns;
- the team found the sequence GAGA when analyzing the fern’s DNA base pairs 13
Real Data Mining: Kepler’s Laws
• How do planets move among the stars?
- Ptolemaic system (geocentric)
- Copernican system (heliocentric)
• Tycho Brahe (16th century):
- measured positions of the planets in the sky
- used experimental data to support one’s view (hypothesis)
• Johannes Kepler:
- used volumes of Tycho’s data to discover three remarkably simple laws 14
Kepler’s Laws vs. ‘Lady Gaga’ knowledge
• Both search for assured knowledge
• Kepler’s Laws:
- well-defined hypothesis stated a priori
- prediction capability
- human intelligence
• Lady Gaga knowledge:
- no hypothesis stated a priori
- no prediction capability
- computer intelligence (software program)
- popular appeal (to widest audience) 15
Lessons from Natural Sciences
• Prediction capability: prediction is hard, especially about the future
• Empirical validation / repeatable events
• Limitations (of scientific knowledge)
• Important to ask the right question:
- Science starts from problems, and not from observations (K. Popper)
- What we observe is not nature itself, but nature exposed to our method of questioning (W. Heisenberg) 16
Limitations of Scientific Method
When the number of factors coming into play in a phenomenological complex is too large, scientific method in most cases fails us. We are going to be shifting the mix of our tools as we try to land the ship in a smooth way onto the aircraft carrier.
Recall: the Ancient Greeks scorned ‘predictability’ 17
Important Differences
Albert Einstein:
• It might appear that there are no methodological differences between astronomy and economics: scientists in both fields attempt to discover general laws for a group of phenomena. But in reality such differences do exist.
• The discovery of general laws in economics is difficult because observed economic phenomena are often affected by many factors that are very hard to evaluate separately.
• The experience which has accumulated during the civilized period of human history has been largely influenced by causes which are not economic in nature. 18
Flexible Data Modeling Approaches
• Late 1980’s: Artificial Neural Networks
• Mid 1990’s: Data Mining
• Late 1990’s: Support Vector Machines
• Mid 2000’s: Big Data
• Early 2010’s: Deep Learning (reincarnated NNs)
NOTE 1: no clear boundary btwn science vs marketing
NOTE 2: fragmentation and ‘soft’ plagiarism 19
Methodologies for Data Modeling
• The field of Pattern Recognition is concerned with the automatic discovery of regularities in data.
• Data Mining is the process of automatically discovering useful information in large data repositories.
• This book (on Statistical Learning) is about learning from data.
• The field of Machine Learning is concerned with the question of how to construct computer programs that automatically improve with experience.
• Artificial Neural Networks perform useful computations through the process of learning.
(1) focus on algorithms / computational procedures
(2) all fields estimate useful models from data, i.e. extract knowledge from data (the same as in classical statistics)
Real issues: what is ‘useful’? What is ‘knowledge’? 20
What is ‘a good model’?
• All models are mental constructs that (hopefully) relate to the real world
• Two goals of data-analytic modeling:
- explanation (of past / available data)
- prediction (of future data)
• All good models make non-trivial predictions
• Good data-driven models can predict well, so the goal is to estimate predictive models, aka generalization, inductive inference
• Importance of methodology/assumptions 21
The Role of Statistics
• Dilemma: mathematical or natural science?
• Traditionally, heavy emphasis on parametric modeling and math proofs
• Conservative attitude: slow acceptance of modern computational approaches
• Under-appreciation of predictive modeling
William Edwards Deming: The only useful function of a statistician is to make predictions, and thus to provide a basis for action. 22
BROADER QUESTIONS
• Can we trust models derived from data?
• What is scientific knowledge? First-principle knowledge vs empirical knowledge vs beliefs
• Understanding uncertainty and risk
• Historical view: how explosive growth of data-driven knowledge changes human perception of uncertainty 23
Scientific Understanding of Uncertainty
• Very recent: most probability theory and statistics developed in the past 100 years; most applications in the last 50-60 years
• Dominant approach in classical science is causal determinism, i.e. the goal is to estimate the true model (or cause) ~ system identification
• Classical statistics: the goal is to estimate the probabilistic model underlying the data, i.e. system identification 24
Scientific Understanding (cont’d)
• Albert Einstein: The scientist is possessed by the sense of universal causality. The future, to him, is every whit as necessary and determined as the past.
• Albert Einstein: God does not play dice
• Stephen Hawking: God not only plays dice.
He sometimes throws the dice where they cannot be seen 25
Modeling Complex Systems
• First-principle scientific knowledge:
- deterministic
- simple models (~ few main concepts)
• This knowledge has been used to design complex systems: computers, airplanes etc.
• It has not been successful for modeling and understanding complex systems:
- weather prediction / climate modeling
- human brain
- stock market etc. 26
Modeling Complex Systems
• A. Einstein: When the number of factors coming into play in a phenomenological complex is too large, scientific method in most cases fails us. One need only think of the weather, in which case prediction even for a few days is impossible… Occurrences in this domain are beyond the reach of exact prediction because of the variety of factors in operation, not because of any lack of order in nature. 27
How to Model Complex Systems?
• Conjecture 1: the first-principle / system identification approach cannot be used
• Conjecture 2: the system imitation approach, i.e. modeling certain aspects of a system, may be used → statistical models
Examples: stock trading, medical diagnosis 28
Three Types of Knowledge
• Growing role of empirical knowledge
• Classical philosophy of science differentiates only between (first-principle) science and beliefs (demarcation problem)
• Importance of demarcation btwn empirical knowledge and beliefs in modern apps 29
Beliefs vs Scientific Theories
Men have lower life expectancy than women:
• Because they choose to do so
• Because they make more money (on average) and experience higher stress managing it
• Because they engage in risky activities
• Because …
→ Demarcation problem in philosophy 30
Popper’s Demarcation Principle
• First-principle scientific theories vs. beliefs or metaphysical theories
• Risky prediction, testability, falsifiability
Karl Popper: Every true (inductive) theory prohibits certain events or occurrences, i.e.
it should be falsifiable 31
Popper’s conditions for a scientific hypothesis:
- should be testable
- should be falsifiable
Example 1: Efficient Market Hypothesis (EMH): the prices of securities reflect all known information that impacts their value
Example 2: We do not see our noses, because they all live on the Moon 32
Observations, Reality and Mind
Philosophy is concerned with the relationship btwn:
- Reality (Nature)
- Sensory Perceptions
- Mental Constructs (interpretations of reality)
Three Philosophical Schools
• REALISM:
- objective physical reality perceived via senses
- mental constructs reflect objective reality
• IDEALISM:
- primary role belongs to ideas (mental constructs)
- physical reality is a by-product of Mind
• INSTRUMENTALISM:
- the goal of science is to produce useful theories
Which one should be adopted (by scientists + engineers)? 33
Three Philosophical Schools
• Realism (materialism)
• Idealism
• Instrumentalism 34
Application Example: predicting gender of face images
• Training data: labeled face images (Male … / Female …) 35
Predicting Gender of Face Images
• Input ~ 16x16 pixel image
• Model ~ indicator function f(x) separating 256-dimensional pixel space in two halves
• Model should predict well on new images
• Difficult machine learning problem, but easy for human recognition 36
Two Philosophical Views (Vapnik, 2006)
• System Identification (~ Realism):
- estimate a probabilistic model (of true class densities) from available data
- this view is adopted in classical statistics
• System Imitation (~ Instrumentalism):
- need only to predict well, i.e. imitate a specific aspect of the unknown system;
- multiplicity of good models;
- can they be interpreted and/or trusted?
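The identification-vs-imitation contrast can be made concrete in code. Below is a minimal numpy sketch on hypothetical two-class Gaussian data (the class locations, spreads, and seed are my assumptions, not from the talk): the "identification" route estimates the class densities and applies the plug-in rule, while the "imitation" route fits a decision function directly by least squares without estimating any densities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-class data: Gaussian clusters in (x1, x2) space.
n = 200
X0 = rng.normal([0.0, 0.0], 1.0, size=(n, 2))   # class y = 0
X1 = rng.normal([3.0, 3.0], 1.0, size=(n, 2))   # class y = 1
X = np.vstack([X0, X1])
y = np.r_[np.zeros(n), np.ones(n)]

# System identification (~ realism): estimate class densities, then use
# the plug-in rule. With equal-covariance Gaussian estimates this is LDA.
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
S = 0.5 * (np.cov(X0.T) + np.cov(X1.T))   # pooled covariance estimate
w_id = np.linalg.solve(S, m1 - m0)        # discriminant direction
b_id = -0.5 * (m0 + m1) @ w_id            # threshold (equal priors assumed)
pred_id = (X @ w_id + b_id > 0).astype(float)

# System imitation (~ instrumentalism): fit a decision function g(x, w)
# directly by least squares on +/-1 targets; no densities are estimated.
A = np.c_[X, np.ones(2 * n)]              # rows [x1, x2, 1]
t = 2 * y - 1                             # targets in {-1, +1}
w_im, *_ = np.linalg.lstsq(A, t, rcond=None)
pred_im = (A @ w_im > 0).astype(float)

print("identification accuracy:", np.mean(pred_id == y))
print("imitation accuracy:     ", np.mean(pred_im == y))
```

On data this well separated, both routes classify nearly perfectly; the methodological difference (and the "multiplicity of good models") only becomes visible when the assumed density model is wrong or the dimension is high.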
37
OUTLINE
• Background
• Two Data-Analytic Methodologies:
- inductive inference step
- two approaches to statistical inference
- advantages of predictive approach
- Example: market timing of mutual funds
• Basics of VC-theory
• Summary 38
Statistical vs Predictive Modeling
[Diagram: EMPIRICAL DATA + KNOWLEDGE, ASSUMPTIONS → STATISTICAL INFERENCE, branching into PROBABILISTIC MODELING and the PREDICTIVE APPROACH] 39
Inductive Inference Step
• Inductive inference step: Data → model ~ ‘uncertain inference’
• Is it possible to make uncertain inferences mathematically rigorous? (Fisher 1935)
• Many types of ‘uncertain inference’:
- hypothesis testing
- maximum likelihood
- risk minimization …
each comes with its own methodology/assumptions 40
Two Data-Analytic Methodologies
• Many existing data-analytic methods, but lack of methodological assumptions
• Two theoretical developments:
- classical statistics ~ mid 20th century
- Vapnik-Chervonenkis (VC) theory ~ 1970’s
• Two related technological advances:
- applied statistics (R. Fisher)
- machine learning, neural nets, data mining etc.
41
Statistical vs Predictive Approach
• Binary classification problem: estimate a decision boundary from training data (xi, yi), assuming class distributions P(x,y) were known
[Figure: two-class training data in (x1, x2) space] 42
Classical Statistical Approach: Realism
(1) parametric form of the unknown distribution P(x,y) is known
(2) estimate parameters of P(x,y) from the training data
(3) construct the decision boundary using the estimated distribution and given misclassification costs
Modeling assumption: the unknown P(x,y) can be accurately estimated from available data
[Figure: estimated decision boundary in (x1, x2) space] 43
Critique of Statistical Approach (Leo Breiman)
• The belief that a statistician can invent a reasonably good parametric class of models for a complex mechanism devised by nature
• Then parameters are estimated and conclusions are drawn
• But conclusions are about the model’s mechanism, not about nature’s mechanism
• Many modern data-analytic sciences (economics, life sciences) have similar flaws 44
Predictive Approach: Instrumentalism
(1) parametric form of the decision boundary f(x,w) is given
(2) explain available data via fitting f(x,w), or minimization of some loss function (i.e., squared error)
(3) the function f(x,w*) providing the smallest fitting error is then used for prediction
Modeling assumptions:
- need to specify f(x,w) and the loss function a priori
- no need to estimate P(x,y)
[Figure: estimated decision boundary in (x1, x2) space] 45
Classification with High-Dimensional Data
• Digit recognition 5 vs 8: each example ~ 16 x 16 pixel image → 256-dimensional vector x
• Medical interpretation:
- each pixel ~ genetic marker
- each patient (sample) described by 256 genetic markers
- two classes ~ presence/absence of a disease
• Estimation of P(x,y) with finite data is not possible
• Accurate estimation of a decision boundary in 256-dim.
space is possible, using just a few hundred samples 46
Common Modeling Assumptions
• Future is similar to Past:
- training and test data from the same distribution
- i.i.d. training data
- large test set
• Prediction accuracy ~ given loss function:
- misclassification costs (classification problems)
- squared loss (regression problems)
- etc.
• Proper formalization ~ type of learning problem, e.g., classification is used in many applications 47
Importance of Complexity Control
• Regression estimation for known parameterization
• Ten training samples: y = x² + N(0, σ²), where σ² = 0.25
• Fitting linear and 2nd-order polynomial 48
Statistical vs Predictive: issues
The predictive approach:
- estimates certain properties of the unknown P(x,y) that are useful for predicting y
- has solid theoretical foundations (VC-theory)
- is successfully used in many apps
BUT its methodology and concepts are different from classical statistics:
- formalization of the learning problem (~ requires understanding of the application domain)
- a priori specification of a loss function
- interpretation of predictive models may be hard
- multiplicity of models estimated from the same data 49
Predictive Methodology (VC-theory)
• Method of questioning is:
- the learning problem setting (inductive step)
- driven by application requirements
• Standard inductive learning commonly used (may not be the best choice)
• Good generalization depends on two factors:
- (small) training error
- small VC-dimension ~ large ‘falsifiability’ 50
Timing of International Funds
• International mutual funds:
- priced at 4 pm EST (New York time)
- reflect price of foreign securities traded at European/Asian markets
- foreign markets close earlier than US market
→ Possibility of inefficient pricing. Market timing exploits this inefficiency.
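The complexity-control experiment of slide 47 (ten samples from y = x² + N(0, 0.25), fitting linear vs 2nd-order polynomials) can be sketched in a few lines of numpy. The x-range [-2, 2] and the random seed are my assumptions; the slide does not specify them.

```python
import numpy as np

rng = np.random.default_rng(1)

# Slide 47 setup: y = x^2 + N(0, sigma^2) with sigma^2 = 0.25.
def sample(n):
    x = rng.uniform(-2.0, 2.0, size=n)                  # assumed x-range
    return x, x**2 + rng.normal(0.0, 0.5, size=n)       # noise sd = sqrt(0.25)

x_tr, y_tr = sample(10)       # ten training samples, as on the slide
x_te, y_te = sample(10_000)   # large test set ~ estimate of prediction risk

results = {}
for degree in (1, 2):
    coef = np.polyfit(x_tr, y_tr, degree)               # least-squares fit
    mse_tr = np.mean((np.polyval(coef, x_tr) - y_tr) ** 2)
    mse_te = np.mean((np.polyval(coef, x_te) - y_te) ** 2)
    results[degree] = (mse_tr, mse_te)
    print(f"degree {degree}: training MSE {mse_tr:.3f}, test MSE {mse_te:.3f}")
```

Since the models are nested, the quadratic always fits the training data at least as well as the linear one; here it also predicts better, because its complexity matches the true dependency. Raising the degree further would keep shrinking training error while test error grows, which is the point of complexity control.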
• Scandals in the mutual fund industry ~2002
• Solution adopted: restrictions on trading 51
Binary Classification Setting
• TWIEX ~ American Century Int’l Growth
• Input indicators (for trading), known today:
- SP 500 index (daily % change) ~ x1
- Euro-to-dollar exchange rate (% change) ~ x2
• Output: TWIEX NAV (% change) next day ~ y
• Trading rule: D(x) = 0 ~ Sell, D(x) = 1 ~ Buy
• Model parameterization (fixed):
- linear: g(x, w) = w1·x1 + w2·x2 + w0
- quadratic: g(x, w) = w1·x1 + w2·x2 + w3·x1² + w4·x2² + w5·x1·x2 + w0
• Decision rule (estimated from training data): D(x) = Ind(g(x, w*)) → Buy/Sell decision (+1 / 0) 52
Methodological Assumptions
• When can a trained model predict well?
(1) Future/test data is similar to training data, i.e., use the 2004 period for training and 2005 for testing
(2) Estimated model is ‘simple’ and provides good performance during the training period, i.e., the trading strategy is consistently better than buy-and-hold during the training period
• Loss function (to measure performance): L(x, y) = −D(x)·y, where D(x) = Ind(g(x, w*)) 53
Empirical Results: 2004-2005 data, linear model, training data 2004
[Figure: training data in (SP500 %, EURUSD %) space; cumulative gain/loss (%) over ~250 days, Trading vs Buy and Hold]
→ can expect good performance with test data 54
Empirical Results: 2004-2005 data, linear model, test data 2005
[Figure: test data in (SP500 %, EURUSD %) space; cumulative gain/loss (%), Trading vs Buy and Hold]
→ confirmed good prediction performance 55
Empirical Results: 2004-2005 data, quadratic model, training data 2004
[Figure: training data in (SP500 %, EURUSD %) space; cumulative gain/loss (%), Trading vs Buy and Hold]
→ can expect good performance with test data 56
Empirical Results:
2004-2005 data, quadratic model, test data 2005
[Figure: test data in (SP500 %, EURUSD %) space; cumulative gain/loss (%), Trading vs Buy and Hold]
→ confirmed good test performance 57
Interpretation vs Prediction
• Two good trading strategies estimated from 2004 training data
[Figure: two estimated decision boundaries in (SP500 %, EURUSD %) space]
• Both models predict well for the test period 2005
• Which one is ‘true’? 58
DISCUSSION
• Can this trading strategy be used now?
- NO: this market timing strategy became ineffective around year 2008; the reason is changing statistical characteristics of the market
- YES: it can be used occasionally
• Hypocrisy of the mutual fund industry:
Story 1: markets are very efficient, so individual investors cannot trade successfully and outperform the market indices (such as SP500)
Story 2: market timing is harmful for mutual funds, so such abusive trading activity should be banned
Story 3: restrictions also apply to domestic funds 59
Interpretation of Predictive Models
• Humans cannot provide interpretation even if they can make good predictions
• Each input ~ 28 x 28 pixel image → 784-dimensional input x
• Interpretation of black-box models:
- not unique / subjective
- depends on chosen parameterization (method) 60
Interpretation of SVM models
How to interpret high-dimensional models? (say, an SVM model)
Strategy 1: dimensionality reduction / feature selection → prediction accuracy usually suffers
Strategy 2: approximate the SVM model via a set of rules (using rule induction, decision trees etc.) → does not scale well for high-dim.
models 61
OUTLINE
• Background
• Two Data-Analytic Methodologies:
- classical statistics vs predictive learning
• Basics of VC-theory:
- History and Overview
- Inductive problem setting
- Conditions for consistency of ERM
- Generalization Bounds and SRM
• Summary 62
History and Overview
• SLT aka VC-theory (Vapnik-Chervonenkis)
• Theory for estimating dependencies from finite samples (predictive learning setting)
• Based on the risk minimization approach
• All main results originally developed in 1970’s for classification (pattern recognition) – why? – but remained largely unknown
• Recent renewed interest due to practical success of Support Vector Machines (SVM) 63
History and Overview (cont’d)
MAIN CONCEPTUAL CONTRIBUTIONS
• Distinction between the problem setting, the inductive principle and learning algorithms
• Direct approach to estimation with finite data (KID principle)
• Math analysis of Empirical Risk Minimization
• Two factors responsible for generalization:
- empirical risk (fitting error)
- complexity (capacity) of approximating functions 64
Inductive Learning: problem setting
• The learning machine observes samples (x, y), and returns an estimated response ŷ = f(x, w)
• Two modes of inference: identification vs imitation
• Risk = ∫ Loss(y, f(x, w)) dP(x,y) → min 65
The Problem of Inductive Learning
• Given: finite training samples Z = {(xi, yi), i = 1, 2, …, n}, choose from a given set of functions f(x, w) the one that best approximates the true output (in the sense of risk minimization)
Concepts and Terminology:
• approximating functions f(x, w)
• (non-negative) loss function L(f(x, w), y)
• expected risk functional R(Z, w)
Goal: find the function f(x, w0) minimizing R(Z, w) when the joint distribution P(x,y) is unknown.
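The distinction between the (unknown) expected risk ∫ L(y, f(x,w)) dP(x,y) and the empirical risk computed on finite data can be illustrated numerically. In the sketch below, P(x,y) and the fixed model f(x,w) are my own illustrative assumptions; since P is known to the simulation, the risk integral can be approximated by Monte Carlo and compared with empirical risks on samples of growing size.

```python
import numpy as np

rng = np.random.default_rng(2)

w = np.array([0.9, 0.1])            # some fixed parameter vector

def f(x):
    return w[0] * x + w[1]          # fixed approximating function f(x, w)

def loss(x, y):
    return (y - f(x)) ** 2          # L(y, f(x, w)): squared loss

def draw(n):                        # assumed P(x, y): y = sin(x) + noise
    x = rng.uniform(-1.0, 1.0, size=n)
    return x, np.sin(x) + rng.normal(0.0, 0.1, size=n)

# Expected risk R(w) = integral of the loss over P(x, y); approximated
# here by Monte Carlo with a very large sample, since P is known.
x_big, y_big = draw(1_000_000)
R = np.mean(loss(x_big, y_big))

# Empirical risk R_emp(w) on a finite i.i.d. sample fluctuates around
# R(w) and converges to it as n grows.
for n in (10, 100, 10_000):
    x, y = draw(n)
    print(f"n = {n:6d}: R_emp = {np.mean(loss(x, y)):.4f}   (R ~ {R:.4f})")
```

For a single fixed f this convergence is just the law of large numbers; the hard part addressed by VC-theory is making it hold uniformly over a whole set of functions, which is what the consistency conditions on the next slides are about.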
66
Empirical Risk Minimization
• ERM principle in model-based learning:
- model parameterization: f(x, w)
- loss function: L(f(x, w), y)
- estimate risk from data: R_emp(w) = (1/n) Σ_{i=1..n} L(f(xi, w), yi)
- choose w* that minimizes R_emp
• Statistical Learning Theory developed from the theoretical analysis of the ERM principle under finite sample settings 67
Consistency/Convergence of ERM
• Empirical Risk is known, but Expected Risk is unknown
• Asymptotic consistency requirement: under what (general) conditions will models providing minimal Empirical Risk also provide minimal Prediction Risk, as the number of samples grows large? 68
Consistency of ERM
• Necessary & sufficient condition: the set of possible models f(x, w) has limited ability to fit (explain) a finite number of samples ~ the VC-dimension of this set of functions is finite
• Generalization (prediction) is possible only if the set of models has limited complexity (VC-dim)
• VC-dimension:
- measures the ability (of a set of functions) to fit or ‘explain’ available finite data
- similar to DoF for linear parameterizations, but different for nonlinear ones 69
• VC-dimension of a set of indicator functions:
- Shattering: if n samples can be separated by a set of indicator functions in all 2^n possible ways, then these samples are said to be shattered by this set of functions.
- A set of functions has VC-dimension h if there exist h samples that can be shattered by this set of functions, but there do not exist h+1 samples that can be shattered.
• Example: VC-dimension of linear indicator functions (d = 2)
[Figure: two labelings of points in (z1, z2) space, each separated by a line] 70
• VC-dimension of a set of linear hyperplanes is h = d + 1
• VC-dimension of linear slab or delta-margin hyperplanes is controlled by the width (delta) 71
• Example: VC-dimension of a linear combination of fixed basis functions (i.e. polynomials, Fourier expansion etc.): assuming the basis functions are linearly independent, the VC-dim equals the number of basis functions.
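The shattering definition can be checked directly for linear indicator functions in d = 2. The sketch below (points and mechanics are my own choice) realizes all 2³ labelings of three points in general position, consistent with h = d + 1 = 3: because the 3×3 matrix of rows [z1, z2, 1] is invertible, the system A·w = t can be solved exactly for any sign pattern t.

```python
import numpy as np
from itertools import product

# Three points in general position (not collinear) in the plane.
pts = np.array([[0.0, 0.0],
                [1.0, 0.0],
                [0.0, 1.0]])
A = np.c_[pts, np.ones(3)]   # rows [z1, z2, 1]; invertible when not collinear

# Shattering: every one of the 2^3 = 8 labelings must be realized by some
# linear indicator function Ind(w1*z1 + w2*z2 + w0 > 0). Solving A @ w = t
# exactly makes sign(A @ w) match any target labeling t.
shattered = True
for labels in product([-1.0, 1.0], repeat=3):
    t = np.array(labels)
    w = np.linalg.solve(A, t)
    if not np.all(np.sign(A @ w) == t):
        shattered = False

print("3 points shattered by linear indicator functions:", shattered)
# No 4 points in the plane can be shattered (h = d + 1 = 3); the XOR-type
# labeling of 4 points is the classical counterexample (not shown here).
```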
• Counter-example: a single parameter but infinite VC-dimension. 72
Generalization Bounds
• Bounds for learning machines (implementing ERM) evaluate the difference btwn the (unknown) risk and the known empirical risk, as a function of the sample size n and the properties of the loss (approximating) functions.
• Classification: the following bound holds with probability 1 − η for all approximating functions:
R(ω) ≤ R_emp(ω) + Φ(R_emp(ω), n/h, ln η / n)
where Φ is called the confidence interval
• Regression: the following bound holds with probability 1 − η for all approximating functions:
R(ω) ≤ R_emp(ω) / (1 − c·√ε)+ 73
Structural Risk Minimization
• Analysis of the generalization bound
R(ω) ≤ R_emp(ω) + Φ(R_emp(ω), n/h, ln η / n)
suggests that when n/h is large, the Φ term is small, so R(ω) ~ R_emp(ω); this leads to the parametric modeling approach (ERM)
• When n/h is not large (say, less than 20), both terms on the right-hand side of the VC-bound need to be minimized → make the VC-dimension a controlling variable
• SRM = formal mechanism for controlling model complexity: the set of loss functions is given a nested structure S_k = {L(z, ω)}, k = 1, 2, …, with
S1 ⊂ S2 ⊂ … ⊂ Sk ⊂ … such that h1 ≤ h2 ≤ … ≤ hk ≤ …
→ structure ~ complexity ordering 74
Model Complexity Control via SRM
[Figure: upper bound on the true risk and the empirical risk, as a function of VC-dimension h (for fixed sample size n)] 75
VC Approach to Predictive Learning
Goals of Predictive Learning:
- explain (or fit) available training data
- predict well future (yet unobserved) data
(similar to biological learning)
Main Practical Result of VC-theory: if a model explains past data well AND is simple, then it can generalize (predict) 76
VC Approach to High-Dimensional Data
Strategy for modeling high-dimensional data: find a model f(x) that explains past data AND has low VC-dimension, even when dim.
is large
→ SVM approach: large margin = low VC-dimension ~ easy to falsify 77
OUTLINE
• Background
• Two Data-Analytic Methodologies:
- classical statistics vs predictive learning
• Basics of VC-theory
• Summary 78
SUMMARY (A)
• Predictive data-analytic modeling: usually on the boundary btwn trivial and impossible
• Asking the right question ~ problem setting:
- depends on modeler’s creativity/intelligence
- requires application domain knowledge
- cannot be formalized
• Modeling assumptions (not just the algorithm)
• Interpretation of black-box models:
- very difficult (requires domain knowledge)
- multiplicity of ‘good’ models 79
SUMMARY (B)
• Common misconception: data-driven models are intrinsically objective
• Explanation bias (favors simplicity + causality), for psychological and cultural reasons
• Confirmation bias (only positive findings), encouraged by funding agencies
• When all these human biases are incorporated into data-analytic modeling:
- many ‘interesting’ discoveries
- little objective value
- no real predictive value 80
References
• V. Vapnik, Estimation of Dependences Based on Empirical Data (reprint with afterword “Empirical Inference Science”), Springer, 2006
• L. Breiman, Statistical Modeling: the Two Cultures, Statistical Science, vol. 16(3), pp. 199-231, 2001
• A. Einstein, Ideas and Opinions, Bonanza Books, NY, 1954
• V. Cherkassky and F. Mulier, Learning from Data, second edition, Wiley, 2007
• V. Cherkassky and S. Dhar, Market timing of international mutual funds: a decade after the scandal, Proc. CIFEr, 2012 81