Unit V. Image interpretation
Dr. Felipe Orihuela-Espina

An apology…
☞ This unit contains some material that I prepared for different talks. Some of the original slides were in Spanish and thus there might remain some untranslated slides. I am working on the translation, but also on updating the examples to medical images. Please accept my apologies for the inconvenience this may cause.

Outline
• Causality
• Interpreting statistics
• Data mining: pattern recognition, machine learning, knowledge discovery
• Knowledge representation
• Interpretation guidelines

The three levels of analysis
Data analysis often comprises 3 steps:
• Processing: the output domain matches the input domain. Preparation of the data: validation, cleaning, normalization, etc.
• Analysis: re-express the data in a more convenient domain. Summarization of the data: feature extraction, computation of metrics, statistics, etc.
• Understanding: abstraction to achieve knowledge generation. Interpretation of the data: concept validation, re-expression in natural language, etc.

The three levels of analysis
• Processing: f: X → X′ such that X and X′ share the same space. E.g., apply a filter to a signal or image and you get another signal or image.
• Analysis: f: X → Y such that X and Y do not share the same space. E.g., apply a mask to a signal or image and you get the discontinuities, the edges or a segmentation.
• Interpretation (a.k.a. understanding): f: X → H such that H is natural language. E.g., apply a model to a signal or image and you get some knowledge useful for a human expert.

Typical fMRI processing
(Figure source: [Wellcome Trust; Tutorial on SPM].)

Typical fNIRS processing
(Figure: raw → detrended → decimated and detrended (low-pass filtering) → averaged.)

CAUSALITY

Cogito ergo sum
Cause → effect; cogito → sum.

Causation defies (1st-level) logic…
Input: "If the floor is wet, then it rained"; "If we break this bottle, the floor will get wet".
Logic output: "If we break this bottle, then it rained".
Example taken from [PearlJ1999].

Causality requires time!
"…there is little use in the practice of attempting to discuss causality without introducing time" [Granger, 1969] …whether philosophical, statistical, econometric, topological, etc.

Why is causality so problematic? A very silly example: which process causes which?
• It cannot be computed from the data alone.
• Systematic temporal precedence is not sufficient.
• Co-occurrence is not sufficient.
• It is not always a direct relation (indirect relations, transitivity/mediation, etc. may be present), let alone linear…
• It may occur across frequency bands.
• You name it…
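The following minimal simulation (synthetic data; every quantity is a hypothetical choice for illustration) shows two of these pitfalls at once: a hidden common driver Z makes X and Y co-occur, and makes X systematically precede Y, yet X does not cause Y.

```python
# Sketch: correlation plus temporal precedence without causation.
# A hidden common cause Z drives both X and Y; X responds at lag 1,
# Y at lag 3, so X "precedes" Y and correlates almost perfectly with it.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                          # hidden common cause
x = np.roll(z, 1) + 0.1 * rng.normal(size=n)    # x[t] ~ z[t-1]
y = np.roll(z, 3) + 0.1 * rng.normal(size=n)    # y[t] ~ z[t-3]

# X at time t strongly predicts Y at time t+2...
print(np.corrcoef(x[:-2], y[2:])[0, 1])         # ~0.99
# ...but intervening on X (replacing it with fresh noise) would leave Y
# untouched: the association is carried entirely by the unobserved Z.
```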
Causality is so difficult that "it would be very healthy if more researchers abandoned thinking of and using terms such as cause and effect" [Muthen1987 in PearlJ2011].

A real example
An ECG example [KaturaT2006]: the authors only claim that there are interrelations (quantified using mutual information, MI) [OrihuelaEspinaF2010].

Statistical dependence
Statistical dependence is a type of relation between any two variables [WermuthN1998]: if we find one, we can expect to find the other. It spans a range:
• Statistical independence
• Association (symmetric or asymmetric)
• Deterministic dependence
The limits of statistical dependence:
• Statistical independence: the distribution of one variable is the same no matter at which level changes occur in the other variable. X and Y are independent iff P(X∩Y) = P(X)P(Y).
• Deterministic dependence: the levels of one variable occur in an exactly determined way with changing levels of the other.
• Association: the intermediate forms of statistical dependence; symmetric, or asymmetric (a.k.a. response, or directed association).

Associational inference ≡ descriptive statistics!!!
The most detailed information linking two variables is given by the joint distribution: P(X=x, Y=y). The conditional distribution describes how the values of X change as Y varies: P(X=x|Y=y) = P(X=x, Y=y)/P(Y=y). Associational statistics is simply descriptive (estimates, regressions, posterior distributions, etc.) [HollandPW1986]. Example: the regression of X on Y is the conditional expectation E(X|Y=y).

Regression and correlation: two common forms of associational inference
• Regression analysis: "the study of the dependence of one or more response variables on explanatory variables" [CoxDR2004]. Strong regression ≠ causality [Box1966]. Prediction systems ≠ causal systems [CoxDR2004].
• Correlation is a relation over mean values; two variables correlate as they move over/under their means together (correlation is a "normalization" of the covariance).
• Correlation ≠ statistical dependence: if X and Y are statistically independent then r = 0 (absence of correlation), but the opposite is not true [MarrelecG2005].
• Correlation ≠ causation [YuleU1900 in CoxDR2004, WrightS1921]. Yet causal conclusions from a carefully designed (often a synonym of randomized) experiment are often (not always) valid [HollandPW1986, FisherRA1926 in CoxDR2004].

Coherence: yet another common form of associational inference
Often understood as "correlation in the frequency domain": Cxy = |Gxy|² / (Gxx·Gyy), where Gxy is the cross-spectral density and Gxx, Gyy are the auto-spectral densities; i.e. coherence is a squared, correlation-like coefficient resolved per frequency component. Coherence measures the degree to which two series are related. Coherence alone does not imply causality! The temporal lag of the phase difference between the signals must also be considered.
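A minimal sketch of that last point, using scipy.signal.coherence on synthetic signals (the sampling rate, frequencies and lag are assumptions made only for illustration):

```python
# Magnitude-squared coherence Cxy = |Gxy|^2 / (Gxx * Gyy) between two noisy
# signals sharing a 10 Hz component. Coherence is high at 10 Hz, but it is
# symmetric in x and y: it says "related at this band", not which drives which.
import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(0)
fs = 250.0                                    # sampling rate (assumed), Hz
t = np.arange(0, 20, 1 / fs)
common = np.sin(2 * np.pi * 10 * t)           # shared 10 Hz rhythm
x = common + rng.normal(scale=0.5, size=t.size)
y = np.roll(common, 25) + rng.normal(scale=0.5, size=t.size)  # lagged copy

f, Cxy = coherence(x, y, fs=fs, nperseg=512)
print(f[np.argmax(Cxy)])                      # ~10 Hz: the coherent band
# A causal reading would additionally need the phase lag (here 25 samples,
# i.e. 0.1 s); coherence alone cannot supply directionality.
```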
Statistical dependence vs causality
Statistical dependence provides associational relations and can be expressed in terms of a joint distribution alone. Causal relations CANNOT be expressed in terms of statistical association alone [PearlJ2009]. Associational inference ≠ causal inference [HollandPW1986, PearlJ2009] …ergo, statistical dependence ≠ causal inference. In associational inference, time is merely operational.

Causality requires directionality!
Algebraic equations, e.g. regression, "do not properly express causal relationships […] because algebraic equations are symmetrical objects […] To express the directionality of the underlying process, Wright augmented the equation with a diagram, later called path diagram, in which arrows are drawn from causes to effects" [PearlJ2009]. Feedback and instantaneous causality are in any case a double causation.

From association to causation
Barriers between classical statistics and causal analysis [PearlJ2009]:
1. Coping with untested assumptions and changing conditions.
2. Inappropriate mathematical notation.

Causality (from stronger to weaker; inspired from [CoxDR2004])
• Zero-level causality: a statistical association, i.e. a non-independence, which cannot be removed by conditioning on allowable alternative features. E.g. Granger's, topological causality.
• First-level causality: the use of one treatment over another causes a change in outcome. E.g. Rubin's, Pearl's.
• Second-level causality: explanation via a generating process, provisional and hardly lending itself to formal characterization, either merely hypothesized or solidly based on evidence. E.g. Suppes', Wright's path analysis; e.g. "smoking causes lung cancer". It is debatable whether second-level causality is indeed causality.

Variable types and their joint probability distribution
Variable types:
• Background variables (B): specify what is fixed.
• Potential causal variables (C).
• Intermediate variables (I): surrogates, monitoring, pathways, etc.
• Response variables (R): observed effects.
Joint probability distribution of the variables: P(R,I,C,B) = P(R|I,C,B) · P(I|C,B) · P(C|B) · P(B) …but it is possible to integrate over I (marginalize): P(R,C,B) = P(R|C,B) · P(C|B) · P(B). In [CoxDR2004].

Granger's causality
Granger's causality: Y is causing X (Y → X) if we are better able to predict X using all available information (Z) than if the information apart from Y had been used. The groundbreaking paper: Granger, "Investigating causal relations by econometric models and cross-spectral methods", Econometrica 37(3):424-438. Granger's causality is only a statement about one thing happening before another! It rejects instantaneous causality, which is considered as slowness in the recording of information.
Sir Clive William John Granger (1934-2009), University of Nottingham, Nobel Prize winner.

Granger's causality
"The future cannot cause the past" [Granger 1969]; "the direction of the flow of time [is] a central feature". Feedback is a double causation: X → Y and Y → X, denoted X ⇔ Y. "causality…is based entirely on the predictability of some series…" [Granger 1969]. Causal relationships may be investigated in terms of coherence and phase diagrams.
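A minimal bivariate sketch of the idea (not Granger's full formulation; the lag order p = 2 and the synthetic coupling are assumptions): does adding lags of y improve the prediction of x beyond x's own lags?

```python
# Restricted model: x_t from lags of x. Unrestricted: plus lags of y.
# An F-test on the residual sums of squares decides "y Granger-causes x".
import numpy as np
from scipy import stats

def granger_f(x, y, p=2):
    n = len(x)
    target = x[p:]
    lag = lambda s: np.column_stack([s[p - k:n - k] for k in range(1, p + 1)])
    X0 = np.column_stack([np.ones(n - p), lag(x)])       # restricted
    X1 = np.column_stack([X0, lag(y)])                   # + lags of y
    rss = lambda X: np.sum((target - X @ np.linalg.lstsq(X, target, rcond=None)[0]) ** 2)
    rss0, rss1 = rss(X0), rss(X1)
    dfn, dfd = p, (n - p) - X1.shape[1]
    F = ((rss0 - rss1) / dfn) / (rss1 / dfd)
    return F, stats.f.sf(F, dfn, dfd)                    # F statistic, p-value

rng = np.random.default_rng(1)
y = rng.normal(size=500)
x = 0.8 * np.roll(y, 1) + 0.2 * rng.normal(size=500)     # y drives x at lag 1
print(granger_f(x, y))   # large F, tiny p: evidence that y -> x
print(granger_f(y, x))   # F ~ 1, large p: no evidence that x -> y
```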
Topological causality
"A causal manifold is one with an assignment to each of its points of a convex cone in the tangent space, representing physically the future directions at the point. The usual causality in M0 extends to a causal structure in M'." [SegalIE1981]. Causality is seen as embedded in the geometry/topology of manifolds; causality is a curve function defined over the manifold. The groundbreaking book: Segal IE, "Mathematical Cosmology and Extragalactic Astronomy" (1976). I am not sure whether Segal is the father of causal manifolds, but his contribution to the field is simply overwhelming…
Irving Ezra Segal (1918-1998), Professor of Mathematics at MIT.

Causal (homogeneous Lorentzian) manifolds: the topological view of causality
(Figure: the cone of causality: future, instant present, past [SegalIE1981, RainerM1999, MosleySN1990, KrymVR2002].)

Rubin causal model
Rubin causal model: "Intuitively, the causal effect of one treatment relative to another for a particular experimental unit is the difference between the result if the unit had been exposed to the first treatment and the result if, instead, the unit had been exposed to the second treatment". The groundbreaking paper: Rubin, "Bayesian inference for causal effects: the role of randomization", The Annals of Statistics 6(1):34-58. The term "Rubin causal model" was coined by his student Paul Holland.
Donald B. Rubin (1943-), John L. Loeb Professor of Statistics at Harvard.

Rubin causal model
Causality is an algebraic difference: the treatment causes the effect Y_treatment(u) − Y_control(u) …or, in other words, the effect of a cause is always relative to another cause [HollandPW1986]. The Rubin causal model establishes the conditions under which associational (e.g. Bayesian) inference may infer causality (it makes the assumptions needed for causality explicit).

Fundamental problem of causal inference
Only Y_treatment(u) or Y_control(u) can be observed on a phenomenon, but not both (see the sketch below). Causal inference is impossible without making untested assumptions …yet causal inference is still possible under uncertainty [HollandPW1986] (two otherwise identical populations u must be prepared, and all appropriate background variables must be considered in B). Again (see "Statistical dependence vs causality" above): causal questions cannot be computed from the data alone, nor from the distributions that govern the data [PearlJ2009].

Relation between Granger, Rubin and Suppes causalities (modified from [HollandPW1986])

Concept | Granger | Rubin's model
Cause (treatment) | Y | t
Effect | X | Y_treatment(u)
All other available information | Z | Z (pre-exposure variables)

Granger's noncausality: X is not a Granger cause of Y (relative to the information in Z) iff X and Y are conditionally independent (i.e. P(Y|X,Z) = P(Y|Z)). Granger's noncausality is equal to Suppes' spurious case.
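A minimal numeric sketch of the Rubin model, and of why randomization sidesteps (without solving) the fundamental problem (all quantities synthetic):

```python
# Each unit u has two potential outcomes, Y_treatment(u) and Y_control(u);
# the unit-level causal effect is their difference. Only one of the two is
# ever observed per unit, yet under randomized assignment the difference of
# the observed group means estimates the average effect.
import numpy as np

rng = np.random.default_rng(0)
n = 10000
y_control = rng.normal(loc=10.0, scale=2.0, size=n)   # Y_control(u)
y_treat = y_control + 1.5                             # Y_treatment(u); true effect 1.5

assign = rng.random(n) < 0.5                          # randomized assignment
observed = np.where(assign, y_treat, y_control)       # one outcome seen per unit

ate = observed[assign].mean() - observed[~assign].mean()
print(ate)   # ~1.5, recovered from group means, never from any single unit's
             # (unobservable) pair of potential outcomes
```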
Pearl's statistical causality (a.k.a. structural theory)
"Causation is encoding behaviour under intervention […] Causality tells us which mechanisms [stable functional relationships] [are] to be modified [i.e. broken] by a given action" [PearlJ1999_IJCAI]. Causality, intervention and mechanisms can be encapsulated in a causal model. The groundbreaking book: Pearl J, "Causality: Models, Reasoning and Inference" (2000)* (*with permission of his 1995 Biometrika paper masterpiece). Pearl's results do establish conditions under which first-level causal conclusions are possible [CoxDR2004].
Judea Pearl (1936-), Professor of computer science and statistics at UCLA. Sewall Green Wright (1889-1988), father of path analysis (graphical rules).

Statistical causality [PearlJ2000, Lauritzen2000, DawidAP2002]
Conditioning vs intervening [PearlJ2000]:
• Conditioning: P(R|C) = Σ_B P(R|C,B) · P(B|C); useful, but inappropriate for causality, as changes in the past (B) occur before the intervention (C).
• Intervention: P(R‖C) = Σ_B P(R|C,B) · P(B); Pearl's definition of causality.
Underlying assumption: the distribution of R (and I) remains unaffected by the intervention. Watch out! This is not trivial: serious interventions may distort all relations [CoxDR2004].
If the structural coefficient β_CB = 0, then C ⫫ B (conditional independence) and P(R|C) = P(R‖C), i.e. there is no difference between conditioning and intervening.

INTERPRETING STATISTICS

Inferential statistics
"If your experiment needs statistics, you ought to have done a better experiment."
Lord Ernest Rutherford of Nelson (New Zealander/British, 1871-1937), father of nuclear physics, discoverer of the proton, Nobel Prize in Chemistry 1908.

Modelling
(Diagram: a deterministic model maps the values of the independent and/or controlled variables to the values of the dependent variables; a stochastic model maps them to the expectation of the dependent variables.)

Univariate linear regression
Under stochastic dependence we can carry out two closely related types of analysis:
• Regression analysis: defines the "type" (linear, exponential/logarithmic, hyperbolic, etc.) of the relation between the variables; it produces an equation that describes the relation between the variables (close to a functional dependence).
• Correlation analysis: defines the degree and consistency of in/dependence, or degree of association, between the variables; it produces a value that summarizes the strength of the relation between the variables.

Regression analysis
Regression analysis is a set of statistical techniques for estimating relations between variables. It is widely used for: A. inference of relations between variables (modelling), and B. prediction of new outcomes/observations (simulation). Machine learning is strongly related to regression analysis. Example: classifiers are (discrete or continuous) regression models.

Univariate linear regression (deterministic)
Y = β0 + β1·X, where Y is the dependent variable, X the independent variable, β1 the slope, and β0 the intercept (the cut on the ordinate axis). In a slightly more general notation, β0 and β1 are the parameters.

Univariate linear regression (stochastic)
Deterministic model: Y = β0 + β1·X. In the presence of uncertainty, the stochastic model: E[Y] = β0 + β1·X.

Univariate linear regression (stochastic)
Stochastic model, expressing the uncertainty (error) explicitly for each observation: Yi = β0 + β1·Xi + εi. The error εi is the difference of the i-th observation from its expectation; in other words, the difference between the measurement and the true value (Yi − E[Yi]).
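A minimal sketch of fitting this stochastic model by least squares (synthetic data; the true parameter values are assumptions of the example):

```python
# Fit Y_i = beta0 + beta1 * X_i + eps_i; the column of 1s carries the intercept.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.7 * x + rng.normal(scale=1.0, size=200)   # beta0 = 2, beta1 = 0.7

X = np.column_stack([np.ones_like(x), x])             # design matrix [1, x]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)                                           # approximately [2.0, 0.7]
residuals = y - X @ beta                              # the estimated errors eps_i
```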
Multivariable linear regression (stochastic)
For j independent/controlled variables: Yi = β0 + β1·X1i + β2·X2i + … + βj·Xji + εi. This is known as the additive linear model relating one dependent variable to j independent variables. Note that the unknowns are the coefficients βi: modelling consists of computing or estimating these coefficients (often called parameters).

Multivariable linear regression (stochastic)
In general, for n cases, a system of equations is formed:
Y1 = β0 + β1·X11 + … + βj·Xj1 + ε1
…
Yn = β0 + β1·X1n + … + βj·Xjn + εn

General linear model
We can express the above multiple regression model more compactly with matrices: Y = Xβ + ε, where Y is n×1, X is n×(j+1), β is (j+1)×1 and ε is n×1. The first column of X is all 1s; these 1s are needed for the intercept with the ordinate axis, β0. Sometimes the model is presented without the constant term, and consequently this column disappears.

Covariance
The covariance expresses the tendency of the (linear) relation between the variables: if sXY > 0, then when X grows, Y grows; if sXY < 0, then when X grows, Y decreases. Figure from: [http://biplot.usal.es/ALUMNOS/BIOLOGIA/5BIOLOGIA/Regresionsimple.pdf]

Correlation coefficient
Pearson's correlation coefficient is an index that measures the magnitude of the linear association between two quantitative random variables, and corresponds to the normalization of the covariance: r = sXY / (sX·sY), i.e. the covariance over the product of the standard deviations. Figure from: [en.wikipedia.org]

Goodness of fit
Coefficient of determination R²: the coefficient of determination is not the sample linear correlation coefficient r (Pearson's correlation coefficient), but it is closely related; in fact, as you can imagine, one is the square of the other [in simple linear regression]. Recommended reading: my slides for the statistics course. Figure from: [Wolfram MathWorld]

Correlation coefficient
Watch out! As far as I know, this table is outdated; some of the coefficients indicated as "not developed" have since been developed, but I have not had time to confirm it. Table from: [http://pendientedemigracion.ucm.es/info/mide/docs/Otrocorrel.pdf]
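A quick numerical check of the two points above, the matrix form Y = Xβ + ε and R² = r² for simple regression (synthetic data):

```python
# Build the design matrix, fit by least squares, then compare R^2 with r^2.
import numpy as np

rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=n)

X = np.column_stack([np.ones(n), x])        # design matrix with the 1s column
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot                  # coefficient of determination

r = np.corrcoef(x, y)[0, 1]                 # Pearson's correlation coefficient
print(r2, r ** 2)                           # numerically equal for this model
```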
("publication bias").” 22/05/2017 INAOE 52 Citas sobre la significancia estadística [GardnerMJ1986, co-authored by Altman] “...the use of statistics in medical journals has increased tremendously. One unfortunate consequence has been a shift in emphasis away from the basic results towards an undue concentration on hypothesis testing. In this approach data are examined in relation to a statistical "null" hypothesis, and the practice has led to the mistaken belief that studies should aim at obtaining "statistical significance”. [...] The excessive use of hypothesis testing at the expense of other ways of assessing results has reached such a degree that levels of significance are often quoted alone in the main text and abstracts of papers, with no mention of actual concentrations, proportions, etc, or their differences. The implication of hypothesis testing- that there can always be a simple "yes" or "no" answer as the fundamental result from a medical study-is clearly false and used in this way hypothesis testing is of limited value.” 22/05/2017 INAOE 53 Prueba de hipótesis Considerado el padre de la estadística inferencial Creador de ANOVA entre otros Trabajo principalmente en Cambridge y UCL, fue miembro de la Royal Society Reemplazó a Pearson en su cátedra en UCL Cómo buen genio trabajo en otros campos: matemáticas, biología evolutiva, genética, etc De hecho, también es el padre de la genética poblacional, que describe los fenómenos evolutivos en función de la variación y distribución de la frecuencia alélica También descubrió la utilidad del uso de los cuadrados latinos para mejorar significativamente los métodos agrícolas Sir Ronald Aylmer Fisher (1890-1962) Británico Una biografía y algunos enlaces: http://www-history.mcs.st-andrews.ac.uk/Biographies/Fisher.html 22/05/2017 INAOE 54 Null and Alternative Hypothesis Statistical testing is used to accept/rejct hypothesis Null hypothesis (H0): There is no difference or relation H0: μ1=μ2 Alternative hypothesis (Ha): There is difference or relation Ha: μ1μ2 Example: Research question: ¿Are men taller than women? Null hypothesis: There is no height difference among genders Alternative hypothesis: Gender makes a difference in height. Hypothesis Type / Directionality: One-tail vs Two-tail One-tailed: Used for directional hypothesis testing Alternative hypothesis: There is a difference and we anticipate the direction of that difference Ha: μ1<μ2 Ha: μ1>μ2 Two-tailed: Used for non-directional hypothesis testing Alternative hypothesis: There is a difference but we do not anticipate the direction of that difference Ha: μ1μ2 Example: Research question: ¿Are men taller than women? Null hypothesis: There is no height difference among genders Alternative hypothesis: One tail: Men are taller than women Two tail: One gender is taller than the other. [Figures from: http://www.mathsrevision.net/alevel/pages.php?page=64] Significance Level (α) and test power (1-β) The probability of making Decision \ Reality H0 true / Ha False H0 false / Ha true Accept H0; Reject Ha Ok (p=1-α) Type II Error (β) Reject H0; Accept Ha Type I Error (p=α) Ok (1-β) Type I Errors can be decreased by altering the level of significance (α) Unfortunately, this in turn increments the risk of Type II Errors. …and viceversa The decision on the significance level should be made (not arbitrarily but) based on the type of error we want to reduced. 
Figure from: [http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/reference/reference_manual_02.html]

Hypothesis type / directionality: one-tail vs two-tail
Hypothesis directionality affects statistical power: one-tailed tests provide more statistical power to detect an effect than two-tailed tests. Choosing a one-tailed test for the sole purpose of attaining significance is not appropriate: you may lose the difference in the other direction! Choosing a one-tailed test after running a two-tailed test that failed to reject the null hypothesis is not appropriate either.
Source: [http://www.ats.ucla.edu/stat/mult_pkg/faq/general/tail_tests.htm]. Figure from: [http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/reference/reference_manual_02.html]

Independence of observations: paired vs unpaired
• Paired: there is a one-to-one (bijective) correspondence between the samples of the groups; if the samples in one group are reorganised, then so should be the samples in the other. Examples: randomized block experiments with two units per block; studies with individually matched controls; repeated measurements on the same individual.
• Unpaired: there is no correspondence between the samples of the groups; the samples in one group can be reorganised independently of the other.
Pairing is a strategy of design, not analysis (pairing occurs before data collection!). Pairing is used to reduce bias and increase precision [DinovI2005].

Example of paired data: N sets of twins, to know if the 1st born is more aggressive than the 2nd (a worked sketch follows at the end of this subsection).

Twin aggressiveness score
Pair | 1st born | 2nd born
1 | 86 | 88
2 | 71 | 77
3 | 77 | 76
… | … | …
N | 87 | 72
Example adapted from [DinovI2005]

Parametric vs non-parametric
• Parametric testing: assumes a certain distribution of the variable in the population to which we plan to generalize our data.
• Non-parametric testing: no assumption regarding the distribution of the variable in the population. That is distribution free, NOT ASSUMPTION FREE!! Non-parametric tests look at the rank order of the values.
Parametric tests are more powerful than non-parametric ones and so should be used if possible [GreenhalghT 1997 BMJ 315:364].
Source: 2.ppt (author unknown)

One-way, two-way, … N-way analysis
Experimental designs may be one-factorial, two-factorial, … N-factorial, i.e. one research question at a time, two research questions at a time, … N research questions at a time. The more ways, the more difficult the interpretation of the analysis. One-way analysis measures the significance of the effects of one factor only; two-way analysis measures the significance of the effects of two factors simultaneously; etc.

Steps to apply a significance test
1. Define a hypothesis.
2. Collect data.
3. Determine the test to apply.
4. Calculate the test value (t, F, χ², p).
5. Accept/reject the null hypothesis based on the degrees of freedom and the significance threshold.
[GurevychI2011]

Which test to apply?
Selecting the right test depends on several aspects of the data: sample count (low < 30; high > 30); independence of observations (paired, unpaired); number of groups or datasets to be compared; data types (numerical, categorical, etc.); assumed distributions; hypothesis type (one-tail, two-tail). [GurevychI2011]
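A worked sketch of the paired vs unpaired distinction on the twin example above (the first three and the last pairs come from the table; the middle pairs are hypothetical fillers added so the tests have enough data):

```python
# The same scores analysed as paired (correct: the design pairs first- and
# second-born within a family) and, wrongly, as unpaired. Pairing removes
# the between-family variability from the comparison.
import numpy as np
from scipy import stats

first_born  = np.array([86, 71, 77, 68, 91, 72, 80, 91, 70, 87])
second_born = np.array([88, 77, 76, 64, 96, 75, 73, 90, 65, 72])

print(stats.ttest_rel(first_born, second_born))   # paired t-test (per family)
print(stats.ttest_ind(first_born, second_born))   # unpaired: ignores the pairing
print(stats.wilcoxon(first_born, second_born))    # non-parametric paired analogue
```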
Which test to apply?

Independent variable (number) | Dependent variable (number, type) | Test type | Statistic
1 population, N/A | 1, continuous normal | One-sample t-test | Mean
2 independent populations (2 categories) | 1, normal | Two-sample t-test | Mean
2 independent populations (2 categories) | 1, non-normal | Mann-Whitney / Wilcoxon rank-sum test | Median
2 independent populations (2 categories) | 1, categorical | Chi-square test, Fisher's exact test | Proportion
3 or more populations, categorical | 1, normal | One-way ANOVA | Means
… | … | … | …

More complete tables can be found at:
• http://www.ats.ucla.edu/stat/mult_pkg/whatstat/choosestat.html
• http://bama.ua.edu/~jleeper/627/choosestat.html
• http://www.bmj.com/content/315/7104/364/T1.expansion.html

DATA MINING

Initial definitions
In a conditional probability P(x|y), the distribution P(y) is called the prior. The likelihood function is the probability of the evidence given the parameters, i.e. the model: p(x|θ). The posterior probability is the probability of the parameters, i.e. the model, given the evidence: p(θ|x).

Initial definitions
• Factors of variation: aspects of the data that can vary separately, i.e. the intrinsic dimensionality of the manifold.
• Computational element or unit: a mathematical function or block that can be reused to express more complex mathematical functions. Examples: basic logic gates (AND, OR, NOT), artificial neurons, decision trees, etc.
• Fan-in: the maximum number of inputs of a particular element.

Initial definitions
• System or computational model: a set of interconnected computational elements, at times represented by a graph.
• Size of a system: the number of elements in the system. Important to justify deep learning is the observation that reorganizing the way in which computational units are composed or connected can have a drastic effect on the efficiency of representation size [BengioY2009, pg 19].
• Types or classes of models: Generative models: models for randomly generating observable data, P(X,Y); these include HMMs, GMMs, restricted Boltzmann machines, etc. Discriminative or conditional models: models for capturing the dependence of an unobserved variable Y on an observed variable X, P(Y|X); these include linear discriminant analysis, SVMs, linear regressors, ANNs, …

Posterior probability
Using Bayes' rule: p(θ|x) = [p(x|θ)·p(θ)] / p(x) …which can be "re-expressed" for easy remembering as the directly proportional (∝) relation: posterior probability ∝ likelihood × prior probability …or, in other words, since the joint distribution p(x,θ) = p(x|θ)·p(θ): posterior probability ∝ joint distribution.

Posterior probability
From the above, two basic approximations for estimating posterior probabilities follow [ResnikP2010]:
• The maximum likelihood estimate (MLE), which amounts to counting and then normalizing so that the probabilities sum to 1. The MLE produces the choice most likely to have generated the observed data.
• The maximum a posteriori (MAP) estimate, which is the choice that is most likely given the observed data. In contrast to MLE, MAP estimation applies Bayes's rule, so that our estimate can take into account prior knowledge about what we expect θ to be, in the form of a prior probability distribution P(θ).
Both MLE and MAP give us the best estimate according to their respective definitions of "best". Neither MLE nor MAP gives a whole distribution P(θ|x).
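A minimal worked example contrasting the two estimates (a coin rather than an imaging model; the Beta prior is an assumption made for illustration):

```python
# MLE vs MAP for a coin's heads probability theta.
# MLE = count and normalize; MAP additionally weighs in a Beta(a, b) prior.
heads, tails = 7, 3                      # observed data

theta_mle = heads / (heads + tails)      # 0.70: most likely to have produced the data

a, b = 20, 20                            # Beta prior: we expect a roughly fair coin
theta_map = (heads + a - 1) / (heads + tails + a + b - 2)
print(theta_mle, theta_map)              # 0.70 vs ~0.54: with little data, the
                                         # prior pulls the estimate toward 0.5

# Neither is the full posterior: that would be the whole distribution
# Beta(heads + a, tails + b) over theta, not a single point estimate.
```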
Patterns
Patterns are regularities in data [Wikipedia:Pattern_recognition]. Patterns refer to models (regression or classification) or to components of models (e.g. a linear term in a regression) [FayyadU1996, pg 51]. [Fayyad et al (1996) AI Magazine Fall:37-54, >6500 citations!]

Data mining
Data mining is:
• "the application of specific algorithms for extracting patterns from data" [FayyadU1996];
• "the computational process of discovering patterns in large data sets" [Wikipedia:Data_mining];
• the analysis step of the "Knowledge Discovery in Databases" (KDD) process [FayyadU1996, Wikipedia:Data_mining].

Different names for the same thing?
• Data mining: discovering patterns in large data sets [Wikipedia:Data_mining].
• Pattern recognition: recognition of regularities (patterns) in data [Wikipedia:Pattern_recognition]; data-driven classification [JainAK2000]; nearly synonymous with machine learning [Wikipedia:Pattern_recognition].
• Machine learning: the construction and study of algorithms that can learn (the act of acquiring new knowledge) from data; often overlaps with computational statistics [Wikipedia:Machine_learning].
• Knowledge discovery: data-driven discovery of knowledge; it adds processing (cleaning, selection) steps to data mining [FayyadU1996].

(Figure-only slides reproduced from [Jain et al (2000) IEEE TPAMI 22(1):4-37, >5000 citations!] and [Fayyad et al (1996) AI Magazine Fall:37-54, >6500 citations!].)

Data mining
Classification is strongly related to regression [FayyadU1996]:
• Regression is learning a function that maps a data item to a real-valued prediction variable.
• Classification is learning a function that maps (classifies) a data item into one of several predefined classes.
That's regression with a threshold! [Felipe's dixit] (See the sketch at the end of this subsection.)

Learning goal
The objective of learning in AI is to give computers the ability to understand our world, in terms of inferring semantic concepts and the relationships among those concepts. Scope:
• Single task: the observations come from a single task.
• Multi-task: the observations come from several tasks at once.

Types of learning
• Supervised: relies on known (labelled) examples, a.k.a. the training set, to find a discrete regressor.
• Unsupervised: finds regularities and structures in (i.e. fits probability distributions to) the observations.
• Reinforced: updates the currently learnt model based on rewards assessing its outputs.
• Semi-supervised: starting from an initially learned supervised model, it evolves unsupervisedly by generating synthetic "rewards" proportional to the likelihood of the new observations.
• Active: a particular case of semi-supervised learning in which the new observations are chosen or selected from all arriving new observations according to a certain criterion.
• Transfer: a particular case of semi-supervised learning in which the new observations come from a new domain or task.

Basic problems in learning
• Modelling: encoding the dependencies between variables under a given chosen form. In fact, modelling per se just refers to choosing this form, and in its most minimalistic case it does not require the model to be representative of the phenomenon, explicative, nor predictive! It may be just nuts, a silly model!
• Learning: optimizing the parameters of the model by minimizing the loss functional, i.e. a particular criterion, e.g. the least-squares error.
• Inference or reconstruction: estimating the posterior probabilities of hidden variables given the observed ones, P(h|x) or h = f(x).
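The sketch promised above: "classification is regression with a threshold", with learning as loss minimization (synthetic two-class data; least squares stands in for whatever loss the application dictates):

```python
# Fit a linear least-squares regressor to +/-1 labels, then classify by the
# sign of the regressor's output, i.e. threshold it at 0.
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.vstack([rng.normal(loc=-1.0, size=(n, 2)),      # class -1
               rng.normal(loc=+1.0, size=(n, 2))])     # class +1
labels = np.concatenate([-np.ones(n), np.ones(n)])

A = np.column_stack([np.ones(2 * n), X])               # linear model with bias
w, *_ = np.linalg.lstsq(A, labels, rcond=None)         # learning = minimize squared loss

predictions = np.sign(A @ w)                           # the threshold turns the
print((predictions == labels).mean())                  # regressor into a classifier
```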
Local vs non-local generalization
• Local generalization: refers to an underlying assumption made by many learning algorithms: the output f(x1) is similar to f(x2) iff x1 is similar to (i.e. close to, in the neighbourhood of) x2.
• Non-local generalization: learning a function that behaves differently in different regions of the data space requires different parameters for each of those regions.

Local generalization
Local generalization is closely related to manifold learning. Since a manifold is locally Euclidean, it can be approximated locally by linear patches tangent to the manifold surface. If the manifold is smooth, then these patches (i.e. the computational units) can be reasonably large and the number of patches needed (i.e. the size of the computational model) will be small. However, if the manifold is highly curved (i.e. a complex, highly varying function), then the patches will have to be small, increasing the number of patches needed to characterise the manifold. Figure reproduced from [BengioY2009, pg 25].

Local generalization
Local generalization is related to the curse of dimensionality. However, what matters for generalization is not the [extrinsic] dimensionality, but the number of variations of the function [i.e. the intrinsic dimensionality] that we want to learn. Generalization is mostly achieved by a form of local interpolation between neighbouring training examples.

(A series of figure-only slides reproduced from [Jain et al (2000) IEEE TPAMI 22(1):4-37, >5000 citations!], plus one on clustering from [Fayyad et al (1996) AI Magazine Fall:37-54, >6500 citations!].)

Optimizing model selection [EscalanteHJ2009]
(Diagram; assumes LTI.) The full model selection search space covers:
• Xi,pre: the combination of preprocessing methods, with Y1…Npre the hyperparameters for preprocessing;
• Xi,fs: the feature selection method, with Y1…Nfs the hyperparameters for feature selection;
• Xi,class: the classifier method, with Y1…Nclass the hyperparameters for classification.

Data mining
"Overfitting: When the algorithm searches for the best parameters for one particular model using a limited set of data, it can model not only the general patterns in the data but also any noise specific to the data set, resulting in poor performance of the model on test data. Possible solutions include cross-validation, regularization, and other sophisticated statistical strategies." [FayyadU1996]
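A minimal numeric illustration of that overfitting description (synthetic 1-D data; the polynomial degree stands in for model complexity):

```python
# A high-degree polynomial fitted to few noisy points also models the noise:
# training error collapses while the error on held-out test data grows.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 12)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=12)
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)            # noise-free ground truth

for degree in (3, 11):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(degree, train_err, test_err)          # degree 11: ~zero train error,
                                                # much larger test error (overfit)
```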
Deep learning
Data representation refers simply to the chosen feature space, i.e. the feature vector [BengioY2013]. The construction or learning of this feature space goes under the name of feature engineering, and includes more rudimentary subproblems such as feature selection and extraction, e.g. processing and transformations. A good representation is one that disentangles the underlying factors of variation [BengioY2013].

Deep learning
Much of the actual effort in deploying machine learning algorithms goes into feature engineering. Representation learning, a.k.a. deep learning, is about learning a representation of the data, i.e. a feature space, that makes it easier to extract useful information when building predictors (e.g. classifiers, regressors, etc.). As soon as there is a notion of representation, one can think of a manifold [BengioY2013].

Deep learning
Expressive representations: refers to the ability to capture a huge number of input configurations with a reasonably sized representation; in other words, having few features suffices to cover most of the data space. That's good old content validity meeting computational spatial efficiency (Felipe's dixit). Traditional algorithms require O(N) parameters (and/or O(N) training examples) to distinguish O(N) input regions. Linear features, e.g. those learnt by PCA, cannot be stacked to form deeper, more abstract representations, since the composition of linear operations yields another linear operation (see the sketch at the end of this subsection). However, it is still possible to use linear features in deep learning, e.g. by inserting a non-linearity between learned single-layer linear projections.

Deep learning
Distributed representations: refers to having more than one computational unit charting a certain region of the data space at the same time. Distributed representations are often (always?) expressive. Example: imagine one binary classifier over a certain space; it partitions the space into 2 subregions. But having 3 classifiers over that same space can partition it into exponentially more regions. Distributed representations can alleviate the curse of dimensionality and the limitations of local generalization. Figure reproduced from [BengioY2009, pg 27].

Deep learning
Overcomplete representations: refers to having more (hidden) computational units, i.e. degrees of freedom, than training examples. This often leads to overfitting, endangering generalization. It may still be useful for denoising [Felipe's inference from BengioY2009, pg 46]. However: "importantly, DBMs (in the case of MNIST, despite having millions of parameters and only 60k training samples) do not appear to suffer much from overfitting" [SalakhutdinovR2009, pg 453] …hmm, I am not sure about this; Salakhutdinov says so, but he does not provide any evidence that this is the case.
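The sketch promised above: composing linear projections (e.g. stacked PCA layers) collapses to a single linear map, while an interposed non-linearity does not:

```python
# Two stacked linear "layers" equal one matrix product; adding a ReLU
# between them breaks the collapse, which is what makes depth meaningful.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 16))      # layer 1: linear projection
W2 = rng.normal(size=(4, 8))       # layer 2: linear projection
x = rng.normal(size=16)

two_linear_layers = W2 @ (W1 @ x)
one_linear_layer = (W2 @ W1) @ x                            # the identical map
print(np.allclose(two_linear_layers, one_linear_layer))     # True

relu = lambda v: np.maximum(v, 0.0)
nonlinear_stack = W2 @ relu(W1 @ x)                         # no single matrix does this
print(np.allclose(nonlinear_stack, one_linear_layer))       # False (in general)
```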
Deep learning
Invariant representations: refers to having computational units which, by having learnt abstract concepts, achieve outputs that are invariant to local changes of the input. This often needs highly non-linear transfer functions. Invariance and abstraction go hand in hand; having invariant features is a long-standing goal in pattern recognition. Achieving invariance, i.e. reducing sensitivity along a certain direction of the data, does not guarantee having disentangled a certain factor of variance in the data. Although invariance is often good, the ultimate goal is not to achieve invariance but to disentangle the explanatory factors [BengioY2013]; that's manifold embedding! Therefore, the goal of building invariant features should be to remove sensitivity to directions of variance that are uninformative to the task. Building invariant representations often involves two steps: low-level features are selected to account for the data; higher-level features are extracted from the low-level features.

Deep learning
Deep architectures are model architectures composed of multiple levels of non-linear operations or computational elements. The number of levels, i.e. the longest path from an input node to an output node, is referred to as the depth of the architecture.

Deep learning
An architecture may be:
• Shallow: often up to 3 levels of depth.
• Deep: more than 3 levels. Example: brain anatomy, with 5-10 levels in the visual system [SerreT2007].
Funny enough, the examples and systems used in scientific papers devoted to deep learning hardly go beyond 3 levels, e.g. [SalakhutdinovR2013_TPAMI]. So not that deep!

Deep learning: pros and cons in a nutshell
• Pros: relaxes the need for feature engineering; modelling becomes truly data-driven; bigger compartmentalization of the search space is achieved (with a fixed number of hidden variables).
• Cons: higher complexity of the model; larger number of parameters; "direct" training becomes intractable.

Deep learning
Deep Boltzmann Machines (DBM): a variant of Boltzmann machines that, instead of having one single layer of hidden variables (in contrast to the RBM), has multiple layers of hidden variables, with units in odd-numbered layers being conditionally independent given the even-numbered layers, and vice versa. (Figure: a Deep Boltzmann Machine with 3 layers; reproduced from [SalakhutdinovR2013_TPAMI].)

Questions that I'm unable to answer at the moment
Overfitting. Clearly deep models are prone to overfitting, considering they use overcomplete representations …it's not me, but Bengio, who warns about this! From his particular example with MNIST images, [SalakhutdinovR2009, pg 453] claims this does not seem to be the case. However, he says so but fails to provide any evidence that this is the case. I'm still unconvinced that, in general, deep learning models do not simply overfit the data.

Deep learning: to know more
[BengioY2009] Bengio, Y. (2009) "Learning deep architectures for AI" Foundations and Trends in Machine Learning 2(1):1-127
[BengioY2013] Bengio, Y.; Courville, A.; Vincent, P. (2013) "Representation learning: a review and new perspectives" IEEE Transactions on Pattern Analysis and Machine Intelligence 35(8):1798-1828
[DavisRA2001] Davis, R.A. (2001) "Gaussian processes" in Encyclopedia of Environmetrics, Section on Stochastic Modeling and Environmental Change (D. Brillinger, Ed.), Wiley, New York
[HintonGE2006] Hinton, G.E.; Osindero, S.; Teh, Y.-W. (2006) "A fast learning algorithm for deep belief nets" Neural Computation 18:1527-1554
[LeCunY2006] LeCun, Y.; Chopra, S.; Hadsell, R.; Ranzato, M.A.; Huang, F.J. (2006) "A tutorial on energy-based learning" in Bakir, G.; Hofmann, T.; Schölkopf, B.; Smola, A.; Taskar, B. (Eds.), Predicting Structured Data, MIT Press
[ResnikP2010] Resnik, P.; Hardisty, E. (2010) "Gibbs sampling for the uninitiated" Technical Report CS-TR-4956, Institute for Advanced Computer Studies, University of Maryland, 23 pp.
[SalakhutdinovR2008_ICML] Salakhutdinov, R.; Murray, I. (2008) "On the quantitative analysis of deep belief networks" 25th International Conference on Machine Learning (ICML), Helsinki, Finland
[SalakhutdinovR2009_AISTATS] Salakhutdinov, R.; Hinton, G. (2009) "Deep Boltzmann machines" 12th International Conference on Artificial Intelligence and Statistics (AISTATS), Clearwater Beach, Florida, USA, pp. 448-455
[SalakhutdinovR2013_TPAMI] Salakhutdinov, R.; Tenenbaum, J.B.; Torralba, A. (2013) "Learning with hierarchical-deep models" IEEE Transactions on Pattern Analysis and Machine Intelligence 35(8):1958-1971
[SerreT2007] Serre, T.; Kreiman, G.; Kouh, M.; Cadieu, C.; Knoblich, U.; Poggio, T. (2007) "A quantitative theory of immediate visual recognition" Progress in Brain Research, Computational Neuroscience: Theoretical Insights into Brain Function, 165:33-56
[TehYW2010] Teh, Y.W. (2010) "Dirichlet process" in Encyclopedia of Machine Learning, Springer

KNOWLEDGE REPRESENTATION

Knowledge representation
"Knowledge representation includes ontologies, new concepts for representing, storing, and accessing knowledge. Also included are schemes for representing knowledge and allowing the use of prior human knowledge about the underlying process by the knowledge discovery system." [FayyadU1996]

Automating science?
"Computers with intelligence can design and run experiments, but learning from the results to generate subsequent experiments requires even more intelligence." [WaltzD2009]
Goals of automation in science [WaltzD2009]:
• increase productivity by increasing efficiency (e.g., with rapid throughput);
• improve quality (e.g., by reducing error);
• cope with scale.

Knowledge generation can be streamlined: e.g. the robot scientist
(Photo: robot scientist ADAM and researcher Prof. King.) The LABORS (Laboratory Ontology for Robot Scientists) ontology [KingRD2011] formalizes Adam's functional genomics experiments and is based on EXPO (the ontology of scientific experiments). Closing the loop: ADAM can decide on what experiment to do next [WaltzD2009], though it is limited to hypothesis-led discovery [KingRD2009].

Knowledge generation can be streamlined: EXPO
EXPO: an ontology of scientific experiments. It defines over 200 concepts for creating semantic markup about scientific experiments, in the OWL language. EXPO formalises generic knowledge about scientific experimental design, methodology, and results representation [SoldatovaLN2006]. EXPO is available at http://expo.sourceforge.net/

An overview of EXPO
(Figure: [KingRD2006 presentation on EXPO].)

Knowledge generation
To arrive at knowledge from experimentation, 3 steps are taken:
• Data harvesting: involves all the observational and interventional experimentation tasks needed to acquire data. Data acquisition: experimental design, evaluation metrics, capturing raw data.
• Data reconstruction: translates raw data into domain data; it inverts the data formation process.
E.g., if you captured your data with a certain sensor and the sensor throws electric voltages as output, then reconstruction involves converting those voltages into a meaningful domain variable. E.g., image reconstruction.
• Data analysis: from domain data to domain knowledge. When big data is involved, it is often referred to as knowledge discovery.

Knowledge discovery
(Figure from [Fayyad et al, 1996].)

Data interpretation
The research findings generated depend on the philosophical approach used [LopezKA2004]: assumptions drive methodological decisions. Different (philosophical) approaches for data interpretation [PriestH2001, part 1; LopezKA2004; but basically philosophy in general]:
• Interpretive (or hermeneutic) phenomenology: systematic reflection on, and exploration of, the phenomena as a means to grasp the absolute, logical, ontological and metaphysical spirit behind the phenomena. Affected by the researcher's bias. Kind of your classical hypothesis-driven interpretation approach [Felipe's dixit].
• Descriptive (or eidetic) phenomenology: favours data-driven over hypothesis-driven research [Felipe's dixit, based upon the following]: "the researcher must actively strip his or her consciousness of all prior expert knowledge as well as personal biases (Natanson, 1973). To this end, some researchers advocate that the descriptive phenomenologist not conduct a detailed literature review prior to initiating the study and not have specific research questions other than the desire to describe the lived experience of the participants in relation to the topic of study" [LopezKA2004].
Important note: I do NOT understand these very well, so do not ask me! READ.

Data interpretation
Different (philosophical) approaches for data interpretation [PriestH2001, part 1; LopezKA2004; but basically philosophy in general] (cont.):
• Grounded theory analysis: generates theory through the inductive examination of data; systematization to break down data, conceptualise it and re-arrange it in new ways.
• Content analysis: facilitates the production of core constructs formulated from the contextual settings from which the data were derived; emphasizes reproducibility (enabling others to establish similar results); interpretation (analysis) becomes continual checking and questioning.
• Narrative analysis: qualitative; results (often from interviews) are revisited iteratively, detracting words or phrases until the core points are extracted.
Important note: I do NOT understand these very well, so do not ask me! READ.

Why KR models for biomedical engineering?
GOAL: formalizing the concepts and relations common in biomedical imaging, affording more time for interpretation. Advantages: it favours automated data processing, automated knowledge and data integration, and semantic integration [HoehndorfR2012]. The formalization of experimental knowledge expects that such knowledge is more easily reused to answer other scientific questions [KingRD2009]. It ensures reproducibility and quality results [OrihuelaEspinaF2010]. It leaves interpretation to humans!

AN EXAMPLE OF KR WITH FNIRS

Challenges in KR in fNIRS experimentation
How to choose: the region to interrogate? the best (most fair) analysis [OrihuelaEspina2010_OHBM], including the processing, parameterization, and analysis flow? How to avoid: physiological noise / systemic effects?
Artefacts (e.g. optode movement, ambient light)? How to ensure: physiological plausibility? integrity/validity [OrihuelaEspina2010_PMB]? reuse of formalized experiment information [KingRD2009]?

Challenges in KR in fNIRS experimentation: parameterization
(Figure: [OrihuelaEspinaF2010_OHBM].)

Challenges in KR in fNIRS experimentation: modelling
(Diagram, inspired from [Banaji, fNIRS Conference, 2012]: light → tissue → light; a light model yields the chromophore concentrations; via neurovascular coupling, a physiological model yields physiological information.)

Challenges in KR in fNIRS experimentation: modelling
Is the data validated? Do we really need a physiological model? A model is useful only if it fulfils very high standards of predictive capability and reliability. We learn about the phenomenon while building the model (a vicious circle). Purposes of models:
• explain data / highlight gaps in understanding;
• raise open questions;
• predict hard-to-measure quantities;
• develop understanding and intuition;
• prepare us for experimental data;
• challenge dogmas;
• they may force us to ignore priors!
[Banaji, fNIRS Conference, 2012; Banaji, JTB, 2006; Banaji, PLoS CB, 2008]

Challenges in KR in fNIRS experimentation: modelling
What are the principles that we should follow to build our model? How is the model going to interact with the data? (Diagram, [Banaji, fNIRS Conference, 2012], with two examples of interaction: in one, the model produces simulated data; in the other, modelled data; in both, these are compared against the data observed from the subject/cohort.)

Challenges in KR in fNIRS experimentation
Closing the loop: from experiment design and data collection to hypothesis formation and revision, and from there to new experiments [WaltzD2009]. Complex experiments have different sources: different NIRS devices (HITACHI, SHIMADZU, fNIRX), but also different source modalities: eye-tracking, EEG, etc. Accommodating different optical modalities. Lack of a standard "final" representation format: the medical standard DICOM is not as standard as pretended; each provider has its own file format; SNIRF: the Shared Near Infrared File Format Specification.

Challenges in KR in fNIRS experimentation
• Problem size: information representation (relational, object-oriented); sample size (extrapolation, generalization, regularization, ill-posed problems, i.e. the number of observations vs the number of covariates).
• Data mining and KD strategy [FayyadU1996].
• Model identification and parameterization: underparameterization (low flexibility to explain complex data); overparameterization (a spurious model can explain any data); difficulties in parameter identification; level of detail; model boundaries, parameters, variables, purpose.

Concept map: experimentation
(Figure: a concept map linking the light model and the physiological model.)

Data analysis: more than just thinking about your statistical test…
(Figure source: [OrihuelaEspinaF2012, Workshop on Foundations of Biomedical Knowledge Representation]. A "brain map" of data analysis, NOT INTENDED TO BE COMPREHENSIVE, across the three levels:)
• Understanding: past: make sense of bygone situations or explain an occurring phenomenon, establishing associational or causal relations; present: decision making; future: infer outcomes, reasoning, prediction, planning, optimization; hypothesis-driven vs data-driven; quantitative vs qualitative; causality (zero-level, first-level, second-level); incorporation of domain knowledge (priors).
• Analysis: algorithm: complexity (order), strategy (e.g. greedy), serial/parallel, exact real-number computation; problem complexity (NP-complete, P-hard…); problem size (information representation, regularization); data relations and behaviour; validation theory: type (construct, face, convergent, ecological, external, internal, etc.) and technique (leave-one-out, cross-fold, gold standard, ground truth); dimensionality (intrinsic vs explicit); learning (supervised, unsupervised, reinforcement); comparison (metric and performance definition); data quality and SNR.
• Processing (data collection): direct (intervention) vs indirect (sensing); sampling; interviewing, behavioural simulation, observational; synthetic, experimental, database; positive vs negative/complement; type: discrete, continuous, categorical/nominal, ordinal/ranked; digital vs analogue; nature of data: time vs space; deterministic vs stochastic; observable vs non-observable; one-way, two-way, N-way; fundamental vs derived.

Taxonomy of factors in fNIRS experimentation
(Figure: [OrihuelaEspinaF2010, PMB].)

Experimental factors limit interpretation
(Figure.)

INTERPRETATION GUIDELINES

Interpretation guidelines
Understanding is by far the hardest part of data analysis …and alas, it is also the part where maths/stats/computing are least helpful. Look at your data! Know them by heart. Visualize them in as many possible ways as you can imagine, and then a few more. Have a huge background: read everything out there closely and loosely related to your topic.

Interpretation guidelines
Always try more than one analysis (convergent validity). Quantitative analysis is often desirable, but never underestimate the power of a good qualitative analysis. All scales are necessary and complementary: structural, functional, effective; inter-subject, intra-subject; neuron-level, region-level.

Interpretation guidelines
Every analysis must translate the physiological, biological, experimental, etc. concepts into a correct mathematical abstraction. Every interpretation must translate the "maths" back into real-world domain concepts. Again: the interpretation of results must be confined to the limits imposed by the assumptions made during image reconstruction. Rule of thumb: data analysis takes at least 3 to 5 times the data collection time; if it has taken less, then your analysis is likely to be weak, coarse or careless. Example: one month collecting data, five months' worth of analysis.

Interpretation guidelines
The laws of physics are what they are… but research/experimentation results are not immutable. They strongly depend on the decisions made during data harvesting, data reconstruction and the three stages of the analysis process. It is the duty of the researcher to make the best decisions so as to arrive at the most robust outcome. Interpretation, interpretation, interpretation… LOOK at your data!

Final remarks
Inferential statistics (SPM) are (currently) by far the most popular approach …perhaps due to their utter simplicity and mathematical elegance, together with their flexibility to accommodate virtually every experimental design …yet they are not the only option, and sometimes not the best for a given goal (e.g. graph theory's superb competence for connectivity analysis). Analytical modelling (when correct) will always be the safe shot, but complexity often prevents accurate modelling.

THANKS, QUESTIONS?