Tools in Cognitive Science II: Basic Statistics for Cognitive Scientists
Some basic concepts in statistics
Philipp Mitteröcker

Basic terms
Statistics: from Latin »statisticum« (about the state) and Italian »statista« (statesman, politician).

Historical roots
17th century: handling of demographic and economic data ("political arithmetic").
John Graunt (1662): "Observations on the Bills of Mortality".
Development of probability theory by Pascal, Fermat, and Bernoulli.
1794: the method of least squares was described by Carl Friedrich Gauss.
19th and early 20th century: Francis Galton, Florence Nightingale, Karl Pearson, Ronald A. Fisher.

Basic terms
Applied statistics
  Descriptive statistics
  Inferential statistics (hypothesis tests, confirmatory analysis)
  Exploratory analysis, modeling, data mining
Mathematical statistics

Biometrics, psychometrics, econometrics, morphometrics, ... (Greek metron = measurement)

Measurement: the process of assigning a number to an attribute (or phenomenon) according to a rule or set of rules.
Sample: a collection of individual observations selected by a specific procedure.
Population: the totality of individual observations about which inferences are to be made.
Data (sing. datum), information, knowledge
Theory, hypothesis
Variable: a symbol that stands for a value that may vary.
Univariate statistics, bivariate statistics, multivariate statistics

Measurements
Precision (German: Präzision): the degree to which repeated measurements show the same results (reproducibility, repeatability).
Accuracy (German: Genauigkeit): the closeness of measurements of a quantity to the quantity's actual (true) value.
Bias (German: Verzerrung): the difference between the average of the measurements and the reference value.
[Figure: probability density over measured values, marking the reference value, accuracy, and precision]
Estimating measurement error by repeated measures: random error vs. systematic error.
Outliers: mistake or important measurement?
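To make precision, accuracy, and bias concrete, here is a minimal Python sketch, assuming repeated measurements of a single object whose reference value is known. The function name `measurement_summary` and the readings are hypothetical. Bias follows the definition above, precision is summarized by the sample standard deviation, and the root mean squared deviation from the reference value is used as one common (assumed, not taken from the slides) summary of accuracy.

```python
import math
import statistics

def measurement_summary(measurements, reference_value):
    """Summarize repeated measurements of one quantity against a known reference value."""
    mean = statistics.mean(measurements)
    # Bias: average of the measurements minus the reference value (systematic error).
    bias = mean - reference_value
    # Precision: spread of the repeated measurements (random error);
    # a smaller sample standard deviation means higher precision.
    sd = statistics.stdev(measurements)
    # Accuracy (assumed summary): root mean squared deviation from the reference value,
    # which combines the bias and the random spread.
    rmse = math.sqrt(statistics.mean((x - reference_value) ** 2 for x in measurements))
    return {"mean": mean, "bias": bias, "sd": sd, "rmse": rmse}

# Hypothetical repeated readings of an object whose reference length is 10.0 cm.
readings = [10.2, 10.3, 10.1, 10.2, 10.2]
summary = measurement_summary(readings, reference_value=10.0)
print(summary)
```

With these readings the measurements are quite precise (sd of about 0.07 cm) but biased by about 0.2 cm, so most of the inaccuracy is systematic rather than random.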
Longitudinal versus cross-sectional data

Measurement scales
Nominal scale (categorical data), e.g., gender, nationality, habitat
Ordinal scale, e.g., school grades, rank order, Likert scale
Interval scale: no natural zero point, i.e., we can compute differences but not ratios, e.g., degrees Celsius, coordinates
Ratio scale, e.g., body height, counts, frequencies, temperature in kelvin

Discrete data (meristic data), e.g., natural numbers, rank order, number of fish in a pond, scale from 1 to 7
Continuous data, e.g., real numbers, cm, kg, degrees Celsius

Descriptive statistics
Central tendency: mean, weighted mean; arithmetic, geometric, harmonic mean; mode, median
Dispersion, spread: range, variance, standard deviation, quantiles, coefficient of variation
The problem of multimodal distributions and outliers

Descriptive statistics & measurement scales
Scale            Statistics
nominal scale    mode, frequencies (contingency tables)
ordinal scale    median, percentiles
interval scale   mean, standard deviation, correlation, regression, analysis of variance
ratio scale      geometric mean, coefficient of variation, logarithms

Bivariate statistics
How to describe a bivariate distribution? Covariance, correlation
Correlation coefficient: -1 ≤ r ≤ 1
r = 0: no linear relationship
r = 1 or r = -1: perfect linear relationship
r > 0: positive relationship; r < 0: negative relationship
[Figure: bivariate distribution with covariance s12 = 0.647, shown with equal-frequency ellipses]

Data matrix
Rows: Case 1, Case 2, Case 3, Case 4, Case 5, ...
Columns: Var. 1, Var. 2, Var. 3, Var. 4, ...
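A minimal sketch of the descriptive and bivariate statistics listed above, using Python's standard `statistics` module (assumption: Python 3.8 or later, for `geometric_mean` and `quantiles`). The helper names `describe`, `covariance`, and `pearson_r` are hypothetical and the input lists are invented. Covariance and correlation are written out by hand so that the relation r = s12 / (s1 s2) is visible; which statistics are meaningful still depends on the measurement scale of the data.

```python
import statistics

def describe(values):
    """Central tendency and dispersion for positive, ratio-scale data."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (denominator n - 1)
    return {
        # Central tendency
        "mean": mean,
        "geometric_mean": statistics.geometric_mean(values),  # requires positive values
        "harmonic_mean": statistics.harmonic_mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Dispersion
        "range": max(values) - min(values),
        "variance": statistics.variance(values),
        "sd": sd,
        "quartiles": statistics.quantiles(values, n=4),
        "cv": sd / mean,  # coefficient of variation: only meaningful on a ratio scale
    }

def covariance(x, y):
    """Sample covariance s12 of two paired variables (denominator n - 1)."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)

def pearson_r(x, y):
    """Pearson correlation r = s12 / (s1 * s2), which always lies in [-1, 1]."""
    return covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))

# Hypothetical data for illustration.
print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # perfect positive linear relationship
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # perfect negative linear relationship
```

For positive data that are not all equal, the harmonic mean is smaller than the geometric mean, which is smaller than the arithmetic mean; the example list shows this ordering.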
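Multivariate spaces
[Figure: Q-space and R-space representations of the same data, with axes A and B]

Multivariate distribution
Variance-covariance matrix:
$$
\begin{pmatrix}
s_1^2 & s_{12} & \cdots & s_{1n} \\
s_{21} & s_2^2 & \cdots & s_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
s_{n1} & s_{n2} & \cdots & s_n^2
\end{pmatrix}
$$
Correlation matrix:
$$
\begin{pmatrix}
1 & r_{12} & \cdots & r_{1n} \\
r_{21} & 1 & \cdots & r_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
r_{n1} & r_{n2} & \cdots & 1
\end{pmatrix}
$$

Principal Component Analysis (diagonalization of a covariance matrix):
$$
\begin{pmatrix} 0.950 & 0.647 \\ 0.647 & 1.535 \end{pmatrix}
\;\longrightarrow\;
\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}
$$
where \lambda_1 and \lambda_2 are the eigenvalues of the covariance matrix, i.e., the variances along the principal components.

Discriminant function analysis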
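The covariance matrix, correlation matrix, and PCA described above can be sketched with NumPy (an assumption; the slides do not name any software). The helper names `covariance_and_correlation` and `pca_from_covariance` are hypothetical. The input follows the data-matrix layout (rows = cases, columns = variables), and PCA is done here by eigendecomposition of the covariance matrix, which is one standard way to diagonalize it.

```python
import numpy as np

def covariance_and_correlation(data):
    """data: 2-D array with rows = cases and columns = variables."""
    # rowvar=False tells NumPy that each column is a variable.
    cov = np.cov(data, rowvar=False)        # variance-covariance matrix (denominator n - 1)
    corr = np.corrcoef(data, rowvar=False)  # correlation matrix, ones on the diagonal
    return cov, corr

def pca_from_covariance(cov):
    """Diagonalize a symmetric covariance matrix: cov = V @ diag(variances) @ V.T."""
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh is meant for symmetric matrices
    order = np.argsort(eigvals)[::-1]       # largest variance first
    return eigvals[order], eigvecs[:, order]

# The 2 x 2 covariance matrix from the slides.
S = np.array([[0.950, 0.647],
              [0.647, 1.535]])
variances, components = pca_from_covariance(S)
print(variances)  # approx. 1.953 and 0.532
# Rotating S into the principal-component axes leaves only the diagonal.
print(np.round(components.T @ S @ components, 3))
# Correlation implied by S: r12 = s12 / (s1 * s2), approx. 0.536
print(S[0, 1] / np.sqrt(S[0, 0] * S[1, 1]))
```

Diagonalization only rotates the coordinate system, so the total variance is preserved: the eigenvalues sum to the trace of the matrix (0.950 + 1.535 = 2.485).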