Statistical Data Analysis
Chapter 9 - Montello and Sutton, An Introduction to Scientific Research Methods in Geography

Overview
  Statistical data analysis
  Statistical description
  Statistical inference
  Geospatial analysis

Data Analysis
  Set of display and mathematical techniques
  Logical and conceptual considerations
  Allows us to:
    Extract meaning from systematically collected measurements
    Communicate that meaning to others

Geographers and Data
  Geographers view data as statistical (complex and imperfect) rather than deterministic
  Three reasons:
    Data are an imperfect sample of a larger population
    Measurement involves error
    Phenomena are expressions of complex sets of many interacting variables

Statistical Description
  Goal: summarize potentially important properties of our data using parameters (summary indices that describe the population)
  Properties:
    Central tendency
    Variability / dispersion
    Form / shape of distribution
    Relationships

Central Tendency
  Average or representative value
  Three most common:
    Mode - most frequent value
    Median - middle value
    Mean - the arithmetic “average”

Variability / Dispersion
  Tells how data points differ from the central tendency
  Indicates how representative the central tendency is (more representative when variability is low)
  Three common measures:
    Range - distance between the highest and lowest values
    Variance - average of the squared deviations from the mean
    Standard deviation - square root of the variance

Form / Distribution I
  Shape of the entire data set
  Modality - number of local modes
    Unimodal - one mode, as in the normal or “bell-shaped” curve
    Bimodal - two modes
  Skewness - distribution is uneven
    Positive - mostly low and medium scores
    Negative - mostly medium and high scores
  Symmetry - the two halves mirror each other around the central tendency

Form / Distribution II
  Derived scores
  Describe the value of individual scores relative to the rest of the data set
  Three common:
    Rank - 1, 2, 3, etc.
    Percentile rank - percentage of the data that is less than the score in question
    z-score - number of standard deviation units above or below the mean of the data set

Relationships I
  Systematic (consistent) patterns of high or low values across pairs of variables
  Linear relationship - two variables form a straight line when graphed
    Positive (or direct) - high values of A go with high values of B; low values of A go with low values of B
    Negative (or indirect) - high values of A go with low values of B; low values of A go with high values of B

Relationships II
  Relationship strength - degree to which the pattern holds across all cases
  Correlation coefficient - measures the direction and strength of a linear relationship; its square is a measure of relationship strength
  Regression analysis - expresses the relationship as an equation that predicts the values of Y (criterion variable) as a function of X (predictor variable)
  Monotonic relationship - consistently goes up or down, but not necessarily in a straight line

Statistical Inference I
  Goal: draw informed guesses about likely patterns in the population, based on evidence from sample data
  Assign probabilities to those guesses
  Sampling distribution - distribution of a sample statistic based on all possible samples of a given size from a given population

Statistical Inference II
  Assumptions:
    Distribution is normal and variances are equal
    Data values are independent
    Model is correctly specified (such as linearity and inclusion of the relevant predictor constructs)

Statistical Inference III
  Two approaches:
    Estimation
      Point estimate - guess about a specific parameter value
      Confidence interval - range of values around the point estimate, stated with an associated probability
    Hypothesis testing
      Null hypothesis (H0) states an exact value for the parameter
      Alternative hypothesis (HA) states that the parameter is not equal to the null value

Statistical Inference IV
  Four possible outcomes, based on:
    Two possible truths (H0 is true; H0 is false, so HA is true)
    Two possible decisions (reject H0 and accept HA; fail to reject H0)
  Two types of errors:
    Type I - reject H0 when H0 is true
    Type II - fail to reject H0 when H0 is false

Geospatial Analysis
  Geographic data are different:
    They are spatially distributed
    They have location, extent or size, shape, pattern, connectivity, etc.
    They represent natural and human earth-surface features and processes
  Spatiality is the focus of, or is central to, the analysis

Spatiality
  Influences the accuracy of inferential statistical analyses of nonspatial variables
  Spatial autocorrelation exists when there are patterns of spatial dependence - places are “like” other places
  Distance decay - near things are “more like” each other than things farther away

Areal Units
  Which areal units to use?
  Problems:
    Using data from a continuous source but treating them with discrete spatial analysis techniques
    Politicization of unit determination (as in gerrymandering)
    Modifiable Areal Unit Problem (MAUP) - effect that theoretically arbitrary areal geometries have on geographic analysis

Questions
  Why is data analysis in geography usually conceptualized in statistical (probabilistic) terms?
  What is meant by the strength and form of statistical relationships?
  What is the purpose of statistical inference?
  Why are statistical inferences necessarily and ultimately uncertain?
  What are the two types of correct decisions and the two types of errors possible when hypothesis testing?
  What is spatial autocorrelation, what forms can it take, and why is it so important to geographic data analysis?
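
Worked illustration
  The descriptive measures from the Statistical Description slides (mode, median, mean, range, variance, standard deviation, z-scores, and the correlation coefficient) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not a method taken from the chapter: the function names (describe, z_scores, pearson_r) and the rainfall and runoff values are hypothetical and invented for illustration, and the variance here is the population form (divide by n) rather than the sample form (divide by n - 1).

```python
import math
from statistics import mean, median, mode


def describe(values):
    """Return the central-tendency and dispersion measures named in the slides."""
    m = mean(values)
    # Assumption: population variance (divide by n).
    # A sample-based estimate would divide by n - 1 instead.
    variance = sum((v - m) ** 2 for v in values) / len(values)
    return {
        "mode": mode(values),                 # most frequent value
        "median": median(values),             # middle value
        "mean": m,                            # arithmetic average
        "range": max(values) - min(values),   # highest minus lowest
        "variance": variance,                 # mean squared deviation
        "std_dev": math.sqrt(variance),       # square root of the variance
    }


def z_scores(values):
    """Express each score in standard-deviation units from the mean."""
    stats = describe(values)
    return [(v - stats["mean"]) / stats["std_dev"] for v in values]


def pearson_r(xs, ys):
    """Pearson correlation computed as the mean product of paired z-scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


# Hypothetical measurements, invented purely for illustration.
rainfall = [2, 4, 4, 4, 5, 5, 7, 9]
runoff = [2 * value + 1 for value in rainfall]  # exact linear function of rainfall

summary = describe(rainfall)
z = z_scores(rainfall)
r = pearson_r(rainfall, runoff)
r_squared = r ** 2  # squared correlation, a measure of relationship strength

print(summary)
print(z)
print(r, r_squared)
```

  With these made-up values the mean is 5, the population variance is 4, and the standard deviation is 2, so the lowest score (2) sits 1.5 standard deviations below the mean. Because runoff is constructed as an exact linear function of rainfall, r comes out at 1 up to floating-point rounding; real data would give a weaker relationship.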