Chapter 9: Statistical Data Analysis
An Introduction to Scientific Research Methods in Geography
Montello and Sutton

Data Analysis
  Data analysis helps us achieve the four scientific goals of description, prediction, explanation, and control

Statistical Data Analysis
  Three primary reasons geographers treat data in a statistical fashion
  Image: http://rlv.zcache.com/knowledge_is_power_do_statistics_stats_humor_flyer-p2440846222778564182dwj5_400.jpg

Statistical Description
  Descriptive Statistics
  Parameters
  Central Tendency
    Mode
    Median
    Mean ($\bar{X}$, $\mu$)
      Arithmetic mean
  When would you use the median or the mode instead of the mean?

Descriptive Statistics: Variability
  Range = largest value - smallest value
  Variance
    $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
  Standard Deviation
    $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

Descriptive Statistics: Form
  Modality
  Skewness
    Positive
    Negative
  Symmetry
  Unimodal, bell-shaped: Normal Distribution
  Image: http://people.eku.edu/falkenbergs/images/skewness.jpg

Descriptive Statistics: Derived Scores
  Percentile Rank
    Highest: 99th percentile
    Where is the median?
  Z-score
    Standard deviation units above or below the mean
    $z = \frac{x - \mu}{\sigma}$

Descriptive Statistics: Relationship
  Linear Relationship
    Positive
    Negative
  Relationship Strength
    Weak, strong, no relationship
  Correlation Coefficient
    Between -1 and 1
    0: no relationship
  Regression Analysis
    Criterion variables (Y)
    Predictor variables (X)
  Image: http://hosting.soonet.ca/eliris/remotesensing/LectureImages/correlation.gif

Correlation: Causation?
  http://xkcd.com/552/
  “Correlation doesn’t imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing ‘look over there’.” (XKCD)

Statistical Inference
  Inferential Statistics
  Statistics
  Sampling error
  Given our sample statistics, we infer our parameters
  Assign probabilities to our guesses
  The power and difficulty of inferential statistics come from deriving probabilities about how likely it is that sample patterns reflect population patterns

Inferential Statistics
  Sampling distribution
    Ex: the sampling distribution of means shows the probability that a single sample would have a mean within some given range of values
  Central limit theorem: for sufficiently large samples, the sampling distribution of sample means will be approximately normal, with a mean equal to the population mean and a standard deviation equal to the population standard deviation divided by the square root of the sample size

Inferential Statistics
  Generation of sampling distributions
  Assumptions
    Distributional assumptions
      Nonparametric
      Parametric
    Normality
    Homogeneity of variance
    Independence of scores
    Correct specification of models

Estimation and Hypothesis Testing
  Estimation
    Point estimation
    Confidence interval
      Usually 95%
  Hypothesis Testing
    Null hypothesis
      A hypothesis about the exact (point) value of a parameter or set of parameters
    Use sample statistics to make an inference about the probable truth of our null hypothesis

Hypothesis Testing
  Alternative Hypothesis
    Hypothesis that the parameter does not equal the exact value hypothesized in the null
    A range rather than an exact value
  Premise: If A is true, then B is true
    Case 1: B is not true. Therefore, A is not true. (Modus tollens)
    Case 2: B is true. Therefore, ???
  Useful for disconfirming, not confirming!

Example
  From a recent nationwide study it is known that the typical American watches 25 hours of television per week, with a population standard deviation of 5.6 hours.
  Suppose 50 Denver residents are randomly sampled, with an average viewing time of 22 hours per week and a standard deviation of 4.8. Are Denver television viewing habits different from nationwide viewing habits?
  Step 1: State your null and alternative hypotheses
    $H_0: \mu = 25$
    $H_A: \mu \neq 25$
    What is this saying?

Example
  Step 2: Determine your appropriate test statistic and its sampling distribution, assuming the null is true
    We are testing a sample mean where n > 30, so a z distribution can be used
  Step 3: Calculate the test statistic from your sample data
    $\bar{X} = 22$, $\mu = 25$, $s = 4.8$, $\sigma = 5.6$, $n = 50$
    $z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{22 - 25}{5.6 / \sqrt{50}} = -3.79$
  Step 4: Compare the empirically obtained test statistic to the null sampling distribution
    P value: p = .0001
    OR
    Critical value at .05 significance level: z = ±1.96
  Decision: Reject the null hypothesis
    -3.79 is less than -1.96: reject
    The p value is very small, less than .05 and even .01: reject

Error
  You have made either a correct inference or a mistake
  Type I error: its probability is the rejection level, p (or α)
  Type II error: its probability is β
  Image: http://www.mirrorservice.org/sites/home.ubalt.edu/ntsbarsh/Businessstat/error.gif

Data in Space and Place
  Spatiality is a focus in geography, unlike in other disciplines
  Spatial autocorrelation
    First Law of Geography: Everything is related to everything else, but near things are more related than distant things
    Positive vs. negative spatial autocorrelation
    A violation of the important statistical assumption of independence
    Ex: If it's raining in my backyard, I can say with a high degree of confidence that it's raining in my neighbor’s backyard, but my level of confidence that it is raining across town is lower, and 300 miles away even lower
  Variogram
  Image: http://www.innovativegis.com/basis/Papers/Other/ASPRSchapter/Default_files/image023.png

Data in Space and Place
  “Spatial data are special”: a special difficulty
  Which areal units should be used to analyze geographic data?
    Modifiable Areal Unit Problem
    Gerrymandering
  Geographic phenomena are often scale dependent
    Must identify the scale of a phenomenon and collect and organize data in units of that size
  Data aggregation issues

Discussion Questions
  What measure of central tendency is best for nominal data?
  When pollsters tell you that a candidate is favored by 44% of likely voters, plus or minus 3 percent, what is the 44% and what is the plus/minus 3%?
  A survey of all users of a park in 1980 found the average number of people per party to be 3.5. In a random sample of 35 parties in 2000 the average was 2.9. If you wanted to test if the number of persons per party in 2000 was different from the number in 1980, what would your null and alternative hypotheses be?
  In the United States, we presume that someone is innocent. If a guilty person were found to be not guilty, what type of error would this be?
  A researcher finds that a particular learning software has an effect on students' test scores, when actually it does not. What type of error is this?
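
The four steps of the Denver television example can be checked numerically. The following is a minimal sketch in Python using only the standard library. The helper name `one_sample_z_test` and its parameter names are assumptions made for illustration and do not come from the slides. It follows the slides in using the population standard deviation (5.6) rather than the sample standard deviation (4.8) to compute the standard error.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Two-tailed one-sample z test (hypothetical helper, for illustration only)."""
    # Standard error of the mean: population SD divided by sqrt(sample size)
    standard_error = pop_sd / math.sqrt(n)
    # Test statistic: distance of the sample mean from the null value, in SE units
    z = (sample_mean - pop_mean) / standard_error
    std_normal = NormalDist()  # standard normal: mean 0, SD 1
    # Two-tailed p value: probability of a |z| at least this large if the null is true
    p_value = 2 * std_normal.cdf(-abs(z))
    # Critical value that cuts off alpha/2 in each tail
    critical = std_normal.inv_cdf(1 - alpha / 2)
    reject = abs(z) > critical
    return z, p_value, critical, reject


# Values from the Denver example: sample mean 22, null mean 25, sigma 5.6, n 50
z, p_value, critical, reject = one_sample_z_test(22, 25, 5.6, 50)
print(f"z = {z:.2f}, critical = ±{critical:.2f}, reject H0: {reject}")
# prints: z = -3.79, critical = ±1.96, reject H0: True
```

Both decision rules from Step 4 agree here: the test statistic falls beyond the critical value, and the two-tailed p value is far below .05 and .01.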