ESRM 304 Statistics Module: Sampling in Natural Resources Management
Dr. Indroneil Ganguly, Asst. Professor
School of Environmental and Forest Sciences, University of Washington

What we will cover today
• I‐Basic Concepts Governing Sampling
• II‐Background Statistics

Statistics
• Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write. ‐ H. G. Wells, author of “The War of the Worlds”
• Definition:
  • Statistics is the science of collecting, analyzing, and interpreting data in such a way that the conclusions can be objectively evaluated.
  • Statistics is the science of learning from data, and of measuring, controlling, and communicating uncertainty; and it thereby provides the navigation essential for controlling the course of scientific and societal advances (Davidian, M. and Louis, T. A., 10.1126/science.1218685).

I‐Basic Concepts Governing Sampling

Three Phases of Statistics
• Collect the data
• Analyze the data
  • order the data
  • graphical displays
  • numerical calculations (such as mean and standard deviation)
• Interpret the results
  • use proper statistical techniques to substantiate or refute hypothesized statements
  • match data to the appropriate technique
  • determine whether the proper assumptions are satisfied

Two types of statistics
• Descriptive statistics: summarize and describe a characteristic for some group
  • Batting Average
  • Yards Per Carry
  • Test Scores
• Inferential statistics: estimate, infer, predict, or conclude something about a larger group
  • Polls
  • Ecological Studies
  • Market Surveys

Two types of data
• Quantitative data: values recorded on a natural numerical scale
  • Weight of subjects in medical sample
  • Height of buildings in Chicago
  • Daily temperatures at an Antarctic weather station
• Qualitative data: classified into categories
  • Gender of subjects in medical sample
  • Political affiliation of respondents in a poll survey
  • Class (fresh, soph, jr, sr) of Math 101 students

Relevant Vocabulary
• The population is the entire set of objects (people or things) under consideration.
• A sample is a subset of the population that is available for the analysis.
• A bias is a favoring of certain outcomes over others.
• A census collects data from each member of the population.
• A statistic is a statement of numerical information about a sample.
• A parameter is a statement of numerical information about a population.

Census versus Sample
• Would you use a census or a sample to determine the following:
  • Project the winner of an election
  • Calculate a baseball player's batting average
  • Predict the difference in growth of trees with and without fertilizer in a particular location
  • Use a market study to determine a new flavor of toothpaste
  • Generalize an ecological study to other locations
  • The average score on the first test

Concepts around bias: Bias, Accuracy, Precision
1. Bias: systematic distortion
2. Accuracy: nearness to the true (or population) value
3. Precision: clustering of sample units around their own mean

Dealing with Bias
• Bias in some form occurs in the collecting of most, if not all, sets of data.
• The bias may come from
  • the portion of the population surveyed:
    “Height/weight ratio for UW students calculated to predict the height/weight ratio of Seattleites”
  • the phrasing of the questions:
    “Are you in favor of Seattle banning cell phones in cars? Dial *91 on your cellular phone to vote.”

Methods for Choosing Samples
• Judgment Sample
  • Use the opinion of person(s) deemed qualified to choose members of the sample.
  • Example: to investigate study habits of athletes, ask their coaches and teachers.
• Simple Random Selection
  • Use random numbers to select the sample.
  • May use random number tables or software
• Stratified Sampling
  • Divide the population into relatively homogeneous groups, draw a sample from each group, and take their union.

Goals of a good sample
• from the correct population
• chosen in an unbiased way
• large enough to reflect total population

II‐Background Statistics
A. Subscripts, Summations, Brackets
B. Mean, Variance, Standard Deviation
C. Standard Error of the estimate
D. Coefficient of Variation
E. Covariance, Correlation (on Wednesday)

Subscripts
• A subscript can refer to a unit in a sample, e.g., $x_1$ is the height of the 1st unit, $x_2$ is the height of the 2nd, etc.
• It can refer to different populations of values, e.g., $x_1$ can refer to the height of a tree, while $x_2$ can refer to the diameter of a tree.
• There can be more than one subscript, e.g., $x_{ij}$ may refer to the $j$th individual of the $i$th species of tree, where $j = 1, \dots, 50$; $i$ = DF, WH, RC.

Summations
To indicate that several (say 6) values of a variable, $x$, are to be added together, we could write
$$x_1 + x_2 + x_3 + x_4 + x_5 + x_6$$
or shorter
$$x_1 + x_2 + \dots + x_6$$
shorter still
$$\sum_{i=1}^{6} x_i$$
or even $\sum_i x_i$, or just $\sum x$.

Brackets
Order of operations still applies when using “sigma” notation, e.g.,
$$\sum_{i=1}^{3} x_i y_i = x_1 y_1 + x_2 y_2 + x_3 y_3$$
$$\left(\sum_{i=1}^{3} x_i\right)\left(\sum_{i=1}^{3} y_i\right) = (x_1 + x_2 + x_3)(y_1 + y_2 + y_3)$$
$$\sum_{i=1}^{3} x_i^2 = x_1^2 + x_2^2 + x_3^2 \qquad \left(\sum_{i=1}^{3} x_i\right)^2 = (x_1 + x_2 + x_3)^2$$
i.e., $\sum_{i=1}^{3} x_i^2 \neq \left(\sum_{i=1}^{3} x_i\right)^2$ in general.

Mean, Variance, Standard Deviation
Mean:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{\sum_{i=1}^{n} x_i}{n}$$
Variance:
$$s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n - 1}$$
Standard Deviation:
$$s = \sqrt{s^2}$$

Mean, Variance, Standard Deviation ‐ Example
Let’s say we have measurements on 3 units sampled from a large population. Values are 7, 8, and 12 ft.
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{3}(7 + 8 + 12) = 9 \text{ ft}$$
$$s^2 = \frac{(7^2 + 8^2 + 12^2) - \frac{1}{3}(7 + 8 + 12)^2}{2} = \frac{257 - 243}{2} = 7 \text{ ft}^2$$
$$s = \sqrt{s^2} = \sqrt{7 \text{ ft}^2} \approx 2.65 \text{ ft}$$

Standard Error of an estimate
• The most frequently desired estimate is for the mean of a population.
• We need to be able to state how reliable our estimate is.
• Standard error is key for stating our reliability.
• Standard error quantifies the dispersion among estimates derived from different samples taken from the same population of values.
• The standard deviation of the observations is the square root of their variance; the standard error (of an estimate) is the square root of the variance of the estimate.

Standard Error of an estimate ‐ Example
Let’s say we have a population of (N = 15) tree heights:
7, 10, 8, 12, 2, 6, 5, 9, 3, 7, 4, 8, 9, 11, 5
from which we take 4 units (n = 4) five separate times …
pick 1 (units 10, 8, 3, 11): 7, 9, 8, 4; $\bar{x} = 7$; $s = 2.16$
pick 2 (units 5, 3, 6, 4): 2, 8, 6, 12; $\bar{x} = 7$; $s = 4.16$
pick 3 (units 8, 11, 3, 13): 9, 4, 8, 9; $\bar{x} = 7.5$; $s = 2.38$
pick 4 (units 9, 14, 11, 5): 3, 11, 4, 2; $\bar{x} = 5$; $s = 4.08$
pick 5 (units 5, 3, 2, 10): 2, 8, 10, 7; $\bar{x} = 6.75$; $s = 3.40$
… there are 1,365 possible unique samples of size 4!
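As a rough check of the two examples above, the arithmetic can be reproduced with Python's standard library. This is only a sketch: the names `heights`, `population`, `picks`, and `summarize` are illustrative and not part of the lecture, and the unit numbers are assumed to be 1-based positions in the population list as written.

```python
from math import comb
from statistics import mean, stdev, variance

# Three-unit example: heights in feet.
heights = [7, 8, 12]
print("mean:", mean(heights))          # sample mean
print("variance:", variance(heights))  # sample variance, n - 1 divisor
print("std dev:", round(stdev(heights), 2))

# Population of N = 15 tree heights, listed in unit order.
population = [7, 10, 8, 12, 2, 6, 5, 9, 3, 7, 4, 8, 9, 11, 5]

# The five picks, given as 1-based unit numbers (an assumption about
# how the units are numbered).
picks = [
    (10, 8, 3, 11),
    (5, 3, 6, 4),
    (8, 11, 3, 13),
    (9, 14, 11, 5),
    (5, 3, 2, 10),
]


def summarize(units):
    """Return the sampled values, their mean, and their standard deviation."""
    values = [population[u - 1] for u in units]
    return values, mean(values), stdev(values)


for number, units in enumerate(picks, start=1):
    values, xbar, s = summarize(units)
    print(f"pick {number}: values={values} mean={xbar:.2f} s={s:.2f}")

# Number of distinct samples of size 4 from 15 units (order ignored).
n_samples = comb(len(population), 4)
print("possible samples of size 4:", n_samples)
```

`statistics.variance` and `statistics.stdev` use the n - 1 divisor, which matches the variance formula used for samples in this section.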
Standard Error of an estimate ‐ Example (cont’d)
If we used Simple Random Sampling (SRS), there is a very direct way to calculate the standard error of the estimated (sample) mean.
• In words: standard deviation divided by the square root of the sample size
• In formula:
$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$
pick 1: 1.08; pick 2: 2.08; pick 3: 1.19; pick 4: 2.04; pick 5: 1.70
Population mean = 7.07; std. dev. = 2.91; std. err. = 1.457 (std. dev. divided by $\sqrt{4}$)

Coefficient of Variation
• Puts variability on a relative scale so we can compare the dispersions of values measured in different units (say feet and meters) or the dispersion of different populations (say heights and weights)
• Ratio of standard deviation to the mean

Coefficient of Variation ‐ Example
Using the previous tree height population …
pick 1: $\bar{x} = 7$; $s = 2.16$
$$C = \frac{s}{\bar{x}} = \frac{2.16}{7} \approx 0.309, \text{ or } \sim 31\%$$
If inches had been used, $\bar{x} = 84$; $s = 25.92$
$$C = \frac{s}{\bar{x}} = \frac{25.92}{84} \approx 0.309$$
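The standard error and coefficient of variation for pick 1 can be sketched the same way. The helper names `standard_error` and `coefficient_of_variation` are hypothetical, chosen for illustration; the first assumes simple random sampling and applies no finite population correction, in line with the formula above.

```python
from math import sqrt
from statistics import mean, stdev


def standard_error(values):
    """Standard error of the sample mean under SRS: s / sqrt(n)."""
    return stdev(values) / sqrt(len(values))


def coefficient_of_variation(values):
    """Ratio of the sample standard deviation to the sample mean."""
    return stdev(values) / mean(values)


# Pick 1 from the tree height example, in feet and converted to inches.
pick_1_ft = [7, 9, 8, 4]
pick_1_in = [12 * v for v in pick_1_ft]

se_ft = standard_error(pick_1_ft)
cv_ft = coefficient_of_variation(pick_1_ft)
cv_in = coefficient_of_variation(pick_1_in)

print(f"standard error (ft): {se_ft:.2f}")
print(f"CV in feet:   {cv_ft:.3f}")
print(f"CV in inches: {cv_in:.3f}")
```

Changing the unit multiplies both the standard deviation and the mean by 12, so the ratio stays the same, which is why the coefficient of variation lets dispersions measured in different units be compared.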