ESRM 304
Statistics Module: Sampling in Natural Resources Management
Dr. Indroneil Ganguly
Asst. Professor
School of Environmental and Forest Sciences, University of Washington
What we will cover today
• I‐Basic Concepts Governing Sampling
• II‐Background Statistics
Statistics
• Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write. ‐ H. G. Wells, author of “War of the Worlds”.
• Definition:
• Statistics is the science of collecting, analyzing, and interpreting data in such a way that the conclusions can be objectively evaluated.
• Statistics is the science of learning from data, and of measuring, controlling, and communicating uncertainty; and it thereby provides the navigation essential for controlling the course of scientific and societal advances (Davidian, M. and Louis, T. A., 10.1126/science.1218685).
I‐Basic Concepts Governing Sampling
Three Phases of Statistics
• Collect the data
• Analyze the data
  • order the data
  • graphical displays
  • numerical calculations (such as mean and standard deviation)
• Interpret the results
  • use proper statistical techniques to substantiate or refute hypothesized statements
  • match data to the appropriate technique
  • determine whether the proper assumptions are satisfied
Two types of statistics
• Descriptive statistics: summarize and describe a characteristic for some group
  • Batting Average
  • Yards Per Carry
  • Test Scores
• Inferential statistics: estimate, infer, predict, or conclude something about a larger group
  • Polls
  • Ecological Studies
  • Market Surveys
Two types of data
• Quantitative data: values recorded on a natural numerical scale
  • Weight of subjects in a medical sample
  • Height of buildings in Chicago
  • Daily temperatures at an Antarctic weather station
• Qualitative data: values classified into categories
  • Gender of subjects in a medical sample
  • Political affiliation of respondents in a poll
  • Class standing (freshman, sophomore, junior, senior) of Math 101 students
Relevant Vocabulary
• The population is the entire set of objects (people or things) under consideration.
• A sample is a subset of the population that is available for the analysis.
• A bias is a favoring of certain outcomes over others.
• A census collects data from each member of the population.
• A statistic is a statement of numerical information about a sample.
• A parameter is a statement of numerical information about a population.
Census versus Sample
• Would you use a census or a sample for each of the following?
• Project the winner of an election
• Calculate a baseball player's batting average
• Predict the difference in growth of trees with and without fertilizer in a particular location
• Use a market study to determine a new flavor of toothpaste
• Generalize an ecological study to other locations
• Report the average score on the first test
Concepts around bias
Bias, Accuracy, Precision
1. Bias: systematic distortion
2. Accuracy: nearness to the true (or population) value
3. Precision: how tightly sample values cluster around their own mean
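As a rough numerical illustration of the three terms above, the sketch below compares two sets of repeated measurements of an object whose true length is assumed to be known. The readings, the true value, and the names `tape_a` and `tape_b` are hypothetical, chosen only to make the contrast visible; the bias computed here is an estimate from five readings, not a long-run average.

```python
import statistics

TRUE_LENGTH_FT = 10.0  # assumed known true value (hypothetical)

# Hypothetical repeated readings from two measuring tapes.
tape_a = [9.8, 10.1, 10.0, 10.2, 9.9]    # centered on the truth, more scatter
tape_b = [11.0, 11.1, 10.9, 11.0, 11.0]  # tightly clustered, but shifted

def summarize(readings, true_value):
    """Return (bias, spread): bias is the mean reading minus the true value,
    spread is the sample standard deviation of the readings."""
    bias = statistics.fmean(readings) - true_value
    spread = statistics.stdev(readings)
    return bias, spread

bias_a, spread_a = summarize(tape_a, TRUE_LENGTH_FT)
bias_b, spread_b = summarize(tape_b, TRUE_LENGTH_FT)

print(f"tape A: bias = {bias_a:+.2f} ft, spread = {spread_a:.3f} ft")
print(f"tape B: bias = {bias_b:+.2f} ft, spread = {spread_b:.3f} ft")
```

In this made-up data, tape B is the more precise instrument (smaller spread) yet the less accurate one, because its readings sit systematically about one foot too high.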
Dealing with Bias
• Bias in some form occurs in the collecting of most, if not all, sets of data.
• The bias may come from
  • the portion of the population surveyed:
    • “Height/weight ratio for UW students calculated to predict the height/weight ratio of Seattleites”
  • the phrasing of the questions:
    • “Are you in favor of Seattle banning cell phones in cars? Dial *91 on your cellular phone to vote.”
Methods for Choosing Samples
• Judgment Sample
  • Use the opinion of person(s) deemed qualified to choose members of the sample.
  • Example: to investigate study habits of athletes, ask their coaches and teachers.
• Simple Random Selection
  • Use random numbers to select the sample.
  • May use random number tables or software
• Stratified Sampling
  • Divide the population into relatively homogeneous groups, draw a sample from each group, and take their union.
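The two random methods above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a field protocol: the 12-plot sampling frame, the stand-type labels, the seed, and the function names are all hypothetical. A judgment sample is not shown because it depends on expert opinion rather than on random numbers.

```python
import random

rng = random.Random(304)  # fixed seed so the draw is repeatable

# Hypothetical sampling frame: 12 plots, each labelled with a stand type.
plots = [(plot_id, "conifer" if plot_id <= 8 else "hardwood")
         for plot_id in range(1, 13)]

def simple_random_sample(units, n, rng):
    """Draw n distinct units; every subset of size n is equally likely."""
    return rng.sample(units, n)

def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Group units by stratum, draw a simple random sample within each
    group, and return the union of those draws."""
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, n_per_stratum))
    return chosen

srs = simple_random_sample(plots, 4, rng)
strat = stratified_sample(plots, lambda unit: unit[1], 2, rng)

print("Simple random sample:", srs)
print("Stratified sample:   ", strat)
```

The simple random sample may by chance contain no hardwood plots at all, whereas the stratified draw is guaranteed to include two plots of each stand type.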
Goals of a good sample
• from the correct population
• chosen in an unbiased way
• large enough to reflect the total population
II‐Background Statistics
A. Subscripts, Summations, Brackets
B. Mean, Variance, Standard Deviation
C. Standard Error of the estimate
D. Coefficient of Variation
E. Covariance, Correlation (on Wednesday)
Subscripts
A subscript can refer to a unit in a sample, e.g., $x_1$ is the height of the 1st unit, $x_2$ is the height of the 2nd, etc.,
… it can refer to different populations of values, e.g., $x_1$ can refer to the height of a tree, while $x_2$ can refer to the diameter of a tree,
… there can be more than one subscript, e.g., $x_{ij}$ may refer to the $j$th individual of the $i$th species of tree, where $j = 1, \ldots, 50$; $i$ = DF, WH, RC
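One way to picture the double subscript is as a lookup keyed first by species and then by individual. The sketch below assumes a plain dictionary of lists; the height values are hypothetical, only three trees per species are listed instead of 50, and the helper name `x` simply mirrors the notation.

```python
# Hypothetical heights (ft); keys are the species codes used in the text.
height = {
    "DF": [112.0, 98.5, 105.2],
    "WH": [87.3, 91.0, 79.8],
    "RC": [66.4, 70.1, 73.9],
}

def x(i, j):
    """Return x_ij: the j-th individual (counted from 1, as in the text)
    of species i."""
    return height[i][j - 1]  # Python lists count from 0, hence j - 1

second_hemlock = x("WH", 2)
print(second_hemlock)
```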
Summations
To indicate that several (say 6) values of a variable, x, are to be added together, we could write
$$x_1 + x_2 + x_3 + x_4 + x_5 + x_6$$
or shorter
$$x_1 + x_2 + \cdots + x_6$$
shorter still
$$\sum_{i=1}^{6} x_i$$
or even $\sum_i x_i$, or just $\sum x$
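The same progression from long form to short form can be written in code. This is a small sketch; the six values are hypothetical and the variable names are chosen only for illustration.

```python
x = [7, 10, 8, 12, 2, 6]  # hypothetical values x_1 ... x_6

# Written out term by term: x_1 + x_2 + ... + x_6
long_form = x[0] + x[1] + x[2] + x[3] + x[4] + x[5]

# The sigma form: add x_i for i = 1, ..., 6 (index shifted by one in Python)
sigma_form = 0
for i in range(1, 7):
    sigma_form += x[i - 1]

# The shortest form, "sum of x"
short_form = sum(x)

print(long_form, sigma_form, short_form)
```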
Brackets
Order of operations still applies when using “sigma” notation, e.g.,
$$\sum_{i=1}^{3} x_i y_i = x_1 y_1 + x_2 y_2 + x_3 y_3$$
$$\left(\sum_{i=1}^{3} x_i\right)\left(\sum_{i=1}^{3} y_i\right) = (x_1 + x_2 + x_3)(y_1 + y_2 + y_3)$$
i.e.,
$$\sum_{i=1}^{3} x_i^2 = x_1^2 + x_2^2 + x_3^2 \neq \left(\sum_{i=1}^{3} x_i\right)^2 = (x_1 + x_2 + x_3)^2$$
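A quick numerical check makes the role of the brackets concrete. The sketch below uses small hypothetical values for x and y and compares the sum of products with the product of sums, and the sum of squares with the square of the sum.

```python
x = [1, 2, 3]  # hypothetical x_1, x_2, x_3
y = [4, 5, 6]  # hypothetical y_1, y_2, y_3

# Multiply first, then add: x_1*y_1 + x_2*y_2 + x_3*y_3
sum_of_products = sum(xi * yi for xi, yi in zip(x, y))

# Add inside each bracket first, then multiply the two totals
product_of_sums = sum(x) * sum(y)

# Square each value, then add
sum_of_squares = sum(xi ** 2 for xi in x)

# Add the values, then square the total
square_of_sum = sum(x) ** 2

print(sum_of_products, product_of_sums)  # 32 90
print(sum_of_squares, square_of_sum)     # 14 36
```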
Mean, Variance, Standard Deviation
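Mean:
$$\bar{x} = \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right) = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Variance:
$$s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n - 1}$$
Standard Deviation:
$$s = \sqrt{s^2}$$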
Mean, Variance, Standard Deviation ‐ Example
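Let’s say we have measurements on 3 units sampled from a large population. Values are 7, 8, and 12 ft.
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{3}(7 + 8 + 12) = 9 \text{ ft}$$
$$s^2 = \frac{\left(7^2 + 8^2 + 12^2\right) - \frac{1}{3}(7 + 8 + 12)^2}{3 - 1} = \frac{257 - 243}{2} = 7 \text{ ft}^2$$
$$s = \sqrt{s^2} = \sqrt{7 \text{ ft}^2} \approx 2.65 \text{ ft}$$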
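The worked example can be checked with a short script. The sketch below implements the two algebraically equivalent variance formulas (deviations from the mean, and the sum-of-squares shortcut) with the n − 1 divisor; the function names are my own and not part of the course material.

```python
import math

def mean(values):
    """Arithmetic mean: (1/n) * sum of the values."""
    return sum(values) / len(values)

def variance_from_deviations(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    xbar = mean(values)
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

def variance_shortcut(values):
    """[sum of x^2 - (1/n) * (sum of x)^2] divided by n - 1."""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / (n - 1)

heights_ft = [7, 8, 12]  # the three measurements from the example

xbar = mean(heights_ft)
var_dev = variance_from_deviations(heights_ft)
var_short = variance_shortcut(heights_ft)
sd = math.sqrt(var_dev)

print(f"mean = {xbar} ft")
print(f"variance = {var_dev} ft^2 (shortcut formula gives {var_short})")
print(f"standard deviation = {sd:.4f} ft")
```

The shortcut form needs only the running totals of x and x², which is convenient for hand calculation, while the deviation form follows the definition directly.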
Standard Error of an estimate
The most frequently desired estimate is for the mean of a population

We need to be able to state how reliable our estimate is

Standard error is key for stating our reliability

Standard error quantifies the dispersion among estimates derived from different samples taken from the same population of values

The standard deviation of the observations is the square root of their variance; the standard error (of an estimate) is the square root of the variance of the estimate
Standard Error of an estimate ‐ Example
Let’s say we have a population of (N = 15) tree heights:
7, 10, 8, 12, 2, 6, 5, 9, 3, 7, 4, 8, 9, 11, 5
from which we take 4 units (n = 4) five separate times …
pick 1 (units 10, 8, 3, 11): 7, 9, 8, 4; $\bar{x} = 7$; $s = 2.16$
pick 2 (units 5, 3, 6, 4): 2, 8, 6, 12; $\bar{x} = 7$; $s = 4.16$
pick 3 (units 8, 11, 3, 13): 9, 4, 8, 9; $\bar{x} = 7.5$; $s = 2.38$
pick 4 (units 9, 14, 11, 5): 3, 11, 4, 2; $\bar{x} = 5$; $s = 4.08$
pick 5 (units 5, 3, 2, 10): 2, 8, 10, 7; $\bar{x} = 6.75$; $s = 3.40$
… there are 1,365 possible unique samples of size 4!
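The five picks and the count of possible samples can be reproduced as follows. This is a sketch using Python's standard library; the unit numbers are the ones listed above (counted from 1), and the final step, which enumerates every possible sample, is an addition of mine to show how much the sample mean moves from sample to sample.

```python
import itertools
import math
import statistics

heights = [7, 10, 8, 12, 2, 6, 5, 9, 3, 7, 4, 8, 9, 11, 5]  # N = 15

# Unit numbers (counted from 1) for the five picks listed in the text.
picks = [
    (10, 8, 3, 11),
    (5, 3, 6, 4),
    (8, 11, 3, 13),
    (9, 14, 11, 5),
    (5, 3, 2, 10),
]

pick_stats = []
for units in picks:
    values = [heights[u - 1] for u in units]  # shift to 0-based indexing
    pick_stats.append((statistics.mean(values), statistics.stdev(values)))

for number, (xbar, s) in enumerate(pick_stats, start=1):
    print(f"pick {number}: mean = {xbar}, s = {s:.2f}")

# Number of distinct samples of size 4 from 15 units: "15 choose 4".
n_possible = math.comb(len(heights), 4)
print("possible samples:", n_possible)

# Mean of every possible sample (units are distinguished by position).
all_means = [statistics.fmean(sample)
             for sample in itertools.combinations(heights, 4)]
print("smallest and largest sample mean:", min(all_means), max(all_means))
print("average of all sample means:", round(statistics.fmean(all_means), 2))
```

Averaged over all 1,365 possible samples, the sample mean equals the population mean, even though any single sample can land well above or below it.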
Standard Error of an estimate ‐ Example (cont’d)
If we used Simple Random Sampling (SRS), there is a very direct way to calculate the standard error of the estimated (sample) mean
In words: standard deviation divided by the square root of the sample size
In formula:
$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$
pick 1: 1.08; pick 2: 2.08; pick 3: 1.19; pick 4: 2.04; pick 5: 1.70
Population mean = 7.07; std.dev = 2.91; std.err = 1.457
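The standard errors quoted above follow directly from the formula. The sketch below recomputes them from the five picks and then from the full set of 15 heights; the function name is hypothetical, and the n − 1 divisor for the population figure is an assumption on my part, adopted because it reproduces the 2.91 reported above.

```python
import math
import statistics

def standard_error_of_mean(values):
    """Sample standard deviation divided by the square root of n."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Heights observed in each of the five picks (n = 4 each).
picks = {
    1: [7, 9, 8, 4],
    2: [2, 8, 6, 12],
    3: [9, 4, 8, 9],
    4: [3, 11, 4, 2],
    5: [2, 8, 10, 7],
}
se_by_pick = {k: standard_error_of_mean(v) for k, v in picks.items()}
for k, se in se_by_pick.items():
    print(f"pick {k}: standard error = {se:.2f}")

population = [7, 10, 8, 12, 2, 6, 5, 9, 3, 7, 4, 8, 9, 11, 5]
pop_mean = statistics.fmean(population)
pop_sd = statistics.stdev(population)  # n - 1 divisor (assumed, see lead-in)
pop_se = pop_sd / math.sqrt(4)         # for samples of size n = 4

print(f"population: mean = {pop_mean:.2f}, sd = {pop_sd:.2f}, se = {pop_se:.3f}")
```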
Coefficient of Variation

Puts variability on a relative scale so we can compare the dispersions of values measured in different units (say feet and meters) or the dispersion of different populations (say heights and weights)

Ratio of standard deviation to the mean
Coefficient of Variation ‐ Example
Using the previous tree height population …
pick 1: $\bar{x} = 7$; $s = 2.16$
$$C = \frac{s}{\bar{x}} = \frac{2.16}{7} \approx 0.309$$
or, ~31%
If inches had been used, $\bar{x} = 84$; $s = 25.92$
$$C = \frac{s}{\bar{x}} = \frac{25.92}{84} \approx 0.309$$
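The point of the example, that the coefficient of variation does not depend on the unit of measurement, can be confirmed numerically. The sketch below uses the four heights from pick 1 and converts them from feet to inches; the function name is my own.

```python
import statistics

def coefficient_of_variation(values):
    """Ratio of the sample standard deviation to the sample mean."""
    return statistics.stdev(values) / statistics.fmean(values)

heights_ft = [7, 9, 8, 4]                   # pick 1, in feet
heights_in = [12 * h for h in heights_ft]   # the same trees, in inches

cv_ft = coefficient_of_variation(heights_ft)
cv_in = coefficient_of_variation(heights_in)

print(f"CV in feet:   {cv_ft:.1%}")
print(f"CV in inches: {cv_in:.1%}")
```

Multiplying every value by 12 multiplies both the standard deviation and the mean by 12, so the factor cancels in the ratio.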