Review for Exam I
Mathematical Notation
• The mathematical notation used most often in this course is the summation notation
• The Greek letter Σ is used as a shorthand way of indicating that a sum is to be taken:

$\sum_{i=1}^{n} x_i$

The expression is equivalent to:

$x_1 + x_2 + \cdots + x_n$
Summation Notation: Simplification
• A summation will often be written leaving out the
upper and/or lower limits of the summation,
assuming that all of the terms available are to be
summed
i n
n
 x   x  x  x
i 1
i
i 1
i
i
i
i
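In code, a summation is just a loop or a call to a built-in; a minimal Python sketch with made-up values:

```python
# Summation notation in code: sum_{i=1}^{n} x_i
x = [2, 4, 6, 8]  # hypothetical observations

# Explicit loop, mirroring the expanded form x_1 + x_2 + ... + x_n
total = 0
for xi in x:
    total += xi

# The built-in sum() is equivalent
assert total == sum(x) == 20
```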
Summation Notation: Rules
• Rule I: Summing a constant a over n terms yields a result of na:

$\sum_{i=1}^{n} a = a + a + \cdots + a = na$

• Here we are simply using the summation notation to carry out a multiplication, e.g.:

$\sum_{i=1}^{5} 4 = 4 + 4 + 4 + 4 + 4 = 4 \times 5 = 20$
Summation Notation: Rules
• Rule II: Constants may be taken outside of the
summation sign
$\sum_{i=1}^{n} a x_i = a x_1 + a x_2 + \cdots + a x_n = a (x_1 + x_2 + \cdots + x_n) = a \sum_{i=1}^{n} x_i$
Summation Notation: Rules
• Rule III: The order in which addition operations are
carried out is unimportant
$\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i = (x_1 + x_2 + \cdots + x_n) + (y_1 + y_2 + \cdots + y_n)$
Summation Notation: Rules
• Rule IV: Exponents are handled differently depending
on whether they are applied to the observation term
or the whole sum
$\sum_{i=1}^{n} x_i^k = x_1^k + x_2^k + \cdots + x_n^k \quad \text{whereas} \quad \left( \sum_{i=1}^{n} x_i \right)^k = (x_1 + x_2 + \cdots + x_n)^k$
Summation Notation: Rules
• Rule V: Products are handled much like exponents
$\sum_{i=1}^{n} x_i y_i = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$

$\sum_{i=1}^{n} x_i y_i \;\ne\; \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i = (x_1 + x_2 + \cdots + x_n)(y_1 + y_2 + \cdots + y_n)$
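As a sanity check, Rules I through V can be verified numerically in Python; the data values below are made up for illustration (Rules IV and V assert inequality for this particular data set, which is the caution the rules express):

```python
n = 5
a = 3.0
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 2.0, 2.0, 2.0, 2.0]

# Rule I: summing a constant a over n terms gives n*a
assert sum(a for _ in range(n)) == n * a

# Rule II: constants may be taken outside the summation sign
assert sum(a * xi for xi in x) == a * sum(x)

# Rule III: the sum distributes over addition
assert sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)

# Rule IV: the sum of squares is NOT the square of the sum
assert sum(xi ** 2 for xi in x) != sum(x) ** 2

# Rule V: the sum of products is NOT the product of the sums
assert sum(xi * yi for xi, yi in zip(x, y)) != sum(x) * sum(y)
```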
Pi Notation
• Whereas the summation notation refers to the addition of terms, the product notation applies to the multiplication of terms
• It is denoted by the capital Greek letter Π (pi), and is used in the same way as the summation notation

$\prod_{i=1}^{n} x_i = x_1 x_2 \cdots x_n$

$\prod_{i=1}^{n} (x_i + y_i) = (x_1 + y_1)(x_2 + y_2) \cdots (x_n + y_n)$
Factorial
• The factorial of a positive integer, n, is equal to
the product of the first n integers
• Factorials can be denoted by an exclamation
point
$n! = \prod_{i=1}^{n} i$

$5! = 5 \times 4 \times 3 \times 2 \times 1 = 120 = \prod_{i=1}^{5} i$
• There is also a convention that 0! = 1
• Factorials are not defined for negative integers
or nonintegers
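Both pi notation and the factorial have direct standard-library counterparts in Python (`math.prod` and `math.factorial`); a minimal sketch:

```python
import math

# Pi notation: product of terms x_1 * x_2 * ... * x_n
x = [1, 2, 3, 4, 5]
assert math.prod(x) == 120

# Factorial as the product of the first n positive integers
n = 5
assert math.factorial(n) == math.prod(range(1, n + 1)) == 120

# Convention: 0! = 1
assert math.factorial(0) == 1
```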
Combinations
• Combinations refer to the number of possible
outcomes that particular probability experiments
may have
• Specifically, the number of ways that r items may
be chosen from a group of n items is denoted by:
$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$

or

$C(n, r) = \frac{n!}{r!\,(n-r)!}$
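Python's `math.comb` computes C(n, r) directly; a quick sketch confirming it against the factorial formula:

```python
import math

# Number of ways to choose r items from n: C(n, r) = n! / (r! * (n - r)!)
n, r = 5, 2
by_formula = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

# math.comb implements the same count
assert by_formula == math.comb(n, r) == 10
```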
Descriptive Statistics
• Measures of central tendency
– Measures of the location of the middle or the
center of a distribution
– Mean, median, mode
• Measures of dispersion
– Describe how the observations are distributed
– Variance, standard deviation, range, etc
Measures of Central Tendency – Mean
• Mean – Most commonly used measure of central
tendency
• Note: the mean assumes that each observation is equally significant
• Sensitive to outliers

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Measures of Central Tendency – Mean
• A standard geographic application of the mean
is to locate the center (centroid) of a spatial
distribution
• Assign to each member a gridded coordinate and calculate the mean value in each coordinate direction --> bivariate mean or mean center $(\bar{x}, \bar{y})$
• For a set of (x, y) coordinates, the mean center is calculated as:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$
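Computing a mean center is just two ordinary means; a minimal sketch with hypothetical point coordinates:

```python
# Mean center (bivariate mean) of a set of (x, y) coordinates --
# the point locations here are made up for illustration
points = [(1.0, 2.0), (3.0, 6.0), (5.0, 4.0), (7.0, 8.0)]

n = len(points)
x_bar = sum(p[0] for p in points) / n  # mean of the x coordinates
y_bar = sum(p[1] for p in points) / n  # mean of the y coordinates

print((x_bar, y_bar))  # (4.0, 5.0)
```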
Weighted Mean
• We can also calculate a weighted mean using
some weighting factor:
$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$
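A weighted mean in Python, using hypothetical observations and weights:

```python
# Weighted mean: sum(w_i * x_i) / sum(w_i)
# e.g. hypothetical exam scores weighted by credit hours
x = [80.0, 90.0, 70.0]   # observations
w = [1.0, 2.0, 1.0]      # weighting factors

weighted_mean = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
print(weighted_mean)  # 82.5
```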
Measures of Central Tendency – Median
• Median – This is the value of a variable such that half of the observations are above and half are below this value, i.e., this value divides the distribution into two groups of equal size
• When the number of observations is odd, the median is
simply equal to the middle value
• When the number of observations is even, we take the
median to be the average of the two values in the
middle of the distribution
Measures of Central Tendency – Mode
• Mode - This is the most frequently occurring value
in the distribution
• This is the only measure of central tendency that
can be used with nominal data
• The mode allows the distribution's peak to be
located quickly
Which one is better: mean, median,
or mode?
• Most often, the mean is selected by default
• The mean's key advantage is that it is sensitive
to any change in the value of any observation
• The mean's disadvantage is that it is very
sensitive to outliers
• We really must consider the nature of the data,
the distribution, and our goals to choose
properly
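Python's `statistics` module provides all three measures of central tendency; a quick sketch (the data set is made up, with an even number of observations so the median is the average of the middle two values):

```python
import statistics

data = [2, 3, 3, 5, 7, 10]

print(statistics.mean(data))    # 5   -- sensitive to every value (and to outliers)
print(statistics.median(data))  # 4.0 -- average of the middle two values (3 and 5)
print(statistics.mode(data))    # 3   -- most frequently occurring value
```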
Some Characteristics of Data
• Not all data is the same. There are some limitations
as to what can and cannot be done with a data set,
depending on the characteristics of the data
• Some key characteristics that must be considered
are:
• A. Continuous vs. Discrete
• B. Grouped vs. Individual
• C. Scale of Measurement
C. Scales of Measurement
• The data used in statistical analyses can be
divided into four types:
1. The Nominal Scale
2. The Ordinal Scale
3. The Interval Scale
4. The Ratio Scale
As we progress through
these scales, the types
of data they describe
have increasing
information content
The Nominal Scale
• Nominal scale data are data that can simply be
broken down into categories, i.e., having to do
with names or types
• Dichotomous or binary nominal data has just
two types, e.g., yes/no, female/male, is/is not,
hot/cold, etc
• Multichotomous data has more than two types,
e.g., vegetation types, soil types, counties, eye
color, etc
• Not a scale in the sense that categories cannot
be ranked or ordered (no greater/less than)
The Ordinal Scale
• Ordinal scale data can be categorized AND can be placed in an order, i.e., categories that can be assigned a relative importance and can be ranked such that numerical category values have a meaningful order
– star-system restaurant rankings
5 stars > 4 stars, 4 stars > 3 stars, 5 stars > 2 stars
• BUT ordinal data still are not scalar in the sense
that differences between categories do not have a
quantitative meaning
– i.e., a 5 star restaurant is not superior to a 4 star restaurant by the same amount as a 4 star restaurant is to a 3 star restaurant
The Interval Scale
• Interval scale data take the notion of ranking
items in order one step further, since the distance
between adjacent points on the scale are equal
• For instance, the Fahrenheit scale is an interval
scale, since each degree is equal but there is no
absolute zero point.
• This means that although we can add and
subtract degrees (100° is 10° warmer than 90°),
we cannot multiply values or create ratios (100°
is not twice as warm as 50°)
The Ratio Scale
• Similar to the interval scale, but with the addition
of having a meaningful zero value, which allows
us to compare values using multiplication and
division operations, e.g., precipitation, weights,
heights, etc
• e.g., rain – We can say that 2 inches of rain is
twice as much rain as 1 inch of rain because this
is a ratio scale measurement
• e.g., age – a 100-year old person is indeed twice
as old as a 50-year old one
Scales of Measurements & Measures of
Central Tendency
• The mean is valid only for interval data or ratio
data.
• The median can be determined for ordinal data as
well as interval and ratio data.
• The mode can be used with nominal, ordinal,
interval, and ratio data
• Mode is the only measure of central tendency that
can be used with nominal data
Measures of Dispersion
• Measures of dispersion are concerned with the
distribution of values around the mean in data:
1. Range
2. Interquartile range
3. Variance
4. Standard deviation
5. z-scores
6. Coefficient of Variation (CV)
Measures of Dispersion - Range
1. Range – this is the most simply formulated of all
measures of dispersion
• Given a set of measurements x1, x2, x3, … ,xn-1, xn ,
the range is defined as the difference between the
largest and smallest values:
Range = xmax – xmin
• This is another descriptive measure that is
vulnerable to the influence of outliers in a data
set, which result in a range that is not really
descriptive of most of the data
Measures of Dispersion –
Interquartile Range
• Quartiles – We can divide distributions into four
parts each containing 25% of observations
• Percentiles – each contains 1% of all values
• Interquartile range – The difference between the
25th and 75th percentiles
Measures of Dispersion – Variance
• Variance is formulated as the sum of
squares of statistical distances divided by
the population size or the sample size minus
one
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Measures of Dispersion – Standard
Deviation
• Standard deviation is equal to the square root of
the variance
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
• Compared with variance, standard deviation
has a scale closer to that used for the mean and
the original data
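Sample variance and standard deviation from the formulas above, checked against the `statistics` module (the data values are made up):

```python
import math
import statistics

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(x)
x_bar = sum(x) / n

# Sample variance: sum of squared deviations divided by (n - 1)
s2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
s = math.sqrt(s2)  # sample standard deviation

# The standard library agrees with the hand-rolled formulas
assert math.isclose(s2, statistics.variance(x))
assert math.isclose(s, statistics.stdev(x))
```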
Measures of Dispersion – z-score
• Since data come from distributions with different means and different degrees of variability, it is common to standardize observations
• One way to do this is to transform each observation into a z-score:

$z = \frac{x_i - \bar{x}}{s}$
• May be interpreted as the number of standard
deviations an observation is away from the mean
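z-scores in code, using hypothetical data; standardized data always has mean zero:

```python
import statistics

x = [2.0, 4.0, 6.0, 8.0, 10.0]
x_bar = statistics.mean(x)   # 6.0
s = statistics.stdev(x)      # sample standard deviation

# z-score: number of standard deviations each observation lies from the mean
z = [(xi - x_bar) / s for xi in x]

# Standardized observations have mean zero
assert abs(sum(z)) < 1e-9
```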
Measures of Dispersion – Coefficient
of Variation
• Coefficient of variation (CV) measures the spread
of a set of data as a proportion of its mean.
• It is the ratio of the sample standard deviation to
the sample mean
$CV = \frac{s}{\bar{x}} \times 100\%$
• It is sometimes expressed as a percentage
• There is an equivalent definition for the coefficient
of variation of a population
Further Moments – Skewness
• Skewness measures the degree of asymmetry
exhibited by the data
$\text{skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s^3}$
• If skewness equals zero, the histogram is
symmetric about the mean
• Positive skewness vs negative skewness
[Figure: two skewed histograms, A and B, showing the relative positions of the mode, median, and mean under positive and negative skewness]
Further Moments – Kurtosis
• Kurtosis measures how peaked the histogram is
$\text{kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4} - 3$
• The kurtosis of a normal distribution is 0
• Kurtosis characterizes the relative peakedness
or flatness of a distribution compared to the
normal distribution
Source: http://espse.ed.psu.edu/Statistics/Chapters/Chapter3/Chap3.html
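The skewness and excess-kurtosis formulas translate directly to Python; a sketch using the sample standard deviation s, as in the formulas above:

```python
import math

def skewness(x):
    """Third standardized moment: sum((x_i - mean)^3) / (n * s^3)."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))
    return sum((xi - mean) ** 3 for xi in x) / (n * s ** 3)

def kurtosis(x):
    """Excess kurtosis: sum((x_i - mean)^4) / (n * s^4) - 3."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))
    return sum((xi - mean) ** 4 for xi in x) / (n * s ** 4) - 3

# A perfectly symmetric data set has zero skewness
assert abs(skewness([1.0, 2.0, 3.0, 4.0, 5.0])) < 1e-12
```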
Functions of a Histogram
• The function of a histogram is to graphically
summarize the distribution of a data set
• The histogram graphically shows the following:
1. Center (i.e., the location) of the data
2. Spread (i.e., the scale) of the data
3. Skewness of the data
4. Kurtosis of the data
5. Presence of outliers
6. Presence of multiple modes in the data
Box Plots
• We can also use a box plot to graphically
summarize a data set
• A box plot represents a graphical summary of
what is sometimes called a “five-number
summary” of the distribution
– Minimum
– Maximum
– 25th percentile
– 75th percentile
– Median
• Interquartile Range (IQR)
[Figure: box plot labeled with the minimum, 25th percentile, median, 75th percentile, and maximum; Rogerson, p. 8]
Probability-Related Concepts
• An event – Any phenomenon you can observe that
can have more than one outcome (e.g., flipping a
coin)
• An outcome – Any unique condition that can be
the result of an event (e.g., flipping a coin: heads or
tails), a.k.a. simple events or sample points
• Sample space – The set of all possible outcomes
associated with an event
– e.g., flip a coin – heads (H) and tails (T)
– e.g., flip a coin twice – HH, HT, TH, TT
Probability-Related Concepts
• Associated with each possible outcome in a
sample space is a probability
• Probability is a measure of the likelihood of
each possible outcome
• Probability measures the degree of uncertainty
• Each of the probabilities is greater than or equal
to zero, and less than or equal to one
• The sum of probabilities over the sample space
is equal to one
How To Assign Probabilities
to Experimental Outcomes?
• There are numerous ways to assign probabilities
to the elements of sample spaces
• Classical method assigns probabilities based on
the assumption of equally likely outcomes
• Relative frequency method assigns probabilities
based on experimentation or historical data
• Subjective method assigns probabilities based on
the assignor’s judgment or belief
Probability Rules
• Rules for combining multiple probabilities
• A useful aid is the Venn diagram - depicts multiple
probabilities and their relations using a graphical
depiction of sets
• The rectangle that forms the area of
the Venn Diagram represents the
sample (or probability) space, which
we have defined above
• Figures that appear within the
sample space are sets that represent
events in the probability context, &
their area is proportional to their
probability (full sample space = 1)
[Figure: Venn diagram showing two event sets A and B within the rectangular sample space]
Discrete & Continuous Variables
• Discrete variable – A variable that can take on
only a finite number of values
– # of malls within cities
– # of vegetation types within geographic regions
– population counts
• Continuous variable – A variable that can take
on an infinite number of values (all real number
values)
– Elevation (e.g., [500.0, 1000.0])
– Temperature (e.g., [10.0, 20.0])
– Precipitation (e.g., [100.0, 500.0])
Probability Mass Functions
• A discrete random variable can be described by a
probability mass function (pmf)
• A probability mass function is usually represented
by a table, graph, or equation
• The probability of any outcome must satisfy:

$0 \le p(X = x_i) \le 1, \quad i = 1, 2, 3, \ldots, k$

• The sum of all probabilities in the sample space must total one, i.e.

$\sum_{i=1}^{k} p(X = x_i) = 1$
Probability Density Functions
[Figure: density curve f(x) with the area between a and b shaded]
• The probability of a continuous random variable X within an arbitrary interval is given by:

$p(a \le X \le b) = \int_a^b f(x)\,dx$

• Simply calculate the shaded area; if we know the density function, we could use calculus
Discrete Probability Distributions
• Discrete probability distributions
– The Uniform Distribution
– The Binomial Distribution
– The Poisson Distribution
• Each is appropriately applied in certain
situations and to particular phenomena
The Binomial Distribution
• Provides information about the probability of the
repetition of events when there are only two
possible outcomes,
– e.g. heads or tails, left or right, success or failure, rain
or no rain …
– Events with multiple outcomes may be simplified as
events with two outcomes (e.g., forest or non-forest)
• Characterizing the probability of a proportion of
the events having a certain outcome over a
specified number of events
The Binomial Distribution
• A general formula for calculating the probability of x successes in n trials, given a probability p of success:

P(x) = C(n, x) × p^x × (1 − p)^(n − x)

• where C(n, x) is the number of possible combinations of x successes and (n − x) failures:

C(n, x) = n! / (x! × (n − x)!)
Source: http://home.xnet.com/~fidler/triton/math/review/mat170/probty/p-dist/discrete/Binom/binom1.htm
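The binomial formula is straightforward to compute with `math.comb`; a sketch for a hypothetical fair-coin example:

```python
import math

def binomial_pmf(x, n, p):
    """P(x successes in n trials) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# e.g. probability of exactly 2 heads in 4 flips of a fair coin
print(binomial_pmf(2, 4, 0.5))  # 0.375

# The pmf sums to 1 over the sample space x = 0, 1, ..., n
assert math.isclose(sum(binomial_pmf(x, 4, 0.5) for x in range(5)), 1.0)
```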
The Poisson Distribution
• In the 1830s, S.D. Poisson described a distribution
with these characteristics
• Describing the number of events that will occur
within a certain area or duration (e.g. # of
meteorite impacts per state, # of tornados per year,
# of hurricanes in NC)
• Poisson distribution’s characteristics:
• 1. It is used to count the number of occurrences of
an event within a given unit of time, area, volume,
etc. (therefore a discrete distribution)
The Poisson Distribution
• 2. The probability that an event will occur within
a given unit must be the same for all units (i.e.
the underlying process governing the
phenomenon must be invariant)
• 3. The number of events occurring per unit must
be independent of the number of events
occurring in other units (no interactions)
• 4. The mean or expected number of events per
unit (λ) is found by past experience (observations)
The Poisson Distribution
• Poisson formulated his distribution as follows:

$P(x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$

where e = 2.71828… (the base of the natural logarithm)
λ = the mean or expected value
x = 0, 1, 2, … = the # of occurrences
x! = x × (x − 1) × (x − 2) × … × 2 × 1
• To calculate a Poisson distribution, you must
know λ
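A direct translation of the Poisson formula into Python, with a hypothetical λ = 2 events per unit:

```python
import math

def poisson_pmf(x, lam):
    """P(x) = e^(-lambda) * lambda^x / x!"""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Probability of observing exactly 0, 1, and 2 events when lambda = 2
for x in range(3):
    print(x, poisson_pmf(x, 2.0))

# The probabilities over all x sum to 1 (here summed far into the tail)
assert math.isclose(sum(poisson_pmf(x, 2.0) for x in range(50)), 1.0)
```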
The Poisson Distribution
• Procedure for finding Poisson probabilities and
expected frequencies:
• (1) Set up a table with five columns as on the
previous slide
• (2) Multiply the values of x by their observed
frequencies (x * Fobs)
• (3) Sum the columns of Fobs (observed
frequency) and x * Fobs
• (4) Compute λ = Σ (x * Fobs) / Σ Fobs
• (5) Compute P(x) values using the equation or a
table
• (6) Compute the values of Fexp = P(x) * Σ Fobs
Source: http://www.mpimet.mpg.de/~vonstorch.jinsong/stat_vls/s3.pdf
The Normal Distribution
• The probability density function of the normal
distribution:
$f(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-0.5 \left( \frac{x - \mu}{\sigma} \right)^{2}}$

• You can see that the value of the distribution at x is a function of the mean (μ) and standard deviation (σ)
Standardization of Normal Distributions
• The standardization is achieved by converting
the data into z-scores
z-score: $z = \frac{x_i - \bar{x}}{s}$

• The z-score is the means by which we transform our normal distribution into a standard normal distribution (μ = 0 and σ = 1)
Finding the P(x) for Various Intervals
(items 1–3 assume a > 0)
1. P(Z ≥ a) = (table value)
• Table gives the value of P(x) in the tail above a
2. P(Z ≤ a) = [1 – (table value)]
• Total area under the curve = 1, and we subtract the area of the tail
3. P(0 ≤ Z ≤ a) = [0.5 – (table value)]
• Total area under the curve = 1, thus the area above the mean (z = 0) is 0.5, and we subtract the area of the tail
Finding the P(x) for Various Intervals
(items 4–6 assume a < 0)
4. P(Z ≤ a) = (table value)
• Table gives the value of P(x) in the tail below a; equivalent to P(Z ≥ a) when a is positive
5. P(Z ≥ a) = [1 – (table value)]
• This is equivalent to P(Z ≤ a) when a is positive
6. P(a ≤ Z ≤ 0) = [0.5 – (table value)]
• This is equivalent to P(0 ≤ Z ≤ a) when a is positive
Finding the P(x) for Various Intervals
7. P(a ≤ Z ≤ b) if a < 0 and b > 0
= (0.5 – P(Z ≤ a)) + (0.5 – P(Z ≥ b))
= 1 – P(Z ≤ a) – P(Z ≥ b)
or
= [0.5 – (table value for a)] + [0.5 – (table value for b)]
= 1 – [(table value for a) + (table value for b)]
• With this set of building blocks, you should be able to
calculate the probability for any interval using a standard
normal table
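In code, the standard normal table can be replaced by the error function, which Python's `math` module provides; a sketch of the same interval building blocks:

```python
import math

def phi(z):
    """Standard normal CDF, P(Z <= z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_interval(a, b):
    """P(a <= Z <= b), the table-lookup building block done numerically."""
    return phi(b) - phi(a)

# e.g. about 95% of the distribution lies within 1.96 standard deviations
print(round(p_interval(-1.96, 1.96), 4))  # 0.95
```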
Confidence Interval & Probability
• A confidence interval is expressed in terms of a range
of values and a probability (e.g. my lectures are
between 60 and 70 minutes long 95% of the time)
• For this example, the confidence level that I used is the
95% level, which is the most commonly used
confidence level
• Other commonly selected confidence levels are 90%
and 99%, and the choice of which confidence level to
use when constructing an interval often depends on the
application
The Central Limit Theorem
• Given a distribution with a mean μ and variance σ2, the
sampling distribution of the mean approaches a
normal distribution with a mean (μ) and a variance
σ2/n as n, the sample size, increases
• The amazing and counter-intuitive thing about the
central limit theorem is that no matter what the shape
of the original (parent) distribution, the sampling
distribution of the mean approaches a normal
distribution
Confidence Intervals for the Mean
• Generally, a (1- α)*100% confidence interval
around the sample mean is:
$pr\left( \bar{x} - z_{\alpha} \frac{\sigma}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + z_{\alpha} \frac{\sigma}{\sqrt{n}} \right) = 1 - \alpha$

where $\sigma / \sqrt{n}$ is the standard error and $z_{\alpha}\, \sigma / \sqrt{n}$ is the margin of error
• Where zα is the value taken from the z-table that
is associated with a fraction α of the weight in the
tails (and therefore α/2 is the area in each tail)
Constructing a Confidence Interval
• 1. Select our desired confidence level (1-α)*100%
• 2. Calculate α and α/2
• 3. Look up the corresponding z-score in a
standard normal table
• 4. Multiply the z-score by the standard error to
find the margin of error
• 5. Find the interval by adding and subtracting this
product from the mean
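The five steps above can be sketched in Python; the sample values here are hypothetical, and σ is assumed known so the z-based interval applies:

```python
import math

# Step 1-3: 95% confidence level -> alpha = 0.05, alpha/2 = 0.025 in each
# tail, for which the standard normal table gives z ~= 1.96
z = 1.96

x_bar = 65.0    # hypothetical sample mean (e.g. lecture length in minutes)
sigma = 5.0     # hypothetical (known) population standard deviation
n = 100         # sample size

# Step 4: margin of error = z * standard error
standard_error = sigma / math.sqrt(n)   # 0.5
margin_of_error = z * standard_error    # 0.98

# Step 5: add and subtract the margin of error from the mean
lower = x_bar - margin_of_error
upper = x_bar + margin_of_error
print((lower, upper))  # ~ (64.02, 65.98)
```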
t-distribution
• The central limit theorem applies when the sample size
is “large”, only then will the distribution of means
possess a normal distribution
• When the sample size is not “large”, the frequency
distribution of the sample means has what is known as
the t-distribution
• t-distribution is symmetric, like the normal distribution,
but has a slightly different shape
• The t distribution has relatively more scores in its tails
than does the normal distribution. It is therefore
leptokurtic
Assignment III
• Probability, Discrete, and Continuous Distributions
• Due: 03/07/2006 (Tuesday)
• http://www.unc.edu/courses/2006spring/geog/090/001/www/