Mathematical Notation
• The mathematical notation used most often in this course is summation notation
• The Greek letter \Sigma is used as a shorthand way of indicating that a sum is to be taken:
  \sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n

Summation Notation: Simplification
• A summation will often be written leaving out the upper and/or lower limits of the summation, assuming that all of the available terms are to be summed:
  \sum_{i=1}^{n} x_i = \sum_i x_i = \sum x_i

Summation Notation: Rules
• Rule I: Summing a constant a n times yields a result of na:
  \sum_{i=1}^{n} a = a + a + \cdots + a = na
• Here we are simply using the summation notation to carry out a multiplication, e.g.:
  \sum_{i=1}^{5} 4 = 4 + 4 + 4 + 4 + 4 = 5 \times 4 = 20
• Rule II: Constants may be taken outside of the summation sign:
  \sum_{i=1}^{n} a x_i = ax_1 + ax_2 + \cdots + ax_n = a(x_1 + x_2 + \cdots + x_n) = a \sum_{i=1}^{n} x_i
• Rule III: The order in which addition operations are carried out is unimportant:
  \sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i = (x_1 + \cdots + x_n) + (y_1 + \cdots + y_n)
• Rule IV: Exponents are handled differently depending on whether they are applied to the observation term or to the whole sum:
  \sum_{i=1}^{n} x_i^k = x_1^k + x_2^k + \cdots + x_n^k, whereas \left( \sum_{i=1}^{n} x_i \right)^k = (x_1 + x_2 + \cdots + x_n)^k
• Rule V: Products are handled much like exponents:
  \sum_{i=1}^{n} x_i y_i = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n, whereas \left( \sum_{i=1}^{n} x_i \right)\left( \sum_{i=1}^{n} y_i \right) = (x_1 + \cdots + x_n)(y_1 + \cdots + y_n)

Pi Notation
• Whereas summation notation refers to the addition of terms, product notation applies to the multiplication of terms
• It is denoted by the capital Greek letter \Pi (pi) and is used in the same way as the summation notation:
  \prod_{i=1}^{n} x_i = x_1 x_2 \cdots x_n
  \prod_{i=1}^{n} (x_i + y_i) = (x_1 + y_1)(x_2 + y_2) \cdots (x_n + y_n)

Factorial
• The factorial of a positive integer n is equal to the product of the first n integers
• Factorials are denoted by an exclamation point:
  n! = \prod_{i=1}^{n} i, e.g. 5! = 5 \times 4 \times 3 \times 2 \times 1 = 120
• There is also a convention that 0!
= 1
• Factorials are not defined for negative integers or non-integers

Combinations
• Combinations refer to the number of possible outcomes that particular probability experiments may have
• Specifically, the number of ways that r items may be chosen from a group of n items is denoted by:
  \binom{n}{r} = \frac{n!}{r!(n-r)!} \quad \text{or} \quad C(n, r) = \frac{n!}{r!(n-r)!}

Descriptive Statistics
• Measures of central tendency – measures of the location of the middle or center of a distribution: mean, median, mode
• Measures of dispersion – describe how the observations are distributed: variance, standard deviation, range, etc.

Measures of Central Tendency – Mean
• The mean is the most commonly used measure of central tendency:
  \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}
• Note: this assumes that each observation is equally significant
• The mean is sensitive to outliers

Measures of Central Tendency – Mean
• A standard geographic application of the mean is to locate the center (centroid) of a spatial distribution
• Assign each member a gridded coordinate and calculate the mean value in each coordinate direction to obtain the bivariate mean, or mean center
• For a set of (x, y) coordinates, the mean center is calculated as:
  (\bar{x}, \bar{y}) = \left( \frac{\sum_{i=1}^{n} x_i}{n}, \frac{\sum_{i=1}^{n} y_i}{n} \right)

Weighted Mean
• We can also calculate a weighted mean using some weighting factor w_i:
  \bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}

Measures of Central Tendency – Median
• Median – the value of a variable such that half of the observations are above it and half are below it, i.e.
this value divides the distribution into two groups of equal size
• When the number of observations is odd, the median is simply the middle value
• When the number of observations is even, the median is the average of the two middle values

Measures of Central Tendency – Mode
• Mode – the most frequently occurring value in the distribution
• This is the only measure of central tendency that can be used with nominal data
• The mode allows the distribution's peak to be located quickly

Which one is better: mean, median, or mode?
• Most often, the mean is selected by default
• The mean's key advantage is that it is sensitive to any change in the value of any observation
• The mean's disadvantage is that it is very sensitive to outliers
• We really must consider the nature of the data, the distribution, and our goals to choose properly

Some Characteristics of Data
• Not all data are the same. There are some limitations as to what can and cannot be done with a data set, depending on the characteristics of the data
• Some key characteristics that must be considered are:
  A. Continuous vs. discrete
  B. Grouped vs. individual
  C. Scale of measurement

C. Scales of Measurement
• The data used in statistical analyses can be divided into four types:
  1. The Nominal Scale
  2. The Ordinal Scale
  3. The Interval Scale
  4.
The Ratio Scale
• As we progress through these scales, the types of data they describe have increasing information content

The Nominal Scale
• Nominal scale data can simply be broken down into categories, i.e., they have to do with names or types
• Dichotomous or binary nominal data have just two types, e.g., yes/no, female/male, is/is not, hot/cold
• Multichotomous data have more than two types, e.g., vegetation types, soil types, counties, eye color
• Not a scale in the sense that categories cannot be ranked or ordered (no greater than/less than)

The Ordinal Scale
• Ordinal scale data can be categorized AND placed in an order, i.e., the categories can be assigned a relative importance and ranked so that the numerical category values have meaning
  – star-system restaurant rankings: 5 stars > 4 stars, 4 stars > 3 stars, 5 stars > 2 stars
• BUT ordinal data are still not scalar in the sense that differences between categories do not have a quantitative meaning
  – i.e., a 5-star restaurant is not superior to a 4-star restaurant by the same amount as a 4-star restaurant is to a 3-star restaurant

The Interval Scale
• Interval scale data take the notion of ranking items in order one step further, since the distance between adjacent points on the scale is equal
• For instance, the Fahrenheit scale is an interval scale, since each degree is equal, but there is no absolute zero point.
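The interval-vs-ratio distinction can be checked numerically. The sketch below (the temperature and rainfall values are made up for illustration) shows that a ratio of Fahrenheit readings changes when the same temperatures are merely re-expressed in Celsius, whereas a ratio of rainfall amounts, a ratio-scale quantity, survives a unit change:

```python
# Interval vs. ratio scales, demonstrated with made-up values.
# Fahrenheit has no absolute zero, so ratios of readings are not
# meaningful: the "ratio" changes when we merely change units.

def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32.0) * 5.0 / 9.0

ratio_f = 100.0 / 50.0                 # 2.0 in Fahrenheit units
ratio_c = f_to_c(100.0) / f_to_c(50.0)
print(ratio_f, round(ratio_c, 2))      # the two ratios disagree

# Rainfall has a true zero (no rain), so its ratios are unit-independent
inches = (2.0, 1.0)
millimetres = tuple(25.4 * x for x in inches)
print(inches[0] / inches[1], millimetres[0] / millimetres[1])
```

This is exactly why "100° is not twice as warm as 50°" on the Fahrenheit scale, while 2 inches of rain really is twice as much as 1 inch.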
• This means that although we can add and subtract degrees (100° is 10° warmer than 90°), we cannot multiply values or create ratios (100° is not twice as warm as 50°)

The Ratio Scale
• Similar to the interval scale, but with the addition of a meaningful zero value, which allows us to compare values using multiplication and division, e.g., precipitation, weights, heights
• e.g., rain – we can say that 2 inches of rain is twice as much rain as 1 inch of rain because this is a ratio scale measurement
• e.g., age – a 100-year-old person is indeed twice as old as a 50-year-old one

Scales of Measurement & Measures of Central Tendency
• The mean is valid only for interval or ratio data
• The median can be determined for ordinal data as well as interval and ratio data
• The mode can be used with nominal, ordinal, interval, and ratio data
• The mode is the only measure of central tendency that can be used with nominal data

Measures of Dispersion
• Measures of dispersion are concerned with the distribution of values around the mean:
  1. Range
  2. Interquartile range
  3. Variance
  4. Standard deviation
  5. z-scores
  6. Coefficient of variation (CV)

Measures of Dispersion – Range
Range – the most simply formulated of all measures of dispersion
• Given a set of measurements x_1, x_2, x_3, …, x_{n-1}, x_n, the range is defined as the difference between the largest and smallest values:
  Range = x_max − x_min
• This is another descriptive measure that is vulnerable to outliers in a data set, which can produce a range that is not really descriptive of most of the data

Measures of Dispersion – Interquartile Range
• Quartiles – we can divide a distribution into four parts, each containing 25% of the observations
• Percentiles – each contains 1% of all values
• Interquartile range – the difference between the 25th and 75th percentiles

Measures of Dispersion – Variance
• Variance is formulated as the sum of squared statistical distances divided by the population size (or the sample size minus one):
  s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}

Measures of Dispersion – Standard Deviation
• Standard deviation is equal to the square root of the variance:
  s = \sqrt{ \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} }
• Compared with the variance, the standard deviation has a scale closer to that of the mean and the original data

Measures of Dispersion – z-score
• Since data come from distributions with different means and different degrees of variability, it is common to standardize observations
• One way to do this is to transform each observation into a z-score:
  z = \frac{x_i - \bar{x}}{s}
• The z-score may be interpreted as the number of standard deviations an observation lies away from the mean

Measures of Dispersion – Coefficient of Variation
• The coefficient of variation (CV) measures the spread of a set of data as a proportion of its mean.
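These dispersion measures can be sketched in a few lines of Python. The sample values below are made up; the final line computes the CV as the standard deviation expressed as a percentage of the mean:

```python
# Range, sample variance, standard deviation, z-scores, and CV
# for a small made-up data set.
data = [4.0, 8.0, 6.0, 5.0, 3.0, 7.0, 9.0, 6.0]
n = len(data)
mean = sum(data) / n

# Range: largest value minus smallest value
data_range = max(data) - min(data)

# Sample variance: sum of squared deviations over n - 1
variance = sum((x - mean) ** 2 for x in data) / (n - 1)

# Standard deviation: square root of the variance
s = variance ** 0.5

# z-scores: deviations from the mean in standard-deviation units
z_scores = [(x - mean) / s for x in data]

# Coefficient of variation: spread as a proportion of the mean
cv = s / mean * 100.0

print(mean, data_range, variance, s, round(cv, 1))
# prints: 6.0 6.0 4.0 2.0 33.3
```

Note how the standard deviation (2.0) is on the same scale as the data, while the variance (4.0) is in squared units.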
• It is the ratio of the sample standard deviation to the sample mean:
  CV = \frac{s}{\bar{x}} \times 100\%
• It is sometimes expressed as a percentage
• There is an equivalent definition for the coefficient of variation of a population

Further Moments – Skewness
• Skewness measures the degree of asymmetry exhibited by the data:
  \text{skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s^3}
• If skewness equals zero, the histogram is symmetric about the mean
• Positive skewness vs. negative skewness
  [figure: positively and negatively skewed distributions, showing the relative positions of the mode, median, and mean]

Further Moments – Kurtosis
• Kurtosis measures how peaked the histogram is:
  \text{kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4} - 3
• The kurtosis of a normal distribution is 0
• Kurtosis characterizes the relative peakedness or flatness of a distribution compared to the normal distribution
Source: http://espse.ed.psu.edu/Statistics/Chapters/Chapter3/Chap3.html

Functions of a Histogram
• The function of a histogram is to graphically summarize the distribution of a data set
• A histogram graphically shows the following:
  1. Center (i.e., the location) of the data
  2. Spread (i.e., the scale) of the data
  3. Skewness of the data
  4. Kurtosis of the data
  5. Presence of outliers
  6. Presence of multiple modes in the data

Box Plots
• We can also use a box plot to graphically summarize a data set
• A box plot is a graphical summary of what is sometimes called a "five-number summary" of the distribution:
  – Minimum
  – Maximum
  – 25th percentile
  – 75th percentile
  – Median
• Interquartile range (IQR)
  [figure: box plot labeled with minimum, 25th percentile, median, 75th percentile, and maximum; Rogerson, p. 8]
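The skewness and kurtosis formulas above can be applied directly. A minimal sketch with made-up data follows; it assumes the sample standard deviation (n − 1 denominator), matching the variance formula on the earlier slide:

```python
# Skewness and excess kurtosis via the moment formulas on the slides,
# applied to a made-up data set with one large outlier.
data = [2.0, 3.0, 3.0, 4.0, 4.0, 4.0, 5.0, 5.0, 6.0, 14.0]
n = len(data)
mean = sum(data) / n
s = (sum((x - mean) ** 2 for x in data) / (n - 1)) ** 0.5

# Third moment over n*s^3: degree of asymmetry
skewness = sum((x - mean) ** 3 for x in data) / (n * s ** 3)

# Fourth moment over n*s^4, minus 3 so a normal distribution scores 0
kurtosis = sum((x - mean) ** 4 for x in data) / (n * s ** 4) - 3.0

# The single large value (14.0) drags the mean above the bulk of the
# data, so the skewness comes out positive
print(round(skewness, 3), round(kurtosis, 3))
```

The same outlier that produces the positive skewness also fattens the right tail, so the excess kurtosis is positive as well.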
Probability-Related Concepts
• An event – any phenomenon you can observe that can have more than one outcome (e.g., flipping a coin)
• An outcome – any unique condition that can be the result of an event (e.g., flipping a coin: heads or tails), a.k.a. a simple event or sample point
• Sample space – the set of all possible outcomes associated with an event
  – e.g., flip a coin once – heads (H) and tails (T)
  – e.g., flip a coin twice – HH, HT, TH, TT

Probability-Related Concepts
• Associated with each possible outcome in a sample space is a probability
• Probability is a measure of the likelihood of each possible outcome; it measures the degree of uncertainty
• Each of the probabilities is greater than or equal to zero and less than or equal to one
• The sum of the probabilities over the sample space is equal to one

How to Assign Probabilities to Experimental Outcomes?
• There are numerous ways to assign probabilities to the elements of sample spaces
• The classical method assigns probabilities based on the assumption of equally likely outcomes
• The relative frequency method assigns probabilities based on experimentation or historical data
• The subjective method assigns probabilities based on the assignor's judgment or belief

Probability Rules
• Rules for combining multiple probabilities
• A useful aid is the Venn diagram, which depicts multiple probabilities and their relations using a graphical depiction of sets
• The rectangle that forms the area of the Venn diagram represents the sample (or probability) space, which we have defined above
• Figures that appear within the sample space are sets that represent events in the probability context, and their area is proportional to their probability (full sample space = 1)

Discrete & Continuous Variables
• Discrete variable – a variable that can take on only a finite number of values
  – # of malls within cities
  – # of vegetation types within geographic regions
  – population counts
• Continuous variable – a variable that
can take on an infinite number of values (all real number values)
  – elevation (e.g., [500.0, 1000.0])
  – temperature (e.g., [10.0, 20.0])
  – precipitation (e.g., [100.0, 500.0])

Probability Mass Functions
• A discrete random variable can be described by a probability mass function (pmf)
• A probability mass function is usually represented by a table, graph, or equation
• The probability of any outcome must satisfy:
  0 \le p(X = x_i) \le 1, \quad i = 1, 2, \ldots, k
• The sum of all probabilities in the sample space must total one, i.e.:
  \sum_{i=1}^{k} p(X = x_i) = 1

Probability Density Functions
• The probability of a continuous random variable X falling within an arbitrary interval is given by:
  p(a \le X \le b) = \int_a^b f(x) \, dx
• If we know the density function f(x), we can simply calculate the shaded area under the curve between a and b using calculus

Discrete Probability Distributions
• Discrete probability distributions:
  – the uniform distribution
  – the binomial distribution
  – the Poisson distribution
• Each is appropriately applied in certain situations and to particular phenomena

The Binomial Distribution
• Provides information about the probability of the repetition of events when there are only two possible outcomes
  – e.g., heads or tails, left or right, success or failure, rain or no rain
  – events with multiple outcomes may be simplified as events with two outcomes (e.g., forest or non-forest)
• Characterizes the probability of a proportion of the events having a certain outcome over a specified number of events

The Binomial Distribution
• A general formula for calculating the probability of x successes, given n trials and a probability p of success:
  P(x) = C(n, x) \, p^x (1 - p)^{n - x}
• where C(n, x) is the number of possible combinations of x successes and (n − x) failures:
  C(n, x) = \frac{n!}{x!(n - x)!}
Source: http://home.xnet.com/~fidler/triton/math/review/mat170/probty/p-dist/discrete/Binom/binom1.htm

The Poisson Distribution
• In the 1830s, S.D.
Poisson described a distribution with these characteristics
• It describes the number of events that will occur within a certain area or duration (e.g., # of meteorite impacts per state, # of tornadoes per year, # of hurricanes in NC)
• The Poisson distribution's characteristics:
  1. It is used to count the number of occurrences of an event within a given unit of time, area, volume, etc. (therefore it is a discrete distribution)
  2. The probability that an event will occur within a given unit must be the same for all units (i.e., the underlying process governing the phenomenon must be invariant)
  3. The number of events occurring per unit must be independent of the number of events occurring in other units (no interactions)
  4. The mean or expected number of events per unit (λ) is found from past experience (observations)

The Poisson Distribution
• Poisson formulated his distribution as follows:
  P(x) = \frac{e^{-\lambda} \lambda^x}{x!}
• where
  e = 2.71828… (the base of the natural logarithm)
  λ = the mean or expected number of occurrences
  x = 0, 1, 2, … is the number of occurrences
  x!
= x(x − 1)(x − 2) ⋯ 2 · 1
• To calculate a Poisson distribution, you must know λ

The Poisson Distribution
• Procedure for finding Poisson probabilities and expected frequencies:
  1. Set up a table with five columns as on the previous slide
  2. Multiply the values of x by their observed frequencies (x · F_obs)
  3. Sum the columns of F_obs (observed frequency) and x · F_obs
  4. Compute λ = Σ(x · F_obs) / Σ F_obs
  5. Compute P(x) values using the equation or a table
  6. Compute the values of F_exp = P(x) · Σ F_obs
Source: http://www.mpimet.mpg.de/~vonstorch.jinsong/stat_vls/s3.pdf

The Normal Distribution
• The probability density function of the normal distribution:
  f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-0.5 \left( \frac{x - \mu}{\sigma} \right)^2}
• You can see that the value of the distribution at x is a function of the mean (μ) and standard deviation (σ)

Standardization of Normal Distributions
• Standardization is achieved by converting the data into z-scores:
  z = \frac{x_i - \bar{x}}{s}
• The z-score is the means used to transform our normal distribution into the standard normal distribution (μ = 0 and σ = 1)

Finding the P(x) for Various Intervals
1. P(Z ≥ a), a > 0: (table value)
   – the table gives the value of P(x) in the tail above a
2. P(Z ≤ a), a > 0: 1 − (table value)
   – the total area under the curve is 1, and we subtract the area of the tail
3. P(0 ≤ Z ≤ a), a > 0: 0.5 − (table value)
   – the area above 0 is 0.5, and we subtract the area of the tail

Finding the P(x) for Various Intervals
4. P(Z ≤ a), a < 0: (table value)
   – the table gives the value of P(x) in the tail below a, equivalent to P(Z ≥ |a|)
5. P(Z ≥ a), a < 0: 1 − (table value)
   – equivalent to P(Z ≤ |a|)
6. P(a ≤ Z ≤ 0), a < 0: 0.5 − (table value)
   – equivalent to P(0 ≤ Z ≤ |a|)

Finding the P(x) for Various Intervals
7. P(a ≤ Z ≤ b), a < 0 and b > 0:
  = (0.5 − P(Z < a)) + (0.5 − P(Z > b)) = 1 − P(Z < a) − P(Z > b)
  or, in table terms:
  = [0.5 − (table value for a)] + [0.5 − (table value for b)] = 1 − [(table value for a) + (table value for b)]
• With this set of building blocks, you should be able to calculate the probability for any interval using a standard normal table

Confidence Interval & Probability
• A confidence interval is expressed in terms of a range of values and a probability (e.g., my lectures are between 60 and 70 minutes long 95% of the time)
• For this example, the confidence level used is the 95% level, which is the most commonly used confidence level
• Other commonly selected confidence levels are 90% and 99%, and the choice of confidence level when constructing an interval often depends on the application

The Central Limit Theorem
• Given a distribution with mean μ and variance σ², the sampling distribution of the mean approaches a normal distribution with mean μ and variance σ²/n as the sample size n increases
• The amazing and counter-intuitive thing about the central limit theorem is that no matter what the shape of the original (parent) distribution, the sampling distribution of the mean approaches a normal distribution

Confidence Intervals for the Mean
• Generally, a (1 − α) × 100% confidence interval around the sample mean is:
  pr\left( \bar{x} - z_{\alpha} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha} \frac{\sigma}{\sqrt{n}} \right) = 1 - \alpha
  where σ/√n is the standard error and z_α · σ/√n is the margin of error
• z_α is the value taken from the z-table that is associated with a fraction α of the weight in the tails (and therefore α/2 is the area in each tail)

Constructing a Confidence Interval
1. Select the desired confidence level (1 − α) × 100%
2. Calculate α and α/2
3. Look up the corresponding z-score in a standard normal table
4. Multiply the z-score by the standard error to find the margin of error
5.
Find the interval by adding and subtracting this product from the mean

t-distribution
• The central limit theorem applies when the sample size is "large"; only then will the distribution of sample means possess a normal distribution
• When the sample size is not "large", the frequency distribution of the sample means follows what is known as the t-distribution
• The t-distribution is symmetric, like the normal distribution, but has a slightly different shape
• The t-distribution has relatively more scores in its tails than the normal distribution; it is therefore leptokurtic

Assignment III
• Probability, Discrete, and Continuous Distributions
• Due: 03/07/2006 (Tuesday)
• http://www.unc.edu/courses/2006spring/geog/090/001/www/
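As a worked example of the confidence-interval construction steps, the sketch below builds a 95% interval using only Python's standard library (the sample values are made up; strictly, a sample this small calls for the t-distribution described above, but the z-based interval is shown to mirror the steps as stated):

```python
# A 95% confidence interval for the mean, following the construction
# steps above. Sample values are made up for illustration.
from statistics import NormalDist, mean, stdev

sample = [62.0, 65.0, 61.0, 68.0, 64.0, 63.0, 66.0, 67.0,
          60.0, 64.0, 65.0, 63.0, 66.0, 62.0, 64.0]
n = len(sample)
xbar = mean(sample)
s = stdev(sample)

# Steps 1-2: choose the 95% level, so alpha = 0.05 and alpha/2 = 0.025
alpha = 0.05

# Step 3: the z-score that cuts off alpha/2 in each tail (about 1.96)
z = NormalDist().inv_cdf(1.0 - alpha / 2.0)

# Step 4: margin of error = z-score times the standard error s/sqrt(n)
margin = z * s / n ** 0.5

# Step 5: add and subtract the margin of error from the sample mean
lower, upper = xbar - margin, xbar + margin
print(f"mean = {xbar:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Replacing `NormalDist().inv_cdf(...)` with the corresponding t-value for n − 1 degrees of freedom would give the small-sample interval the t-distribution slide calls for.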