9/1/2016
Measures of Location & Spread
Summary Statistics: Measures of Location and Spread
• Illustrate where the majority of observations are found:
– e.g., means, medians, modes
• Illustrate how variable the data are:
– e.g., standard deviation, variance, standard error
Statistics versus Parameters
• Statistics describe the sample
• Parameters describe the (usually unknown) population
Measures of Location: Mean
• Arithmetic Mean
– All observations weighted equally in calculation
Arithmetic Mean
• Unbiased estimate of μ if:
– Observations are made on randomly selected individuals
– Samples are independent of each other
– Observations are drawn from a large population that can be described by a normal random variable, X ~ N(μ, σ)
Other Means
• Geometric Mean
– Example from exponential population growth: numbers that are multiplied on an arithmetic scale can be added on a logarithmic scale ...
– So it depends how you use the ‘mean’
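To make the difference between the arithmetic and geometric mean concrete, here is a minimal sketch. The growth multipliers are hypothetical values chosen for illustration, not data from the slides. It averages the logarithms and back-transforms, which is one common way to compute a geometric mean.

```python
import math

# Hypothetical yearly growth multipliers for a population (illustrative only).
growth = [1.10, 0.90, 1.25, 1.05]
n = len(growth)

# Arithmetic mean: every observation gets the same weight, 1/n.
arithmetic_mean = sum(growth) / n

# Geometric mean: average on the log scale, then back-transform.
# Multiplying on the arithmetic scale corresponds to adding on the log scale.
log_mean = sum(math.log(g) for g in growth) / n
geometric_mean = math.exp(log_mean)

# Total growth over all years is the product of the multipliers.
total_growth = math.prod(growth)

print(f"arithmetic mean:     {arithmetic_mean:.4f}")
print(f"geometric mean:      {geometric_mean:.4f}")
print(f"total growth:        {total_growth:.4f}")
print(f"geometric mean ** n: {geometric_mean ** n:.4f}")
```

Applying the geometric mean once per year reproduces the total growth, which the arithmetic mean does not. That is why the geometric mean tends to suit multiplicative measures.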
Median and Mode
• Median: the ‘middle’ observation of the ordered data (with an even number of observations, the average of the two middle values)
• Mode: the observation that occurs most frequently
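As a small illustration of the median and mode, the sketch below uses Python's standard `statistics` module on a hypothetical sample. The sample was chosen so that the number of observations is even and one value repeats.

```python
import statistics

# Hypothetical sample: even number of observations, one repeated value.
data = [2, 3, 3, 5, 8, 13]

# With an even n there is no single middle observation, so the two
# middle values of the sorted data (3 and 5) are averaged.
median = statistics.median(data)

# multimode returns every value tied for most frequent, as a list.
modes = statistics.multimode(data)

print("median:", median)
print("mode(s):", modes)
```

`multimode` is used rather than `mode` so that ties are reported explicitly instead of silently returning the first one.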
Which measure of location?
• Arithmetic mean most common
– Supported by Central Limit Theorem
• Geometric mean most appropriate for multiplicative measures
• Median or Mode when distribution doesn’t match a standard probability distribution
• Pay attention to what measure is supplied and always be suspicious of any measure of location that is not accompanied by a measure of spread!
Measures of Spread
• Variance and Standard Deviation
• Sum of Squares (SS): the sum of the squared deviations of the observations from the sample mean
• Variance (s²): SS divided by its degrees of freedom; an unbiased estimate of σ²
• Standard Deviation (s): the square root of the variance
Degrees of Freedom
• The number of independent observations that we have for estimating statistical parameters
• ‘Usually’ ... n − 1
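The spread statistics listed above can be sketched step by step. The formulas on the original slides did not survive extraction, so this block assumes the standard textbook definitions, and the data values are hypothetical. The last line also computes the standard error of the mean, s/√n, which the slides turn to next.

```python
import math
import statistics

# Hypothetical sample (illustrative values only).
data = [4.0, 7.0, 6.0, 3.0, 5.0]
n = len(data)
mean = sum(data) / n

# Sum of squares: squared deviations from the sample mean, added up.
ss = sum((x - mean) ** 2 for x in data)

# Degrees of freedom: one is used up by estimating the mean.
df = n - 1

variance = ss / df              # s^2, unbiased estimate of sigma^2
sd = math.sqrt(variance)        # s, in the same units as the data
se = sd / math.sqrt(n)          # standard error of the mean

print(f"SS = {ss}, df = {df}")
print(f"variance = {variance}, sd = {sd:.4f}, se = {se:.4f}")

# Cross-check against the standard library's sample variance.
print("matches statistics.variance:",
      math.isclose(variance, statistics.variance(data)))
```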
Standard Error of the Mean
• Think of the standard error (of the mean) as an estimate of the standard deviation of the SAMPLE MEAN across repeated samples, i.e., of how precisely the sample mean estimates the population mean
• If the inference is about the sample: provide the SD (s)
• If the inference is about the means: provide the SE
Skewness, Kurtosis, and Central Moments
• A central moment is the average of the deviations of all observations in a dataset from the mean of the observations, with each deviation raised to a power r
• r = 1 (1st moment) is always 0
• r = 2 (2nd moment) is the variance
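The definition of a central moment can be written as a short function. This is a sketch assuming the "average of the deviations, each raised to the power r" reading, so it divides by n. The helper name `central_moment` and the data are hypothetical.

```python
def central_moment(data, r):
    """Average of the deviations from the mean, each raised to the power r."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** r for x in data) / n

# Hypothetical sample (illustrative values only).
data = [4.0, 7.0, 6.0, 3.0, 5.0]

cm1 = central_moment(data, 1)   # deviations cancel out
cm2 = central_moment(data, 2)   # variance in its divide-by-n form

print("1st central moment:", cm1)
print("2nd central moment:", cm2)
```

Because this second moment divides by n, it is slightly smaller than the unbiased sample variance, which divides by n − 1. The two converge as n grows.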
Skewness
• r = 3 (3rd moment) divided by s³ = skewness (g1)
• Skewness describes how the sample differs in shape from a symmetrical distribution
• g1 = 0 → symmetric, as in a normal distribution
• g1 > 0 → right-skewed (longer tail of observations to the right of the mean)
• g1 < 0 → left-skewed (longer tail of observations to the left of the mean)
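A minimal sketch of the skewness calculation follows. It assumes g1 is the third central moment (dividing by n) over the cube of the sample standard deviation (dividing by n − 1). Other conventions exist and give slightly different values. The function names and the data are hypothetical.

```python
import math

def central_moment(data, r):
    """Average of the deviations from the mean, each raised to the power r."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** r for x in data) / n

def sample_sd(data):
    """Sample standard deviation s, using n - 1 degrees of freedom."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

def skewness(data):
    """g1: third central moment divided by s cubed (assumed convention)."""
    return central_moment(data, 3) / sample_sd(data) ** 3

# Hypothetical samples (illustrative values only).
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]      # long tail to the right
left_skewed = [-x for x in right_skewed]      # mirror image
symmetric = [1, 2, 3, 4, 5]

for name, sample in [("right-skewed", right_skewed),
                     ("left-skewed", left_skewed),
                     ("symmetric", symmetric)]:
    print(f"{name}: g1 = {skewness(sample):.4f}")
```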
Kurtosis
• Based on the 4th central moment (r = 4)
• Measures the extent to which probability is concentrated in the tails versus the center of the distribution
• Clumped, or platykurtic, distributions have g2 < 0 (less probability in the tails)
• Leptokurtic distributions have g2 > 0 (more probability in the tails)
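The sign convention for g2 can be checked with a short sketch. The slide formula is not available, so this assumes the common "excess kurtosis" form: the fourth central moment over s⁴, minus 3, so that a normal distribution scores about 0. The function names and both samples are hypothetical.

```python
import math

def central_moment(data, r):
    """Average of the deviations from the mean, each raised to the power r."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** r for x in data) / n

def sample_sd(data):
    """Sample standard deviation s, using n - 1 degrees of freedom."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

def kurtosis(data):
    """g2: fourth central moment over s^4, minus 3 (assumed convention)."""
    return central_moment(data, 4) / sample_sd(data) ** 4 - 3

# Hypothetical samples (illustrative values only).
flat = list(range(1, 21))            # evenly spread, thin tails
heavy_tailed = [0] * 18 + [-10, 10]  # mostly central, two extreme values

print(f"flat sample:         g2 = {kurtosis(flat):.3f}")
print(f"heavy-tailed sample: g2 = {kurtosis(heavy_tailed):.3f}")
```

The heavy-tailed sample gets its large positive g2 from just two extreme observations. This is the same sensitivity to outliers that affects skewness.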
Skewness and Kurtosis
• Should be tested, but both measures are sensitive to outliers ...
Quantiles
• Box plots of quantiles can portray the distribution of data more accurately than plots of means and standard deviations
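To illustrate why quantiles can describe a sample better than a mean and standard deviation, the sketch below computes the five numbers a box plot draws. The data are hypothetical and include one deliberate outlier. `statistics.quantiles` uses its default interpolation method, and other quantile definitions give slightly different quartiles.

```python
import statistics

# Hypothetical sample with one extreme observation (illustrative only).
data = [2, 3, 3, 4, 5, 5, 6, 7, 40]

# Quartiles: the three cut points that split the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)

# Five-number summary: what a basic box plot displays.
five_number = (min(data), q1, q2, q3, max(data))

mean = statistics.mean(data)
sd = statistics.stdev(data)

print("five-number summary:", five_number)
print(f"mean = {mean:.2f}, sd = {sd:.2f}")
```

In this sample the single outlier pulls the mean above the upper quartile, while the median and quartiles still describe where most observations lie.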
Other Measures
• Coefficient of Variation (CV)
– SD divided by the mean: variability ‘independent of the mean’
• Coefficient of Dispersion
– For discrete variables: the ‘variance-to-mean’ ratio
– Measure of clumping, but dependent on ‘scale’
Distribution of Points
• For normally distributed random variables:
– About 68% of observations occur within 1 SD of the mean
– About 95% of observations occur within 2 SD of the mean
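The measures on these slides can be sketched together: the coefficient of variation, the variance-to-mean ratio, and the share of a normal distribution lying within k standard deviations of the mean. All function names and data are hypothetical. The definitions assumed are CV = s / mean and coefficient of dispersion = s² / mean.

```python
import statistics
from statistics import NormalDist

def coefficient_of_variation(data):
    """Sample SD divided by the mean (unitless)."""
    return statistics.stdev(data) / statistics.mean(data)

def coefficient_of_dispersion(counts):
    """Variance-to-mean ratio, intended for count data."""
    return statistics.variance(counts) / statistics.mean(counts)

def share_within(k):
    """Probability that a normal variable lies within k SD of its mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

# Hypothetical lengths in cm, then the same lengths in mm.
lengths_cm = [4.0, 7.0, 6.0, 3.0, 5.0]
lengths_mm = [10 * x for x in lengths_cm]
cv_cm = coefficient_of_variation(lengths_cm)
cv_mm = coefficient_of_variation(lengths_mm)

# Hypothetical counts of individuals per quadrat, mostly empty with a few dense.
counts = [0, 0, 1, 0, 5, 0, 0, 7, 0, 1]
cd = coefficient_of_dispersion(counts)

print(f"CV in cm: {cv_cm:.4f}, CV in mm: {cv_mm:.4f}")
print(f"variance-to-mean ratio: {cd:.2f}")
print(f"within 1 SD: {share_within(1):.4f}")
print(f"within 2 SD: {share_within(2):.4f}")
```

The CV is unchanged by the change of units, which is the sense in which it is "independent of the mean". A variance-to-mean ratio above 1 suggests clumping relative to a Poisson pattern, though the value depends on the size of the sampling unit.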
Confidence Intervals
• Interpretation:
– 95% of the time such an interval will contain the true value of μ
– NOT: “there is a 95% chance that the true μ occurs within the interval” → it either does or does not ...
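The "95% of the time" interpretation can be illustrated by simulation. The interval formula from the slides did not survive, so this sketch assumes a large-sample interval of mean ± z × SE with z ≈ 1.96. For small samples a t quantile would normally replace z. The population parameters, sample size, and seed are hypothetical choices.

```python
import math
import random
from statistics import NormalDist

# The true parameters are known here only because this is a simulation.
TRUE_MU = 10.0
TRUE_SIGMA = 2.0
N = 50          # observations per sample
TRIALS = 2000   # number of repeated samples

rng = random.Random(1)             # fixed seed for repeatability
z = NormalDist().inv_cdf(0.975)    # two-sided 95% normal quantile

def interval(sample):
    """Approximate 95% interval: mean +/- z * SE (large-sample assumption)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se

hits = 0
for _ in range(TRIALS):
    sample = [rng.gauss(TRUE_MU, TRUE_SIGMA) for _ in range(N)]
    low, high = interval(sample)
    # Each individual interval either contains the true mean or it does not.
    if low <= TRUE_MU <= high:
        hits += 1

coverage = hits / TRIALS
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The reported share should land close to 0.95. It describes the long-run behavior of the procedure, not the probability that any one computed interval contains μ.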