Quantitative Methods for
Economics, Finance and
Management
(A86050 – F86050)
Matteo Manera – [email protected]
Marzio Galeotti – [email protected]
This material is taken and adapted from
Guy Judge’s Introduction to Econometrics page
at the University of Portsmouth Business School
http://judgeg.myweb.port.ac.uk/ECONMET/
Statistical Review
Statistics are tools: Statistics + Economics =
Econometrics
Descriptive statistics: methods used to
summarize or describe observations
Measures of central tendency
Measures of variability
Measures of association
Inferential Statistics
Random Variables and Theoretical Distributions
Estimation
Hypothesis Testing
Measures of Central Tendency
Mean, µ
The average (balancing point in the
distribution)
Median, Md
The value of the middle point of the
ordered measurements
Mode, Mo
The most frequent value
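As a minimal sketch, the three measures can be computed with Python's standard `statistics` module. The `scores` list is invented for illustration and is not data from the course.

```python
from statistics import mean, median, mode

# Hypothetical scores, invented for illustration only.
scores = [4, 7, 7, 8, 9, 10, 11]

avg = mean(scores)    # the average (balancing point of the distribution)
mid = median(scores)  # the middle value of the ordered measurements
top = mode(scores)    # the most frequent value

print("mean:", avg, "median:", mid, "mode:", top)
```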
Measures of Variability
Knowing the variability tells us how typical the
mean is of the scores as a set
If the variability is small then the mean is a
good representation of the scores
Measures of variability are statistics that
convey information about the spread or
variability of a set of data
Measures of Variability
Range
High score minus low score
Variance
The average of the squared deviations of
all the population measurements from the
population mean
Standard Deviation
The square root of the variance
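A short sketch of the three measures, assuming the data are a complete population (hence `pvariance` and `pstdev` rather than the sample versions). The `scores` list is hypothetical.

```python
from statistics import pvariance, pstdev

# Hypothetical scores, treated here as a whole population.
scores = [4, 7, 7, 8, 9, 10, 11]

score_range = max(scores) - min(scores)  # high score minus low score
variance = pvariance(scores)             # average squared deviation from the mean
std_dev = pstdev(scores)                 # square root of the variance

print("range:", score_range)
print("variance:", round(variance, 3))
print("standard deviation:", round(std_dev, 3))
```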
Skewness
Skewed distributions are not symmetrical
about their center. Rather, they are lop-sided
with a longer tail on one side or the other.
The skew is the side with the longer tail
[Figure: three example distributions: Left/Negative Skewed, Symmetric, Right/Positive Skewed]
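One common way to put a number on skew is the average cubed z-score (the moment coefficient of skewness). The sketch below assumes that definition, which the slides do not state, and uses invented data sets.

```python
from statistics import mean, median, pstdev

def skewness(values):
    """Moment coefficient of skewness: the mean of the cubed z-scores."""
    m = mean(values)
    s = pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])

# Hypothetical data sets, invented for illustration.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]        # long tail on the right
left_skewed = [10 - v for v in right_skewed]    # mirror image, long tail on the left
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(name, "skewness:", round(skewness(data), 3),
          "mean:", mean(data), "median:", median(data))
```

In the right-skewed set the long tail pulls the mean above the median; in the mirrored set it pulls the mean below.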
Relationships Among Mean,
Median and Mode
Correlation Coefficient
• A single number indicating the strength of
association between 2 variables.
– To what extent does the association resemble
a straight line?
• Pearson’s Product-Moment Coefficient r
– Most commonly used measure to indicate the
degree of linear relationship between two
variables
– Range +1 to –1
– Coefficients reveal both the magnitude and
direction of the relationship
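A minimal sketch of Pearson's r computed from its definition (the sum of cross-products of deviations divided by the square root of the product of the sums of squares). The function name `pearson_r` and the data are assumptions made for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical observations, invented for illustration.
hours_studying = [1, 2, 3, 4, 5]
grades = [52, 60, 61, 70, 78]

r = pearson_r(hours_studying, grades)
print("r =", round(r, 3))  # sign gives direction, size gives strength
```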
[Figure: Patterns for Linear Correlation. Three scatterplots of Grades against Hours Studying, Hours Watching TV and Hours Gardening]
Copyright © 2000 - 2009 by Michael J. Miller. All rights reserved.
Problems with r
• Does not, by itself, demonstrate causation!!!
– Causation implies correlation, but correlation does
not imply causation
• Third (‘lurking’) variable problem
– Often difficult to uncover
• Directionality problem
– Does Variable A cause a change in Variable B or
does Variable B cause a change in Variable A?
• Unstable with small sample sizes
Inferential Statistics
• Random Variables and Theoretical
Distributions
• Estimation
• Hypothesis Testing
what is a distribution??
• describes the ‘shape’ of a batch of numbers
• the characteristics of a distribution can
sometimes be defined using a small number
of numeric descriptors called ‘parameters’
why??
• can serve as a basis for standardized
comparison of empirical distributions
• can help us estimate confidence intervals
for inferential statistics
• form a basis for more advanced statistical
methods
– ‘fit’ between observed distributions and certain
theoretical distributions is an assumption of
many statistical procedures
Normal (Gaussian) distribution
• “known” distributions (discrete vs
continuous)
• “known” distributions (Student’s t; chi-square; Fisher’s F)
• Normal: continuous distribution
• Normal: tails stretch infinitely in both
directions
[Figure: normal curve, with the mean µ and the standard deviation σ marked]
• symmetric around the mean (µ)
• maximum height at µ
• the points of inflection lie one standard
deviation (σ) either side of µ
• a single normal curve exists for any
combination of µ, σ
– these are the parameters of the distribution and
define it completely
• a family of bell-shaped curves can be
defined for the same combination of µ, σ,
but only one is the normal curve
• lots of natural phenomena in the real world
approximate normal distributions, near
enough that we can use the normal curve as a
model
• e.g. height
• phenomena that emerge from a large
number of uncorrelated, random events will
usually approximate a normal distribution
• standard probability intervals (proportions
under the curve) are defined by multiples of
the standard deviation around the mean
• true of all normal curves, no matter what µ
or σ happens to be
• P(µ-σ <= x <= µ+σ) = .683
• µ+/-1σ = .683
• µ+/-2σ = .955
• µ+/-3σ = .997
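These proportions can be checked with `statistics.NormalDist` from the Python standard library. The helper name `prob_within` and the example values µ = 100, σ = 15 are assumptions for illustration.

```python
from statistics import NormalDist

def prob_within(k, mu=0.0, sigma=1.0):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    # Same answer whatever mu and sigma happen to be.
    print(k, round(prob_within(k), 3), round(prob_within(k, mu=100, sigma=15), 3))
```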
[Figure: normal curve, with µ and σ marked]
• 50% = µ+/-0.67σ
• 95% = µ+/-1.96σ
• 99% = µ+/-2.58σ
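Going the other way, from a central proportion to a multiple of σ, can be sketched with the inverse CDF. The helper name `z_for_central` is an assumption for illustration.

```python
from statistics import NormalDist

def z_for_central(p):
    """Multiple of sigma such that mu +/- z*sigma contains a central proportion p."""
    # A central proportion p leaves (1 - p) / 2 in each tail,
    # so the upper cut-off sits at cumulative probability 0.5 + p / 2.
    return NormalDist().inv_cdf(0.5 + p / 2)

for p in (0.50, 0.95, 0.99):
    print(p, round(z_for_central(p), 2))
```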
• the logic works backwards
• if the proportion within µ+/-σ differs from
.68, the distribution is not normal
z-scores (Standardization)
• standardizing values by re-expressing them
in units of the standard deviation
• measured away from the mean (where the
mean is adjusted to equal 0)
$Z_i = \frac{x_i - \bar{x}}{s}$
• z-scores = “standard normal deviates”
• converting number sets from a normal
distribution to z-scores:
presents data in a standard form that can be
easily compared to other distributions
mean = 0
standard deviation = 1
• z-scores often summarized in table form as
a CDF (cumulative distribution function)
• can use in various ways, including
determining how different proportions of a
batch are distributed “under the curve”
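A minimal sketch of standardization: each value is re-expressed as its distance from the mean in units of the sample standard deviation. The `statures` values are hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Re-express each value in standard-deviation units measured from the mean."""
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation
    return [(x - x_bar) / s for x in values]

# Hypothetical stature estimates in cm, invented for illustration.
statures = [158.0, 161.5, 163.0, 165.5, 170.5]
z = z_scores(statures)

print([round(v, 2) for v in z])
print("mean of z:", round(mean(z), 6), "sd of z:", round(stdev(z), 6))
```

After standardizing, the batch has mean 0 and standard deviation 1, which is what makes it comparable with other standardized batches.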
Neanderthal stature
• population of Neanderthal skeletons
• stature estimates appear to follow an
approximately normal distribution…
– mean = 163.7 cm
– sd = 5.79 cm
Quest. 1: what proportion of the
population is >165 cm?
• z-score = ?
• z-score = (165-163.7)/5.79 = .22 (+)
• using Table C-2 (upper-tail areas)
– area above z = .22 is .41294
– 41.3%
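The same question can be answered without a table. This sketch assumes stature is exactly normal with the stated mean and sd; because it does not round z to two decimals, it lands close to, but not exactly on, the table lookup (about 41%).

```python
from statistics import NormalDist

# Stature modelled as normal with the mean and sd given on the slide.
stature = NormalDist(mu=163.7, sigma=5.79)

z = (165 - stature.mean) / stature.stdev  # z-score of 165 cm
prop_above = 1 - stature.cdf(165)         # upper-tail area beyond 165 cm

print("z =", round(z, 4))
print("proportion above 165 cm =", round(prop_above, 4))
```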
Quest. 2: 98% of the population fall
below what height?
• Cdf(x)=.98
• can use either table
– Table C-1; look for .98
– Table C-2; look for .02
– both give you a value of 2.05 for z
• solve z-score formula for x: $x_i = Z_i \sigma + \bar{x}$
• x = 2.05*5.79+163.7 = 175.6cm
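As a sketch, the inverse CDF gives the same cut-off directly, again assuming stature is exactly normal with the stated mean and sd.

```python
from statistics import NormalDist

stature = NormalDist(mu=163.7, sigma=5.79)

z_98 = NormalDist().inv_cdf(0.98)                   # z with 98% of the area below it
height_98 = z_98 * stature.stdev + stature.mean     # x = z * sd + mean
height_direct = stature.inv_cdf(0.98)               # same answer in one call

print("z =", round(z_98, 2))
print("height =", round(height_98, 1), "cm")
```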
“sampling distribution of the mean”
• we don’t know the shape of the distribution
of an underlying population
• it may not be normal
• we can still make use of some properties of
the normal distribution
• envision the distribution of means associated
with a large number of samples…
central limit theorem
• distribution of means derived from sets of
random samples taken from any population
will tend toward normality
• conformity to a normal distribution
increases with the size of samples
• these means will be distributed around the
mean of the population
$\bar{X}_{\bar{x}} = \mu$
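A small simulation can illustrate the theorem. The sketch below assumes a strongly skewed population (exponential with mean 1 and standard deviation 1), chosen only for illustration; the sample sizes and number of repeats are likewise arbitrary.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

def sample_means(n, repeats):
    """Means of `repeats` random samples of size n from an exponential population."""
    return [sum(random.expovariate(1.0) for _ in range(n)) / n
            for _ in range(repeats)]

means_5 = sample_means(n=5, repeats=2000)
means_50 = sample_means(n=50, repeats=2000)

# The sample means cluster around the population mean (1.0),
# and their spread shrinks roughly as 1 / sqrt(n).
print("n=5 : mean of means", round(mean(means_5), 3), "sd", round(pstdev(means_5), 3))
print("n=50: mean of means", round(mean(means_50), 3), "sd", round(pstdev(means_50), 3))
```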
• we usually have one of these samples…
• we can’t know where it falls relative to the
population mean, but we can estimate odds
about how far it is likely to be…
• this depends on
– sample size
– an estimate of the population variance
• the smaller the sample and the more
dispersed the population, the more likely
that our sample is far from the population
mean
• this is reflected in the equation used to
calculate the variance of sample means:
$s_{\bar{x}}^2 = \frac{\sigma^2}{n}$
• the standard deviation of sample means is the
standard error of the estimate of the mean:
$se = \sqrt{\frac{\sigma^2}{n}} = \sigma\sqrt{\frac{1}{n}} = \frac{\sigma}{\sqrt{n}}$
• you can use the standard error to calculate
a range that contains the population mean,
at a particular probability, and based on a
specific sample:
$\bar{x} \pm Z_{\alpha} \frac{s}{\sqrt{n}}$
(where Z might be 1.96 for .95 probability, for example)
example
• 50 arrow points
– mean length = 22.6 mm
– sd = 4.2 mm
• standard error = ??
$s_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{4.2}{\sqrt{50}} = .594$
• 22.6 +/- 1.96*.594
• 22.6 +/- 1.16
• 95% probability that the population mean is
within the range 21.4 to 23.8
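The arrow-point calculation can be reproduced as a sketch. The function name `mean_confidence_interval` is an assumption for illustration, and the z multiplier comes from the normal distribution, as on the slide.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(x_bar, sd, n, level=0.95):
    """Normal-based interval for the population mean: x_bar +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level is 0.95
    se = sd / sqrt(n)                          # standard error of the mean
    return x_bar - z * se, x_bar + z * se, se

# Values from the arrow-point example: n = 50, mean = 22.6 mm, sd = 4.2 mm.
low, high, se = mean_confidence_interval(22.6, 4.2, 50)

print("standard error:", round(se, 3))
print("95% interval:", round(low, 1), "to", round(high, 1))
```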