Topics, Summer 2008
Day 1. Introduction
Day 2. Samples and populations
• Measures of central tendency and dispersion
• Evaluating differences between sample means to
estimate differences between populations – normal
distribution and t-test
Day 3. Evaluating relationships
• Scatterplots
• Correlation
Day 4. Regression and Analysis of Variance
Day 5. Logistic regression
Distributions for nominal variables
• Counts (i.e., frequency)
How many Xs do I have?
• Proportions (i.e., relative frequency, or probability)
How many Xs do I have out of the total number of
observations?
Example:
• How many of the clauses tagged in the Switchboard
portion of the Bresnan et al. (2007) dataset show the
PP realization of the recipient?
• What proportion of the Switchboard observations …?
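Both questions can be sketched in Python, assuming the observations are stored as a list of category labels. The labels "PP" and "NP", the function names, and the five-item toy list are hypothetical placeholders, not the Bresnan et al. (2007) data.

```python
from collections import Counter

def counts(values):
    """Frequency of each category of a nominal variable."""
    return Counter(values)

def proportions(values):
    """Proportion of all observations that fall in each category."""
    total = len(values)
    return {category: n / total for category, n in Counter(values).items()}

# Hypothetical toy labels for the recipient realization (NOT the real dataset)
realizations = ["PP", "NP", "PP", "PP", "NP"]
print(counts(realizations))       # Counter({'PP': 3, 'NP': 2})
print(proportions(realizations))  # {'PP': 0.6, 'NP': 0.4}
```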
Frequency, probability, odds
Frequency and expectation:
• Of the 17 students who received financial support to
attend the LSA Summer Meeting, how many do we
expect to be women?
• If 7 were women, is this deviation from the expected
value of 8.5 larger than we could expect by chance?
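The expected value of 8.5 assumes that each of the 17 students is equally likely to be a woman or a man (p = 0.5), so the expectation is 17 × 0.5. One way to answer the chance question is an exact two-sided binomial test. The sketch below uses one common definition of the two-sided p-value (add up the probabilities of all outcomes no more likely than the observed one); the function name is our own.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k successes out of n under H0: P(success) = p.
    Sums the probabilities of every outcome that is no more likely than the observed k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # small relative tolerance so floating-point ties are counted as ties
    return sum(prob for prob in pmf if prob <= observed * (1 + 1e-9))

n, k = 17, 7
expected = n * 0.5                    # 8.5 women expected if p = 0.5
p_value = binomial_two_sided_p(k, n)
print(expected, round(p_value, 3))    # 8.5 0.629
```

Under these assumptions, a p-value of about 0.63 means that a shortfall of 1.5 women is well within what chance alone produces.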
Evaluating frequency differences:
• Of the tagged clauses in the Switchboard portion of
the Bresnan et al. (2007) dataset, 79% show the PP
realization of the recipient.
• Is the proportion of PP realizations the same in the
Wall Street Journal portion of the dataset?
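The comparison can be sketched as a two-proportion z-test with a pooled standard error. This is a normal approximation, reasonable when both portions contain many PP and non-PP clauses. The Switchboard and WSJ counts are not given here, so the function takes them as arguments; the helper names are hypothetical. The `odds` helper expresses the 79% figure as odds, the third quantity in the slide title.

```python
from math import erf, sqrt

def odds(p):
    """Convert a probability (proportion) p into odds p / (1 - p)."""
    return p / (1 - p)

def two_proportion_z(x1, n1, x2, n2):
    """Two-proportion z-test of H0: the two population proportions are equal.

    x1 of n1 observations in sample 1 (e.g. PP realizations in Switchboard),
    x2 of n2 in sample 2 (e.g. PP realizations in the WSJ portion).
    Returns the z statistic and a two-sided p-value (normal approximation).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)             # common proportion assumed under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))    # standard normal CDF at |z|
    return z, 2 * (1 - phi)

print(round(odds(0.79), 2))  # 3.76, i.e. about 3.76 PP realizations per non-PP one
```

Call `two_proportion_z(pp_swb, n_swb, pp_wsj, n_wsj)` with the actual counts. The names `pp_swb` and similar are placeholders. A chi-square test on the 2 × 2 table of counts gives an equivalent result.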
Distributions for ratio variables
• Raw counts of individual values are not very useful
How many Xs are equal to n₁?
How many Xs are more than n₁ but less than n₂?
• Proportions
What percentage of Xs satisfy n₁ < x < n₂?
• Histogram: X = {x₁, x₂, …, xₙ}, breaks = {b₁, b₂, …, bₘ}
What percentage of Xs satisfy x ≤ b₁?
What percentage of Xs satisfy b₁ < x ≤ b₂?
…
What percentage of Xs satisfy bₘ₋₁ < x ≤ bₘ?
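The histogram questions come down to counting how many values fall into each right-closed bin and dividing by n. Here is a minimal sketch using the standard library's `bisect` module, assuming the breaks are sorted in ascending order. The extra last bin (x > bₘ) is our addition, so that no value is silently dropped. The function name is hypothetical.

```python
from bisect import bisect_left

def bin_proportions(xs, breaks):
    """Proportion of xs in each bin: x <= b1, b1 < x <= b2, ..., b(m-1) < x <= bm,
    plus a final overflow bin for x > bm. `breaks` must be sorted ascending."""
    tallies = [0] * (len(breaks) + 1)
    for x in xs:
        # bisect_left gives the first index i with breaks[i] >= x,
        # which is exactly the right-closed bin that contains x
        tallies[bisect_left(breaks, x)] += 1
    return [t / len(xs) for t in tallies]

print(bin_proportions([1, 2, 3, 4, 5, 6], [2, 4, 6]))
```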
Summary measures
• Central tendency (expected value)
• mode
• median
• mean
• Dispersion (reliability of expectation)
• range
• inter-quartile range
• variance
• standard deviation
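Each of these summary measures corresponds to a function in Python's `statistics` module. Below is a sketch on a small made-up sample. Quartiles, and therefore the inter-quartile range, have several competing definitions. `statistics.quantiles` uses its 'exclusive' method by default, so other software may report a slightly different IQR.

```python
import statistics

def summarize(xs):
    """Central tendency and dispersion for a sample of a ratio variable."""
    q1, _, q3 = statistics.quantiles(xs, n=4)   # quartiles, default 'exclusive' method
    return {
        "mode": statistics.mode(xs),
        "median": statistics.median(xs),
        "mean": statistics.mean(xs),
        "range": max(xs) - min(xs),
        "iqr": q3 - q1,
        "variance": statistics.variance(xs),    # sample variance, n - 1 denominator
        "sd": statistics.stdev(xs),             # square root of the sample variance
    }

print(summarize([2, 4, 4, 4, 5, 5, 7, 9]))
```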
Descriptive vs inferential statistics
• descriptive statistics
• summary of your sample
• examples:
• calculate sample mean (written “x-bar”)
• calculate sample variance (s²)
• inferential statistics
• generalization from your sample to the population
from which your sample was drawn
• examples:
• use x-bar to estimate population mean ()
• use s2 to estimate population variance (2)
Distribution families
• Uniform distribution
Example:
Expected value for a throw of one die
• Binomial distribution
Example:
Expected number of heads when n coins are tossed
• Normal distribution
Example:
Expected total value for a throw of n dice, with n large
Expected value of variables that are the
cumulative result of many independent influences
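The expected values in these examples can be computed exactly as E[X] = Σ x · P(X = x). The sketch below uses fractions to avoid rounding, and the function names are our own. The distribution of the total of n dice is built by repeated convolution. As n grows, its shape approaches the normal curve.

```python
from fractions import Fraction
from math import comb

def expected_value(dist):
    """E[X] = sum of x * P(X = x) over (x, probability) pairs."""
    return sum(x * p for x, p in dist)

# Uniform distribution: one fair die, each face has probability 1/6
die = [(face, Fraction(1, 6)) for face in range(1, 7)]

def binomial_pmf(n, p):
    """Binomial distribution: P(k heads) when n coins with P(heads) = p are tossed."""
    return [(k, comb(n, k) * p**k * (1 - p) ** (n - k)) for k in range(n + 1)]

def sum_of_dice(n):
    """Exact distribution of the total of n fair dice, by repeated convolution."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for total, p in dist.items():
            for face, q in die:
                nxt[total + face] = nxt.get(total + face, 0) + p * q
        dist = nxt
    return list(dist.items())

print(expected_value(die))                               # 7/2
print(expected_value(binomial_pmf(10, Fraction(1, 2))))  # 5
print(expected_value(sum_of_dice(3)))                    # 21/2
```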
Central Limit Theorem
• Because the mean value of a large random sample is
the cumulative result of many independent influences,
the distribution of mean values of large random
samples taken from a population will approximate a
normal curve whatever the shape of the population
distribution.
• Example:
• distribution of values in a random throw of one die vs.
distribution of the mean values calculated for repeated
sets of random throws of 10,000 dice
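The die example can be simulated. To keep the sketch fast, it averages sets of 100 dice instead of 10,000. That is our assumption, not the slide's figure, and the seed and number of sets are also arbitrary. Single throws come out roughly flat, with each face near 1/6. The set means pile up around 3.5, with a standard deviation near √(35/12 / 100) ≈ 0.17, and roughly 68% of them fall within one standard deviation, as a normal curve predicts.

```python
import random
import statistics
from collections import Counter

random.seed(2008)
faces = range(1, 7)

# Distribution of single throws: roughly flat, each face about 1/6
single = Counter(random.choices(faces, k=60_000))

# Distribution of means: each mean comes from one throw of `dice_per_set` dice
dice_per_set, n_sets = 100, 5_000        # 100 dice rather than 10,000, for speed
means = [statistics.fmean(random.choices(faces, k=dice_per_set)) for _ in range(n_sets)]

mu, sd = statistics.fmean(means), statistics.stdev(means)
within_1_sd = sum(abs(m - mu) <= sd for m in means) / n_sets

print({face: round(c / 60_000, 3) for face, c in sorted(single.items())})
print(round(mu, 2), round(sd, 3), round(within_1_sd, 2))
```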
Hypothesis testing
• Null hypothesis (H0)
• examples:
• mean F4 for Detroit vowels is 3500 Hz
(written H0: μ = 3500 Hz)
• mean F4 of Detroit men’s vowels is 3500 Hz
• mean F4 of men’s vowels is the same as mean F4
of women’s vowels
• Alternative hypothesis (H1)
• examples (matching those above):
• mean F4 for Detroit vowels is not 3500 Hz
(written H1: μ ≠ 3500 Hz)