Week 1
Statistics Background
Measurement, testing
 Process of assigning numbers to objects so that specific properties of objects
are represented by properties of numbers
 Psychological measurement concerned with attributes of persons (not person
him/herself)
 Method of expressing individual differences (IDs) in a quantitative form
 Statistics allow for answering questions about meaning of numbers
o Description of test scores
o Make inferences about meaning of scores
Scales of Measurement
  Numbers have different meanings depending on type of data collected
  Meanings based on scales of measurement
  Statistical procedure depends on level of measurement
  Ordinal level (vs. interval) in scale development
1. Nominal
 Most rudimentary
 Numbers classify, identify, code persons
 Sue belongs to Group 1; Megan and Kate are in Group 2
2. Ordinal
  Numbers demonstrate rank orders of individuals
  Sufficient for decision-making purposes
  No indication of absolute levels
  Sue - Megan - Kate
3. Interval
 Differences between numbers are equivalent
 Absolute values
 No fixed zero point
 Difference in ability between Sue and Megan equals difference
in ability between Megan and Kate
4. Ratio
  Most informative
  Absolute zero point
  Ratios between numbers are equivalent
  Megan is twice as agile as Sue
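The difference between interval and ratio scales can be illustrated with a small sketch. Temperature is an assumed example here (it is not part of the lecture): Celsius and Fahrenheit behave as interval scales, Kelvin as a ratio scale. The function names are hypothetical.

```python
# Illustration only: temperature is an assumed example, not from the lecture.
# Celsius and Fahrenheit are interval scales (arbitrary zero point);
# Kelvin is a ratio scale (absolute zero point).

def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def celsius_to_kelvin(c):
    return c + 273.15

t1, t2, t3 = 10.0, 20.0, 30.0   # three temperatures in degrees Celsius

# Interval property: equal differences stay equal after rescaling.
diff_low = celsius_to_fahrenheit(t2) - celsius_to_fahrenheit(t1)
diff_high = celsius_to_fahrenheit(t3) - celsius_to_fahrenheit(t2)
equal_differences = abs(diff_low - diff_high) < 1e-9

# Ratios are not meaningful on an interval scale: the "ratio" of the same
# two temperatures changes when the scale changes.
ratio_celsius = t2 / t1
ratio_fahrenheit = celsius_to_fahrenheit(t2) / celsius_to_fahrenheit(t1)

# On a ratio scale the zero point is absolute, so this ratio is meaningful.
ratio_kelvin = celsius_to_kelvin(t2) / celsius_to_kelvin(t1)

print(equal_differences, ratio_celsius, ratio_fahrenheit, round(ratio_kelvin, 3))
```

Only the Kelvin ratio supports a statement of the form "twice as much"; the Celsius and Fahrenheit ratios disagree with each other even though they describe the same two temperatures.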
Probability and Statistical Significance
 Science is conservative
  Alpha level (α) is the cutoff against which the p-value (or p-level) of a statistical test is compared
o p stands for probability level
o At α = .05, a finding is called significant if a result at least that extreme would occur by chance alone no more than 5 times out of 100
o Usual levels used in social sciences are .05 or .01 (sometimes .10)
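Comparing a p-value with α can be sketched using an exact binomial calculation. The coin-toss scenario and the function name `upper_tail_p` are assumptions chosen for illustration, and the test is one-tailed.

```python
from math import comb

def upper_tail_p(successes, trials, p_null=0.5):
    """Probability of at least `successes` out of `trials` if H0 is true."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

alpha = 0.05

# H0: the coin is fair. Observed: 9 heads in 10 tosses.
p_value = upper_tail_p(9, 10)
reject_h0 = p_value < alpha

# With 8 heads in 10 tosses the p-value stays above alpha, so H0 is retained.
p_value_8 = upper_tail_p(8, 10)

print(round(p_value, 4), reject_h0)
print(round(p_value_8, 4), p_value_8 < alpha)
```

With a stricter level such as α = .01, the 9-heads result would no longer lead to rejecting H0, which is one sense in which lower α levels are more conservative.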
Sampling Distributions
  Sample vs. sampling distributions
  Theoretical distributions used in making statistical decisions
  Shape based on sample size and number of variables in analysis
  Used to find the critical value for the chosen α level and decide whether or not to reject H0
  As N increases, the critical value decreases (i.e., significant difference easier to find)
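One way to see why larger samples make a difference easier to detect is through the sampling distribution of the mean. The sketch below assumes a two-tailed one-sample z-test with a known population SD, which is a simplification chosen here (the lecture does not name a specific test). Under that assumption the standard error is σ/√N, so the smallest mean difference that reaches significance shrinks as N grows. The function name and the value σ = 15 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def critical_mean_difference(sigma, n, alpha=0.05):
    """Smallest |sample mean - mu0| that is significant in a two-tailed z-test."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05
    standard_error = sigma / sqrt(n)               # SD of the sampling distribution
    return z_crit * standard_error

# Quadrupling N halves the difference needed for significance.
for n in (10, 40, 160):
    print(n, round(critical_mean_difference(sigma=15, n=n), 2))
```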
The Normal Distribution (μ = 0, σ = 1)
  Individual differences normally distributed within population
  Symmetrical, bell-shaped curve
 Helps determine if value is extreme or not
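How the standard normal distribution helps judge whether a value is extreme can be sketched with Python's standard library. The helper name `two_tailed_probability` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def two_tailed_probability(z):
    """Probability of a value at least as far from the mean as z, in either tail."""
    return 2 * (1 - standard_normal.cdf(abs(z)))

# The farther a value lies from the mean, the less likely it is.
for z in (0.5, 1.0, 1.96, 3.0):
    print(z, round(two_tailed_probability(z), 3))
```

A value about 1.96 SDs from the mean leaves roughly 5% of the distribution in the two tails combined, which is why it lines up with α = .05 in two-tailed tests.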
Central Tendency, Variability
 Average (mean, or M) score provides good summary but does not describe
individual differences
 Variance, standard deviation (SD, or S) measures extent to which individuals
differ
 Z-score
o Indicates – in SD units – how far score is from mean (standard normal
distribution)
o Provides direct and easily interpretable descriptions of scores
 Formulas
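The usual definitions of the mean, variance, SD, and z-score can be sketched as follows. Dividing by N - 1 (the sample variance) is an assumption here; some texts divide by N. The scores are made up for illustration.

```python
from math import sqrt

def mean(scores):
    return sum(scores) / len(scores)

def variance(scores):
    """Sample variance with N - 1 in the denominator (an assumption here)."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores) / (len(scores) - 1)

def standard_deviation(scores):
    return sqrt(variance(scores))

def z_scores(scores):
    """Express each score as its distance from the mean in SD units."""
    m, sd = mean(scores), standard_deviation(scores)
    return [(x - m) / sd for x in scores]

test_scores = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up scores for illustration
print(mean(test_scores), round(standard_deviation(test_scores), 3))
print([round(z, 2) for z in z_scores(test_scores)])
```

After the transformation the z-scores have a mean of 0 and an SD of 1, which is what makes scores from different tests directly comparable.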
Correlation
  Describes direction and magnitude of linear relationship between two scores
  Data arranged as two vectors of numbers, X and Y
  Scatterplot
  Pearson product-moment correlation coefficient (r)
o Ranges from -1 to +1
o The more the pairs of values vary together, the stronger the
relationship and the farther from 0 r will be
o Significance
o N ≈ 20 usually sufficient to draw conclusions about sample data if
moderate relationship exists
 Depends on strength of relationship
 Caution needs to be exercised when interpreting results
 Science is conservative
o Formulas
 Influences
o Restriction of range, i.e., r decreases (attenuation)
o Combining groups, i.e., relationship between variables varies across
groups
 Alternative measures of association
o Both interval/ratio: Pearson coefficient
o One nominal (true dichotomy; e.g., item score), one interval/ratio: Point-biserial (computationally the same as Pearson’s)
o Both nominal (dichotomous): Phi
o Both ordinal: Spearman
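The coefficients listed above can be sketched in a few lines. The data are made up, the function names are hypothetical, and the point-biserial formula uses the SD computed with N in the denominator, under which it equals Pearson's r on the 0/1 coded item.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman_rho(x, y):
    """Spearman correlation: Pearson r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

def point_biserial(item, total):
    """Point-biserial from group means; `item` must be coded 0/1."""
    n = len(item)
    group1 = [t for i, t in zip(item, total) if i == 1]
    group0 = [t for i, t in zip(item, total) if i == 0]
    m1, m0 = sum(group1) / len(group1), sum(group0) / len(group0)
    grand_mean = sum(total) / n
    sd_n = sqrt(sum((t - grand_mean) ** 2 for t in total) / n)
    p = len(group1) / n
    return (m1 - m0) / sd_n * sqrt(p * (1 - p))

# Monotonic but non-linear relationship: Spearman is 1, Pearson is below 1.
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 36]
print(round(pearson_r(x, y), 3), round(spearman_rho(x, y), 3))

# Item score (0/1) against total score: both routes give the same value.
item = [0, 0, 1, 0, 1, 1]
total = [10, 12, 15, 11, 18, 20]
print(round(point_biserial(item, total), 4), round(pearson_r(item, total), 4))
```

Phi can be obtained the same way, by passing two 0/1 coded vectors to `pearson_r`.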
Linear Regression - Prediction
  One variable, Y, is the dependent variable (criterion); the other(s), X, are the predictor(s)
 Regression line is mathematical function that relates predictor(s) to criterion
o Minimizes the sum of squared vertical distances of the points from the line (least squares)
o Formulas: $\hat{Y} = a + bX$
  $\hat{Y}$: predicted score on Y
  a: intercept, adjusts for different scales for X and Y
  b: slope, i.e., expected change in Y per unit change in X
  X: score on predictor
  Unstandardized vs. standardized (≈ correlation) coefficient values + significance
  R-value is multiple correlation between predictor(s) & criterion
  R-square & adjusted (i.e., shrunk) R-square values + significance
o Proportion of variance in Y accounted for by predictor(s)
  Standard error of estimate associated with predicting scores on criterion (see also residuals)
  F-value = (t-value)² in simple regression (one predictor)
  Simple vs. multiple regression
  Hierarchical vs. simultaneous multiple regression
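The quantities listed for simple regression can be sketched with one predictor. The data and the function name `simple_regression` are made up for illustration, and the adjusted R-square uses the common formula 1 - (1 - R²)(N - 1)/(N - k - 1) with k = 1, which is an assumption about the version intended here.

```python
from math import sqrt

def simple_regression(x, y):
    """Least-squares fit of predicted_y = a + b * x with summary statistics."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                       # slope
    a = mean_y - b * mean_x             # intercept
    predicted = [a + b * xi for xi in x]
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - yhat) ** 2 for yi, yhat in zip(y, predicted))
    ss_regression = ss_total - ss_residual
    ms_residual = ss_residual / (n - 2)  # residual variance, df = N - 2
    r_square = ss_regression / ss_total
    return {
        "a": a,
        "b": b,
        "r_square": r_square,
        "adjusted_r_square": 1 - (1 - r_square) * (n - 1) / (n - 2),
        "standard_error_of_estimate": sqrt(ms_residual),
        "t_slope": b / sqrt(ms_residual / sxx),
        "f_value": ss_regression / ms_residual,
    }

x = [1, 2, 3, 4, 5]   # made-up predictor scores
y = [2, 4, 5, 4, 5]   # made-up criterion scores
fit = simple_regression(x, y)
for name, value in fit.items():
    print(name, round(value, 3))
```

With a single predictor the F-value for the model equals the square of the t-value for the slope, and R-square equals the squared Pearson correlation between X and Y.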