Psychological Science 342
Advanced Statistics
Review of Basic Concepts

Basic Terminology
• Descriptive statistics
  Central tendency, variability
  Displaying data
• Inferential statistics
  Populations and samples
  Hypothesis testing
  t tests, ANOVA, regression

Measurement Basics

Variables
• Define variable
  Property of an object or event that can take on different values
• Discrete variable
  Variable that can take on only a small set of possible values
• Continuous variable
  Variable that can take on any value
Cont.

Variables--cont.
• Independent variables
  Those variables controlled by the experimenter
• Dependent variables
  Those variables being measured
  The data or score

Random Assignment
• Define
  Each participant (P) has an equal chance of being in any condition
  Equates groups
  Defines experimental (vs. correlational) procedure
  Independent/Predictor variable

Random Sampling
• Define
  Each member of a population has an equal chance of being included
  Generalizability
  Do psychologists use random sampling?

Scales of Measurement
• Definition
• Nominal scales
• Ordinal scales
• Interval scales
• Ratio scales

Sample Problems
• For each of the following, identify the IV(s), the DV(s), whether each variable is categorical or continuous, and its level of measurement
• 1. People will read a paragraph more quickly if it has a title than if it doesn’t have a title.
• 2. People from collectivist cultures have lower self-esteem than people from individualist cultures, and the difference is larger for males than for females.
• 3. The right hemisphere is more specialized (i.e., faster) than the left hemisphere for negative emotion words, and the left hemisphere is more specialized than the right hemisphere for positive emotion words.
• 4. When taking an exam, increasing levels of noise are associated with better performance for extraverts than for introverts.
• 5. People will retain more information if a text is written in an ugly font than if it is written in a non-ugly font.
• 6. People appear to be more outgoing on Facebook than in real life.
• 7. Reported well-being increases as a function of temperature (up to 80 °F) and whether or not it is sunny.

Deciding on a Procedure
• Decision tree
• What types of variables?
• How many groups or variables?

Choosing a Procedure
Type of Dependent Variable
  Categorical
    One Category: Goodness of Fit Chi-square
    Two Categories: Contingency Table Chi-square
  Continuous: See next slide
Cont.

Choosing a Procedure--cont.
Continuous DV
  Continuous IV
    One Predictor
      Degree of Relationship: Correlation
      Form of Relationship: Regression
    Two Predictors: Multiple Regression
  Categorical IV: See next slide
Cont.

Choosing a Procedure--cont.
Categorical IV
  Two Groups
    Independent: Two-sample t
    Dependent: Related Sample t
  Multiple Groups
    Independent Groups: ANOVA
    Dependent Measures: Repeated measures ANOVA

Notation
• Variable names
  X and Y
• Individual values
  $X_i$
• X versus $X_i$
• Summation notation
  $\sum X$, $\sum X^2$, $(\sum X)^2$, $\sum XY$, $\sum X \sum Y$
  Constants

Hypothetical data on family size by decade of 20th century

  Decade (X)   Family Size (Y)   X^2    Y^2     X - Y    XY
       3            5.2            9    27.04   -2.2    15.6
       4            4.8           16    23.04   -0.8    19.2
       5            3.5           25    12.25    1.5    17.5
       6            2.5           36     6.25    3.5    15.0
       7            2.3           49     5.29    4.7    16.1
  Sum 25           18.3          135    73.87    6.7    83.4

Displaying Data

The Sternberg Example
• One to five digits displayed
• Followed by a single digit
• Was single digit in first set?
• Predictions of sequential processing
• Predictions of parallel processing

The following is a simple demonstration.
  4 7 3 6 9 4
  Was the single digit in the comparison set?
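Returning to the summation notation and the family-size table above, the following is a minimal Python sketch of each quantity on the Notation slide. The variable names and the constant `c` are illustrative choices, not part of the slides. Working directly from the five rows, the X^2 column adds up to 9 + 16 + 25 + 36 + 49 = 135.

```python
# Hypothetical family-size data from the table: decade (X) and family size (Y).
X = [3, 4, 5, 6, 7]
Y = [5.2, 4.8, 3.5, 2.5, 2.3]
N = len(X)

sum_x = sum(X)                                         # sum of X
sum_y = sum(Y)                                         # sum of Y
sum_x_squared = sum(x ** 2 for x in X)                 # square each score, then add
squared_sum_x = sum(X) ** 2                            # add the scores, then square
sum_xy = sum(x * y for x, y in zip(X, Y))              # multiply pairs, then add
product_of_sums = sum(X) * sum(Y)                      # add each column, then multiply
sum_of_differences = sum(x - y for x, y in zip(X, Y))  # same as sum_x - sum_y

# Rules for constants (c is an arbitrary constant chosen for illustration).
c = 2
sum_cx = sum(c * x for x in X)        # equals c * sum_x
sum_x_plus_c = sum(x + c for x in X)  # equals sum_x + N * c

print(sum_x, round(sum_y, 2), sum_x_squared, squared_sum_x)
print(round(sum_xy, 2), round(product_of_sums, 2), round(sum_of_differences, 2))
```

The point of the comparison is order of operations: summing the squares and squaring the sum are different quantities, as are the sum of products and the product of sums.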
Plotting Data
• Histograms
  Values of dependent variable on X axis
  • discuss grouping or “bins”
  Frequency on Y axis

Histogram of Sternberg’s data
[Figure: Histogram of Reaction Time]

Stem-and-Leaf Display
Stem-and-leaf of RxTime   N = 300
Leaf Unit = 1.0
(columns: cumulative count, stem, leaves)

    7    3  6788999
   27    4  00001112223333344444
   62    4  55555566666666666777777777888899999
  103    5  00000111111111111222222222233333333444444
  150    5  55555556666666666777777788888888888899999999999
  150    6  000000000000111111111112222222222222233333333334444444
   96    6  555555556666666677777777777777889999999
   57    7  0111122222222333444444
   35    7  5566667788899
   22    8  000112333
   13    8  5678
    9    9  044
    6    9  558
    3   10  44
    1   10
    1   11
    1   11
    1   12
    1   12  5

Scatterplots
• Plot two variables against each other.
• Points represent coordinates on each axis.
• Dependent variable on Y axis.
• See next slide for example

[Figure: Scatterplot of Solar Radiation and Cancer]

Describing Distributions
• Symmetry
• Modality
  Bimodal
  Unimodal
• Skewness
  Positively skewed
  Negatively skewed

[Figure 3.9: four histograms of Score, N = 200 each]

Measures of Central Tendency

Mode
• The most common value
• There may be several
• Bimodal distribution has two distinct modes.
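As a rough illustration of the histogram and mode ideas above, here is a minimal Python sketch that counts scores per bin and returns every value tied for the highest frequency. The function names, the bin width, and all data values are made up for illustration. Treating tied values as "several modes" is a simplification: a bimodal distribution is usually judged by two distinct peaks in the plot, not by exact ties.

```python
from collections import Counter


def binned_counts(scores, width):
    """Frequency of scores per bin, keyed by each bin's lower edge."""
    counts = Counter((score // width) * width for score in scores)
    return dict(sorted(counts.items()))


def modes(scores):
    """Return every value tied for the highest frequency, in sorted order."""
    if not scores:
        raise ValueError("the mode of an empty sample is undefined")
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


# Made-up scores, for illustration only.
reaction_times = [36, 37, 41, 45, 52, 58, 59]
one_mode = [2, 4, 5, 8, 7, 4]           # 4 occurs twice
two_modes = [1, 2, 2, 3, 4, 5, 5, 6]    # 2 and 5 each occur twice

print(binned_counts(reaction_times, 10))
print(modes(one_mode), modes(two_modes))
```

Changing `width` changes how the same scores are grouped, which is why the choice of bins affects the shape a histogram appears to have.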
Median
• Center value in an ordered series
  Average of two center values for an even number of points
• Median location
  Location of central value
  Defined as (N + 1)/2

Mean
• What we normally call the “average”
• Denoted as “xbar”
• Calculated as $\bar{X} = \frac{\sum X}{N}$
• This will be our most common statistic

Advantages & Disadvantages
• Mean
  Most common statistic
  Easily manipulated algebraically
  Good statistical properties
  Easily influenced by extreme scores
• Median
  Slightly less desirable statistical properties than mean
  May not be good to ignore extreme values

Measures of Variability

The General Problem
• Central tendency only deals with the center
• Dispersion
  Variability of the data around something
  The spread of the points
• Example: Mice and Music

Mice and Music
• Study by David Merrell
• Raised some mice in quiet environment
• Raised some mice listening to Mozart
• Raised other mice listening to Anthrax
• Dependent variable is the time to run a straight alley maze after 4 weeks.

Results
• Anthrax mice took much longer to run
• Much greater variability in Anthrax group
  See following graphs for Anthrax and Mozart
  Both X axes are 500 units wide
• We often see greater variability with larger mean

[Figure: Mozart Group, histogram of WEEK4. Mean = 114.6, Std. Dev = 36.10, N = 24]

[Figure: Anthrax Group, histogram of WEEK4. Mean = 1825.9, Std. Dev = 103.14, N = 24]

Range and Related Statistics
• The range
  Distance from lowest to highest score
  Too heavily influenced by extremes
• The interquartile range (IQR)
  Delete lowest and highest 25% of scores
  IQR is range of what remains
  May be too little influenced by extremes

Trimmed Samples
• Delete a fixed (usually small) percentage of extreme scores
• Trimmed statistics are statistics computed on trimmed samples.
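The median location, range, IQR, and trimmed-sample ideas above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function names and the eight scores are made up for illustration, and the IQR here follows the slide's simple description (drop the lowest and highest 25% of scores, then take the range of what remains). Statistical packages typically interpolate quartiles and may give slightly different values.

```python
import math


def median(scores):
    """Median via the median location (N + 1) / 2, counted from 1."""
    if not scores:
        raise ValueError("the median of an empty sample is undefined")
    ordered = sorted(scores)
    location = (len(ordered) + 1) / 2
    low = ordered[math.floor(location) - 1]
    high = ordered[math.ceil(location) - 1]
    return (low + high) / 2  # low == high when N is odd


def trimmed(scores, proportion):
    """Drop the given proportion of scores from each tail (count rounded down)."""
    ordered = sorted(scores)
    k = int(len(ordered) * proportion)
    return ordered[k:len(ordered) - k]


def score_range(scores):
    """Distance from the lowest to the highest score."""
    return max(scores) - min(scores)


def simple_iqr(scores):
    """Range of what remains after deleting the lowest and highest 25%."""
    return score_range(trimmed(scores, 0.25))


def trimmed_mean(scores, proportion):
    """Mean computed on the trimmed sample."""
    kept = trimmed(scores, proportion)
    return sum(kept) / len(kept)


# Made-up scores with one extreme value, for illustration only.
scores = [1, 3, 4, 5, 6, 7, 9, 40]

print("mean:", sum(scores) / len(scores))
print("median:", median(scores))
print("range:", score_range(scores))
print("simple IQR:", simple_iqr(scores))
print("12.5% trimmed mean:", trimmed_mean(scores, 0.125))
```

With one extreme score in the sample, the mean and range move a great deal while the median, IQR, and trimmed mean barely respond, which is the trade-off the slides describe.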
Deviation Scores
• Definition
  Distance between a score and a measure of central tendency
  Usually deviation around the mean, $(X - \bar{X})$
• Importance

Variance
• Definitional formula
  $s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$
• Example
  See next slide

Calculation

         X    X - Xbar   (X - Xbar)^2
         2      -3            9
         4      -1            1
         5       0            0
         8       3            9
         7       2            4
         4      -1            1
  Sum   30       0           24

  $s^2 = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{24}{5} = 4.80$

Standard Deviation
• Definitional formula
  The square root of the variance
  $s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$

Computational Formula
  $s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1} = \frac{(2^2 + 4^2 + 5^2 + 8^2 + 7^2 + 4^2) - \frac{30^2}{6}}{5} = 4.80$
  $s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}} = \sqrt{4.8} = 2.19$

Estimators
• Mean
  Unbiased estimate of population mean ($\mu$)
• Define unbiased
  Long range average of statistic is equal to the parameter being estimated.
• Variance
  $s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$
  Unbiased estimate of $\sigma^2$
Cont.

Estimators--cont.
• Using $s^2 = \frac{\sum (X - \bar{X})^2}{N}$ gives biased estimate
• Standard deviation
  Use square root of unbiased estimate.

Merrell’s Music Study
SPSS Printout (WEEK4)

  Treatment    Mean         N    Std. Deviation
  Quiet         307.2319    23    71.8267
  Mozart        114.5833    24    36.1017
  Anthrax      1825.8889    24   103.1392
  Total         755.4601    71   777.9646

Boxplots
• The general problem
  A display that shows dispersion for center and tails of distribution
• Calculational steps (simple solution)
  Find median
  Find top and bottom 25% points (quartiles)
  Eliminate top and bottom 2.5% (fences)
  Draw boxes to quartiles and whiskers to fences, with remaining points as outliers
• Boxplots for comparing groups

[Figure: Combined Merrell Data, boxplot of WEEK4, N = 71]

[Figure: Merrell Data by Group, boxplots of WEEK4 by Treatment Condition: Quiet (N = 23), Mozart (N = 24), Anthrax (N = 24)]
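The variance Calculation slide above can be reproduced with a short Python sketch, using the six scores from that slide (2, 4, 5, 8, 7, 4). The function names are illustrative, not part of the slides. The sketch computes the variance from both the definitional and the computational formula, then contrasts the N - 1 denominator with the biased N denominator. The standard library's `statistics.variance` and `statistics.pvariance` are used only as a cross-check.

```python
import math
import statistics

# The six scores from the Calculation slide.
X = [2, 4, 5, 8, 7, 4]


def variance_definitional(scores):
    """Sum of squared deviations around the mean, divided by N - 1."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / (n - 1)


def variance_computational(scores):
    """(Sum of X^2 minus (sum of X)^2 / N), divided by N - 1."""
    n = len(scores)
    return (sum(x ** 2 for x in scores) - sum(scores) ** 2 / n) / (n - 1)


def variance_biased(scores):
    """Same sum of squared deviations, but divided by N instead of N - 1."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / n


s2 = variance_definitional(X)
s = math.sqrt(s2)

print("definitional variance:", round(s2, 2))
print("computational variance:", round(variance_computational(X), 2))
print("standard deviation:", round(s, 2))
print("biased (divide by N) variance:", round(variance_biased(X), 2))
print("library cross-check:", statistics.variance(X), statistics.pvariance(X))
```

Both formulas give the same variance because they are algebraically equivalent. The N-denominator version is always smaller for the same data, which is the direction of the bias noted on the Estimators slide.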