Psych 230 Psychological Measurement and Statistics
Pedro Wolf
September 2, 2009

Previously on “let’s learn statistics in five weeks”
• the logic of research
  – samples, populations, and variables
• descriptive and inferential statistics
  – statistics and parameters
• understanding experiments
  – experimental and correlational studies
  – independent and dependent variables
• characteristics of scores
  – nominal, ordinal, interval, and ratio scales
  – continuous and discrete

Which Scale?
• Does the variable have an intrinsic value (can its scores be ordered)?
  – NO: Nominal
  – YES: Does the variable have equal intervals between scores?
    – NO: Ordinal
    – YES: Does the variable have a real zero point?
      – NO: Interval
      – YES: Ratio

Continuous
• A continuous scale allows for fractional amounts
  – it ‘continues’ between the whole-number amounts
  – decimals make sense
• Examples:
  – height
  – weight
  – IQ

Discrete
• In a discrete scale, only whole-number amounts can be measured
  – decimals do not make sense
  – usually, nominal and ordinal scales are discrete
  – some interval and ratio variables are also discrete, such as the number of children in a family
• A special type of discrete variable is dichotomous
  – only two amounts or categories
  – pass/fail; living/dead; male/female

Today…
• Why graphical representations of data?
• Stem and leaf plots
• Box plots
• Frequency
  – what it is
  – how a frequency distribution is created
• Graphing frequency distributions
  – bar graphs, histograms, polygons
• Types of distribution
  – normal, skewed, bimodal
• Relative frequency and the normal curve
  – percentiles, area under the normal curve

“… look at the data” (Robert Bolles, 1998)
• Raw data are often messy, overwhelming, and uninterpretable.
• Many data sets have thousands of measurements and hundreds of variables.
• Graphical representations of data can make data interpretable.
• Looking at the data can inspire ideas.

What in the world could these data mean?
Imagine over 30,000 observations
Time                    Lat        Long
930485:23:06.8600001    32.20497   -111.028
930497:04:34.77         32.20482   -111.028
930497:04:59.7599998    32.20487   -111.028
930497:05:46.7600002    32.20485   -111.029
930497:06:05.7600002    32.20578   -111.029
930497:06:16.7600002    32.20678   -111.029
930497:06:28.7599998    32.20698   -111.028
930497:09:31.77         32.20687   -110.999
930497:09:58.77         32.2055    -110.993
930497:10:07.77         32.20555   -110.992
930497:10:37.77         32.20687   -110.986
930497:11:38.77         32.20672   -110.979

After plotting those data
• By plotting the data and superimposing them on map data, suddenly the previous slide’s data can tell a story
• Of course, not all data can tell such a story
• People have developed various ways to visualize their data graphically

Stem and Leaf Plots
 5 | 46799    5
 6 | 34688    5
 7 | 2256     4
 8 | 148      3
 9 |          0
10 | 6        1
N = 18
• Data: 54, 56, 57, 59, 59, 63, 64, 66, 68, 68, 72 …
• A stem and leaf plot preserves the data intact while showing the shape of the distribution
• The numbers on the left of the line are the stems; they represent the leading digits of each score
• The numbers on the right of the line are the leaves; each leaf is the last digit of one individual score
• Read a score by completing its stem with the leaf (e.g., 5 | 4 is 54); the column on the right counts the leaves in each row

Box Plots
• Each line in a box plot marks either a quartile or the range of the data: the box runs from the first to the third quartile, the line inside it marks the median, and the whiskers extend to the lowest and highest scores that are not outliers
• In this particular plot, the dots represent outliers

Frequency distributions - why?
• Standard method for graphing data
  – easy way of visualizing group data
• Introduction to the normal distribution
  – underlies all of the statistical tests we will be studying this semester
  – understanding the concepts behind statistical testing will make life a lot easier later on

Frequency

Frequency - some definitions
• Raw scores are the scores we initially measure in a study
• The number of times a score occurs in a set is the score’s frequency
• A distribution is the general name for any organized set of data
• A frequency distribution organizes the scores based on each score’s frequency
• N is the total number of scores in the data

Understanding Frequency Distributions
• A frequency distribution table shows the number of times each score occurs in a set of data
• The symbol for a score’s frequency is simply f
• N = ∑f

Raw Scores
• The following is a data set of raw scores. We will use these raw scores to construct a frequency distribution table.
14 14 13 15 11 15 13 10 12 13 14 13 14 15 17 14 14 15

Frequency Distribution Table
X    f
17   1
16   0
15   4
14   6
13   4
12   1
11   1
10   1

Frequency Distribution Table - Example
• Make a frequency distribution table for the following scores: 5, 7, 4, 5, 6, 5, 4
X   f
7   1
6   1
5   3
4   2

Learning more about our data
• What are the values for N and ∑X for the scores below?
14 14 13 15 11 15 13 10 12 13 14 13 14 15 17 14 14 15

Results via Frequency Distribution Table
• What is N?
  – N = ∑f = 18
• What is ∑X?
  – multiply each score by its frequency, then add the products:
(17 × 1) = 17
(16 × 0) = 0
(15 × 4) = 60
(14 × 6) = 84
(13 × 4) = 52
(12 × 1) = 12
(11 × 1) = 11
(10 × 1) = 10
__________
Total: ∑X = 246

Graphing Frequency Distributions
• A frequency distribution graph shows the scores on the X axis and their frequency on the Y axis
• Why?
  – Because it’s not easy to make sense of this:
• On a scale of 0-10, how excited are you about this class? __________
  (0 = absolutely dreading it; 10 = extremely excited/highlight of my semester)
• Data (raw scores):
5 7 2 3 5 5 5 8 7 7 4 5 10 7 5 4 5 5 7 3 6 2 6 3 5 5 7 2 4 6 3 7 5 5 7 3 5 6 5 5 8 6 7 5 3 5 7 2 3 5 4 5 4 8 3 6 5 5 5 1 2 4 7 5 5 4 3 3 7 5 8 6 3 5 10 0 6 6 3 8 5 4 3 2 4 6 3 7 5 5 7 5 7 5 10 7 5 4 5 5 7 6 3 8 1 5 5 6 4 9 8 5 8 5 7 5 10 7 5 4 5 5 7 4 8 4 5 8 5 5 7 5 5 5 2 4 6 3 7 5 2 4 6 3 7 5 8 6 3 5 10 0 6 7 2 8 8 5 5 8 6 3 6 2 6 3 5 5 7 2 5 10 7 5 4 5 5 7 5 7 5 10 7 5 4 5 5 5 7 2 3 3 7 5 8 6 3 5 10 0 6
[Figure: frequency table (X from 0 to 10, with f) and histogram of the excitement ratings; x-axis “Excited about course (0 = no, 10 = yes)”, y-axis “Frequency” from 0 to 50]

Graphing Frequency Distributions
• The type of measurement scale (nominal, ordinal, interval, or ratio) determines whether we use:
  – a bar graph
  – a histogram
  – a frequency polygon

Graphs - bar graph
• A frequency bar graph is used for nominal and ordinal data
  – values on the x-axis
  – frequencies on the y-axis
  – in a bar graph, bars do not touch

Graphs - histogram
• A histogram is used for a small range of different interval or ratio scores
  – values on the x-axis
  – frequencies on the y-axis
  – in a histogram, adjacent bars touch

Graphs - frequency polygon
• A frequency polygon is used for a large range of different scores
  – in a frequency polygon, there are many scores on the x-axis

Constructing a Frequency Distribution Graph
• Step 1: make a frequency table
• Step 2: put values along the x-axis (bottom of page)
• Step 3: put a scale of frequencies along the y-axis (left edge of page)
• Step 4 (bar graphs and histograms)
  – make a bar for each value
• Step 4 (frequency polygons)
  – mark a point above each value with a height for the frequency of that value
  – connect the points with lines

Graphing - example
• A researcher observes driving behavior on a road, noting the gender of drivers, the type of vehicle driven, and the speed at which they are traveling. Which type of graph should be used for each variable?
• Gender?
  – nominal: bar graph
• Vehicle type?
  – nominal: bar graph
• Speed?
  – ratio: frequency polygon

Use and Misuse of Graphs
[Figure: bar graph of Number of Felonies (y-axis, 0 to 600) by Year (x-axis, 2000 to 2003)]

Use and Misuse of Graphs
• Which graph is correct?
[Figure: two graphs of Number of Felonies by Year (2000 to 2003); the y-axis of one runs from 210 to 240, the other from 0 to 600]
• Neither does a very good job at summarizing the data
• Beware of graphing tricks, such as a truncated y-axis that exaggerates small differences

Types of Distributions

Distributions
• Frequency tables, bar graphs, histograms, and frequency polygons describe frequency distributions

Distributions - Why?
• Describing the shape of a frequency distribution is important for both descriptive and inferential statistics
• The benefit of descriptive statistics is being able to understand a set of data without examining every score

Distributions: The Normal Curve
• It turns out that many, many variables have a distribution that looks the same. This has been called the ‘normal distribution’.
• A bell-shaped curve
• Symmetrical
• Extreme scores have a low frequency
  – extreme scores: scores that are relatively far above or far below the middle score

The Ideal Normal Curve
• Symmetrical
• Most scores in the middle range
• Few extreme scores
• In theory, the tails never reach the x-axis

Normal Curve - height
[Histogram: “How tall are you (in inches)?”; x-axis Height (inches), 52.5 to 80.0; y-axis Frequency, 0 to 40]

Normal Curve - hours slept
[Histogram: Hours of sleep last night; y-axis Frequency, 0 to 60]

Normal Curve - GPA
[Histogram: GPA, 1.75 to 4.50; y-axis Frequency, 0 to 50]

Normal Distributions
• While the scores in the population may approximate a normal distribution, a sample of scores need not
[Histogram: “How tall are you (in inches)?” (N = 10); x-axis Height (inches), 61.5 to 72.5; y-axis Frequency, 0 to 3]

Skewed Distributions
• A skewed distribution is not symmetrical. It has only one pronounced tail
• A distribution may be either negatively skewed or positively skewed
• Negative or positive depends on whether the tail slopes toward zero (negative) or away from zero (positive)
  – the side with the longer tail describes the distribution
• Tail on negative side: negatively skewed
• Tail on positive side: positively skewed

Negatively Skewed Distributions
• Tail on negative side: negatively skewed
• Contains extreme low scores
• Does not contain extreme high scores
• Can occur due to a “ceiling effect”

Positively Skewed Distributions
• Tail on positive side: positively skewed
• Contains extreme high scores
• Does not contain extreme low scores
• Can occur due to a “floor effect”
[Histogram: Rank in family, 1 to 6; y-axis Frequency, 0 to 100]

Bimodal Distributions
• A symmetrical distribution containing two distinct humps

Bimodal - birth month
[Bar graph: “What month were you born?”; x-axis Month born, Jan to Dec; y-axis Frequency, 0 to 25]

Distributions - data
• How many alcoholic drinks do you have per week?
[Histogram: Alcoholic drinks per week, 0.5 to 24.5; y-axis Frequency, 0 to 100]
• Positively skewed

Distributions - data
• How much did you spend on textbooks for this semester?
[Histogram: Spent on textbooks ($), 50 to 900; y-axis Frequency, 0 to 60]
• Normal, with one outlier

Kurtosis
• meso-: forming chiefly scientific terms with the sense ‘middle, intermediate’ (mesokurtic)
• lepto-: small, fine, thin, delicate (leptokurtic)
• platy-: forming nouns and adjectives, particularly in biology and anatomy, with the sense ‘broad, flat’ (platykurtic)

Relative Frequency and the Normal Curve

Relative Frequency
• Another way to organize scores is by relative frequency
• Relative frequency is the proportion of the time that a particular score occurs
  – remember: a proportion is a number between 0 and 1
• Simple frequency: the number of times a score occurs
• Relative frequency: the proportion of times a score occurs

Relative Frequency - Why?
• We are still asking how often certain scores occurred
• Sometimes, relative frequency is easier to interpret than simple frequency
• Example:
  – 82 people in the class reported drinking no alcohol weekly (simple frequency)
  – 0.42 of the class (42%) reported drinking no alcohol (relative frequency)

Relative Frequency
• The formula for a score’s relative frequency is:
  $\text{relative frequency} = \frac{f}{N}$

Relative Frequency Distribution

Example
• Using the following data set, find the relative frequency of the score 12:
14 14 13 15 11 15 13 10 12 13 14 13 14 15 17 14 14 15
• The frequency table for this set of data is:
X    f
17   1
16   0
15   4
14   6
13   4
12   1
11   1
10   1
• The frequency for the score of 12 is 1, and N = 18
• Therefore, the relative frequency of 12 is:
  $\text{relative frequency} = \frac{f}{N} = \frac{1}{18} \approx 0.06$

Relative Frequencies
• We can also add relative frequencies together.
  – For example, what proportion of people scored a passing mark (> 3) on this exam?
Value   Frequency   Relative Frequency
6       5           5/18 = 0.28
5       6           6/18 = 0.33
4       3           3/18 = 0.17
3       2           2/18 = 0.11
2       1           1/18 = 0.06
1       1           1/18 = 0.06
N = 18              Total = 1.00
  – 0.28 + 0.33 + 0.17 = 0.78

Relative Frequency and the Normal Curve
• When the data are normally distributed (as many variables approximately are), we can use the normal curve directly to determine relative frequency.
• There is a known proportion of scores above or below any point
• For example, exactly 0.50 of the scores lie above the mean

Relative Frequency and the Normal Curve
• The proportion of the total area under the normal curve between certain scores corresponds to the relative frequency of those scores.

Relative Frequency and the Normal Curve
[Figure: normal distribution showing the area under the curve to the left of selected scores]

Percentiles
• A percentile is the percent of all scores in the data that are at or below a score
  – Example: at the 98th percentile, 98% of the scores lie at or below this score.

Homework
• Complete exercises 1, 6, and 9 for chapter 3.
• Read chapters 4 and 5 for next week.
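The “Which Scale?” questions near the start of the slides form a simple decision procedure. The sketch below assumes that each question can be answered yes or no. It also reads “intrinsic value” as “the scores can be ordered”, which is an interpretation. The function `scale_type` and its parameter names are hypothetical and were chosen for illustration.

```python
def scale_type(has_order: bool, equal_intervals: bool, real_zero: bool) -> str:
    """Classify a variable by asking the 'Which Scale?' questions in order (hypothetical helper)."""
    if not has_order:        # no intrinsic value/order: the scores are just categories
        return "nominal"
    if not equal_intervals:  # ordered, but the gaps between scores are not equal
        return "ordinal"
    if not real_zero:        # equal intervals, but zero does not mean "none"
        return "interval"
    return "ratio"           # equal intervals and a real zero point


# Variables from the driving example: gender is nominal, speed is ratio.
print(scale_type(False, False, False))  # gender
print(scale_type(True, True, True))     # speed
```

The questions must be asked in this fixed order, because each later question only matters once the earlier ones have been answered YES.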
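The stem and leaf plot and the box plot from the slides can both be produced from the same 18 scores. This sketch makes several assumptions:

* The scores are non-negative integers, and the last digit is the leaf.
* `SCORES` is read back off the slide’s stem and leaf plot.
* `stem_and_leaf` and `box_plot_summary` are hypothetical helper names.
* Outliers are flagged with 1.5 × IQR fences. The slides do not say which outlier rule their box plot uses, so this common convention is assumed here.
* Quartiles come from `statistics.quantiles` with its default `'exclusive'` method. That is only one of several quartile definitions, so other software may report slightly different values.

```python
import statistics
from collections import defaultdict

# Scores read back off the stem and leaf plot in the slides (N = 18).
SCORES = [54, 56, 57, 59, 59, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]


def stem_and_leaf(scores):
    """One text row per stem: stem = all digits but the last, leaf = last digit, then the leaf count."""
    rows = defaultdict(list)
    for s in sorted(scores):
        rows[s // 10].append(str(s % 10))
    lines = []
    for stem in range(min(rows), max(rows) + 1):  # keep empty stems such as 9 |
        leaves = rows.get(stem, [])
        lines.append(f"{stem:>2} | {''.join(leaves):<5} ({len(leaves)})")
    return lines


def box_plot_summary(scores):
    """Quartiles, whisker ends, and outliers, using 1.5 x IQR fences (an assumed convention)."""
    q1, median, q3 = statistics.quantiles(scores, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [s for s in scores if low_fence <= s <= high_fence]
    return {
        "whisker_low": min(inside),
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": max(inside),
        "outliers": [s for s in scores if s < low_fence or s > high_fence],
    }


for line in stem_and_leaf(SCORES):
    print(line)
print(box_plot_summary(SCORES))
```

With these choices the quartiles come out as 59, 68, and 77.25. The score 106 is the only one beyond the upper fence, so it would be drawn as a dot.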
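The slides work out several quantities by hand: the frequency distribution table, N = ∑f, ∑X, and relative frequency = f / N. A short Python sketch can check that arithmetic. `frequency_table` is a hypothetical helper, and the exam frequencies are copied from the passing-mark example.

```python
from collections import Counter

# Raw scores from the slides (N = 18).
RAW = [14, 14, 13, 15, 11, 15, 13, 10, 12, 13, 14, 13, 14, 15, 17, 14, 14, 15]


def frequency_table(scores):
    """(X, f) rows from the highest to the lowest score, keeping scores with f = 0."""
    counts = Counter(scores)
    return [(x, counts[x]) for x in range(max(scores), min(scores) - 1, -1)]


table = frequency_table(RAW)
N = sum(f for _, f in table)             # N = ∑f
sum_x = sum(x * f for x, f in table)     # ∑X = sum of (X * f) over the rows
rel_freq = {x: f / N for x, f in table}  # relative frequency = f / N

for x, f in table:
    print(f"{x:>3} {f:>3} {rel_freq[x]:.2f}")
print("N =", N, " SumX =", sum_x)

# Adding relative frequencies: passing marks (> 3) in the exam table.
exam = {6: 5, 5: 6, 4: 3, 3: 2, 2: 1, 1: 1}  # value: frequency
passing = sum(f for x, f in exam.items() if x > 3) / sum(exam.values())
print(f"proportion passing = {passing:.2f}")
```

Rows with f = 0 (here, the score 16) are kept so that the table matches the hand-built one. They contribute nothing to N or ∑X.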
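Percentiles and areas under the normal curve can be computed rather than looked up. This sketch assumes Python 3.8+ for `statistics.NormalDist`. It also assumes a standard normal curve (mean 0, SD 1); other values can be passed as `mu` and `sigma`. `percentile_rank` is a hypothetical helper that follows the slides’ “at or below” definition.

```python
from statistics import NormalDist


def percentile_rank(scores, score):
    """Percent of all scores that are at or below `score` (hypothetical helper)."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)


RAW = [14, 14, 13, 15, 11, 15, 13, 10, 12, 13, 14, 13, 14, 15, 17, 14, 14, 15]
print(f"percentile rank of 14: {percentile_rank(RAW, 14):.1f}")  # 13 of 18 scores

# Standard normal curve (mean 0, SD 1 assumed here; pass mu/sigma for other scales).
z = NormalDist(mu=0, sigma=1)
above_mean = 1 - z.cdf(0)          # area above the mean
within_1sd = z.cdf(1) - z.cdf(-1)  # area between -1 and +1 SD
score_98th = z.inv_cdf(0.98)       # score with 98% of the area at or below it
print(f"{above_mean:.2f} {within_1sd:.4f} {score_98th:.3f}")
```

The first normal-curve value reproduces the slides’ statement that exactly 0.50 of the area lies above the mean. The area between two scores is the difference of their `cdf` values.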