Psyc 235: Introduction to Statistics

Lecture Format
• New content/conceptual information
• Questions and working through problems

What you should have accomplished so far…
• ALEKS account set up
• Completed first assessment
• Worked through first section of material
• Spent 5+ hours on ALEKS
• Watched the video “What is statistics?”
Any questions/problems so far?

From Last Week
• Definition of statistics: Collecting, Organizing, Displaying, Interpreting, and Analyzing data

What is Data?
• Data is the generic term for numerical information that has been obtained on a set of objects/individuals, etc.
• Variable: some characteristic of the objects/individuals (e.g., height)
• Data: the values of a variable for a certain set of objects/individuals

Two Branches of Statistics
• Descriptive statistics: describes the given set of data you have.
• Inferential statistics: given the data you have about these people, does this say anything about other people?

Today: Descriptive Statistics
• Graphical presentations of distributions
  Histograms
  Frequency polygons
  Cumulative distributions
  Box-and-whisker plots
• Descriptive measures of data
  Measures of central tendency
  Measures of dispersion

Organizing Data
• Data from last week (wake-up times of 30 students, in order):
6:55 7:00 7:30 7:30 7:45 8:00 8:25 8:30 8:45 8:45 8:50 9:00 9:00 9:00 9:15 9:25 9:30 9:30 9:30 9:30 9:30 9:45 9:45 10:00 10:00 10:15 10:25 10:30 10:45 10:50
• Frequency table:

Time Awake     Number of Students
6:30-7:00      1
7:00-7:30      1
7:30-8:00      3
8:00-8:30      2
8:30-9:00      4
9:00-9:30      5
9:30-10:00     7
10:00-10:30    4
10:30-11:00    3

Histograms
[Figure: histogram of Number of Students (y-axis, 0 to 8) against Wake-Up Time (x-axis, half-hour bins from 6:30-7:00 to 10:30-11:00)]
Note: Use a histogram to spot patterns in the data (skew, etc.).
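The frequency table can be rebuilt from the raw list of wake-up times. The following is a minimal Python sketch, assuming the bins are half-open (a time of exactly 7:00 is counted in 7:00-7:30, which is what reproduces the slide's counts). The helper names `to_minutes`, `fmt`, and `frequency_table` are hypothetical, not part of ALEKS or the course materials.

```python
# Wake-up times from the example data (H:MM, 24-hour clock).
WAKE_TIMES = [
    "6:55", "7:00", "7:30", "7:30", "7:45", "8:00", "8:25", "8:30", "8:45", "8:45",
    "8:50", "9:00", "9:00", "9:00", "9:15", "9:25", "9:30", "9:30", "9:30", "9:30",
    "9:30", "9:45", "9:45", "10:00", "10:00", "10:15", "10:25", "10:30", "10:45", "10:50",
]
WIDTH = 30  # bin width in minutes (half-hour bins, as on the slide)


def to_minutes(hhmm: str) -> int:
    """Convert an 'H:MM' string to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def fmt(minutes: int) -> str:
    """Convert minutes after midnight back to 'H:MM'."""
    return f"{minutes // 60}:{minutes % 60:02d}"


def frequency_table(times, start="6:30", end="11:00", width=WIDTH):
    """Count observations in half-open bins [lo, lo + width) from start to end."""
    lo, hi = to_minutes(start), to_minutes(end)
    edges = list(range(lo, hi, width))  # left edge of each bin
    counts = [0] * len(edges)
    for t in map(to_minutes, times):
        index = (t - lo) // width
        if 0 <= index < len(counts):  # ignore times outside the table's range
            counts[index] += 1
    return edges, counts


edges, counts = frequency_table(WAKE_TIMES)
for left, n in zip(edges, counts):
    print(f"{fmt(left)}-{fmt(left + WIDTH)}  {n}")
```

Converting clock times to minutes first keeps the binning to simple integer division; the same `counts` list is what a histogram would plot as bar heights.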
Frequency Polygon

Time Awake     Number of Students   Relative Frequency (Proportion)
6:30-7:00      1                    0.0333
7:00-7:30      1                    0.0333
7:30-8:00      3                    0.1000
8:00-8:30      2                    0.0667
8:30-9:00      4                    0.1333
9:00-9:30      5                    0.1667
9:30-10:00     7                    0.2333
10:00-10:30    4                    0.1333
10:30-11:00    3                    0.1000
Total          30                   1

[Figure: frequency polygon of Proportion of Students (y-axis, 0 to 0.25) against Time Awake]

Cumulative Frequency

Time Awake     Number of Students   Relative Frequency   Cumulative
6:30-7:00      1                    0.0333               0.0333
7:00-7:30      1                    0.0333               0.0667
7:30-8:00      3                    0.1000               0.1667
8:00-8:30      2                    0.0667               0.2333
8:30-9:00      4                    0.1333               0.3667
9:00-9:30      5                    0.1667               0.5333
9:30-10:00     7                    0.2333               0.7667
10:00-10:30    4                    0.1333               0.9000
10:30-11:00    3                    0.1000               1.0000
Total          30                   1

[Figure: cumulative proportion of students against Time Awake]

Box and Whisker Plots
• Graphical representation of the quartiles, which split the data into 4 equally sized groups.
• If there is an even number of observations, let the “top” be the top half, and let the “bottom” be the bottom half.
• If there is an odd number of observations, let the “top” be everything above the median and the “bottom” be everything below the median.
• The first quartile is the “median of the bottom.” The third quartile is the “median of the top.”
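The relative-frequency and cumulative columns in the two tables follow directly from the counts: divide each count by the total, then keep a running sum. Below is a minimal Python sketch; it uses exact fractions from the standard library so that the final cumulative value is exactly 1, and the counts are copied from the frequency table (the variable names are only illustrative).

```python
from fractions import Fraction
from itertools import accumulate

# Number of students per half-hour bin, copied from the frequency table.
COUNTS = [1, 1, 3, 2, 4, 5, 7, 4, 3]
LABELS = ["6:30-7:00", "7:00-7:30", "7:30-8:00", "8:00-8:30", "8:30-9:00",
          "9:00-9:30", "9:30-10:00", "10:00-10:30", "10:30-11:00"]

total = sum(COUNTS)
# Relative frequency: the proportion of all students that falls in each bin.
relative = [Fraction(c, total) for c in COUNTS]
# Cumulative relative frequency: running total of the proportions.
cumulative = list(accumulate(relative))

for label, c, r, cum in zip(LABELS, COUNTS, relative, cumulative):
    print(f"{label:<12} {c:>2}  {float(r):.4f}  {float(cum):.4f}")
```

Rounding only at print time is a deliberate choice: adding already-rounded proportions can drift away from 1.0000 in the last row.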
Box-and-Whisker Example
6:55 7:00 7:30 7:30 7:45 8:00 8:25 8:30 8:45 8:45 8:50 9:00 9:00 9:00 9:15 9:25 9:30 9:30 9:30 9:30 9:30 9:45 9:45 10:00 10:00 10:15 10:25 10:30 10:45 10:50
Median: 9:20
1st Quartile: 8:30
3rd Quartile: 9:45
Again, note the information you can obtain by looking at this graphical representation of the data.

Graphical Presentations of Data
• Listed data: all data available
• Frequency table: data frequency for each cell available
• Histograms: data frequency for each bin available
• Polygons: data frequency for each bin available
• Box-and-whisker plots: summary information and data range available
Moving down this list, less and less information is retained.
• Often: just summarize key features of the distribution.

Describing Distributions: Summary Measures
• Measures of central tendency: the “average,” “location,” or “center” of the distribution.
• Measures of dispersion: the “spread” or “variability” of the distribution.

Measures of Central Tendency
• Mean
• Median
• Mode
• You may already be familiar with these concepts, but I want you to think of them in relation to describing data.

Mode
• Most frequent observation or observation class
• There can be several distinct modes
• “Best guess” in a single-shot guessing game
[Figure: example bar chart with categories A, B, C, D]

Mode (example data)
6:55 7:00 7:30 7:30 7:45 8:00 8:25 8:30 8:45 8:45 8:50 9:00 9:00 9:00 9:15 9:25 9:30 9:30 9:30 9:30 9:30 9:45 9:45 10:00 10:00 10:15 10:25 10:30 10:45 10:50
Mode? 9:30 (it occurs 5 times, more often than any other time)

Median
• Any value M for which at least 50% of all observations are at or above M and at least 50% are at or below M.
• Resistant measure of central tendency (not heavily influenced by extreme values)

Calculating the Median
• Order all observations from smallest to largest.
• If the number of observations n is odd, the median is the “middle” observation, namely the [(n+1)/2]th observation.
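The quartile rule from the box-and-whisker slides, and the mode and median of the example data, can be checked with a short script. This is a minimal Python sketch of the lecture's “median of the bottom / median of the top” rule; `quartiles` and `fmt` are hypothetical helpers. Library functions such as Python's `statistics.quantiles` use interpolation methods and may give slightly different quartiles. `statistics.multimode` (Python 3.8+) is used because a data set can have several modes.

```python
from statistics import median, multimode

# Wake-up times in minutes after midnight (6:55 -> 415, ..., 10:50 -> 650).
MINUTES = [415, 420, 450, 450, 465, 480, 505, 510, 525, 525,
           530, 540, 540, 540, 555, 565, 570, 570, 570, 570,
           570, 585, 585, 600, 600, 615, 625, 630, 645, 650]


def fmt(minutes: float) -> str:
    """Format minutes after midnight as 'H:MM', rounding to the nearest minute."""
    whole = round(minutes)
    return f"{whole // 60}:{whole % 60:02d}"


def quartiles(values):
    """Return (Q1, median, Q3) using the halving rule from the slides."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    bottom = xs[:half]  # even n: bottom half; odd n: everything below the median
    top = xs[half:] if n % 2 == 0 else xs[half + 1:]  # odd n: everything above the median
    return median(bottom), median(xs), median(top)


q1, med, q3 = quartiles(MINUTES)
modes = multimode(MINUTES)  # every value tied for the highest count
print("Q1:", fmt(q1), "Median:", fmt(med), "Q3:", fmt(q3))
print("Mode(s):", [fmt(m) for m in modes])
```

With n = 30 the median is the average of the 15th and 16th values (9:15 and 9:25), and each quartile is the 8th value of its 15-value half.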
For n = 61, it is the 31st observation.
• If the number of observations is even, then, to get a unique value, take the average of the (n/2)th and the (n/2 + 1)th observations. For n = 60, it is the average of the 30th and the 31st observations.

Median (example data)
6:55 7:00 7:30 7:30 7:45 8:00 8:25 8:30 8:45 8:45 8:50 9:00 9:00 9:00 9:15 9:25 9:30 9:30 9:30 9:30 9:30 9:45 9:45 10:00 10:00 10:15 10:25 10:30 10:45 10:50
Since there is an even number of data points (n = 30), take the average of the middle two values, the 15th (9:15) and the 16th (9:25): the median is 9:20.

Mean
• Sum up all observations (say, n of them) and divide the total by n.
• Extreme values strongly influence the mean.
• The mean is the balance point of the distribution (its center of gravity).

Calculating the Mean
• Suppose that we collect n observations.
• Let $X_1, X_2, X_3, \ldots, X_n$ denote the individual observations.
• Sum up all observations and divide the total by n:
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n}\left(X_1 + X_2 + \cdots + X_n\right)$$

Mathematical Notation
$$\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$$
Mean:
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Mean (example data)
Clock times:   6:55 7:00 7:30 7:30 7:45 8:00 8:25 8:30 8:45 8:45 8:50 9:00 9:00 9:00 9:15 9:25 9:30 9:30 9:30 9:30 9:30 9:45 9:45 10:00 10:00 10:15 10:25 10:30 10:45 10:50
Decimal hours: 6.92 7 7.5 7.5 7.75 8 8.42 8.5 8.75 8.75 8.83 9 9 9 9.25 9.42 9.5 9.5 9.5 9.5 9.5 9.75 9.75 10 10 10.25 10.42 10.5 10.75 10.83
$$\sum X = 273.34 \qquad \bar{X} = \frac{273.34}{30} \approx 9.111$$
Transform back into the time scale: 0.111 hours is about 6.7 minutes, so the mean is about 9:07.

A Few Notes About Summation, and Implications for Calculating the Mean
$$\sum_{i=1}^{n} a = a + a + \cdots + a = na$$
If all data have the same value, a, then the mean is also a, because:
$$\frac{1}{n}\sum_{i=1}^{n} a = \frac{1}{n}\,na = a$$
[Figure: data points that all share the same value, with the mean marked at that value]

Multiplying All Values by a Constant
$$aX_1 + aX_2 + \cdots + aX_n = a\left(X_1 + X_2 + \cdots + X_n\right), \quad\text{i.e.,}\quad \sum_{i=1}^{n} aX_i = a\sum_{i=1}^{n} X_i$$
If we multiply each observation by 2, then we obtain a new distribution that is stretched out (spread over twice the range):
$$\frac{1}{n}\sum_{i=1}^{n} 2X_i = 2\cdot\frac{1}{n}\sum_{i=1}^{n} X_i = 2\bar{X}$$
A multiplying constant affects the mean (and the “spread”).
[Figure: distributions of X and 2X on a 1 to 10 axis]

Adding a Constant to All Values
$$(X_1 + a) + (X_2 + a) + \cdots + (X_n + a) = (X_1 + X_2 + \cdots + X_n) + na, \quad\text{i.e.,}\quad \sum_{i=1}^{n}(X_i + a) = \sum_{i=1}^{n} X_i + na$$
If we add the constant 5 to each observation, then we obtain a new distribution that is shifted to the right by 5 units:
$$\frac{1}{n}\sum_{i=1}^{n}(X_i + 5) = \frac{1}{n}\sum_{i=1}^{n} X_i + \frac{1}{n}\,n\cdot 5 = \bar{X} + 5$$
A shift affects the mean (but not the “spread”).
[Figure: distributions of X and X + 5 on a 1 to 10 axis]

Combining Two Variables
$$(X_1 + Y_1) + (X_2 + Y_2) + \cdots + (X_n + Y_n) = (X_1 + X_2 + \cdots + X_n) + (Y_1 + Y_2 + \cdots + Y_n)$$
$$\sum_{i=1}^{n}(X_i + Y_i) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i$$

Adding Two Variables
$$\frac{1}{n}\sum_{i=1}^{n}(X_i + Y_i) = \frac{1}{n}\sum_{i=1}^{n} X_i + \frac{1}{n}\sum_{i=1}^{n} Y_i = \bar{X} + \bar{Y}$$
The mean of the sum of two variables is the sum of their means.

Measures of Dispersion
• Population standard deviation
• Sample standard deviation

If we want to know how much the values vary around the mean, we could calculate how much each value differs from the mean and add up those differences:
$$(X_1 - \bar{X}) + (X_2 - \bar{X}) + \cdots + (X_n - \bar{X}) = \sum_{i=1}^{n}(X_i - \bar{X})$$
Because of the way we calculate the mean, this sum is zero no matter what data you have, since $\sum_{i=1}^{n}(X_i - \bar{X}) = \sum_{i=1}^{n} X_i - n\bar{X} = 0$. Squaring the deviations before adding them avoids this cancellation.

Population Standard Deviation
• Variance:
$$\sigma^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n}$$
• Standard deviation:
$$\sigma = \sqrt{\frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n}}$$

Sample Standard Deviation
• Variance:
$$s^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}$$
• Standard deviation:
$$s = \sqrt{\frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}}$$
There are n-1 “degrees of freedom” (if you know the mean and n-1 of the observations, then you can figure out the nth observation).

Computational Formulas
• Note that there are computational formulas for the standard deviation.
• Look for them in ALEKS and write them down.
• Remember you can bring notes to your assessments.

For Next Week…
• Keep working on ALEKS
• Finish the descriptive statistics section
• Watch the second video
• If you can, start the probability section before Jason’s lecture next week.
• Remember: office hours and lab are always available for you.
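The mean and standard deviation formulas from this lecture can be checked numerically on the wake-up data (in minutes after midnight). This is a minimal Python sketch. `mean`, `population_sd`, and `sample_sd` are hypothetical helpers that follow the definitional formulas from the slides (not the computational formulas in ALEKS). The standard library's `statistics.pstdev` and `statistics.stdev` serve only as a cross-check, and the list `Y` is a made-up second variable used for the mean-of-a-sum rule.

```python
import math
from statistics import pstdev, stdev

# Wake-up times in minutes after midnight (6:55 -> 415, ..., 10:50 -> 650).
X = [415, 420, 450, 450, 465, 480, 505, 510, 525, 525,
     530, 540, 540, 540, 555, 565, 570, 570, 570, 570,
     570, 585, 585, 600, 600, 615, 625, 630, 645, 650]


def mean(xs):
    """Sum of the observations divided by n."""
    return sum(xs) / len(xs)


def squared_deviations(xs):
    """Sum of (X_i - mean)^2 over all observations."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


def population_sd(xs):
    """Divide by n: treat xs as the whole population."""
    return math.sqrt(squared_deviations(xs) / len(xs))


def sample_sd(xs):
    """Divide by n - 1: a sample has n - 1 degrees of freedom."""
    return math.sqrt(squared_deviations(xs) / (len(xs) - 1))


x_bar = mean(X)
hours, minutes = divmod(round(x_bar), 60)  # back to clock time, nearest minute
print(f"mean = {x_bar:.2f} minutes, about {hours}:{minutes:02d}")

# The raw deviations always cancel out (up to floating-point rounding).
print("sum of deviations:", sum(x - x_bar for x in X))
print("population SD:", population_sd(X), "check:", pstdev(X))
print("sample SD:", sample_sd(X), "check:", stdev(X))

# Transformation rules for the mean and the spread.
Y = list(range(len(X)))  # hypothetical second variable, for illustration only
print(math.isclose(mean([2 * x for x in X]), 2 * x_bar))                    # mean of 2X
print(math.isclose(mean([x + 5 for x in X]), x_bar + 5))                    # mean of X + 5
print(math.isclose(mean([x + y for x, y in zip(X, Y)]), x_bar + mean(Y)))   # mean of X + Y
print(math.isclose(population_sd([x + 5 for x in X]), population_sd(X)))    # shift: same spread
print(math.isclose(population_sd([2 * x for x in X]), 2 * population_sd(X)))  # doubling: twice the spread
```

Working in whole minutes avoids the rounding in the decimal-hour column. The first line prints `mean = 546.67 minutes, about 9:07`. Because the sample formula divides by n-1 instead of n, the sample SD is always slightly larger than the population SD for the same data.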