AP Statistics Part 1: Organizing Data: Looking for patterns and departures from patterns
Chapter 1: Exploring Data
1.1 Displaying distributions with graphs
Essential Question: What individuals do the data describe?

Statistics is the science of data. Any set of data contains information about some group of individuals. Individuals are the objects described by the set of data. Individuals may be people, often referred to as subjects, or they may be objects. A variable is any characteristic of an individual. A variable can be quantitative (numeric), or it may be categorical.

When collecting a set of data, the statistician must ask the following questions:
1. What individuals do the data describe, and how many individuals appear in the data?
2. How many variables are there? What are the exact definitions of the variables? In what units is each variable recorded?
3. What is the reason the data were gathered? Do we hope to answer specific questions? Do we want to draw conclusions about individuals other than the ones that we have data for?

Statistical tools and ideas can help you examine data in order to describe their main features. This examination is called exploratory data analysis. There are two basic strategies that help us organize our exploration of a set of data.
• Examine each variable by itself. Then move on to study relationships among the variables.
• Begin with a graph of the data. Then add numeric summaries of specific aspects of the data.
Variables: Categorical & Quantitative
Some variables, like gender or job title, place individuals into categories, while others, like income, GPA, and height, take numeric values. A categorical variable records which of several groups or categories an individual belongs to. A quantitative variable takes numerical values for which it makes sense to do arithmetic operations such as adding or averaging.
The distribution of a variable tells us what values the variable takes and how often it takes these values. The pattern of variation of a variable is its distribution. The values may be clustered closely together, spread out, or show no clear pattern.

Types of Graphs
• Dot plots
• Time plots
  • Income per year
• Histograms
  • Good for showing a group of data
  • Ideal when using the mean & standard deviation as the measures of center and spread
• Stemplot
  • Also good for showing groups of data, but more specific, because the individual values remain visible
• Boxplot
  • Ideal when using the five-number summary to describe the data

Interpreting Histograms
When interpreting any distribution, always focus on center & spread. The center is described approximately by the mean or median of the distribution, & the spread is initially described by the range, the difference between the largest and smallest data values.
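As a minimal sketch of what "distribution" and "range" mean in practice, the Python snippet below tallies how often each value of a categorical variable occurs and computes the range of a quantitative variable. The data and the helper names `categorical_distribution` and `data_range` are hypothetical, invented only for illustration.

```python
from collections import Counter

# Hypothetical data, invented for illustration only.
job_titles = ["analyst", "manager", "analyst", "clerk", "analyst", "clerk"]
heights_in = [62, 65, 65, 68, 70, 71, 74]


def categorical_distribution(values):
    """Map each category to how often it occurs (its distribution)."""
    return dict(Counter(values))


def data_range(values):
    """Range: the largest data value minus the smallest data value."""
    return max(values) - min(values)


job_distribution = categorical_distribution(job_titles)
height_range = data_range(heights_in)

print(job_distribution)  # {'analyst': 3, 'manager': 1, 'clerk': 2}
print(height_range)      # 12
```

Counting makes sense for the categorical variable, while subtraction only makes sense for the quantitative one, which is the distinction the definitions above draw.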
Look for any patterns or blatant deviations from a pattern. These deviations are known as outliers because they lie outside the overall pattern of the graph. Careful consideration should be given before announcing that an observation is an outlier. There is a specific rule for determining whether an observation is really an outlier.
∴ Do not state that an observation is an outlier unless you have mathematical justification!

Words used to describe the shape of a histogram
• Uniform Distribution
• Symmetric Distribution
• Bimodal Distribution
• Skewed Distribution

All data described by these types of graphs are univariate (single-variable) data.

Homework:

AP Statistics Part 1: Organizing Data: Looking for patterns and departures from patterns
Chapter 1: Exploring Data
1.2 Describing Distributions with Numbers
Essential Question: What does it mean to be a resistant measure?

Measuring the Center: The Mean
$\bar{x} = \dfrac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \dfrac{1}{n}\sum x_i$
The measure of central tendency, the single number used to describe the center of a distribution, is usually the mean or the median. If the mean is used as the measure of central tendency, then the spread is described by the standard deviation.
$s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n-1}}$
The mean and the standard deviation have the same units.

The mean and standard deviation are known as non-resistant measures. Non-resistant measures are not immune to extreme values like outliers. One outlier can greatly influence both the mean and the standard deviation.

Measuring the Center: The Median $M$
To find the median of a distribution:
1. Arrange all observations in order of size, from smallest to largest.
2. If the number of observations $n$ is odd, the median $M$ is the center observation in the ordered list.
3. If the number of observations $n$ is even, the median $M$ is the mean of the two center observations in the ordered list.
If the median is used as the measure of central tendency, then the spread is described by the five-number summary and the quartiles.

Unlike the mean & standard deviation, the median and the quartiles are resistant measures. Extreme values or outliers have little or no effect on the median. They do, however, affect the five-number summary, because its minimum and maximum are themselves the most extreme observations.
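The difference between resistant and non-resistant measures can be sketched numerically. The Python example below uses a small hypothetical data set (invented for illustration) in which the largest value is replaced by an extreme one, for instance a data-entry error. The helper `sample_std` is an assumed name; it applies the $n - 1$ formula for $s$ directly and is compared against the standard library's `statistics.stdev`.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation s, dividing by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


# Hypothetical data, invented for illustration only.
typical = [2, 4, 4, 5, 5]
with_outlier = [2, 4, 4, 5, 50]  # largest value replaced by an extreme one

mean_before = statistics.mean(typical)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(typical)
median_after = statistics.median(with_outlier)
std_before = sample_std(typical)
std_after = sample_std(with_outlier)

# The mean moves from 4 to 13 and s grows sharply,
# while the median stays at 4 in both data sets.
print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
print("s:     ", round(std_before, 2), "->", round(std_after, 2))
```

In this particular data set the median does not move at all. With other data an extreme value can shift the median slightly, but never by as much as it shifts the mean.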
Even though the median is resistant to extreme values and outliers and the mean is not, that does not imply that the median should always be used as the measure of central tendency. For symmetric distributions, the mean and median are roughly the same. In a skewed distribution, the mean is "pulled" toward the long tail, in the direction of the skew.

Measuring the Spread: The Quartiles $Q_1$ & $Q_3$
$Q_1$ & $Q_3$ are the medians of the lower & upper halves of the data, respectively. $Q_2$ is another name for the median, $M$.
The interquartile range, or IQR, is the difference between the quartiles, $Q_3 - Q_1$, and represents the range of the middle 50% of the data.

Testing for Outliers
An observation is an outlier iff it lies more than $1.5(Q_3 - Q_1)$ below $Q_1$ or above $Q_3$. In other words:
• If $x_i < Q_1 - 1.5(Q_3 - Q_1)$, then $x_i$ is an outlier
• If $x_i > Q_3 + 1.5(Q_3 - Q_1)$, then $x_i$ is an outlier

Boxplots/Modified Boxplots & the Five-Number Summary
The modified boxplot also displays the outliers, if they exist.
Five-number summary: Min, $Q_1$, $M$, $Q_3$, Max
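The quartile definitions and the outlier test above can be sketched in Python as follows. This is a sketch under stated assumptions, not a definitive implementation: the function names and the data are hypothetical, and when $n$ is odd the median itself is left out of both halves, which is one common convention (software packages often compute quartiles differently and may give slightly different values).

```python
import statistics


def five_number_summary(values):
    """Return (Min, Q1, M, Q3, Max); needs at least two observations.

    Q1 and Q3 are the medians of the lower and upper halves of the
    ordered data. Assumption: for odd n the median is excluded from
    both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (
        ordered[0],
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
        ordered[-1],
    )


def find_outliers(values):
    """Return the observations beyond 1.5 * IQR from Q1 or Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]


# Hypothetical data, invented for illustration only.
data = [3, 5, 7, 8, 9, 11, 13, 15, 40]

summary = five_number_summary(data)
outliers = find_outliers(data)

print(summary)   # (3, 6.0, 9, 14.0, 40)
print(outliers)  # [40]
```

Here the IQR is 14 − 6 = 8, so the fences sit at 6 − 12 = −6 and 14 + 12 = 26, and only the observation 40 falls outside them. A modified boxplot of these data would plot 40 as a separate point.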
Measuring the Spread: The Standard Deviation
For many symmetric distributions, the spread is best described by the standard deviation, which essentially describes the typical deviation of the observations from the mean. If the observed data are normal or close to normal, the standard deviation is used to describe spread.

The variance $s^2$ of a set of observations is the average of the squares of the deviations of the observations from the mean, with the sum divided by $n - 1$ rather than $n$:
$s^2 = \dfrac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n-1}$ or $s^2 = \dfrac{1}{n-1}\sum (x_i - \bar{x})^2$
The standard deviation is $s = \sqrt{\text{variance}} = \sqrt{\dfrac{1}{n-1}\sum (x_i - \bar{x})^2}$
$n - 1$ is called the degrees of freedom.

Properties of the Standard Deviation
• $s$ measures the spread about the mean and should be used only when the mean is chosen as the measure of central tendency.
• If $s = 0$, then there is NO spread about the mean. This occurs iff all observations have the same value. Otherwise, $s > 0$. As $s$ increases, the spread about the mean also increases.
• Like $\bar{x}$, $s$ is strongly influenced by extreme values.

When should the median be used as the measure of central tendency? When should the mean be used?
• With skewed distributions, it is best to use the median & the five-number summary
• If the distribution is somewhat symmetric, it is common to use the mean and standard deviation
• Keep in mind that $\bar{x} \approx M$ in a symmetric distribution

Homework: