AP Statistics
Part 1: Organizing Data: Looking for patterns and departures from
patterns
Chapter 1: Exploring Data
1.1 Displaying distributions with graphs
Essential Question: What individuals do the data describe?
Statistics is the science of data. Any set of data contains information about some group
of individuals. Individuals are the objects described by the set of data. Individuals may be
people, often referred to as subjects, or individuals may be objects. A variable is any
characteristic of an individual. A variable can be quantitative (numeric) or
categorical.
When collecting a set of data, the statistician must ask himself/herself the
following set of questions…
1. What individuals do the data describe and how many individuals appear in the data?
2. How many variables are there? What are the exact definitions of the variables? In
what units is each variable recorded?
3. What is the reason the data were gathered? Do we hope to answer specific questions?
Do we want to draw conclusions about individuals other than the ones that we have
data for?
Statistical tools and ideas can help you examine data in order to describe their
main features. This examination is called exploratory data analysis. There are
two basic strategies that help us to organize our exploration of a set of data.
• Examine each variable by itself. Then move on to study relationships among the
variables.
• Begin with a graph of the data. Then add numeric summaries of specific aspects of the
data.
Variables: Categorical & Quantitative
Some variables, like gender or job title, place individuals into categories, while
others, like income, GPA, and height, take numeric values.
A categorical variable records which of several groups or categories an individual
belongs to. A quantitative variable takes numerical values for which it makes sense to do
arithmetic operations such as adding or averaging. The distribution of a variable tells
us what values the variable takes and how often it takes these values. The pattern of
variation of a variable is its distribution. The values may be clustered closely together,
spread out, or show no clear pattern.
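As a rough illustration of "what values the variable takes and how often", the sketch below tabulates the distribution of a categorical variable. The job titles and the helper name `distribution` are made up for this example and are not part of the course material.

```python
from collections import Counter

# Hypothetical data: job titles for eight individuals (invented for illustration).
job_titles = ["teacher", "nurse", "teacher", "engineer",
              "nurse", "teacher", "engineer", "teacher"]

def distribution(values):
    """Return {value: (count, proportion)} for a list of observations."""
    counts = Counter(values)
    n = len(values)
    return {value: (count, count / n) for value, count in counts.items()}

job_distribution = distribution(job_titles)

# Print one row per category: the value, how often it occurs, and its share.
for title, (count, proportion) in sorted(job_distribution.items()):
    print(f"{title:10s} {count:2d}  {proportion:.3f}")
```

The same counts are what a bar graph or dot plot of this variable would display.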
Types of Graphs
• Dot plots
• Time plots
  • Example: income per year
• Histograms
  • Good for showing a group of data
  • Ideal when using the mean & standard deviation as the measures of center and spread
• Stemplots
  • Also good for showing groups of data, while keeping the individual values visible
• Boxplots
  • Ideal when using the five-number summary to describe the data
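To make the stemplot idea concrete, here is a minimal text-only sketch. It assumes non-negative whole-number data with the tens as the stem and the ones digit as the leaf; the function name `stemplot` and the scores are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Build a stemplot for non-negative integers: stem = tens, leaf = ones digit."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    lines = []
    # Include every stem between the smallest and largest, even if it has no leaves,
    # so that gaps in the distribution stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = "".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:2d} | {leaf_text}")
    return lines

# Hypothetical test scores, invented for illustration.
scores = [72, 85, 91, 68, 77, 85, 99, 60, 74, 88]
for line in stemplot(scores):
    print(line)
```

Unlike a histogram, each leaf is an actual observation, so the original values can be read back from the plot.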
Interpreting Histograms
When interpreting any distribution, always focus on center & spread.
The center is described by the approximate mean or median of the distribution, & the spread is initially
described by the range, the difference between the largest and smallest data values.
Look for any patterns or blatant deviations from a pattern. These
deviations are known as outliers because they lie outside the overall
pattern of the graph.
Careful consideration should be given before announcing that an
observation is an outlier. There exists a specific rule for determining if
an observation is really an outlier. Therefore, do not state that an observation is
an outlier unless you have mathematical proof!
Words used to describe the shape of a histogram
• Uniform Distribution
• Symmetric Distribution
• Bimodal Distribution
• Skewed Distribution
All data described by these types of graphs are univariate data, or single variable.
Homework:
1.2 Describing Distributions with Numbers
Essential Question: What does it mean to be a resistant measure?
Measuring the Center: The Mean
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
The measure of central tendency, or the one number that can describe a distribution, is usually
the mean or the median. If the mean is used as the measure of central tendency, then the spread
is described by the standard deviation.
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
The mean and standard deviation have the same units, namely the units of the original observations.
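The two formulas can be sketched directly in code. This is only an illustration: the function names `mean` and `sample_sd` and the height data are made up, and the result is compared against Python's standard-library `statistics.stdev`, which also divides by n − 1.

```python
import math
import statistics

def mean(xs):
    """Arithmetic mean: the sum of the observations divided by n."""
    return sum(xs) / len(xs)

def sample_sd(xs):
    """Sample standard deviation: square root of the squared deviations over n - 1."""
    x_bar = mean(xs)
    squared_deviations = sum((x - x_bar) ** 2 for x in xs)
    return math.sqrt(squared_deviations / (len(xs) - 1))

# Hypothetical heights in centimetres, invented for illustration.
heights_cm = [150, 160, 170, 180, 190]

x_bar = mean(heights_cm)
s = sample_sd(heights_cm)
print(f"mean = {x_bar} cm, s = {s:.2f} cm")

# Cross-check against the standard library.
print(math.isclose(s, statistics.stdev(heights_cm)))
```

Both results come out in centimetres, the same unit as the data.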
The mean and standard deviation are known as non-resistant
measures. Non-resistant measures are not immune to extreme values
like outliers. A single outlier can greatly influence both the mean and
the standard deviation.
Measuring the Center: The Median 𝑀
To find the median of a distribution:
1. Arrange all observations in order of size, from smallest to largest
2. If the number of observations 𝑛 is odd, the median 𝑀 is the center observation in
the ordered list
3. If the number of observations 𝑛 is even, the median 𝑀 is the mean of the two center
observations in the ordered list.
If the median is used as the measure of central tendency, then the spread is
described by the five-number summary and the quartiles.
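The three steps for finding the median can be sketched as follows. The function name `median` is chosen for this example; Python's `statistics.median` follows the same rule.

```python
def median(observations):
    """Median following the three steps: sort, then take the center (odd n)
    or the mean of the two center observations (even n)."""
    ordered = sorted(observations)      # step 1: smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                      # step 2: odd n, single center observation
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # step 3: even n

print(median([5, 1, 3]))        # odd number of observations
print(median([4, 1, 3, 2]))     # even number of observations
```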
Unlike the mean & standard deviation, the median and quartiles
are resistant measures. Extreme values or outliers have little or no
effect on the median. They do, however, change the minimum or maximum of the five-number summary.
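A small experiment shows what resistance means in practice. The scores below are invented for illustration, and `summarize` is a hypothetical helper built on Python's standard `statistics` module.

```python
import statistics

# Hypothetical quiz scores, invented for illustration.
scores = [10, 12, 13, 15, 15]
scores_with_outlier = scores + [100]   # add one extreme value

def summarize(data):
    """Return (mean, sample standard deviation, median) for a list of numbers."""
    return (statistics.mean(data), statistics.stdev(data), statistics.median(data))

before = summarize(scores)
after = summarize(scores_with_outlier)

print("without outlier: mean=%.2f  s=%.2f  median=%.1f" % before)
print("with outlier:    mean=%.2f  s=%.2f  median=%.1f" % after)
```

With the extreme value added, the mean and standard deviation change substantially, while the median moves only from 13 to 14.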
Even though the median is resistant to extreme values and outliers & the mean is
not, that does not imply that the median should always be
used as the measure of central tendency. For symmetric distributions, the
mean and median are roughly the same. In a skewed distribution, the
mean is “pulled” toward the long tail, in the direction of the skew.
Measuring the Spread: The Quartiles $Q_1$ & $Q_3$
$Q_1$ & $Q_3$ are the medians of the lower & upper halves of the ordered data,
respectively. $Q_2$ is another name for the median, $M$.
The interquartile range, or IQR, is the difference between the quartiles,
$\mathrm{IQR} = Q_3 - Q_1$, and represents the range of the middle 50% of the data.
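A minimal sketch of the quartiles as medians of the two halves follows. One detail is an assumption, since the slides do not state it: when the number of observations is odd, the overall median is left out of both halves. Other conventions exist and can give slightly different quartiles. The function names and data are made up.

```python
def median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(observations):
    """Return (Q1, M, Q3). Assumes the median is excluded from both halves when n is odd."""
    if len(observations) < 2:
        raise ValueError("need at least two observations")
    ordered = sorted(observations)
    half = len(ordered) // 2
    lower = ordered[:half]                    # lower half of the ordered data
    upper = ordered[len(ordered) - half:]     # upper half of the ordered data
    return median(lower), median(ordered), median(upper)

# Hypothetical data, invented for illustration.
data = [7, 1, 12, 4, 9, 3, 6]
q1, m, q3 = quartiles(data)
iqr = q3 - q1
print(f"Q1={q1}, M={m}, Q3={q3}, IQR={iqr}")
```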
Testing for Outliers
An observation is an outlier iff it lies more than $1.5(Q_3 - Q_1)$ below $Q_1$ or above
$Q_3$, or in other words…
• If $x_i < Q_1 - 1.5(Q_3 - Q_1)$, then $x_i$ is an outlier
• If $x_i > Q_3 + 1.5(Q_3 - Q_1)$, then $x_i$ is an outlier
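The two inequalities above can be sketched as a pair of cutoffs. The function names `outlier_fences` and `find_outliers` and the data are hypothetical; the quartiles are passed in rather than computed, and the values used here assume the quartiles are the medians of the lower half [1, 3, 4] and upper half [7, 9, 30].

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) cutoffs of the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(observations, q1, q3):
    """Return the observations that fall outside the 1.5 x IQR cutoffs."""
    low, high = outlier_fences(q1, q3)
    return [x for x in observations if x < low or x > high]

# Hypothetical data, invented for illustration.
data = [1, 3, 4, 6, 7, 9, 30]
q1, q3 = 3, 9   # assumed quartiles: medians of [1, 3, 4] and [7, 9, 30]

low, high = outlier_fences(q1, q3)
outliers = find_outliers(data, q1, q3)
print(f"cutoffs: {low} and {high}; outliers: {outliers}")
```

Here the IQR is 6, so the cutoffs are 3 − 9 = −6 and 9 + 9 = 18, and only 30 qualifies as an outlier.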
Boxplots/Modified Boxplots & the Five-Number Summary
The modified Box-Plot also displays the outliers, if they exist.
[Figure: boxplot labeled with the five-number summary: Min, $Q_1$, $M$, $Q_3$, Max]
Measuring the Spread: The Standard Deviation
For many symmetric distributions, the spread is best described by
the standard deviation, which roughly measures the typical distance of the
observations from the mean. If the observed data
are normal or close to normal, the standard deviation is generally used to
describe spread.
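The variance $s^2$ of a set of observations is the average of the squares of
the deviations of the observations from the mean.
$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$
or
$$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
The standard deviation is $s = \sqrt{\text{variance}} = \sqrt{\frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$.
$n - 1$ is called the degrees of freedom.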
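A short worked example may help fix the arithmetic. The five observations 2, 4, 4, 6, 9 are made up for illustration and do not come from the course material.

```latex
\begin{align*}
\bar{x} &= \frac{2 + 4 + 4 + 6 + 9}{5} = \frac{25}{5} = 5 \\
s^2 &= \frac{(2-5)^2 + (4-5)^2 + (4-5)^2 + (6-5)^2 + (9-5)^2}{5 - 1}
     = \frac{9 + 1 + 1 + 1 + 16}{4} = \frac{28}{4} = 7 \\
s &= \sqrt{7} \approx 2.65
\end{align*}
```

Note that the sum of squared deviations is divided by the 4 degrees of freedom, not by the 5 observations.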
Properties of the Standard Deviation
• $s$ measures the spread about the mean and should be used only when the
mean is chosen as the measure of central tendency.
• If $s = 0$, then there is NO spread about the mean. This occurs iff all
observations have the same value. Otherwise, $s > 0$. As $s$ increases, the
spread about the mean also increases.
• Like $\bar{x}$, $s$ is strongly influenced by extreme values.
When should the median be used as the measure of central
tendency? When should the mean be used?
• With skewed distributions, it is best to use the median & the five-number summary
• If the distribution is somewhat symmetric, it is common to use the mean and
standard deviation
• Keep in mind that $\bar{x} \approx M$ in a symmetric distribution
Homework: