§8.2: Getting Your Data to Shape Up §8.3: Looking at Super Models

3/14/2013
Data and Variation
Ways to Represent Data…
There are quite a few! Let’s look at a few that we have seen, along with
some that we have seen in previous years.
• Pie Charts (We will not cover these!)
• Venn Diagrams
• Raw Data
• Here are all the first quiz scores for the 200
students enrolled in Algebra I.
• Put them in order.
How’d they do?
(These were in day 4 packet.)
How’d they do?
• Frequency Histogram
• Stem-and-Leaf Plot
How’d they do?
• Same data, different interval widths
How’d they do?
Measures of Central Tendency
• What is the “average” versus the average?
• Average can mean different things!
– MEAN: the sum of all the data values divided by
the number of values
– MEDIAN: the data point in the middle when a data
set is ordered from lowest to highest
– MODE: the most commonly occurring data value(s)
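The three "averages" can be compared side by side with a short sketch. The scores below are made up for illustration (they are not the class data), and Python's standard `statistics` module stands in for the graphing calculator.

```python
import statistics

# Hypothetical quiz scores (not the class data from the slides).
scores = [70, 85, 85, 90, 100]

mean_score = statistics.mean(scores)         # sum of values / number of values
median_score = statistics.median(scores)     # middle value of the sorted data
mode_scores = statistics.multimode(scores)   # list of the most frequent value(s)

print("mean:", mean_score)
print("median:", median_score)
print("mode(s):", mode_scores)
```

Here the mean is 86 while the median and mode are both 85, so the word "average" really can point to different numbers.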
How’d they do?
Variation
• 2000 Batting Averages
• Highest was 0.372
What do you see?
• 1920 Batting Averages
• Highest was over 0.400 and 2 players were in the 0.380s
• 2000 Batting Averages
• Not much variation in data
• 1920 Batting Averages
• More variation in data
Measuring Variation
• Five-Number Summary
– Minimum Value
– Maximum Value
– Median Value of all data
– Median of Bottom Half of Data (1st quartile)
– Median of Top Half of Data (3rd quartile)
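The five-number summary can be sketched in a few lines. This is one possible reading of "median of the bottom half" and "median of the top half": when the number of values is odd, the middle value is left out of both halves. That choice, the function name `five_number_summary`, and the sample scores are assumptions for illustration; calculators and software sometimes use slightly different quartile rules.

```python
import statistics

def five_number_summary(values):
    """Return min, Q1, median, Q3, max using the median-of-halves rule."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    lower_half = data[:half]     # bottom half (middle value excluded if n is odd)
    upper_half = data[-half:]    # top half (middle value excluded if n is odd)
    return {
        "min": data[0],
        "q1": statistics.median(lower_half),
        "median": statistics.median(data),
        "q3": statistics.median(upper_half),
        "max": data[-1],
    }

# Hypothetical exam scores, not the data plotted in the slides.
scores = [55, 60, 62, 70, 75, 80, 82, 88, 90, 95, 100]
summary = five_number_summary(scores)
print(summary)
```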
Measuring Variation
Box and Whisker Plots
• Here is a plot of the exam data from before.
• Dots are outliers (more than 1.5 times the
distance from Q1 to Q3 below Q1 or above Q3).
• How’d they do?
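The outlier rule used for the dots can be written out as a small check. This is a sketch with made-up scores. The quartiles passed in (Q1 = 62, Q3 = 90) are the medians of the bottom and top halves of this hypothetical data, and `find_outliers` is an illustrative name, not a calculator function.

```python
def find_outliers(values, q1, q3):
    """Return values more than 1.5 * (Q3 - Q1) below Q1 or above Q3."""
    iqr = q3 - q1                   # distance from Q1 to Q3
    low_fence = q1 - 1.5 * iqr      # anything below this is an outlier
    high_fence = q3 + 1.5 * iqr     # anything above this is an outlier
    return [v for v in values if v < low_fence or v > high_fence]

# Hypothetical exam scores with one very low score.
scores = [10, 60, 62, 70, 75, 80, 82, 88, 90, 95, 100]
outliers = find_outliers(scores, q1=62, q3=90)
print(outliers)
```

With these numbers the fences sit at 20 and 132, so only the score of 10 is drawn as a dot.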
Standard Deviation
• Calculate the Mean.
• Find out how far each value is from the mean.
This is called the deviation from the mean.
• How far on average is each value from the
mean?
Don’t worry! You do NOT need to know this equation! I will show
you how to find this using the graphing calculator!
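For anyone curious what the calculator is doing, the steps above can be sketched as follows. The slide's equation is not reproduced in this transcript, so this assumes the usual definition: square each deviation, average the squares, then take the square root. The data values are made up for illustration.

```python
import math
import statistics

# Hypothetical data set (not from the slides).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)               # step 1: calculate the mean
deviations = [x - mean for x in data]      # step 2: deviation of each value
squared = [d ** 2 for d in deviations]     # squaring removes the signs

# Step 3: average the squared deviations, then take the square root.
population_sd = math.sqrt(sum(squared) / len(data))

# The statistics module gives the same population value, plus the
# sample version, which divides by n - 1 instead of n.
print(population_sd, statistics.pstdev(data), statistics.stdev(data))
```

Graphing calculators typically report both the population and the sample version, so it is worth checking which one a problem asks for.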
Look back at our data…
• The standard deviation of 1920 batting
averages is 0.050 and of 2000 batting averages
is 0.038. Smaller standard deviation implies
the data is more tightly grouped.
Look back at our data…
• The standard deviation of exam scores is
14.782. (The higher deviation is due to outliers and the
skew of the data. Outliers affect the mean as well.)
Shapes of Graphs
• Graphs can be skewed one direction or the other.
• Graphs of batting averages and height were
symmetrical around the central value.
• Exam scores were not symmetrical since most
students scored higher. This is skewed to the left
(where the tail is). This is called a negative skew.
• A graph skewed to the right means the tail is on
the right side of the graph. This is called a
positive skew.
Housing Prices
• Skewed to the right.
• Mean pulled in direction of skew relative to median.
• Mean is HIGHER than median.
Exam scores
• Data is skewed to the left.
• Mean is LOWER than median.
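A quick way to see the mean being pulled toward the tail is to compare mean and median on two small data sets. Both lists below are invented for illustration (prices in thousands of dollars, scores out of 100) and are not the data graphed in the slides.

```python
import statistics

# Hypothetical house prices in thousands: one very expensive house
# creates a tail on the right.
prices = [150, 160, 170, 180, 190, 200, 950]

# Hypothetical exam scores: one very low score creates a tail on the left.
scores = [20, 75, 80, 85, 90, 92, 95]

price_mean, price_median = statistics.mean(prices), statistics.median(prices)
score_mean, score_median = statistics.mean(scores), statistics.median(scores)

print("prices: mean", round(price_mean, 1), "median", price_median)
print("scores: mean", round(score_mean, 1), "median", score_median)
```

In the right-skewed list the mean lands above the median, and in the left-skewed list it lands below, matching the two bullets above.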
Example #3
• The following histogram shows the exam
scores for 30 students in a freshman
accounting class. Estimate the mean of these
scores. Is the standard deviation of these
scores likely to be closer to 12 or to 25?
Answer to Example #3
• The mean score is approximately 70. The
standard deviation is more likely to be closer
to 12: about half of the scores are within 10
of 70, and the other half are more than 10 but
less than 30 away, so the typical deviation
should average out closer to 12 than to 25.
SAT Scores
The Bell Curve
• Most famous of the shapes is the bell-shaped
curve, aka normal curve, aka normal
distribution, aka Gaussian distribution.
• Appears often in nature and in mathematics.
• Lots of formulas to describe it and analyze it.
• Let’s look at some examples!
• What do you see?
• Bimodal distribution: often seen in test scores. There are
students who know what they are doing come exam time
and students who do NOT.
Why should we expect bells?
• Around the mean, there should be an
expected amount of variation above and
below. The larger the deviation from the mean,
the less likely it is. Thus we have a cluster in the
middle and approximately the same number of
values at the high and low ends.
Normal Curves and Standard Deviation
• About 68% of the data differ from the mean by less
than one standard deviation.
• About 95% of the data differ from the mean by less
than two standard deviations.
• About 99.7% of the data differ from the mean by less
than three standard deviations.
***You MUST memorize this chart!!!
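The three percentages can be checked numerically. This sketch relies on a standard fact about the normal curve: the fraction of data within k standard deviations of the mean equals erf(k / √2). The helper name `fraction_within` is made up for illustration.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, "standard deviation(s):", round(100 * fraction_within(k), 1), "%")
```

The results round to 68.3%, 95.4%, and 99.7%, which is where the memorized 68-95-99.7 values come from.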
Example #1
• All freshmen entering NHS have their heads
measured for the beanies they are required to
wear. One year the head circumference data
had a normal distribution with mean 55 cm
and standard deviation 1.7 cm. What
percentage of the students that year had a
head circumference between 53.3 cm and
56.7 cm? What percentage had circumference
above 58.4 cm?
Example #2
• The average high temperature in Anchorage,
Alaska, in January is 21ºF with a standard
deviation of 10º. The average high
temperature in Honolulu in January is 80ºF
with a standard deviation of 8º. In which
location would it be more unusual to have a
day in January with a high of 57ºF?
Answer to Example #1
• For data with a normal distribution, about 68% of the
values differ from the mean by less than one standard
deviation. The normally distributed head measurements
have mean 55 cm and standard deviation 1.7 cm, so heads
within one standard deviation of the mean will measure
between 55 - 1.7 = 53.3 cm and 55 + 1.7 = 56.7 cm. Thus
approximately 68% of the freshmen have head
circumferences between 53.3 and 56.7 cm. For the second
question, a head measuring more than 58.4 cm is more than
3.4 cm, or two standard deviations, above the mean. Recall
that approximately 95% of the values in a normal distribution
are within two standard deviations of the mean, so only about
5% lie outside those limits. Because the normal curve is
symmetric, half of that 5% lies above the upper limit. Thus,
roughly 5%/2 = 2.5% of the freshmen will have head
circumferences measuring more than 58.4 cm.
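The same two answers can be checked against an exact normal model. This sketch uses `statistics.NormalDist` from Python's standard library (Python 3.8 or later) with the mean and standard deviation given in the example. The variable names are illustrative.

```python
from statistics import NormalDist

# Head circumference model from Example #1: mean 55 cm, standard deviation 1.7 cm.
heads = NormalDist(mu=55, sigma=1.7)

# Fraction between 53.3 cm and 56.7 cm (within one standard deviation).
between = heads.cdf(56.7) - heads.cdf(53.3)

# Fraction above 58.4 cm (more than two standard deviations above the mean).
above = 1 - heads.cdf(58.4)

print(round(100 * between, 1), "% between 53.3 and 56.7 cm")
print(round(100 * above, 1), "% above 58.4 cm")
```

The exact model gives about 68.3% and about 2.3%. The 2.5% in the answer above comes from rounding to the 68-95-99.7 values, which is accurate enough for these problems.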
Answer to Example #2
• A January temperature of 57° would be more
unusual in Anchorage. This temperature is
within three standard deviations (3 * 8° = 24°)
of the mean (80°) in Honolulu but is outside
the range of three standard deviations (3 * 10°
= 30°) of the mean (21°) in Anchorage.
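One way to make this comparison concrete is to count how many standard deviations 57° sits from each city's mean (often called a z-score). The sketch below uses only the numbers given in Example #2. The function name `z_score` is chosen for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

z_anchorage = z_score(57, mean=21, sd=10)   # 36 degrees above the mean
z_honolulu = z_score(57, mean=80, sd=8)     # 23 degrees below the mean

print("Anchorage:", z_anchorage)
print("Honolulu:", z_honolulu)
```

A high of 57° is 3.6 standard deviations above the Anchorage mean but only 2.875 below the Honolulu mean, so it is the more unusual day in Anchorage.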