Statistics: Analyzing Data by Using
Tables and Graphs
1.8; 1.9; 5.7 & 13.3
CCSS:
N-Q (1-3); S-ID 1
Mathematical Practice
1. Make sense of problems, and persevere in solving them.
2. Reason abstractly and quantitatively.
3. Construct viable arguments, and critique the reasoning
of others.
4. Model with mathematics.
5. Use appropriate tools strategically.
6. Attend to precision.
7. Look for, and make use of, structure.
8. Look for, and express regularity in, repeated reasoning.
Statistics- Definitions
A population is the collection of all the data that
could be observed in a statistical study.
A sample is a collection of data chosen from the
population of interest. It is some smaller portion of
the population.
An inference is a decision, estimate, prediction, or
generalization about a population based on
information contained in a sample from that
population.
Statistics- Examples
Population: All NCU students enrolled during summer 2004
Sample: 500 NCU students enrolled during summer 2004
Inference: The mean time to drive to NCU is 24 minutes.

Population: All voters in the 2004 election
Sample: 2500 voters in the 2004 election
Inference: About 45% of voters favor Amanda.
SHAPES
• Skewed Right: Most of the data is concentrated
to the left of the graph (tail points to the right)
• Skewed Left: Most of the data is concentrated to
the right of the graph (tail points to the left)
• Symmetric: The majority of the data is
concentrated in the center of the graph (shaped
like a bell)
Center and Spread
• Center: the value that divides the
observations so that about half have
smaller values
• Spread: the smallest and largest values
expressed in an interval
The Arithmetic Mean
• This is the most popular and useful
measure of central location
Mean = (Sum of the observations) / (Number of observations)
• This is often called the average.
Useful Notation
x: lowercase letter x – represents any measurement in
a sample of data.
n: lowercase letter n – number of measurements in a
sample.
∑: uppercase Greek letter sigma – represents sum.
∑x: add all the measurements in a sample.
x̄: lowercase x with a bar over it – denotes the
sample mean.
Measures of Center
1) Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$,
where n is the sample size.
2) Sample Median:
First, put the data in order.
Then, the median is
• the middle number for odd sample sizes
• the average of the two middle values for even sample sizes
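The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `sample_mean` and `sample_median` and the data in `values` are made up for the example.

```python
def sample_mean(data):
    """Sum of the observations divided by the number of observations."""
    return sum(data) / len(data)


def sample_median(data):
    """Middle value for odd n; average of the two middle values for even n."""
    ordered = sorted(data)  # first, put the data in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


values = [3, 1, 4, 1, 5]  # made-up sample, n = 5
print(sample_mean(values))    # 2.8
print(sample_median(values))  # 3
```

Sorting a copy with `sorted` leaves the original list untouched, which matters if the same data is reused for other calculations.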
The Arithmetic Mean
• Example 1
The reported times on the Internet of 10 adults are 0, 7, 12, 5, 33,
14, 8, 0, 9, and 22 hours. Find the mean time on the Internet.
$\bar{x} = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{0 + 7 + \cdots + 22}{10} = 11.0$
The Median
• The Median of a set of observations is the value
that falls in the middle when the observations
are arranged in order of magnitude.
Find the median of the time on the internet
for the 10 adults of example 3.1
Even number of observations
0, 0, 5, 7, 8, 9, 12, 14, 22, 33
The median is the average of the two middle values: (8 + 9) / 2 = 8.5
Comment
Suppose only 9 adults were sampled
(exclude, say, the longest time (33))
Odd number of observations
0, 0, 5, 7, 8, 9, 12, 14, 22
The median is the middle value, 8.
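The Internet-time results can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative and do not come from the slides.

```python
import statistics

# Reported hours on the Internet for the 10 adults
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

mean_hours = statistics.mean(hours)    # 110 / 10
median_10 = statistics.median(hours)   # even n: average of 8 and 9

# Drop the longest time (33) to get an odd number of observations
nine_adults = sorted(hours)[:-1]
median_9 = statistics.median(nine_adults)  # odd n: the middle value

print(mean_hours, median_10, median_9)
```

`statistics.median` sorts the data internally, so the list does not need to be ordered first.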
Examples – Time to Complete an Exam
A random sample of times, in minutes, to complete a statistics
exam yielded the following times. Compute the mean and
median for this data.
33, 29, 45, 60, 42, 19, 52, 38, 36
The mean is $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{33 + 29 + \cdots + 36}{9} = \frac{354}{9} \approx 39.3$ minutes.
Recall, we must rank (sort) the data before finding the median.
19, 29, 33, 36, 38, 42, 45, 52, 60
Since there are 9 (odd) data points, the 5th point is the median.
The median is 38 minutes.
Examples – Miles Jogged Last Week
A random sample of 12 joggers were asked to keep track of
the distance they ran (in miles) over a week’s time.
Compute the mean and median for this data.
5.5, 7.2, 1.6, 22.0, 8.7, 2.8, 5.3, 3.4, 12.5, 18.6, 8.3, 6.6
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{5.5 + 7.2 + \cdots + 6.6}{12} = \frac{102.5}{12} \approx 8.54$ miles
Examples – Miles Jogged Last Week (Cont)
Recall, we must rank (sort) the data before finding the median.
1.6, 2.8, 3.4, 5.3, 5.5, 6.6, 7.2, 8.3, 8.7, 12.5, 18.6, 22.0
Since there are 12 (even) data points, the median is the
average of the 6th and 7th points.
$\frac{6.6 + 7.2}{2} = 6.9$
The median is 6.9 miles.
Statistics - Analyzing Data by Using Tables and Graphs
A bar graph compares different categories of
numerical information, or data, by showing each
category as a bar whose length is related to the
frequency.
Bar graphs can also be used to display multiple
sets of data in different categories at the same
time.
Graphs with multiple sets of data always have a
key to denote which bars represent each set of
data.
Vocabulary
• Bar graph: compares different
categories of numerical information,
or data.
[Figure: sample bar graph with a key for East, West, and North; horizontal axis: 1st Qtr to 4th Qtr; vertical axis: 0 to 90]
Statistics - Analyzing Data by Using Tables and Graphs
Another type of graph used to display data is a
circle graph.
A circle graph compares parts of a set of data as a
percent of the whole set.
The percents in a circle graph should always have
a sum of 100%.
Circle graph: compares parts of a set of data
as a percent of the whole set.
[Figure: circle graph, National Traffic Survey; labeled slices: 63% Worse, 26% Same, 3% Not sure]
Statistics - Analyzing Data by Using Tables and Graphs
Another type of graph used to display data is a
line graph.
Line graphs are useful when showing how a set of
data changes over time.
They can also be helpful when making
predictions.
Line graph: numerical data displayed
to show trends or changes over time.
[Figure: line graph, Cable Television Systems, 1995-2000; horizontal axis: Year (‘95 to ‘00); vertical axis: Systems (in thousands), 10.2 to 11.2]
Statistics - Analyzing Data by Using Tables and Graphs
Type of Graph    When to Use
Bar graph        To compare different categories of data
Circle graph     To show data as parts of a whole set of data
Line graph       To show the change in data over time
Frequency Chart
• A Frequency Chart is a table that breaks data down into equal
intervals and then counts the number of data values in each interval.
• A Frequency Chart is often used to sort a list of data to
make a Histogram.
• Make a Frequency Chart to display the data below:
90, 85, 78, 55, 64, 94, 68, 83, 84, 71, 74, 75, 99, 52, 98, 84, 73, 96, 81, 58, 97, 75, 80, 78
Interval    Frequency of Data
50-59       3
60-69       2
70-79       7
80-89       6
90-99       6
Don’t forget little
things…like labels
and equal intervals!
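The counting step for the frequency chart can be sketched in Python. This is one possible approach, assuming equal intervals of width 10 starting at 50; the variable names are illustrative.

```python
from collections import Counter

scores = [90, 85, 78, 55, 64, 94, 68, 83, 84, 71, 74, 75,
          99, 52, 98, 84, 73, 96, 81, 58, 97, 75, 80, 78]

width = 10  # equal interval width (assumed from the 50-59, 60-69, ... layout)

# Map each score to the lower bound of its interval, e.g. 78 -> 70
counts = Counter((score // width) * width for score in scores)

# Build the chart in interval order, labeling each interval "low-high"
frequency_chart = {
    f"{low}-{low + width - 1}": counts[low]
    for low in range(50, 100, width)
}

for interval, frequency in frequency_chart.items():
    print(interval, frequency)
```

Building the chart from a fixed `range` rather than from the keys of `counts` keeps an interval in the table even when its frequency is 0.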
Creating a Histogram
Interval    Frequency of Data
50-59       3
60-69       2
70-79       7
80-89       6
90-99       6
[Figure: histogram, Math Test Scores; horizontal axis: Test Scores (50-59 through 100-109); vertical axis: Frequency]
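As a rough stand-in for a drawn histogram, the frequency chart can be rendered as rows of text bars. This is only a sketch: `text_histogram` is a hypothetical helper, and the bars run horizontally rather than vertically.

```python
# Frequencies taken from the Math Test Scores frequency chart
frequencies = {"50-59": 3, "60-69": 2, "70-79": 7, "80-89": 6, "90-99": 6}


def text_histogram(freqs):
    """Return one line per interval, with one '#' per data value."""
    lines = []
    for interval, count in freqs.items():
        lines.append(f"{interval} | {'#' * count} {count}")
    return "\n".join(lines)


print(text_histogram(frequencies))
```

The longest bar (70-79) shows at a glance where most of the test scores fall.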
Histograms vs. Bar Graphs
• Many people confuse histograms with bar graphs.
• A histogram looks very similar to a bar graph. There are two
big differences between a histogram and a bar graph.
1. A bar graph compares items in categories while a
histogram displays one category broken down into
intervals. For example:
– A bar graph would compare…the number of apples, to
the number of oranges, to the number of bananas at a
grocery store.
– A histogram would compare…the number of people
who eat 0-4 apples a week, to the number that eat 5-9,
to the number who eat 10-14.
Histograms vs. Bar Graphs
2. The bars on a histogram touch. The bars found on a bar
graph do not touch.
– Why do you think that the bars will touch on a
histogram?
– It will make intervals of data easier to compare.
[Figures: three histograms (Frequency against Data) titled Skewed to the left, Skewed to the right, and Symmetric]
Mean and Median Comparisons
• If the data is symmetric, the mean and the median are
approximately the same.
• If the data is skewed to the right, the mean is typically larger than the
median.
• If the data is skewed to the left, the mean is typically smaller than the
median.
[Figure: three histograms with their means and medians]
Symmetric: mean = -0.0373, median = -0.0173
Positive Skew: mean = 10.71, median = 7.75
Negative Skew: mean = 4.829, median = 6.629
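The mean and median comparison can be turned into a quick rule-of-thumb check. The sketch below is an illustration only: `skew_hint` is a hypothetical helper, it is not a formal test of skewness, and it uses exact equality where real data would need a tolerance for "approximately the same".

```python
import statistics


def skew_hint(data):
    """Guess the direction of skew by comparing the mean with the median."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean > median:
        return "skewed right"
    if mean < median:
        return "skewed left"
    return "symmetric"


# Internet-time data from the arithmetic mean example: mean 11, median 8.5
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
print(skew_hint(hours))  # skewed right

# Mirroring the data reverses the direction of the tail (made-up variant)
mirrored = [33 - h for h in hours]
print(skew_hint(mirrored))  # skewed left
```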
Relationship among Mean, Median, and
Mode
• If a distribution is symmetrical, the
mean, median and mode coincide
• If a distribution is asymmetrical, and
skewed to the left or to the right, the
three measures differ.
[Figure: a positively skewed distribution (“skewed to the right”), with the mode, median, and mean marked]
Standard Deviation
• The standard deviation of a set of observations is the square root of the
variance. Another measure of where a value x lies in a distribution is
its deviation from the mean:
deviation from the mean = value – mean = $x - \bar{x}$
Sample standard deviation: $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
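The sample standard deviation can be computed step by step from the deviations from the mean. The sketch below applies it to the exam-time data from the earlier example; the function name `sample_std` is an assumption made for illustration, and the result is cross-checked against `statistics.stdev` from the Python standard library.

```python
import math
import statistics


def sample_std(data):
    """Square root of the sum of squared deviations divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = [(x - mean) ** 2 for x in data]
    variance = sum(squared_deviations) / (n - 1)  # sample variance s^2
    return math.sqrt(variance)


exam_times = [33, 29, 45, 60, 42, 19, 52, 38, 36]  # minutes
s = sample_std(exam_times)
print(round(s, 2))                             # 12.25
print(round(statistics.stdev(exam_times), 2))  # 12.25
```

For this data the squared deviations add up to 1200, so the sample variance is 1200 / 8 = 150 and the standard deviation is the square root of 150.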
Statistics - Analyzing Data by Using Tables and Graphs
Some ways a graph can be misleading:
• Numbers are omitted on an axis, but no break is
shown.
• The tick marks on an axis are not the same
distance apart or do not have the same-sized
intervals.
• The percents on a circle graph do not have a
sum of 100.
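Two of these warning signs can be checked mechanically when the tick values or percents are available as numbers. The helpers below are hypothetical and the sample values are made up; this is a sketch, not a complete test for misleading graphs.

```python
import math


def ticks_evenly_spaced(ticks, tol=1e-9):
    """True if every pair of neighboring tick values is the same distance apart."""
    gaps = [b - a for a, b in zip(ticks, ticks[1:])]
    return all(math.isclose(gap, gaps[0], abs_tol=tol) for gap in gaps)


def percents_sum_to_100(percents, tol=1e-9):
    """True if the circle-graph percents add up to 100."""
    return math.isclose(sum(percents), 100, abs_tol=tol)


print(ticks_evenly_spaced([0, 10, 20, 30]))  # True
print(ticks_evenly_spaced([0, 10, 20, 50]))  # False: the last gap is larger
print(percents_sum_to_100([50, 30, 20]))     # True
print(percents_sum_to_100([50, 30, 10]))     # False: only 90
```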
Misleading Histograms
• What does the word “misleading” mean?
– Deceptive, or intended to create a false impression.
• Types of Misleading Histograms
– Combining Intervals: The amount of data in each interval
can make a histogram look different.
– Stretched Graphs: Graphs might be stretched vertically so
that data looks larger.
– Excluded Intervals: Intervals may be skipped on the x- or
y-axis to make the data look smaller.
Investigating Scatter Plots
• Scatter plots are similar to line graphs in
that each graph uses the horizontal ( x )
axis and vertical ( y ) axis to plot data
points.
• Scatter plots are most often used to show
correlations or relationships among data.
Investigating Scatter Plots
[Figures: three scatter plots. Weight Loss Over Time (Weight against Days worked out per month); How shirts affect salary (Salary against Shirts Owned); How Study Time Affects Grades (Overall grade against Time in hours)]
Investigating Scatter Plots
• Positive correlations occur when two
variables or values move in the same
direction.
– As the number of hours that you study
increases, your overall class grade increases.
Investigating Scatter Plots – Positive
Correlation
[Figure: scatter plot, How Study Time Affects Grades; horizontal axis: Time in hours (0 to 5); vertical axis: Overall grade (0 to 120)]
Study Time    Class Grade
0             55
0.5           61
1             67
1.5           73
2             81
2.5           89
3             91
3.5           93
4             95
4.5           97
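One common way to put a number on the direction and strength of a correlation is the correlation coefficient r. The sketch below computes Pearson's r for the study time and class grade data; the function name `pearson_r` is an assumption made for illustration, and the slides themselves do not give this formula.

```python
import math

study_time = [0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5]     # hours
class_grade = [55, 61, 67, 73, 81, 89, 91, 93, 95, 97]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(study_time, class_grade)
print(round(r, 2))  # 0.97
```

A value of r close to +1 indicates a strong positive correlation, a value close to -1 a strong negative correlation, and a value near 0 little or no linear relationship.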
Investigating Scatter Plots
• Negative Correlations occur when
variables move in opposite directions
– As the number of days per month that you
exercise increases, your actual weight
decreases.
Investigating Scatter Plots – Negative Correlation
[Figure: scatter plot, Weight Loss Over Time; horizontal axis: Days worked out per month (0 to 12); vertical axis: Weight (0 to 250)]
Work out time    Weight
0                200
0.5              205
1                190
1.5              195
2                180
2.5              190
3                170
3.5              177
4                160
4.5              170
5                150
5.5              168
6                140
6.5              150
7                130
7.5              170
8                120
8.5              130
9                110
9.5              115
10               100
10.5             120
Investigating Scatter Plots – No Correlation
[Figure: scatter plot, How does your wardrobe affect your salary; horizontal axis: Number of shirts owned (0 to 50); vertical axis: Salary (0 to 80)]
Number of shirts owned    Salary
1                         1
2                         0
3                         50
4                         30
5                         25
6                         17
7                         2
8                         40
9                         8
10                        25
11                        12
12                        7
13                        19
14                        55
15                        71
16                        9
Line of Best Fit
• A line of best fit is a line that best represents the data on
a scatter plot.
• A line of best fit may also be called a trend line since it
shows us the trend of the data
– The line may pass through some of the points, none
of the points, or all of the points.
– The purpose of the line of best fit is to show the
overall trend or pattern in the data and to allow the
reader to make predictions about future trends in the
data.
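The slides do not say how the line of best fit is chosen. A common choice, assumed here, is the least-squares line, which minimizes the sum of squared vertical distances from the points to the line. The sketch below fits it to the study time and class grade data; `fit_line` is a hypothetical helper name.

```python
study_time = [0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5]     # hours
class_grade = [55, 61, 67, 73, 81, 89, 91, 93, 95, 97]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(study_time, class_grade)
print(round(slope, 2), round(intercept, 2))  # 9.79 58.16

# Use the line to predict the grade for 2.75 hours of study
predicted = slope * 2.75 + intercept
print(round(predicted, 1))
```

Predictions are most trustworthy inside the range of the observed data. Extending this line to 5 hours gives a grade above 100, which shows why extrapolating a trend line needs care.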
Things to remember
• A scatter plot with a positive correlation
has X and Y values that rise together.
• A scatter plot with a negative correlation
has X values that rise as Y values
decrease
• A scatter plot with no correlation has no
visible relationship
• The line of best fit is the line that best
shows the trend of the data
Scatterplots
• Remember, when looking at scatterplots,
look for:
– Association (or direction)
– Form
– Strength
– Outliers
Strength
• Strength:
– At one extreme, the points
appear to follow a single
stream (whether straight,
curved, or bending all over
the place).
– At the other extreme, the
points appear as a vague
cloud with no discernable
trend or pattern.
– Note: strength is measured by the correlation coefficient (r).
Form
• Form:
– If there is a straight line
(linear) relationship, it will
appear as a cloud or
swarm of points stretched
out in a generally
consistent, straight form.
– If the relationship isn’t
straight, but curves, while
still increasing or
decreasing steadily, we
can often find ways to
make it more nearly
straight.
[Figure: scatter plot of Standard Fare (0 to 150) against Distance (0 to 3000)]
2000 Presidential Election
(Outliers)