Unit V (5.1 – 5.2): Descriptive Statistics

Descriptive Statistics
Essential Ideas for Chapter 14
Statistics and Data Analysis
The Process of Statistics
Step 1: Identify a Research Objective
• The researcher must determine the question
he/she wants answered; the question must be
detailed.
• Identify the group to be studied. This
group is called the population.
• An individual is a person or object that is a
member of the population being studied.
Variables are the characteristics of the
individuals within the population.
Qualitative variables allow for classification of
individuals based on some attribute or characteristic
(gender, eye color, etc.). Numbers can be qualitative if
they are used for identification purposes (phone number,
SSN, etc.).
Quantitative variables provide numerical
measures of individuals.
• Discrete variables: values are derived from counting.
• Continuous variables: values are derived from measuring.
Stem-and-Leaf Plots (Discrete Data)
The stem of the graph will consist of
the leading digits. The leaf of the
graph will be the rightmost digit. The
choice of the stem depends upon the
class width desired.
Stem-and-Leaf Plot of Ages of Best Actor, 1928–2002
2 | 9
3 | 011122334455556777888888899
4 | 000011111222333334455666777888999
5 | 11223556669
6 | 012
7 | 6
Note: This stem and leaf plot has a class width of 10
Split Stem-and-Leaf Plot of Ages of Best Actor, 1928–2002
2 |
2 | 9
3 | 0111223344
3 | 55556777888888899
4 | 0000111112223333344
4 | 55666777888999
5 | 11223
5 | 556669
6 | 012
6 |
7 | 6
Note: This stem and leaf plot has a class width of 5
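The stem/leaf rule described above can be sketched in Python. This is a minimal illustration, assuming non-negative integer data whose last digit is the leaf; the function name `stem_and_leaf` and its `split` flag are hypothetical, not part of any library. The ages are rebuilt directly from the leaves of the Best Actor plot, so calling the function with `split=True` (class width 5) should reproduce the split plot. The sketch prints every half-stem between the smallest and largest stem, including empty ones.

```python
from collections import defaultdict


def stem_and_leaf(values, split=False):
    """Return stem-and-leaf rows as strings like '3 | 0115'.

    Stem = every digit except the last (value // 10); leaf = last digit.
    With split=True each stem is split into a low half (leaves 0-4) and a
    high half (leaves 5-9), giving a class width of 5 instead of 10.
    Empty stems are kept so gaps in the data remain visible.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = int(leaf >= 5) if split else 0
        rows[(stem, half)].append(str(leaf))

    stems = range(min(values) // 10, max(values) // 10 + 1)
    halves = (0, 1) if split else (0,)
    return [f"{stem} | {''.join(rows[(stem, h)])}".rstrip()
            for stem in stems for h in halves]


# Ages of Best Actor winners, 1928-2002, rebuilt from the leaves of the plot above
leaves_by_stem = {2: "9", 3: "011122334455556777888888899",
                  4: "000011111222333334455666777888999",
                  5: "11223556669", 6: "012", 7: "6"}
ages = [10 * stem + int(d) for stem, leaves in leaves_by_stem.items() for d in leaves]

for row in stem_and_leaf(ages, split=True):
    print(row)
```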
Bar Graphs
Bar graphs are used to summarize both qualitative and
quantitative data. Bar graphs are constructed by labeling
each category or class of data on a horizontal axis and
the frequency or relative frequency of the category on the
vertical axis. For each category, a rectangle of equal width
is drawn whose height equals the category's frequency or
relative frequency. There should be gaps between the bars
when summarizing qualitative data.
Bar graphs summarizing quantitative data are called
histograms. There should be no gaps between the bars
in a histogram.
A frequency distribution lists the number
of occurrences for each category of data.
Constructing a Frequency Distribution and
Bar Graph for Qualitative Data - M&M
colors
Yellow, Orange, Brown, Green, Green, Blue, Brown, Red, Brown,
Brown, Orange, Brown, Red, Brown, Red, Green, Brown, Red,
Green, Yellow, Yellow, Red, Red, Brown, Orange, Yellow, Orange,
Red, Orange, Blue, Brown, Red, Yellow, Brown, Red, Brown,
Yellow, Yellow, Brown, Yellow, Yellow, Blue, Green, Yellow, Orange
Frequency Distribution
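A frequency and relative frequency distribution for the M&M data can be tallied with Python's `collections.Counter`. This is a sketch, not the slide's own method (the slide tallies by hand); the helper name `frequency_distribution` is hypothetical. Relative frequency is each count divided by the total number of observations, here 45.

```python
from collections import Counter

# The 45 M&M colors from the list above, in the same order
colors = """
Yellow Orange Brown Green Green Blue Brown Red Brown
Brown Orange Brown Red Brown Red Green Brown Red
Green Yellow Yellow Red Red Brown Orange Yellow Orange
Red Orange Blue Brown Red Yellow Brown Red Brown
Yellow Yellow Brown Yellow Yellow Blue Green Yellow Orange
""".split()


def frequency_distribution(data):
    """Map each category to (frequency, relative frequency), most common first."""
    n = len(data)
    return {category: (count, count / n)
            for category, count in Counter(data).most_common()}


table = frequency_distribution(colors)
print(f"{'Color':<8}{'Freq':>6}{'Rel. freq':>11}")
for color, (freq, rel) in table.items():
    print(f"{color:<8}{freq:>6}{rel:>11.3f}")
```

With these data the counts come out as Brown 12, Yellow 10, Red 9, Orange 6, Green 5, and Blue 3, and the relative frequencies sum to 1.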
Constructing a Frequency Distribution and
Histogram for Discrete Quantitative Data
The following data represent the number of
available cars in a household based on a random
sample of 50 households. Construct a frequency
and relative frequency distribution.
3, 4, 1, 3, 2, 0, 2, 1, 3, 3,
1, 2, 3, 2, 2, 2, 2, 2, 1, 1,
1, 1, 4, 2, 2, 1, 2, 1, 2, 2,
1, 2, 2, 0, 1, 2, 0, 1, 3, 1,
0, 2, 2, 2, 3, 2, 4, 2, 2, 5
Frequency Distribution
Number of Available Cars   Tally                        Frequency
0                          IIII                         4
1                          IIIII IIIII III              13
2                          IIIII IIIII IIIII IIIII II   22
3                          IIIII II                     7
4                          III                          3
5                          I                            1
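The same tally, plus the relative frequency column the exercise asks for, can be sketched in Python. It assumes the 50 values listed above; the text histogram (one `#` per household) is only a rough stand-in for a drawn histogram. Because the data are discrete, the sketch lists every value from the smallest to the largest, so a value with no households would still appear as a zero-height bar.

```python
from collections import Counter

# Number of available cars in each of the 50 sampled households (from the data above)
cars = [3, 4, 1, 3, 2, 0, 2, 1, 3, 3,
        1, 2, 3, 2, 2, 2, 2, 2, 1, 1,
        1, 1, 4, 2, 2, 1, 2, 1, 2, 2,
        1, 2, 2, 0, 1, 2, 0, 1, 3, 1,
        0, 2, 2, 2, 3, 2, 4, 2, 2, 5]

counts = Counter(cars)  # a missing key counts as 0
n = len(cars)

# Every value from min to max, so a value with no households still gets a row
distribution = {k: (counts[k], counts[k] / n) for k in range(min(cars), max(cars) + 1)}

print(f"{'Cars':<6}{'Freq':>5}{'Rel. freq':>11}  Histogram")
for k, (freq, rel) in distribution.items():
    print(f"{k:<6}{freq:>5}{rel:>11.2f}  {'#' * freq}")
```

The relative frequencies work out to 0.08, 0.26, 0.44, 0.14, 0.06, and 0.02, which sum to 1.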
Constructing a Frequency Distribution and
Histogram for Continuous Quantitative
Data
Frequency distributions and histograms for
continuous data are constructed by
separating the data using intervals of
numbers called classes. Our Excel project
will demonstrate how to create a frequency
distribution and histogram for continuous
data.
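Outside Excel, the idea of separating continuous data into classes can be sketched in a few lines of Python. This assumes classes of equal width that include their lower limit and exclude their upper limit; that is one common convention, and other tools may place values that fall exactly on a boundary differently. The function name `class_frequencies` and the sample values are hypothetical, chosen only for illustration.

```python
import math


def class_frequencies(data, lower_limit, width):
    """Count observations in equal-width classes [lower, lower + width).

    Returns a dict mapping (lower, upper) class limits to frequencies,
    starting at lower_limit and continuing until the largest value is covered.
    """
    if width <= 0:
        raise ValueError("class width must be positive")
    if min(data) < lower_limit:
        raise ValueError("lower_limit must not exceed the smallest observation")

    n_classes = math.floor((max(data) - lower_limit) / width) + 1
    limits = [(lower_limit + k * width, lower_limit + (k + 1) * width)
              for k in range(n_classes)]
    counts = dict.fromkeys(limits, 0)
    for x in data:
        k = math.floor((x - lower_limit) / width)  # index of the class containing x
        counts[limits[k]] += 1
    return counts


# Hypothetical measurements, for illustration only
sample = [1.2, 2.7, 3.1, 3.9, 4.4, 5.0]
for (lo, hi), freq in class_frequencies(sample, lower_limit=1, width=2).items():
    print(f"[{lo}, {hi}): {freq}")
```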
Line Graphs
A graph that uses a broken line to illustrate how
one quantity changes with respect to another is
called a line graph.
Line graphs are often used to represent
changes over time. Time is plotted on the
horizontal axis and the corresponding values
of the variable on the vertical axis. Lines are
then drawn connecting the points.
[Figure: Line Graph]
Circle Graphs
A circle graph or pie chart is a circle
divided into sectors. Each sector
represents a category of data. The
area of each sector is proportional to
the frequency of the category.
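Because each sector's area is proportional to its category's frequency, the sector's central angle is 360° times the category's relative frequency. A minimal sketch, using the M&M color counts tallied from the 45 candies listed earlier in this section; the function name `sector_angles` is hypothetical.

```python
def sector_angles(frequencies):
    """Central angle (in degrees) of each pie-chart sector: 360 * frequency / total."""
    total = sum(frequencies.values())
    return {category: 360 * freq / total for category, freq in frequencies.items()}


# M&M color frequencies counted from the 45 candies listed earlier
mm_counts = {"Brown": 12, "Yellow": 10, "Red": 9, "Orange": 6, "Green": 5, "Blue": 3}
for color, angle in sector_angles(mm_counts).items():
    print(f"{color:<7}{angle:6.1f} degrees")
```

For these counts the angles are 96°, 80°, 72°, 48°, 40°, and 24°, which sum to 360°.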
Measures of Central Tendency
The mean of a data set is the sum of all the values of
the variable in the data set divided by the number of
observations. We use $\bar{x}$ to denote the sample mean.
The median of a data set is the value that lies in the
middle of the data when arranged in ascending order.
That is, half the data is below the median and half the
data is above the median. We use M to represent the
median.
The mode of a data set is the observation of the variable
that occurs most frequently in the data set. If no
observation occurs more than once, we say the data
have no mode.
Measures of Central Tendency
Find the mean, median, and mode of the data set:
{5, 3, 8, 5, 9}
Mean: $\bar{x} = (5 + 3 + 8 + 5 + 9)/5 = 30/5 = 6$
Median:
1) Arrange the data in ascending order: 3, 5, 5, 8, 9
2) Locate the middle value, which is 5, so M = 5
Mode: The most frequently occurring data value
is 5 (it occurs twice), so the mode = 5
Measures of Central Tendency
Find the mean, median, and mode of the data set:
{10, 5, 4, 7, 1, 9}
Mean: $\bar{x} = (10 + 5 + 4 + 7 + 1 + 9)/6 = 36/6 = 6$
Median:
1) Arrange the data in ascending order: 1, 4, 5, 7, 9, 10
2) Locate the middle value. Since the middle falls between
two data values (5 and 7), the median is the mean of these
two values. Thus, M = (5 + 7)/2 = 12/2 = 6
Mode: There is no mode, since no data value occurs more than once.
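The three measures can be checked against both worked examples with a short Python sketch. It follows the slides' conventions: the median averages the two middle values when n is even, and the mode is reported as missing when no value occurs more than once. Returning several values when there is a tie for most frequent is an assumption the slides do not cover. The helper names are hypothetical; Python's built-in `statistics` module offers `mean` and `median`, but its `multimode` returns every value when all values are distinct rather than reporting no mode.

```python
from collections import Counter


def mean(data):
    """Sum of the values divided by the number of observations."""
    return sum(data) / len(data)


def median(data):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    xs = sorted(data)
    mid = len(xs) // 2
    if len(xs) % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2


def modes(data):
    """Most frequent value(s); an empty list means no value repeats (no mode)."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


for data in ([5, 3, 8, 5, 9], [10, 5, 4, 7, 1, 9]):
    print(data, "mean:", mean(data), "median:", median(data), "mode:", modes(data) or "none")
```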
Measures of Dispersion
The range, R, of a variable is the difference between the
largest data value and the smallest data value. That is,
Range = R = Largest Data Value – Smallest Data Value
The variance and standard deviation are measures that
use all the numbers in the data set to give information
about the dispersion. The steps for finding the sample
variance and standard deviation are given on page 233.
Find the range, variance, and standard
deviation for the data set: {5, 3, 8, 5, 9}
Range = 9 – 3 = 6
To find the variance, we must first find the mean. From a previous
slide, we found the mean $\bar{x} = 6$.
Subtracting the mean from each data value, squaring the
difference, summing the squares, and dividing by one less than
the number of data values gives us the sample variance:
$$s^2 = \frac{(5 - 6)^2 + (3 - 6)^2 + (8 - 6)^2 + (5 - 6)^2 + (9 - 6)^2}{5 - 1} = \frac{1 + 9 + 4 + 1 + 9}{4} = \frac{24}{4} = 6$$
Taking the square root of the sample variance gives us the
sample standard deviation $s = \sqrt{6} \approx 2.449$.
Note: It is coincidental that the mean, range, and sample variance for this data set
are all equal to 6.
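The range, sample variance, and sample standard deviation steps translate directly into Python. A minimal sketch (the function names are hypothetical); note the divisor n - 1 for the sample variance, matching the slide. The standard library's `statistics.variance` and `statistics.stdev` use the same n - 1 divisor and serve as a cross-check.

```python
import math
import statistics


def data_range(data):
    """Largest data value minus smallest data value."""
    return max(data) - min(data)


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


data = [5, 3, 8, 5, 9]
s2 = sample_variance(data)
s = math.sqrt(s2)
print("range:", data_range(data))
print("sample variance:", s2)
print(f"sample standard deviation: {s:.3f}")

# Cross-check against the standard library (also uses the n - 1 divisor)
assert math.isclose(s2, statistics.variance(data))
assert math.isclose(s, statistics.stdev(data))
```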
Measures of Position
The median divides the data into two equal
parts, with half the values above the median
and half the values below the median, so the
median is called a measure of position.
Percentiles divide the data into 100 equal
parts. For example, when a person takes the
SAT, the score is reported as a percentile. A
person who scores in the 92nd percentile
scored better than 92% of those who took the
SAT.
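The SAT interpretation above corresponds to a percentile rank: the percentage of observations that fall below a given value. A minimal sketch with a hypothetical helper `percentile_rank`; counting only values strictly below is one convention, and some texts also count half of any values tied with the given one.

```python
def percentile_rank(data, value):
    """Percentage of observations strictly below `value`."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)


data = [5, 3, 8, 5, 9]
print(percentile_rank(data, 8))  # 3 of the 5 values (3, 5, 5) are below 8
```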
The most common percentiles are quartiles.
Quartiles divide data sets into fourths, or four equal
parts.
• The 1st quartile, denoted Q1, divides the bottom 25% of
the data from the top 75%.
• The 2nd quartile, Q2, is the median.
• The 3rd quartile, denoted Q3, divides the bottom 75% of the data
from the top 25% of the data.
The Five-Number Summary
Min, Q1, Median (M), Q3, Max
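A five-number summary can be computed with a short Python sketch. Quartile conventions differ between textbooks and software; this one assumes the common "median of each half" method (when n is odd, the overall median is left out of both halves), so results may differ slightly from other methods, such as Excel's quartile functions. The function names are hypothetical.

```python
def median(sorted_xs):
    """Median of an already-sorted list."""
    mid = len(sorted_xs) // 2
    if len(sorted_xs) % 2 == 1:
        return sorted_xs[mid]
    return (sorted_xs[mid - 1] + sorted_xs[mid]) / 2


def five_number_summary(data):
    """Return (Min, Q1, M, Q3, Max) using the median-of-halves quartile method."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    lower = xs[: n // 2]          # values below the median position
    upper = xs[(n + 1) // 2 :]    # values above the median position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


print(five_number_summary([5, 3, 8, 5, 9]))
print(five_number_summary([10, 5, 4, 7, 1, 9]))
```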
Steps for Drawing a Boxplot
Step 1: Draw vertical lines at Q1, M, and Q3. Enclose these vertical
lines in a box.
Step 2: Draw a line from Q1 to the smallest data value (minimum).
Draw a line from Q3 to the largest data value (maximum).
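The two steps can be mimicked in plain text for a quick look before drawing the boxplot properly. This is a rough sketch under stated assumptions: values are mapped linearly onto a fixed number of character columns, `[` and `]` mark Q1 and Q3, `|` marks the median, and `-` marks the lines out to the minimum and maximum. The function name `text_boxplot` and its scale arguments are hypothetical.

```python
def text_boxplot(summary, axis_min, axis_max, width=41):
    """Draw a horizontal boxplot from a five-number summary (Min, Q1, M, Q3, Max)."""
    def column(value):
        # Map a data value linearly onto a character position 0 .. width-1
        return round((value - axis_min) * (width - 1) / (axis_max - axis_min))

    lo, q1, m, q3, hi = (column(v) for v in summary)
    row = [" "] * width
    for c in range(lo, q1):          # Step 2: line from the minimum up to Q1
        row[c] = "-"
    for c in range(q3 + 1, hi + 1):  # Step 2: line from Q3 out to the maximum
        row[c] = "-"
    for c in range(q1, q3 + 1):      # Step 1: the box from Q1 to Q3
        row[c] = "="
    row[q1], row[q3] = "[", "]"      # box edges at Q1 and Q3
    row[m] = "|"                     # vertical line at the median
    return "".join(row).rstrip()


# Five-number summary of {5, 3, 8, 5, 9} under median-of-halves quartiles
print(text_boxplot((3, 4, 5, 8.5, 9), axis_min=0, axis_max=10, width=21))
```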