Distribution Terminology

Advancing with Quantitative Data
Symmetrical
refers to data in which both sides
are (more or less) the same when
the graph is folded vertically down
the middle
Bell-shaped is a special type of symmetrical distribution:
it has a center mound with two
sloping tails
Uniform
refers to data in which every
class has equal or
approximately equal
frequency
Skewed (left or right)
refers to data in which one
side (tail) is longer than the
other side
the direction of skewness is
on the side of the longer tail
Skewness
The following picture is an example of what kind of
distribution?
a. right-skewed
b. left-skewed
c. symmetric
Bimodal (multi-modal)
refers to data in which two
(or more) classes have the
largest frequency & are
separated by at least one
other class
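The Uniform and Bimodal definitions above can be checked mechanically against a list of class frequencies. The following is a minimal sketch, not a standard test: the function names, the `tolerance` parameter (standing in for "approximately equal"), and the frequency lists are all hypothetical, and the bimodal check applies the definition literally (the peaks must have exactly the same, largest frequency).

```python
def is_uniform(frequencies, tolerance=0):
    """True if every class frequency is within `tolerance` of the others.

    `tolerance` is an assumed cutoff for "approximately equal".
    """
    return max(frequencies) - min(frequencies) <= tolerance


def is_bimodal(frequencies):
    """True if two or more classes share the largest frequency and at least
    one class with a smaller frequency lies between them."""
    peak = max(frequencies)
    peaks = [i for i, f in enumerate(frequencies) if f == peak]
    if len(peaks) < 2:
        return False
    # Classes strictly between the first and last peak.
    between = frequencies[peaks[0] + 1:peaks[-1]]
    return any(f < peak for f in between)


# Hypothetical class frequencies, for illustration only.
print(is_uniform([5, 5, 5, 5]))        # True
print(is_bimodal([2, 8, 3, 8, 1]))     # True
print(is_bimodal([1, 4, 8, 4, 1]))     # False (one peak only)
```

In practice, graphs are often called bimodal when the two peaks are only roughly equal in height, which this literal check would miss.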
Approximately Normal
• A phrase used to describe a
bell-shaped curve (unimodal with minimal
skewness)
Outlier
• When examining data, look for outliers!
• Outlier: an observation that lies outside the overall
pattern of a distribution.
• FORMULA:
• An observation is considered an outlier if it falls more
than 1.5 × IQR below Q1 or above Q3.
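The 1.5 × IQR rule can be sketched in a few lines of Python. The slides do not say how Q1 and Q3 are computed, so this sketch assumes one common classroom convention: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the middle value when the count is odd. The function names are hypothetical; the data set is the one from the Examples slide later in this transcript.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumed convention: with an odd count, the middle observation
    is excluded from both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return median(lower), median(upper)


def find_outliers(values):
    """Return the observations more than 1.5 x IQR below Q1 or above Q3."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]


data = [4, 7, 7, 7, 8, 9, 9, 15]
print(quartiles(data))      # (7.0, 9.0), so IQR = 2 and the fences are 4 and 12
print(find_outliers(data))  # [15]
```

Note that 4 sits exactly on the lower fence, so it is not "more than" 1.5 × IQR below Q1 and is not flagged. Other quartile conventions can move the fences slightly on other data sets.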
Mean
• Mean = (1/n) Σ xi
• The mean is a measure of center in a distribution
• The mean is nonresistant to very large or very small
observations
Median
Steps to finding the median of a distribution:
1. Arrange the observations from smallest to largest
2. If the number of observations is odd, the median is the
center observation
3. If the number of observations is even, the median is the
average of the two observations in the center
• The median is resistant to very large or very small
observations
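To see what "resistant" means in practice, the sketch below follows the three median steps above and compares the mean and median before and after one value is replaced by an extreme one. The two small data sets are made up for illustration and do not come from the slides.

```python
def mean(values):
    """Mean = (1/n) * sum of the observations."""
    return sum(values) / len(values)


def median(values):
    ordered = sorted(values)              # step 1: smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                        # step 2: odd count, center observation
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # step 3: average the two center values


typical = [2, 3, 4, 5, 6]         # hypothetical observations
with_extreme = [2, 3, 4, 5, 60]   # same data, largest value replaced by an extreme one

print(mean(typical), median(typical))            # 4.0 4
print(mean(with_extreme), median(with_extreme))  # 14.8 4
```

The single extreme value pulls the mean from 4.0 up to 14.8, while the median stays at 4.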
Standard Deviation (s)
• s measures the spread about the mean and should only
be used when the mean is chosen as the measure of
center
• s = 0 only when there is no spread. This happens only
when all observations have the same value. As
observations become more spread out about their
mean, s gets larger.
• s is strongly influenced by outliers
Variance (s²)
• Variance: the average of the squares of the deviations
of the observations from their mean.
s² = (1/(n – 1)) Σ (xi – x̄)²
Standard deviation is the square root of variance
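The variance formula and its square root translate directly into code. The sketch below uses the n – 1 divisor from the formula above; the function names and the three small data sets (all with mean 5) are hypothetical and chosen only to show s growing as the observations spread out.

```python
import math


def sample_variance(values):
    """s^2 = sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """s is the square root of the variance."""
    return math.sqrt(sample_variance(values))


# Hypothetical data sets, each with mean 5.
identical = [5, 5, 5, 5]   # no spread
close = [4, 5, 5, 6]       # small spread
spread = [1, 3, 7, 9]      # larger spread

for data in (identical, close, spread):
    print(data, round(sample_variance(data), 2), round(sample_std(data), 2))
```

For the identical values both s² and s are 0, and both increase as the observations move further from the mean.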
Purpose of a graph
• To help understand the data
• Look for an overall pattern and for striking deviations from
the pattern
An outlier in any graph of data is an individual observation that
falls outside the overall pattern of the graph
Overall Pattern of a Distribution
To describe the overall pattern of a distribution:
Give the center and spread
See if the distribution has a simple shape that you can describe
in a few words.
Midpoint is the value with half the observations taking smaller
values and half taking larger values.
Spread is measured from the smallest to largest (ignoring
outliers).
Key Terms
• When describing a distribution, you should always
include at minimum:
• Center
• Shape
• Spread
Examples
• Which of the following numbers are outliers?
4 7 7 7 8 9 9 15
• In which of the following situations does the standard
deviation equal zero?
a. When there is an outlier.
b. When all the observations are less than zero.
c. When all the observations are greater than zero.
d. When there is no spread (all observations have the same
value).
Examples Continued:
• A numerical summary should report which of the
following?
a. Center, spread, variability
b. Mean, median, mode
c. Standard deviation
d. IQR
• What is the relationship between standard deviation and
variance?
a. Standard deviation is the square root of variance.
b. Standard deviation is 2 times the variance.
c. They are the same value.
d. They are both equal to zero.
Examples Continued:
What strikes you as the most distinctive difference
among the distributions of exam scores in classes A, B,
and C?
The distribution of a set of data describes how the data is
spread out.
Two distributions can be compared using one of the three
averages and the range.
For example, the number of cars sold by two salesmen each
day for a week is shown below.
Matt    5   7   6   5   7   8   6
Jamie   3   6   4   8   12  9   8
Who is the better salesman?
To decide which salesman is better, let’s compare the mean
number of cars sold by each one.
Matt: Mean = (5 + 7 + 6 + 5 + 7 + 8 + 6) / 7 = 44 / 7 = 6.3 (to 1 d.p.)
Jamie: Mean = (3 + 6 + 4 + 8 + 12 + 9 + 8) / 7 = 50 / 7 = 7.1 (to 1 d.p.)
This tells us that, on average, Jamie sold more cars each day.
Now let’s compare the range for each salesman.
Matt: Range = 8 – 5 = 3
Jamie: Range = 12 – 3 = 9
The range for the number of cars sold each day is smaller for
Matt. This means that he is a more consistent or reliable
salesman.
We could argue that Jamie is better because he sells more
on average, or that Matt is better because he is more
consistent.
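The comparison above can be reproduced with a short script. This is only a sketch of the arithmetic already shown: the sales figures are the ones given for Matt and Jamie, and the helper names `mean` and `value_range` are chosen here for illustration.

```python
matt = [5, 7, 6, 5, 7, 8, 6]
jamie = [3, 6, 4, 8, 12, 9, 8]


def mean(values):
    """Total divided by the number of days."""
    return sum(values) / len(values)


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


for name, sales in (("Matt", matt), ("Jamie", jamie)):
    print(f"{name}: mean = {mean(sales):.1f}, range = {value_range(sales)}")
# Matt: mean = 6.3, range = 3
# Jamie: mean = 7.1, range = 9
```

The higher mean belongs to Jamie and the smaller range to Matt, which is why either can be argued to be the better salesman.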