Study of Measures of Dispersion and Position
DATA
-QUALITATIVE: cannot be given a numerical value.
Examples: gender, nationality, television show preference
-QUANTITATIVE (DISCRETE or CONTINUOUS): can be given and analyzed as numerical values.
Examples: test scores, weights of objects, hours studied



A type of data is discrete if only a finite number of values are possible, or if there is a gap on the number line between every two possible values.
Example: A 5-question quiz is given in a math class. The number of correct answers on a student's quiz is an example of discrete data. The number of correct answers must be one of the following: 0, 1, 2, 3, 4, or 5. There are only finitely many possible values, so this data is discrete. Also, if we drew a number line and placed each possible value on it, we would see a space between each pair of values.
Example: To obtain a driver's license, a person must pass a written exam. The number of attempts it takes a person to pass this test is also an example of discrete data. A person could take it once, twice, 3 times, and so on, so the possible values are 1, 2, 3, … . There are infinitely many possible values, but if we put them on a number line, we would see a space between each pair of values.





Continuous data makes up the rest of numerical data. This type of data is usually associated with some sort of physical measurement.
Ex. The height of individuals is an example of continuous data. Is it possible for a person to be 5'2" tall? Sure. How about 5'2.5"? How about 5'2.525"? The values we can record depend on the accuracy of our measuring device.
One general way to tell if data is continuous is to ask yourself if
it is possible for the data to take on values that are fractions or
decimals. If your answer is yes, this is usually continuous data.
Ex. The length of time a battery works is an example of
continuous data. Could it work 200 hours? How about
200.7? 200.7354?


Many continuous variables have data
distributions that are bell-shaped
Ex: heights of adults, body temperature of
animals, cholesterol levels of adults


Ex: histograms of data collected on the heights of women
a) Heights of 100 women
b) Increase the sample size and decrease the width of the intervals
c) Continue to increase the sample size and decrease the intervals
d) The normal distribution for the entire population
A normal distribution is symmetric about the mean.
What are some other univariate data (data with one variable) that can be modeled using a normal distribution?
NEGATIVELY SKEWED
-The majority of the data values fall to the right of the mean.
-The mean is to the left of the median, and the mean and the median are to the left of the mode.
POSITIVELY SKEWED
-The majority of the data values fall to the left of the mean.
-The mean falls to the right of the median, and both the mean and the median fall to the right of the mode.
Visually, what indicates the direction of the skewness?
Skewness is the degree of departure from symmetry of a distribution. A
positively skewed distribution has a "tail" which is pulled in the positive
direction. A negatively skewed distribution has a "tail" which is pulled in
the negative direction.
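As a rough numerical check of skew direction, the sketch below compares the mean and median of a small hypothetical data set and also computes the moment coefficient of skewness, g1 = m3 / m2^1.5. That coefficient is a common measure this lesson does not itself define, and the data and the helper name `moment_skewness` are illustrative assumptions. A positive g1 together with a mean to the right of the median is consistent with positive skew, though the mean/median comparison is a guideline rather than a guarantee.

```python
import statistics

# Hypothetical data: one large value pulls a tail to the right
data = [1, 2, 2, 3, 3, 3, 4, 10]

def moment_skewness(xs):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5

mean = statistics.fmean(data)
median = statistics.median(data)
g1 = moment_skewness(data)
print(f"mean = {mean}, median = {median}, g1 = {g1:.2f}")
```

Here the mean (3.5) lies to the right of the median (3.0) and g1 is positive, both pointing to a tail pulled in the positive direction.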
Kurtosis is the degree of peakedness of a
distribution. A normal distribution is a
mesokurtic distribution. A pure leptokurtic
distribution has a higher peak than the normal
distribution and has heavier tails. A pure
platykurtic distribution has a lower peak than
a normal distribution and lighter tails.
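Kurtosis is often quantified as excess kurtosis, m4 / m2^2 - 3, which is about 0 for normal data, positive for leptokurtic data and negative for platykurtic data. This convention and the helper name `excess_kurtosis` are assumptions not taken from the lesson; the sketch uses population moments and a hypothetical, evenly spread data set.

```python
def excess_kurtosis(xs):
    """Excess kurtosis m4 / m2**2 - 3 using population moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5]  # hypothetical evenly spread values, no peak
print(f"{excess_kurtosis(flat):.2f}")  # -1.30
```

The negative result suggests a flatter, lighter-tailed (platykurtic) shape than the normal distribution.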
Representing and Interpreting Data
Data can be represented by measures
of central tendency and measures of
variation, such as range, quartiles
and the interquartile range.
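The range, quartiles and interquartile range can be computed as in the sketch below. Quartile conventions differ between textbooks and software; this sketch assumes the median-of-halves method (the middle value is left out of both halves when n is odd), and the helper `quartiles` and the data are hypothetical. Python's `statistics.quantiles` uses a different default method and may give slightly different values.

```python
import statistics

def quartiles(data):
    """Q1, median, Q3 by the median-of-halves method (middle value excluded when n is odd)."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]        # values below the median
    upper = xs[(n + 1) // 2 :]  # values above the median
    return statistics.median(lower), statistics.median(xs), statistics.median(upper)

data = [1, 3, 5, 7, 9, 11, 13, 15]  # hypothetical data
q1, q2, q3 = quartiles(data)
print("range:", max(data) - min(data))  # 14
print("Q1, median, Q3:", q1, q2, q3)    # 4.0 8.0 12.0
print("IQR:", q3 - q1)                  # 8.0
```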
Measures of variation show the spread of the
data. Quartiles and the interquartile range
describe the spread in the middle half of the
data. Mean absolute deviation, variance and
standard deviation describe the spread of the
data around the mean. Two sets of data may
have the same range and mean, but the spread
of the data can be very different.
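To illustrate that last point, the sketch below uses two hypothetical data sets with the same mean (5) and the same range (10) and measures their spread with the population standard deviation from Python's `statistics` module.

```python
import statistics

# Hypothetical data sets: same mean (5) and same range (10)
a = [0, 5, 5, 5, 10]   # most values sit at the mean
b = [0, 0, 5, 10, 10]  # most values sit at the extremes

for name, xs in (("A", a), ("B", b)):
    print(
        f"Set {name}: mean = {statistics.fmean(xs)}, "
        f"range = {max(xs) - min(xs)}, "
        f"population SD = {statistics.pstdev(xs):.2f}"
    )
```

Set B's values sit farther from the mean, so its standard deviation is larger even though the mean and range match.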





Σ = summation notation
µ = arithmetic mean of a population
x̄ = arithmetic mean of a sample
σ² = variance
σ = standard deviation
Statistical Notations
Mean Absolute Deviation (MAD)
Definition: The average of the absolute
values of the differences between the
mean and each value in the data set.
(It measures the average distance of a data value from the mean.)
Step 1: Find the mean
Step 2: Find the sum of the absolute values of
the differences between each value in
the set of data and the mean.
Step 3: Divide by the number of values in the set.
Formula: MAD = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \mu \rvert
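The three steps above translate directly into code. This is a minimal sketch; the function name `mean_absolute_deviation` is an illustrative choice, and the sample data is the set 90, 90, 100, 110, 110 that appears later in this lesson.

```python
def mean_absolute_deviation(data):
    # Step 1: find the mean
    mean = sum(data) / len(data)
    # Step 2: sum the absolute differences between each value and the mean
    total = sum(abs(x - mean) for x in data)
    # Step 3: divide by the number of values
    return total / len(data)

print(mean_absolute_deviation([90, 90, 100, 110, 110]))  # 8.0
```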
Calculating the Mean Absolute Deviation (MAD)
The top 10 finishing times (in seconds) for runners in two men's races are given below. The times in a 100 meter dash are in set A and the times in a 200 meter dash are in set B.
Compare the spread of data
for the two sets using the
range and the mean absolute
deviation.
MAD = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \mu \rvert
Set A (100 m, seconds)   Set B (200 m, seconds)
10.62                    21.37
10.94                    21.40
10.94                    11.23
10.92                    22.23
11.05                    22.34
11.13                    22.34
11.15                    22.36
11.28                    22.60
11.29                    22.66
11.32                    22.73
MAD = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \mu \rvert
Set A: Mean = ______
Set B: Mean = ______
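A minimal sketch of the comparison for Set A, using the times from the table and the MAD steps (mean, sum of absolute differences, divide by n). Set B can be handled the same way by substituting its values.

```python
# Set A: top 10 finishing times (seconds) in the 100 meter dash
set_a = [10.62, 10.94, 10.94, 10.92, 11.05, 11.13, 11.15, 11.28, 11.29, 11.32]

mean_a = sum(set_a) / len(set_a)                              # Step 1: the mean
range_a = max(set_a) - min(set_a)                             # largest minus smallest
mad_a = sum(abs(x - mean_a) for x in set_a) / len(set_a)      # Steps 2 and 3
print(f"Set A: mean = {mean_a:.3f} s, range = {range_a:.2f} s, MAD = {mad_a:.3f} s")
```

For Set A this gives a mean of 11.064 s, a range of 0.70 s and a MAD of 0.170 s.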




For any bell-shaped curve, approximately:
68% of the values fall within one standard deviation of the mean in
either direction
95% of the values fall within 2 standard deviations of the mean in
either direction
99.7% of the values fall within 3 standard deviations of the mean in
either direction
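For an exactly normal distribution, the fraction of values within k standard deviations of the mean is erf(k / sqrt(2)). The sketch below, with a hypothetical helper `fraction_within`, uses Python's `math.erf` to reproduce the approximate 68%, 95% and 99.7% figures.

```python
import math

def fraction_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
```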




The variance and standard deviation are used:
-to determine the spread of data. If the variance or standard deviation is large, the data are more dispersed. This information is useful in comparing two (or more) data sets to determine which is more (most) variable.
-to determine the consistency of a variable. For example, in the manufacture of fittings such as nuts and bolts, the variation in the diameters must be small, or the parts will not fit together.
-to determine the number of data values that fall within a specified interval in a distribution.
-quite often in inferential statistics.
Variance: σ² = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2
-The average of the squared distances of each value from the mean.
-Because the distances are squared, the variance is in squared units. To express the spread in the original units of the data, we take the square root, which gives us the standard deviation.
-Is the square root of the variance
-Informally, "the typical distance values fall from the mean"
-It measures variability by summarizing how far individual data values are from the mean.
Set   Numbers                   Mean   Standard Deviation
1     100, 100, 100, 100, 100   100    0
2     90, 90, 100, 110, 110     100    10 (dividing by n - 1; dividing by n gives about 8.9)
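The sketch below checks both rows with Python's `statistics` module. `pstdev` divides by n (the population formula used in this lesson) and `stdev` divides by n - 1 (the sample formula); the value 10 for set 2 matches the sample version, while the population version gives about 8.94.

```python
import statistics

set_1 = [100, 100, 100, 100, 100]
set_2 = [90, 90, 100, 110, 110]

for name, xs in (("1", set_1), ("2", set_2)):
    print(
        f"Set {name}: mean = {statistics.fmean(xs)}, "
        f"population SD (divide by n) = {statistics.pstdev(xs):.2f}, "
        f"sample SD (divide by n - 1) = {statistics.stdev(xs):.2f}"
    )
```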
Given the definition, what is the formula for standard deviation?
σ = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}
Standard Deviation of a
Population Data Set
Example: Renee surveyed his classmates to find out how many hours
of exercise each student did per week. Find the standard
deviation of the data set to the nearest tenth:
3, 10, 11, 10, 9, 11, 12, 8, 11, 8, 7, 12, 11, 11, 5
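A minimal sketch of the population standard deviation formula σ = sqrt(Σ(x_i - μ)² / n) applied to Renee's data; the helper name `population_sd` is an illustrative assumption.

```python
import math

# Hours of exercise per week from the Renee example
hours = [3, 10, 11, 10, 9, 11, 12, 8, 11, 8, 7, 12, 11, 11, 5]

def population_sd(data):
    """Square root of the average squared distance from the mean (divide by n)."""
    n = len(data)
    mu = sum(data) / n
    variance = sum((x - mu) ** 2 for x in data) / n
    return math.sqrt(variance)

sd = population_sd(hours)
print(f"mean = {sum(hours) / len(hours):.2f}, sd = {sd:.1f}")  # mean = 9.27, sd = 2.5
```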
Example: Josie wants to see if she is charging enough for a babysitting
job. She charges $7.50 per hour. She surveyed her friends
to see what they are charging per hour.
The results are: $8, $8.50, $9, $7.50, $10, $8.25 and $8.75.
Determine the mean absolute deviation and use the result
to determine if Josie should change her babysitting rate.
Explain your reasoning.
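One way to approach Josie's question is sketched below: compute the MAD of her friends' rates and express her $7.50 as a number of MADs from their mean. Treating "more than one MAD below the mean" as a sign that her rate is unusually low is an assumed rule of thumb, not something the lesson states; the variable names are illustrative.

```python
# Hourly rates Josie's friends charge (from the example), in dollars
rates = [8.00, 8.50, 9.00, 7.50, 10.00, 8.25, 8.75]
josie_rate = 7.50

def mean_absolute_deviation(data):
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)

mean = sum(rates) / len(rates)
mad = mean_absolute_deviation(rates)
# Signed distance of Josie's rate from the mean, measured in MADs
distance_in_mads = (josie_rate - mean) / mad
print(f"mean = ${mean:.2f}, MAD = ${mad:.2f}, Josie is {distance_in_mads:.1f} MADs from the mean")
```

Her rate sits well over one MAD below the mean of her friends' rates, which supports the idea of raising it.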
Example: Mr. Martin keeps track of the number of text messages his son sends per month. He feels that his son should spend less time texting and more time studying algebra. As an incentive to study more, Mr. Martin has agreed to buy his son a new computer if his number of texts falls more than 1 standard deviation below the mean. Use the current data set to determine the number of texts Mr. Martin's son may send.
Month       Messages
October     985
November    1005
December    1100
January     950
February    1200
March       1010
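A sketch of the calculation, assuming the incentive means sending fewer than μ - σ texts in a month and using the population standard deviation (dividing by n). Using the sample standard deviation instead would give a somewhat larger σ and a lower threshold.

```python
import statistics

# Monthly text-message counts from the table (October through March)
messages = [985, 1005, 1100, 950, 1200, 1010]

mean = statistics.fmean(messages)
sigma = statistics.pstdev(messages)  # population SD, dividing by n
lower = mean - sigma                 # one standard deviation below the mean
upper = mean + sigma                 # one standard deviation above the mean
print(f"mean = {mean:.1f}, sigma = {sigma:.1f}")
print(f"one SD below the mean: fewer than {lower:.0f} texts per month")
print(f"within one SD: {lower:.0f} to {upper:.0f} texts")
```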
Distance from the mean can be measured in the actual units of the data, or in standard deviation units. Measured in standard deviation units, it is called the z-score, or standard measure:
z = \frac{x_i - \mu}{\sigma}
Why do that?
So we can compare values from two different data sets, each relative to its position within its own data set.
Consider this problem…
Amy scored a 31 on the mathematics portion of her 2009 ACT® (µ = 21, σ = 5.3).
Stephanie scored a 720 on the mathematics portion of her 2009 SAT® (µ = 515, σ = 116.0).
Whose achievement was higher on the mathematics portion of their national achievement test?
Amy: z = (31 - 21) / 5.3 ≈ 1.89
Stephanie: z = (720 - 515) / 116.0 ≈ 1.77
1.89 vs. 1.77
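The z-score comparison can be sketched as follows; `z_score` is an illustrative helper implementing z = (x - µ) / σ with the means and standard deviations given in the problem.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean mu."""
    return (x - mu) / sigma

amy = z_score(31, 21, 5.3)            # ACT mathematics
stephanie = z_score(720, 515, 116.0)  # SAT mathematics
print(f"Amy: {amy:.2f}, Stephanie: {stephanie:.2f}")  # Amy: 1.89, Stephanie: 1.77
```

Amy's score lies farther above her test's mean, measured in standard deviations, than Stephanie's does.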
What Does This Mean?