Parameters and Statistics
A statistic is a descriptive measure
computed from a sample of data.
A parameter is a descriptive measure
computed from an entire population of data.
Measures of Central Tendency
- Arithmetic Mean -
The arithmetic mean of a set of data is
the sum of the data values divided by
the number of observations.
Sample Mean
If the data set is from a sample, then the
sample mean, $\bar{X}$, is:
$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
Population Mean
If the data set is from a population, then
the population mean, $\mu$, is:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$$
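Both formulas perform the same arithmetic; only the data set (a sample of size n or a population of size N) and the symbol ($\bar{X}$ or $\mu$) differ. A minimal Python sketch, using made-up observations and a hypothetical helper name `arithmetic_mean`, checked against the standard library's `statistics.fmean`:

```python
from statistics import fmean


def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations.

    The same arithmetic gives the sample mean (X-bar, divide by n) or the
    population mean (mu, divide by N); only the data set differs.
    """
    if len(values) == 0:
        raise ValueError("the mean needs at least one observation")
    return sum(values) / len(values)


data = [4, 8, 6, 5, 3, 7]  # hypothetical observations
print(arithmetic_mean(data))                 # 5.5
print(arithmetic_mean(data) == fmean(data))  # True
```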
Measures of Central Tendency
- Median -
An ordered array is an arrangement of data
in either ascending or descending order.
Once the data are arranged in ascending
order, the median is the value such that 50%
of the observations are smaller and 50% of
the observations are larger.
Measures of Central Tendency
- Median -
If the sample size n is an odd number, the
median, Xm, is the middle observation. If the
sample size n is an even number, the
median, Xm, is the average of the two middle
observations. The median will be located in
the 0.50(n+1)th ordered position.
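The position rule covers both cases at once: for odd n, 0.50(n+1) is a whole number and points at the middle observation; for even n it ends in .5 and falls halfway between the two middle observations. A sketch in Python (the function name `median_by_position` is hypothetical), compared with `statistics.median`:

```python
import statistics


def median_by_position(values):
    """Median located at the 0.50*(n+1)th ordered position (1-based).

    For even n that position lies halfway between the two middle
    observations, so their average is returned.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median needs at least one observation")
    pos = 0.50 * (n + 1)   # 1-based ordered position
    lower = int(pos)       # whole part of the position
    frac = pos - lower     # 0.0 for odd n, 0.5 for even n
    if frac == 0:
        return ordered[lower - 1]
    # Move frac of the way from the lower to the upper middle observation.
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


data_odd = [7, 1, 5, 3, 9]   # hypothetical data, n = 5
data_even = [7, 1, 5, 3]     # hypothetical data, n = 4
print(median_by_position(data_odd))   # 5
print(median_by_position(data_even))  # 4.0
print(statistics.median(data_even))   # 4.0
```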
Measures of Central Tendency
- Mode -
The mode, if one exists, is the most
frequently occurring observation in
the sample or population.
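Because a mode may not exist, or there may be several, a sketch has to pick a convention. The one below, which is an assumption rather than something the slide specifies, reports no mode when every observation occurs exactly once and returns all tied values otherwise; the name `modes` is hypothetical:

```python
from collections import Counter


def modes(values):
    """Most frequently occurring observation(s), in order of first appearance.

    Assumed convention: if every observation occurs exactly once, there is
    no mode and an empty list is returned. Ties at the top count are all kept.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats, so no mode under this convention
    return [value for value, count in counts.items() if count == top]


print(modes([2, 3, 3, 5, 7, 3]))  # [3]
print(modes([1, 2, 3]))           # []
print(modes([1, 1, 2, 2, 3]))     # [1, 2]
```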
Shape of the Distribution
The shape of the distribution is said
to be symmetric if the observations
are balanced, or evenly distributed,
about the mean. In a symmetric
distribution the mean and median are
equal.
Shape of the Distribution
A distribution is skewed if the observations are
not symmetrically distributed above and below
the mean. A positively skewed (or skewed to the
right) distribution has a tail that extends to the
right in the direction of positive values. A
negatively skewed (or skewed to the left)
distribution has a tail that extends to the left in
the direction of negative values.
Shapes of the Distribution
[Figure: frequency histograms of a Symmetric Distribution, a Negatively Skewed Distribution, and a Positively Skewed Distribution]
Measures of Variability
- The Range -
The range of a set of data is
the difference between the
largest and smallest observations.
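The range needs only the two extreme observations. A one-function Python sketch (the name `data_range` is hypothetical) with made-up data:

```python
def data_range(values):
    """Largest observation minus smallest observation."""
    if len(values) == 0:
        raise ValueError("the range needs at least one observation")
    return max(values) - min(values)


print(data_range([12, 7, 3, 18, 9]))  # 15 (18 - 3)
```

Because it depends only on the two extremes, a single outlying value can change the range a great deal.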
Measures of Variability
- Sample Variance -
The sample variance, $s^2$, is the sum of the squared
differences between each observation and the
sample mean divided by the sample size minus 1.
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}$$
Measures of Variability
- Short-cut Formulas for $s^2$ -
Short-cut formulas for the sample variance, $s^2$, are:
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$$
or
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{X}^2}{n - 1}$$
Measures of Variability
- Population Variance The population variance, 2, is the sum of the
squared differences between each observation and
the population mean divided by the population size,
N.
N
 
2
 (x  )
i 1
2
i
N
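The only computational change from the sample variance is the divisor N in place of n − 1. A Python sketch (hypothetical helper `population_variance`), treating the same made-up numbers as a complete population and comparing with `statistics.pvariance`:

```python
import statistics


def population_variance(values):
    """sigma^2: squared deviations from mu, summed, divided by N."""
    N = len(values)  # population size, named as in the formula
    if N == 0:
        raise ValueError("the population variance needs at least one observation")
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


population = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical complete population, mu = 5
print(population_variance(population))  # 4.0 (32 / 8)
print(statistics.pvariance(population)) # standard-library equivalent
```

For the same numbers, dividing by N = 8 instead of n − 1 = 7 gives a smaller value (4.0 versus about 4.571).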
Measures of Variability
- Sample Standard Deviation -
The sample standard deviation, s, is the positive
square root of the variance, and is defined as:
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}}$$
Measures of Variability
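- Population Standard Deviation -
The population standard deviation, $\sigma$, is the positive
square root of the population variance:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$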
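Each standard deviation is the square root of the matching variance, so the two versions differ only in the divisor. A Python sketch of both (hypothetical names `sample_std` and `population_std`), checked against `statistics.stdev` and `statistics.pstdev`:

```python
import math
import statistics


def sample_std(values):
    """s = sqrt(s^2), where s^2 uses the n - 1 divisor."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample standard deviation needs at least two observations")
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


def population_std(values):
    """sigma = sqrt(sigma^2), where sigma^2 uses the N divisor."""
    N = len(values)  # population size, named as in the formula
    if N == 0:
        raise ValueError("the population standard deviation needs at least one observation")
    mu = sum(values) / N
    return math.sqrt(sum((x - mu) ** 2 for x in values) / N)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(population_std(data))      # 2.0
print(sample_std(data))          # about 2.138
print(statistics.pstdev(data), statistics.stdev(data))  # library equivalents
```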
The Empirical Rule
(the 68%, 95%, or almost all rule)
For a set of data with a mound-shaped histogram, the
Empirical Rule is:
• approximately 68% of the observations are contained within a
distance of one standard deviation around the mean; $\mu \pm 1\sigma$
• approximately 95% of the observations are contained within a
distance of two standard deviations around the mean; $\mu \pm 2\sigma$
• almost all of the observations are contained within a distance
of three standard deviations around the mean; $\mu \pm 3\sigma$
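One way to see the rule in action is to simulate mound-shaped data and count how many observations fall within k standard deviations of the mean. The sketch below assumes normally distributed data generated with `random.Random.gauss` (mean 50, standard deviation 10, and the seed are arbitrary choices), and it uses the sample mean and sample standard deviation as estimates of $\mu$ and $\sigma$; the helper name `share_within` is hypothetical:

```python
import random
import statistics


def share_within(values, k):
    """Fraction of observations within k standard deviations of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    inside = sum(abs(x - mean) <= k * sd for x in values)  # True counts as 1
    return inside / len(values)


rng = random.Random(12345)                          # arbitrary seed for repeatability
data = [rng.gauss(50, 10) for _ in range(10_000)]   # simulated mound-shaped data
for k in (1, 2, 3):
    print(k, round(share_within(data, k), 3))       # expect roughly 0.68, 0.95, 0.997
```

The printed fractions vary slightly with the seed and sample size; for data that are not mound-shaped the rule need not hold.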
Coefficient of Variation
The Coefficient of Variation, CV, is a measure
of relative dispersion that expresses the
standard deviation as a percentage of the
mean (provided the mean is positive).
The sample coefficient of variation is
$$CV = \frac{s}{\bar{X}} \times 100\% \quad \text{if } \bar{X} > 0$$
Coefficient of Variation
The population coefficient of variation is
$$CV = \frac{\sigma}{\mu} \times 100\% \quad \text{if } \mu > 0$$
Percentiles and Quartiles
Data must first be in ascending order.
Percentiles separate large ordered data sets
into 100ths. The Pth percentile is a number
such that P percent of all the observations are
at or below that number.
Quartiles are descriptive measures that
separate large ordered data sets into four
quarters.
Percentiles and Quartiles
The first quartile, Q1, is another name for the
25th percentile. The first quartile divides the
ordered data such that 25% of the observations
are at or below this value. Q1 is located in the
0.25(n+1)th position when the data are in ascending
order. That is,
$$Q_1 \text{ is at the } \frac{n+1}{4}\text{th ordered position}$$
Percentiles and Quartiles
The third quartile, Q3, is another name for the
75th percentile. The third quartile divides the
ordered data such that 75% of the observations
are at or below this value. Q3 is located in the
0.75(n+1)th position when the data are in
ascending order. That is,
$$Q_3 \text{ is at the } \frac{3(n+1)}{4}\text{th ordered position}$$
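When (n+1)/4 or 3(n+1)/4 is not a whole number, a common convention, assumed here since the slides do not say, is to interpolate linearly between the two neighbouring ordered observations. The sketch below does that with hypothetical helpers `value_at_position` and `quartiles`, clamping positions into 1..n for very small samples (also an assumption). On this made-up data it agrees with `statistics.quantiles(..., method="exclusive")`, which uses the same (n+1)-based positions:

```python
import statistics


def value_at_position(ordered, pos):
    """Value at a 1-based, possibly fractional, position in sorted data.

    Fractional positions interpolate linearly between neighbours; positions
    outside 1..n are clamped (an assumed convention for tiny samples).
    """
    n = len(ordered)
    pos = min(max(pos, 1), n)
    lower = int(pos)
    frac = pos - lower
    if frac == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


def quartiles(values):
    """Q1 at the (n+1)/4th and Q3 at the 3(n+1)/4th ordered position."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("quartiles need at least one observation")
    return value_at_position(ordered, (n + 1) / 4), value_at_position(ordered, 3 * (n + 1) / 4)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, n = 8
print(quartiles(data))           # (4.0, 6.5): positions 2.25 and 6.75
print(statistics.quantiles(data, n=4, method="exclusive"))  # [4.0, 4.5, 6.5]
```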
Interquartile Range
The Interquartile Range (IQR) measures
the spread in the middle 50% of the data;
that is, it is the difference between the
observations at the 25th and the 75th
percentiles:
$$IQR = Q_3 - Q_1$$