Rules of Data Dispersion
• By using the mean and standard deviation, we
can find the percentage of all observations
that fall within a given interval about the
mean.
• Empirical Rule
• Chebyshev’s Theorem
(IMPORTANT TERM: AT LEAST)
Empirical Rule
Applicable to a symmetric, bell-shaped
(normal) distribution.
There are 3 rules:
i. About 68% of the observations lie in the interval
(mean ± SD)
ii. About 95% of the observations lie in the interval
(mean ± 2SD)
iii. About 99.7% of the observations lie in the interval
(mean ± 3SD)
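The three percentages can be checked against the normal distribution itself. The sketch below is illustrative only and assumes an exactly normal population; the helper name `normal_coverage` is hypothetical and not part of the slides.

```python
import math

def normal_coverage(k: float) -> float:
    """Fraction of a normal population lying within k SDs of the mean."""
    # For a normal distribution, P(|Z| < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"mean ± {k} SD: {normal_coverage(k):.1%}")
# mean ± 1 SD: 68.3%
# mean ± 2 SD: 95.4%
# mean ± 3 SD: 99.7%
```

The Empirical Rule rounds these values to 68%, 95% and 99.7%.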
• Example: 95% of students at a school are
between 1.1 m and 1.7 m tall. Assuming these
data are normally distributed, can you calculate
the mean and standard deviation?
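One way to work this example, assuming the 95% interval is exactly mean ± 2SD: the mean is the midpoint of the interval, and the interval's width spans 4 standard deviations. The variable names are illustrative.

```python
# The 95% interval is taken to be (mean - 2*SD, mean + 2*SD).
lower, upper = 1.1, 1.7  # heights in metres, from the example

mean = (lower + upper) / 2   # midpoint of the interval
sd = (upper - lower) / 4     # interval width covers 4 SDs

print(f"mean = {mean:.2f} m, SD = {sd:.2f} m")
# mean = 1.40 m, SD = 0.15 m
```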
• Example: The age distribution of a sample of 5000
persons is bell-shaped with a mean of 40 years
and a standard deviation of 12 years. Determine
the approximate percentage of people who
are 16 to 64 years old.
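A possible worked solution: express each endpoint as a number of standard deviations from the mean, then read the percentage from the Empirical Rule. The lookup table and variable names are illustrative, and the head count at the end is only the approximate figure implied by the rule.

```python
mean, sd = 40, 12        # years, from the example
low, high = 16, 64       # age interval of interest
sample_size = 5000

# Distance of each endpoint from the mean, in standard deviations.
k_low = (mean - low) / sd    # (40 - 16) / 12 = 2.0
k_high = (high - mean) / sd  # (64 - 40) / 12 = 2.0

# Empirical Rule percentages for 1, 2 and 3 standard deviations.
empirical_rule = {1: 68.0, 2: 95.0, 3: 99.7}

if k_low == k_high and int(k_low) in empirical_rule:
    percent = empirical_rule[int(k_low)]
    people = round(sample_size * percent / 100)
    print(f"About {percent}% of the sample, roughly {people} people")
# About 95.0% of the sample, roughly 4750 people
```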
Chebyshev’s Theorem
• Applicable to any distribution, including
distributions that are not normal.
• At least (1 - 1/k^2) of the observations will lie
within k standard deviations of the mean,
where k is any number greater than 1
(k > 1).
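The bound is easy to evaluate for a few values of k. This is a minimal sketch; the function name `chebyshev_min_fraction` is hypothetical.

```python
def chebyshev_min_fraction(k: float) -> float:
    """Minimum fraction of observations within k SDs of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem is applied here only for k > 1")
    return 1 - 1 / k**2

for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebyshev_min_fraction(k):.1%}")
# k = 1.5: at least 55.6%
# k = 2: at least 75.0%
# k = 3: at least 88.9%
```

These are lower bounds ("at least"), so they are smaller than the Empirical Rule's 95% and 99.7%, which hold only for bell-shaped data.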
• Example
Assuming that the weights of the students in this
class are not normally distributed, find the
minimum percentage of students whose weights
fall within 2SD of the mean.
• Consider a distribution of test scores that is
badly skewed to the right, with a sample
mean of 80 and a sample standard deviation
of 5. If k = 2, at least what percentage of the data
falls within the interval mean ± 2SD?
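A sketch of the calculation for this example, with illustrative variable names: build the interval mean ± k·SD and apply the bound 1 - 1/k^2.

```python
mean, sd, k = 80, 5, 2   # values from the example

interval = (mean - k * sd, mean + k * sd)  # (70, 90)
min_fraction = 1 - 1 / k**2                # 1 - 1/4 = 0.75

print(f"At least {min_fraction:.0%} of scores lie between "
      f"{interval[0]} and {interval[1]}")
# At least 75% of scores lie between 70 and 90
```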
Measures of Position
Measures of position describe the relative
position of a data value within the entire
set of data.
• z-scores
• Percentiles
• Quartiles
• Outliers
Quartiles
• Divide data sets into fourths or four equal
parts.
Boxplot
IQR = Q3 - Q1
Lower Fence = Q1 - 1.5(IQR)
Upper Fence = Q3 + 1.5(IQR)
Outliers
• Extreme observations
• Can occur because of errors in measuring
a variable, errors during data entry,
or errors in sampling.
Checking for outliers by using Quartiles
Step 1: Determine the first and third quartiles of data.
Step 2: Compute the interquartile range (IQR).
IQR = Q3 - Q1
Step 3: Determine the fences. Fences serve as cutoff
points for determining outliers.
Lower Fence = Q1 - 1.5(IQR)
Upper Fence = Q3 + 1.5(IQR)
Step 4: If a data value is less than the lower fence or
greater than the upper fence, it is considered an outlier.
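The four steps can be sketched in code as follows. The data set is made up for illustration, and the function name `find_outliers` is hypothetical. Quartile conventions differ between textbooks and software; this sketch uses the default ("exclusive") method of Python's `statistics.quantiles`, so Q1 and Q3 may differ slightly from values computed by hand with another convention.

```python
import statistics

def find_outliers(data):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 * IQR rule."""
    # Step 1: first and third quartiles (second value is the median).
    q1, _, q3 = statistics.quantiles(data, n=4)
    # Step 2: interquartile range.
    iqr = q3 - q1
    # Step 3: fences.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Step 4: values outside the fences are flagged as outliers.
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

# Hypothetical data set, not taken from the slides.
scores = [2, 4, 5, 6, 7, 8, 9, 30]
lower_fence, upper_fence, outliers = find_outliers(scores)
print(lower_fence, upper_fence, outliers)
# -2.5 15.5 [30]
```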