Mr. Valsa's Math Page

1-4: Review of histograms, normal curve, empirical rule, central tendency, spread
Vocabulary:
Frequency Histogram: A graph that displays the classes on the horizontal axis and the frequencies of the
classes on the vertical axis. Bars DO touch.
- Frequencies represented by bars
Relative-frequency Histogram: A graph that displays the classes on the horizontal axis and the relative
frequencies of the classes on the vertical axis. Bars DO touch.
- Relative-frequency Histograms are often better for comparing data sets, since the vertical scale
always runs from 0 to 1 no matter how many observations there are.
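A minimal sketch of how frequencies turn into relative frequencies, assuming each distinct value is its own class. The data set (number of siblings reported by 10 students) and the helper name frequency_table are made up for illustration.

```python
from collections import Counter

# Hypothetical data: number of siblings reported by 10 students.
data = [0, 1, 1, 2, 2, 2, 3, 3, 4, 1]


def frequency_table(values):
    """Return {class: (frequency, relative frequency)}, one class per distinct value."""
    counts = Counter(values)
    n = len(values)
    return {cls: (freq, freq / n) for cls, freq in sorted(counts.items())}


table = frequency_table(data)
for cls, (freq, rel) in table.items():
    print(f"{cls}: frequency = {freq}, relative frequency = {rel:.2f}")
```

The relative frequencies always add up to 1, which is why the vertical scale of a relative-frequency histogram never goes above 1.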
Measures of Center
Mode – Most common or popular; used for categorical or numerical data
Advantage: Easy to calculate; Can be used for qualitative data to see the most popular response/choice
Disadvantage: Not aimed at finding the center; Not always relevant for quantitative data
Choose- When we want to know what shows up the most
Mean – The average of every value in the set of data; used for only quantitative data
Advantage: Useful when there are NO outliers, quick to calculate
Disadvantage: affected by outliers (large outlier skews the mean high; small outlier skews the mean low),
may not be representative; cannot determine outliers
Choose – When we want to know the middle taking into account how big and small numbers are.
Median – The middle number of a set of data arranged in order; useful when there are outliers; only for
quantitative data
Advantage: Uses the position of every number in the ordered data; Splits the data so that half of the
numbers are at or below it and half are at or above it; Not affected by outliers
Disadvantage: Tedious to calculate; cannot determine outliers
Choose – When we want to know the middle and there may be an outlier affecting the mean.
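The effect of an outlier on the three measures of center can be checked with a short sketch. The quiz scores and lunch choices below are hypothetical, and Python's built-in statistics module is just one convenient way to do the arithmetic.

```python
from statistics import mean, median, mode

# Hypothetical data: five quiz scores, then the same scores plus one very low score.
scores = [80, 85, 85, 90, 95]
with_outlier = scores + [10]

for label, values in [("no outlier", scores), ("with outlier", with_outlier)]:
    print(f"{label}: mean = {mean(values):.1f}, "
          f"median = {median(values)}, mode = {mode(values)}")

# The mode is the only one of the three that works for qualitative data.
lunch_choices = ["pizza", "tacos", "pizza", "salad"]  # hypothetical responses
print("most popular lunch:", mode(lunch_choices))
```

In this made-up data the single low score pulls the mean down from 87 to about 74, while the median and mode both stay at 85.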
Measures of Spread
Range – Difference between the minimum & maximum
Advantage: Shows how far apart the smallest and largest numbers are; Easy to compute
Disadvantage: Does not show what numbers are in the data set; only takes two numbers into account
IQR (Interquartile Range) – Difference between Q3 and Q1.
Advantage: Not affected by outliers; shows where the middle 50% of the data is
Disadvantage: Doesn’t take into account the first 25% of data or the last 25% of data
Standard Deviation – Measures how far, on average, numbers are from the mean; useful when the Mean is
the best measure of center
Advantage: Takes into account all observations
Disadvantage: Easily affected by outliers (extreme observations)
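A rough sketch of the three measures of spread on one hypothetical data set. Two assumptions are made here that these notes do not specify: Q1 and Q3 are taken as the medians of the lower and upper halves of the ordered data (leaving out the middle value when the count is odd), and the standard deviation is the population version, which divides by n. Calculators and software may use other conventions and give slightly different answers.

```python
from statistics import median, pstdev


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: with an odd number of values, the middle value is left out
    of both halves. Other textbooks and software use different rules.
    """
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values to find quartiles")
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return median(lower), median(upper)


data = [2, 4, 4, 5, 7, 9, 11]  # hypothetical data

data_range = max(data) - min(data)  # uses only two numbers
q1, q3 = quartiles(data)
iqr = q3 - q1                       # width of the middle 50% of the data
sd = pstdev(data)                   # population standard deviation (divides by n)

print(f"range = {data_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}, SD = {sd:.2f}")
```

Swapping pstdev for statistics.stdev would give the sample standard deviation (dividing by n - 1) instead.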
Example
- Select an appropriate measure of center. . .
a) A student takes four exams in a biology class. His grades are 88, 75, 95, and 100. Which measure of
center is the student likely to report?
b) The National Association of REALTORS publishes data on resale prices of U.S. homes. Which measure
of center is most appropriate for such resale prices?
c) The 2005 Boston Marathon had two categories of official finishers: male and female, of which there
were 10,894 and 6,655, respectively. Which measure of center should be used?
- Of the mean, median, and mode, the mode is the only measure of center that can be used for
QUALITATIVE DATA.
Practice
H) A retail store had total sales of $436, $650, $530, $500, $650, $489, and $423 last week. Which
measure of center would make the store’s sales last week appear the MOST PROFITABLE? Explain.
I) The heights of 5 starting players on the basketball team at Goose Neck High School are 75in, 74in, 73in,
70in, and 68in. Decide which measure of center best describes the data. Explain.
J) The monthly car payments for Brandon’s last 8 customers were $266, $285, $285, $285, $315, $325,
$344. Decide which measure of center best describes the data. Explain.
K) At the end of the winter, the seven sweaters left in stock at the GAP Outlet store were brown, orange,
green, orange, brown, orange, orange, and orange. Decide which measure of center is appropriate for this
data. Explain.
L) The profits at 5 schools that sold Gems & Jewels products were $318.22, $440.79, $607.16, $1,090.38,
$4,790.15. Decide whether the mean or median is the best measure of center for this data. Explain.
Rule of 3 Standard Deviations (Empirical Rule)
- Almost all of the observations in any data set lie within three standard deviations to either side of the
mean.
- A data set that varies a lot will have a large standard deviation, so the interval within 3 SD of the
mean will be wide.
- A data set that varies very little will have a small standard deviation, so the interval within 3 SD of
the mean will be narrow.
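To see the rule at work, here is a small sketch comparing two hypothetical data sets that share the same mean but have very different spreads. The helper names are made up, and the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def three_sd_interval(values):
    """Return (low, high), the interval within 3 standard deviations of the mean."""
    center = mean(values)
    sd = pstdev(values)  # assumption: population standard deviation
    return center - 3 * sd, center + 3 * sd


def fraction_within(values, k):
    """Fraction of the values lying within k standard deviations of the mean."""
    center = mean(values)
    sd = pstdev(values)
    inside = [v for v in values if center - k * sd <= v <= center + k * sd]
    return len(inside) / len(values)


# Hypothetical data sets: both have mean 50.
narrow = [48, 49, 50, 50, 50, 51, 52]
wide = [20, 35, 50, 50, 50, 65, 80]

for label, values in [("narrow", narrow), ("wide", wide)]:
    low, high = three_sd_interval(values)
    share = fraction_within(values, 3)
    print(f"{label}: 3 SD interval = ({low:.1f}, {high:.1f}), share inside = {share:.0%}")
```

Every observation in both sets falls inside its own 3 SD interval, but the interval for the spread-out set is far wider.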
If the distribution of data is bell-shaped, the Empirical Rule says that . . .
o About 68% of observations lie within 1 standard deviation of the mean
o About 95% of observations lie within 2 standard deviations of the mean
o About 99.7% of observations lie within 3 standard deviations of the mean
Normal Distribution – Bell-shaped curve that is centered about the mean. The curve is symmetrical,
with its spread determined by its standard deviation.
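The 68, 95, and 99.7 percentages can be checked against an exact normal curve. This sketch uses NormalDist from Python's statistics module (available in Python 3.8 and later); the helper name share_within is made up for illustration.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The shares come out to roughly 68.3%, 95.4%, and 99.7%, so the Empirical Rule's numbers are rounded values for a perfectly bell-shaped distribution. Real data that is only roughly bell-shaped will match them only approximately.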