Name: _______________________________ Period ______ HW: 2.1 #1-5, 8
2.2 #2, 4, 5
Measures of Central Tendency, Box Plots, Histograms
Measure of Spread: Range and Interquartile Range.
Measures of Center:
(1) Mean = Average
(2) Median = Middle Value
(3) Mode = The MOST often occurring data value (you can have two or more modes)
Example: Find the mean, median and mode of each data set.
A. {10, 5, 3, 5, 7, 7, 8, 7, 9, 2}
B. {5.3, 8.4, 5.3, 9.2, 10.6, 9.2}
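One way to check hand-worked answers is a short Python sketch using the standard `statistics` module. The helper name `centers` is made up for this illustration; `multimode` is used because a data set can have two or more modes.

```python
import statistics

# Data sets A and B from the example above.
set_a = [10, 5, 3, 5, 7, 7, 8, 7, 9, 2]
set_b = [5.3, 8.4, 5.3, 9.2, 10.6, 9.2]

def centers(data):
    """Return (mean, median, list of modes) for a list of numbers."""
    mean = statistics.mean(data)          # sum of the values divided by how many there are
    median = statistics.median(data)      # middle value (average of the two middle values if n is even)
    modes = statistics.multimode(data)    # every value tied for most frequent
    return mean, median, modes

for name, data in [("A", set_a), ("B", set_b)]:
    mean, median, modes = centers(data)
    print(f"Set {name}: mean = {mean:.1f}, median = {median:.1f}, mode(s) = {modes}")
```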
Box-Plot (Box & Whisker Plot):
These plots show how data are distributed compared to the median.
Box Plots Summarize 5 Statistics
(1) Minimum = ______
(2) Quartile 1 (median of lower half) = ____
(3) Median = ______
(4) Quartile 3 (median of upper half) = ____
(5) Maximum = ______
Example E: Select the data set that matches each box plot
Data set a: {29,16,20,28,5,50,15}
Data set b: {30,18,22,28,31,15,50}
Data set c: {21,12,33,44,26,15,36}
Data set d: {48,41,35,12,15,19,26}
Example: Give the five number summary for each data set and then sketch a Box-Plot.
{10, 8, 6, 4, 2}
{0, 30, 45, 50, 75, 80, 95}
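The five-number summary can also be sketched in Python. This version finds the quartiles as the medians of the lower and upper halves, as described above, and assumes that when the number of values is odd the middle value is left out of both halves. The function name `five_number_summary` is made up for this illustration, and other quartile conventions (including some calculator and software defaults) can give slightly different answers.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for at least two values.

    Assumption: when the number of values is odd, the middle value is
    left out of both halves before the quartiles are found.
    """
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]                  # values below the middle position
    upper = values[len(values) - half:]    # values above the middle position
    return (values[0],
            statistics.median(lower),      # Quartile 1
            statistics.median(values),     # Median
            statistics.median(upper),      # Quartile 3
            values[-1])

print(five_number_summary([10, 8, 6, 4, 2]))
print(five_number_summary([0, 30, 45, 50, 75, 80, 95]))
```

The box in the box plot runs from Quartile 1 to Quartile 3, with a line at the median and whiskers out to the minimum and maximum.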
Measures of Spread:
(1) Range = Maximum – Minimum
(2) Interquartile Range (IQR) = Quartile 3 – Quartile 1 (based on the median)
(3) Standard Deviation (Based on the mean…we will talk about this next class.)
The IQR is a better measure of spread than the range when a data set has
outliers, because it does not depend on the extreme values in the data (max or
min). If there are outliers in the data set, the IQR should be used to measure
spread. Recall that outliers are pieces of data that are either very large
(Bill Gates's house price) or very small when compared to other data in the sample.
Example: For each data set, find the median, the range, and the IQR
{18, 13, 15, 24, 20}
{356, 211, 867, 779, 101, 543}
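The sketch below computes the median, range, and IQR for the two data sets, again taking the quartiles as medians of the lower and upper halves (middle value left out when the count is odd, which is an assumption). To illustrate why the IQR handles outliers better, it also adds one extreme value, 500, to the first data set. That value is hypothetical and is not part of the worksheet data. The function names are made up for this illustration.

```python
import statistics

def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    values = sorted(data)
    half = len(values) // 2
    return (statistics.median(values[:half]),
            statistics.median(values[len(values) - half:]))

def data_range(data):
    """Range = maximum - minimum."""
    return max(data) - min(data)

def iqr(data):
    """Interquartile range = Quartile 3 - Quartile 1."""
    q1, q3 = quartiles(data)
    return q3 - q1

first = [18, 13, 15, 24, 20]
second = [356, 211, 867, 779, 101, 543]
with_outlier = first + [500]   # hypothetical extreme value, not from the worksheet

for label, data in [("first", first), ("second", second), ("first + outlier", with_outlier)]:
    print(f"{label}: median = {statistics.median(data)}, "
          f"range = {data_range(data)}, IQR = {iqr(data)}")
```

With the extra value, the range of the first data set changes from 11 to 487, while the IQR only changes from 8 to 9.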
Histograms
Percentile Rank is a value that represents the percent of the data values that are below a given
value.
Bar Graphs use columns to show how the “category” data are distributed between different
categories.
Histograms use columns to show how the “numerical” data are distributed between different
intervals. The width of the intervals is called the bin width.
Bar Graph (categorical data)
Histogram (numerical data)
Example: The following data represent the ages of family members attending a family reunion.
{9, 5, 25, 29, 40, 48, 63, 56, 3, 32, 38, 53, 79, 0, 85, 87, 12, 14, 32, 5, 54, 67, 78, 75}
(a) Find the percentile rank of the family member who is 14 years old.
(b) Draw a histogram for these data with 9 bins.
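Both parts can be sketched in Python. Percentile rank follows the definition above (percent of the data values below the given value). For the histogram, the worksheet only asks for 9 bins, so the bin width of 10 starting at 0 (0-9, 10-19, ..., 80-89) is an assumption, as are the function names `percentile_rank` and `bin_counts`.

```python
ages = [9, 5, 25, 29, 40, 48, 63, 56, 3, 32, 38, 53,
        79, 0, 85, 87, 12, 14, 32, 5, 54, 67, 78, 75]

def percentile_rank(data, value):
    """Percent of the data values that are strictly below `value`."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

def bin_counts(data, bin_width, num_bins, start=0):
    """Count the values in each interval from start + k*width up to (not including) start + (k+1)*width."""
    counts = [0] * num_bins
    for x in data:
        k = int((x - start) // bin_width)
        if 0 <= k < num_bins:      # values outside the bins are ignored
            counts[k] += 1
    return counts

print("Percentile rank of age 14:", percentile_rank(ages, 14))

# Assumed bins: 9 intervals of width 10 (0-9, 10-19, ..., 80-89).
counts = bin_counts(ages, bin_width=10, num_bins=9)
for k, count in enumerate(counts):
    low = 10 * k
    print(f"{low:2d}-{low + 9:2d} | {'#' * count}")
```

Each row of `#` marks is one column of the histogram turned on its side, so the row lengths give the column heights.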
Data that are symmetric are balanced around the center (mean or median).
Skewed data are more spread out on one side of the center than the other side.
Right skewed distributions have most of the
data to the left of the mean.
Mode < Median < Mean
Left skewed distributions have most of the
data to the right of the mean.
Mean < Median < Mode
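The orderings above suggest a quick rule of thumb: compare the mean with the median. The sketch below applies that comparison to three made-up data sets (they are for illustration only and are not from the worksheet). It is a rough check, not a full test of shape, and real data will rarely have a mean exactly equal to the median.

```python
import statistics

def describe_shape(data):
    """Rough guess at the shape by comparing the mean with the median."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean > median:
        return "skewed right"      # large values pull the mean above the median
    if mean < median:
        return "skewed left"       # small values pull the mean below the median
    return "roughly symmetric"

# Made-up data sets for illustration only.
right = [1, 2, 2, 2, 3, 4, 5, 9, 17]        # long tail of large values
left = [1, 9, 13, 14, 15, 16, 16, 16, 17]   # long tail of small values
balanced = [1, 2, 2, 3, 3, 3, 4, 4, 5]

for name, data in [("right", right), ("left", left), ("balanced", balanced)]:
    print(name, statistics.mode(data), statistics.median(data),
          statistics.mean(data), describe_shape(data))
```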
Example: For each histogram, give the bin width and the number of values in the data set.
Then, draw vertical lines where the mode, median and mean would be (estimate).
Finally, decide if the distribution is skewed left, skewed right or symmetric.
Histogram 1: Bin width = ______ n = ______
What bin is the: Median ______ Mode ______ Mean ______
Is the distribution skewed right or left or is it symmetric?
Histogram 2: Bin width = ______ n = ______
What bin is the: Median ______ Mode ______ Mean ______
Is the distribution skewed right or left or is it symmetric?
Histogram 3: Bin width = ______ n = ______
What bin is the: Median ______ Mode ______ Mean ______
Is the distribution skewed right or left or is it symmetric?
Example: Find each percentile rank.
(a) 460 out of 1000 students scored at least 30 points out of 0 on a standardized test
Find the percentile rank of a student who scored 30 points on the test.
(b) 76 out of 200 people living alone spend $800 a month or more on rent. Find the percentile
rank of a person who spends $800 a month on rent.
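Both problems give the number of values at or above the value of interest, while percentile rank counts the values below it, so the count has to be subtracted from the total first. A minimal Python sketch of that step, with a made-up function name:

```python
def percentile_rank_from_at_least(count_at_least, total):
    """Percentile rank of a value when `count_at_least` of `total` values are at or above it."""
    below = total - count_at_least     # everyone else is below the value
    return 100 * below / total

print(percentile_rank_from_at_least(460, 1000))   # part (a)
print(percentile_rank_from_at_least(76, 200))     # part (b)
```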