02 - SUMMARIZING DISTRIBUTIONS REVIEW:
The word average is derived from the French word avarie, which referred to the money that shippers contributed to help compensate for
losses suffered by other shippers whose cargo did not arrive safely (they shared the losses).
Know the following terms/concepts:
Median: middle number of an ordered data set (if two middle numbers exist, find the mean of those values)
Mean: sum of data set divided by the number of items
Mode: most frequently occurring number
Resistant Measure: a statistic that is not strongly affected by outliers (median and quartiles)
Range: difference between the largest and smallest values
Interquartile Range (IQR): Q3-Q1
Variance: determined by averaging the squared difference of all the values from the mean
Standard Deviation: the square root of the variance (roughly, the typical distance of the data from the mean)
Percentile: indicates the percentage of all values that fall below the value under consideration
z-score: the number of standard deviations a particular value is from the mean: z = (x-μ)/σ
*If you solve the z-score formula for x, you can find the corresponding raw score: x = μ + zσ.
Empirical Rule: (68-95-99.7) 68% of the data falls within 1 standard deviation of the mean, 95% falls within 2 standard deviations, and
99.7% falls within 3 standard deviations. (Only for bell-shaped data)
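The definitions above can be checked on a small data set. The following is a minimal sketch using Python's standard `statistics` module; the data values and the helper names `z_score` and `raw_score` are made up for illustration and are not part of the review sheet.

```python
import statistics

# Hypothetical data set, chosen only so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)           # sum of the values divided by the count
median = statistics.median(data)       # averages the two middle values when n is even
mode = statistics.mode(data)           # most frequently occurring value
variance = statistics.pvariance(data)  # population variance: mean of squared deviations
sd = statistics.pstdev(data)           # square root of the population variance


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


def raw_score(z, mu, sigma):
    """Solve z = (x - mu) / sigma for x."""
    return mu + z * sigma


print(mean, median, mode, variance, sd)
print(z_score(9, mean, sd))
print(raw_score(2.0, mean, sd))
```

For this data set the mean is 5, the median is 4.5, the mode is 4, and the population standard deviation is 2, so the value 9 has a z-score of 2. Note that `statistics.variance` and `statistics.stdev` compute the sample versions, which divide by n - 1 instead of n.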
To determine if a value from a data set is an outlier, multiply the IQR by 1.5. Add this value to Q3 and subtract it from Q1. Any values
that fall beyond these cut-off points are considered outliers. Outliers can be shown on a modified boxplot, in which we do not include
them in the 5-number summary calculation but represent them with an “x” on the graph.
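The 1.5 × IQR rule can be written as a short function. This is a sketch only: the function names `outlier_fences` and `find_outliers` are hypothetical, and the example reuses the quartiles from multiple choice question 2 (Q1 = 25, Q3 = 38) with just three of that data set's values (min, median, max).

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (low, high) cut-off points: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3):
    """Return the values that fall beyond either cut-off point."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]


# Quartiles from question 2; the list holds only its min, median, and max.
print(outlier_fences(25, 38))
print(find_outliers([10, 31, 60], 25, 38))
```

With these quartiles the cut-off points are 5.5 and 57.5, so only 60 is flagged.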
Changing Data:
Adding X to all of the data will change the mean and median but not the range, IQR, or standard deviation.
Multiplying all of the data by X will change the mean, median, range, IQR, and standard deviation (each is multiplied by X; the spread measures by |X|).
*If you are not sure how, create a data set and play with the numbers.
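One way to play with the numbers is sketched below. The data set, the shift of 10, and the factor of 3 are arbitrary choices for illustration, and `summary` is a hypothetical helper. Because the range and standard deviation are built from differences between values, a shift leaves them alone while a rescaling stretches them.

```python
import statistics


def summary(data):
    """Collect the measures of center and spread discussed above."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": max(data) - min(data),
        "sd": statistics.pstdev(data),
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]      # made-up values
shifted = [x + 10 for x in data]     # add a constant to every value
scaled = [x * 3 for x in data]       # multiply every value by a constant

for label, values in [("original", data), ("shifted", shifted), ("scaled", scaled)]:
    print(label, summary(values))
```

In the shifted data the mean and median each go up by 10 while the range and standard deviation stay the same; in the scaled data all four measures are multiplied by 3.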
Multiple Choice Questions:
1) Which of the following are true statements?
I The range of the sample data set is never greater than the range of the population. (Every value in the sample is also in the population; Range = Max - Min)
II The interquartile range is half the distance between the first quartile and the third quartile. (False: IQR = Q3 - Q1, the full distance)
III While the range is affected by outliers, the interquartile range is not. (The IQR is resistant)
a) I only
b) II only
c) III only
d) I and II
e) I and III
2) Dieticians are concerned about sugar consumption in teenagers’ diets (a 12-ounce can of soda typically has 10 teaspoons of sugar).
In a random sample of 55 students, the number of teaspoons of sugar consumed for each student on a randomly selected day is
tabulated. Summary statistics are noted below:
Min = 10
Max = 60
First Quartile = 25
Third Quartile = 38
Median = 31
Mean = 31.4
n = 55
s = 11.6
Which of the following is a true statement?
a) None of the values are outliers.
b) The value 10 is an outlier, and there can be no others.
c) The value 60 is an outlier, and there can be no others.
d) Both 10 and 60 are outliers, and there may be others.
e) The value 60 is an outlier, and there may be others at the high end of the data set.
IQR = 38 – 25 = 13
1.5 × IQR = 1.5(13) = 19.5
Low End = 25 – 19.5 = 5.5 (if x < 5.5, it is an outlier)
High End = 38 + 19.5 = 57.5 (if x > 57.5, it is an outlier)
Outliers: 60 (the Max); other values above 57.5 may also exist.
Refer to the following five boxplots for the next three questions.
3) To which of the above boxplots does the following histogram correspond? (Skewed left)
a) A
b) B
c) C
d) D
e) E
4) To which of the above boxplots does the following histogram correspond? (More data in the middle: box smaller than the whiskers)
a) A
b) B
c) C
d) D
e) E
5) To which of the above boxplots does the following histogram correspond? (More data on the edges: box larger than the whiskers)
a) A
b) B
c) C
d) D
e) E
6) Below is a boxplot of yearly tuition and fees of all four year colleges and universities in a Western state. The low outlier is from a
private university that gives full scholarships to all accepted students, while the high outlier is from a private college catering to the very
rich.
Removing both outliers will effect what changes, if any, on the mean and median costs for this state’s four year institutions of higher
learning? (The high outlier is farther from the rest of the data than the low outlier, so removing both lowers the mean.)
a) Both the mean and median will be unchanged.
b) The median will be unchanged, but the mean will increase.
c) The median will be unchanged, but the mean will decrease.
d) The mean will be unchanged, but the median will increase.
e) Both the mean and the median will change.
7) Suppose the average score on a national test is 500 with a standard deviation of 100. If each score is increased by 25, what are the
new mean and standard deviation? (The whole distribution moves up by 25, but it is no more or less spread out.)
a) 500, 100
b) 500, 125
c) 525, 100
d) 525, 105
e) 525, 125
8) Suppose the average score on a national test is 500 with a standard deviation of 100. If each score is increased by 25%, what are
the new mean and standard deviation? (Every score is multiplied by 1.25, so the spread between values also increases by 25%.
Therefore, the standard deviation increases by 25%.)
a) 500, 100
b) 525, 100
c) 625, 100
d) 625, 105
e) 625, 125
9) Which of the following are true statements?
I If the sample has variance zero, the variance of the population is also zero. (All the values IN THE SAMPLE are the same. Not Pop)
II If the population has variance zero, the variance of the sample is also zero. (All the values are the same)
III If the sample has variance zero, the sample mean and the sample median are equal. (All the values IN THE SAMPLE are the same.)
a) I and II
b) I and III
c) II and III
d) I, II, and III
e) None of the above gives the complete set of true responses.
10) When there are multiple gaps and clusters, which of the following is the best choice to give an overall picture of the distribution?
a) Mean and standard deviation
b) Median and IQR
c) Boxplot w/ 5-number summary
d) Stemplot or histogram
e) None of the above are really helpful in showing gaps and clusters.
11) The 70 highest dams in the world have an average height of 206 meters with a standard deviation of 35 meters. The Hoover and
Grand Coulee dams have heights of 221 and 168 meters, respectively. The Russian dams, the Nurek and Charvak, have heights with
z-scores of 2.69 and -1.13, respectively. List the dams in order of ascending size.
z = (x-μ)/σ, so x = μ + zσ
Hoover: 221, Grand Coulee: 168, Nurek: 206 + 2.69(35) = 300.15, Charvak: 206 – 1.13(35) = 166.45
a) Charvak, Grand Coulee, Hoover, Nurek
b) Charvak, Grand Coulee, Nurek, Hoover
c) Grand Coulee, Charvak, Hoover, Nurek
d) Grand Coulee, Charvak, Nurek, Hoover
e) Grand Coulee, Hoover, Charvak, Nurek
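The conversion in question 11 can be checked with a few lines of Python. This is only a sketch: the constant and function names are hypothetical, while the numbers (mean 206, standard deviation 35, and the four dams) come from the question itself.

```python
MEAN_HEIGHT = 206  # meters, from the question
SD_HEIGHT = 35     # meters, from the question


def height_from_z(z, mu=MEAN_HEIGHT, sigma=SD_HEIGHT):
    """Convert a z-score back to a raw height: x = mu + z * sigma."""
    return mu + z * sigma


heights = {
    "Hoover": 221,
    "Grand Coulee": 168,
    "Nurek": height_from_z(2.69),
    "Charvak": height_from_z(-1.13),
}

# Sort the dam names by height, smallest first.
ascending = sorted(heights, key=heights.get)
print(ascending)
```

The sorted order is Charvak, Grand Coulee, Hoover, Nurek, which matches choice (a).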
For the final two questions, use the graph to the right. The graph shows cumulative proportions plotted against grade point averages for
a large public high school.
12) What is the median grade point average? (50% = 2.4)
a) 0.8
b) 2.0
c) 2.4
d) 2.5
e) 2.6
13) What is the interquartile range? (25% = 1.8, 75% = 2.8, so IQR = 2.8 – 1.8 = 1.0)
a) 1.0
b) 1.8
c) 2.4
d) 2.8
e) 4.0
14) FRQ: The summary statistics for the number of inches of rainfall in Los Angeles for 117 years, beginning in 1877, are shown below.
a) Describe a procedure that uses these summary statistics to determine whether there are outliers.
b) Are there outliers in these data? _______
Justify your answer based on the procedure that you describe in part (a).
c) The news media reported that in a particular year, there were only 10 inches of rainfall. Use the information provided to comment on
this reported statement.
IQR = 19.250 – 9.680 = 9.57
1.5 × IQR = 1.5(9.57) = 14.355
High End = 19.250 + 14.355 = 33.605 (Max of 38.180 is an outlier)
Low End = 9.680 – 14.355 = -4.675 (Min is 4.850. No low outlier)
Since Q1 = 9.680, about 25% of the years had less than 10 inches of rain, so 10 inches is not that far below the expected
amount of rain. In addition, 10 inches is within 1 standard deviation of the mean.
Answers:
1) e
2) e
3) a
4) b
5) c
6) c
7) c
8) e
9) c (if the variance of the pop. is zero, all values are the same)
10) d
11) a
12) c
13) a
14) There is at least one outlier on the high side. Since Q1 = 9.68, at least 25% of the years had less than 10 inches of rain.
Hence, 10 inches of rain is not an unusual value. You could also mention the standard deviation (z = -0.732).