Download REVIEW - CBSD.org

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Bootstrapping (statistics) wikipedia , lookup

History of statistics wikipedia , lookup

Student's t-test wikipedia , lookup

Regression toward the mean wikipedia , lookup

Misuse of statistics wikipedia , lookup

Categorical variable wikipedia , lookup

Transcript
NAME: _______________________________
Ch. 1 & 2 review packet
MULTIPLE CHOICE:
1. Were the extinctions that occurred in the last ice age more frequent among species of animals with large
body sizes? A researcher gathers data on the average body mass (in kilograms) of all species known to have
existed at that time. These measurements are values of
(a) a categorical variable.
(b) a quantitative variable.
(c) an invalid variable.
(d) a margin of error.
2. An example of a categorical variable is
(a) the name of the college a student attends.
(b) a student's weight in kilograms.
(c) a student's class rank, such as 25th out of 364.
(d) a student's sex (male or female).
(e) Both (a) and (d).
The stock market did well during the 1990s. Here are the percent total returns (change in price plus dividends
paid) for the Standard & Poor's 500 stock index:
The next five questions are related to this situation.
3. The median return during this period is
(a) 5.5
(b) 20.07
(c) 23.0
(d) 25.8
(e) 28.6
4. The third quartile of these returns is
(a) 7.6
(b) 30.5
(c) 31.1
(d) 31.7
(e) 33.4
5. The mean return is
(a) 20.07
(b) 20.69
(d) 25.8
(e) 33.4
(d) 20.07
(e) 25.8
(c) 22.3
6. The standard deviation of the returns is
(a) 13.75
(b) 13.98
(c) 14.74
7. A well-drawn histogram should have
(a) bars all the same width
(b) no space between bars (unless a class has no observations)
(c) a clearly marked vertical scale
(d) all of these
Here is a table of the undergraduate enrollment at a large state university, broken down by class:
8. To make a correct graph of the distribution of students by class, you could use
(a) a bar graph.
(b) a pie chart.
(c) a histogram.
(d) all of (a), (b), and (c).
(e) (a) or (b), but not (c).
9. For a distribution that is skewed to the right, usually
(a) the mean will be larger than the median
(b) the median will be larger than the mean
(c) the first quartile will be larger than the third quartile
(d) the standard deviation will be negative
(e) the minimum will be larger than the maximum
10. Here is a set of data: 1300, 18, 25, 19, -7, 24. Which observation is the outlier?
(a) 1300
(b) 25
(c) 19
(d) -7
(e) 24
11. You have calculated that the value of the standard deviation for a set of data is -2.5. You can conclude that
(a) the data are skewed to the left
(b) the data are skewed to the right
(c) there is very little variation in the data
(d) all of the observations in the data set are negative
(e) you made an error in your calculations
12. You calculate that the standard deviation of a set of observations is 0. This tells you that
(a) you made an arithmetic mistake.
(b) all the observations have the same value.
(c) there is no straight-line association.
(d) the mean must also be 0.
(e) Both (b) and (d) are true.
13. The five-number summary of a distribution consists of
(a) mean, median, standard deviation, and two quartiles.
(b) minimum, maximum, mean, median, and standard deviation.
(c) minimum, maximum, median, and two quartiles
(d) mean, standard deviation, correlation, and two quartiles
14. The standard deviation is a measure of
(a) the center of a distribution
(b) the variability of a distribution
(c) the association between two variables
(d) the standardized value of a variable
15. The box in the center of a boxplot marks
(a) the full range covered by the data
(b) the range covered by the middle half of the data
(c) the range covered by the middle three-quarters of the data
(d) the span one standard deviation on each side of the mean
(e) the span two standard deviations on each side of the mean
16. In a class of 31 students, how many would be above the third quartile?
(a) 31
(b) 16
(c) 15
(d) 8
(e) 7
FREE RESPONSE:
1) What is a distribution of a variable?
A picture that shows what values the variable takes and how often it takes those values
2) What types of values can standard deviation take?
Positive values from 0 to infinity
3) If I have a standard deviation of 0, what does this mean?
All the data points are the same number (like 5, 5, 5, 5, 5, 5, 5)
4) What percentage of the data is above Q3?
25%
5) What percentage of the data is below Q1?
25%
6) What percentage of the data is inside the box of a boxplot?
50%
7) What is a census?
When a researcher attempts to contact every individual in the population
8) What is the difference between an observational study & an experiment?
Observational study does not influence/change the individuals in any way- just looks for information that
already exists. Experiments impose a treatment on the individuals, changing them in some way in order to
measure the response.
9) Identify the following variables as categorical or quantitative
a. Number of calculator games on each students calculator
b. What type of car each student drives (brand of car, Toyota, Honda, etc.)
c. GPA
d. Number of mp3 songs each student has on their computer
e. Whether a student is for or against the death penalty
f. Amount of time spend studying for the Ch. 4 test
g. Letter grade on the chapter 8 test
h. Number of majors offered at a given college
10)
Quant
Cat
Quant
Quant
Cat
Quant
Cat
Quant
The following is info on hotels in the area.
exercise internet
room
room
price restaurants service room rate
Hotel
Pool
Comfort Inn
out
y
0
1
y
$149
Fairfield Inn
in
y
0
1
n
$119
Baymont Inn
out
y
0
1
y
$60
Chase Hotel
out
n
15
0
n
$139
Courtyard Marriott
in
y
0
1
dinner
$114
Hilton
in
y
10
2
y
$156
Best Western
in
y
9.95
2
y
$145
a. Identify the individuals
HOTELS
b. Identify the variables measured and label each as categorical or quantitative
Pool
Categorical
Exercise room
Cat
Internet Price
Quant
Restaurants
Quant
Room service
Cat
Room rate
Quant
11) For the following, identify the population, sample, individuals, and variables as clearly as possible:
a. A truckload of apples arrives at an apple juice production plant. The plant’s quality control
team selects 3 large buckets of apples from various locations in the truck. These apples are
inspected carefully, and the entire truckload is either accepted or rejected based on the 3
buckets.
Population = All apples that arrive at the plant
Sample = the 3 large buckets taken
Individuals = apples
Variables = the quality of the apple
b. I decide to survey local residents about their thoughts on the President. I send out 100 surveys
to local households in Bucks County and ask the adults to fill it out and send it back.
Population = All Bucks County residents
Sample = the 100 adults in the chosen households
Individuals = adults
Variables = thoughts on the president
c. A study on cancer patients took a random sample of 80 patients from the lung cancer ward at
Temple University Hospital. They then tested their reactions to a new drug to see if there were
any improvements on their conditions.
Population = All cancer patients
Sample = the 80 selected lung cancer patients at Temple University Hospital
Individuals = cancer patients
Variables = their condition
12) Identify the following as an experiment or an observational study:
a. Before a new variety of frozen muffins is put on the market, it is subjected to extensive taste
testing. People are asked to taste the new muffin and a competing brand and say which one
they prefer.
EXPERIMENT
b. A teacher wants to know about how her students spend their time after school. She passes out
a survey that asks numerous questions about this topic. The survey is anonymous.
OBSERVATIONAL STUDY
c. Teenage drivers have long been blamed for being bad drivers, and prone to accidents. To
decide whether this is a fair statement, a teenage statistics student goes to a local police
precinct and obtains a list of all accidents in the past 3 years. She then selects 100 of these
accidents and looks to see the age of the person at fault (teenage or adult).
OBSERVATIONAL STUDY
d. There has long been a debate about whether whitening toothpaste does as good of a job as
whitening strips. A researcher has 20 people use whitening strips for 5 days and another 20
people use whitening toothpaste for 30 days, and then compares their beginning tooth shade
to their new tooth shade.
EXPERIMENT
13) For each graph state whether the mean is higher, lower, or about the same as the median. Explain.
HIGHER
HIGHER
ABOUT THE SAME
14) Below is a two-way table for Gender vs. View of Death Penalty for a sample.
a.
b.
c.
d.
e.
f.
What percent of the class are males?
12/28 = 42.86%
What percent of the class are For the death penalty?
19/28 = 67.86%
What percent of the class are female and For the death penalty? 10/28 = 35.71%
What percent of the females are For the death penalty?
10/16 = 62.5%
What percent of the class are females, given that they are For the d.p.? 10/19 = 52.63%
What percent of the females are Against the death penalty?
6/16 = 37.5%
15) Below is a Two-Way table for Gender vs Hair Color
a.
b.
c.
d.
e.
What percent of the males have brown hair?
10/14 = 71.43%
What percent of those that have brown hair are males?
10/20 = 50%
What percent of the class are males and have brown hair?
10/34 = 29.41%
What percent of females have blond hair?
7/20 = 35%
Of only the males find the percent of each hair color.
Brown
Black
Blond
Males
1/14 = 7.14%
3/14 = 21.43%
10/14 = 71.43%
f. Of only the females find the percent of each hair color.
Brown
Black
Females
3/20 = 15%
7/20 = 35%
Blond
10/20 = 50%
16) Given the following info about the mean and median, decide the shape of the distribution of the data,
and pick a measure of center to report.
a. Mean = 170 Median = 80
Shape = RIGHT SKEW
Center = MEDIAN
b. Mean = 100
Median = 180
Shape = LEFT SKEW
Center = MEDIAN
c. Mean = 79
Median = 85
Shape = SYMMETRIC
Center = MEAN
17) Below is a list of the number of pieces of Halloween candy that 50 randomly selected children got on
Halloween night. This data is in the list CANDY.
62
62
63
65
66
66
67
70
71
72
72
73
73
73
74
75
75
76
76
76
76
77
77
77
78
78
78
79
80
80
81
81
82
82
85
86 91
88 95
89 96
89 99
91 100
106
111
112
120
130
a. Create a relative frequency histogram of the data below. Be sure to label all your axes!
42%
22%
14%
60
70
80
90
CANDY
100
200
300
b. Based on the shape of the distribution, what measures of center & spread would you report?
Since the distribution is right skewed, I would report the MEDIAN & IQR as the center &
spread.
c. Calculate the following statistics
mean
st. dev. n
min
82.02
14.78
50
62
Q1
med
Q3
max
IQR
73
78
89
130
16
d. Test for outliers using the 1.5*IQR rule. Show all work, and list any outliers you find.
A = 73 – (1.5*16) = 49
B = 89 + (1.5*16) = 113
Normal data = (49, 113)
Outliers: 120, 130
e. Create a boxplot of the data.
62
73
78
89
CANDY
112
120
130
f. Describe the distribution below (shape, center, spread, etc.)
Right skewed, unimodal. Center at median of 78 pieces, IQR of 16 pieces, and range of (62,
130) pieces. There are two outliers at 120 and 130 pieces.
18) The data below is in the list DATA2.
82 98 106 108
85 99 106 109
89 100 106 110
90 102 107 111
90 103 108 111
112
112
113
113
113
115
116
116
118
118
119
119
119
119
119
122 129 108 111 118
122 91 108 112 118
122 95 126 113
124 103 127 115
126 104 120 121
a. Create a stemplot, splitting your stems. (Sort the list in the calculator to help you)
Key:
8|2 = 82
b. Calculate the following statistics
mean
st. dev. n
min
110.46
10.94
57
82
Q1
med
Q3
max
IQR
105
112
119
129
14
c. Describe the distribution (shape, center, spread)
Left Skewed, unimodal. Center at median of 112 units, IQR of 14 units, and range of (82, 129)
units.
19) The data in the lists YOGA3 and YOGA8 (below) are the ages of participants in a 3pm & 8pm Yoga class.
YOGA3
35 57 68 75 68 53 64 73 84
38 59 68 76 75 55 68 75 85
40 60 69 77 51 62 73 77
YOGA8
18 19 22 28 56 18 20 24 31 44
18 19 23 29 71 18 20 25 31 44
18 19 24 29 43 19 20 25 39 19
a. Find the following summary statistics for the 3pm class and the 8pm class.
mean
st. dev. n
min
Q1
med
Q3
max
IQR
YOGA3
64.81
13.38
26
35
57
68
75
85
18
YOGA8
27.77
12.75
30
18
19
23.5
31
71
12
b. Create parallel boxplots of the two yoga classes.
YOGA 3
YOGA 8
18 19 23.5
31
35
44
56 57
68 71 75
85
c. Write a few sentences to compare the shape, center, and spread of the two plots.
The shape of the 3pm class is slightly left skewed, while the 8pm class is right skewed. Both
are unimodal. The 8pm class has 2 outliers at 56 and 71 years old.
The center for the 3pm class is at the median of 68 years old, which is much higher than the
median of the 8pm class which is 23.5 years old.
The IQR of the 3pm class is 18 years old, which is higher than the IQR of the 8pm class which
is 12 years old. The range of the 3pm class is (18, 71) years old which is wider than the range
of the 8pm class which is (35, 85) years old.
d. What age was the top 25% of YOGA3 above? How about the top 25% of YOGA8?
Top 25% of YOGA3 was above 75 years old, top 25% of Yoga8 is above 31 years old.
e. Where are the middle 50% of YOGA3 ages ? How about the middle 50% of YOGA8 ages?
Middle 50% of YOGA3 was btw (19, 31) and the middle 50% of YOGA8 was btw (57, 75).
20) During a budget meeting, local school board members decided to review class
size information to determine if budgets were correct. Summary statistics are
shown in the table.
a. Find the IQR
11 students
b. Check for outliers using the 1.5 rule. Show work.
A = 29 – (1.5*11) = 12.5
Normal data = (12.5, 56.5)
B = 40 + (1.5*11) = 56.5
NO OUTLIERS
x
S
min
Q1
med
Q3
Max
33.39
5.66
17
29
36
40
48
c. Based on the statistics, what shape does the distribution have?
Because the mean & median are close, the distribution is roughly symmetric
d. Describe the distribution.
Roughly symmetric, mean of 33.39 students, standard deviation of 5.66 students, range of
(17, 48) students.
21) Describe the following distributions (shape center, spread, etc.).
a. Plot1
Mean = 65
S = 18.5
Min = 12
Q1 = 34
Med = 67
Q3 = 91
Max = 117
10 20 30
40
50
60
70
80
90
Symmetric, bimodal
center at the mean of 65 units
standard deviation of 18.5 units
range of (12, 117) units
100 110 120
b. Plot 2
Mean = 71
S = 17.8
Min = 11
Q1 = 63
Med = 91
Q3 = 103
Max = 119
10 20 30
40
50
60 70
80
90
Right skewed, unimodal
center at the median of 91 units
IQR of 40 units
range of (11, 119) units
100 110 120
c. Plot 3
Mean = 52
S = 16.7
Min = 17
Q1 = 39
Med = 58
Q3 = 79
Max = 107
10 20 30
40
50
60
70
80
90
100 110 120
Symmetric, unimodal
center at the mean of 52 units
standard deviation of 16.7 units
range of (17, 107) units