Sample Exam #1, Math 201
1.
Use the data set given below to answer all of the following questions.
14.0, 18.4, 21.6, 22.1, 23.8, 24.3, 25.9, 26.5, 27.5,
29.2, 29.3, 29.4, 29.7, 29.8, 30.2, 30.8, 31.9, 33.5
(a) Use the statistical capability of your scientific calculator to find the mean, standard deviation, and variance of the data:
$\bar{x} = 26.55$
$s = 5.05083$, variance $s^2 = 25.5109$
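As a cross-check on the calculator values, here is a minimal Python sketch using the standard-library `statistics` module. Like the calculator's s, `statistics.stdev` and `statistics.variance` divide by n - 1 (the sample versions), which is what reproduces 5.05083 and 25.5109.

```python
import statistics

# Data set from problem 1
data = [14.0, 18.4, 21.6, 22.1, 23.8, 24.3, 25.9, 26.5, 27.5,
        29.2, 29.3, 29.4, 29.7, 29.8, 30.2, 30.8, 31.9, 33.5]

mean = statistics.mean(data)      # sample mean, x-bar
sd = statistics.stdev(data)       # sample standard deviation (divides by n - 1)
var = statistics.variance(data)   # sample variance, s^2 (divides by n - 1)

print(f"mean = {mean:.2f}, s = {sd:.5f}, variance = {var:.4f}")
```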
(b) Find by hand the first quartile Q1, median M, third quartile Q3, and the IQR.
Q1 = 23.8
M = 28.35
Q3 = 29.8
IQR = 6
The numbers are in order and there are 18 pieces of data, so the median is the average of the 9th and 10th pieces of data:
$M = \frac{27.5 + 29.2}{2} = 28.35$
Q1 is the median of the first half of the data (the 1st through 9th values), which is the 5th piece of data, and Q3 is the median of the second half (the 10th through 18th values), which is the 14th piece of data. The IQR is Q3 - Q1 = 29.8 - 23.8 = 6.
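The by-hand rule above can be sketched in Python. The helper names `median` and `quartiles` are hypothetical; the sketch assumes the textbook convention used here, in which Q1 and Q3 are the medians of the lower and upper halves and the overall median is left out of both halves when n is odd. Library routines such as `statistics.quantiles` use interpolation rules and may return slightly different quartiles.

```python
def median(values):
    """Median of an already sorted list."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2

def quartiles(data):
    """Return (Q1, M, Q3): Q1 and Q3 are the medians of the lower and
    upper halves; the overall median is excluded from both halves when n is odd."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]          # first half
    upper = xs[(n + 1) // 2 :]    # second half
    return median(lower), median(xs), median(upper)

data = [14.0, 18.4, 21.6, 22.1, 23.8, 24.3, 25.9, 26.5, 27.5,
        29.2, 29.3, 29.4, 29.7, 29.8, 30.2, 30.8, 31.9, 33.5]
q1, m, q3 = quartiles(data)
print(q1, m, q3, q3 - q1)   # Q1, M, Q3, IQR
```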
(c) Create a boxplot for these data:
[Boxplot of the data; vertical axis from 15 to 30]
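(d) Create a split stemplot for these data:
First, round to the nearest whole number. (Halves are rounded to the even number here, so 26.5 becomes 26, 27.5 becomes 28, and 33.5 becomes 34.) Each stem (tens digit) appears twice: leaves 0 to 4 go on the first row and leaves 5 to 9 on the second.
1 | 4
1 | 8
2 | 2244
2 | 668999
3 | 000124
3 |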
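A short sketch of the split stemplot, assuming two-digit whole numbers after rounding. Python's built-in `round()` rounds exact halves to the even integer, which reproduces the rows above; with round-half-up rounding, 26.5 would become 27 and the second 2 row would read 678999. The function name `split_stemplot` is hypothetical.

```python
data = [14.0, 18.4, 21.6, 22.1, 23.8, 24.3, 25.9, 26.5, 27.5,
        29.2, 29.3, 29.4, 29.7, 29.8, 30.2, 30.8, 31.9, 33.5]

# round() sends exact halves to the even integer (26.5 -> 26, 27.5 -> 28)
rounded = sorted(round(x) for x in data)

def split_stemplot(values):
    """Rows of a split stemplot for two-digit whole numbers: each stem
    appears twice, with leaves 0-4 on the first row and 5-9 on the second."""
    rows = []
    for stem in range(min(values) // 10, max(values) // 10 + 1):
        for low, high in ((0, 4), (5, 9)):
            leaves = "".join(str(v % 10) for v in values
                             if v // 10 == stem and low <= v % 10 <= high)
            rows.append(f"{stem} | {leaves}".rstrip())
    return rows

for row in split_stemplot(rounded):
    print(row)
```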
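(e) Which measure would be a better measure for the center of this distribution? Justify your choice.
The distribution is skewed to the left (the low values 14.0 and 18.4 form a long tail), so the median is the better measure of center. The mean is pulled toward the tail: $\bar{x} = 26.55$ is below M = 28.35.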
2.
The histogram below shows the distribution of a set of observations:
[Histogram; frequency axis from 5 to 40, horizontal axis from 80 to 150]
(a) Is the distribution symmetric, skewed to the left, or skewed to the right?
The distribution is skewed to the left.
(b) Is the mean less than or greater than the median?
The mean is pulled toward the long tail of a skewed distribution, so here the mean is less than the median.
(c) How many data values are in the data set?
Adding up the heights of the bars, we get:
2 + 3 + 4 + 6 + 16 + 25 + 43 + 42 = 141
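(d) Use the histogram to accurately estimate the median.
The position of the median is found by:
$\frac{n+1}{2} = \frac{141+1}{2} = \frac{142}{2} = 71$
so the median is the 71st entry. The first six bars hold 2 + 3 + 4 + 6 + 16 + 25 = 56 values and the first seven hold 56 + 43 = 99, so the 71st value lies in the seventh bar, and M is between 130 and 140.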
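The counting argument in (d) can be sketched by accumulating the bar heights until the running total reaches the median position. The sketch only reports which bar, counted from the left starting at 1, contains the median; mapping that bar to its class interval still comes from reading the histogram.

```python
from itertools import accumulate

# Bar heights from the histogram, left to right
counts = [2, 3, 4, 6, 16, 25, 43, 42]

n = sum(counts)            # total number of observations
position = (n + 1) / 2     # position of the median in the ordered data

# Running totals: how many observations lie in the first k bars
running = list(accumulate(counts))

# First bar (1-based) whose running total reaches the median position
bar = next(i for i, total in enumerate(running, start=1) if total >= position)

print(n, position, running, bar)
```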
3. Use Table A to answer the following questions. Find the proportion of observations from a standard Normal distribution that satisfy each of the following statements.
(a) z ≥ 3.13
[Standard Normal curve, z from -3 to 3]
Table A gives the area to the left of z = 3.13 as .9991, so the proportion is 1 - .9991 = .0009, or 0.09%.
(b) -1.25 ≤ z ≤ 0.54
[Standard Normal curve, z from -3 to 3]
We look up both -1.25 and 0.54 in Table A, which give areas to the left of .1056 and .7054. The proportion between them is .7054 - .1056 = .5998, or 59.98%.
(c) 58% of the observations are greater than what z value?
[Standard Normal curve, z from -3 to 3]
Since Table A gives the area to the left, we subtract 100% - 58% = 42% and look up .42 in the body of Table A to find the z value. The closest entry is .4207, which gives z = -0.20. (The neighboring entry, .4168 at z = -0.21, is farther from .42.)
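Table A lookups can be cross-checked with `statistics.NormalDist` (Python 3.8 or later). Its `cdf` method gives the same left-tail areas as Table A, and `inv_cdf` runs the table in reverse. Results differ from the hand answers only by table rounding.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal: mean 0, standard deviation 1

# (a) area to the right of 3.13
p_a = 1 - Z.cdf(3.13)

# (b) area between -1.25 and 0.54
p_b = Z.cdf(0.54) - Z.cdf(-1.25)

# (c) 58% above z means 42% below z, so invert the left-tail area 0.42
z_c = Z.inv_cdf(0.42)

print(f"{p_a:.4f} {p_b:.4f} {z_c:.3f}")
```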
4.
Estimate the mean and standard deviation for the normal distribution whose density curve is shown.
μ = ___16___ (This is the center.)
σ = ___3___ (This is the distance from the center to the inflection points, i.e., the steepest points on the curve.)
[Normal density curve; horizontal axis from 5 to 25]
5. The scores on the math section of the SAT for Washington students (2006) are normally distributed with mean 532 and standard deviation 103.
(a) What proportion of students received a score between 500 and 650?
[Sketch of the Normal curve with 500 and 650 marked]
First we find the z values for 500 and 650:
$z_1 = \frac{500 - 532}{103} \approx -0.31$
$z_2 = \frac{650 - 532}{103} \approx 1.15$
Using Table A, the proportion of students scoring less than 500 is 37.83% and the proportion scoring less than 650 is 87.49%. So the proportion of students scoring between 500 and 650 is 87.49% - 37.83% = 49.66%.
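A sketch of the same calculation with `statistics.NormalDist`, which accepts the mean and standard deviation directly and standardizes internally. Because the z-values are not rounded to two decimals, the proportion comes out near 0.496 rather than exactly 0.4966.

```python
from statistics import NormalDist

sat = NormalDist(mu=532, sigma=103)

# z-scores, as computed by hand
z1 = (500 - 532) / 103
z2 = (650 - 532) / 103

# proportion of scores between 500 and 650
p = sat.cdf(650) - sat.cdf(500)
print(f"z1 = {z1:.2f}, z2 = {z2:.2f}, proportion = {p:.4f}")
```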
(b) 83% of the test scores were less than what value?
[Sketch of the Normal curve with the value x marked]
The entry in Table A closest to .83 is .8289, which gives z = 0.95. (The next entry, .8315 at z = 0.96, is slightly farther from .83.) We'll use this to solve for the test score:
$0.95 = \frac{x - 532}{103}$
$97.85 = x - 532$
$x \approx 630$
So 83% of the test scores were less than about 630.
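Working backward from an area uses the inverse cumulative function. This sketch uses `NormalDist.inv_cdf`, which avoids choosing between neighboring table entries; the exact z is about 0.954, so the score is about 630.

```python
from statistics import NormalDist

sat = NormalDist(mu=532, sigma=103)

x = sat.inv_cdf(0.83)    # score with 83% of the distribution below it
z = (x - 532) / 103      # the matching z-score
print(f"z = {z:.3f}, x = {x:.1f}")
```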
6.
Match the scatterplot with the correlation values given below:
[Six scatterplots, labeled Scatterplot #1 through Scatterplot #6]
(a) r = -0.64 goes with Scatterplot #4
(b) r = 1.00 goes with Scatterplot #6
(c) r = -0.11 goes with Scatterplot #1
(d) r = 0.68 goes with Scatterplot #3
(e) r = 0.59 goes with Scatterplot #5
(f) r = -0.92 goes with Scatterplot #2
7.
A new teacher is analyzing whether or not there is an association between scores earned by students on
their first exam in the course and the course grade earned by students at the end of the term. Exams are scored
using a 100 point scale (0 to 100 points) and course grades use a 100% scale (0% to 100%). There are 35
students in the course.
(a) Decide which variable, Exam 1 Score or Course Grade, is the explanatory variable and which is the response variable. Circle the scatterplot below that matches your decision.
Explanatory Variable: Exam 1 Score
Response Variable: Course Grade
[Scatterplot of Course Grade (vertical axis, 0.5 to 1) against Exam 1 score (horizontal axis, 50 to 100)]
(b) Find the equation of the regression line $\hat{y} = a + bx$. The mean and standard deviation for the Course Grade variable are 0.766 and 0.123. The mean and standard deviation for the Exam 1 Score variable are 83.943 and 11.295. The correlation is 0.7845. Be very sensitive to roundoff errors.
We'll use the formula $\hat{y} = a + bx$, with $b = r\frac{s_y}{s_x}$ and $a = \bar{y} - b\bar{x}$.
$b = 0.7845\left(\frac{0.123}{11.295}\right) \approx 0.008543$
$a = 0.766 - 0.008543(83.943) \approx 0.04887$
So the regression line is given by $\hat{y} = 0.04887 + 0.008543x$.
(c) Predict the course grade for a student who scores a 91 on their first exam. Use the equation of the regression line found in (b).
$\hat{y} = 0.04887 + 0.008543(91) \approx 0.826$, or about 82.6%.
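The slope and intercept formulas from (b) can be checked with a few lines of Python. Keeping full precision until the final step is one way to follow the warning about roundoff error; the function name `predict` is hypothetical.

```python
# Summary statistics given in the problem
r = 0.7845
x_bar, s_x = 83.943, 11.295   # Exam 1 score (explanatory)
y_bar, s_y = 0.766, 0.123     # course grade (response)

b = r * s_y / s_x             # slope
a = y_bar - b * x_bar         # intercept: the line passes through (x_bar, y_bar)

def predict(x):
    """Predicted course grade (as a proportion) for an Exam 1 score x."""
    return a + b * x

print(f"y-hat = {a:.5f} + {b:.6f} x; prediction at 91: {predict(91):.3f}")
```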
(d) There is an obvious outlier present in the data set. What is its coordinate on the scatterplot? Describe what happened to the student represented by the outlier.
The coordinate is approximately (96, 0.56).
The student did well on the first exam, scoring 96 out of 100, but did not pass the class, finishing with 56% overall.
(e) What proportion of the 35 students earned an A for the course? Any Course Grade between 90% and 100% would be assigned the A letter grade.
Four students have course grades above .9. So $\frac{4}{35} \approx 0.114$, or 11.4%, of the students earned an A.
Depending on the quarter and instructor, some of the previous exercises may not appear until Exam 2.
8.
Use the data set to answer the following questions: 2,2,2,4,4,5,5,5,7,7,7,7,8,11,11
(a) Find the five-number summary for the given data.
There are 15 pieces of data, so the median is the 8th, or middle, piece: M = 5.
The median of the first half of the data (the 1st through 7th values) is the 4th piece: Q1 = 4.
The median of the second half of the data (the 9th through 15th values) is the 12th piece: Q3 = 7.
This gives a five-number summary of Min = 2, Q1 = 4, M = 5, Q3 = 7, Max = 11.
(b) Create a boxplot for the data.
[Boxplot of the data; vertical axis from 2 to 10]
9.
For the data set from the previous problem, describe the distribution of the data and determine if the five-number summary was the best representation of the spread.
The distribution is skewed to the right: the upper whisker (7 to 11) is longer than the lower whisker (2 to 4), and the mean, 87/15 = 5.8, is larger than the median, 5. Because the mean and standard deviation are better suited to symmetric distributions, the five-number summary is the best representation of the spread.
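The mean-versus-median comparison supporting the right-skew description can be checked with the standard-library `statistics` module:

```python
import statistics

data = [2, 2, 2, 4, 4, 5, 5, 5, 7, 7, 7, 7, 8, 11, 11]

mean = statistics.mean(data)       # 87 / 15
median = statistics.median(data)   # 8th of 15 ordered values

# In a right-skewed distribution the mean is pulled above the median
print(mean, median, mean > median)
```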
10.
Create a split stemplot for the following data and describe the distribution:
11,12,16,19,22,23,25,25,26,28,29,30,32,34,38,38
1 | 12
1 | 69
2 | 23
2 | 55689
3 | 024
3 | 88
11. For the data set in the previous problem determine the best summary and give justification for your
answer. (Just state the type of summary, don't compute it.)
Either summary could be justified. The distribution is single peaked and somewhat symmetric, so the mean and standard deviation could be used.
On the other hand, there is a little bit of skewness, so the five-number summary may be more desirable.
12. The length of human pregnancies from conception to birth varies according to a distribution that is
approximately normal with mean 266 days and standard deviation 16 days. Use the 68-95-99.7 rule to answer
the following questions.
(a) Between what values do the lengths of the middle 99.7% of all pregnancies fall?
The middle 99.7% of the pregnancies fall within 3 standard deviations of the mean:
266 ± 3(16) = 266 ± 48, or 218 to 314 days.
That is, 99.7% of the pregnancies fall between 218 and 314 days.
(b) How long are the longest 2.5% of all pregnancies?
[Normal curve with tick marks at 218, 234, 250, 266, 282, 298, 314]
Since 95% of pregnancies fall within 2 standard deviations of the mean, the remaining 5% is split equally between the two tails, so the longest 2.5% fall more than 2 standard deviations above the mean:
266 + 2(16) = 298
So the longest 2.5% of all pregnancies last 298 or more days.
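The 68-95-99.7 rule reduces to adding and subtracting whole multiples of σ. This minimal sketch reproduces the tick marks on the sketch and both answers above:

```python
mu, sigma = 266, 16   # pregnancy length in days

# About 68%, 95%, and 99.7% of values lie within 1, 2, and 3 standard deviations
intervals = {k: (mu - k * sigma, mu + k * sigma) for k in (1, 2, 3)}

# 95% lie inside mu +/- 2 sigma, so the other 5% splits into two 2.5% tails
longest_2_5_cutoff = mu + 2 * sigma

print(intervals, longest_2_5_cutoff)
```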
13.
Use table A to answer the following questions.
(a) What percentage of human pregnancies last less than 270 days?
[Normal curve with tick marks at 218, 234, 250, 266, 282, 298, 314]
$z = \frac{270 - 266}{16} = 0.25$
From Table A, we get P(z ≤ 0.25) = .5987, so about 59.87% of pregnancies last less than 270 days.
(b) What percentage of human pregnancies last between 250 and 270 days?
[Normal curve with tick marks at 218, 234, 250, 266, 282, 298, 314]
$z_1 = \frac{270 - 266}{16} = 0.25$, so P(z ≤ 0.25) = .5987
$z_2 = \frac{250 - 266}{16} = -1$, so P(z ≤ -1) = .1587
So the proportion of human pregnancies between 250 and 270 days is .5987 - .1587 = .44, or 44%.
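The Table A lookups in problem 13 can be cross-checked with `statistics.NormalDist`, which standardizes internally:

```python
from statistics import NormalDist

days = NormalDist(mu=266, sigma=16)

p_a = days.cdf(270)                    # P(X < 270), i.e. z < 0.25
p_b = days.cdf(270) - days.cdf(250)    # P(250 < X < 270), z from -1 to 0.25
print(f"{p_a:.4f} {p_b:.4f}")
```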
14.
Below how many days do 67% of all human pregnancies last?
Using Table A, we find the value of z that has an area of 0.67 to its left: z = .44.
Solving $0.44 = \frac{x - 266}{16}$ for x, we get x = 266 + 16(0.44) = 273.04 days. So 67% of all human pregnancies last less than about 273 days.
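The inverse lookup for problem 14, sketched with `NormalDist().inv_cdf`; the exact z for an area of 0.67 is about 0.4399, so rounding to Table A's .44 loses very little:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf(0.67)   # z with 67% of the area to its left
x = 266 + 16 * z                 # undo the standardization: x = mu + z * sigma
print(f"z = {z:.2f}, x = {x:.2f} days")
```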
15. Use the histogram to answer the following questions:
[Histogram titled "Histogram": Frequency (0 to 9) versus Bin (20, 25, 30, 35, 40, More)]
(a) Describe the distribution of the data set.
The distribution is single peaked with a slight skew to the right.
(b) How many observations are represented by the histogram?
4 + 7 + 8 + 4 + 2 = 25 observations
(c) Find the median and mean on the histogram and justify your answers.
Median ≈ 27 (Find either the 13th entry or the point where the areas on either side are equal.)
Mean ≈ 31 (The mean gets pulled toward the skew, so it should be more than the median.)
Note: Actual answers may vary, but the relationships described above need to be true.
16. The following table gives information about a sample of sports cars that were test driven. Determine
who the individuals are in the study, what the variables are, and whether each variable is categorical or quantitative.
Car                 City mpg   Highway mpg   Color
Audi TT Quattro     20         28            white
BMW M Coupe         17         25            black
Ford Thunderbird    17         23            red
The individuals are the cars being tested. The variables are city mpg, highway mpg and color.
The two mpg variables are quantitative and color is categorical.
17. Compute the mean and standard deviation for the city mpg for all the cars in the study from problem 16.
$\bar{x} = \frac{20 + 17 + 17}{3} = 18$
$s = \sqrt{\frac{(20-18)^2 + (17-18)^2 + (17-18)^2}{2}} = \sqrt{3} \approx 1.73205$
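The same mean and standard deviation computed with the standard-library `statistics` module, whose `stdev` uses the n - 1 divisor shown above:

```python
import statistics

city_mpg = [20, 17, 17]   # Audi TT Quattro, BMW M Coupe, Ford Thunderbird

x_bar = statistics.mean(city_mpg)   # (20 + 17 + 17) / 3
s = statistics.stdev(city_mpg)      # sqrt(sum of squared deviations / (n - 1))
print(x_bar, round(s, 5))
```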