Name:
Practice Test 1
Instructions: Do ALL of the following questions. Each question is worth 10 points (a total of 120
points). Budget your time wisely, not spending too much time on low point questions and not being
able to finish high point questions. It is probably best to look over all questions before you start.
You want to make sure to have enough time to do all the calculation questions.
Note: The actual test will only be 100 points, not 120.
Formulae:
Class width = (Largest data value − Smallest data value) / number of classes
$CV = \dfrac{\sigma}{\mu}$
$z = \dfrac{x - \mu}{\sigma}$
Position of Q1: $L = 0.25 \cdot n$
Position of pth percentile: $L = \dfrac{p}{100} \cdot n$
Position of Q3: $L = 0.75 \cdot n$
Finding the percentile of a given data value: $\text{Percentile} = 100 \cdot \dfrac{(\text{number of values less than } x) + 0.05}{n}$
$IQR = Q_3 - Q_1$
$LOB = Q_1 - 1.5 \cdot IQR$
$UOB = Q_3 + 1.5 \cdot IQR$
$\hat{y} = a + bx$
$\text{residual} = y - \hat{y}$
Chebyshev’s Inequality: $1 - 1/K^2$
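A few of the formulae above can be sketched in Python as a quick self-check. This is only an illustration: the function names are made up for this sketch, and rounding the class width up to a whole number is an assumed convention that the formula sheet does not state.

```python
import math


def class_width(largest, smallest, num_classes):
    """(largest - smallest) / number of classes, rounded up (assumed convention)."""
    return math.ceil((largest - smallest) / num_classes)


def chebyshev_bound(k):
    """Minimum fraction of values within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k**2


def percentile_position(p, n):
    """Position L of the p-th percentile in a sorted list of n values."""
    return p / 100 * n


print(class_width(93, 11, 10))      # scores from 11 to 93 split into 10 classes
print(chebyshev_bound(2))           # at least this fraction lies within 2 SDs
print(percentile_position(25, 40))  # position of Q1 when n = 40
```

Chebyshev's bound holds for any distribution, which is why it is weaker than the empirical rule for bell-shaped data.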
1. A restaurant wants to use a sample to gather information from its patrons.
A) What is a sample? Why use samples?
B) Describe (not just name, but describe) 2 different ways the restaurant could properly conduct
a sample.
C) Describe one catastrophically bad way the restaurant could conduct a sample. Why would
it be catastrophic?
2. A study is conducted to examine the relationship between the consumption of red wine and the
occurrence of heart disease among people over the age of 50. A cohort of 250 people is selected,
and the subjects record their red wine consumption over the next 10 years. The heart-health
of each subject is also monitored.
A) What type of study is this?
B) A negative correlation is found between “the quantity consumed of red wine” and a
measured presence of heart disease. What does this mean?
C) One reason this relationship exists could be that consuming red wine causes a change
to a person’s heart-health. Offer another explanation of why this relationship might exist using the
idea of confounding factors.
3. The Nike Corporation commissions a study that looks at ownership of athletic footwear. A
survey is done looking at a treatment variable of “number of athletic shoes” owned by minors and an
outcome of “hours spent in physical activity each day”. The study finds a statistically significant
positive correlation between the number of athletic shoes a child owns and the amount of time they
spend in physical activity each day.
A) What does it mean that there is a “positive correlation” between the number of athletic shoes and
physical activity of a child?
B) What types of variables are “number of athletic shoes” and “time spent in physical activity”?
C) Why is there serious potential for bias in this study?
D) The study concludes that if the government bought every child two extra pairs of athletic
shoes, it would increase the amount of physical activity by 1.3 hours per day. Explain why
this conclusion might be incorrect.
4. Financial economists have shown that if you buy a portfolio of stocks, say 20 different stocks,
instead of just one, the mean return of your financial investment will stay the same, but the standard
deviation will decrease significantly.
A) What does it mean that the standard deviation of your return on investments will decrease?
B) Assume the mean return to stocks is 7%, the standard deviation of a portfolio of stocks is
5%, and the frequency histogram of returns is approximately bell-shaped. What fraction of
returns would be within 1 standard deviation of the mean? How much are returns that are 1
standard deviation from the mean?
C) How much would be returns that are 2 standard deviations above the mean? Using the
empirical rule, how often would we expect to see returns 2 standard deviations above the
mean?
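The arithmetic behind parts B and C can be sketched as follows. It assumes the usual empirical-rule percentages (about 68% within 1 standard deviation and about 95% within 2) and reads "2 standard deviations above the mean" as "at least that far above"; the variable names are illustrative only.

```python
mean_return = 7.0  # percent, from part B
std_dev = 5.0      # percent, from part B

# Empirical rule for an approximately bell-shaped distribution
WITHIN_1_SD = 68.0  # percent of values within 1 SD of the mean
WITHIN_2_SD = 95.0  # percent of values within 2 SDs of the mean

one_sd_below = mean_return - std_dev
one_sd_above = mean_return + std_dev
two_sd_above = mean_return + 2 * std_dev

# By symmetry, half of what falls outside 2 SDs sits in the upper tail
share_above_2_sd = (100.0 - WITHIN_2_SD) / 2

print(f"About {WITHIN_1_SD}% of returns fall between {one_sd_below}% and {one_sd_above}%")
print(f"A return 2 SDs above the mean is {two_sd_above}%")
print(f"About {share_above_2_sd}% of returns are at least that high")
```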
5. Formulae
A) What is the mathematical formula for a population mean? Explain it.
B) What is the mathematical formula for a sample standard deviation? Explain it.
6. Cumulative Frequency Distribution
The following is the relative cumulative frequency distribution of how many units students who have
been at Cabrillo for at least 1 year (called “returning students”) have successfully completed.
Fictional data.
Number of units successfully completed    Relative Cumulative Frequency
0-5                                       3%
6-11                                      10%
12-17                                     21%
18-23                                     34%
24-29                                     48%
30-35                                     64%
36-41                                     77%
42-47                                     87%
48-53                                     92%
54-59                                     95%
60+                                       ?
A) What is the value for 60+ units?
B) What fraction (in percentage) of returning students have completed at least 30 units?
C) What fraction of returning students have completed less than 24 units?
D) What fraction of returning students have completed from 42 through 47 units?
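Reading a relative cumulative frequency table comes down to a few subtractions, sketched here. The dictionary below simply restates the table; the variable names are illustrative.

```python
# Relative cumulative frequency (percent) at the upper end of each class
cumulative = {
    "0-5": 3, "6-11": 10, "12-17": 21, "18-23": 34, "24-29": 48,
    "30-35": 64, "36-41": 77, "42-47": 87, "48-53": 92, "54-59": 95,
}

# The last class of a relative cumulative distribution must account for everyone
last_class = 100

# "Less than 24 units" is the cumulative value at the 18-23 class
less_than_24 = cumulative["18-23"]

# "At least 30 units" is everyone not yet counted by the end of the 24-29 class
at_least_30 = 100 - cumulative["24-29"]

# A single class is the difference between consecutive cumulative values
from_42_to_47 = cumulative["42-47"] - cumulative["36-41"]

print(last_class, less_than_24, at_least_30, from_42_to_47)
```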
7. The following are the HDL cholesterol levels of 40 randomly selected women (data are sorted; read
down each column):
27   37   57   66
28   40   58   70
30   45   61   72
32   47   62   73
34   48   63   73
36   49   63   74
37   53   64   80
37   53   64   80
37   54   64   81
37   56   65   84
A) What is the median level of cholesterol?
B) What is the location of the first quartile?
C) What is the location of the 33rd percentile?
D) What is the value of the 33rd percentile?
E) A cholesterol rate of 72 is what percentile?
F) Assume the population mean is 55, and the population variance is 270. What is the z-score of the
maximum HDL cholesterol observation in the sample (= 84)?
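A minimal sketch of the position-based calculations for Question 7 follows. The rule for turning a position L into a value (average the L-th and (L+1)-th values when L is whole, otherwise round L up) is a common textbook convention and is an assumption here; the formula sheet only gives L itself. The helper name is hypothetical.

```python
import math
from statistics import median

# HDL levels read down each column of the sorted table
hdl = [27, 28, 30, 32, 34, 36, 37, 37, 37, 37,
       37, 40, 45, 47, 48, 49, 53, 53, 54, 56,
       57, 58, 61, 62, 63, 63, 64, 64, 64, 65,
       66, 70, 72, 73, 73, 74, 80, 80, 81, 84]
n = len(hdl)


def value_at_position(sorted_data, L):
    """Assumed convention: whole L -> average of L-th and (L+1)-th values; else round L up."""
    if L == int(L):
        i = int(L)
        return (sorted_data[i - 1] + sorted_data[i]) / 2
    return sorted_data[math.ceil(L) - 1]


median_hdl = median(hdl)          # average of the 20th and 21st values
q1_position = 0.25 * n            # L = .25 * n
p33_position = 33 / 100 * n       # L = (p / 100) * n
p33_value = value_at_position(hdl, p33_position)

# Count needed by the percentile-of-a-value formula
values_below_72 = sum(1 for v in hdl if v < 72)

# z = (x - mu) / sigma, with sigma the square root of the population variance
z_max = (max(hdl) - 55) / math.sqrt(270)

print(median_hdl, q1_position, p33_position, p33_value, values_below_72, round(z_max, 3))
```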
8.
[Figure: frequency histogram. Horizontal axis: Murder Rate of a sample of major cities (per 100,000),
marked from 5 to 50 in steps of 5. Vertical axis marked from 0 to 20 in steps of 4.]
Use the frequency histogram above that shows the results of a sample of the murder rate (per
100,000 people) of major US cities.
A) How many cities were sampled?
B) What is the relative frequency of cities with a murder rate of 20 to less than 25 murders?
C) What is the cumulative frequency of a murder rate from 15 to less than 20?
D) How many cities had a murder rate of at least 25?
E) How many cities had a murder rate less than 15?
F) Describe the shape of the histogram
9 & 10 data, Student Scores Data. The following are the final course scores for a sample of
students taking statistics (fictional data):
69, 65, 54, 56, 76, 60, 81, 87, 68, 84, 87, 60, 43, 68, 74, 77, 68, 70, 73, 79, 72, 62, 59, 29, 43, 55,
54, 11, 73, 59, 71, 63, 66, 79, 52, 73, 62, 75, 91, 84, 63, 90, 93, 37, 77, 66, 73, 65, 67, 77
9. Calculating basic summary statistics
A) Calculate all measures of center for the student scores:
B) Calculate the standard deviation and variance for the student scores:
C) Find the Five-Number Summary of the student scores:
D) Draw a modified boxplot of the student scores:
0    10    20    30    40    50    60    70    80    90    100
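The summary statistics for Question 9 can be checked with Python's standard `statistics` module, as sketched below. Two assumptions are made: the data are treated as a sample (so `stdev` and `variance` divide by n − 1), and quartile positions that are not whole numbers are rounded up, which the formula sheet does not spell out. The helper name is hypothetical.

```python
import math
from statistics import mean, median, multimode, stdev, variance

scores = [69, 65, 54, 56, 76, 60, 81, 87, 68, 84, 87, 60, 43, 68, 74, 77, 68,
          70, 73, 79, 72, 62, 59, 29, 43, 55, 54, 11, 73, 59, 71, 63, 66, 79,
          52, 73, 62, 75, 91, 84, 63, 90, 93, 37, 77, 66, 73, 65, 67, 77]
data = sorted(scores)
n = len(data)


def value_at_position(sorted_data, L):
    """Assumed convention: whole L -> average of L-th and (L+1)-th values; else round L up."""
    if L == int(L):
        i = int(L)
        return (sorted_data[i - 1] + sorted_data[i]) / 2
    return sorted_data[math.ceil(L) - 1]


# A) Measures of center
center = {
    "mean": mean(data),
    "median": median(data),
    "modes": multimode(data),
    "midrange": (data[0] + data[-1]) / 2,
}

# B) Sample standard deviation and variance
sample_sd = stdev(data)
sample_var = variance(data)

# C) Five-number summary: min, Q1, median, Q3, max
q1 = value_at_position(data, 0.25 * n)
q3 = value_at_position(data, 0.75 * n)
five_number = (data[0], q1, median(data), q3, data[-1])

# D) Fences for the modified boxplot; values outside them are plotted as outliers
iqr = q3 - q1
lob = q1 - 1.5 * iqr
uob = q3 + 1.5 * iqr
outliers = [v for v in data if v < lob or v > uob]

print(center, round(sample_sd, 2), round(sample_var, 2))
print(five_number, (lob, uob), outliers)
```

In a modified boxplot the whiskers stop at the most extreme values that are still inside the fences, and the outliers are drawn as separate points.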
10. (continued from 9) Draw a histogram of the scores with a class width of 10 and starting at zero.
Label the top of each column with the frequency. What shape does the histogram have?
0    10    20    30    40    50    60    70    80    90    100
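The class frequencies for the histogram in Question 10 can be tallied as in this sketch. It assumes classes of the form 0-9, 10-19, and so on, and prints a rough text histogram in place of a drawn one.

```python
scores = [69, 65, 54, 56, 76, 60, 81, 87, 68, 84, 87, 60, 43, 68, 74, 77, 68,
          70, 73, 79, 72, 62, 59, 29, 43, 55, 54, 11, 73, 59, 71, 63, 66, 79,
          52, 73, 62, 75, 91, 84, 63, 90, 93, 37, 77, 66, 73, 65, 67, 77]

class_width = 10
counts = [0] * 10  # classes 0-9, 10-19, ..., 90-99

for s in scores:
    # Integer division maps a score to its class index, e.g. 73 // 10 == 7
    counts[s // class_width] += 1

for i, c in enumerate(counts):
    low = i * class_width
    print(f"{low:2d}-{low + class_width - 1:2d} | {'#' * c} {c}")
```

Most of the scores pile up in the 60s and 70s with a thin tail of low scores, which is the pattern to look for when describing the shape.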
Data for 11 & 12 Student scores 2. The next questions use the following sampled data:
Stat score is the score the student received in the statistics class
Age is the age of the student
Algebra score is the score the student received in their intermediate algebra class.
All data is fictitious.
Important: keep track of which variable is the dependent and which is the explanatory.
Stat Score    Age    Algebra Score
43            20     50
54            22     53
56            33     58
59            37     64
62            29     69
66            26     71
69            33     68
77            20     79
77            27     74
79            31     75
79            18     85
81            21     77
84            23     87
93            19     96
11. Regression 1
A) Regress Stat Score (Y) on Age (X). Write the estimated regression equation (report the
calculated a & b in equation form)
B) Interpret the X-Coefficient
C) Interpret the Y-axis intercept
D) Predict the Statistics Score if a Student is 15 years old
E) Predict the Statistics Score if a Student is 55 years old
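A least-squares fit for Question 11 can be sketched as below. The pairing of each Stat Score with an Age follows the row order of the data table, which is an assumption about how the table was laid out; the function name is hypothetical.

```python
def least_squares(x, y):
    """Return (a, b) for the line y-hat = a + b*x that minimizes squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b


# Row order assumed to match the data table
age = [20, 22, 33, 37, 29, 26, 33, 20, 27, 31, 18, 21, 23, 19]
stat = [43, 54, 56, 59, 62, 66, 69, 77, 77, 79, 79, 81, 84, 93]

a, b = least_squares(age, stat)
residuals = [yi - (a + b * xi) for xi, yi in zip(age, stat)]

pred_15 = a + b * 15
pred_55 = a + b * 55

print(f"y-hat = {a:.2f} + ({b:.3f}) x")
print(f"Predicted score at age 15: {pred_15:.1f}; at age 55: {pred_55:.1f}")
print(f"Observed ages run from {min(age)} to {max(age)}")
```

Both 15 and 55 lie outside the ages observed in the sample, so those predictions are extrapolations and should be interpreted with caution.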
12. Regression 2
A) Regress Stat Score (Y) on Algebra Score (X) and write the regression equation:
B) Interpret the Y-axis intercept
C) Calculate and interpret the X-axis intercept
D) If a student wants to earn a score of 100 in Statistics, what will they need to achieve in
Algebra?
E) Draw the estimated regression line on the following graph:
[Blank graph: Stat Score on the vertical axis (0 to 100 in steps of 10) against Algebra Score on the
horizontal axis (0 to 100 in steps of 10).]
END OF TEST
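For Question 12, the X-axis intercept and the algebra score needed for a target Stat Score both come from solving $\hat{y} = a + bx$ for x. The sketch below assumes each Stat Score is paired with the Algebra Score in the same row of the data table; the function and variable names are illustrative.

```python
def least_squares(x, y):
    """Return (a, b) for the fitted line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
        sum((xi - x_bar) ** 2 for xi in x)
    a = y_bar - b * x_bar
    return a, b


# Row order assumed to match the data table
algebra = [50, 53, 58, 64, 69, 71, 68, 79, 74, 75, 85, 77, 87, 96]
stat = [43, 54, 56, 59, 62, 66, 69, 77, 77, 79, 79, 81, 84, 93]

a, b = least_squares(algebra, stat)

# X-axis intercept: the x at which y-hat = 0, so 0 = a + b*x
x_intercept = -a / b

# Algebra score needed for a predicted Stat Score of 100: 100 = a + b*x
target = 100
needed_algebra = (target - a) / b

print(f"y-hat = {a:.2f} + {b:.3f} x")
print(f"X-axis intercept: {x_intercept:.2f}")
print(f"Algebra score needed for a predicted {target}: {needed_algebra:.2f}")
```

Compare the results with the range of algebra scores in the sample (50 to 96) before reading too much into them, since values outside that range are extrapolations.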
Definitions
Be able to do two things:
1) Define the following terms
2) Explain why it’s important or why we want to know it.
Note: There are other aspects of these words/concepts that you will need to know that are not part of
the definition (such as how to calculate it) and are not part of this worksheet.
Population
Sample
Simple Random Sample
Types of Samples:
Sample of Convenience
Stratified
Cluster
Systematic
Voluntary Response
Statistic
Parameter
Qualitative Variables
Nominal Variables
Ordinal Variables
Quantitative Variables
Discrete Variables
Continuous Variables
Randomized Experiment
Observational Study
Prospective
Retrospective
Cross-section
Outcome
Treatment
Double-Blind
Confounding
Bias
Frequency
Frequency Distribution
Relative Frequency Distribution
Cumulative Frequency Distribution
Relative Cumulative Frequency Distribution
Mean
Median
Mode
Mid-Range
Range
Variance
Standard Deviation
Empirical Rule
Chebyshev’s Inequality
z-score
First Quartile
Third Quartile
Percentile
Positive Association (or relation)
Negative Association
Correlation Coefficient
Regression Line