Unit 3 Statistics Review Sheet
What you need to know:
ONE VARIABLE STATISTICS:
*types of variables: categorical, quantitative discrete, quantitative continuous
*how to make a frequency table and how to calculate/estimate the mean from a frequency table or graph
*how and when to make one variable graphs (histograms, boxplots, normal distributions)
*when to use median vs. mean (and which type of spread)
*calculate 5-number summary (min, Q1, med, Q3, max), mean and standard deviation (by calculator)
*how to create and interpret a cumulative frequency graph (find the median, Q1, Q3, IQR, percentiles)
*draw and calculate normal distributions using normalcdf or invNorm for data values that do not lie a whole number of standard deviations from the mean
*Empirical Rule: approximately 68% of the data lie within 1 s.d. of the mean, 95% within 2 s.d., and 99.7% within 3 s.d.
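The calculator steps for the 5-number summary, mean and standard deviation can be mirrored in a few lines of Python. This is a minimal sketch on made-up data; it assumes quartiles are found with the "median of each half" rule (the overall median is left out of both halves when n is odd), which is the rule many graphing calculators use, and it reports the population standard deviation (divisor n). Other quartile conventions, or the sample standard deviation, give slightly different numbers.

```python
from statistics import mean, median, pstdev


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for at least 2 values.

    Quartiles use the 'median of each half' rule: sort the data, split it
    into a lower and an upper half (the middle value is left out when n is
    odd), and take Q1 and Q3 as the medians of those halves.
    """
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]
    upper = xs[(n + 1) // 2 :]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


# Made-up data for illustration only.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

lo, q1, med, q3, hi = five_number_summary(data)
iqr = q3 - q1
print("5-number summary:", lo, q1, med, q3, hi)
print("IQR:", iqr)
print("mean:", mean(data))
print("population s.d. (divisor n):", round(pstdev(data), 3))
```

Swap `pstdev` for `statistics.stdev` if your course uses the sample standard deviation (divisor n − 1).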
TWO VARIABLE STATISTICS:
*the formula for Pearson’s product moment correlation coefficient and how to find the coefficient of determination (can use calculator)
*how to interpret these values as showing a strong or weak, and a positive or negative, correlation
*sketch a best fit line by hand – making sure it goes through the point (𝑥̅ , 𝑦̅)
*interpolation vs. extrapolation (and reliability)
*how to write a regression line using the least squares regression formula $y - \bar{y} = \frac{s_{xy}}{s_x^{2}}(x - \bar{x})$ (this formula will be given on the test) and put it in the form y=mx+b (remember: if sxy is not given, then it is appropriate to use the calculator LinReg to find the best-fit line or the correlation coefficient.)
*predict using a sketched best fit line or using a least squares regression line
*calculate expected values, degrees of freedom, and chi square statistic
*run a Chi-Squared test and know when to reject the null hypothesis using the chi-squared statistic and the p-value
*know when it is appropriate to combine data values before running a Chi-Squared test (when several of the expected values are less than 5)
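The two-variable formulas above can be checked on raw data with a short Python sketch. It is an illustration on made-up data, not a replacement for the calculator's LinReg: it computes sxy, sx² and sy² with divisor n (an assumption about your course's convention; the divisor cancels out of both r and the gradient, so the line is the same either way), then applies r = sxy / (sx·sy) and the least squares formula rearranged into y = mx + b.

```python
from math import sqrt


def least_squares(xs, ys):
    """Return (r, m, b) for the least squares line y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Covariance and variances with divisor n; the divisor cancels in r and m.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    s_x2 = sum((x - x_bar) ** 2 for x in xs) / n
    s_y2 = sum((y - y_bar) ** 2 for y in ys) / n
    # Pearson's product moment correlation coefficient.
    r = s_xy / sqrt(s_x2 * s_y2)
    # y - y_bar = (s_xy / s_x^2)(x - x_bar), rearranged to y = m*x + b.
    m = s_xy / s_x2
    b = y_bar - m * x_bar
    return r, m, b


# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r, m, b = least_squares(xs, ys)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, y = {m:.2f}x + {b:.2f}")
```

Because b is defined as ȳ − m·x̄, the line always passes through (x̄, ȳ), which is the same point a hand-sketched best fit line should go through.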
Part I: one variable statistics
1. An examination worth 100 marks was given to 800 biology students. The cumulative frequency graph for the students’ results is shown below.
a. Find the number of students who scored 45 marks or less on the test.
b. Find the median score.
c. Between what values do the middle 50% of test results lie?
d. What percentage of students obtained a mark of 55 or more?
e. What percentage of students obtained a mark between 40 and 70 marks?
f. If a “distinction” is awarded to the top 10% of students, what score is required to receive this honour?
g. What approximate mark is the 38th percentile?
h. Given that the lowest mark was 5 and the highest mark was 98, represent the information on a box and whisker plot using the scale below.
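Parts b, c, d, f and g all start by converting between a percentage and a height on the cumulative frequency axis; you then read across to the curve and down to the mark axis (or the reverse). The graph itself is not reproduced here, so this Python sketch covers only that arithmetic for the 800 students. The function names are illustrative, and `percent_at_or_above` treats marks as continuous, so it is an approximation.

```python
TOTAL_STUDENTS = 800  # from question 1


def cf_height(percent):
    """Cumulative frequency (number of students) corresponding to a percentage."""
    return TOTAL_STUDENTS * percent / 100


def percent_at_or_above(cf_at_mark):
    """Approximate percentage of students at or above a mark, given the
    cumulative frequency read off the graph at that mark."""
    return (TOTAL_STUDENTS - cf_at_mark) / TOTAL_STUDENTS * 100


# Heights to read across from on the cumulative frequency axis.
targets = [
    ("Q1", 25),
    ("median", 50),
    ("Q3", 75),
    ("distinction cut-off (top 10%)", 90),
    ("38th percentile", 38),
]
for label, pct in targets:
    print(f"{label}: read across from {cf_height(pct):g} students")
```

For part d, read the cumulative frequency at 55 marks off the graph and pass it to `percent_at_or_above`.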
2. The figure below shows the lengths in centimetres of fish found in the net of a small trawler.
[Figure: histogram; horizontal axis: Length (cm), 10 to 130; vertical axis: Number of fish, 0 to 11]
(a) Find the total number of fish in the net.
(b) Find (i) the modal length interval;
(ii) the interval containing the median length;
(iii) an estimate of the mean length.
The fishing company must pay a fine if more than 10% of the catch have lengths less than 40 cm.
(c) Do a calculation to decide whether the company is fined.
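Parts (a) to (c) all work from a grouped frequency table read off the histogram. The bar heights are not recoverable here, so this Python sketch runs on hypothetical classes and counts, which are not the fish data. It treats each class as lower ≤ length < upper, estimates the mean from class midpoints, and assumes the fine applies when more than 10% of the catch is under 40 cm. The function names are illustrative only.

```python
def grouped_summary(classes, counts):
    """Total, modal class, median class and estimated mean for grouped data.

    classes: list of (lower, upper) boundaries, treated as lower <= length < upper.
    counts:  frequency of each class.
    """
    total = sum(counts)
    modal_class = classes[counts.index(max(counts))]

    # Median class: first class where the running total reaches half the data.
    running = 0
    for interval, f in zip(classes, counts):
        running += f
        if running >= total / 2:
            median_class = interval
            break

    # Estimate the mean using each class midpoint as its representative value.
    midpoints = [(lo + hi) / 2 for lo, hi in classes]
    mean_estimate = sum(mid * f for mid, f in zip(midpoints, counts)) / total
    return total, modal_class, median_class, mean_estimate


def is_fined(classes, counts, limit_cm=40, max_percent=10):
    """True if more than max_percent % of the catch is shorter than limit_cm."""
    short = sum(f for (lo, hi), f in zip(classes, counts) if hi <= limit_cm)
    # Integer comparison avoids floating-point trouble at the boundary.
    return short * 100 > max_percent * sum(counts)


# Hypothetical classes and counts, NOT read from the figure.
classes = [(20, 40), (40, 60), (60, 80), (80, 100)]
counts = [2, 5, 8, 5]
print(grouped_summary(classes, counts))
print("fined:", is_fined(classes, counts))
```

With these made-up counts exactly 10% of the catch is under 40 cm, which is not *more than* 10%, so no fine is due; one extra short fish would tip it over.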
3. The weights of bags of rice are normally distributed with a mean of 1003 grams and a standard deviation of 2 grams.
a. Draw a normal distribution diagram to illustrate this information.
b. Between what two values does the middle 68% of the data fall?
c. Find the probability that a bag of rice weighs less than 999 grams.
d. The manufacturer states that the bags of rice weigh 1 kg. Find the probability that a bag of rice is underweight.
e. 400 bags of rice are weighed. Calculate the expected number of bags of rice that are underweight.
f. 5% of the bags weigh more than p grams. Find the value of p.
g. Find the bag weight at the 30th percentile.
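Python's standard library has a normal distribution class, `statistics.NormalDist`, whose `cdf` and `inv_cdf` methods play the same roles as normalcdf and invNorm. A minimal sketch for the rice bags (mean 1003 g, s.d. 2 g); the calculator calls in the comments assume TI-84-style syntax and are given only for comparison.

```python
from statistics import NormalDist

rice = NormalDist(mu=1003, sigma=2)  # bag weight in grams

# b. About 68% of bags lie within one standard deviation of the mean.
middle_68 = (rice.mean - rice.stdev, rice.mean + rice.stdev)

# c. P(weight < 999), like normalcdf(-1E99, 999, 1003, 2).
p_below_999 = rice.cdf(999)

# d. Underweight means below the stated 1 kg = 1000 g.
p_underweight = rice.cdf(1000)

# e. Expected number of underweight bags out of 400.
expected_underweight = 400 * p_underweight

# f. Top 5%: 95% of bags weigh less than p, like invNorm(0.95, 1003, 2).
p_top_5 = rice.inv_cdf(0.95)

# g. 30th percentile, like invNorm(0.30, 1003, 2).
p30 = rice.inv_cdf(0.30)

print(middle_68, round(p_below_999, 4), round(p_underweight, 4),
      round(expected_underweight, 1), round(p_top_5, 2), round(p30, 2))
```

Note that 999 g is exactly 2 s.d. below the mean, so part c can also be cross-checked against the Empirical Rule: (100% − 95%) / 2 = 2.5%, close to the exact value.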
Part II: two variable statistics
4. A homebuyer makes a table of the prices of 2 bedroom flats in Greater Bristol, and their distances from the city centre.
a. Draw a scatter diagram of the data by hand.
b. Given sxy = –141.08, find r and r².
c. Comment on the relationship between a flat’s distance from the centre of Bristol and its price.
d. Find (𝑥̅ , 𝑦̅) of the data, draw that on your scatter diagram, and draw a best fit line on your graph.
e. Using your line in d., predict the price of a 2 bedroom flat that is 9 km from the city centre.
f. Find the least squares regression line and put it into the form y=mx+b (show work).
g. Using your line in f., predict the price of a 2 bedroom flat that is 9 km from the city centre.
h. Compare your part e.) and part g.) answers – how close were they? Which one is more accurate?
5. (a) For his Mathematical Studies project, Marty set out to discover if stress was related to the amount of time that students spent travelling to or from school. The results of one of his surveys are shown in the table below.
Number of students:
Travel time (t mins) | low stress | moderate stress | high stress
t ≤ 15 | 9 | 5 | 18
15 < t ≤ 30 | 17 | 8 | 28
t > 30 | 18 | 6 | 7
He used a χ² test at the 5% level of significance to find out if there was any relationship between student stress and travel time.
(i) Write down the null and alternative hypotheses for this test.
(2)
(ii) Write down the table of expected values. Give values to the nearest integer.
(3)
(iii) According to your answer in part (ii), would it be necessary to combine data? Why or why not?
(2)
(iv) Show that there are 4 degrees of freedom.
(1)
(v) Calculate the χ² statistic for this data.
(2)
The χ² critical value for 4 degrees of freedom at the 5% level of significance is 9.488.
(vi) What conclusion can Marty draw from this test? Give a reason for your answer.
(2)
(12 marks)
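Parts (ii) to (vi) follow one recipe: expected value = (row total × column total) / grand total, then χ² = Σ (fo − fe)² / fe with (rows − 1)(columns − 1) degrees of freedom. A minimal Python sketch using Marty's counts; it assumes the rows are the three travel-time bands and the columns are low, moderate and high stress (the χ² value does not depend on the column order). The "below 5" check uses the unrounded expected values.

```python
observed = [
    # low, moderate, high stress
    [9, 5, 18],    # t <= 15
    [17, 8, 28],   # 15 < t <= 30
    [18, 6, 7],    # t > 30
]


def chi_squared_test(table):
    """Expected frequencies, chi-squared statistic and degrees of freedom
    for a test of independence on a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi_sq = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, chi_sq, df


expected, chi_sq, df = chi_squared_test(observed)
rounded = [[round(e) for e in row] for row in expected]
any_below_5 = any(e < 5 for row in expected for e in row)
CRITICAL_VALUE = 9.488  # given in the question: 4 d.f., 5% level

print("expected (nearest integer):", rounded)
print("any expected value below 5:", any_below_5)
print(f"chi^2 = {chi_sq:.2f}, df = {df}, reject H0: {chi_sq > CRITICAL_VALUE}")
```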
6. A number of employees at a factory were given x additional training sessions each. They were then timed on how long (y seconds) it took them to complete a task. The results are shown in the scatter diagram below. A list of descriptive statistics is also given.
[Figure: scatter diagram; horizontal axis: number of additional training sessions, 0 to 10; vertical axis: time taken (seconds), 0 to 14]
n = 9
sum of x values: Σx = 54
sum of y values: Σy = 81
mean of x values: x̄ = 6
mean of y values: ȳ = 9
standard deviation of x: sx = 1.94
standard deviation of y: sy = 2.35
covariance: sxy = –3.77
(a) Determine the product-moment correlation coefficient (r) for this data.
(2)
(b) What is the nature of the relationship between the amount of additional training and the time taken to complete the task?
(2)
(c) (i) Determine the equation of the linear regression line for y on x.
(ii) Find the expected time to complete the task for an employee who only attended three additional training sessions. Is this prediction reliable? Why or why not?
(4)
(Total 8 marks)
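Parts (a) and (c) only need the summary statistics listed above, so they can be checked with plain arithmetic. A minimal Python sketch; it assumes sx, sy and sxy were all computed with the same divisor, so that r = sxy / (sx·sy) and the gradient sxy / sx² are consistent. Whether 3 sessions counts as interpolation or extrapolation depends on the range of x in the scatter diagram, which is not reproduced here, so the reliability judgement is left to you.

```python
# Summary statistics given in question 6.
x_bar, y_bar = 6, 9
s_x, s_y = 1.94, 2.35
s_xy = -3.77

# (a) Product-moment correlation coefficient.
r = s_xy / (s_x * s_y)

# (c)(i) Least squares line: y - y_bar = (s_xy / s_x^2)(x - x_bar), i.e. y = m*x + b.
m = s_xy / s_x ** 2
b = y_bar - m * x_bar


def predict(x):
    """Expected time (seconds) for x additional training sessions."""
    return m * x + b


print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
print(f"y = {m:.3f}x + {b:.2f}")
print(f"3 sessions -> about {predict(3):.1f} s")
```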
7. The veterinarian has gathered the following data about the weight of dogs and the weight of their puppies.
Dog \ Puppy | Heavy | Light | Total
Heavy | 36 | 27 | 63
Light | 22 | 35 | 57
Total | 58 | 62 | 120
The veterinarian wishes to test the following hypotheses.
H0: A puppy's weight is independent of its parent's weight.
H1: A puppy's weight is related to the weight of its parent.
(a) The table below sets out the elements required to calculate the χ² value for this data.
Cell | fo | fe | fe – fo | (fe – fo)² | (fe – fo)² / fe
heavy/heavy | 36 | 30.45 | –5.55 | 30.8025 | 1.012
heavy/light | 27 | 32.55 | 5.55 | 30.8025 | 0.946
light/heavy | 22 | 27.55 | 5.55 | 30.8025 | 1.118
light/light | 35 | a | b | c | d
(i) Write down the values of a, b, c, and d.
(4)
(ii) What is the value of χ²calc for this data?
(1)
(iii) How many degrees of freedom exist for the contingency table?
(1)
(iv) Write down the critical value of χ² for the 5% significance level.
(1)
(b) Should H0 be accepted? Explain why.
(2)
(Total 9 marks)
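The worksheet's χ² table can be reproduced cell by cell. A short Python sketch: it takes the rows of the observed table as the dog's weight and the columns as the puppy's weight, computes fe = (row total × column total) / grand total, and keeps the worksheet's fe − fo sign convention (squaring makes the sign irrelevant). It applies no Yates continuity correction, which matches the values already in the table; 3.841 is the standard table critical value for 1 degree of freedom at the 5% level.

```python
# Observed counts from question 7: (dog's weight, puppy's weight) -> count.
observed = {
    ("heavy", "heavy"): 36,
    ("heavy", "light"): 27,
    ("light", "heavy"): 22,
    ("light", "light"): 35,
}
weights = ("heavy", "light")

# Row totals (dog's weight), column totals (puppy's weight) and grand total.
dog_totals = {d: sum(observed[(d, p)] for p in weights) for d in weights}
puppy_totals = {p: sum(observed[(d, p)] for d in weights) for p in weights}
grand_total = sum(observed.values())

rows = []  # per cell: (label, fo, fe, fe - fo, (fe - fo)^2, (fe - fo)^2 / fe)
for (dog, puppy), f_o in observed.items():
    f_e = dog_totals[dog] * puppy_totals[puppy] / grand_total
    diff = f_e - f_o  # the worksheet's "fe - fo" column
    sq = diff ** 2
    rows.append((f"{dog}/{puppy}", f_o, f_e, diff, sq, sq / f_e))

chi_sq = sum(row[5] for row in rows)
df = (len(weights) - 1) * (len(weights) - 1)
CRITICAL_5PCT_DF1 = 3.841  # chi-squared table value: 1 d.f., 5% level

for label, f_o, f_e, diff, sq, contrib in rows:
    print(f"{label:12} {f_o:3} {f_e:6.2f} {diff:6.2f} {sq:8.4f} {contrib:6.3f}")
print(f"chi^2 = {chi_sq:.3f}, df = {df}, reject H0: {chi_sq > CRITICAL_5PCT_DF1}")
```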