Download File - Ms. Wiestling

The chi-square test of independence compares two types of data to see whether one is independent of the other or whether there is some link between them. For
example, is favorite musical style independent of age, or can we see a link between the two?
Here is an example question:
For his Mathematical Studies Project a student gave his classmates a questionnaire to fill out. The results for the
question on the gender of the student and specific subjects taken by the student are given in the table below, which is a
2 × 3 contingency table of observed values.
            History   Biology   French   Totals
Female         22        20        18     (60)
Male           20        11         9     (40)
Totals        (42)      (31)      (27)    TOTAL 100
Step 1: State the null hypothesis and the alternative/alternate hypothesis.
H0 (null hypothesis): The variables are independent.
H1 (alternative/alternate hypothesis): The variables are NOT independent.
Do not say that they are dependent, although sometimes the IB questions will ask you if they are dependent.
H0: Subject taken is independent of gender
H1: Subject taken is not independent of gender
Step 2: Find the chi-square calculated value.
a. We make another similar table showing expected values:
            History   Biology   French
Female          p       18.6      16.2
Male          16.8      12.4      10.8
The expected values for each cell are calculated from values in the first table as follows:
To find p: (60/100) x (42/100) x 100 = 25.2
Note how the horizontal and vertical totals relating to cell p are used. This is similar to multiplying
independent events in probability.
P: probability of female x probability of history x total
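The expected-value rule above can be sketched in a few lines of Python. This is only an illustration: the function name `expected_table` and the list layout are assumptions made for this sketch, not part of the handout.

```python
# Observed counts from the worked example
# (rows: Female, Male; columns: History, Biology, French).
observed = [
    [22, 20, 18],
    [20, 11, 9],
]

def expected_table(obs):
    """Expected count for each cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

expected = expected_table(observed)
for label, row in zip(["Female", "Male"], expected):
    print(label, [round(v, 1) for v in row])
```

The first value printed for Female is the cell p, and the remaining values should match the expected-value table above.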
b. Use the formula to calculate the Chi Squared value
χ2 = Σ (fo – fe)2 / fe
where fo = observed frequencies and fe = expected frequencies


For each cell in the table, subtract the expected value from the observed value and square the
answer. Divide this by the expected value.
Add together these values for all the cells.
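As a rough check of the by-hand method, the sum can be written out in Python. The name `chi_square` is hypothetical, and the expected values are typed in from the worked example (with p = 25.2).

```python
# Observed and expected counts from the worked example
# (rows: Female, Male; columns: History, Biology, French).
observed = [[22, 20, 18], [20, 11, 9]]
expected = [[25.2, 18.6, 16.2], [16.8, 12.4, 10.8]]

def chi_square(obs, exp):
    """Sum of (fo - fe)^2 / fe over every cell of the table."""
    total = 0.0
    for obs_row, exp_row in zip(obs, exp):
        for fo, fe in zip(obs_row, exp_row):
            total += (fo - fe) ** 2 / fe
    return total

chi2_calc = chi_square(observed, expected)
print(round(chi2_calc, 2))  # 1.78
```

This agrees with the calculated value of 1.78 used in the example in Step 4.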
c. Calculator way: We find the Chi Squared value by putting the values from the table of observed values
and the table of expected values into the calculator.
1) Choose MATRIX (2nd x^-1) and go to EDIT.
2) Make sure your matrix is the right size (here we have a 2x3).
3) Enter your observed values (don’t include the totals) in Matrix A.
4) Choose STAT and go to TESTS.
5) Scroll down to χ2-Test and press ENTER.
6) Choose Calculate.
7) The calculator finds the expected values and stores them in Matrix B.
Step 3: Find the chi-square critical value.
a. We must find the number of degrees of freedom. This is simple:
(number of rows – 1) x (number of columns – 1), here (2 – 1) x (3 – 1) = 1 x 2 = 2
b. Note the level of significance. This is given in the IB exam but you have to decide which level to use in
your project. The most common levels are 1%, 5%, and 10%.
c. Use the level of significance to find the critical value from the table of data in the data booklet:
- Go down the left column until you reach the number of degrees of freedom (here 2).
- Go across until you reach the column represented by 1 – significance level. For example, go across
until you reach the column represented by 0.95 and read off your value.
(Note: we use 0.95 because there is usually a 5% significance level and 0.95 is 1 – 5%.)

The level of significance indicates the level of “error” we are willing to accept in making our conclusion.
A 1% level of significance means that, on average, 1 time out of 100 we reject H0 when we shouldn’t
have rejected it.
The smaller the level of significance, the stronger the evidence we require before rejecting H0. We usually use a
level of significance of 5%, but sometimes we will use 1% or 10%.
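For the worked example, Step 3 can be sketched as below. Both function names are hypothetical. The critical-value shortcut is an assumption that holds only for 2 degrees of freedom, where the chi-square distribution has a simple closed form; for any other number of degrees of freedom, use the table in the data booklet.

```python
import math

def degrees_of_freedom(rows, cols):
    """(number of rows - 1) x (number of columns - 1)."""
    return (rows - 1) * (cols - 1)

def critical_value_df2(significance):
    """Critical value for 2 degrees of freedom ONLY.

    With 2 degrees of freedom, P(chi2 > x) = exp(-x / 2),
    so the critical value is -2 * ln(significance level).
    """
    return -2 * math.log(significance)

df = degrees_of_freedom(2, 3)  # the 2 x 3 table from the example
print(df)
for level in (0.01, 0.05, 0.10):
    print(level, round(critical_value_df2(level), 3))
```

At the 5% level this gives about 5.99, the critical value used in the example in Step 4.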
Step 4: Compare the chi-square calculated value to the chi-square critical value.
a. If the χ2 value is less than the critical value, we fail to reject the null hypothesis.
In other words, if χ2 calc < χ2 crit, then we fail to reject the null hypothesis (we cannot accept it).
This means that there is not enough evidence to justify a rejection of H0 and so we would
conclude that the variables are independent.
b. If the χ2 value is more than the critical value, we reject the null hypothesis.
In other words, if χ2 calc > χ2 crit, then we reject the null hypothesis. This means that there is
enough evidence to justify rejecting H0 and this would indicate that the variables are NOT
independent. We choose H1 with the understanding that we have not proved H1 beyond a
reasonable doubt.
For example: “Since 1.78 < 5.99 Fail to Reject H0 – Subject taken is independent of gender”
Using the p-value to decide whether to reject or fail to reject:
a. Calculate the p-value the same way you find the chi-square calculated value (Step 2)
b. If the p-value is less than the significance level then we reject the null hypothesis.
c. If the p-value is more than the significance level then we fail to reject the null hypothesis.
For example, p-value = 0.411. So we fail to reject the null hypothesis because 0.411 > 0.05.
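The p-value rule can be checked with a short sketch. The function names are hypothetical, and the p-value shortcut is an assumption that is valid only for 2 degrees of freedom, as in the worked example; in general the calculator's χ2-Test reports the p-value.

```python
import math

def p_value_df2(chi2_calc):
    """Upper-tail p-value for 2 degrees of freedom ONLY: exp(-chi2 / 2)."""
    return math.exp(-chi2_calc / 2)

def decide_by_p(p_value, significance):
    """Reject H0 only when the p-value is below the significance level."""
    if p_value < significance:
        return "reject H0"
    return "fail to reject H0"

p = p_value_df2(1.78)  # calculated value from the worked example
print(round(p, 3), decide_by_p(p, 0.05))
```

A p-value of about 0.411 is larger than 0.05, so this rule leads to the same decision as comparing 1.78 with 5.99.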
Name ______________________________________________ Date ______________ Period _________
Ibms
Statistics review
Probability and Statistics
1. A country motel has room for 80 people. The manager keeps records of the number of guests staying
at the motel over the summer period of 90 days. The results are shown below.
Number of Guests    Frequency (days)
1 – 10                     8
11 – 20                   11
21 – 30                   14
31 – 40                   17
41 – 50                   20
51 – 60                    9
61 – 70                    7
71 – 80                    4
a) Calculate an estimate of:
i) the mean
ii) the median
iii) the standard deviation
b) State the modal group
c) Construct a histogram and frequency polygon of the data. Use a scale of 1 cm to represent
10 guests on the horizontal axis and 1 cm to represent 1 day on the vertical axis.
d) Prepare a cumulative frequency curve of the data. Use a scale of 1 cm represents 5 guests on the
horizontal axis and 1 cm represents 5 days on the vertical axis. Use your curve to find:
i) the number of nights when the motel had 30 or less guests
ii) the number of nights when the motel had less than 45 guests
iii) The motel will “break even” if it has less than 30 guests for no more than 25% of the
time. Did the motel break even?
2. Julie examines a new variety of bean and does a count on the number of beans in 33 pods.
Her results were: 5, 8, 10, 4, 2, 12, 6, 5, 7, 7, 5, 5, 5, 13, 9, 3, 4, 4, 7, 8, 9, 5, 5, 4, 3, 6, 6, 6, 6, 9, 8, 7, 6
a) Find the mean, median, mode, standard deviation, lower and upper quartiles for this set of data.
b) Find the interquartile range of the data set.
c) What are the lower and upper boundaries for the outliers?
d) Are there any outliers?
e) Draw a box and whisker plot of the data set.
3. Eight sample values are: 6, a, 7, a, 4, b, 6 and 8 where a and b are single digit numbers
and the mean is 7.
a) Show that a and b have two possible solutions.
b) If there is a single mode, what is the median?
4. The diagram shows the distribution of weekly income of the employees of a small company.
[Diagram: Frequency (axis marks 0, 2, 4) against Weekly Income ($) (axis marks 200, 400, 600, 800)]
a) How many employees are recorded?
b) Estimate the mean weekly income.
c) Estimate the median wage.
d) If the owner of the company, who receives $1680 per week, is added to the data,
which of these measures will be affected most: median or mean?
5. (i)
A local cricket club keeps a record of the number of runs their star batsman,
Izzy Wacket scores, and the number of runs the team scores during 10 matches
in a season.
Match                        1     2     3     4     5     6     7     8     9    10
Runs Izzy scores (x)        52    65    13   120   105    24    48   140    20    76
Runs the team scores (y)   201   180   270   320   295   260   195   402    84   306
(a) Calculate the mean number of runs Izzy scores over the 10 matches.
(b) Calculate the mean number of runs the team scores over the 10 matches.
(c) (i) Find the standard deviation of Izzy’s runs.
(ii) Given that sxy = 2550, find the equation of the line of regression
y on x.
(iii) In the 11th match Izzy scores 90 runs. Use your equation
to estimate how many runs you would expect the team to score.
(d) (i) One match has caused the gradient to become smaller than it
perhaps should be. Which match do you think this is? Justify
your answer.
(ii) Explain what effect this will have on the answer to part (c).
(ii)
A survey into the votes of 400 men and 400 women was carried out in Ohio and
the results are shown below.
            Democrat   Republican
Women          250         150
Men            220         180
(a) It is assumed that the vote cast is independent of gender. To test this,
a χ2 test of independence is conducted. The first part is a table of
expected values, shown below.
            Democrat   Republican
Women           a           b
Men             c          165
Find the values of a, b, and c that are missing in the table above.
(b) Calculate the χ2 test statistic.
(c) Are the votes cast independent of gender at the 5% level of significance?
Justify your answer.
6.
A sample of twelve pairs of numbers is described by the following statistics:
x̄ = 9.8, ȳ = 101.8, Sxy = 25.64, Sx = 4.66, Sy = 5.93
The product moment correlation coefficient between the variables is r = 0.93.
a) If the value of x decreases will the value of y decrease, increase or remain the same?
State a reason for your answer.
b) Find the equation of the regression line for y on x, in the form y = mx + c.