NAME ____________________
STUDENT #____________________
SC/BIOL 2060 3.0 STATISTICS FOR BIOLOGISTS
TEST #1
Feb 9, 2017
(TOTAL PAGES = 7)
(TOTAL MARKS = 45)
(TIME = 60 MINS)
INSTRUCTIONS:
1) WRITE YOUR NAME AND STUDENT NUMBER ON THE TEST
PAPER
2) PLEASE KEEP YOUR TEST PAPER TO YOURSELF!
3) ANSWER QUESTIONS DIRECTLY ON THE
TEST PAPER USING THE SPACE PROVIDED
4) READ QUESTIONS CAREFULLY, AND THINK CAREFULLY
BEFORE ANSWERING, PROVIDING THE BEST ANSWER.
Be concise. Do not pad your answers with extra
words or marks may be deducted.
5) YOU MAY USE A non-programmable CALCULATOR.
You cannot use any statistical functions on your
calculator.
6) BUDGET YOUR TIME APPROPRIATELY.
1. For each of the following: i) name the variables measured and ii) state the type of
variables that have been studied in the experiment. Where more than one variable has
been studied indicate which, if any, is the response versus the explanatory variable.
Provide only short answers as below. (9 marks)
(Example answers: Variable: femur length – numerical-continuous, response variable
Variable: eye colour – categorical-nominal, explanatory variable).
a) You sample randomly 20 tadpoles and measure their length in millimeters.
Length: numerical-continuous
b) You count the number of people named “Mary”, “Peter” or “Paul” on the York
campus.
Names: categorical-nominal
c) To investigate one possible cause of mercury contamination, you measure the levels of mercury
(in micrograms) for 10 fish sampled adjacent to a gold mine and compare them to levels
for 10 fish sampled 20 km upstream of the mine.
Mercury level: numerical-continuous response variable
Location of fish: categorical-nominal explanatory variable
d) You wish to explore whether genetic modification using the BT toxin gene protects
plants against fungal attack. You randomly sample and count the number of genetically
modified plants that are infected (or not) and the number of non-genetically modified
plants infected (or not) in a large field.
Fungal attack: categorical-nominal response variable
Plant type: categorical-nominal explanatory variable
e) You explore factors that might cause diabetes in mice. You randomly assign 80 mice
to each of the following treatment combinations for 1 year, and then determine if they
have diabetes or not:
80 receive high sugar and high fat diet
80 receive high sugar and low fat diet
80 receive low sugar and high fat diet
80 receive low sugar and low fat diet
Diabetes: categorical-nominal response variable
Sugar consumption: categorical-nominal explanatory variable
Fat consumption: categorical-nominal explanatory variable
2. Estimate the mean, median, 1st and 3rd quartiles and ~95% confidence limits of the
mean for the following data sets. Identify all extreme values, if there are any.
(8 marks)
a) Data: 30, 20, 60, 100, 10, 40 RANKED: 10 20 30 40 60 100 n = 6
Mean =43.3
Median = 35.0 (the mean of the middle two numbers, (30+40)/2)
1st Quartile = 20 (the middle number of the lower half of the data)
3rd Quartile = 60 (the middle number of the upper half of the data)
s^2 = {Σx^2 − (Σx)^2/n}/(n − 1), giving s^2 = (16600 − 260^2/6)/5
so s^2 = 1066.67, s = 32.6599, and standard error of the mean = 13.3333
Approximate 95% confidence interval =
Upper limit given by 43.3333 +2 x 13.3333 = 70.0
Lower limit given by 43.3333 - 2 x 13.3333 = 16.7
To identify extreme values we need the IQR = 60 - 20 = 40.
So values greater than 60 + 1.5 x 40 = 120 would be extreme.
And values less than 20 - 1.5 x 40 = -40 would be extreme.
List extreme values: NO EXTREME VALUES
b) Data: 3, -1, 20, -4, 2, -3, -2 RANKED: -4 -3 -2 -1 2 3 20 n = 7
Mean = 2.14
Median = -1 it’s the middle number since n is odd
1st Quartile = -3 (middle of the lower half of data)
3rd Quartile = 3 (middle of upper half of data)
s^2 = {Σx^2 − (Σx)^2/n}/(n − 1), giving s^2 = (443 − 15^2/7)/6
so s^2 = 68.47619048, s = 8.275, and standard error of the mean = 3.12767
Approximate 95% confidence limits =
Upper limit given by 2.14 +2 x 3.12767= 8.4
Lower limit given by 2.14 -2 x 3.12767 =-4.1
To identify extreme values we need the IQR = 3 - (-3) = 6.
So values greater than 3 + 1.5 x 6 = 12 would be extreme.
And values less than -3 - 1.5 x 6 = -12 would be extreme.
List extreme values: 20
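The calculations for data sets a) and b) can be checked with a short Python sketch. It assumes the quartile rule used in these answers (the median of the lower and upper halves of the ranked data, leaving out the middle value when n is odd) and the approximate rule mean ± 2 x standard error. The function name `summarize` is made up for illustration.

```python
import math
import statistics


def summarize(data):
    """Descriptive statistics using the rules applied in the answers above."""
    ranked = sorted(data)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]          # lower half (middle value left out when n is odd)
    upper = ranked[n - half:]      # upper half
    mean = statistics.mean(ranked)
    q1 = statistics.median(lower)
    q3 = statistics.median(upper)
    s = statistics.stdev(ranked)   # sample standard deviation, divisor n - 1
    se = s / math.sqrt(n)          # standard error of the mean
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "mean": mean,
        "median": statistics.median(ranked),
        "q1": q1,
        "q3": q3,
        "se": se,
        "ci": (mean - 2 * se, mean + 2 * se),
        "extreme": [x for x in ranked if x < low_fence or x > high_fence],
    }


a = summarize([30, 20, 60, 100, 10, 40])
b = summarize([3, -1, 20, -4, 2, -3, -2])
print(a)
print(b)
```

Other software may use a different quartile convention, so its Q1 and Q3 can differ slightly from the values obtained by hand here.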
3. Answer the following based upon the boxplot shown below. (4 marks)
The arrow labeled A) points to Q3 or upper quartile
The arrow labeled B) points to MEAN
The arrow labeled C) points to an extreme point or outlier
What, if anything, can be inferred about the shape of the distribution of the height of
Aardvarks from the graph below:
The distribution is likely skewed to the left (or negatively skewed).
[Figure: boxplot of aardvark height with arrows labeled A, B and C; sample of aardvarks, n = 55]
4. A biologist obtains a mean of x̄ = 100 mm and a standard deviation of s = 30.0 mm
from a random sample of n = 50. They decide to convert mm to cm. What are the values of
the mean and variance after this transformation to cm? (2 marks)
To convert mm to cm, divide both the mean and the standard deviation by 10.
So x̄ = 100/10 = 10.0 cm
s = 30/10 = 3.0 cm, so variance s^2 = 3^2 = 9.0 cm^2
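The scaling rule can be illustrated with a few made-up measurements (the values in `lengths_mm` are hypothetical, not data from the test). Dividing every observation by 10 divides the mean by 10 and the variance by 10^2 = 100.

```python
import statistics

# Hypothetical lengths in mm, for illustration only
lengths_mm = [70.0, 85.0, 100.0, 115.0, 130.0]
lengths_cm = [x / 10 for x in lengths_mm]

mean_mm = statistics.mean(lengths_mm)
mean_cm = statistics.mean(lengths_cm)
var_mm = statistics.variance(lengths_mm)   # sample variance, divisor n - 1
var_cm = statistics.variance(lengths_cm)

print(mean_mm, mean_cm)   # the mean is divided by 10
print(var_mm, var_cm)     # the variance is divided by 100
```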
5. If you flip four coins, what is the probability that all four show “heads”? (1 mark)
For 1 toss Pr[H] = 0.5. Assuming independence, Pr[4 H] = 0.5^4 = 1/16 or 0.0625
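As a check, the 16 possible outcomes can be listed directly. This small Python sketch assumes fair coins and independent tosses, so every sequence of heads and tails is equally likely.

```python
from itertools import product

outcomes = list(product("HT", repeat=4))               # all equally likely sequences
all_heads = [o for o in outcomes if set(o) == {"H"}]   # only HHHH qualifies
prob = len(all_heads) / len(outcomes)
print(len(outcomes), prob)
```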
6. A couple intends to have 5 children. What is the probability that they will have 2 male
and 3 female children? (3 marks)
Use the binomial distribution, where n = 5, probability of a male p = 0.5, X = 2 males.
Pr[X] = n!/[X!(n-X)!] x p^X x (1-p)^(n-X)
Pr[X = 2 males] = 5!/(2!3!) x 0.5^2 x 0.5^3 = 0.3125
So the probability of 2 male and 3 female offspring is 0.3125
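The binomial calculation can be reproduced in Python with the standard library. The helper name `binomial_pmf` is chosen here for illustration; `math.comb(n, x)` gives the number of ways to choose x items from n, n!/[x!(n-x)!].

```python
from math import comb


def binomial_pmf(x, n, p):
    """Pr[X = x] for n independent trials, each with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


prob_two_males = binomial_pmf(2, 5, 0.5)
print(prob_two_males)
```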
7. Describe in words how you would generate the sampling distribution of the Median.
(2 marks)
Obtain a random sample of fixed size n from a population.
Estimate the median, which is the middle value of the ranked data.
Repeat this procedure numerous (ideally infinitely many) times. A histogram of these
medians is the sampling distribution of the Median.
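The procedure can be sketched as a simulation in Python. Everything specific here is an assumption made for illustration: the population (10,000 values drawn from a skewed distribution), the sample size n = 25, and the use of 5,000 repeats as a stand-in for "infinitely many".

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 values from a right-skewed distribution
population = [random.expovariate(1 / 50) for _ in range(10_000)]

n = 25               # fixed sample size
num_samples = 5000   # numerous repeats stand in for infinitely many

medians = []
for _ in range(num_samples):
    sample = random.sample(population, n)       # one random sample of size n
    medians.append(statistics.median(sample))   # its median

# A histogram of `medians` approximates the sampling distribution of the median
print(min(medians), statistics.mean(medians), max(medians))
```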
8. For each of the following, state the null (Ho) and alternative (Ha) hypotheses and
clearly STATE if Ha is one-sided or two-sided. (4 marks)
a) You wish to determine if running speed differs if people wear Nike versus Adidas
shoes. You measure running speed of 10 random people wearing Nike and 10 wearing
Adidas.
Ho: Nike wearers' speed equals Adidas wearers' speed
Ha: Nike wearers' speed does not equal Adidas wearers' speed
2-sided
b) You suspect that expression of a gene for chlorophyll synthesis is greater for plants
grown in shade versus full sun. You measure the expression of this gene in 20 plants
grown in shade and 20 grown in full sunlight (after randomly allocating them to these
treatments).
Ho: gene expression in shade = gene expression in sun
Ha: gene expression in shade > gene expression in sun
1-sided
9. Four-leaf clovers are normally very rare. You are told that the proportion of four-leaf
clovers in an unusual population of clover is 0.3, but you suspect the proportion is much
lower. To test this hypothesis you sample 20 plants, and find 2 are four-leaf, while the
other 18 are the usual three-leaf form.
Test this hypothesis.
(6 marks)
Ho: the proportion of 4-leaf clovers, p = 0.3
Ha: p < 0.3
Include calculations in here
The estimated proportion of 4-leaf clovers is p̂ = 2/20 = 0.1, which is lower than expected under Ho, so compute the
probability of a result as extreme or more extreme, that is, the probability of 2, 1, or 0 4-leaf clovers.
n = 20, p = 0.3; the binomial distribution equation is:
Pr[X] = n!/[X!(n-X)!] x p^X x (1-p)^(n-X)
Pr[X = 2] = 20!/[2!(18!)] x 0.3^2 x 0.7^18 = 0.027845873
Pr[X = 1] = 20!/[1!(19!)] x 0.3^1 x 0.7^19 = 0.006839337
Pr[X = 0] = 20!/[0!(20!)] x 0.3^0 x 0.7^20 = 0.000797923
Since the test is one-sided, sum these probabilities to give P-value = 0.035
Decision made about the null hypothesis and a short (1 sentence) statement of conclusions.
Since the P-value = 0.035 is less than α = 0.05, we reject the null hypothesis, Ho.
We conclude that the proportion of 4-leaf clovers is less than 0.3; the sample estimate is p̂ = 0.1.
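The one-sided P-value can be checked with a few lines of Python. This is a minimal sketch that sums the binomial probabilities of 0, 1 and 2 four-leaf clovers under Ho; the variable names are chosen here for illustration.

```python
from math import comb

n, p0, observed = 20, 0.3, 2

# Pr[X = x] under Ho for x = 0, 1, 2 (results as extreme or more extreme than observed)
tail = [comb(n, x) * p0**x * (1 - p0) ** (n - x) for x in range(observed + 1)]
p_value = sum(tail)

print(tail)
print(round(p_value, 3))
```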
10. Write a single complete SAS program to obtain descriptive statistics (e.g. mean,
variance, median, Q1, Q3, etc.) for each of two groups of rats for which you measure the
time it takes each rat to run a maze. One group of rats (group C) was the control group,
while the other (group B) was forced to listen to Hotline Bling in the maze. The time
taken to run the maze (in seconds) is listed below as is the group (B or C) to which each
rat belonged. Include the data in your program exactly as it is written below. In addition,
have the SAS program convert the letter C to the word CONTROL, and the letter B to the
word BLING (6 marks).
C 10
B 8
C 36
B 4
C 41
B 5
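DATA RATS;
  INPUT MAZE $ TIME;
  /* Convert the group letter to a full word */
  IF MAZE = 'C' THEN TREAT = 'CONTROL';
  IF MAZE = 'B' THEN TREAT = 'BLING';
  /* Alternatively, replace the two IF statements above with:
       IF MAZE = 'C' THEN TREAT = 'CONTROL';
       ELSE TREAT = 'BLING';  */
  /* DATALINES may also be written as CARDS */
  DATALINES;
C 10
B 8
C 36
B 4
C 41
B 5
;
/* Sort by group so that BY-group processing can be used */
PROC SORT;
  BY TREAT;
/* Descriptive statistics (mean, variance, median, Q1, Q3, etc.) for each group */
PROC UNIVARIATE;
  BY TREAT;
RUN;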