Biometry (BIOL4090) Quiz #3.
Student name: ______________KEY________________
This 30-minute quiz is worth 5 points. Show all your work to get partial (full) credit. You may use a
calculator, but not a smart phone. You may also leave calculations as ratios if necessary. I have extra
paper, if you need some. Write your name on every page and staple them together with this cover page.
1) Define the following terms, making sure you include the terms in parentheses (+0.25 each):
-
random sampling (probability): Note: provide both criteria (+0.25 each)
Random sampling occurs when every member of a “biological population” has an equal and
independent probability of being sampled, and thus of becoming part of the “statistical
population”.
-
median (percentile): The value separating the higher half and the lower half of a sample, a
population, or a probability distribution. The median is the 50% percentile – or the midpoint of
the distribution.
-
mean (sum): The mean of a series of numbers is equal to
the sum of the values, divided by the number of observations:
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
-
standard deviation (variance): The square root of the variance
-
coefficient of variation (ratio): The ratio of the standard deviation to the mean (SD / mean).
The coefficient of variation (CV) quantifies the variability in the data, standardized by the value
of the mean. The CV is often expressed as a percentage, by multiplying the SD / mean ratio by
100% (CV = 100 * (SD / mean)).
-
skewness (distribution): Skewness (or skew) quantifies the asymmetry of a distribution, that is,
whether the mass of the distribution is symmetrical about its center point. For symmetrical
distributions, skew (or skewness) = 0.
-
kurtosis (distribution): Measures the degree to which observations cluster in the tails
or the center of the distribution, compared to a normal distribution
of the same mean and S.D. Note: for a normal distribution, (excess) kurtosis = 0.
Positive kurtosis (leptokurtic): heavier tails (more extreme values) and a sharper peak
close to the mean than the normal distribution.
Negative kurtosis (platykurtic): lighter tails (fewer extreme values) and a flatter peak
close to the mean than the normal distribution.
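The skewness and kurtosis definitions above can be illustrated with a minimal sketch. It assumes the simple moment-based formulas (central moments divided by n, with 3 subtracted so that a normal distribution scores 0); the course may use sample-corrected versions that give slightly different numbers. The function names and the `right_skewed` data are hypothetical, chosen only for illustration.

```python
def central_moment(values, k):
    """k-th central moment, dividing by n (population form, an assumption)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n


def skewness(values):
    """Moment-based skewness: m3 / m2^(3/2). Zero for symmetrical data."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution gives 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


symmetric = [1, 2, 3, 4, 5]        # symmetrical about 3
right_skewed = [1, 1, 2, 2, 10]    # hypothetical data with a long right tail

print(skewness(symmetric))         # 0 for the symmetrical data
print(skewness(right_skewed))      # positive: long right tail
print(excess_kurtosis(symmetric))  # negative: flatter than a normal distribution
```

Both functions divide by the second central moment, so they are undefined for data with zero variance.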
2) Briefly explain when (under what circumstances) you would use the mean or the median to describe
the central tendency (location) of a dataset. Be as specific as you can (+0.25 for each).
The mean is used when the data follow a normal distribution, or when we have a reason to believe that
the population the data belong to is normal (e.g., we are measuring a continuous variable, like height
or weight, that should follow a normal distribution). The mean should be used when we are describing
datasets with large enough sample sizes (n > 25) and whenever there are no large outliers in the
distribution, because the value of the mean is heavily influenced by these extreme values.
The median is used when we are not sure that the data, or the population the data belong to, follow a
normal distribution. In particular, the median is ideal whenever we have large outliers (unusually large
or small values), because these extreme values do not influence the value of the median very strongly.
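As a small illustration of the point about outliers, the sketch below compares the mean and the median before and after one extreme value is introduced. The height values are hypothetical, and the outlier stands in for something like a data-entry error.

```python
import statistics

# Hypothetical heights in cm, with no outliers.
heights = [160, 165, 170, 175, 180]

# Same data, but the largest value is replaced by an extreme outlier.
heights_with_outlier = heights[:-1] + [1800]

mean_clean = statistics.mean(heights)
median_clean = statistics.median(heights)
mean_outlier = statistics.mean(heights_with_outlier)
median_outlier = statistics.median(heights_with_outlier)

print(mean_clean, median_clean)      # 170 170
print(mean_outlier, median_outlier)  # 494 170
```

The single outlier moves the mean from 170 to 494, while the median stays at 170.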
3) Briefly explain the reason why the statement “mean +/- S.D.” is correct, but the statement
“mean +/- variance” is not correct. Be as specific as you can (+0.25 for each).
The mean and the SD have the same units (of whatever variable you are characterizing), and they can
be added (mean +/- SD) to describe the spread of the distribution or used in a ratio (CV = SD / mean) to
characterize the variability in the data.
The variance has different units from the mean because it squares the values. Thus, if the mean
is measuring “units”, the variance is measuring “squared units”. To fix this problem, we
calculate the SD, by taking the square root of the variance. The SD measures “units”.
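One way to see the difference in units is to express the same measurements on two scales. The sketch below uses hypothetical lengths recorded in metres and in centimetres: the SD changes by the same factor as the data (100), while the variance changes by that factor squared (10,000).

```python
import math
import statistics

# Hypothetical lengths, first in metres and then converted to centimetres.
lengths_m = [1, 2, 3, 4, 5]
lengths_cm = [100 * x for x in lengths_m]

# The SD scales like the data (units); the variance scales like squared units.
sd_ratio = statistics.stdev(lengths_cm) / statistics.stdev(lengths_m)
var_ratio = statistics.variance(lengths_cm) / statistics.variance(lengths_m)

print(round(sd_ratio))   # 100
print(round(var_ratio))  # 10000
```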
4) Report the following ten parameters for these samples: median, mode, mean, S.D., variance, CV,
skewness, 50% percentile, 20% percentile, 80% percentile. Show all of your work to get full (partial)
credit (+0.1 each parameter). Use the back sheet of paper if you need extra space.
Dataset A: 1,2,3,4,5
Median: 3 (50% percentile of dataset: 2 values larger and 2 values smaller). 50% percentile: 3
20% percentile: 1, since 1 is the smallest value and 1s make up 20% of the dataset (1 of 5)
80% percentile: 4, since each of the five values (1, 2, 3, 4, 5) makes up 20% of the dataset, so 80% of the values are 4 or smaller.
Mode: There is no mode. All five values are equally frequent.
Their frequency is 1. Their relative frequency is 20% (1 / 5)
Mean: Sum of values (1+2+3+4+5) and divide by the number of samples (5) = 15 / 5 = 3
S.D.: The square root of the variance. Sqrt (2.5) = 1.58
CV = 100* (SD / mean) = 100 * (1.58 / 3) = 52.67% (0.5267 also works)
Variance: Sum of the squared deviations, divided by the degrees of freedom (4).
= [ (1-3) ^2 + (2-3) ^2 + (3-3) ^2 + (4-3) ^ 2 + (5-3) ^2 ] / 4
= ( 4 + 1 + 0 + 1 + 4) / 4 = 10 / 4 = 2.5
Skewness: The distribution is symmetrical, thus, skew = 0.
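The Dataset A answers can be checked with a short sketch using Python's standard `statistics` module. Two assumptions are worth flagging: the hypothetical `percentile` helper uses the nearest-rank rule (the smallest value whose cumulative frequency reaches the requested percentage), which matches the reasoning in the key, whereas many software packages interpolate and may report different percentiles; and the CV printed here uses the unrounded SD, so it comes out near 52.70% rather than the 52.67% obtained from the rounded SD of 1.58.

```python
import statistics

dataset_a = [1, 2, 3, 4, 5]


def percentile(values, p):
    """Nearest-rank percentile (an assumption): smallest value whose
    cumulative relative frequency is at least p percent."""
    ordered = sorted(values)
    n = len(ordered)
    for rank, value in enumerate(ordered, start=1):
        if rank * 100 >= p * n:
            return value
    return ordered[-1]


mean_a = statistics.mean(dataset_a)          # sum / n
median_a = statistics.median(dataset_a)
variance_a = statistics.variance(dataset_a)  # divides by n - 1 = 4
sd_a = statistics.stdev(dataset_a)           # square root of the variance
cv_a = 100 * sd_a / mean_a
modes_a = statistics.multimode(dataset_a)    # all five values tie: no single mode

print(mean_a, median_a, variance_a)          # 3 3 2.5
print(round(sd_a, 2), round(cv_a, 2))
print(percentile(dataset_a, 20), percentile(dataset_a, 50), percentile(dataset_a, 80))
print(modes_a)
```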
Dataset B: 3,3,3,3,3
Median: 3 (All values are the same)
50% percentile: median = 3 (All values are the same)
20% percentile: 3 (All values are the same)
80% percentile: 3 (All values are the same)
Mode: The mode is 3. This is the most frequent value (100% of the data are 3s).
Mean: Sum of values (3+3+3+3+3) and divide by the number of samples (5) = 15 / 5 = 3
S.D.: The square root of the variance. Sqrt (0) = 0
CV = 100* (SD / mean) = 100 * (0 / 3) = 0.0% (0 also works)
Variance: Sum of the squared deviations, divided by the degrees of freedom (4).
= [ (3-3)^2 + (3-3) ^ 2 + (3-3) ^2 + (3-3) ^ 2 + (3-3) ^2 ] / 4
= ( 0 + 0 + 0 + 0 + 0) / 4 = 0 / 4 = 0
Skewness: The distribution is symmetrical, thus, skew = 0
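A similar hedged check for Dataset B, again assuming Python's standard `statistics` module. Because every value is identical, the variance, SD, and CV are all zero and the single mode is 3. Note that a moment-based skewness formula would divide by a zero variance here, so the skew of 0 reported above rests on the symmetry argument rather than on that formula.

```python
import statistics

dataset_b = [3, 3, 3, 3, 3]

mean_b = statistics.mean(dataset_b)
median_b = statistics.median(dataset_b)
variance_b = statistics.variance(dataset_b)  # all deviations are 0
sd_b = statistics.stdev(dataset_b)
cv_b = 100 * sd_b / mean_b
modes_b = statistics.multimode(dataset_b)    # the single most frequent value

print(mean_b, median_b, variance_b, sd_b, cv_b, modes_b)
```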