Biometry (BIOL4090) Quiz #3

Student name: ______________KEY________________

This 30-minute quiz is worth 5 points. Show all your work to get full (or partial) credit. You may use a calculator, but not a smartphone. You may also leave calculations as ratios if necessary. I have extra paper if you need some. Write your name on every page and staple the pages together with this cover page.

1) Define the following terms, making sure you include the terms in parentheses (+0.25 each):

- random sampling (probability): Note: provide both criteria (+0.25 each). Random sampling occurs when every member of a “biological population” has an equal and independent probability of being sampled, and thus of becoming part of the “statistical population”.

- median (percentile): The value separating the higher half from the lower half of a sample, a population, or a probability distribution. The median is the 50th percentile, or the midpoint of the distribution.

- mean (sum): The mean of a series of numbers is equal to the sum of the values, divided by the number of observations: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

- standard deviation (variance): The square root of the variance.

- coefficient of variation (ratio): The ratio of the standard deviation to the mean (SD / mean). The coefficient of variation (CV) quantifies the variability in the data, standardized by the value of the mean. The CV is often expressed as a percentage, by multiplying the SD / mean ratio by 100% (CV = 100 * (SD / mean)).

- skewness (distribution): Skewness (or skew) quantifies the asymmetry of a distribution, that is, whether the mass of the distribution is symmetrical about its center point. For symmetrical distributions, skewness = 0.

- kurtosis (distribution): Measures the degree to which observations cluster in the tails or the center of the distribution, compared to a normal distribution of the same mean and S.D. Note: for a normal distribution, (excess) kurtosis = 0.
Positive kurtosis: Heavier tails and a sharper peak than a normal distribution with the same mean and S.D. (more values far out in the tails and more values very close to the mean). Leptokurtic.
Negative kurtosis: Lighter tails and a flatter peak than a normal distribution with the same mean and S.D. (fewer values far out in the tails and fewer values very close to the mean). Platykurtic.

2) Briefly explain when (under what circumstances) you would use the mean or the median to describe the central tendency (location) of a dataset. Be as specific as you can (+0.25 for each).

The mean is used when the data follow a normal distribution, or when we have a reason to believe that the population the data belong to is normally distributed (e.g., we are measuring a continuous variable, like height or weight, that should follow a normal distribution). The mean should be used when we are describing datasets with large enough sample sizes (n > 25) and whenever there are no large outliers in the distribution, because the value of the mean is heavily influenced by these extreme values.

The median is used when we are not sure that the data, or the population the data belong to, follow a normal distribution. In particular, the median is ideal whenever we have large outliers (unusually large or small values), because these extreme values do not influence the value of the median very strongly.

3) Briefly explain the reason why the statement “mean +/- S.D.” is correct, but the statement “mean +/- variance” is not correct. Be as specific as you can (+0.25 for each).

The mean and the SD have the same units (of whatever variable you are characterizing), so they can be added (mean +/- SD) to describe the spread of the distribution, or used in a ratio (CV = SD / mean) to characterize the variability in the data.

The variance has different units from the mean because it squares the deviations. Thus, if the mean is measuring “units”, the variance is measuring “squared units”. To fix this problem, we calculate the SD by taking the square root of the variance. The SD measures “units”.
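The answers to questions 2 and 3 can be checked numerically. The sketch below is only an illustration: the height values and the 900 cm "data-entry error" are hypothetical and are not part of the quiz. It shows that a single outlier shifts the mean but not the median, and that converting centimeters to meters rescales the SD by 100 but the variance by 100 squared, which is why the variance cannot share units with the mean.

```python
import statistics

# Hypothetical heights in cm (illustrative values, not quiz data).
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]

# Same sample, but the largest value is replaced by an assumed data-entry error.
with_outlier = heights_cm[:-1] + [900.0]

# Question 2: the outlier pulls the mean upward but leaves the median alone.
mean_clean = statistics.mean(heights_cm)
median_clean = statistics.median(heights_cm)
mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)

# Question 3: change the units from cm to m and compare SD and variance.
heights_m = [h / 100 for h in heights_cm]
sd_cm = statistics.stdev(heights_cm)        # sample SD, in cm
sd_m = statistics.stdev(heights_m)          # sample SD, in m
var_cm = statistics.variance(heights_cm)    # sample variance, in cm^2
var_m = statistics.variance(heights_m)      # sample variance, in m^2

print("mean:", mean_clean, "->", mean_out)
print("median:", median_clean, "->", median_out)
print("SD ratio (cm / m):", sd_cm / sd_m)
print("variance ratio (cm^2 / m^2):", var_cm / var_m)
```

The SD ratio comes out at about 100 (the unit conversion factor), while the variance ratio comes out at about 10,000 (the conversion factor squared).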
4) Report the following ten parameters for these samples: median, mode, mean, S.D., variance, CV, skewness, 50th percentile, 20th percentile, 80th percentile. Show all of your work to get full (or partial) credit (+0.1 for each parameter). Use the back of the sheet of paper if you need extra space.

Dataset A: 1, 2, 3, 4, 5

- Median: 3 (50th percentile of the dataset: 2 values larger and 2 values smaller).
- 50th percentile: 3
- 20th percentile: 1, since 1 is the smallest value and 20% of the values in the dataset are 1s (1 of 5).
- 80th percentile: 4, since each of the values 1, 2, 3, 4, and 5 makes up 20% of the dataset, so 80% of the values are at or below 4.
- Mode: There is no mode. All five values are equally frequent. Their frequency is 1. Their relative frequency is 20% (1 / 5).
- Mean: Sum of the values (1+2+3+4+5), divided by the number of samples (5) = 15 / 5 = 3
- Variance: Sum of the squared deviations, divided by the degrees of freedom (4) = [ (1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2 ] / 4 = (4 + 1 + 0 + 1 + 4) / 4 = 10 / 4 = 2.5
- S.D.: The square root of the variance. Sqrt(2.5) = 1.58
- CV: 100 * (SD / mean) = 100 * (1.58 / 3) = 52.67% (0.5267 also works)
- Skewness: The distribution is symmetrical, thus skew = 0.

Dataset B: 3, 3, 3, 3, 3

- Median: 3 (all values are the same)
- 50th percentile: median = 3 (all values are the same)
- 20th percentile: 3 (all values are the same)
- 80th percentile: 3 (all values are the same)
- Mode: The mode is 3. This is the most frequent value (100% of the data are 3s).
- Mean: Sum of the values (3+3+3+3+3), divided by the number of samples (5) = 15 / 5 = 3
- Variance: Sum of the squared deviations, divided by the degrees of freedom (4) = [ (3-3)^2 + (3-3)^2 + (3-3)^2 + (3-3)^2 + (3-3)^2 ] / 4 = (0 + 0 + 0 + 0 + 0) / 4 = 0 / 4 = 0
- S.D.: The square root of the variance. Sqrt(0) = 0
- CV: 100 * (SD / mean) = 100 * (0 / 3) = 0.0% (0 also works)
- Skewness: The distribution is symmetrical, thus skew = 0.
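The hand calculations for question 4 can be cross-checked with a short script. This is a minimal sketch under stated assumptions, not the course's official method: the helper names (`nearest_rank_percentile`, `mode_or_none`, `skewness`, `summarize`) are hypothetical, the percentiles use the nearest-rank rule (the smallest value with at least p% of the data at or below it), which reproduces the key's answers, and the skewness uses the adjusted Fisher-Pearson sample formula. When the SD is 0, skewness is strictly undefined (0 / 0); the sketch returns 0 only to follow the key's convention.

```python
import statistics

def nearest_rank_percentile(data, p):
    """Nearest-rank percentile for an integer p (0 to 100)."""
    ordered = sorted(data)
    rank = max(1, (p * len(ordered) + 99) // 100)  # integer ceil(p * n / 100)
    return ordered[rank - 1]

def mode_or_none(data):
    """Return the most frequent value(s), or None if several distinct
    values are all equally frequent (the key's 'no mode' case)."""
    modes = statistics.multimode(data)
    distinct = set(data)
    if len(distinct) > 1 and len(modes) == len(distinct):
        return None
    return modes

def skewness(data):
    """Adjusted Fisher-Pearson sample skewness (an assumed convention)."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    if sd == 0:
        return 0.0  # strictly undefined; 0 follows the answer key
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in data)

def summarize(data):
    """Collect the statistics asked for in question 4."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample SD, n - 1 degrees of freedom
    return {
        "median": statistics.median(data),
        "mode": mode_or_none(data),
        "mean": mean,
        "sd": sd,
        "variance": statistics.variance(data),
        "cv_percent": 100 * sd / mean,
        "skewness": skewness(data),
        "p50": nearest_rank_percentile(data, 50),
        "p20": nearest_rank_percentile(data, 20),
        "p80": nearest_rank_percentile(data, 80),
    }

summary_a = summarize([1, 2, 3, 4, 5])
summary_b = summarize([3, 3, 3, 3, 3])
print(summary_a)
print(summary_b)
```

With the unrounded SD, the CV for Dataset A is about 52.70%; the key's 52.67% comes from rounding the SD to 1.58 before dividing.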