BIOSTAT - 2

• The final averages for the last 200 students who took this course are:

90 79 100 78 55 90 93 81 76 98 78 55 67 64 60 82 52 86 79 74
80 63 53 51 63 85 61 75 69 77 79 63 57 63 72 87 56 54 65 65
76 82 57 52 90 82 50 50 97 60 79 89 53 69 50 68 86 73 97 52
84 80 50 92 94 59 67 81 66 59 74 87 52 66 89 68 53 88 51 90
53 93 65 61 63 51 85 69 73 57 90 71 94 92 51 90 61 92 52 71
58 80 72 61 94 54 52 73 53 90 94 53 76 83 95 79 61 70 54 83
68 73 89 57 56 57 61 68 80 91 87 67 60 51 60 92 63 79 71 79
73 50 81 84 74 81 81 91 63 85 75 54 80 95 67 95 82 91 57 85
92 74 98 81 90 86 82 65 75 83 74 77 72 65 59 83 87 79 69 89
70 50 86 56 98 73 94 76 74 51 79 57 74 97 84 63 71 89 84 57

• Are you worried?

• Why not sort the grades from highest to lowest? [ordered array]

100 98 98 98 97 97 97 95 95 95 94 94 94 94 94 93 93 92 92 92
92 92 91 91 91 90 90 90 90 90 90 90 90 89 89 89 89 89 88 87
87 87 87 86 86 86 86 85 85 85 85 84 84 84 84 83 83 83 83 82
82 82 82 82 81 81 81 81 81 81 80 80 80 80 80 79 79 79 79 79
79 79 79 79 78 78 77 77 76 76 76 76 75 75 75 74 74 74 74 74
74 74 73 73 73 73 73 73 72 72 72 71 71 71 71 70 70 69 69 69
69 68 68 68 68 67 67 67 67 66 66 65 65 65 65 65 64 63 63 63
63 63 63 63 63 61 61 61 61 61 61 60 60 60 60 59 59 59 58 57
57 57 57 57 57 57 57 56 56 56 55 55 54 54 54 54 53 53 53 53
53 53 52 52 52 52 52 52 51 51 51 51 51 51 50 50 50 50 50 50

• Is this a more meaningful way to present the data?

• Why not group the data into grades of A, B, C, D, and F? [frequency distribution]
• That means we need to count the number of grades between 90 and 100, 80 and 89, etc.
• In Excel, go to “Tools”, “Data Analysis”, “Histogram”, and follow the directions. (If “Data Analysis” is not listed, you might have to go to “Tools”, “Add-Ins”, and click on the two Data Analysis modules first.)

• Input range: sweep all your data.
• Bin range: sweep the cell boundaries you input somewhere on your spreadsheet. Cell widths should normally be equal.
  50 60 70 80 90 100
• Now click on Cumulative % and Chart Output [this will plot your histogram].
• OK

• Output:

Bin     Frequency   Cumulative %
50      6           3.00%
60      43          24.50%
70      36          42.50%
80      45          65.00%
90      45          87.50%
100     25          100.00%
More    0           100.00%

[Chart: Histogram. X-axis: Bin (50, 60, 70, 80, 90, 100, More). Series: Frequency and Cumulative %.]

• Histogram does not look right?

• Fix the histogram by eliminating the gaps between cells.

[Chart: Histogram with the gaps removed. X-axis: Bin. Series: Frequency and Cumulative %.]

• Find “format data series” and “gap width”. How you do this depends on the version of Excel you have. Note the angle of the labels on the X-axis.

• Unfortunately, grades of 50 were not included in the 50-59 cell. That’s because Excel treats each bin value as an inclusive upper limit, so it counts as follows:

Bin     Actual Cell    Frequency   Cumulative %
50      ≤ 50           6           3.0%
60      > 50 - 60      43          24.5%
70      > 60 - 70      36          42.5%
80      > 70 - 80      45          65.0%
90      > 80 - 90      45          87.5%
100     > 90 - 100     25          100.0%
More                   0           100.0%

• The following bins seem to work:

Actual Grades   Actual Cell       Bin     Frequency   Cumulative %
0 - 49          ≤ 49.9            49.9    0           0.0%
50 - 59         > 49.9 - 59.9     59.9    45          22.5%
60 - 69         > 59.9 - 69.9     69.9    38          41.5%
70 - 79         > 69.9 - 79.9     79.9    42          62.5%
80 - 89         > 79.9 - 89.9     89.9    42          83.5%
90 - 100        > 89.9 - 100      100     33          100.0%
                                  More    0           100.0%

• Final frequency table and histogram

Actual Grades   Frequency   Relative Frequency   Percent
50 - 59         45          0.225                22.5%
60 - 69         38          0.19                 19.0%
70 - 79         42          0.21                 21.0%
80 - 89         42          0.21                 21.0%
90 - 100        33          0.165                16.5%
Total           200         1                    100.0%

[Chart: Histogram. X-axis: Bin (49.9, 59.9, 69.9, 79.9, 89.9, 100, More). Series: Frequency and Cumulative %.]

• Other statistical software will do the same thing, but you should always try out a small test case of data just to make sure that the data is being placed into the proper cells.

• Some key decisions:
– How many cells should you have? [We had 5 cells in this example.] In general, you would have between 5 and 25 cells. The more data you have, the more cells you would want to use.
– How do you determine the bin ranges? Most statistical software will determine these bin ranges for you, but they might not be “neat” numbers. In this case, if you did not input specific bin ranges, you would get:

Bin     Frequency
50      6
62.5    49
75      53
87.5    53
More    39

• Problems
– Work problems 2.3.1 and 2.3.5
– Look at the data for problems 2.3.6 and 2.3.9

• Numerical Techniques:
– Measures of Central Tendency [Location]
  • Arithmetic Mean
  • Median
  • Mode
– Measures of Dispersion [Variability]
  • Range
  • Variance
  • Standard Deviation

Measures of Central Location…
• The arithmetic mean, a.k.a. average, shortened to mean, is the most popular and useful measure of central location.
• It is computed by simply adding up all the observations and dividing by the total number of observations:
  Mean = (Sum of the observations) / (Number of observations)

Arithmetic Mean…
• Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
• Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Measures of Central Location…
• The median is calculated by placing all the observations in order; the observation that falls in the middle is the median.
  Data: {0, 7, 12, 5, 14, 8, 0, 9, 22}, N = 9 (odd)
  Sort them bottom to top and find the middle:
  0 0 5 7 8 9 12 14 22
  median = 8

  Data: {0, 7, 12, 5, 14, 8, 0, 9, 22, 33}, N = 10 (even)
  Sort them bottom to top; the middle is the simple average of 8 and 9:
  0 0 5 7 8 9 12 14 22 33
  median = (8 + 9) ÷ 2 = 8.5

Measures of Central Location…
• The mode of a set of observations is the value that occurs most frequently.
• A set of data may have one mode (or modal class), or two or more modes. If no value occurs more than once, the data are said to have no mode.

Measures of Variability…
• Measures of central location fail to tell the whole story about the distribution; that is, how much are the observations spread out around the mean value?
• For example, two sets of class grades are shown. The mean (= 50) is the same in each case, but the red class has greater variability than the blue class.

Range…
• The range is the simplest measure of variability, calculated as:
  Range = Largest observation – Smallest observation
• E.g.
  Data: {4, 4, 4, 4, 50}, Range = 46
  Data: {4, 8, 15, 24, 39, 50}, Range = 46

Variance…
• Variance and its related measure, standard deviation, are arguably the most important statistics. Used to measure variability, they also play a vital role in almost all statistical inference procedures.
• Population variance is denoted by $\sigma^2$ (lower case Greek letter “sigma” squared).
• Sample variance is denoted by $s^2$ (lower case “s” squared).

Statistical Symbols

             Size   Mean        Variance
Population   N      $\mu$       $\sigma^2$
Sample       n      $\bar{x}$   $s^2$

Variance
• Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$
• Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

Sample Mean & Variance…
• Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
• Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
• Sample variance (shortcut method): $s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$

Standard Deviation…
• The standard deviation is simply the square root of the variance, thus:
• Population standard deviation: $\sigma = \sqrt{\sigma^2}$
• Sample standard deviation: $s = \sqrt{s^2}$

Excel Computations from Previous Data
• Data: the 200 grades from the ordered array above, entered in cells A1:J20.
• Formulas:
  Mean      = =AVERAGE(A1:J20)
  Median    = =MEDIAN(A1:J20)
  Mode      = =MODE(A1:J20)   [If the data have more than one mode, Excel will show only one of them.]
  Variance  = =VAR(A1:J20)
  Std. Dev. = =STDEV(A1:J20)
• Results:
  Mean      = 73.11
  Median    = 74
  Mode      = 79
  Variance  = 200.62
  Std. Dev. = 14.16

• Work Problem 2.5.7
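
The Excel results and the final frequency table can be cross-checked outside Excel, in the spirit of always trying a test case. The sketch below is one possible way to do that in Python with the standard `statistics` module; it is not part of the original course material. The names `GRADES_TEXT`, `summarize`, `shortcut_variance`, and `frequency_table` are made up for this illustration, and the cell boundaries assume whole-number grades between 50 and 100.

```python
import statistics

# The 200 grades, copied from the ordered array in these slides.
GRADES_TEXT = """
100 98 98 98 97 97 97 95 95 95 94 94 94 94 94 93 93 92 92 92
92 92 91 91 91 90 90 90 90 90 90 90 90 89 89 89 89 89 88 87
87 87 87 86 86 86 86 85 85 85 85 84 84 84 84 83 83 83 83 82
82 82 82 82 81 81 81 81 81 81 80 80 80 80 80 79 79 79 79 79
79 79 79 79 78 78 77 77 76 76 76 76 75 75 75 74 74 74 74 74
74 74 73 73 73 73 73 73 72 72 72 71 71 71 71 70 70 69 69 69
69 68 68 68 68 67 67 67 67 66 66 65 65 65 65 65 64 63 63 63
63 63 63 63 63 61 61 61 61 61 61 60 60 60 60 59 59 59 58 57
57 57 57 57 57 57 57 56 56 56 55 55 54 54 54 54 53 53 53 53
53 53 52 52 52 52 52 52 51 51 51 51 51 51 50 50 50 50 50 50
"""
grades = [int(g) for g in GRADES_TEXT.split()]


def summarize(data):
    """Return the five statistics computed on the Excel slide."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),          # reports a single value, like Excel's MODE
        "variance": statistics.variance(data),  # sample variance (n - 1), like Excel's VAR
        "std_dev": statistics.stdev(data),      # sample standard deviation, like Excel's STDEV
    }


def shortcut_variance(data):
    """Sample variance via the shortcut formula: [sum(x^2) - (sum x)^2 / n] / (n - 1)."""
    n = len(data)
    total = sum(data)
    total_sq = sum(x * x for x in data)
    return (total_sq - total * total / n) / (n - 1)


def frequency_table(data):
    """Count whole-number grades in the cells 50-59, 60-69, 70-79, 80-89, 90-100."""
    labels = ["50 - 59", "60 - 69", "70 - 79", "80 - 89", "90 - 100"]
    counts = dict.fromkeys(labels, 0)
    for g in data:
        if not 50 <= g <= 100:
            raise ValueError(f"grade {g} is outside the 50-100 range assumed here")
        index = min((g - 50) // 10, 4)  # a grade of 100 joins the 90 - 100 cell
        counts[labels[index]] += 1
    return counts


summary = summarize(grades)
for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
print(f"shortcut: {shortcut_variance(grades):.2f}")
for cell, count in frequency_table(grades).items():
    print(f"{cell:>8}: {count}")
```

Rounded to two decimals, the printed statistics should agree with the Excel results above, and the cell counts should match the final frequency table (45, 38, 42, 42, 33). Unlike the Excel bins, `frequency_table` places a grade of 50 in the 50 - 59 cell because it works directly from the whole-number grade ranges.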