BIOSTAT - 2
• The final averages for the last 200
students who took this course are
90
79
100
78
55
90
93
81
76
98
78
55
67
64
60
82
52
86
79
74
80
63
53
51
63
85
61
75
69
77
79
63
57
63
72
87
56
54
65
65
76
82
57
52
90
82
50
50
97
60
79
89
53
69
50
68
86
73
97
52
Are you worried?
84
80
50
92
94
59
67
81
66
59
74
87
52
66
89
68
53
88
51
90
53
93
65
61
63
51
85
69
73
57
90
71
94
92
51
90
61
92
52
71
58
80
72
61
94
54
52
73
53
90
94
53
76
83
95
79
61
70
54
83
68
73
89
57
56
57
61
68
80
91
87
67
60
51
60
92
63
79
71
79
73
50
81
84
74
81
81
91
63
85
75
54
80
95
67
95
82
91
57
85
92
74
98
81
90
86
82
65
75
83
74
77
72
65
59
83
87
79
69
89
70
50
86
56
98
73
94
76
74
51
79
57
74
97
84
63
71
89
84
57
• Why not sort the grades from highest to lowest [ordered
array]?
100
98
98
98
97
97
97
95
95
95
94
94
94
94
94
93
93
92
92
92
92
92
91
91
91
90
90
90
90
90
90
90
90
89
89
89
89
89
88
87
87
87
87
86
86
86
86
85
85
85
85
84
84
84
84
83
83
83
83
82
82
82
82
82
81
81
81
81
81
81
80
80
80
80
80
79
79
79
79
79
79
79
79
79
78
78
77
77
76
76
76
76
75
75
75
74
74
74
74
74
74
74
73
73
73
73
73
73
72
72
72
71
71
71
71
70
70
69
69
69
69
68
68
68
68
67
67
67
67
66
66
65
65
65
65
65
64
63
63
63
63
63
63
63
63
61
61
61
61
61
61
60
60
60
60
59
59
59
58
57
57
57
57
57
57
57
57
56
56
56
55
55
54
54
54
54
53
53
53
53
53
53
52
52
52
52
52
52
51
51
51
51
51
51
50
50
50
50
50
50
• Is this a more meaningful way to present the data?
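A minimal Python sketch of building an ordered array. Python is not part of the original slides, and the list below holds only the first ten averages from the raw data, for brevity.

```python
# First ten final averages from the raw list (the full data set has 200).
grades = [90, 79, 100, 78, 55, 90, 93, 81, 76, 98]

# An ordered array is the same data sorted, here from highest to lowest.
ordered = sorted(grades, reverse=True)

print(ordered)  # [100, 98, 93, 90, 90, 81, 79, 78, 76, 55]
```

Sorting makes the highest and lowest grades easy to spot, but it does not summarize the data.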
• Why not group the data into grades of A,
B, C, D, and F [frequency distribution]?
• That means we need to count the number
of grades between 90 and 100, 80 and 89,
etc.
• Go to “Tools”, “Data Analysis”, “Histogram”, and
follow the directions. (You might first have to
go to “Tools”, “Add-Ins” and click on the 2
Data Analysis modules.)
• Input range: sweep all your data
• Bin range: sweep the cell boundaries you
input somewhere on your spreadsheet –
cell widths should normally be equal.
  50, 60, 70, 80, 90, 100
• Now click on Cumulative % and Chart
Output [this will plot your histogram]
• OK
• Output:
Bin     Frequency   Cumulative %
50           6          3.00%
60          43         24.50%
70          36         42.50%
80          45         65.00%
90          45         87.50%
100         25        100.00%
More         0        100.00%
[Chart: Histogram; series Frequency and Cumulative %; x-axis Bin; y-axis Frequency]
• Histogram does not look right?
• Fix histogram by eliminating gaps between cells.
[Chart: Histogram with no gaps between cells; series Frequency and Cumulative %; x-axis Bin; y-axis Frequency]
• Find “format data series” and “gap width”. How
you do this depends on the version of Excel you
have. Note the angle on the labels for the X-axis.
• Unfortunately grades of 50 were not
included in cells 50-59. That’s because
Excel puts each value into the first bin whose
boundary is greater than or equal to it:
Bin     Actual Cell    Frequency   Cumulative %
50      ≤ 50                6          3.0%
60      > 50 - 60          43         24.5%
70      > 60 - 70          36         42.5%
80      > 70 - 80          45         65.0%
90      > 80 - 90          45         87.5%
100     > 90 - 100         25        100.0%
More    > 100               0        100.0%
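The counting rule can be checked outside Excel. The sketch below is an illustration in Python, not part of the original slides. The `counts` tally is read off the ordered array of 200 grades, and `excel_style_histogram` is a hypothetical helper name. It places each value in the first bin whose boundary is greater than or equal to it, which is the behavior the output above implies.

```python
from bisect import bisect_left

# Tally of the 200 grades, read off the ordered array: {grade: count}.
counts = {
    100: 1, 98: 3, 97: 3, 95: 3, 94: 5, 93: 2, 92: 5, 91: 3, 90: 8,
    89: 5, 88: 1, 87: 4, 86: 4, 85: 4, 84: 4, 83: 4, 82: 5, 81: 6, 80: 5,
    79: 9, 78: 2, 77: 2, 76: 4, 75: 3, 74: 7, 73: 6, 72: 3, 71: 4, 70: 2,
    69: 4, 68: 4, 67: 4, 66: 2, 65: 5, 64: 1, 63: 8, 61: 6, 60: 4,
    59: 3, 58: 1, 57: 8, 56: 3, 55: 2, 54: 4, 53: 6, 52: 6, 51: 6, 50: 6,
}


def excel_style_histogram(counts, bins):
    """Count values into cells (previous bin, bin]; the last slot is 'More'."""
    freq = [0] * (len(bins) + 1)
    for value, n in counts.items():
        # bisect_left returns the index of the first boundary >= value.
        freq[bisect_left(bins, value)] += n
    return freq


bins = [50, 60, 70, 80, 90, 100]
freq = excel_style_histogram(counts, bins)
print(freq)  # [6, 43, 36, 45, 45, 25, 0]
```

The six grades of exactly 50 land in the bin labeled 50, and each grade of 60 lands in the bin labeled 60, which is why the cells do not match the intended 50-59, 60-69 grouping.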
• Following bins seem to work [same ordered array of 200 grades as above]:
Actual Grades   Bin     Actual Cell        Frequency   Cumulative %
0 - 49          49.9    ≤ 49.9                  0          0.0%
50 - 59         59.9    > 49.9 - 59.9          45         22.5%
60 - 69         69.9    > 59.9 - 69.9          38         41.5%
70 - 79         79.9    > 69.9 - 79.9          42         62.5%
80 - 89         89.9    > 79.9 - 89.9          42         83.5%
90 - 100        100     > 89.9 - 100           33        100.0%
                More    > 100                   0        100.0%
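A small boundary test makes the effect of the shifted bins visible. This Python sketch is an illustration rather than anything from the slides. `cell_label` is a hypothetical helper, and the test grades are made up to sit on each cell boundary. It assumes whole-number grades, since a grade such as 59.95 would still fall into the 60-69 cell.

```python
from bisect import bisect_left


def cell_label(grade, bins, labels):
    """Return the label of the cell (previous bin, bin] that holds grade."""
    return labels[bisect_left(bins, grade)]


bins = [49.9, 59.9, 69.9, 79.9, 89.9, 100]
labels = ["0-49", "50-59", "60-69", "70-79", "80-89", "90-100", "More"]

# Hypothetical test case: whole-number grades on each side of a boundary.
for grade in [49, 50, 59, 60, 89, 90, 100]:
    print(grade, "->", cell_label(grade, bins, labels))
```

With these boundaries a grade of 50 is reported in 50-59 and a grade of 60 in 60-69, as intended.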
• Final frequency table and histogram
Actual Grades   Frequency   Relative Frequency   Percent
50 - 59             45            0.225            22.5%
60 - 69             38            0.19             19.0%
70 - 79             42            0.21             21.0%
80 - 89             42            0.21             21.0%
90 - 100            33            0.165            16.5%
Total =            200            1               100.0%
[Chart: Histogram; series Frequency and Cumulative %; x-axis Bin (49.9, 59.9, 69.9, 79.9, 89.9, 100, More); y-axis Frequency]
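The relative frequency and percent columns follow directly from the frequencies. A short Python sketch (an illustration, not part of the slides), using the five cell counts from the final table:

```python
labels = ["50 - 59", "60 - 69", "70 - 79", "80 - 89", "90 - 100"]
frequency = [45, 38, 42, 42, 33]

total = sum(frequency)

# Relative frequency = cell frequency / total number of observations.
relative = [f / total for f in frequency]

# Cumulative relative frequency = running total / total.
cumulative = []
running = 0
for f in frequency:
    running += f
    cumulative.append(running / total)

for label, f, r, c in zip(labels, frequency, relative, cumulative):
    print(f"{label:>8}  {f:>3}  {r:.3f}  {r:6.1%}  {c:6.1%}")
print(f"{'Total':>8}  {total:>3}")
```

The relative frequencies sum to 1 and the last cumulative value is 100%, which is a quick check that no observation was dropped.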
• Other statistical software will do the same
thing, but you should always try out a
small test case of data just to make sure
that data is being placed into the proper
cells.
• Some key decisions:
– How many cells should you have? [We had 5 cells in
this example.] In general, you would have between 5
and 25 cells. The more data you have, the more cells
you would want to use.
– How do you determine the Bin Ranges? Most
statistical software will determine these bin ranges for
you, but they might not be “neat” numbers. In this
case, if you did not input specific bin ranges, you
would get
Bin     Frequency
50           6
62.5        49
75          53
87.5        53
More        39
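The default boundaries above are consistent with equal-width steps from the lowest grade (50) toward the highest (100). The Python sketch below reproduces them under that assumption. It is not a documented description of how Excel chooses bins. The interval count of 4 is simply read off the output, and `equal_width_bins` is a hypothetical helper.

```python
def equal_width_bins(lowest, highest, intervals):
    """Starting boundary of each of `intervals` equal-width steps."""
    width = (highest - lowest) / intervals
    return [lowest + k * width for k in range(intervals)]


# Assumption: 4 intervals between the minimum (50) and maximum (100).
bins = equal_width_bins(50, 100, 4)
print(bins)  # [50.0, 62.5, 75.0, 87.5]
```

Boundaries such as 62.5 and 87.5 are evenly spaced but not "neat", which is the reason for supplying your own bin range.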
• Problems
– Work problems 2.3.1 and 2.3.5
– Look at data for problems 2.3.6 and 2.3.9
• Numerical Techniques:
– Measures of Central Tendency [Location]
  • Arithmetic Mean
  • Median
  • Mode
– Measures of Dispersion [Variability]
  • Range
  • Variance
  • Standard Deviation
Measures of Central Location…
• The arithmetic mean, a.k.a. average,
shortened to mean, is the most popular &
useful measure of central location.
• It is computed by simply adding up all the
observations and dividing by the total
number of observations:
Mean = (Sum of the observations) / (Number of observations)
Arithmetic Mean…
Population Mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
Sample Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
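As a quick illustration in Python (not part of the slides), the mean is the sum divided by the count. The sample below is the first ten final averages from the raw grade list.

```python
def mean(observations):
    """Arithmetic mean: sum of the observations / number of observations."""
    return sum(observations) / len(observations)


# First ten final averages from the raw data, used as a small sample.
sample = [90, 79, 100, 78, 55, 90, 93, 81, 76, 98]
print(mean(sample))  # 84.0
```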
Measures of Central Location…
• The median is calculated by placing all
the observations in order; the observation
that falls in the middle is the median.
Data: {0, 7, 12, 5, 14, 8, 0, 9, 22} N=9 (odd)
Sort them bottom to top, find the middle:
0 0 5 7 8 9 12 14 22
median = 8
Data: {0, 7, 12, 5, 14, 8, 0, 9, 22, 33} N=10 (even)
Sort them bottom to top, the middle is the
simple average between 8 & 9:
0 0 5 7 8 9 12 14 22 33
median = (8+9)÷2 = 8.5
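The odd and even cases can be written as one small function. This is a Python sketch for illustration, using the two data sets above.

```python
def median(observations):
    """Middle value of the sorted data; average of the two middles if even."""
    ordered = sorted(observations)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd = [0, 7, 12, 5, 14, 8, 0, 9, 22]
even = odd + [33]
print(median(odd))   # 8
print(median(even))  # 8.5
```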
Measures of Central Location…
• The mode of a set of observations is the
value that occurs most frequently.
• A set of data may have one mode (or
modal class), or two, or more modes. If no
values occur more than one time each, it
is said that the data has no mode.
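A sketch of the mode in Python, for illustration only. `modes` is a hypothetical helper that returns every value tied for the highest count and an empty list when no value repeats. The second and third data sets are made up to show the two-mode and no-mode cases.

```python
from collections import Counter


def modes(observations):
    """Return all values tied for the highest count, or [] if none repeats."""
    tally = Counter(observations)
    if not tally:
        return []
    top = max(tally.values())
    if top == 1:
        return []  # no value occurs more than once: no mode
    return sorted(value for value, count in tally.items() if count == top)


print(modes([0, 7, 12, 5, 14, 8, 0, 9, 22]))  # [0]
print(modes([4, 4, 8, 8, 15]))                # [4, 8]
print(modes([4, 8, 15, 24, 39, 50]))          # []
```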
Measures of Variability…
• Measures of central location fail to tell the
whole story about the distribution; that is,
how much are the observations spread out
around the mean value?
For example, two sets of class
grades are shown. The mean
(=50) is the same in each case…
But, the red class has greater
variability than the blue class.
Range…
• The range is the simplest measure
of variability, calculated as:
• Range = Largest observation –
Smallest observation
• E.g.
  Data: {4, 4, 4, 4, 50}          Range = 50 - 4 = 46
  Data: {4, 8, 15, 24, 39, 50}    Range = 50 - 4 = 46
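In Python the same calculation is one line. This sketch (an illustration, with `value_range` as a hypothetical name) uses the two data sets above.

```python
def value_range(observations):
    """Range = largest observation - smallest observation."""
    return max(observations) - min(observations)


print(value_range([4, 4, 4, 4, 50]))        # 46
print(value_range([4, 8, 15, 24, 39, 50]))  # 46
```

Both data sets have the same range even though their values are spread very differently, so the range alone says little about the observations between the extremes.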
Variance…
• Variance and its related measure, standard
deviation, are arguably the most important
statistics. Used to measure variability, they also
play a vital role in almost all statistical inference
procedures.
• Population variance is denoted by $\sigma^2$
(lower case Greek letter “sigma” squared)
• Sample variance is denoted by $s^2$
(lower case “s” squared)
Statistical Symbols
            Population    Sample
Size        N             n
Mean        $\mu$         $\bar{x}$
Variance    $\sigma^2$    $s^2$
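Variance
• Population Variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$
• Sample Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$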
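Sample Mean & Variance…
Sample Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Sample Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Sample Variance (shortcut method): $s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$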
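The definition and the shortcut method should give the same sample variance. The Python sketch below is an illustration that assumes the usual n - 1 divisor for the sample variance. It checks both on the data set {4, 4, 4, 4, 50} from the range example.

```python
import math


def sample_variance(xs):
    """Definition: squared deviations from the sample mean, over n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def sample_variance_shortcut(xs):
    """Shortcut: (sum of squares - (sum)^2 / n), over n - 1."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


def population_variance(xs):
    """Squared deviations from the mean, over N."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n


data = [4, 4, 4, 4, 50]
print(round(sample_variance(data), 2))           # 423.2
print(round(sample_variance_shortcut(data), 2))  # 423.2
print(round(population_variance(data), 2))       # 338.56
```

The shortcut needs only the sum and the sum of squares, so it can be computed in a single pass over the data.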
Standard Deviation…
• The standard deviation is simply the
square root of the variance, thus:
• Population standard deviation: $\sigma = \sqrt{\sigma^2}$
• Sample standard deviation: $s = \sqrt{s^2}$
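Python's standard `statistics` module has both versions. The sketch below is an illustration that uses the data set {4, 8, 15, 24, 39, 50} from the range example and confirms that each standard deviation is the square root of the matching variance.

```python
import math
import statistics

data = [4, 8, 15, 24, 39, 50]

# Sample versions divide by n - 1; population versions divide by N.
s = statistics.stdev(data)
sigma = statistics.pstdev(data)

print(math.isclose(s, math.sqrt(statistics.variance(data))))       # True
print(math.isclose(sigma, math.sqrt(statistics.pvariance(data))))  # True
print(s > sigma)  # True: the n - 1 divisor gives the larger value
```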
Excel Computations from Previous Data
• Data: the same 200 grades as in the ordered array above, in cells A1:J20.
Excel Computations from Previous Data
• Formulas:
  Mean      =AVERAGE(A1:J20)
  Median    =MEDIAN(A1:J20)
  Mode      =MODE(A1:J20)   [Excel will show only one mode, even if the data has more than one]
  Variance  =VAR(A1:J20)
  Std. Dev. =STDEV(A1:J20)
• Results:
  Mean      = 73.11
  Median    = 74
  Mode      = 79
  Variance  = 200.62
  Std. Dev. = 14.16
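The same results can be cross-checked outside Excel. The Python sketch below is an illustration, not part of the slides. The `counts` tally is read off the ordered array of 200 grades, and the `statistics` functions used are the sample versions, which match the results above.

```python
import statistics

# Tally of the 200 grades, read off the ordered array: {grade: count}.
counts = {
    100: 1, 98: 3, 97: 3, 95: 3, 94: 5, 93: 2, 92: 5, 91: 3, 90: 8,
    89: 5, 88: 1, 87: 4, 86: 4, 85: 4, 84: 4, 83: 4, 82: 5, 81: 6, 80: 5,
    79: 9, 78: 2, 77: 2, 76: 4, 75: 3, 74: 7, 73: 6, 72: 3, 71: 4, 70: 2,
    69: 4, 68: 4, 67: 4, 66: 2, 65: 5, 64: 1, 63: 8, 61: 6, 60: 4,
    59: 3, 58: 1, 57: 8, 56: 3, 55: 2, 54: 4, 53: 6, 52: 6, 51: 6, 50: 6,
}

# Expand the tally back into the full list of observations.
grades = [grade for grade, n in counts.items() for _ in range(n)]

print(len(grades))                            # 200
print(round(statistics.mean(grades), 2))      # 73.11
print(statistics.median(grades))              # 74.0
print(statistics.mode(grades))                # 79
print(round(statistics.variance(grades), 2))  # 200.62
print(round(statistics.stdev(grades), 2))     # 14.16
```

Dividing by N instead of n - 1 would give a variance of about 199.62, so the reported 200.62 is a sample variance.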
• Work Problem 2.5.7