Dr.S.Nishan Silva
(MBBS)
My weight
day   weight (lbs)     day   weight (lbs)     day   weight (lbs)
1     140              31    143.9            61    144
2     140.1            32    144              62    144.2
3     139.8            33    142.5            63    144.5
4     140.6            34    142.9            64    144.2
5     140              35    142.8            65    143.9
6     139.8            36    143.9            66    144.2
7     139.6            37    144              67    144.5
8     140              38    144.8            68    144.3
9     140.8            39    143.9            69    144.2
10    139.7            40    144.5            70    144.9
11    140.2            41    143.9            71    144
12    141.7            42    144              72    143.8
13    141.9            43    144.2            73    144
14    141.4            44    143.8            74    143.8
15    142.3            45    143.5            75    144
16    142.3            46    143.8            76    144.5
17    141.9            47    143.2            77    143.7
18    142.1            48    143.5            78    143.9
19    142.5            49    143.6            79    144
20    142.3            50    143.4            80    144.2
21    142.1            51    143.9            81    144
22    142.5            52    143.6            82    144.4
23    143.5            53    144              83    143.8
24    143              54    143.8            84    144.1
25    143.2            55    143.6
26    143              56    143.8
27    143.4            57    144
28    143.5            58    144.2
29    142.7            59    144
30    143.7            60    143.9
Plot as a function of time data was acquired:
[Figure: weight (lbs) plotted against Day]
Comments:
background is white (less ink);
Font size is larger than Excel default (use 14 or 16)
Do not use curved lines to connect data points: that assumes you know more about the relationship of the data than you really do
Assume my weight is a single, random set of similar data
Make a frequency chart (histogram) of the data
[Figure: left, weight (lbs) against Day; right, histogram of # of Observations against Weight (lbs)]
Create a “model” of my weight and determine the average weight and how consistent my weight is
[Figure: histogram of # of Observations against Weight (lbs) with a fitted normal curve; the inflection point is marked]
average = 143.11 lbs
s = 1.4 lbs
s = standard deviation
= measure of the consistency, or similarity, of weights
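As a rough check of the average and s quoted above, the 84 daily weights can be passed to Python's standard `statistics` module. This is a minimal sketch: the variable names are illustrative, and using the sample (n - 1) form of the standard deviation is an assumption, since the slides do not say which form was used.

```python
import statistics

# Daily weights (lbs) for days 1-84, copied from the "My weight" table.
weights = [
    140, 140.1, 139.8, 140.6, 140, 139.8, 139.6, 140, 140.8, 139.7,
    140.2, 141.7, 141.9, 141.4, 142.3, 142.3, 141.9, 142.1, 142.5, 142.3,
    142.1, 142.5, 143.5, 143, 143.2, 143, 143.4, 143.5, 142.7, 143.7,
    143.9, 144, 142.5, 142.9, 142.8, 143.9, 144, 144.8, 143.9, 144.5,
    143.9, 144, 144.2, 143.8, 143.5, 143.8, 143.2, 143.5, 143.6, 143.4,
    143.9, 143.6, 144, 143.8, 143.6, 143.8, 144, 144.2, 144, 143.9,
    144, 144.2, 144.5, 144.2, 143.9, 144.2, 144.5, 144.3, 144.2, 144.9,
    144, 143.8, 144, 143.8, 144, 144.5, 143.7, 143.9, 144, 144.2,
    144, 144.4, 143.8, 144.1,
]

average = statistics.mean(weights)  # arithmetic mean of all observations
s = statistics.stdev(weights)       # sample standard deviation (assumed n - 1 form)

print(f"n = {len(weights)}")
print(f"average = {average:.2f} lbs")
print(f"s = {s:.1f} lbs")
```

The printed values should land close to the figures on the slide; small differences would only reflect rounding or the choice between the sample and population formulas.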
[Figure: normal curve, Amplitude (0 to 0.45) against s (-5 to 5), with the width at half height W1/2 marked]
Width is measured at the inflection point = s
Triangulated peak: base width is 2s < W < 4s
[Figure: normal curve, Amplitude (0 to 0.45) against s (-5 to 5)]
pp = peak to peak, or largest separation of measurements
pp ~ 6s
Area +/- 1s = 68.3%
Area +/- 2s = 95.4%
Area +/- 3s = 99.73%
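The areas quoted for the normal curve can be reproduced from the error function. A minimal sketch, assuming an ideal normal distribution; the helper name `area_within` is illustrative and not from the slides:

```python
import math


def area_within(k):
    """Fraction of a normal distribution lying within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"+/- {k}s: {100 * area_within(k):.2f}%")
```

Because almost all of the area lies within +/- 3s, the full spread of a reasonably large data set is roughly 6s, which is where the pp ~ 6s rule of thumb comes from.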
Peak to peak is sometimes easier to “see” on the data vs time plot
pp ~ 6s
[Figure: weight (lbs) against Day with the peak to peak range marked from 139.5 to 144.9, beside the histogram of # of Observations against Weight (lbs)]
s ~ pp/6 = (144.9 - 139.5)/6 ~ 0.9
(Calculated s = 1.4)
Inferential Statistics
Used to determine the likelihood that a
conclusion based on data from a sample is
true
Terms
p value: the probability that a difference at least as large as the one observed could have occurred by chance alone (that is, if there were no real difference)
Standardised Normal distribution
• Formula
Z = (X - µ) / σ
Z – SND
X – variable
µ – mean; σ – standard deviation
SND table of values
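The formula can be tried on the weight example. This is a minimal sketch that treats the average (143.11) and s (1.4) from the earlier slides as if they were µ and σ, which is an assumption made only for illustration; the function name `z_score` is also illustrative.

```python
def z_score(x, mean, sd):
    """Standard normal deviate: how many standard deviations x lies from the mean."""
    return (x - mean) / sd


# Heaviest recorded weight, using the slide's average and s as stand-ins for the mean and sd.
z = z_score(144.9, mean=143.11, sd=1.4)
print(f"Z = {z:.2f}")
```

The resulting Z is the value that would then be looked up in the SND table.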
Regression and Correlation
• Correlation
– To analyze the relationship between two
variables
• Regression
– Dependence of the variable y on the variable x
– In this course we consider only two variables
– In real life, multiple variable interactions are possible.
Example : X = Height, Y = Body weight
Basic Linear regression Equation
• Equation: Y` = a + bx
– b is the gradient, slope or regression coefficient
– a is the intercept of the line on the Y axis, or regression constant
– Y` is the predicted value of the outcome
– x is a value of the predictor (a real x value)
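One common way to obtain a and b is ordinary least squares. The sketch below assumes that method (the slides do not name one), and the height and weight figures are hypothetical values invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line Y` = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope (regression coefficient)
    a = mean_y - b * mean_x  # intercept (regression constant)
    return a, b


# Hypothetical data: X = height (cm), Y = body weight (kg).
heights = [150, 160, 170, 180, 190]
body_weights = [52, 58, 66, 73, 81]

a, b = fit_line(heights, body_weights)
predicted = a + b * 175  # Y` for a real x value of 175 cm
print(f"a = {a:.2f}, b = {b:.3f}, Y`(175) = {predicted:.1f}")
```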
Correlation Coefficient
• Page 100 lower down
Correlation coefficient (r) ranges from -1 to +1; its size ranges from 0 (no linear relationship) to 1 (perfect linear relationship)
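A minimal sketch of the Pearson product-moment coefficient, written out by hand so each term is visible. The function name `pearson_r` and the sample numbers are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: the second pair is a perfect inverse relationship.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```

A positive r means the two variables rise together, and a negative r means one falls as the other rises.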
Finding the significance of “r”
• Simple correlation significance
– http://www.biology.ed.ac.uk/archive/jdeacon/statistics/table6.html#Correlation coefficient
• Pearson product-moment coefficient
– http://www.experimentresources.com/pearson-product-momentcorrelation.html
• References
– Best: http://www.biology.ed.ac.uk/archive/jdeacon/statistics/tress11.html
– In detail: http://www.statsdirect.com/help/regression_and_correlation/rcr.htm
Inferential Statistics – Page 102
• Sample statistics – “Generalized” to the
entire population
• Formulate hypothesis
• ? Null Hypothesis
• Prove hypothesis
Types of Errors
                              Truth: No difference    Truth: Difference
Conclusion: No difference     Correct                 TYPE II ERROR (β)
Conclusion: Difference        TYPE I ERROR (α)        Correct
Power = 1 - β
(100% - the probability of a type II error)
confidence interval:
The range of values that we can be reasonably certain (usually 95% certain) includes the true value.
If the probability of obtaining the observed difference by chance alone is less than 5% (p < 0.05) we reject the null hypothesis
Example
The Use of the Null Hypothesis
• Is the difference in two sample populations
due to chance or a real statistical
difference?
• The null hypothesis assumes that there
will be no “difference” or no “change” or no
“effect” of the experimental treatment.
• If treatment A is no better than treatment B
then the null hypothesis is supported.
• If there is a significant difference between
A and B then the null hypothesis is
rejected...
Parametric tests
• T test Page 104
T Table
T-test
• T-test determines the probability that the
null hypothesis concerning the means of
two small samples is correct
• The probability that two samples are
representative of a single population
(supporting null hypothesis) OR two
different populations (rejecting null
hypothesis)
Use the t-test to determine whether sample populations A and B came from the same population or from different populations
t = (x1 - x2) / s(x1 - x2)
x1 (bar x) = mean of A; x2 (bar x) = mean of B
s(x1 - x2) = std error of the difference between the means; for independent samples s(x1 - x2) = sqrt(sx1² + sx2²)
sx1 = std error of A; sx2 = std error of B
Example:
Sample A mean = 8
Sample B mean = 12
Std error of difference of populations = 1
t = (12 - 8)/1 = 4 standard error units
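The calculation above can be sketched as follows. The function names are illustrative, the raw samples are hypothetical, and combining the two standard errors as a root sum of squares assumes independent samples with variances that are not pooled.

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the mean: sample standard deviation divided by sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def t_from_summary(mean_1, mean_2, se_difference):
    """t from two means and the standard error of their difference."""
    return (mean_1 - mean_2) / se_difference


def t_statistic(sample_1, sample_2):
    """t for two independent samples (unpooled standard errors, an assumption)."""
    se_difference = math.sqrt(standard_error(sample_1) ** 2 + standard_error(sample_2) ** 2)
    return t_from_summary(statistics.mean(sample_1), statistics.mean(sample_2), se_difference)


# The worked example from the slide: means 12 and 8, std error of difference 1.
print(t_from_summary(12, 8, 1))

# Hypothetical raw samples.
print(round(t_statistic([2, 4, 6], [1, 2, 3]), 3))
```

The resulting t is then compared with the T table at the appropriate degrees of freedom.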
Non Parametric test
• Chi Squared test – Page 108
– Test for Goodness of fit
– Test of independence
Chi square
• Used with discrete values
• Phenotypes, choice chambers, etc.
• Not used with continuous variables (like
height… use t-test for samples less than
30 and z-test for samples greater than 30)
• O = observed values
• E = expected values
• χ² = Σ (O - E)² / E
http://course1.winona.edu/sberg/Equation/chi-squ2.gif
Interpreting a chi square
• Calculate degrees of freedom
• # of events, trials, phenotypes - 1
• Example: 2 phenotypes - 1 = 1
• Generally use the column labeled 0.05 (values below it mean that any difference between what you expected and what you observed is within accepted random chance; a larger value would arise by chance less than 5% of the time).
• Any value calculated that is larger means you reject your null hypothesis and there is a difference between observed and expected values.
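A minimal sketch of the chi-square calculation and the decision rule just described. The observed and expected counts are hypothetical, the function names are illustrative, and only the first few rows of the 0.05 column are included.

```python
# Critical chi-square values from the 0.05 column, keyed by degrees of freedom.
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def reject_null(observed, expected):
    """True when the calculated value exceeds the 0.05 critical value."""
    degrees_of_freedom = len(observed) - 1
    return chi_square(observed, expected) > CRITICAL_0_05[degrees_of_freedom]


# Hypothetical example: 2 phenotypes, 100 offspring, 50:50 expected.
observed = [60, 40]
expected = [50, 50]
print(chi_square(observed, expected), reject_null(observed, expected))
```

With two phenotypes there is 1 degree of freedom, so the calculated value is compared with 3.841.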
How to use a chi square chart
http://faculty.southwest.tn.edu/jiwilliams/probab2.gif
T-test or Chi Square? Testing the validity of the null hypothesis
• Use the T-test (also called Student’s T-test) if using continuous variables from normally distributed sample populations (ex. Height)
• Use the Chi Square (χ²) if using discrete variables (if you are evaluating the differences between experimental data and expected or hypothetical data)… Example: genetics experiments, expected distribution of organisms.
Qualitative Analysis – Pages 113-114
• Phenomenology
– Data collected using interviews, tapes etc
– Analyzed as the researcher prefers
– Describes using descriptive statistics
• Ethnography
– Data collected using note taking, observation etc
– Categorised
– Relationships between patterns, identified
• Concurrent Analysis
– Qualitative data is transformed to numerical data
– Qualitative value may be lost
Using Excel
(Example)
Microsoft Excel
• A Spreadsheet Application. It features calculation, graphing tools, pivot tables and a macro programming language called VBA (Visual Basic for Applications).
• There are many versions of MS-Excel. Excel XP, Excel 2003 and Excel 2007 are capable of performing a number of statistical analyses.
• Starting MS Excel: Double click on the Microsoft Excel icon on the desktop or click on Start --> Programs --> Microsoft Excel.
• Worksheet: Consists of a grid of cells with numbered rows down the page and alphabetically-titled columns across the page. Each cell is referenced by its coordinates. For example, A3 refers to the cell in column A and row 3. B10:B20 refers to the range of cells in column B, rows 10 through 20.
Microsoft Excel
Opening a document: File --> Open (from an existing workbook). Change the directory area or drive to look for files in other locations.
Creating a new workbook: File --> New --> Blank Document
Saving a File: File --> Save
Selecting more than one cell: Click on a cell (e.g. A1), then hold the Shift key and click on another (e.g. D4) to select the cells between A1 and D4, or click on a cell and drag the mouse across the desired range.
Creating Formulas: 1. Click the cell in which you want to enter the formula, 2. Type = (an equal sign), 3. Click the Function Button (fx), 4. Select the formula you want and step through the on-screen instructions.
Microsoft Excel
• Entering Date and Time: Dates are displayed as MM/DD/YYYY, but there is no need to enter them in that format. For example, Excel will recognize jan 9 or jan-9 as 1/9/2007 (the current year is assumed) and jan 9, 1999 as 1/9/1999. To enter today’s date, press Ctrl and ; together. Use a or p to indicate am or pm. For example, 8:30 p is interpreted as 8:30 pm. To enter the current time, press Ctrl and : together.
• Copy and Paste all cells in a Sheet: Ctrl+A for selecting, Ctrl+C for copying and Ctrl+V for pasting.
• Sorting: Data --> Sort --> Sort By …
• Descriptive Statistics and other Statistical methods: Tools --> Data Analysis --> Statistical method. If Data Analysis is not available then click on Tools --> Add-Ins and select Analysis ToolPak and Analysis ToolPak - VBA
Histograms in Excel
1 Select Tools/Data Analysis
2 Choose Histogram
3 Input data range and bin range (bin range is a cell range containing the upper class boundaries for each class grouping), select Chart Output and click “OK”
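The role of the bin range can be illustrated outside Excel. This is a minimal sketch, assuming each bin counts values up to and including its upper boundary and that anything above the last boundary falls into a final "More" bin; the function name and the sample values are illustrative.

```python
from bisect import bisect_left


def histogram(data, upper_boundaries):
    """Count values per bin, where each bin is defined by its upper class boundary.

    The returned list has one extra slot at the end for values above the last boundary.
    """
    counts = [0] * (len(upper_boundaries) + 1)
    for value in data:
        # Index of the first boundary that is >= value, i.e. value <= boundary.
        counts[bisect_left(upper_boundaries, value)] += 1
    return counts


# A few weights (lbs) with upper class boundaries of 140, 142 and 144.
sample = [139.6, 140, 140.1, 141.7, 143, 144.9]
print(histogram(sample, [140, 142, 144]))
```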
Microsoft Excel
Statistical and Mathematical Functions: Start with an ‘=’ sign and then select the function from the function wizard (fx).
Inserting a Chart: Click on Chart Wizard (or Insert --> Chart), select the chart type, give the input data range, update the chart options, and select the output range/worksheet.
Importing Data in Excel: File --> Open --> File Type --> click on the file --> choose option (Delimited/Fixed Width) --> choose options (Tab/Semicolon/Comma/Space/Other) --> Finish.
Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.
Computing the Mean
• Sum of the xi divided by n (or N for population mean)
• Excel
– =AVERAGE(cellrange)
Computing the Mode
• Value that occurs most often in discretized
data
• Excel
– =MODE(cellrange)
– Reports first value seen if tie
Computing the Median
• The middle value in sorted data
• Excel
– =MEDIAN(cellrange)
Computing the Range
• Range is min to max values
• Excel
– =MIN(cellrange)
– =MAX(cellrange)
Computing the Standard Deviation
• Std. Dev. is Square-Root of Variance
• Excel
– =STDEV(cellrange) - sample
– =STDEVP(cellrange) - population
– =VAR(cellrange) - sample
– =VARP(cellrange) - population
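For readers working outside Excel, the same summary statistics are available in Python's standard `statistics` module. The sketch below pairs each Excel function named above with its closest Python counterpart; the data values are hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

mean = statistics.mean(data)            # =AVERAGE(cellrange)
mode = statistics.mode(data)            # =MODE(cellrange); first value seen if tie (Python 3.8+)
median = statistics.median(data)        # =MEDIAN(cellrange)
low, high = min(data), max(data)        # =MIN(cellrange), =MAX(cellrange)
stdev = statistics.stdev(data)          # =STDEV(cellrange), sample
pstdev = statistics.pstdev(data)        # =STDEVP(cellrange), population
variance = statistics.variance(data)    # =VAR(cellrange), sample
pvariance = statistics.pvariance(data)  # =VARP(cellrange), population

print(mean, mode, median, (low, high))
print(stdev, pstdev, variance, pvariance)
```

The sample forms divide by n - 1 and the population forms divide by N, which is why the two standard deviations differ slightly.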
Tables and Charts for Categorical Data: Univariate Data
Categorical Data
• Tabulating Data: Summary Table
• Graphing Data: Bar Charts, Pie Charts, Pareto Diagram
The Summary Table
Summarize data by category
Example: Current Investment Portfolio
(Variables are Categorical)

Investment Type   Amount (in thousands $)   Percentage (%)
Stocks            46.5                      42.27
Bonds             32.0                      29.09
CD                15.5                      14.09
Savings           16.0                      14.55
Total             110.0                     100.0
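The percentage column is each amount divided by the total. A minimal sketch using the figures from the table; the variable names are illustrative.

```python
# Amounts in thousands of dollars, from the summary table.
amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}

total = sum(amounts.values())
percentages = {name: round(100 * amount / total, 2) for name, amount in amounts.items()}

for name, amount in amounts.items():
    print(f"{name:<10}{amount:>8.1f}{percentages[name]:>8.2f}")
print(f"{'Total':<10}{total:>8.1f}{100.0:>8.1f}")
```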
Bar and Pie Charts
• Bar charts and Pie charts are often
used for qualitative (category) data
• Height of bar or size of pie slice
shows the frequency or percentage
for each category
Bar Chart Example
Current Investment Portfolio
[Figure: horizontal bar chart titled "Investor's Portfolio" with bars for Savings, CD, Bonds and Stocks; x axis: Amount in $1000's, 0 to 50]
Pie Chart Example
Current Investment Portfolio
[Figure: pie chart with slices Stocks 42%, Bonds 29%, Savings 15%, CD 14%]
Percentages are rounded to the nearest percent
Pareto Diagram Example
Current Investment Portfolio
[Figure: Pareto diagram with categories ordered Stocks, Bonds, Savings, CD; bar graph: % invested in each category (axis 0% to 45%); line graph: cumulative % invested (axis 0% to 100%)]
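The numbers behind a Pareto diagram are the categories sorted from largest to smallest, with a running cumulative percentage. A minimal sketch using the portfolio amounts from the summary table; the variable names are illustrative.

```python
# Amounts in thousands of dollars, from the summary table.
amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}

total = sum(amounts.values())
ranked = sorted(amounts.items(), key=lambda item: item[1], reverse=True)

pareto = []  # rows of (category, % invested, cumulative % invested)
running = 0.0
for name, amount in ranked:
    running += amount
    pareto.append((name, 100 * amount / total, 100 * running / total))

for name, percent, cumulative in pareto:
    print(f"{name:<10}{percent:>7.2f}%{cumulative:>9.2f}%")
```

The second column supplies the bars and the third column supplies the cumulative line.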
Tabulating and Graphing Multivariate Categorical Data (continued)
• Side by side bar charts
[Figure: side-by-side bar chart titled "Comparing Investors" with categories Savings, CD, Bonds and Stocks; series Investor A, Investor B and Investor C; axis 0 to 60]
Side-by-Side Chart Example
• Sales by quarter for three sales territories:

         1st Qtr   2nd Qtr   3rd Qtr   4th Qtr
East     20.4      27.4      59        20.4
West     30.6      38.6      34.6      31.6
North    45.9      46.9      45        43.9

[Figure: side-by-side bar chart of the table above; series East, West and North; x axis 1st Qtr to 4th Qtr; y axis 0 to 60]
Best source for you: BMJ Statistics notes
http://www.bmj.com/bmjseries/statistics-notes