Laboratory Statistics &
Graphics with EXCEL®
Tutorial book
Dietmar Stöckl
[email protected]
With thanks to
• Linda M Thienpont
[email protected]
• Kristian Linnet, MD, PhD
[email protected]
• Per Hyltoft Petersen, MSc
[email protected]
• Sverre Sandberg, MD, PhD
[email protected]
STT Consulting
Dietmar Stöckl, PhD
Abraham Hansstraat 11
B-9667 Horebeke, Belgium
e-mail: [email protected]
Statistics & graphics for the laboratory
Content
Content overview
How to use this book
How to use the EXCEL-files
Univariate Statistics
• Data, data presentation, and data description
• Gaussian (or Normal) distribution
• Tests for normality & calculations with logarithms
• Sampling statistics: Confidence intervals
• Estimation and hypothesis testing (F-test, Chi2-test, t-tests, outliers)
• Analysis of variance (ANOVA)
• Statistical power concept and sample size
Bivariate Statistics
• Graphical techniques
• Combined graphical/statistical techniques
• Correlation
• Regression
Annex
EXCEL-files
General
• DataGeneration
• StatFunctions&Tables
ANOVA
• Cochran&Bartlett
• ANOVA
Data, -presentation & -description
• Datasets
• Data&DataPresentation
• Graphs-EXCEL
• ProbPlots
Power & sample size
• Power
Gaussian distribution
• Exercises-BasicStats
• Gaussian Distribution
• NormalRankitPlot
Graphical techniques
• GraphBivariate-EXCEL
Sampling statistics
• SamplingStatistics
• CI-Calculator
Combined graphical/statistical techniques
• Bland&Altman
Estimation & hypothesis testing and confidence intervals
• CI&NHST-EXCEL
• CI&NHST
• CI&NHST-Exercise
• Grubbs: (http://www.graphpad.com/articles/outlier.htm)
Correlation & regression
• Correlation&Regression
• CorrRegr-EXCEL
Detailed content
Univariate statistics
Data
• Types of data & types of statistics
• Exemplary laboratory data
– "Repeated weighing"-experiment
– Adult serum triacylglycerides
Data presentation (univariate data)
• Importance of digits
• Table
• Graphics with EXCEL®
– Dot-plot
– Histogram
– Frequency polygon
– Dynamic histogram
– Box and whisker plot
• Time-indexed plots
Data description
• Descriptive statistics
– Location
Mean, median, mode
– Dispersion
Range, variance, standard deviation, coefficient of variation
• Equations
• Descriptive statistics with EXCEL®
• Importance of digits
Gaussian (or Normal) distribution
• "Bell-shaped" (similar to a histogram)
• Cumulated: "S-shaped"
• Cumulated & linearized
• 2-sided and 1-sided probabilities
• Inside/outside probabilities
• Probabilities at selected s (z) values
• Deviation from normality (skewness & kurtosis)
Tests for normality
Calculations with logarithms
Sampling statistics: Confidence intervals
• t-distribution (distribution of means)
– Confidence interval of a mean
– Confidence interval of the "1.96 s-limit"
• Chi2(2)-distribution (distribution of variances)
– Confidence interval of a standard deviation
• Interpretation of confidence limits
Estimation and hypothesis testing (F-test, Chi2-test, t-tests)
• Introduction
• t-tests
• Outlier tests (k • SD, Grubbs, Dixon's Q)
• F-test, 2 (=Chi2)-test
• Tests and confidence limits
Analysis of Variance (ANOVA)
• Introduction
• Model I ANOVA
Performance strategy
– Testing of outliers
– Testing of variances (Cochran "C", Bartlett)
• Model II ANOVA
– Applications
Power and sample size
Bivariate Statistics
Graphical techniques
• Scatter-plot
• Difference-plot
• Residual-plot
• Krouwer-plot
• Influences on the plots (data-range; subgroups; outliers; scaling)
• Influences of random- and systematic errors on the plots
• Linearity
• Specifications in plots
Bivariate statistics (ctd.)
Combined graphical/statistical techniques
• The Bland & Altman approach
Correlation
• The statistical model
• Correlation in method comparison
• Non-parametric correlation
Regression
• Ordinary linear regression (OLR)
• Deming regression
• Passing-Bablok regression (non-parametric)
• Weighted regression
• Regression & method comparison
• Regression & calibration
Annex
Statistics with EXCEL®
• EXCEL® installation requirements
• Tips for EXCEL®-graphics
Statistical resources
• Web resources
– Glossary of statistical terms
– Interesting educational resources
• Statistical software
– General
– "Laboratory statistics"
• Books
Statistical tables
Presenter's publications & courses related to the topic
Overview
Basics
Data
Quantitative
Categorical
[Importance] Digits
Statistics
Exploratory Data Analysis
Parametric
Non-parametric
Bayesian
Data presentation
(univariate)
Table
Dot-plot
Histogram/Frequency polygon
Frequency cumulated
Normal probability plot (Rankit)
Krouwer (mountain) plot
Box & whisker plot
Time-indexed plot
Data presentation
(bivariate)
2 x 2 Table
Scatter plot
Residual plot
Bias plot
Bland & Altman plot/approach
Data transformation
Logarithms
Other
Variance pooling
Variance propagation
Total error calculations
Descriptive
statistics
Location
Mean
Median
Mode
Dispersion
Minimum/Maximum
Range; quartile/quantile
Variance
Standard deviation (SD, or s)
Coefficient of variation (CV)
z-value
Gaussian
distribution
Graphics
>Data presentation (univariate)
Probabilities
2-sided
1-sided
Inside
Outside
At selected z-values
Deviations
Skewness
Kurtosis
Sampling
statistics
Confidence intervals
Central limit theorem
t-distribution
Conf. interval of a mean
Conf. interval of a centile (1.96)
Chi-square (2) distribution
Conf. interval of a SD
Significance testing
• Means (n > 2: ANOVA)
– 1-sample t-test (non-parametric: Wilcoxon signed rank)
– t-test (non-parametric: Mann-Whitney U)
– Paired t-test (non-parametric: Wilcoxon signed rank)
• SDs
– 1-sample F-test (Chi-square~)
– F-test
• Distribution
– Chi-square
– Non-parametric: Kolmogorov Smirnov, Anderson Darling
• Outlier
– Grubbs test
– Dixon’s Q-test
• Variances (n > 2)
– Cochran "C"
– Bartlett
• Power
– Sample size calculation
ANOVA
• One-way
– Model I: Significance testing (means n > 2); non-parametric: Kruskal-Wallis
– Model II: Variance estimation
• Model I versus Model II
Correlation (r)
• Pearson
• Non-parametric: Spearman, Kendall
Regression
• Ordinary linear regression
• Deming regression
• Non-parametric: Passing Bablok regression
• Weighted regression forms
General considerations to approach data
Frequent statistical questions
• Which kind of data?
• Which kind of distribution?
• Is there a difference?
• Was there a change?
• Is there an association?
• What is the probability?
Kind of data
Which kind of data
• Quantitative
–Measured
–Counted
• Categorical
–Ordinal
–Nominal
Appropriate statistic
Approach for quantitative, measured data
Data collection/
Kind of experiment
Which question
Plot data
• Retrospective
• Experience
• Statistically planned
• Sufficient digits
• Sample size calculations
• Description
• Difference
• Change
• Association
• Prediction
• Sample size
• Selection of plot
• Selection of test
• Selection of probability
• Outliers
• Distribution
(n > 20)
• Parametric direct or
–Remove outliers
–Transform data
• Non-parametric
–[Remove outliers]
Summary of significance tests
Problem
Parametric
Non-parametric
Graphic
Outlier
Grubbs
Dixon’s Q
Distribution
CHI2
Anderson-Darling
(recommended)
Kolmogorov-Smirnov
Normal probability plot
(=Rankit-plot)
Mean$ vs target
t-test, 1-sample
Wilcoxon signed rank
Confidence interval (CI)
2 Means$
t-test (equal & unequal variances): perform F-test before
Mann-Whitney U
CI
Paired means$
(Change)
Paired t-test
Wilcoxon signed rank
CI
SD:VAR vs target
CHI2
2 SDs/VAR
F-test
Siegel-Tukey
>2 means$
ANOVA
Kruskal-Wallis
>2 variances
Cochran’s C
Bartlett
Association
Pearson Correlation
Spearman or Kendall
Prediction
Regression
Passing-Bablok
regression
Dot-plot
CI
CI
Rankit-plot with CHI2function
$: or median; SD = standard deviation; VAR = variance; vs = versus
Summary of graphics
Univariate data
• Dot-plot
• Histogram/Frequency polygon
• Cumulated frequency plot
• Krouwer plot (folded cum. frequency)
• Normal probability (Rankit) plot
• Box & Whisker plot
• Run-sequence plot (Control charts)
• Lag-plot
Bivariate data
• Scatter plot
• Difference (bias) plots
• Residual plot
• [Contingency tables]
Selected analytical problems and associated statistics
Analytical problem: Associated statistics$
Method evaluation/validation (in-house)
• General: Basic statistics; outlier tests (e.g., Grubbs)
• Imprecision: F-test; CHI2-test (#), ANOVA
• Limit of detection: Probability & Power
• Linearity: Regression, ANOVA
• Calibration: Regression & correlation
• Sample trueness/bias (recovery): t-tests (#)
• Accuracy (uncertainty) of result: Variance propagation
• Method comparison: Regression & correlation
• Trouble-shooting: Power (sample size calculations)
Collaborative trials (n > 2 laboratories)
• Imprecision: Cochran C; Bartlett
• Bias: ANOVA, Model I
• Estimation of variance: ANOVA, Model II
Interpretation of analytical results
• Depends on the problem: Various (see above); tests for distribution; data transformation; Bayesian statistics
$Pure measurement variation, usually, is assumed Gaussian
#Alternative: confidence intervals
How to use this book & the EXCEL-files
How to use this book
This book is an introductory text to basic statistical and graphical techniques used
in the analytical laboratory.
It is accompanied by EXCEL-files that should facilitate self-education by
-demonstrating the statistical & graphical possibilities of EXCEL
-explaining statistical concepts with dynamic worksheets
-providing examples for creating user-specific templates
The use of the EXCEL-files is indicated by the following icons:
The general layout is shown in the figure below.
How to use the EXCEL-files
EXCEL-Settings
The files have been tested with EXCEL 2000 & EXCEL XP.
Activate the AddIns: Analysis ToolPak & Analysis ToolPak -VBA.
Macro security: Medium or low.
When opening the files choose "Enable Macros"
The nicest view is in the "Full Screen" mode
Notes: Make a back-up with a different name; do not save changes.
Features of the EXCEL-files
-Easy "click-through" navigation between the worksheets
-Information icon: gives information about the intention of the file
-Note icon: draws attention to particular EXCEL or other issues
-Exercise icon: gives instructions for interactive worksheets; additionally, the
worksheets give detailed information of how to perform certain exercises.
-Comment-cell: contains important information to specific topics
-Many files contain dynamic elements for user interaction
CAVE
Please close other applications.
During extensive use with Windows 98, it may be necessary to delete
Windows>Temp files every now and then (otherwise, EXCEL may shut
down).
The EXCEL-files will guide the user through the statistical functions of EXCEL that
are available through the fx-icon and the "Data Analysis" AddIn. A summary of the
statistical functions of EXCEL is given in the file EXCEL-StatFunctions.
The file StatTables-EXCEL contains statistical tables that are created with the
EXCEL-functions
• NORMSINV (z-table)
• FINV (F-table)
• TINV (t-table)
• CHIINV (Chi2-table)
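As an illustration outside EXCEL, the z-table entry can be reproduced with Python's standard library. This is a minimal sketch: the helper name normsinv is chosen here only to mirror the EXCEL function, and it covers the Normal distribution only (the F-, t-, and Chi2-tables would need an additional statistics library and are not shown).

```python
from statistics import NormalDist


def normsinv(p: float) -> float:
    """z-value below which a fraction p of the standard Normal distribution lies.

    Hypothetical helper, named after EXCEL's NORMSINV for readability.
    """
    return NormalDist(mu=0.0, sigma=1.0).inv_cdf(p)


# Print a small z-table for a few cumulative (1-sided) probabilities.
for p in (0.90, 0.95, 0.975, 0.99, 0.995):
    print(f"p = {p:.3f}   z = {normsinv(p):.3f}")
```

The cumulative probability 0.975 corresponds to the familiar z of about 1.96, which is the 2-sided 95% limit.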
The EXCEL-files are of
• tutorial nature (explaining the statistical concepts)
• practical nature (templates for use)
Legal notice
The EXCEL-files are for educational purposes. They should not be regarded as
commercial software. They have been prepared with the utmost care, but it
cannot be excluded that they contain errors. The author is not liable
for errors.
Data, data presentation & data description
Data
• Types of data & types of statistics
• Exemplary laboratory data
– "Repeated weighing"-experiment
– Adult serum triacylglycerides
Data presentation (univariate data)
• Importance of digits
• Table
• Graphics with EXCEL®
– Dot-plot
– Histogram
– Frequency polygon
– Dynamic histogram
– Box and whisker plot
• Time-indexed plots
Datasets; Data&DataPresentation
Types of data & types of statistics
Types of data
To correctly apply statistical techniques, we first have to understand the type of
data we are dealing with (see Table below).
QUANTITATIVE ("numerical")
• Measured ("continuous")$: blood pressure; height; weight
• Counted ("discrete"): number of children in a family; number of cases of AIDS in a city
CATEGORICAL
• Ordinal ("ranked"; ordered categories, usually based on a measure): grade of cancer; better, same, worse; disagree, neutral, agree
• Nominal (unordered categories): sex (male/female); alive/dead; blood group
$May be converted to nominal by "cutoffs" (normotension; hypertension)
Source: Statistics at Square One (10th ed). Swinscow, Campbell. BMJ Books, 2002
Types of statistics
When we know which type of data we are dealing with, we still have to know (or
make assumptions) about the probability distribution of the data to apply the
correct type of statistics (>Parametric-/>Non-parametric statistics; >Bayesian
statistics). Identification of the type of distribution can be done with graphical
techniques (>Exploratory Data Analysis) and formal statistical testing.
Parametric statistics
Parametric methods for statistical hypothesis testing assume that the distributions
of the variables being assessed have certain characteristics (usually, a "normal"
distribution is assumed).
The assumption of normality rests on the idea that a dispersion results from
many independent, additive factors.
Parametric techniques usually involve squared measures, e.g., the standard
deviation is computed from sums of squared deviations from the mean.
Squaring is justified by properties of the normal distribution that make it
the optimal (most efficient) estimation technique.
With real distributions, however, the squaring principle makes parametric
approaches sensitive to the presence of outliers.
Testing for outliers should always be considered in parametric testing.
Non-parametric statistics
Non-parametric (or distribution-free) methods for statistical hypothesis testing
make no assumptions about the frequency distributions of the variables being
assessed.
Bayesian statistics
Statistics that incorporate prior knowledge and accumulated experience into
probability calculations, using a subjective probability as the starting point
for assessing a subsequent probability.
Exploratory Data Analysis
Exploratory data analysis is a term used to describe a group of techniques (largely
graphical in nature) that shed light on the structure of the data.
Without this knowledge, the scientist, or anyone else, cannot be sure of
using the correct form of statistical evaluation.
Before applying statistics, data should be plotted.
Data types & typical statistics
Often, certain types of data are related to certain types of statistics. Some of the
most common cases are presented below.
Quantitative continuous data (parametric)
• Descriptive statistics and confidence interval of a mean.
• Confidence interval of a standard deviation.
• Grubbs' test to detect an outlier.
• t-test to compare two means.
• Analysis of variance (ANOVA).
Ranked data (non parametric)
• Mann-Whitney U
• Kruskal-Wallis one-way ANOVA
• Wilcoxon signed ranks
• Sign test
• Kolmogorov Smirnov
Categorical data
• Chi-square (compare observed and expected frequencies).
• Binomial and sign test (compare observed and expected proportions).
• Fisher's and chi-square (analyze a 2x2 contingency table).
• Predictive values from sensitivity, specificity, and prevalence.
Exemplary laboratory data
In the laboratory, we mainly deal with measured data. The distribution of these
data can be dominated by the laboratory manipulation itself (e.g., pipetting,
repeated measurement of the same sample) or by the analyte (e.g., biological
variation). In the first part of the course, 2 data-sets will be given that represent
these 2 cases (weighing; biological variation of serum triacylglycerides).
Data-set 1 (weighing): Gravimetric control of a pipetted volume
The experiment
• Pipette: 20-200 µl, variable
• Pipetted nominal volume: 100 µl
• 21 pipettings (n = 21)
• Balance: readability 0.01 mg
Data-set 2: Serum triacylglycerides in adult males
Other data-sets
Other data-sets can be found in the file "Datasets". They are used at other
places in the book.
Creation of data-sets
The file "DataGeneration" explains how to generate
• Random, univariate data
• Bivariate data with constant SD
• Bivariate data with constant CV
• log-Normal distributed data
It should be used after the book has been worked through.
Importance of digits
It is important to know that the quality of our data may depend on the number of
reported digits.
Adapting digits with EXCEL®
1st option
• The decrease/increase decimals icon (.0/.00) decreases the decimals visually,
but still uses them for calculations.
2nd option
• Tools>Options>Calculation>Precision as displayed
• Then decrease decimals
The data are rounded and the decimals are lost!
Afterwards, deactivate the option again!
Data&DataPresentation (Worksheet "Digits")
Weighing | Weight (mg) | Weight (mg), rounded
1 | 99.92 | 100
2 | 100.23 | 100
3 | 99.50 | 100
… | … | …
10 | 100.39 | 100
11 | 100.23 | 100
12 | 99.25 | 99
13 | 100.28 | 100
14 | 100.25 | 100
… | … | …
19 | 99.83 | 100
20 | 100.05 | 100
21 | 100.22 | 100
[Figure: histogram of the weighing data; x-axis: Bin (99.00 to 101.00), y-axis: Frequency (0 to 25)]
We round the data of the weighing experiment:
Follow the instructions given in the EXCEL-sheet "Digits"
1. Tools>Options>Calculation>Precision as displayed
2. Select gray cells, reduce to 2 fewer digits
Afterwards, UNCHECK IT!
The rounded data don't reflect the spread of the original data anymore
• Report your data with sufficient digits, adapted to measurement precision!
Data presentation
When we have created data, it is important to present them in easily
comprehensible forms (>Tables; >Graphs). We use the weighing data for
exercise.
Table (weighing experiment)
• Try to describe the data (center, maximum, minimum, etc).
We note:
Tables are difficult to "read"!
Sorting may help, but keep the sample number & the result together!
Weighing | Weight (mg)
1 | 99.92
2 | 100.23
3 | 99.50
4 | 100.27
5 | 100.22
6 | 100.01
7 | 100.18
8 | 100.04
9 | 99.60
10 | 100.39
11 | 100.23
12 | 99.25
13 | 100.28
14 | 100.25
15 | 100.44
16 | 99.50
17 | 100.04
18 | 100.13
19 | 99.83
20 | 100.05
21 | 100.22
Sorted!
Weighing | Weight (mg)
12 | 99.25
3 | 99.50
16 | 99.50
9 | 99.60
19 | 99.83
1 | 99.92
6 | 100.01
8 | 100.04
17 | 100.04
20 | 100.05
18 | 100.13
7 | 100.18
5 | 100.22
21 | 100.22
2 | 100.23
11 | 100.23
14 | 100.25
4 | 100.27
13 | 100.28
10 | 100.39
15 | 100.44
Remark
"Single column data" (such as the weighing data), are also called univariate data.
Data&DataPresentation (Worksheet "Dataset")
Sorting by weight
1. Select gray cells
2. Data>Sort: Follow "Print Screen"
We have seen that tables "are difficult to read".
→ Try a picture (graph)
Graphs (Exploratory Data Analysis – "EDA")
Graphs are particularly useful for presenting data in summarized form and for
"shedding light" onto the structure of the data.
Most useful for univariate data are
>Dot-plots,
>Histograms,
>Box-and-whisker plots, and the
>"Normal probability" plots.
First, these types of plots will be described, followed by the EXCEL-exercises.
Plots for univariate data
Note: can also be "derived data" from bivariate data (e.g., differences)
Dot plot
The dot plot presents the distribution of a variable (Yi)
(usually in y-axis) in a category (usually x-axis).
Data point coordinates are [Category; Yi].
Equal values are usually visualized by an offset.
Use: Visual summary of data and data distribution.
The dot plot can show:
center (i.e., the location) of the data;
spread (i.e., the scale) of the data;
skewness of the data; presence of outliers;
and presence of multiple modes in the data.
Histogram (Frequency polygon#)
Histograms (the term was first used by Pearson, 1895)
present the frequency distribution of a variable
in columns drawn over class intervals (bin).
The heights of the columns are proportional
to the class frequencies.
Coordinates are [Bin center Xi; frequency Yi].
*Bin = Midpoint: 85; Range: 80 – 90; Results in range: 2.
[Figure: example histogram; x-axis: Value-Bin (55 to 135), y-axis: Frequency (0 to 8)]
Use: Visual summary of data and data distribution.
The histogram can show:
center (i.e., the location) of the data;
spread (i.e., the scale) of the data;
skewness of the data; presence of outliers; and presence of multiple modes in the
data.
#Frequency polygon: the midpoints of the top of the columns are connected by a
line (columns are not shown). Coordinates example: [55;1], [65;0], [75;0], [85;2],
[95;7], etc.
Box & Whisker plot
In box plots (this term was first used by Tukey, 1970),
the central tendency (e.g., median or mean),
and the range or variation statistics (e.g., quartiles)
are computed and presented as a "box".
The whiskers outside of the box represent
a selected range (e.g., 10% & 90%; here: the full range).
Outlier data points can also be plotted.
Coordinates are [Category; Particular Y].
Use: Visual summary of data distribution.
The box and whisker plot can show:
center (i.e., the location) of the data; spread (i.e., the scale) of the data; skewness
of the data; presence of outliers. It is particularly useful for detecting and
illustrating location and variation changes between different groups of data.
Relative cumulative probability plot
Values of a distribution are ordered and the
relative cumulative probability of all values
up to a certain value is plotted versus that value.
Coordinates are [Value i; relative cumulative
probability up to & including Value i].
Use: Visual test for Normal distribution:
comparison of data polygon line with
cumulated Gaussian line calculated with data SD.
Krouwer plot ("folded cumulated probability")
Cumulated probability plot with y-axis "folded"
at probability P = 0.5, or 50%
(P = 0.5 or 50% is the maximum y-value).
Up to 50%, coordinates are [Value i; cumulative
relative probability up to & including Value i].
Above 50%, coordinates are [Value i; 100% minus
cumulative relative probability
up to & including Value i].
[Figure: Krouwer plot; x-axis: Multiple of sigma (-5 to 5), y-axis: Cumulative frequency (0 to 0.5)]
Use: Visual test for Normal distribution:
comparison of data polygon line
with a "folded" cumulated Gaussian line
calculated with data SD.
Similar to a histogram: visual summary of data and data distribution. The Krouwer
plot can show: center (i.e., the location) of the data; spread (i.e., the scale) of the
data; skewness of the data; presence of outliers.
Special application: Method comparison (note: concentration information is lost).
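The folding rule of the Krouwer plot can be sketched in a few lines of Python. This is a minimal illustration, not taken from the EXCEL-files: the function name folded_cumulative and the plotting position i/n for the i-th ordered value are assumptions made here.

```python
def folded_cumulative(values):
    """Return (value, folded cumulative relative probability) pairs, ordered by value.

    Up to and including 50% the cumulative probability is kept;
    above 50% it is replaced by 1 minus the cumulative probability.
    """
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for i, v in enumerate(ordered, start=1):
        cum = i / n  # relative cumulative probability up to and including v (assumed position)
        folded = cum if cum <= 0.5 else 1.0 - cum
        points.append((v, folded))
    return points


# Small hypothetical example: four values give a peak of 0.5 in the middle.
print(folded_cumulative([3, 1, 2, 4]))
```

Plotting these pairs as a polygon gives the "mountain" shape; the peak sits near the median of the data.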
Normal probability plot
Cumulated probability plot with y-axis
normalized to the Gaussian
(or Normal) distribution.
Coordinates are [Value i; z-value at Value i].
Use: Visual test for Normal distribution:
data should fit a line.
Special application: Reference intervals.
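The coordinates of a Normal probability (Rankit) plot can be sketched as follows. The plotting position (i - 0.5)/n used to convert ranks into cumulative probabilities is an assumption made for this illustration (several conventions exist), and the function name rankit_points is hypothetical.

```python
from statistics import NormalDist


def rankit_points(values):
    """Return (value, z-value) pairs for a Normal probability plot.

    Each ordered value is paired with the z-value of the standard Normal
    distribution at its assumed cumulative probability (i - 0.5) / n.
    """
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, SD 1
    return [
        (v, std_normal.inv_cdf((i - 0.5) / n))
        for i, v in enumerate(ordered, start=1)
    ]


# Hypothetical example: the middle value of five gets z = 0.
for value, z in rankit_points([5, 1, 3, 2, 4]):
    print(f"{value}  {z:+.3f}")
```

If the data are Normally distributed, the points should scatter around a straight line.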
Graphics with EXCEL®
Dot-plots
Data&DataPresentation (Worksheet "Dot Plot 1")
Construct the figure below with EXCEL
Follow the instructions on the sheet. Adapting the layout requires general
knowledge about "Charts", which is not explained further here. Some guidance on
"Charts" is given in the Annex of the book.
Note
The Worksheet "Dot Plot 2" contains a more advanced version of the Dot-plot. Its
construction, however, is relatively complicated and requires some deeper
understanding of EXCEL.
Histogram with EXCEL®
Data&DataPresentation (Worksheet "Histogram")
Construct a histogram with the weighing data by
• Tools>Data Analysis>Histogram
• Follow the guidance in the "Print Screen"
The unmodified EXCEL® figure looks like the one below
Disadvantages
• Layout not attractive (but can be modified)
• "Strange" data classification ("Bins")
• The "More"-bin
• "Static" = does not adapt when data change
We can modify the histogram by use of the general EXCEL-commands.
(see Annex)
Difficulty with histograms
No general rule can be given for the definition of the bin-width.
Frequency polygon with EXCEL®
Data&DataPresentation (Worksheet "FrequPolygon")
[Figure: frequency polygon of the weighing data; x-axis: Bin (99.00 to 101.00), y-axis: Frequency (0 to 10)]
Construct the frequency polygon from the histogram
• Copy the histogram in the FrequPolygon sheet
• [Left] Click on the histogram
• Click the chart wizard
• Choose this figure type
• Go to series
• Finalize the figure
• [when necessary, delete series 1]
Dynamic histogram
The dynamic histogram is an elegant form of presenting your data. It adapts
automatically when data are added or changed.
It uses the "array formula" FREQUENCY in EXCEL®.
You have to
• Define "Bins"
• Select all cells of the "Frequency Range"
• Type: =FREQUENCY(Data-cell1:Data-celln;Bin-cell1:Bin-celln)
– Note: the argument separator is ; or , depending on the "List Separator"
(Control Panel>Regional Settings>Number)
• Press "SHIFT" & "CONTROL", hold them, and press "ENTER"
Data&DataPresentation (Worksheet "DynHistogram")
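For readers who want to check the bin counts outside EXCEL, the counting rule of FREQUENCY can be sketched in Python: a value is counted in the first bin whose upper limit is greater than or equal to it, and one extra class collects everything above the last bin (the "More"-bin). The function name excel_frequency and the bin limits (99.00 to 101.00 in steps of 0.25) are assumptions for this illustration; the data are the 21 weighings from the table.

```python
from bisect import bisect_left

# The 21 weighing results (mg) from the weighing experiment.
weights_mg = [
    99.92, 100.23, 99.50, 100.27, 100.22, 100.01, 100.18,
    100.04, 99.60, 100.39, 100.23, 99.25, 100.28, 100.25,
    100.44, 99.50, 100.04, 100.13, 99.83, 100.05, 100.22,
]


def excel_frequency(data, bins):
    """Count values per bin: previous limit < value <= bin limit.

    bins must be sorted in ascending order. The returned list has one
    more element than bins; the last element counts values above the
    highest bin limit.
    """
    counts = [0] * (len(bins) + 1)
    for x in data:
        counts[bisect_left(bins, x)] += 1
    return counts


# Assumed bin limits: 99.00, 99.25, ..., 101.00
bins = [99.0 + 0.25 * i for i in range(9)]
counts = excel_frequency(weights_mg, bins)

for limit, count in zip(bins, counts):
    print(f"<= {limit:6.2f}: {count}")
print(f"More     : {counts[-1]}")
```

Because the counts are recomputed from the data each time, the sketch behaves like the dynamic histogram: changing a value in weights_mg changes the counts.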
Box and whisker plot with EXCEL®
Data&DataPresentation (Worksheet "Box Plot")
Construct the Box plot according to the instructions in the Worksheet.
(from: http://www.mis.coventry.ac.uk/~nhunt/boxplot.htm)
The construction uses the EXCEL-functions MEDIAN, QUARTILE, MIN, and
MAX. The box plot can be constructed by putting them in the presented
order. The figure must be finalized with some special "Figure-commands" (see
Worksheet explanations & "Print-Screen" at the right).
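The five numbers behind the box and whisker plot can also be computed outside EXCEL. The following Python sketch is an illustration under assumptions: the function name five_number_summary is hypothetical, and the "inclusive" quantile method is used because it should correspond to the interpolation of EXCEL's QUARTILE function.

```python
from statistics import quantiles

# The 21 weighing results (mg) from the weighing experiment.
weights_mg = [
    99.92, 100.23, 99.50, 100.27, 100.22, 100.01, 100.18,
    100.04, 99.60, 100.39, 100.23, 99.25, 100.28, 100.25,
    100.44, 99.50, 100.04, 100.13, 99.83, 100.05, 100.22,
]


def five_number_summary(data):
    """Minimum, lower quartile, median, upper quartile, and maximum."""
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return {"min": min(data), "q1": q1, "median": q2, "q3": q3, "max": max(data)}


summary = five_number_summary(weights_mg)
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

The box spans q1 to q3 with the median inside; here the whiskers would extend to the minimum and maximum, i.e., the full range.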
Summary: graphs for univariate data
• The dot-plot is a robust graph for small and large data sets.
• The histogram is more suitable for larger data sets; however, the bin-width must
be chosen adequately.
• The box and whisker plot is particularly useful for larger data sets; however, it
already contains some calculated statistics: it is not a purely graphical method.
Graphs are important tools for the investigation of data distribution
(outliers, sort of distribution).
Time-indexed plots
Run-sequence plot
The run-sequence plot presents data (Yi) along one axis in the time sequence they
were obtained. Coordinates are [Time or event#; Yi].
Use: Presentation and investigation of time series (drift, shift, outlier).
Special application: quality control.
The figures below show 3 situations where randomness is violated (remember:
During sorting, keep the sample number & the result together)
Lag-plot (a lag is a fixed time displacement)
A lag plot checks whether a data set or time series is random or not. Random data
should not exhibit any identifiable structure in the lag plot. Non-random structure in
the lag plot indicates that the underlying data are not random. In the lag plot, Yi−n
(usually n = 1) is plotted on the x-axis and Yi is plotted on the y-axis.
[Figure: lag plot of a sinusoidal data sequence, revealing the underlying data structure; x-axis: Yi−1 (-2 to 2), y-axis: Yi (-2 to 2)]
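The coordinates of a lag plot are simple to generate. The sketch below is an illustration under assumptions: the function names lag_pairs and pearson_r are hypothetical, the sinusoidal sequence (period of 20 points) is made up for the example, and the correlation of the pairs is used only as a rough numerical summary of the structure that the plot shows visually.

```python
import math


def lag_pairs(y, lag=1):
    """Return the lag-plot coordinates [(Y(i-lag), Y(i)), ...]."""
    return [(y[i - lag], y[i]) for i in range(lag, len(y))]


def pearson_r(pairs):
    """Pearson correlation coefficient of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical non-random sequence: a sine wave sampled 100 times.
sinusoid = [math.sin(2 * math.pi * i / 20) for i in range(100)]
r_sin = pearson_r(lag_pairs(sinusoid, lag=1))
print(f"lag-1 correlation of the sinusoidal sequence: {r_sin:.2f}")
```

For random data the lag-1 pairs should show no structure and a correlation near zero; the sinusoidal sequence gives a strongly structured (elliptical) pattern.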
Exploratory data analysis
A wealth of information about Exploratory Data Analysis can be found in
NIST/SEMATECH e-Handbook of Statistical Methods,
http://www.itl.nist.gov/div898/handbook/
The most basic set of graphics for the investigation of a data set is the so-called
"4-plot".
"4-plot"
The "4-plot" consists of a
• run sequence plot;
• lag plot;
• histogram;
• normal probability plot (see later >Normal distribution).
Investigate data for
Location
Variation
Distribution
Outliers
Data description
Descriptive statistics
• Location
• Dispersion
Equations
Descriptive statistics with EXCEL®
Importance of digits
GaussianDistribution
Introduction
After we have plotted our data, we need to characterize them quantitatively. We
use, for that purpose, several different measures that are related to the location (or
central tendency) and the dispersion (or variability) of the data.
Note
We have to distinguish in the following between parameters and their statistical
estimates (or “statistics”).
Parameter
A parameter is a numerical quantity measuring some aspect of a population. For
example, the mean is a measure of central tendency.
Greek letters are used to designate parameters. Parameters are rarely known and
are usually estimated by statistics computed in samples. To the right of each
Greek symbol is the symbol for the associated statistic used to estimate it from a
sample.
Quantity | Parameter | Statistic
Mean | μ | M (or Xbar)
Standard deviation | σ | s
Proportion | π | p
Correlation | ρ | r
Descriptive statistics
Location
Measures for the location (or central tendency) of data are the mean (average), the
median, and the mode.
Mean
• Sum of all values divided by the number of data
Median
Uneven number of data
• Value in the center
Even number of data
• Mean of the 2 values in the center
Mode
• Value that is observed most frequently
For symmetric distributions, the mean, median,
and mode are found at the same value.
For skewed distributions, those 3
are found at different values.
Notes
In symmetric distributions, the mean is a good location measure. In skewed
distributions, the median is a better location measure than the mean. The mode is the
only location measure that can be used with nominal data.
Dispersion
Measures for the dispersion (or variability) of data are the range, quartiles/quantiles,
the variance, the standard deviation, the coefficient of variation, and the z-value.
Range
• Maximum minus minimum
Quartiles
• The lower and upper quartiles (or 0.25 and 0.75 quantiles) are the 25th and 75th
percentiles of the distribution. The 25th percentile of a variable is a value such
that 25% of the values of the variable fall below that value.
Variance
• Sum of the squared difference of the values from the mean, divided by the
number of data minus 1!
Standard deviation (SD, or s: both are used in the course)
• Square root of the variance (see also: from duplicates)
Coefficient of variation
• = relative SD in %; = 100 • [SD/mean] (%)
z-value (Normalized, or normal, standard deviate)
• z = y - µ/s, or z = xi - mean/s
Equations
From k duplicates
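The equations of this page did not survive in the transcript. As a hedged sketch, the widely used textbook formula for the standard deviation from k duplicate measurements is s = sqrt(Σd² / 2k), where d is the difference within each pair. The function name and the example pairs below are hypothetical, and it is an assumption that the original page showed this same formula.

```python
import math

def sd_from_duplicates(pairs):
    """Estimate the SD from k duplicate measurements: sqrt(sum(d^2) / (2k))."""
    k = len(pairs)
    sum_sq_diff = sum((first - second) ** 2 for first, second in pairs)
    return math.sqrt(sum_sq_diff / (2 * k))

# Hypothetical duplicate results, for illustration only
duplicates = [(10.0, 12.0), (11.0, 11.0), (9.0, 11.0)]
sd_dup = sd_from_duplicates(duplicates)
print(round(sd_dup, 4))
```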
Data, data presentation & data description
Descriptive statistics with EXCEL®
GaussianDistribution (Worksheet "DescrStats")
Tools>Data Analysis>Descriptive Statistics
• Follow the guidance in the "Print Screen"
Weight (mg)                Descriptive statistics   Single formula
Mean                       100,0276                 100,0276
Standard Error             0,070017                 0,070017
Median                     100,13                   100,13
Mode                       100,23                   100,23
Standard Deviation         0,320857                 0,320857
Sample Variance            0,102949                 0,102949
Kurtosis                   0,433566                 0,433566
Skewness                   -1,09862                 -1,09862
Range                      1,19                     1,19
Minimum                    99,25                    99,25
Maximum                    100,44                   100,44
Sum                        2100,58                  2100,58
Count                      21                       21
Confidence Level(95,0%)    0,146052                 0,146052
The descriptive statistics function in EXCEL calculates all the measures we have seen (and more: those will be explained later).
Alternatively, the measures can be calculated individually in EXCEL using the fx icon.
Importance of digits
Coming back to the rounded weighing data, we look at the mean and the SD of the original data set and the rounded data set:

Weighing   Weight (mg)   Weight (mg), rounded
1          99,92         100
2          100,23        100
3          99,50         100
…          …             …
19         99,83         100
20         100,05        100
21         100,22        100
Mean       100,03        99,95
SD         0,321         0,218

We observe that the rounded data give different mean and SD values!
Too few digits give erroneous statistical estimates!
Data, data presentation & data description
More data
Compare the distributions when you acquire more data:
Which distribution do you recognize?
…………………………………………………………………
Gaussian distribution
Gaussian (or Normal) distribution
• "Bell-shaped" (similar to a histogram)
• Cumulated: "S-shaped"
• Cumulated & linearized
• 2-sided and 1-sided probabilities
• Inside/outside probabilities
• Probabilities at selected s (z) values
• Deviation from normality (skewness & kurtosis)
Datasets; GaussianDistribution; NormalRankitPlot
Gaussian distribution
The Gaussian (or "Normal") distribution
The Gaussian distribution is of utmost importance in analytical chemistry.
The statistics involved with that distribution are called "parametric" statistics.
If we repeated the weighing an infinite number of times, we would expect a distribution as represented by the line.
The normal distribution is defined by its mean and standard deviation.
The standard normal distribution has a mean of 0 and a standard deviation of 1.
IMPORTANT NOTE:
For the "infinite" distributions, specific symbols (= "Parameters" ) are used:
• Mean = µ
• Standard deviation = σ
• Normalized (or normal) standard deviate = z
GaussianDistribution (Worksheet "Random")
EXCEL has a function that can simulate Gaussian distributed data. The function
can be accessed with TOOLS>Data Analysis>Random number generation.
The worksheet "Random" explains
-how to generate random numbers
-presents the result in a dynamic histogram
-allows the comparison between "requested" mean and SD with the simulated
"sample mean and SD". Please note that those may be different, in particular,
when the sample size is low.
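The comparison between "requested" and simulated mean and SD can also be reproduced outside EXCEL. The following Python sketch is an illustration only: the function name `simulate_gaussian`, the seed, and the requested values (mean 100, SD 0.32) are assumptions chosen to resemble the weighing example.

```python
import random
import statistics

def simulate_gaussian(mean, sd, n, seed=1):
    """Draw n Gaussian values and return the sample mean and sample SD."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sample = [rng.gauss(mean, sd) for _ in range(n)]
    return statistics.mean(sample), statistics.stdev(sample)

# Small samples can deviate noticeably from the requested values;
# large samples come close to them.
for n in (5, 21, 100000):
    sample_mean, sample_sd = simulate_gaussian(100.0, 0.32, n)
    print(n, round(sample_mean, 3), round(sample_sd, 3))
```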
Gaussian distribution
Graphical presentation of the Gaussian distribution
The Gaussian distribution can be presented
• In the normal way: "Bell-shaped" (similar to a histogram)
• Cumulated: "S-shaped"
• Cumulated & linearized = Normal probability plot
EXCEL® template from P Hyltoft Petersen
(note: not available in EXCEL ® itself)
GaussianDistribution (Worksheets "GaussBell"; "GaussCumul")
These worksheets use the EXCEL NORMDIST function.
The "Print Screens" guide you through their application.
The graphs will appear automatically.
Note: The Normal Probability Plot will be demonstrated later.
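As a rough counterpart to the NORMDIST-based worksheets, the bell-shaped and the S-shaped (cumulated) curves can be tabulated with Python's standard library. The grid of z-values and the variable names are assumptions for illustration; `NormalDist.pdf` plays the role of NORMDIST with cumulative = FALSE and `NormalDist.cdf` that of cumulative = TRUE.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Tabulate both curves from z = -3 to z = +3 in steps of 0.5
z_grid = [step * 0.5 for step in range(-6, 7)]
bell_curve = [standard_normal.pdf(z) for z in z_grid]        # "bell-shaped"
cumulated_curve = [standard_normal.cdf(z) for z in z_grid]   # "S-shaped"

for z, density, cumulative in zip(z_grid, bell_curve, cumulated_curve):
    print(f"{z:5.1f}  {density:.4f}  {cumulative:.4f}")
```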
Gaussian distribution
Gaussian distributions – Probabilities
IMPORTANT NOTE
When data are Gaussian distributed, we can predict the frequencies (or
probabilities) of their occurrence within or outside certain distances (s, or z-values)
from the mean (see also Figures above).
These probabilities are used in parametric statistical calculations. They are listed
in tables, but they also can be calculated with EXCEL®. Of particular importance
are probabilities that are used in statistical tests (95%, 99% probabilities).
2-sided and 1-sided probabilities
Statistics distinguishes between 2-sided and 1-sided probabilities:
• 2-sided probabilities: question is A different from B?
• 1-sided probabilities: question(s) is A > B (A < B)?
Of practical importance are "inside" and "outside" probabilities:
• Outside probabilities, for example, are important in internal quality control.
Gaussian distribution
Gaussian distributions – Probabilities
Probabilities at selected s (z) values

            INSIDE                 OUTSIDE
z-value     1-sided    2-sided     1-sided    2-sided
1.28 s      90%
1.65 s      95%        [90 %]      5%         [10 %]
1.96 s      97.5%      95%         2.5%       5%
2.0 s       97.7%      95.5%       2.3%       4.5%
2.33 s      99%        98%         1.0%       2.0%
2.58 s      99.5%      99%         0.5%       1.0%
3.0 s       99.87%     99.7%       0.13%      0.3%
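The entries of the table above follow directly from the cumulative Gaussian distribution. A minimal Python sketch, assuming a standard normal distribution (the function name `gaussian_probabilities` is hypothetical):

```python
from statistics import NormalDist

def gaussian_probabilities(z):
    """Inside/outside probabilities, 1-sided and 2-sided, at a given z-value."""
    phi = NormalDist().cdf(z)  # cumulative probability up to +z
    return {
        "inside_1_sided": phi,             # below +z
        "inside_2_sided": 2 * phi - 1,     # between -z and +z
        "outside_1_sided": 1 - phi,        # above +z
        "outside_2_sided": 2 * (1 - phi),  # below -z or above +z
    }

for z in (1.28, 1.65, 1.96, 2.0, 2.33, 2.58, 3.0):
    p = gaussian_probabilities(z)
    print(z, {name: f"{100 * value:.2f}%" for name, value in p.items()})
```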
1-sided probabilities
1-sided probabilities can be expected in the presence of considerable systematic error.
[Figure: left panel, frequency distributions for SE/RE = 0 and SE/RE = 1 with the limits at -1.96 and 1.96; right panel, z-multiplier (1.6 to 2.0) versus SE/RE (0.00 to 1.00)]
Stöckl D, Thienpont LM. About the z-multiplier in total error calculations. Clin Chem Lab Med 2008;46:1648–9.
At SE/RE ≥ 0.75 the probabilities become practically 1-sided (see Figure).
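The transition from a 2-sided to a 1-sided multiplier can be illustrated numerically. The sketch below rests on an assumption that is not spelled out in this text: results are taken to be Gaussian with mean SE and standard deviation RE, and the z-multiplier is the value z for which 95% of the results fall within ±(SE + z • RE). The function names are hypothetical, and the cited paper may define the calculation differently.

```python
from statistics import NormalDist

_phi = NormalDist().cdf

def coverage(z, se_re):
    """Fraction of results inside ±(SE + z*RE) for results ~ N(SE, RE)."""
    upper = z                  # (SE + z*RE - SE) / RE
    lower = -(2 * se_re + z)   # (-(SE + z*RE) - SE) / RE
    return _phi(upper) - _phi(lower)

def z_multiplier(se_re, target=0.95):
    """Solve coverage(z) = target by bisection (coverage increases with z)."""
    low, high = 0.0, 5.0
    for _ in range(60):
        mid = (low + high) / 2
        if coverage(mid, se_re) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

for ratio in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(ratio, round(z_multiplier(ratio), 3))
```

Under this assumption the multiplier starts at about 1.96 for SE/RE = 0 and approaches the 1-sided value of about 1.65 as SE/RE grows.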
Gaussian distribution
Gaussian distributions – Probabilities
GaussianDistribution
Worksheets "Probability"
These worksheets demonstrate the modulation of the Gaussian distribution and
the observation of probabilities:
Outside 3s
This template demonstrates how the probabilities outside the original 3s limits
(original population SD = 1) change when the population mean and/or SD are
modulated. Modulation is achieved by simply clicking on the "Spinners".
Outside 1.96s
This template demonstrates how the probabilities outside the original 1.96s limits
(original population SD = 1) change when the population mean and/or SD are
modulated. Modulation is achieved by simply clicking on the "Spinners".
1-sided probabilities
This template demonstrates the concept of 1-sided probabilities.
The "Spinners" allow the movement of the z-value.
1-sided probabilities are displayed at the top of the figure.
Mean and SD are fixed in this example.
"Inside"-probabilities
This template shows the probabilities within certain distances (z-values) of the
population mean.
The z-value can be modulated with the "Spinners".
Mean and SD are fixed in this example.
"Outside"-probabilities
This template shows the probabilities outside certain distances (z-values) of the
population mean.
The z-value can be modulated with the "Spinners".
Mean and SD are fixed in this example.
Gaussian distribution
Deviation from normality
Skewness and Kurtosis
The Gaussian distribution is characterized by specific frequencies of values
around certain distances of the mean and it is symmetric to its mean.
Distributions observed in practice may deviate from the ideal Gaussian
distribution because of:
• Skewness (left skew; right skew)
– Too many data on one side (left or right)
• Kurtosis (too many, or too few data in the center)
– Platykurtic (too few data in the center)
– Leptokurtic (too many data in the center)
These situations are shown in the figures below, together with the respective
numbers calculated by EXCEL®.
Coefficient of skewness:
\[ C_{skew} = \frac{\sum (x_i - x_m)^3 / N}{SD^3} \]
Zero: symmetric distribution; Positive: skewed to the right; Negative: skewed to the left.
Coefficient of kurtosis:
\[ C_{kurt} = \frac{\sum (x_i - x_m)^4 / N}{SD^4} - 3 \]
Zero: Normal distribution; Positive: Peaked distribution; Negative: Flat distribution.
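The two coefficients can be computed as written above. In this sketch the SD is taken as the population SD (dividing by N), which is an assumption; EXCEL's SKEW and KURT functions apply additional sample-size corrections, so their results differ somewhat for small samples. The function name and the data are hypothetical.

```python
import math

def skewness_kurtosis(values):
    """Coefficients of skewness and (excess) kurtosis from central moments."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: population SD (divide by N), matching the /N in the formulas
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    c_skew = (sum((x - mean) ** 3 for x in values) / n) / sd ** 3
    c_kurt = (sum((x - mean) ** 4 for x in values) / n) / sd ** 4 - 3
    return c_skew, c_kurt

# Hypothetical data: a symmetric set and a right-skewed set
print(skewness_kurtosis([1, 2, 3, 4, 5]))
print(skewness_kurtosis([1, 1, 1, 1, 10]))
```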
Both together are used in significance tests for normality. Some of the most commonly used
tests are listed on the next page.
Testing normality
Testing normality
Statistical significance tests for deviation from normality
• Chi-square
• Kolmogorov-Smirnov
• Anderson-Darling
• D’Agostino-Pearson
The preferred tests are Anderson-Darling and D’Agostino-Pearson!
Unfortunately, EXCEL has no built-in test for normality.
Statistical tests for normality are usually not useful below sample sizes of
20 to 30.
Graphical test for deviation from normality
Normal Probability Plot (Courtesy of Per Hyltoft Petersen)
The Normal Probability Plot/Rankit Plot allows visual assessment of data
distribution. When data are NORMAL distributed, they should lie on a LINE.
Deviations from the line indicate other distributions (e.g., skewed ones).
NormalRankitPlot
• A maximum of 1000 values can be entered. Please SORT the data, if necessary.
• The template provides for the transformation of the data into logarithms (the ln-version is chosen). The 1st cell (E6) already contains the formula. After sorting the original data, the 1st cell should be copied down to the last entry.
• The graphics are automatically produced on the other sheets.
• The plots have 2 y-axes. The left y-axis is in units of z, it is linear in terms of z.
The right y-axis is in units of probability (%), it is non-linear in terms of probability!
The plot shows:
-The distribution of the data
-The Normal model of the data with its confidence intervals
-The -/+1.96 s limits of the data, corresponding to the 2.5th and 97.5th percentiles.
The cumulated percentage of the data can be read from the right y-scale.
Note: The % scale is represented by a picture and the tick-marks are created by
separate data series. If necessary, adapt the location of the tick-marks by
changing the value in the yellow cell (D3).
Assessment of normality
Compare the distribution with the model.
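The coordinates of a normal probability (rankit) plot can be sketched as follows. The plotting position (i - 0.5)/n used here is one common convention and an assumption; the NormalRankitPlot template may use a different one. The function name and the data are hypothetical.

```python
from statistics import NormalDist

def rankit_points(values):
    """Pair each sorted value with the z-score of its cumulative position."""
    data = sorted(values)
    n = len(data)
    standard_normal = NormalDist()
    # Assumed plotting position: (i - 0.5) / n for the i-th sorted value
    return [
        (x, standard_normal.inv_cdf((i - 0.5) / n))
        for i, x in enumerate(data, start=1)
    ]

# Hypothetical data; normally distributed data give points close to a line
points = rankit_points([100.2, 99.8, 100.0, 99.6, 100.4])
for value, z in points:
    print(value, round(z, 3))
```

Plotting the values against the z-scores, and ln(value) against the same z-scores, shows which scale brings the points closer to a straight line.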
Testing normality
Testing normality
NormalRankitPlot
Triacylglyceride example
Testing normality
Calculations with logarithms
Data transformation: Logarithms
When the data are not normally distributed, one can try a transformation. Because,
in nature, data are often log-normally distributed, logarithmic transformation of data
can make them normally distributed.
Test for normality: Triglycerides (See: Datasets.xls)
n = 282; Lowest value: 0.3 mmol/L; Highest value: 3.2 mmol/L; Median: 0.92
mmol/L.
CBstat
Anderson-Darling test: P < 0.01
→ data not normally distributed
Anderson-Darling test after logarithmic (natural) transformation: P = 0.13
→ data log-normally distributed
Normal Probability Plot (ln-transformed data)
Data are "on a line" → Data are ln-Normal distributed
Calculations with logarithms
Working with logarithms
Calculate the reference interval of a logarithmic distribution
Triglycerides
1. Transform the original data to ln
2. Calculate the mean of the ln (xi) values
3. Take the anti-ln of the mean of ln (xi)
This equals the geometric mean of the original population, which is close to its median.
→ The anti-ln of the mean of the logged values, e^(-0.0689), is equal to the geometric mean of the original distribution, where the latter is given by (x1 • x2 • … • xn)^(1/n).
→ The anti-ln of the SD is meaningless.
Number   mmol/l   ln
1        0.3      -1.204
2        0.32     -1.139
3        0.34     -1.079
4        0.38     -0.968
5        0.4      -0.916
6        0.4      -0.916
…        …        …
282      3.2      1.163

Median: 0.92 mmol/l
Mean, ln: -0.069
Anti-ln (e^x) of the mean: 0.933 (EXCEL: EXP(x))
Geometric mean: 0.933 (EXCEL: GEOMEAN)
Calculation of 2.5 and 97.5% percentile
Mean (ln transformed): -0.0689
SD (ln transformed): 0.395
2.5 percentile: -0.0689 - 1.96*0.395 = -0.843
97.5 percentile: -0.0689 + 1.96*0.395 = 0.7053
Anti-ln of 2.5 & 97.5 percentile: 0.43 – 2.02
→ Reference interval = 0.43 – 2.02 mmol/l
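The reference interval calculation for the triglyceride example can be retraced with the summary values given in the text (mean of ln values -0.0689, SD of ln values 0.395). The function name is hypothetical; only the two summary values come from the example.

```python
import math

def lognormal_reference_interval(mean_ln, sd_ln, z=1.96):
    """Back-transform mean ± z*SD of ln values to the original scale."""
    lower_ln = mean_ln - z * sd_ln
    upper_ln = mean_ln + z * sd_ln
    return math.exp(lower_ln), math.exp(upper_ln)

mean_ln, sd_ln = -0.0689, 0.395          # values from the triglyceride example
geometric_mean = math.exp(mean_ln)       # anti-ln of the mean of ln values
lower, upper = lognormal_reference_interval(mean_ln, sd_ln)
print(round(geometric_mean, 3), round(lower, 2), round(upper, 2))
```

Note that the limits are back-transformed after the calculation on the ln scale; the SD itself is never back-transformed, in line with the remark that the anti-ln of the SD is meaningless.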
Alternative
Alternatively, a non-parametric approach to the data could have been chosen.
Non-parametric reference intervals can be calculated with the CBstat-software.
Notes
CAVE log-transformation
Introduction of non-linearity by data transformation in method comparison and
commutability studies.
Stöckl D, Thienpont LM. Clin Chem Lab Med 2008;46:1784-5.
[Figure: four scatter plots of routine method versus reference method, shown in original units (AU, 0 to 350) and after ln transformation (lnAU, 0 to 6). Fitted equations shown in the panels: y = 1.0994x - 0.3849; y = 1.0113x + 0.0339; y = 0.9995x + 14.65; y = -0.0108x^3 + 0.21x^2 - 0.376x + 3.075]
Sampling statistics – Confidence intervals
Sampling statistics & Confidence intervals
t-distribution (sampling distribution of means)
• Confidence interval of a mean
• Confidence interval of the "1.96" s-point
Chi² (χ²)-distribution (sampling distribution of variances)
• Confidence interval of a standard deviation
Interpretation of confidence limits
SamplingStatistics; CI-Calculator
Introduction
We have characterized the Normal distribution on the basis of infinite sample size.
In practice, we are only able to take a finite sample size. The smaller our sample
size, the more uncertain our estimates will be.
→ All experimental estimates have an uncertainty.
→ The "true" value lies within a certain confidence interval around our estimate!
We investigate the sampling distribution of
• Means → t-distribution
• Variances → χ²-distribution
These distributions are the basis for the calculation of confidence intervals (CI's) of
experimentally determined means and variances (standard deviations).
Sampling statistics – Confidence intervals
t-distribution (sampling distribution of means)
The t-distribution forms the basis for the statistical treatment of means. As the
Normal distribution does for single results, the t-distribution allows us to predict (with a
certain probability) the location of a true population mean (µ) within a certain
distance (confidence interval, CI) of the experimental mean$. The probabilities
can be viewed 1-sided and 2-sided.
The formula for the calculation of the CI is:
\[ \mu = \bar{x} \pm t_{(\nu,\alpha)} \cdot \frac{s}{\sqrt{n}} \]
Note: The term s/√n is called the standard error of the mean (SEM).
$Note (infinite measurements or known σ):
If x is normally distributed with mean µ and standard deviation σ:
• 95% of x observations are within µ ± 1.96 σ
• 95% of xm values are within µ ± 1.96 σ/√n
When σ is known, one can use the z-value instead of the t-value.
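The CI formula can be checked against the weighing example (mean 100,0276; SD 0,320857; n = 21), for which EXCEL reports a 95% confidence level of 0,146052. In this sketch the t-value is supplied by hand from a standard t-table (2.086 for 20 degrees of freedom, 2-sided 95%), because Python's standard library has no t-distribution; the function name is hypothetical.

```python
import math

def ci_mean(mean, sd, n, t_value):
    """Confidence interval of a mean: mean ± t * s / sqrt(n)."""
    sem = sd / math.sqrt(n)          # standard error of the mean
    half_width = t_value * sem
    return mean - half_width, mean + half_width

# 2-sided 95% t-value for n - 1 = 20 degrees of freedom (from a t-table)
T_VALUE_DF20 = 2.086
lower, upper = ci_mean(100.0276, 0.320857, 21, T_VALUE_DF20)
half_width = (upper - lower) / 2
print(round(lower, 4), round(upper, 4), round(half_width, 4))
```

Using z = 1.96 instead of t = 2.086 would give a slightly narrower interval, which is only justified when σ is known.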
Characteristics of the t-distribution (see also figure below)
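• The shape of the t-distribution depends on n (through the degrees of freedom, ν = n - 1).
• The t-distribution equals the normal distribution for n = ∞.
• t-distributions have a lower central peak than the normal distribution and wider tails.
[Figure: t-distributions for ν = 1 and ν = 4 compared with the Gaussian distribution (ν = ∞)]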
Remark
The means of independent observations tend to be normally distributed
irrespective of the primary type of distribution.
→ Central limit theorem
Sampling statistics – Confidence intervals
Confidence interval/limits of the mean
Relationship confidence interval/confidence limit
The confidence interval (mean ± CI) spans from the lower to the higher confidence
limit (CL): lower CL < mean < higher CL
• CI = ± t • s/√n
• Lower CL = - t • s/√n
• Higher CL = + t • s/√n
The CI/CL of the mean depends
• on the probability level, α
• on the sort of tail (1-/2-tailed, also called 1-sided or 2-sided)
• on n (or the degrees of freedom, ν)
→ α, ν, and the "sort of tail" determine the magnitude of t
• on the standard deviation s (also denoted SD in the book)
The expression t/√n can be summarized by a factor k. Then, a CL can be
calculated as k • SD. A table of k-factors is given below, as well as a graphical
presentation.
Relationship between confidence limit and sample size:
k-factors for the 2-sided 95% confidence limit of a mean

n      k (× SD)
4      1,591
5      1,242
6      1,049
10     0,715
15     0,554
20     0,468
21     0,455
30     0,373
50     0,284
100    0,198

[Figure: 2-sided 95% CL (SD units, 0,0 to 1,6) versus n (from n = 4 to 100)]
Confidence limit of the 1.96 s point ("centile")
Like for the mean, CL's can be calculated for any other point of the Normal
distribution, e.g., for the 1.96 s point.
The standard error (SE) for the 1.96 s point is:
\[ SE(1.96) = \sqrt{\frac{s^2}{n} + \frac{1.96^2 \cdot s^2}{2n}} \]
The CL of the 1.96 s point is calculated as: CL(1.96 s) = 1.71 • CL(mean)
The CL of the 1.96 s point is important for
• Reference intervals
• The Bland & Altman interpretation of method comparison studies.
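The factor 1.71 follows from the standard error formula for the 1.96 s point: factoring out s/√n leaves sqrt(1 + 1.96²/2). A short Python check, with hypothetical function names and example values:

```python
import math

def se_centile(sd, n, z=1.96):
    """Standard error of the z*s point: sqrt(s^2/n + z^2 * s^2 / (2n))."""
    return math.sqrt(sd ** 2 / n + z ** 2 * sd ** 2 / (2 * n))

def centile_factor(z=1.96):
    """Ratio of the SE of the z*s point to the SE of the mean."""
    return math.sqrt(1 + z ** 2 / 2)

sd, n = 1.0, 100                      # hypothetical example values
sem = sd / math.sqrt(n)
print(round(centile_factor(), 3))     # ratio relative to the SE of the mean
print(round(se_centile(sd, n) / sem, 3))
```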
Sampling statistics – Confidence intervals
Chi² (χ²)-distribution (sampling distribution of variances)
The χ²-distribution allows us to predict the location of a true population standard
deviation (σ) within a certain distance (CI) of the experimental SD. The
probabilities can be viewed 1-sided and 2-sided. The distribution is used to
calculate CI's/CL's of SD's.
Characteristics of the Chi² (χ²)-distribution (see also figure below)
• The shape of the function(s) depends on n (ν)
• The function(s) are highly asymmetric
→ 95%-CIs of SDs become asymmetric
Confidence interval/limits of s (SD)
The CI/CL of s (SD) depends:
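• on the probability level, α (1-sided, or 2-sided, also called 1-/2-tailed)
• on n (or the degrees of freedom, ν)
Calculation (2-sided; α/2 = 0.025, [1-α/2] = 0.975)
\[ \text{Lower CL} = SD \cdot \sqrt{\frac{n-1}{\chi^2_{0.025}(n-1)}} \]
\[ \text{Upper CL} = SD \cdot \sqrt{\frac{n-1}{\chi^2_{0.975}(n-1)}} \]
Here χ²_p(n-1) denotes the χ² value with n-1 degrees of freedom that is exceeded with probability p.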
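The confidence limits of an SD can be retraced for the weighing example (SD 0,320857; n = 21). In this sketch the two χ² values for 20 degrees of freedom (34.170 and 9.591) are taken by hand from a standard χ² table, because Python's standard library has no χ² distribution; the function name is hypothetical.

```python
import math

def ci_sd(sd, n, chi2_exceeded_0025, chi2_exceeded_0975):
    """2-sided 95% confidence limits of an SD from tabulated chi-square values."""
    df = n - 1
    lower = sd * math.sqrt(df / chi2_exceeded_0025)  # larger chi2 -> lower limit
    upper = sd * math.sqrt(df / chi2_exceeded_0975)  # smaller chi2 -> upper limit
    return lower, upper

# Chi-square values for 20 degrees of freedom (from a chi-square table)
CHI2_DF20_P0025 = 34.170   # exceeded with probability 0.025
CHI2_DF20_P0975 = 9.591    # exceeded with probability 0.975

sd_weighing = 0.320857
lower, upper = ci_sd(sd_weighing, 21, CHI2_DF20_P0025, CHI2_DF20_P0975)
print(round(lower, 4), round(upper, 4))
print(round(lower / sd_weighing, 3), round(upper / sd_weighing, 3))
```

The two ratios printed last are the limits in SD units for n = 21; the interval is clearly asymmetric around the experimental SD.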
Relationship between confidence limit and sample size:
Factors for the 2-sided 95% confidence limit of s (SD)

       Limits (× SD)
n      Lower    Upper
4      0,566    3,729
5      0,599    2,874
6      0,624    2,453
10     0,688    1,826
15     0,732    1,577
20     0,760    1,461
21     0,765    1,444
30     0,796    1,344
50     0,835    1,246
100    0,878    1,162

[Figure: 2-sided 95% CL (SD units, 0,0 to 4,0), upper limit and lower limit, versus n (from n = 4 to 100)]
Sampling statistics – Confidence intervals
Interpretation of 95%-confidence limits
Confidence limits and quality specifications
The figure below shows a graphical interpretation of 95%-confidence limits versus
a predefined quality specification: "10".
Note
When comparing an estimate with a specification, usually, the confidence limits
are constructed 1-sided.
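[Figure: cases A to D, estimates with their 95% confidence limits shown relative to the specification "10", interpreted (1) as a limit and (2) as typical performance]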
1. Interpretation of the cases A – D when the specification is a limit
A: "In", the specification is satisfied with 95% probability.
B: Not "In" with 95% probability
• More data may help
C: Not "In" with 95% probability, but also not out with 95% probability.
D: "Out"
2. Interpretation when the number characterizes a stable process
If the "number" is the typical performance of a stable process, situation C can still
be accepted.
C: Look at lower limit: Not "Out" with 95% probability.
This situation is applied in the EP 5 protocol to investigate
whether the user CV is different from the typical manufacturer CV.
Exercises
SamplingStatistics
This tutorial contains interactive exercises that demonstrate:
Sampling
-The general effect of sample size on the distribution of mean & SD (single
repetitions).
Var, SD, Mean
-The expected distribution of variance, SD, and mean for different sample sizes
(high number of repetitions).
Central Limit
-The "Central Limit Theorem".
t-Distr
-The effect of the degrees of freedom on the t-distribution.
Chi-square
-The effect of the degrees of freedom on the Chi-square-distribution.
The worksheets
Conf-Interval
CI 1.96 centile
CI interpretation
contain similar information as this text.
CI-Calculator
This file allows the:
-Calculation of 1- and 2-sided 95% confidence intervals for mean and coefficient of
variation (CV).
-The comparison of experimentally observed mean and CV with a target value.