Handbook for Health Care Research, Second Edition
Chapter 10
Basic Statistical Concepts
© 2010 Jones and Bartlett Publishers, LLC
Preliminary Concepts
The following terms are used frequently in this chapter:
• Population- Collection of data or objects that describes some
phenomenon of interest.
• Sample- Subset of a population that is accessible for measurement.
• Variable- Characteristic or entity that can take on different values.
• Qualitative variable- Categorical variable not placed on a meaningful
number scale.
• Quantitative variable- One that is measurable using a meaningful
scale of numbers.
• Discrete variable- Quantitative variable with gaps or interruptions
in the values it may assume.
• Continuous variable- Quantitative variable that can take on any
value, including fractional ones, limited only by
instrumentation or application.
Levels of Measurement
• Numbers (0, 1, 2, … ) have the following properties:
-Distinguishability: 0, 1, 2, and so on, are different numbers.
-Ranking (greater than or less than): 1 is less than 2.
-Equal intervals: Between 1 and 2, we assume the same
distance as between 3 and 4.
• Nominal- named categories without any particular
order to them
• Ordinal- discrete categories that have an
order to them (no indication of equal intervals)
• Continuous (Interval)- can assume any value, rather
than just whole numbers (equal, uniform
intervals are assumed)
• Continuous (Ratio)- mathematically the strongest level,
where numbers represent equal intervals
and the scale starts at a true zero
Significant Figures
• The number of digits used to express a measured
number is a rough indication of the error
• Zeros as Significant Figures:
- Final zeros to the right of the decimal point that are
used to indicate accuracy are significant
- For numbers less than one, zeros between the
decimal point and the first nonzero digit are not significant
• Calculations Using Significant Figures- the least
precise measurement used in a calculation
determines the number of significant figures in the
answer
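The rule for calculations can be sketched in code. This is a minimal illustration, not the handbook's procedure: the helper name round_sig and the example measurements are assumptions made for the example.

```python
import math

def round_sig(x: float, sig: int) -> float:
    """Round x to `sig` significant figures (hypothetical helper)."""
    if x == 0:
        return 0.0
    # Position of the leading digit: 0 for 1-9, 1 for 10-99, -1 for 0.1-0.9, ...
    exponent = math.floor(math.log10(abs(x)))
    return round(x, sig - 1 - exponent)

# Assumed measurements: 12.11 has four significant figures, 1.3 has two.
# The least precise measurement (two figures) limits the reported product.
product = 12.11 * 1.3
reported = round_sig(product, 2)
print(product, "is reported as", reported)
```

Leading zeros in a number such as 0.004567 do not count, so round_sig(0.004567, 2) keeps only the digits 4 and 6.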
Rounding
• Done so that you do not imply accuracy in the result
that was not present in the measurements
• Universal rounding rules:
- If the final digit of a number is 0, 1, 2, 3, or 4,
the number is rounded down (the digit is dropped, and the
preceding figure is retained unaltered).
- If the final digit is 5, 6, 7, 8, or 9, the number
is rounded up (the digit is dropped, and the preceding figure is
increased by one).
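The rounding rules above (a final 5 always rounds up) differ from the default in some software. As a hedged sketch: Python's built-in round() sends exact halves to the nearest even digit, so the rule on this slide is reproduced here with the decimal module. The helper name round_half_up is an assumption for this example.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value: str, places: int) -> Decimal:
    """Round a decimal string so a final 5 always rounds up (hypothetical helper)."""
    quantum = Decimal(1).scaleb(-places)  # places=1 gives Decimal("0.1")
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)

print(round_half_up("2.24", 1))  # final digit 4: rounded down
print(round_half_up("2.25", 1))  # final digit 5: rounded up
print(round(2.5))                # built-in rounding sends the half to the even digit
```

Passing the value as a string avoids binary floating-point surprises before the rounding rule is applied.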
Descriptive Statistics
• Methods for organizing data and reducing a large set
of numbers to a few informative numbers
• Data representation- data set should be organized
for inspection through use of a frequency
distribution
• Histogram- A bar graph in which the height of the bar
indicates the frequency of occurrence of a value or class of
values.
• Frequency polygon- A graph in which a point indicates the
frequency of a value, and the points are connected to form a
broken line (hence a polygon)
• Percentage- The numerical frequency on the Y-axis is
replaced with the percentage of occurrence in this form of
the polygon.
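A frequency distribution of the kind described above can be tabulated in a few lines. The data set below is hypothetical (respiratory rates for 12 patients), and the text bars are only a rough stand-in for a histogram.

```python
from collections import Counter

# Hypothetical sample: respiratory rates (breaths/min) for 12 patients.
rates = [14, 16, 16, 18, 12, 16, 14, 20, 18, 16, 14, 18]

frequency = Counter(rates)  # value -> number of occurrences
n = len(rates)

# One row per value: the value, its frequency, its percentage of occurrence,
# and a crude text bar with one '#' per observation.
for value in sorted(frequency):
    count = frequency[value]
    print(f"{value:>3} {count:>2} {100 * count / n:5.1f}% {'#' * count}")
```

Replacing the count column with the percentage column corresponds to the percentage form of the polygon.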
Descriptive Statistics
• Percentile- the value of a variable in a
data set below which a certain percent of
observations fall
• Cumulative percentage curve- This graph plots the
cumulative percentage on the Y-axis against the
values of the variable on the X-axis. The curve then
describes the rate of accumulation for the values of
the variable.
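The two ideas above can be sketched with a small hypothetical data set. Percentile conventions vary between textbooks and software; this example assumes the "percent of observations strictly below the value" reading used on this slide, and the function name percent_below is made up for the illustration.

```python
from bisect import bisect_left

# Hypothetical sample of 10 test scores, sorted in ascending order.
scores = sorted([55, 60, 62, 68, 70, 71, 75, 80, 88, 93])

def percent_below(data_sorted, value):
    """Percentage of observations strictly below `value`."""
    return 100 * bisect_left(data_sorted, value) / len(data_sorted)

# Points of a cumulative percentage curve: (value on X-axis, cumulative % on Y-axis).
cumulative = [(x, 100 * (i + 1) / len(scores)) for i, x in enumerate(scores)]

print(percent_below(scores, 75))
print(cumulative)
```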
Measures of the Typical Value of a Set of
Numbers
• Summation Operator- denoted by the Greek capital
letter sigma (∑); it simply indicates addition over the
values of a variable
• Three statistics are used to represent the typical value
(also called the central tendency)
-Mean- sum of all observations divided by the number
of observations
-Median- the 50th percentile of a distribution, or the
point that divides the distribution into equal halves
-Mode- the most frequently occurring observation in
the distribution
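The three measures of central tendency can be computed with Python's standard statistics module. The six observations below are hypothetical and chosen so that the three statistics differ.

```python
from statistics import mean, median, mode

# Hypothetical observations.
data = [2, 3, 3, 5, 7, 10]

# The summation operator is just sum(); the mean divides that sum by n.
manual_mean = sum(data) / len(data)

center_mean = mean(data)      # sum of observations / number of observations
center_median = median(data)  # 50th percentile: midpoint of the two middle values here
center_mode = mode(data)      # most frequently occurring observation

print(center_mean, center_median, center_mode)
```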
Measures of Dispersion
• Dispersion- indicates the variability, or how spread out the
data are
• Range- the distance between the smallest and the largest
values of the variable
• Variance- a measure of how different the values in a set
of numbers are from each other. It is calculated as the
average squared deviation of the values from the mean
• Standard deviation- the square root of the variance; a measure
of the typical deviation from the mean
• Coefficient of Variation- expresses the standard deviation as a
percentage of the mean
• Standard score (or z score)- deviation from the mean
expressed in units of standard deviation
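The measures of dispersion can be illustrated on a small hypothetical data set. The sketch uses the population formulas (dividing by n), which match the "average squared deviation" wording above; the sample variance, which divides by n - 1, is shown for comparison.

```python
from statistics import mean, pstdev, pvariance, variance

# Hypothetical observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)       # largest minus smallest value
center = mean(data)
pop_variance = pvariance(data)           # average squared deviation (divide by n)
sample_variance = variance(data)         # divide by n - 1 instead
std_dev = pstdev(data)                   # square root of the population variance
coeff_variation = 100 * std_dev / center # standard deviation as a percentage of the mean

def z_score(x):
    """Deviation from the mean in units of standard deviation."""
    return (x - center) / std_dev

print(data_range, pop_variance, std_dev, coeff_variation, z_score(9))
```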
Propagation of Errors in Calculations
and Correlation and Regression
• Propagation of Errors in Calculations- applies when the physical
quantity of interest is not measured directly but is
rather a function of one or more measurements
made in an experiment
• Correlation- descriptive measure of the relationship
or association between two variables
• Regression- assumes a linear relationship between two
variables and uses the value of one variable to predict
the value of the other variable
- When we measure X and predict Y, Y is said to
be regressed on X
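Correlation and the regression of Y on X can be sketched with the usual least-squares formulas. The five (x, y) pairs are hypothetical, and the sketch assumes the Pearson correlation coefficient and a simple straight-line fit, which the slide does not name explicitly.

```python
import math

# Hypothetical paired measurements.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products about the means.
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

r = sxy / math.sqrt(sxx * syy)       # Pearson correlation coefficient
slope = sxy / sxx                    # least-squares slope of Y regressed on X
intercept = mean_y - slope * mean_x  # fitted line passes through (mean_x, mean_y)

def predict(x_new):
    """Predict Y from a measured X using the fitted line."""
    return intercept + slope * x_new

print(r, slope, intercept, predict(6.0))
```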
Inferential Statistics
• Although a sample from a population is economical, we still
wish to use the sample measurements (statistics) to make inferences
about the population measures (parameters).
• Concept of Probability- the probability of an event can be defined
as the relative frequency, or proportion, of occurrence of that
event out of some total number of events.
-Values range from 0 to 1
• Normal Distribution and Standard Scores- for a normally
distributed variable, the mean is at the center of the
distribution, and therefore, the mean is also the median and
the mode.
-In a normal distribution, the z score for the mean must always be
zero.
Normal Curve
Approximate areas under the normal curve within one, two, and
three standard deviations around the mean.
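The approximate areas referred to in this caption can be recomputed from the standard normal distribution (mean 0, standard deviation 1). This is a small sketch using Python's statistics.NormalDist; the function name area_within is an assumption for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} SD: {100 * area:.1f}%")
```

The results are roughly 68%, 95%, and 99.7%.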
Inferential Statistics
• Sampling Distribution- the probability
distribution of a statistic and the most important
concept in inferential statistics
• Confidence Interval- the range of values that
is believed to contain the true parameter value
• Error Intervals- describe the combined effects of
systematic and random errors on individual
measurements.
-We can also say something about how much
confidence should be placed in the estimate
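A confidence interval for a population mean can be sketched as follows. The sketch assumes a large-sample normal approximation (a z critical value); for small samples a t critical value is usually preferred, and the handbook's own formulas may differ. The function name and the sample are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean (hypothetical helper)."""
    n = len(sample)
    center = mean(sample)
    # Standard error: the standard deviation of the sampling distribution of the mean.
    standard_error = stdev(sample) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return center - z * standard_error, center + z * standard_error

# Hypothetical sample of 30 measurements.
sample = list(range(1, 31))
lower, upper = mean_confidence_interval(sample)
print(lower, upper)
```

Raising the confidence level widens the interval; increasing the sample size narrows it.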
Inferential Statistics
• Data Analysis for Device Evaluation Studies
Step 1- create a scatter plot of the raw data to get a
subjective impression of their validity.
Step 2- make sure the data comply with the
assumption of normality
Step 3- once the data are judged to conform to the
underlying assumptions, the mean and standard
deviation are used to calculate error intervals
Step 4- the data should be presented in graphic form
and labeled with the numerical values for the error
intervals
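Step 3 can be sketched in code under loud assumptions. The slide does not give the error interval formula, so this example assumes a simple "mean difference plus or minus K standard deviations" interval with K = 2; the handbook's actual formula and multiplier may differ. All readings are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical paired readings: device under test vs. a reference standard.
device = [98.2, 101.5, 99.8, 100.9, 97.6, 102.3, 100.1, 99.4]
reference = [98.0, 101.0, 100.0, 100.5, 98.0, 102.0, 100.0, 99.0]

# Measurement error for each reading.
differences = [d - r for d, r in zip(device, reference)]

bias = mean(differences)     # systematic error (average difference)
spread = stdev(differences)  # random error (scatter of the differences)

K = 2.0  # ASSUMED multiplier, not taken from the source
error_interval = (bias - K * spread, bias + K * spread)

print(f"bias = {bias:.3f}, SD = {spread:.3f}")
print(f"error interval = {error_interval[0]:.3f} to {error_interval[1]:.3f}")
```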
Inferential Statistics
• Interpreting Manufacturers’ Error Specifications- when evaluating a
new device, a major concern is how much error can be
expected in normal use.
- Knowing that any specification of error is just an estimate,
we want to know how much confidence to place in it.
- Manufacturers can be rather obscure about their error
specifications.
• Hypothesis testing- technique for quantifying our guess about
a hypothesis. We never know the “real” situation.
-Does drug X cause Y or not? We can figure the odds and
quantify our probability of being right or wrong.
-Chance difference
Inferential Statistics
• Type I and II Errors
– Type I- the error of rejecting the null hypothesis
when it is true
– Type II- the error of accepting the null hypothesis
when it is false
• Power Analysis and Sample Size- power is the probability of
correctly rejecting a false null hypothesis
– The most practical means to control power is to
manipulate sample size
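The link between power and sample size can be sketched for one simple case. The example assumes a one-sided z-test for a shift in a mean with known standard deviation, which is an assumption made for illustration rather than the handbook's method. The function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power_one_sided_z(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test to detect a mean shift `effect` (known sigma)."""
    z_crit = Z.inv_cdf(1 - alpha)              # rejection threshold under the null
    shift = effect * math.sqrt(n) / sigma      # true shift in standard-error units
    return 1 - Z.cdf(z_crit - shift)

# With no true effect, the chance of rejecting equals alpha (the Type I error rate).
print(power_one_sided_z(effect=0.0, sigma=1.0, n=25))

# With a true effect, power rises with sample size; 1 - power is the Type II error rate.
for n in (10, 20, 40):
    power = power_one_sided_z(effect=0.5, sigma=1.0, n=n)
    print(n, round(power, 3), round(1 - power, 3))
```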
Inferential Statistics
• Rules of Thumb for Estimating Sample Size
-Estimates Based on Mean and Standard
Deviation
-Estimates Based on Proportionate Change and
Coefficient of Variation
-Estimates for Confidence Intervals
-Sample Size for Binomial Test
-Unequal Sample Sizes
-Rule of Threes
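As one illustration of these rules of thumb, the sketch below assumes that "Rule of Threes" refers to the standard rule: if no events are observed in n independent trials, the approximate 95% upper confidence bound on the event rate is 3/n. The slide does not spell the rule out, so this reading is an assumption. The exact bound is shown for comparison.

```python
def rule_of_threes_upper(n):
    """Approximate 95% upper bound on the event rate after 0 events in n trials."""
    return 3 / n

def exact_upper_bound(n, confidence=0.95):
    """Exact bound: the rate p for which (1 - p)**n equals 1 - confidence."""
    return 1 - (1 - confidence) ** (1 / n)

for n in (30, 100, 1000):
    print(n, rule_of_threes_upper(n), round(exact_upper_bound(n), 5))
```

The approximation is slightly conservative and improves as n grows.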
Inferential Statistics
• Clinical Importance Versus Statistical
Significance
– The size of the test statistic for a given difference is
determined by the standard error, which in turn is
determined by the sample size
– If the difference between two mean values (treatment
group vs. control group) is statistically significant but so small
that it does not have any practical effect, then we
must conclude that the results are not clinically
important
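The first point can be made concrete: the same small difference becomes "significant" once the sample is large enough. This sketch assumes a two-sample z-test with equal group sizes and a known common standard deviation. The 1-unit difference and the standard deviation of 10 are hypothetical.

```python
import math
from statistics import NormalDist

def two_sample_z(diff, sigma, n_per_group):
    """z statistic and two-sided p-value for a difference between two means."""
    standard_error = sigma * math.sqrt(2 / n_per_group)  # shrinks as n grows
    z = diff / standard_error
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Same 1-unit difference between treatment and control, two sample sizes.
z_small, p_small = two_sample_z(diff=1.0, sigma=10.0, n_per_group=50)
z_large, p_large = two_sample_z(diff=1.0, sigma=10.0, n_per_group=5000)

print(z_small, p_small)  # not significant at the 0.05 level
print(z_large, p_large)  # highly significant, yet the difference is still 1 unit
```

Whether a 1-unit difference matters clinically is a judgment the test statistic cannot make.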
Inferential Statistics
• Matched Versus Unmatched Data
– Data are unmatched (or unpaired or independent) if
values in one group are unrelated in any way to
the data values in the other group
– Matched (or paired or dependent) data are
selected so that the groups will be as nearly identical as
possible
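The practical consequence of the distinction can be sketched by analyzing the same numbers both ways. The example assumes t statistics (a paired t for matched data and an unequal-variance two-sample t for unmatched data), which the slide does not name. The before and after readings are hypothetical.

```python
import math
from statistics import mean, stdev, variance

# Hypothetical readings on the same six subjects before and after treatment.
before = [140, 152, 138, 160, 147, 155]
after = [135, 148, 136, 153, 143, 150]

# Matched analysis: work with the within-subject differences.
diffs = [b - a for b, a in zip(before, after)]
t_paired = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Unmatched analysis: treat the two columns as independent groups.
se_unpaired = math.sqrt(variance(before) / len(before) + variance(after) / len(after))
t_unpaired = (mean(before) - mean(after)) / se_unpaired

print(round(t_paired, 2), round(t_unpaired, 2))
```

For these data the paired statistic is much larger, because pairing removes the subject-to-subject variation from the standard error.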