Download Chapter 3 Slides (PPT)

Descriptive Statistics
• Tabular and Graphical Displays
– Frequency Distribution - List of intervals of
values for a variable, and the number of
occurrences per interval
– Relative Frequency - Proportion (often reported
as a percentage) of observations falling in the
interval
– Histogram/Bar Chart - Graphical representation
of a Relative Frequency distribution
– Stem and Leaf Plot - Horizontal tabular display
of data, based on 2 digits (stem/leaf)
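As a rough illustration of the tabular displays above, the sketch below builds a frequency distribution, the matching relative frequencies, and a simple stem and leaf grouping. The scores, the interval width of 10, and the function names `frequency_table` and `stem_and_leaf` are assumptions made up for this example, not part of the slides.

```python
from collections import Counter

def frequency_table(values, width):
    """Count observations per interval [lo, lo + width)."""
    counts = Counter((v // width) * width for v in values)
    n = len(values)
    # Each row: (interval lower bound, frequency, relative frequency)
    return [(lo, counts[lo], counts[lo] / n) for lo in sorted(counts)]

def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem); ones digits are leaves."""
    stems = {}
    for v in sorted(values):
        stems.setdefault(v // 10, []).append(v % 10)
    return stems

# Hypothetical exam scores, invented for illustration
scores = [52, 57, 61, 64, 68, 70, 73, 75, 78, 84]

for lo, freq, rel in frequency_table(scores, 10):
    print(f"{lo}-{lo + 9}: frequency={freq}, relative frequency={rel:.0%}")

for stem, leaves in stem_and_leaf(scores).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Plotting the relative frequencies as bars over the intervals would give the histogram described above.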
Comparing Groups
• Side-by-side bar charts
• 3 dimensional histograms
• Back-to-back stem and leaf plots
• Goal: Compare 2 (or more) groups wrt
variable(s) being measured
• Do measurements tend to differ among
groups?
Sample & Population Distributions
• Distributions of Samples and Populations - As
samples get larger, the sample distribution gets
smoother and looks more like the population
distribution
– U-shaped - Measurements tend to be large or small,
fewer in middle range of values
– Bell-shaped - Measurements tend to cluster around
the middle with few extremes (symmetric)
– Skewed Right - Few extreme large values
– Skewed Left - Few extreme small values
Measures of Central Tendency
• Mean - Sum of all measurements divided by
the number of observations (the value each case would
have if the total were distributed evenly among cases). Can
be highly influenced by extreme values.
• Notation: Sample Measurements labeled
Y1,...,Yn
$$\bar{Y} = \frac{Y_1 + \cdots + Y_n}{n} = \frac{\sum_{i=1}^{n} Y_i}{n}$$
Median, Percentiles, Mode
• Median - Middle measurement after data have
been ordered from smallest to largest.
Appropriate for interval and ordinal scales
• Pth percentile - Value where P% of
measurements fall below and (100-P)% lie
above. Lower quartile(25th), Median(50th),
Upper quartile(75th) often reported
• Mode - Most frequently occurring outcome.
Typically reported for ordinal and nominal data.
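The measures of central tendency above can be sketched with Python's standard `statistics` module. The income values are hypothetical and include one deliberate extreme value to show how the mean reacts while the median does not. Quartile conventions differ between textbooks and software; this sketch simply uses the module's default method.

```python
import statistics

# Hypothetical incomes in $1000s; the last value is a deliberate extreme
incomes = [30, 35, 35, 40, 45, 50, 400]

mean = statistics.mean(incomes)      # pulled upward by the extreme value
median = statistics.median(incomes)  # middle value after ordering
mode = statistics.mode(incomes)      # most frequently occurring value

# Lower quartile, median, upper quartile (25th, 50th, 75th percentiles)
quartiles = statistics.quantiles(incomes, n=4)

print("mean:", round(mean, 2))
print("median:", median)
print("mode:", mode)
print("quartiles:", quartiles)

# Dropping the extreme value moves the mean far more than the median
trimmed = incomes[:-1]
print("mean without extreme:", round(statistics.mean(trimmed), 2))
print("median without extreme:", statistics.median(trimmed))
```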
Measures of Variation
• Measures of how similar or different
individuals’ measurements are
– Range -- Largest-Smallest observation
– Deviation -- Difference between ith individual’s
outcome and the sample mean: $Y_i - \bar{Y}$
– Variance of n observations Y1,...,Yn is the “average”
squared deviation:
$$s^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1} = \frac{(Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 + \cdots + (Y_n - \bar{Y})^2}{n-1}$$
Measures of Variation
• Standard Deviation - Positive square root of
the variance (measure in original units):
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1}}$$
• Properties of the standard deviation:
• s ≥ 0, and s equals 0 only if all observations are equal
• s increases with the amount of variation around the mean
• Division by n-1 (not n) is due to technical reasons (later)
• s depends on the units of the data (e.g. $1000s vs $)
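A minimal sketch of the variance and standard deviation definitions, written out by hand and compared against the standard library. The data values and the helper names `sample_variance` and `sample_sd` are assumptions for illustration only. The last two lines check two of the listed properties: s scales with the units of the data, and s is 0 when all observations are equal.

```python
import math
import statistics

def sample_variance(ys):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(ys)
    ybar = sum(ys) / n
    return sum((y - ybar) ** 2 for y in ys) / (n - 1)

def sample_sd(ys):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(ys))

# Hypothetical measurements (think of them as $1000s)
ys = [2, 4, 4, 4, 5, 5, 7, 9]

print("variance:", sample_variance(ys), "library:", statistics.variance(ys))
print("std dev:", sample_sd(ys), "library:", statistics.stdev(ys))

# Same data expressed in $ instead of $1000s: s is 1000 times larger
print("std dev in $:", sample_sd([y * 1000 for y in ys]))

# No variation at all gives s = 0
print("std dev of equal values:", sample_sd([3, 3, 3, 3]))
```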
Empirical Rule
• If the histogram of the data is approximately
bell-shaped, then:
– Approximately 68% of measurements lie within
1 standard deviation of the mean.
– Approximately 95% of measurements lie within
2 standard deviations of the mean.
– Virtually all of the measurements lie within 3
standard deviations of the mean.
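One way to see the Empirical Rule at work is to simulate bell-shaped data and count how many values fall within 1, 2, and 3 standard deviations of the mean. The simulated normal data (mean 100, standard deviation 15), the seed, and the helper `share_within` are assumptions chosen for this sketch. The shares will vary slightly with the random draw.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Simulated bell-shaped measurements (assumed normal for illustration)
data = [random.gauss(100, 15) for _ in range(10_000)]

ybar = statistics.mean(data)
s = statistics.stdev(data)

def share_within(k):
    """Proportion of measurements within k standard deviations of the mean."""
    return sum(abs(y - ybar) <= k * s for y in data) / len(data)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

For data that are clearly skewed or U-shaped, these proportions can differ noticeably from 68% and 95%.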
Other Measures and Plots
• Interquartile Range (IQR) -- 75th percentile - 25th percentile
(measures the spread in the middle 50% of data)
• Box Plots - Display a box containing middle
50% of measurements with line at median and
lines extending from box. Breaks data into four
quartiles
• Outliers - Observations falling more than
1.5 × IQR above the upper quartile or below the lower quartile
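The IQR and the 1.5 × IQR outlier rule can be sketched as follows. The data and the function name `iqr_outliers` are hypothetical, and the quartiles come from the default method of `statistics.quantiles`; other quartile conventions can give slightly different fences.

```python
import statistics

def iqr_outliers(values):
    """Return the IQR and the values beyond 1.5 * IQR from the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return iqr, outliers

# Hypothetical incomes in $1000s with one extreme value
incomes = [30, 35, 35, 40, 45, 50, 400]

iqr, outliers = iqr_outliers(incomes)
print("IQR:", iqr)
print("outliers:", outliers)
```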
Dependent and Independent Variables
• Dependent variables are outcomes of interest to
investigators. Also referred to as Responses or
Endpoints
• Independent variables are Factors that are often
hypothesized to affect the outcomes (levels of dependent
variables). Also referred to as Predictor or Explanatory
Variables
• Research Question: Does I.V. → D.V.?
Example - Clinical Trials of Cialis
• Clinical trials conducted worldwide to study efficacy
and safety of Cialis (Tadalafil) for ED
• Patients randomized to Placebo, 10mg, and 20mg
• Co-Primary outcomes:
– Change from baseline in erectile function domain of the
International Index of Erectile Function (Numeric)
– Response to: “Were you able to insert your P… into your
partner’s V…?” (Nominal: Yes/No)
– Response to: “Did your erection last long enough for you to
have successful intercourse?” (Nominal: Yes/No)
Source: Carson, et al. (2004).
Example - Clinical Trials of Cialis
• Population: All adult males suffering from erectile
dysfunction
• Sample: 2102 men with mild-to-severe ED in 11
randomized clinical trials
• Dependent Variable(s): Co-primary outcomes
listed on previous slide
• Independent Variable: Cialis Dose: (0, 10, 20 mg)
• Research Questions: Does use of Cialis improve
erectile function?
Sample Statistics/Population Parameters
• Sample Mean and Standard Deviations are
most commonly reported summaries of
sample data. They are random variables
since they will change from one sample to
another.
• Population Mean (μ) and Standard
Deviation (σ) computed from a population
of measurements are fixed (unknown in
practice) values called parameters.
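The distinction between statistics and parameters can be illustrated by drawing repeated samples from one fixed population. Everything here is an assumption for the sketch: the simulated population, the sample size of 25, and the seed. In practice the population values are unknown; they are computed here only because the population is simulated.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Simulated population of measurements (assumed for illustration)
population = [random.gauss(50, 10) for _ in range(10_000)]

# Parameters: fixed values computed from the whole population
mu = statistics.mean(population)
sigma = statistics.pstdev(population)  # population SD divides by N
print(f"population mean = {mu:.2f}, population SD = {sigma:.2f}")

# Statistics: computed from samples, so they change from sample to sample
sample_means = []
for i in range(5):
    sample = random.sample(population, 25)
    ybar = statistics.mean(sample)
    s = statistics.stdev(sample)       # sample SD divides by n - 1
    sample_means.append(ybar)
    print(f"sample {i + 1}: mean = {ybar:.2f}, SD = {s:.2f}")
```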