Descriptive Statistics
Two Branches of Stats
• Descriptive Statistics
– describe the data collected
• Inferential Statistics
– draw inferences about the population from which
the sample was drawn
Choosing a Statistic
• Deciding on the appropriate statistical test
requires understanding the level of
measurement and the type of variable.
• categorical (discrete) vs. continuous
• nominal, ordinal, interval and ratio
Conventions:
• I will try to use Latin letters to represent
sample statistics and Greek letters to
represent population parameters
• Latin (a, b, c, d, etc.)
• Greek (α, β, γ, δ, ε, etc.)
Descriptive Statistics
• Describing the data you’ve collected
• Univariate: a single variable
Descriptive Statistics
• Frequency distributions (categorical)
– count
• Relative frequency (percentage) distributions
– valid percent
– total percent
• Proportion
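A minimal Python sketch of a frequency table for one categorical variable. It assumes that "total percent" uses all cases as the denominator while "valid percent" excludes missing cases, which is the usual convention in statistics packages; the function name, the `missing` marker, and the sample responses are made up for illustration.

```python
from collections import Counter


def frequency_table(values, missing=None):
    """Count each category and express it as a proportion and as percentages.

    `missing` is the marker for a missing response (an assumption of this sketch).
    """
    counts = Counter(values)
    n_total = len(values)                 # all cases, including missing ones
    n_valid = n_total - counts[missing]   # cases with a usable answer
    table = {}
    for category, count in counts.items():
        table[category] = {
            "count": count,
            "proportion": count / n_total,
            "total_pct": 100.0 * count / n_total,
            # Missing responses have no valid percent.
            "valid_pct": None if category == missing else 100.0 * count / n_valid,
        }
    return table


# Hypothetical survey answers; None marks a missing response.
responses = ["yes", "no", "yes", None, "yes"]
table = frequency_table(responses)
for category, row in table.items():
    print(category, row)
```

With one missing answer out of five, "yes" accounts for 60% of all cases but 75% of valid cases, which is why the two percentages are reported separately.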
Other ways of describing the
distribution
• Measures of Central tendency
– 1. Mean – sometimes called the first moment
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}$$
– 2. Median – when the data are ordered from largest to
smallest, it is the middle number if there is an
odd number of values, and the mean of the middle two if
there is an even number. It is the 50th percentile.
– 3. Mode – the most frequently occurring value
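The three measures of central tendency can be sketched in Python as follows. The `median` helper is written out by hand to show the odd/even rule; the sample lists are made up for illustration, and `statistics.mode` from the standard library is used for the mode.

```python
import statistics


def median(values):
    """Middle value of the ordered data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of middle two


# Hypothetical samples, one with an odd and one with an even number of values.
odd_sample = [7, 1, 3, 2, 2]
even_sample = [10, 1, 3, 2]

mean_odd = sum(odd_sample) / len(odd_sample)
print("mean:", mean_odd)
print("median (odd n):", median(odd_sample))
print("median (even n):", median(even_sample))
print("mode:", statistics.mode(odd_sample))
```

Sorting in ascending rather than descending order gives the same median, since only the middle position matters.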
Measures of Dispersion
• Range – the highest value minus the lowest value
• Variance – sometimes called the second
moment
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Standard Deviation
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
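A short Python sketch of the range, sample variance, and sample standard deviation, using the n - 1 denominator. The function name and the data are illustrative; the standard library's `statistics.variance` uses the same n - 1 convention and serves as a cross-check.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Hypothetical sample with mean 5 and squared deviations summing to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)
variance = sample_variance(data)
std_dev = math.sqrt(variance)

print("range:", data_range)
print("variance:", variance)
print("standard deviation:", std_dev)
print("matches statistics.variance:", math.isclose(variance, statistics.variance(data)))
```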
Skewness
• A measure of the asymmetry of a distribution. The normal distribution is
symmetric, and has a skewness value of zero. A distribution with a
significant positive skewness has a long right tail. A distribution with a
significant negative skewness has a long left tail. As a rough guide, a
skewness value more than twice its standard error is taken to indicate a
departure from symmetry.
$$\text{skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{(n - 1)\,s^3}$$
Kurtosis
• A measure of the extent to which observations cluster around a central
point. For a normal distribution, the value of the kurtosis statistic is 0.
Positive kurtosis indicates that the observations cluster more and have
longer tails than those in the normal distribution and negative kurtosis
indicates the observations cluster less and have shorter tails.
$$\text{kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{(n - 1)\,s^4} - 3$$
Graphical Representation of Single
Variables
• Categorical
– Bar Chart
– Pie Chart
[Figure: Bar Chart]
[Figure: Pie Chart]
• Continuous
– Histogram
– Line Chart
– Box and Whiskers
Data Visualization
• Much can be done to display data.
Practice Problems
Bivariate Descriptive Statistics
Bivariate Descriptive Statistics
• 2 variables
• 3 possible combinations
– cat/cat;
– cat/cont;
– cont/cont
• Independent vs dependent.
Categorical/Categorical
• Crosstabulations (2 way frequency tables,
Crosstabs, Bivariate distributions)
Smoke \ Gender    Male    Female    Row total
Yes               30      25        55
No                20      25        45
Column total      50      50        100
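A minimal Python sketch of how a crosstabulation like the smoking-by-gender table is built from individual observations. The list of (smoke, gender) pairs is reconstructed from the cell counts in the table purely for illustration; real data would arrive one respondent at a time.

```python
from collections import Counter

# One (smoke, gender) pair per respondent, expanded from the table's cell counts.
observations = (
    [("Yes", "Male")] * 30
    + [("Yes", "Female")] * 25
    + [("No", "Male")] * 20
    + [("No", "Female")] * 25
)

cells = Counter(observations)                                # joint counts
row_totals = Counter(smoke for smoke, _ in observations)     # marginal counts by smoking status
col_totals = Counter(gender for _, gender in observations)   # marginal counts by gender
grand_total = len(observations)

for smoke in ("Yes", "No"):
    print(smoke, cells[(smoke, "Male")], cells[(smoke, "Female")], row_totals[smoke])
print("Column total", col_totals["Male"], col_totals["Female"], grand_total)
```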
Categorical/Continuous
• Any statistic that applies to continuous variables,
computed separately for each category
– Mean, median, mode
– Variance, standard deviation, skewness, kurtosis
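As a sketch of the categorical/continuous case, the Python below groups a continuous measurement by category and then computes the usual univariate statistics within each group. The records (a smoking status paired with a number) are invented for illustration only.

```python
from collections import defaultdict
import statistics

# Hypothetical records: (category, continuous measurement).
records = [("Yes", 10), ("No", 20), ("Yes", 12), ("No", 30), ("Yes", 14)]

# Collect the continuous values belonging to each category.
groups = defaultdict(list)
for category, value in records:
    groups[category].append(value)

# Apply the same univariate statistics to every category.
summary = {
    category: {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values),  # sample std dev (n - 1)
    }
    for category, values in groups.items()
}

for category, stats in summary.items():
    print(category, stats)
```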
Continuous/Continuous
• Simple Correlation coefficient (Pearson’s
product-moment correlation coefficient,
Covariance)
$$r_{xy} = r_{yx} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$$
• this ranges from +1 to -1
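Pearson's correlation coefficient can be sketched in Python as the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The function name and the small data sets are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical data: perfect positive, perfect negative, and partial association.
r_positive = pearson_r([1, 2, 3], [2, 4, 6])
r_negative = pearson_r([1, 2, 3], [6, 4, 2])
r_partial = pearson_r([1, 2, 3, 4], [1, 3, 2, 4])

print(r_positive, r_negative, r_partial)
```

Swapping the two arguments gives the same value, which is the symmetry r_xy = r_yx. The function divides by zero if either variable is constant, because the correlation is undefined in that case.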
Four sets of data with the same correlation of 0.816
Graphical Representations
• Bar charts, pie charts, etc.
• histogram, box plots
• scatter plots
Practice Problems