Descriptive Statistics

Two Branches of Stats
• Descriptive Statistics – describe the data collected
• Inferential Statistics – draw inferences about the population from which the sample was drawn

Choosing a Statistic
• Deciding on the appropriate statistical test requires understanding the level of measurement and the type of variable.
• categorical (discrete) vs. continuous
• nominal, ordinal, interval and ratio

Conventions:
• I will try to use Latin letters to represent sample statistics and Greek letters to represent population parameters
• Latin (a, b, c, d, etc.)
• Greek (α, β, γ, δ, ε, etc.)

Descriptive Statistics
• Describing the data you’ve collected
• Univariate – single variable

Descriptive Statistics
• Frequency distributions (categorical)
  – count
• Relative frequency (percentage) distributions
  – valid percent
  – total percent
• Proportion

Other ways of describing the distribution
• Measures of Central tendency
  – 1. Mean – sometimes called the first moment

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}$$

  – 2. Median – when the data are ordered from largest to smallest, the median is the middle number if there is an odd number of values, and the mean of the middle two if there is an even number. It is the 50th percentile.
  – 3. Mode – the most frequently occurring value

Measures of Dispersion
• Range – highest value minus lowest value
• Variance – sometimes called the second moment

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Standard Deviation

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Skewness
• A measure of the asymmetry of a distribution. The normal distribution is symmetric and has a skewness value of zero. A distribution with a significant positive skewness has a long right tail. A distribution with a significant negative skewness has a long left tail. As a rough guide, a skewness value more than twice its standard error is taken to indicate a departure from symmetry.

$$\text{skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{(n - 1)\, s^3}$$

Kurtosis
• A measure of the extent to which observations cluster around a central point.
For a normal distribution, the value of the kurtosis statistic is 0. Positive kurtosis indicates that the observations cluster more and have longer tails than those in the normal distribution, and negative kurtosis indicates that the observations cluster less and have shorter tails.

$$\text{kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{(n - 1)\, s^4} - 3$$

Graphical Representation of Single Variables
• Categorical
  – Bar Chart
  – Pie Chart

Bar Chart
[Figure: bar chart]

Pie Chart
[Figure: pie chart]

• Continuous
  – Histogram
  – Line Chart
  – Box and Whiskers

Data Visualization
• Much can be done to display data.

Practice Problems

Bivariate Descriptive Statistics

Bivariate Descriptive statistics
• 2 variables
• 3 possible combinations
  – cat/cat;
  – cat/cont;
  – cont/cont
• Independent vs dependent.

Categorical/Categorical
• Crosstabulations (2-way frequency tables, Crosstabs, Bivariate distributions)

Smoke \ Gender    Male    Female    Row total
Yes               30      25        55
No                20      25        45
Column total      50      50        100

Categorical/Continuous
• Any statistic that applies to continuous variables, computed for each category
  – Mean, median, mode.
  – Variance, Std dev, skewness, kurtosis

Continuous/Continuous
• Simple Correlation coefficient (Pearson’s product-moment correlation coefficient, Covariance)

$$r_{xy} = r_{yx} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

• this ranges from +1 to -1

[Figure: four sets of data with the same correlation of 0.816]

Graphical Representations
• Bar charts, pie charts, etc.
• Histograms, box plots
• Scatter plots

Practice Problems
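
As a starting point for the practice problems, the statistics named in this transcript can be sketched in Python. This is a minimal illustration, not the course's own code: the function names (`sample_variance`, `sample_std`, `skewness`, `kurtosis`, `pearson_r`) and the sample values in `xs` and `ys` are hypothetical. The sketch assumes the n − 1 denominators of the usual sample formulas. The crosstab counts are the smoking-by-gender counts from the transcript.

```python
import math
from statistics import mean, median, mode


def sample_variance(xs):
    """Sample variance: sum of squared deviations divided by n - 1."""
    xbar = mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


def sample_std(xs):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


def skewness(xs):
    """Sum of cubed deviations divided by (n - 1) * s^3."""
    xbar, s = mean(xs), sample_std(xs)
    return sum((x - xbar) ** 3 for x in xs) / ((len(xs) - 1) * s ** 3)


def kurtosis(xs):
    """Sum of fourth-power deviations over (n - 1) * s^4, minus 3."""
    xbar, s = mean(xs), sample_std(xs)
    return sum((x - xbar) ** 4 for x in xs) / ((len(xs) - 1) * s ** 4) - 3


def pearson_r(xs, ys):
    """Pearson product-moment correlation between paired samples."""
    xbar, ybar = mean(xs), mean(ys)
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = math.sqrt(
        sum((x - xbar) ** 2 for x in xs) * sum((y - ybar) ** 2 for y in ys)
    )
    return num / den


# Hypothetical sample, chosen only for illustration (not from the slides).
xs = [2, 4, 4, 5, 7, 8]
# ys is an exact increasing linear function of xs, so r should be +1.
ys = [2 * x + 1 for x in xs]

print("mean:", mean(xs), "median:", median(xs), "mode:", mode(xs))
print("range:", max(xs) - min(xs))
print("variance:", sample_variance(xs), "std dev:", sample_std(xs))
print("skewness:", skewness(xs), "kurtosis:", kurtosis(xs))
print("r_xy:", pearson_r(xs, ys), "r_yx:", pearson_r(ys, xs))

# Crosstab counts taken from the smoking-by-gender table in the transcript.
crosstab = {
    "Yes": {"Male": 30, "Female": 25},
    "No": {"Male": 20, "Female": 25},
}
row_totals = {row: sum(cells.values()) for row, cells in crosstab.items()}
col_totals = {
    col: sum(crosstab[row][col] for row in crosstab)
    for col in ("Male", "Female")
}
print("row totals:", row_totals, "column totals:", col_totals)
```

Statistical packages often apply slightly different small-sample corrections to skewness and kurtosis, so their output may not match these formulas exactly.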