Survey

6.1 What is Statistics?

• Definition: Statistics – the science of collecting, analyzing, and interpreting data in such a way that the conclusions can be objectively evaluated.
• 3 phases:
  1. Collecting data
  2. Analyzing data
  3. Interpreting data
• Descriptive Statistics – summarize and describe a characteristic of a group. Example: batting average
• Inferential Statistics – used to estimate, infer, or conclude something about a larger group. Example: polls
• Sample – subset of the group of data available for analysis
• Population – the entire set
• Bias – favoring of certain outcomes over others
• Census – collects data from all members of the population
• Parameter – characteristic value of a population
• Statistic – characteristic value of a sample

6.2 Organizing Data

• Stem and Leaf Diagram: data – 35, 52, 37, 44, 51, 48, 45, 12

  Stem | Leaves
  5    | 1 2
  4    | 4 5 8
  3    | 5 7
  2    |
  1    | 2

• Frequency Table: data – 35, 52, 37, 44, 51, 48, 45, 12

  Range    Frequency
  50-59    2
  40-49    3
  30-39    2
  20-29    0
  10-19    1

6.3 Displaying Data

• Ways to display data:
  – Frequency histogram
  – Relative frequency histogram
  – Multiple bar graph
  – Stacked bar graph
  – Line graph
  – Pie chart

[Figure: Frequency Histogram]
[Figure: Relative Frequency Histogram (vertical axis: relative frequency)]
[Figure: Multiple Bar Graph (series: lower, upper, graduate; categories: Arts, Nurs, Socsci, Edu, Natsci, Comm)]
[Figure: Stacked Bar Graph (series: lower, upper, graduate; categories: Arts, Nurs, Socsci, Edu, Natsci, Comm)]
[Figure: Line Graph (series: lower, upper, graduate; categories: Arts, Nurs, Socsci, Edu, Natsci, Comm)]
[Figure: Pie Chart (slices: Comm, Edu, Natsci, Socsci, Nurs, Arts)]

6.4 Measures of Central Tendency

• Central Tendency – the propensity of data to be located or clustered about some point.
• Arithmetic Mean – sum of the values of all the observations divided by the total number of observations.
• For sample data, the mean is $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
• For population data, the mean is $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
• Median – the middle value of a set of data when the data is arranged in ascending order.
• Finding the median:
  1. Arrange the data in increasing or decreasing order.
  2. Determine if n is even or odd.
     a. If n is odd, pick the middle value.
     b. If n is even, take the average of the two middle values.
• Mode – the value or values that occur most frequently. Note: If all values occur with the same frequency, then there is no mode.
• Symmetric distribution: mean, median, and mode coincide.
• Distribution skewed to the left: from left to right, the order is mean, median, mode.
• Distribution skewed to the right: from left to right, the order is mode, median, mean.

6.5 Measures of Variability

• Definition: The range of a set of n measurements $x_1, x_2, x_3, \ldots, x_n$ is the difference between the largest and the smallest values.
• Variance – the population variance is $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
• Problem with the variance: the units are the original units squared.
• Standard deviation – the population standard deviation $\sigma$ is the square root of the population variance.
• Sample variance – $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
• s = square root of the sample variance
• Short cut formulas for $s^2$ and $\sigma^2$ are given on page 495 (provided with test).
• Short cut formula for frequency data is given on page 499 (provided with test).
• Short cut formulas are genuinely easier to calculate.
• Approximating the standard deviation: $s \approx R/4$, where R is the range.

6.6 Measures of Relative Position

• pth percentile – for data arranged in increasing order, the value such that p% of the data are less than that value and (100 – p)% of the data are greater than that value.
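
The measures from 6.4 and 6.5 can be sketched in a few lines of Python, applied to the data set used in 6.2. This is only an illustrative sketch: the function names are made up for this example, and treating a data set with a single distinct value as having a mode is an assumption the slides do not address.

```python
from collections import Counter
from math import sqrt


def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """Most frequent value(s); empty list if all values are equally frequent."""
    counts = Counter(values)
    highest = max(counts.values())
    # Assumption: a data set with one distinct value still has that value as mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = mean(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


data = [35, 52, 37, 44, 51, 48, 45, 12]  # data set from 6.2
data_range = max(data) - min(data)

print("mean:", mean(data))
print("median:", median(data))
print("modes:", modes(data))
print("range:", data_range)
print("population variance:", population_variance(data))
print("sample variance:", sample_variance(data))
print("sample standard deviation:", sqrt(sample_variance(data)))
print("R/4 approximation of s:", data_range / 4)
```

For this data the R/4 rule gives 10, while the sample standard deviation computed from the formula is about 13, which shows that the approximation is only rough for a small data set with one outlying value.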
• Z-scores – the sample z-score for a measure x is $z = \frac{x - \bar{x}}{s}$. The population z-score for a measure x is $z = \frac{x - \mu}{\sigma}$. The z-score represents the number of standard deviations a value lies from the mean.

6.7 Normal Distribution

• Definition: Standardizing – converting data to z-scores.
• Some empirical rules:
  1. About 68% of data is within one $\sigma$ of the mean.
  2. About 95% of data is within two $\sigma$ of the mean.
  3. About 99.7% of data is within three $\sigma$ of the mean.
• The normal distribution is:
  1. Bell-shaped
  2. Symmetric
  3. Mean = median = mode
• Definition: Standard normal distribution – the normal distribution with $\sigma = 1$ and $\mu = 0$. The standard normal distribution table (page 511 or in appendix page 647) can be used to determine probabilities for a range of z-values.

6.8 Confidence Intervals

• Central Limit Theorem: For a large sample size, the random variable $\bar{x}$ is approximately normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, where $\mu$ is the population mean of the x's and $\sigma$ is the population standard deviation of the x's.
• Confidence interval: $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ ($\sigma$ may be replaced by s)
• Common levels of confidence (n ≥ 30):

  Level of Confidence    $z_{\alpha/2}$
  80%                    1.28
  90%                    1.645
  95%                    1.96
  99%                    2.575

• Margin of Error: the margin of error of an estimate of a sample proportion is given by $\frac{z_{\alpha/2}}{2\sqrt{n}}$

6.9 Regression and Correlation

• Scatter Plot – a plot of data consisting of 2 variables
• Linear Regression – modeling the data with the line that "best fits", usually a "least squares" line or regression line
• Least Squares Line – the line that minimizes the sum of the squared errors for a set of data points (formulas are given on page 531 and shortcut formulas on page 532; formulas to be provided on test)
• Correlation Coefficient r – a measure of the strength of the linear relationship between the 2 random variables x and y.
  Note: The closer the correlation is to 1 or -1, the stronger the linear relationship between the x and y variables. A correlation of zero means there is no evidence of a linear pattern.
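
As a rough illustration of 6.8 and 6.9, the Python sketch below computes a large-sample confidence interval for a mean, a least squares line, and the correlation coefficient r. The function names and the sample numbers are hypothetical. The slope, intercept, and r are computed from the standard definitional sums of deviations; it is an assumption that these agree with the textbook formulas on pages 531 and 532, which are not reproduced in these notes.

```python
from math import sqrt

# Critical values z_{alpha/2} from the table in 6.8, keyed by confidence level (%).
Z_CRITICAL = {80: 1.28, 90: 1.645, 95: 1.96, 99: 2.575}


def confidence_interval(sample_mean, s, n, level=95):
    """Large-sample interval: x-bar +/- z * s / sqrt(n), with s standing in for sigma."""
    margin = Z_CRITICAL[level] * s / sqrt(n)
    return sample_mean - margin, sample_mean + margin


def deviation_sums(xs, ys):
    """Return the sums of products and squares of deviations: (Sxy, Sxx, Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy, sxx, syy


def least_squares_line(xs, ys):
    """Slope and intercept of the line minimizing the sum of squared errors."""
    sxy, sxx, _ = deviation_sums(xs, ys)
    slope = sxy / sxx
    intercept = sum(ys) / len(ys) - slope * sum(xs) / len(xs)
    return slope, intercept


def correlation(xs, ys):
    """Correlation coefficient r, always between -1 and 1."""
    sxy, sxx, syy = deviation_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical sample summary: mean 50, s = 10, n = 100 observations.
low, high = confidence_interval(50.0, 10.0, 100, level=95)
print("95% confidence interval:", (low, high))

# Hypothetical paired data for the regression example.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = least_squares_line(xs, ys)
r = correlation(xs, ys)
print("least squares line: y =", intercept, "+", slope, "* x")
print("correlation coefficient r:", r)
```

With these made-up numbers r comes out a little below 0.8, which by the note above indicates a fairly strong positive linear relationship.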