6.1 What is Statistics?
• Definition: Statistics is the science of collecting, analyzing, and interpreting data in such a way that the conclusions can be objectively evaluated. It has three phases:
  1. Collecting data
  2. Analyzing data
  3. Interpreting data
• Descriptive statistics: summarize and describe a characteristic of a group. Example: a batting average.
• Inferential statistics: used to estimate, infer, or conclude something about a larger group. Example: polls.
• Sample: a subset of the group of data available for analysis.
• Population: the entire set.
• Bias: the favoring of certain outcomes over others.
• Census: a collection of data from all members of the population.
• Parameter: a characteristic value of a population.
• Statistic: a characteristic value of a sample.

6.2 Organizing Data
• Stem-and-leaf diagram for the data 35, 52, 37, 44, 51, 48, 45, 12:

  Stem | Leaves
  5    | 1 2
  4    | 4 5 8
  3    | 5 7
  2    |
  1    | 2

• Frequency table for the same data:

  Range | Frequency
  50-59 | 2
  40-49 | 3
  30-39 | 2
  20-29 | 0
  10-19 | 1

6.3 Displaying Data
• Ways to display data:
  – Frequency histogram
  – Relative frequency histogram
  – Multiple bar graph
  – Stacked bar graph
  – Line graph
  – Pie chart

[Figure: Frequency histogram; classes 1 to 8 on the horizontal axis]
[Figure: Relative frequency histogram; classes 1 to 8 on the horizontal axis, relative frequency on the vertical axis]
[Figure: Multiple bar graph; categories Comm, Natsci, Edu, Socsci, Nurs, Arts; series lower, upper, graduate]
[Figure: Stacked bar graph; same categories and series]
[Figure: Line graph; same categories and series]
[Figure: Pie chart; slices Comm, Edu, Natsci, Socsci, Nurs, Arts]

6.4 Measures of Central Tendency
• Central tendency: the propensity of data to be located or clustered about some point.
• Arithmetic mean: the sum of the values of all the observations divided by the total number of observations.
  – For sample data, the mean is
    $$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
  – For population data, the mean is
    $$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
• Median: the middle value of a set of data arranged in ascending order. To find it:
  1. Arrange the data in increasing or decreasing order.
  2. Determine whether n is even or odd.
     a. If n is odd, pick the middle value.
     b. If n is even, take the average of the two middle values.
• Mode: the value or values that occur most frequently. Note: if all values occur with the same frequency, there is no mode.
• Symmetric (single-peaked) distribution: the mean, median, and mode fall at the same point.
• Distribution skewed to the left: from left to right, mean, median, mode.
• Distribution skewed to the right: from left to right, mode, median, mean.

6.5 Measures of Variability
• Range: for n measurements $x_1, x_2, x_3, \ldots, x_n$, the range is the difference between the largest and the smallest values.
• Population variance:
  $$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$
  Problem with the variance: its units are the original units squared.
• Standard deviation: the population standard deviation $\sigma$ is the square root of the population variance.
• Sample variance:
  $$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
  The sample standard deviation $s$ is the square root of the sample variance.
• Shortcut formulas for $s^2$ and $\sigma^2$ are given on page 495 (provided with the test).
• A shortcut formula for frequency data is given on page 499 (provided with the test).
• Shortcut formulas are generally easier to calculate by hand.
• Approximating the standard deviation: $s \approx R/4$, where $R$ is the range.

6.6 Measures of Relative Position
• pth percentile: for data arranged in increasing order, the pth percentile is a value such that p% of the data are less than that value and (100 − p)% of the data are greater than that value.
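The formulas in 6.4 through 6.6 can be checked with a short Python sketch on the eight values from 6.2. The function names are hypothetical; the shortcut variance uses the common textbook form $\left(\sum x_i^2 - (\sum x_i)^2/n\right)/(n-1)$, which may be written differently from the version on page 495; `modes` treats "all values tie" as no mode, following the note in 6.4; and `percent_below` reads the percentile definition literally as the share of values strictly below a given value (one of several conventions).

```python
import math
from collections import Counter

# Sample data from the 6.2 stem-and-leaf example
data = [35, 52, 37, 44, 51, 48, 45, 12]

def mean(xs):
    # Sum of all observations divided by the number of observations
    return sum(xs) / len(xs)

def median(xs):
    # Middle value after sorting; average of the two middle values when n is even
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

def modes(xs):
    # Most frequent value(s); an empty list means "no mode" (all frequencies equal)
    counts = Counter(xs)
    top = max(counts.values())
    if all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)

def data_range(xs):
    # Largest value minus smallest value
    return max(xs) - min(xs)

def population_variance(xs):
    # Sum of squared deviations from the mean, divided by N
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    # Sum of squared deviations from x-bar, divided by n - 1
    xbar = mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

def sample_variance_shortcut(xs):
    # Computational form: (sum of x^2 - (sum of x)^2 / n) / (n - 1)
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

def percent_below(xs, value):
    # Share of observations strictly less than value, as a percentage
    return 100 * sum(1 for x in xs if x < value) / len(xs)

s = math.sqrt(sample_variance(data))
print("mean:", mean(data))
print("median:", median(data))
print("mode(s):", modes(data) or "none")
print("range:", data_range(data))
print("population variance:", population_variance(data))
print("sample variance:", round(sample_variance(data), 2))
print("shortcut agrees:", math.isclose(sample_variance(data), sample_variance_shortcut(data)))
print("s:", round(s, 2), "vs. R/4:", data_range(data) / 4)
print("percent of data below 45:", percent_below(data, 45))
```

For this data the mean is 40.5, the median is 44.5 (the average of 44 and 45), and there is no mode because every value occurs once. The sample standard deviation is about 13.02 while the range rule gives $R/4 = 10$, so the approximation is only a rough check.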
• Z-scores: the sample z-score for a measurement x is
  $$z = \frac{x - \bar{x}}{s}$$
  and the population z-score for a measurement x is
  $$z = \frac{x - \mu}{\sigma}$$
  A z-score is the number of standard deviations that x lies away from the mean.

6.7 Normal Distribution
• Definition: Standardizing is converting data to z-scores.
• Empirical rule for normally distributed data:
  1. About 68% of the data lie within one standard deviation of the mean.
  2. About 95% of the data lie within two standard deviations of the mean.
  3. About 99.7% of the data lie within three standard deviations of the mean.
• The normal distribution is:
  1. Bell-shaped
  2. Symmetric
  3. Mean = median = mode
• Definition: The standard normal distribution is the normal distribution with $\mu = 0$ and $\sigma = 1$. The standard normal distribution table (page 511, or the appendix on page 647) can be used to determine probabilities for a range of z-values.

6.8 Confidence Intervals
• Central Limit Theorem: for a large sample size, the sample mean $\bar{x}$ is approximately normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, where $\mu$ is the population mean of the x's and $\sigma$ is the population standard deviation of the x's.
• Confidence interval for the population mean:
  $$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
  where $\sigma$ may be replaced by $s$.
• Common levels of confidence ($n \ge 30$):

  Level of confidence | $z_{\alpha/2}$
  80%                 | 1.28
  90%                 | 1.645
  95%                 | 1.96
  99%                 | 2.575

• Margin of error: the margin of error of an estimate of a sample proportion is given by
  $$E = \frac{z_{\alpha/2}}{2\sqrt{n}}$$
  This is the largest possible margin of error, reached when the proportion is 0.5.

6.9 Regression and Correlation
• Scatter plot: a plot of data consisting of two variables.
• Linear regression: modeling the data with the line that "best fits", usually a least-squares line (regression line).
• Least-squares line: the line that minimizes the sum of the squared errors for a set of data points. The formulas are given on page 531 and the shortcut formulas on page 532; the formulas will be provided on the test.
• Correlation coefficient r: a measure of the strength of the linear relationship between the two random variables x and y.
Note: The closer the correlation is to 1 or −1, the stronger the linear relationship between the x and y variables. A correlation of zero means there is no evidence of a linear pattern.
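The formulas in 6.6 through 6.9 can be exercised with a minimal Python sketch, under stated assumptions: the function names and all sample numbers are hypothetical, the critical values are the ones in the 6.8 table, `math.erf` stands in for the printed standard normal table when checking the empirical rule, and the least-squares slope is written in the centered form $S_{xy}/S_{xx}$, which is algebraically equivalent to, but may look different from, the formulas on pages 531 and 532.

```python
import math

# z_{alpha/2} values from the 6.8 table, keyed by confidence level (%)
Z_CRIT = {80: 1.28, 90: 1.645, 95: 1.96, 99: 2.575}

def z_score(x, center, spread):
    # Number of standard deviations x lies from the mean
    return (x - center) / spread

def within_k_sd(k):
    # Normal probability of lying within k standard deviations of the mean
    return math.erf(k / math.sqrt(2))

def confidence_interval(xbar, sd, n, level=95):
    # x-bar +/- z_{alpha/2} * sd / sqrt(n); sd is sigma, or s when n >= 30
    half_width = Z_CRIT[level] * sd / math.sqrt(n)
    return xbar - half_width, xbar + half_width

def proportion_margin_of_error(n, level=95):
    # z_{alpha/2} / (2 sqrt(n)): the worst case, reached at a proportion of 0.5
    return Z_CRIT[level] / (2 * math.sqrt(n))

def _sums(xs, ys):
    # Centered sums S_xx, S_yy, S_xy shared by regression and correlation
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return xbar, ybar, sxx, syy, sxy

def least_squares(xs, ys):
    # Slope and intercept of the line minimizing the sum of squared errors
    xbar, ybar, sxx, _, sxy = _sums(xs, ys)
    slope = sxy / sxx
    return slope, ybar - slope * xbar

def correlation(xs, ys):
    # Pearson r = S_xy / sqrt(S_xx * S_yy), always between -1 and 1
    _, _, sxx, syy, sxy = _sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical inputs chosen only to exercise the formulas
print("z for x = 70, mean 50, sd 10:", z_score(70, 50, 10))
for k in (1, 2, 3):
    print(f"P(within {k} sd):", round(within_k_sd(k), 4))
print("95% CI, x-bar = 50, s = 10, n = 100:", confidence_interval(50, 10, 100))
print("95% margin of error, n = 400:", proportion_margin_of_error(400))
xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
print("least-squares line (slope, intercept):", least_squares(xs, ys))
print("r:", round(correlation(xs, ys), 4))
```

With these inputs the within-k probabilities come out to about 0.6827, 0.9545, and 0.9973, matching the empirical rule in 6.7. The 95% margin of error for n = 400 is about 0.049, roughly ±4.9 percentage points, and the small regression example gives a slope of 0.6, an intercept of 2.2, and r ≈ 0.77, a moderately strong positive linear relationship.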