Chapter 3: Data Characterization
BUS304 – Data Characterization 1

Today: Mean and Variance

Mean:
• Also called the “average”
• Formula: \( \text{mean} = \dfrac{\text{sum of data}}{\text{number of data}} \)
• Characterizes the center of the data distribution
• The most commonly used measure
• Sample mean \( \bar{x} \): the average derived from a sample
• Population mean \( \mu \): the average derived from the population

Exercise:
• Compute the mean weight for the Chargers’ offense players and defense players.
• Which mean should be higher? Why?
• Are they population means or sample means?

Ways to compute the mean:
1. Use a calculator.
2. Use Excel (function: AVERAGE).
BUS304 – Data Characterization 2

Sensitivity to outliers

Compute the mean for the following 2 groups of data (household income, unit = $10,000):

Household      #1   #2   #3   #4   #5   #6   #7   #8
Community a     5    4    3    4    3    5    4    5
Community b     5    4    3    4    3    5    4  100

Suppose the mayor decides to provide more public facilities to poor communities, and the decision is based on whether the mean income in the community is below $50,000 per year. Does such a decision make sense?
BUS304 – Data Characterization 3

Compute the mean from frequency table

Below is a frequency table showing the number of days the teams take to finish their projects.

Days to Complete   Frequency   Relative Frequency
5                  4           ?
6                  12          ?
7                  8           ?
8                  6           ?
9                  4           ?
10                 2           ?

• Create a histogram using the data in the table, and locate the mean on the graph.
• How would you describe the shape of the histogram?
• What is the relationship between the mean and the peak?
• Use relative frequency to find the mean.
• How many days on average does a team take to finish one project?
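The outlier comparison and the frequency-table mean above can be sketched in a few lines of Python. This is only an illustration using the standard library; the variable names are assumptions made for the example, and the numbers are the ones listed in the two tables.

```python
from statistics import mean

# Household incomes from the outlier example (unit = $10,000).
community_a = [5, 4, 3, 4, 3, 5, 4, 5]
community_b = [5, 4, 3, 4, 3, 5, 4, 100]

mean_a = mean(community_a)  # 33 / 8
mean_b = mean(community_b)  # 128 / 8, pulled up by the single value 100

# Frequency table: days to complete -> number of teams.
days_frequency = {5: 4, 6: 12, 7: 8, 8: 6, 9: 4, 10: 2}

total_teams = sum(days_frequency.values())
total_days = sum(days * count for days, count in days_frequency.items())
mean_days = total_days / total_teams

# The same mean computed from relative frequencies (count / total teams).
relative_frequency = {d: c / total_teams for d, c in days_frequency.items()}
mean_days_rel = sum(d * rf for d, rf in relative_frequency.items())

print("community a mean:", mean_a)
print("community b mean:", mean_b)
print("teams:", total_teams, "days:", total_days, "mean days:", mean_days)
```

Seven of the eight households in community b match community a, yet one household moves the mean from 4.125 to 16 (in units of $10,000), which is the point of the mayor question.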
\[ \text{mean} = \frac{\text{total days}}{\text{total teams}} \]
BUS304 – Data Characterization 4

Compute the mean from Histogram

The histogram conveys the same information as the frequency table.

[Histogram: frequency (vertical axis) by data value (horizontal axis); bars at 15, 25, 35, 45, 55 and "More" with frequencies 3, 6, 5, 4, 2 and 0]

\[ \text{mean} = \frac{\text{total data value}}{\text{data size}} = \frac{15 \times 3 + 25 \times 6 + 35 \times 5 + 45 \times 4 + 55 \times 2}{3 + 6 + 5 + 4 + 2} = 33 \]

Mathematical expression: \( \bar{x} = 33 \) if sample, \( \mu = 33 \) if population.
BUS304 – Data Characterization 5

Weighted Mean

The simple mean weights each piece of information equally.
• E.g. the average score of the students.
Sometimes different data should be given different weights, because one piece may be more important than another.
• E.g. some instructors assign 60% of the final score to homework and 40% to the final exam. If a student's homework score is 84 and the exam score is 70, compute the student's final score (the weighted mean of the homework score and the exam score). This teacher thinks homework reveals more comprehensive information about a student's knowledge, and hence gives it more weight.
BUS304 – Data Characterization 6

When to use weighted mean?

Some other examples of weighted mean:
• A student's GPA: a course with more credits carries more weight.
• An economic growth indicator (some industries affect the economy more than others).
• Crunch time leader: the player who performs best in the last few minutes of the game, which can reveal the person's performance under pressure.
• Expectation (you will see it in chapter 4).
  E.g. in a gambling game, if with 60% chance you lose one dollar and with 40% chance you gain one dollar, the expectation is 60% × (−$1) + 40% × $1 = −$0.20.
• Other examples? (average Cal State tuition)
Always think about whether you should use the weighted mean or the simple mean.
BUS304 – Data Characterization 7

Break
BUS304 – Data Characterization 8

Variance

• A measure of data spread.
• Also called “the average of squared deviations from the mean”.
• The larger the variance, the fatter (wider) the histogram.

Sample variance:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]

Population variance:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]

Note the difference!
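To make the difference between the two variance formulas concrete, here is a minimal Python sketch that applies both to the histogram data from slide 5 (values 15, 25, 35, 45, 55 with frequencies 3, 6, 5, 4, 2). The variable names are illustrative assumptions; the cross-check uses `variance` and `pvariance` from Python's standard `statistics` module.

```python
from statistics import pvariance, variance

# Histogram from slide 5: data value -> frequency.
histogram = {15: 3, 25: 6, 35: 5, 45: 4, 55: 2}

# Expand the histogram into the raw list of observations.
data = [value for value, count in histogram.items() for _ in range(count)]

n = len(data)
data_mean = sum(data) / n
sum_sq_dev = sum((x - data_mean) ** 2 for x in data)

population_variance = sum_sq_dev / n    # treat the data as a population: divide by N
sample_variance = sum_sq_dev / (n - 1)  # treat the data as a sample: divide by n - 1

# Cross-check the hand-written formulas against the standard library.
assert abs(population_variance - pvariance(data)) < 1e-9
assert abs(sample_variance - variance(data)) < 1e-9

print("n:", n, "mean:", data_mean)
print("population variance:", population_variance)
print("sample variance:", round(sample_variance, 3))
```

The only difference is the divisor, so the sample variance is always slightly larger than the population variance computed from the same numbers.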
BUS304 – Data Characterization 9

Steps to compute the variance

1. Identify whether the data are from a population or a sample (the formulae are different).
2. Use the following table to compute the deviations:

Data list   Distance from the mean   Square the distance
5           = 5 - mean = 1.167       = (1.167)^2 = 1.36
4
4
5
3
2

a) Find the mean: \( \text{mean} = \dfrac{5 + 4 + 4 + 5 + 3 + 2}{6} = 3.833 \)
b) Find the distance from the mean (fill out the 2nd column)
c) Find the squared distance (the 3rd column)
d) Add up the 3rd column
e) Divide by
   i. the population size; or
   ii. the sample size - 1
BUS304 – Data Characterization 10

Comparing variance vs. histogram

Find the variance for the following groups of sample data (one group per column):

11   14   11
12   15   11
13   15   11
16   15   12
16   16   19
17   16   20
18   16   20
21   17   20

• Compare the mean and variance.
• Create the histogram to compare the distribution.
BUS304 – Data Characterization 11

What does variance mean?

Variance indicates variation: the larger the variance, the more spread out the data.
It also indicates unpredictability. E.g.
• Weather data: when the weather changes dramatically, it is hard to predict tomorrow's temperature. (Looking at temperature data, which has the larger variance, Chicago or San Diego?)
• Stocks: more risk on returns.
• A person's performance: consistency, emotional…
• Other examples?
BUS304 – Data Characterization 12

Use frequency table to compute the population variance:

Data: 14, 15, 15, 15, 16, 16, 16, 17

Data value   Frequency   Relative Frequency
14           1           0.125
15           3           0.375
16           3           0.375
17           1           0.125

Working table using every data point (fill in the blank columns):

Data   Distance   Square
14
15
15
15
16
16
16
17

Working table using each distinct data value (fill in the blank columns, then compute the weighted average of the squares):

Data   Distance   Square
14
15
16
17
BUS304 – Data Characterization 13

Standard Deviation

• The square root of the variance.
• An indicator of data deviation that can be directly compared to the mean.

\( s = \sqrt{s^2} \) (sample) OR \( \sigma = \sqrt{\sigma^2} \) (population)

Exercise: compute the standard deviation from the histogram on slide no. 5 and locate it on the histogram.
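The variance steps a) to e), the frequency-table shortcut, and the square-root step for the standard deviation can be sketched as follows. This is a minimal illustration with hypothetical variable names, using the data list 5, 4, 4, 5, 3, 2 and the relative frequencies for 14, 15, 16, 17 from the slides above.

```python
import math

# Data list from the "Steps to compute the variance" slide.
data = [5, 4, 4, 5, 3, 2]

# a) Find the mean.
data_mean = sum(data) / len(data)
# b) Distance of each value from the mean (2nd column).
distances = [x - data_mean for x in data]
# c) Squared distance (3rd column).
squared = [d ** 2 for d in distances]
# d) Add up the 3rd column.
total_squared = sum(squared)
# e) Divide by the population size, or by the sample size - 1.
population_variance = total_squared / len(data)
sample_variance = total_squared / (len(data) - 1)

# Standard deviation: the square root of the variance.
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

for x, d, s in zip(data, distances, squared):
    print(f"{x:>4} {d:>8.3f} {s:>8.3f}")
print("population variance:", round(population_variance, 3))
print("sample variance:", round(sample_variance, 3))

# Frequency-table version: population variance as a weighted average
# of squared distances, with relative frequencies as the weights.
relative_frequency = {14: 0.125, 15: 0.375, 16: 0.375, 17: 0.125}
freq_mean = sum(v * w for v, w in relative_frequency.items())
freq_variance = sum(w * (v - freq_mean) ** 2 for v, w in relative_frequency.items())
freq_sd = math.sqrt(freq_variance)
print("frequency-table mean:", freq_mean, "population variance:", freq_variance)
```

Working from relative frequencies gives the same population variance as listing all eight data points, because each squared distance is counted in proportion to how often its value occurs.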
Population standard deviation = square root of the population variance.
BUS304 – Data Characterization 14

Empirical Rule

If the data is bell shaped (which is the case most of the time), then:
• About 68% of all data will fall in the range of \( \mu \pm \sigma \)
• About 95% of all data will fall in the range of \( \mu \pm 2\sigma \)
• About 99.7% of all data will fall in the range of \( \mu \pm 3\sigma \)

[Figure: bell curve with the 68%, 95% and 99.7% ranges marked]
BUS304 – Data Characterization 15
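The three percentages in the empirical rule can be checked numerically. The sketch below assumes that "bell shaped" means an exact normal distribution, which is an assumption of this example rather than something the slide states, and uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later). The helper name `coverage` is hypothetical.

```python
from statistics import NormalDist

# Assumption: the bell shape is modeled as an exact normal distribution.
# The share within k standard deviations does not depend on mu or sigma,
# so the standard normal (mu = 0, sigma = 1) is enough.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def coverage(k: float) -> float:
    """Share of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within mu +/- {k} sigma: {coverage(k):.1%}")
```

The slide's 68%, 95% and 99.7% are rounded versions of these values. For real data that is only roughly bell shaped, the observed shares will differ somewhat.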