Basic Business Statistics, 10/e - RIT
Day 2 Lecture
Review of Descriptive Statistics
Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 7-1

Agenda
Procedures Review MyCourses, Smam320 website, WebAssign, Groups
Lecture: Introduction to univariate data analysis

Learning
“Learning is defined as stabilizing, through repeated use, certain appropriate and desirable synapses in the brain.” (Leamnson, 2000, p. 5)

Active Learning
“Knowing and learning are communal acts. They require a continual cycle of discussion, disagreement, and consensus over what has been and what it all means.” (Palmer)

The Quiet Signal
In order to use active learning, we have to have a quiet signal.
I will raise my hand when it is time to stop group work and resume all-class activity.

The Quiet Signal
Complete your sentence.
Raise your hand.
Alert classmates that it's time to focus up front again.

Interview
A interviews B.
At the signal, reverse roles and B interviews A (same number of minutes and same questions).

Interview Questions
Name, etc.
How comfortable are you with Minitab?
How do you learn best?
Extra time? What are your greatest fears about this class?

In Every Statistical Study:
Questions are posed
Characteristics are observed
Major Branches of Statistics
Descriptive Statistics
  Organizing, summarizing information
  Graphical techniques (know which graph goes with which type of variable, qualitative or quantitative, and which techniques are univariate and which are bivariate)
  Numerical techniques (know which descriptive summary measure is appropriate for summarizing qualitative data and which for quantitative data)
Inferential Statistics
  Estimation
  Decision making

Descriptive Statistics
Methods for summarizing data
Summaries usually consist of graphs and numerical summaries of the data
Summarize the raw data
Present the data in a useful format

Inference
Methods of making decisions or predictions about a population based on sample information.
Most of this class (320) is about inference.

Parameter and Statistic
A parameter is a numerical summary of the population.
A statistic is a numerical summary of a sample taken from the population.

Variables and Sample Statistics

QUALITATIVE (categorical) variable
  What it measures: a quality, a category, or the occurrence of a particular type that you should be able to identify in each question
  Variable name: shorthand way of referring to the variable
  Values: 0,1 or Y,N or S,F
  Numeric summary measure: proportion
  Population parameter: $\pi$
  Sample statistic: $p = x/n$, where $x$ is the number of successes in $n$ trials

QUANTITATIVE (numeric) variable
  What it measures: a measure of something that you should be able to identify in each question
  Variable name: shorthand way of referring to the variable
  Values: numbers
  Numeric summary measures: mean (measure of location); variance or standard deviation (measures of variability)
  Population parameters: $\mu$ (location); $\sigma_X^2$ or $\sigma_X$ (variability)
  Sample statistics: $\bar{X}$ (location); $s_X^2$ or $s_X$ (variability)

Randomness
Simple random sampling: each subject in the population has the same chance of being included in the sample.
Randomness is crucial to experimentation.

Variability
Measurements vary from person to person.
Measurements vary from sample to sample.
Variability is the name of the game in Data Analysis!

Characteristics are Variables
A variable is any characteristic that is recorded for subjects in the study.
Note that our text uses the word characteristic for parameter.

Variation in Data
The term variable highlights the fact that data values vary.

Data values are called observations
Each observation can be:
  Quantitative
  Categorical

Categorical Data - Sample Proportion
The sample proportion of successes, denoted by $p$, is
$p = \dfrac{\text{number of S's in the sample}}{n}$
where S is the label used for the response designated as success.
The population proportion of successes is denoted by $\pi$.

Describing the Center of a Data Set with the arithmetic mean
The population mean, denoted by $\mu$, is the average of all x values in the entire population.
The sample median is obtained by first ordering the n observations from smallest to largest (with any repeated values included, so that every sample observation appears in the ordered list). Then
$\text{sample median} = \begin{cases} \text{the single middle value} & \text{if } n \text{ is odd} \\ \text{the mean of the middle two values} & \text{if } n \text{ is even} \end{cases}$
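The sample proportion, mean, and median definitions above can be sketched in a few lines of Python. This is only an illustration: the course itself uses Minitab, the function names are made up for this sketch, and the data values are invented.

```python
def sample_proportion(observations, success_label="S"):
    """p = (number of S's in the sample) / n."""
    n = len(observations)
    successes = sum(1 for obs in observations if obs == success_label)
    return successes / n


def sample_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


def sample_median(values):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)  # repeated values stay in the ordered list
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical data, invented for illustration only.
responses = ["S", "F", "S", "S", "F"]
weekly_spending = [5, 10, 10, 20, 40]

print(sample_proportion(responses))     # 3 successes out of 5 -> 0.6
print(sample_mean(weekly_spending))     # 85 / 5 -> 17.0
print(sample_median(weekly_spending))   # odd n: single middle value -> 10
print(sample_median([5, 10, 20, 40]))   # even n: mean of 10 and 20 -> 15.0
```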
Mean
The sum of the observations divided by the number of observations:
$\bar{x} = \dfrac{\sum x}{n}$

Median
The midpoint of the observations when they are ordered from the smallest to the largest (or from the largest to the smallest).

Comparing the Sample Mean & Sample Median
When a distribution is skewed positively, the mean is larger than the median.
When a distribution is skewed negatively, the mean is smaller than the median.
When a distribution is symmetric, the mean and the median are equal.

Illustrated Distribution Shapes
[Figure: example distributions labeled unimodal, bimodal, multimodal, skewed negatively, symmetric, and skewed positively]

Measuring Spread: Standard Deviation
Creates a measure of variation by summarizing the deviations of each observation from the mean: it calculates an adjusted average of the squared deviations and takes the square root.
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

Empirical Rule
For bell-shaped data sets:
Approximately 68% of the observations fall within 1 standard deviation of the mean
Approximately 95% of the observations fall within 2 standard deviations of the mean
Nearly all (about 99.7%) of the observations fall within 3 standard deviations of the mean

Z Scores
We can measure any observation on a variable in deviations from its mean and measure these deviations in units of the standard deviation. The resulting value is the z score.
Computing the z score is often referred to as standardization, and the z score is called a standardized score.
The formula used with sample data is
$z = \dfrac{x - \bar{x}}{s}$
The z score is how many standard deviations the observation is from the mean.
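As a rough sketch of the standard deviation and z score formulas above, the following Python computes both for a small data set. The function names and the data are assumptions made up for this illustration. The sample is too small to be bell-shaped, so the fractions it reports are not expected to match the empirical rule's 68% and 95%.

```python
from math import sqrt


def sample_mean(values):
    return sum(values) / len(values)


def sample_std(values):
    """s = sqrt( sum((x - x_bar)^2) / (n - 1) )."""
    x_bar = sample_mean(values)
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return sqrt(squared_deviations / (len(values) - 1))


def z_score(x, x_bar, s):
    """Number of standard deviations that x lies from the mean."""
    return (x - x_bar) / s


def fraction_within(values, k):
    """Fraction of observations within k standard deviations of the mean."""
    x_bar, s = sample_mean(values), sample_std(values)
    inside = sum(1 for x in values if abs(x - x_bar) <= k * s)
    return inside / len(values)


# Hypothetical data, invented for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
x_bar = sample_mean(data)   # 40 / 8 = 5.0
s = sample_std(data)        # sqrt(32 / 7)

print(f"mean = {x_bar}, s = {s:.3f}")
print(f"z score of 9: {z_score(9, x_bar, s):.3f}")   # above the mean, so positive
print(f"z score of 2: {z_score(2, x_bar, s):.3f}")   # below the mean, so negative
print(fraction_within(data, 1), fraction_within(data, 2))
```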
A positive z score indicates the observation is above the mean.
A negative z score indicates the observation is below the mean.

Multivariate Data
A multivariate data set consists of measurements or observations on each of two or more variables.
The data we collected from the gas use survey are multivariate data.
When we looked at a scatterplot of gas use and distance of residence and distinguished between commuters and those who live on-campus, that was multivariate data analysis.

Bivariate Data
A bivariate data set consists of measurements or observations on each of two variables.
When we produced comparative descriptive statistics and comparative boxplots, those were examples of bivariate data analysis where one of the variables was quantitative and the other was qualitative.

Examples with one Quantitative Variable and one Qualitative
[Figure: boxplots of "How much do you spend per week" for commuter and on-campus students; panel variable: "Do you live on-campus or are yo"]

Comparative Descriptive Statistics
Variable: How much do you
Year    N    Mean   StDev  Median
1st     6    8.00    6.78    6.50
2nd    10   18.70   11.89   17.50
3rd     6    3.67    4.32    2.50
4th     2   10.00    0.00   10.00

Example: Food Type and Pesticide Status
What proportion of all sampled items contain pesticide residues?
                   Pesticides
Food Type         Yes       No
Organic            29       98
Conventional    19485     7086

Contingency Table
The Food Type and Pesticide Status table is called a contingency table.
A contingency table:
Displays 2 categorical variables
The rows list the categories of 1 variable
The columns list the categories of the other variable
Entries in the table are frequencies

Scatterplots
A scatterplot is a plot of pairs of observed values (both quantitative) of two different variables.
When one of the variables is considered a response variable (y) and the other an explanatory variable (x), use for y the variable whose variation we want to understand.
The x will be the variable that we think might help us explain the variation in y.

Multivariate Data Analysis
[Figure: scatterplot of "How much do you spend per week" vs. "How far is your residence from", with points marked by "Do you live on-campus or are yo" (commuter, on-campus)]

Association
Positive Association - Two variables are positively associated when above-average values of one tend to accompany above-average values of the other, and below-average values tend similarly to occur together. (That is, the y values tend to increase as the x values increase.)
Negative Association - Two variables are negatively associated when above-average values of one accompany below-average values of the other, and vice versa. (That is, the y values tend to decrease as the x values increase.)

The Pearson Correlation Coefficient
A measure of the strength of the linear relationship between the two variables is called the Pearson correlation coefficient, r.
The value of r does not depend on the unit of measurement for each variable.
The value of r does not depend on which of the two variables is labeled x.
The value of r is between -1 and +1.

Some Correlation Pictures
[Figure: example scatterplots illustrating correlation]

Regression Analysis
Regression involves "fitting a line" through the scatterplot of the data to represent the paired observations (x, y).
The relationship $y = a + bx$ is the equation of a straight line.
The value b, called the slope of the line, is the amount by which y increases when x increases by 1 unit.
The value a, called the intercept (or sometimes the vertical intercept) of the line, is the height of the line above the value x = 0.

Variation
The total variation in the dependent variable can be split into two parts:
That which is EXPLAINED by the independent (explanatory) variable, and
That which is NOT EXPLAINED by the independent variable.
Least squares regression is a method for obtaining the equation of a line through the data that minimizes the sum of the squared unexplained deviations (residuals) over all the observations in the data set.

Using the Regression Equation
The regression equation for "How much do you spend per week on gas" is
Gas Spending = 9.32 + 0.422 (Distance of residence)
$y = 9.32 + 0.422x$
We can use this to predict how much someone spends if we know how far they live from RIT. We just plug that value in for x and solve for y.

How Good Is the Line?
The coefficient of determination, denoted by $r^2$, gives the proportion of variation in y that can be attributed to an approximate linear relationship between x and y.
The value of $r^2$ is between 0 and +1. The higher the value, the greater the percentage of variation explained by the regression equation.
In our example, we were not able to explain much of the variation in the amount spent on gasoline.
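To make the regression ideas concrete, here is a minimal Python sketch of least squares fitting, the Pearson correlation r, and the coefficient of determination. Everything in it is an assumption for illustration: the function names are invented, and the (distance, spending) pairs are made-up numbers, not the class survey data, so the fitted line will not match the slide. Only `predict_gas_spending` uses the equation quoted in the lecture, y = 9.32 + 0.422x.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope: change in y per 1-unit increase in x
    a = y_bar - b * x_bar    # intercept: height of the line at x = 0
    return a, b


def pearson_r(xs, ys):
    """Pearson correlation coefficient, always between -1 and +1."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_squared(xs, ys):
    """Proportion of the total variation in y explained by the fitted line."""
    a, b = least_squares_line(xs, ys)
    y_bar = mean(ys)
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    total = sum((y - y_bar) ** 2 for y in ys)
    return 1 - unexplained / total


def predict_gas_spending(distance):
    """Prediction from the equation quoted in the lecture: y = 9.32 + 0.422x."""
    return 9.32 + 0.422 * distance


# Hypothetical (distance, weekly spending) pairs, invented for illustration.
distances = [0, 5, 10, 15, 20, 30]
spending = [8, 15, 10, 20, 12, 25]

a, b = least_squares_line(distances, spending)
r = pearson_r(distances, spending)
print(f"fitted line for the made-up data: y = {a:.2f} + {b:.3f} x")
print(f"r = {r:.3f}, r^2 = {r_squared(distances, spending):.3f}")
print(f"lecture equation at x = 10: {predict_gas_spending(10):.2f}")  # 9.32 + 4.22 = 13.54
```

For simple linear regression the explained proportion computed by `r_squared` equals the square of `pearson_r`, which is why the coefficient of determination is written $r^2$.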