Chapter 1 Introduction

Outline and Definitions

Statistics: art and science of collecting, analyzing and interpreting data
  - Key is not only generating relevant measures but interpreting them
  - Used in a wide variety of academic disciplines and business environments such as:
      i) Accounting - statistical sampling procedures when conducting audits
      ii) Economics - economic forecasting
      iii) Marketing - industry analysis and surveys
      iv) Production - quality control techniques
      v) Finance - investment prospects and maintenance
      vi) Insurance - actuarial science

Data: facts and figures collected and summarized for a given topic, research or area of concern
  - May be the most time-consuming part of a project (may also be very costly)
  - Obtained through internal systems (company databases), experiments, external sources (Hoovers), Gov’t agencies (Bureau of Labor Statistics, Bureau of the Census) and general business websites
  - When existing data is not available, conduct a statistical study (experiment / survey)

Data Set: complete set of data (i.e. data collected in a study)

Elements: entities for which data is collected
  Eg) For data on Y-town businesses, individual businesses are elements

Variables: unique characteristic/category of interest for each element
  Eg) For each Y-town business, you collect info on type of industry, # employees, years in business, annual revenue, etc.

Observation: set of measurements for each element
  Eg) Type of industry, # employees, years in business, etc. for Youngstown Propane (element)

The total number of data values in a data set is the number of elements multiplied by the number of variables.
  Example:

Categorical vs. Quantitative Data: data for a variable can be described as one of 2 types

A) Categorical Data: data that is grouped or identified by specific categories (use of labels/names)
  - Also known in some texts as Qualitative Data
  - Limited statistical summaries (i.e. counting within categories and % of observations within categories)
  - Can be represented by numeric labels (coding) or non-numeric labels
  - Uses either the nominal or ordinal scales of measurement (see below)
  - Categorical Variable: variable whose data is represented by categorical data
  Example: Ratings, Sex, Religion, Nationality, College Major

B) Quantitative Data: data that is numeric and contains values that indicate how many or how much
  - Always numeric
  - Allows for a wide variety of statistical summaries
  - Uses either the interval or ratio scales of measurement (see below)
  - Can be either continuous or discrete
  - Quantitative Variable: variable represented by quantitative data
  Example: Revenue, Prices, Income

Scales of Measurement

For each element, data collected for a specific variable is categorized as having one of 4 scales of measurement
  - Scale of measurement is an assignment describing the type of data contained within a variable (information about the data)
  - Dictates the data summarization and statistical analysis that are most appropriate

1) Nominal Scale (categorical data where order is not important)
  - Variable is described as having nominal scale when the data consists of labels or names
  - Labels can be translated into numeric codes (assignment of #’s is arbitrary)
  - Categories are mutually exclusive
  Example:

2) Ordinal Scale (categorical data where order is important)
  - Data has the properties of nominal data and the order or rank of the data is meaningful or important
  - Labels can be translated into numeric codes (numeric coding is normally a logical process that follows order/rank)
  - For numeric codes, differences between values are not meaningful
  Example:
  Note: Order is important but the interval between each value may not be the same or may not be well defined

3) Interval Scale (quantitative data where differences between the data are meaningful and measurable)
  - Data has the properties of ordinal data, and the interval (distance) between observations is expressed in terms of a fixed unit of measure (standardized units)
  - Distance between values is measured in equal units of constant size across all levels of the scale
  - With fixed or standardized units, differences between data values become meaningful regardless of position on the scale
  - Always numeric
  - Point 0 is just another point on the scale; it does not indicate that nothing exists for that variable at that level
  - You must ask the question: Does a 0 value indicate that nothing exists at that value?
    If the answer is no, then the numeric data is interval scale
  Example:

4) Ratio Scale (quantitative data where the ratio of 2 values is meaningful)
  - Data for a particular variable is ratio scale if it has all the properties of interval data and the ratio of two values is meaningful
  - Scale must contain a true 0 value indicating that nothing exists for the variable at the 0 point (absence of characteristic); does it satisfy the absence logic?
  - You must ask the question: Does a 0 value indicate that nothing exists at that value?
    If the answer is yes, then the numeric data is ratio scale
  Example:

Cross Sectional and Time Series Data

Cross Sectional Data: data collected at the same or approximately the same point in time
  Example:

Time Series Data: data collected over an extended period of time
  - Purpose is to show a comparison over time

Descriptive Statistics: method for summarizing and describing a given set of data
  - Purpose: to make sense of a large volume of info
  - Form: tables, graphs and numeric summaries (most common is the average or mean)

Statistical Inference: using sample data from a population to draw conclusions and make predictions about the characteristics of a whole population
  - Population: entire set of elements in a given study
  - Sample: portion or subset of the population
  Example:

Data Mining: methods for developing useful decision-making information using data from large databases
  - Use of statistics, mathematics and computer programming to convert raw data into useful reports/summaries for forecasting, prediction and daily decision making
  - Data mining begins with data warehousing (process of capturing, storing and maintaining data)

  Data Warehousing -> Data Mining -> Reports/Summaries

Appendix: Matrix for Scales of Measurement

Nominal Scale
  Data categories are
    1. mutually exclusive
  Examples: gender, ethnicity, religious affiliation

Ordinal Scale
  Data categories are
    1. mutually exclusive
    2. have a logical order (scaled according to the amount of the particular characteristic they possess)
  Examples: class, grade (A, B, C, D, & F), ranks

Interval Scale
  Data categories are
    1. mutually exclusive
    2. have a logical order
    3. Equal distance (differences in the characteristics are represented by equal differences in the numbers)
       Difference or distance is standardized
  * The point 0 is just another point on the scale.
  Examples: temperature measured in Celsius and Fahrenheit

Ratio Scale
  Data categories are
    1. mutually exclusive
    2. have a logical order
    3. Equal distance
    4. true zero point (The point 0 reflects an absence of the characteristic)
  * Can do all the mathematical operations usually associated with numbers, including ratios.
  Examples: age, time, height, weight, # of chairs in a room
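
The matrix above can also be read as a short series of yes/no questions: Is there a logical order? Are the distances equal? Is there a true zero? The Python sketch below is one possible way to encode those questions; it is an illustration only, not part of the course material. The names classify_scale, MEANINGFUL and EXAMPLES are made up for this example, and the yes/no answers for each variable are assumptions drawn from the appendix examples.

```python
# Illustrative sketch: all names below are hypothetical, not from the notes.

def classify_scale(has_order: bool, equal_intervals: bool, true_zero: bool) -> str:
    """Return the scale of measurement implied by three yes/no answers.

    The questions are asked in the order of the appendix matrix; once an
    answer is "no", later answers are ignored.
    """
    if not has_order:
        return "nominal"    # labels only, order is not important
    if not equal_intervals:
        return "ordinal"    # order matters, distances are not standardized
    if not true_zero:
        return "interval"   # equal distances, 0 is just another point
    return "ratio"          # equal distances and 0 means absence


# What is meaningful on each scale; each scale keeps everything the one before it has.
MEANINGFUL = {
    "nominal": ["counts and percentages per category"],
    "ordinal": ["counts and percentages per category", "order or rank"],
    "interval": ["counts and percentages per category", "order or rank",
                 "differences between values"],
    "ratio": ["counts and percentages per category", "order or rank",
              "differences between values", "ratios of two values"],
}

# Assumed answers (has_order, equal_intervals, true_zero) for appendix examples.
EXAMPLES = {
    "gender": (False, False, False),
    "grade (A, B, C, D, F)": (True, False, False),
    "temperature in Celsius": (True, True, False),
    "weight": (True, True, True),
}

for variable, answers in EXAMPLES.items():
    scale = classify_scale(*answers)
    print(f"{variable}: {scale} scale; meaningful: {', '.join(MEANINGFUL[scale])}")
```

The only question that separates interval from ratio is the last one, which matches the "Does a 0 value indicate that nothing exists at that value?" test in the notes.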