
Secondary Data, Measures, Hypothesis Formulation, Chi-Square
Market Intelligence
Julie Edell Britton
Session 3
August 21, 2009

Today’s Agenda
• Announcements
• Secondary data quality
• Measure types
• Hypothesis Testing and Chi-Square

Announcements
• National Insurance Case for Sat. 8/22
– Stephen will do a tutorial today, Friday, 8/21, from 1:00 to 2:15 in the MBA PC Lab and will be available tonight from 7 to 9 pm in the MBA PC Lab to answer questions
– Submit slides by 8:00 am on Sat. 8/22
– 2 slides with your conclusions
– You may add Appendices to support your conclusions

Primary vs. Secondary Data
• Primary: collected anew for current purposes
• Secondary: exists already, was collected for some other purpose

Finding Secondary Data Online @ Fuqua
http://library.fuqua.duke.edu

Evaluating Sources of Secondary Data
• If you can’t find the source of a number, don’t use it. Look for further data.
• Always give sources when writing a report.
– Applies for Focus Group write-ups too
• Be skeptical.

Secondary Data: Pros & Cons
Advantages
• cheap
• quick
• often sufficient
• there is a lot of data out there
Disadvantages
• there is a lot of data out there
• numbers sometimes conflict
• categories may not fit your needs

Types of Secondary Data
                                                        Internal               External
Database: Can Slice/Dice; Need more processing          WEMBA_C                IMS Health, Nielsen, IRI*
Summary: Can’t change categories, get new crosstabs     Knowledge Management   Conquistador, Simmons, IRI_factbook
*IRI = Information Resources, Inc. (http://us.infores.com/)

Secondary Data Quality: KAD p. 120 & “What’s Behind the Numbers?”
• Data consistent with other independent sources?
• What are the classifications? Do they fit needs?
• When were numbers collected? Obsolete?
• Who collected the numbers? Bias, resources?
• Why were the data collected? Self-interest?
• How were the numbers generated?
Sample size Sampling method Measure type Causality (MBA Marketing Timing & Internship) It is Hard to Infer Causality from Secondary Data Took Core Marketing Did Not Get Desired Marketing Internship Term 1 Got Desired Marketing Internship 76% Term 3 51% 49% 24% Today’s Agenda     Announcements Secondary data quality Measure types Hypothesis Testing and Chi-Square Measure Types Nominal: Unordered Categories Male=1; Female = 2; Ordinal: Ordered Categories, intervals can’t be assumed to be equal. I-95 is east of I-85; I-80 is north of I-40; Preference data Interval: Equally spaced categories, 0 is arbitrary and units arbitrary.  Fahrenheit temperature – each degree is equal, Attitudes Ratio: Equally spaced categories, 0 on scale means 0 of underlying quantity.  $ Sales, Market Share Meaningful Statistics & Permissible Transformations Examples Permissible Transform Meaningful Stats Ratio Q1 = Bottles of wine Q2 = b*Q1 e.g., cases sold (b = 1/12) All below + % change Interval Wine Rating Scale 1 = Very Bad to 20 = Very Good Rank order of wines 1 = favorite 2 = 2nd preferred 3 = least preferred All below + mean Ordinal Nominal 1 = Pinot Noir 2 = Merlot 3 = Chardonnay Att2 = a + (b*Att1) e.g., 81 to 100 (a = 80, b = 1) e.g., 80.5 to 90 (a = 80, b = .5) Any order preserving 100 = favorite 90 = 2nd preferred 0 = least preferred Any transformation is ok 16 = Pinot Noir 3 = Merlot 13 = Chardonnay All below + median # of cases mode Means and Medians with Ordinal Data Gender Measure 1 Measure 2 Means M 1 1 Measure 1 M 2 2 M=5.4 < F=5.6 F 3 3 Measure 2 F 4 4 M=65.4 > F=25.6 F 5 5 F 6 6 Medians M 7 107 Measure 1 M 8 108 M=7 > F=5 M 9 109 Measure 2 F 10 110 M=107 > F=5 Ratio Scales & Index Numbers Index= 100* (Per Capita Segment i) / (Per Capita Ave) (000s) Sales Per Capita Segment Age Group Population Units (000) Sales Index <25 700 1400 2.00 70 25-34 500 1250 2.50 88 35-44 300 900 3.00 105 45-54 240 960 4.00 140 55 + 260 1196 4.60 161 Total 2000 5706 2.85 100 
Today’s Agenda
• Announcements
• Southwestern Conquistador Beer Case
• Backward Market Research
• Secondary data quality
• Measure types
• Hypothesis Testing and Chi-Square

Cross Tabs of MBA Acceptance by Gender

A. Raw Frequencies
         Accept   Reject   Total
M          140      860     1000
F           60      740      800
Total      200     1600     1800

B. Cell Percentages
         Accept   Reject   Total
M         .078     .478     .556
F         .033     .411     .444
Total     .111     .889     1.0

C. Row Percentages
         Accept              Reject              Total
M        140/1000 = .140     860/1000 = .860     1.00
F        60/800 = .075       740/800 = .925      1.00

D. Column Percentages
         Accept              Reject
M        140/200 = .700      860/1600 = .538
F        60/200 = .300       740/1600 = .462
Total    1.00                1.00

Rule of Thumb
• If a potential causal interpretation exists, make numbers add up to 100% at each level of the causal factor.
• Above: it is possible that gender (row) causes or influences acceptance (column), but not that acceptance influences gender. Hence, row percentages (format C) would be desirable.

Hypothesis Formulation and Testing
• Hypothesis: What you believe the relationship is between the measures.
– Theory
– Empirical Evidence
– Beliefs
– Experience
• Here: Believe that acceptance is related to gender
• Null Hypothesis: Acceptance is not related to gender
• Logic of hypothesis testing: Negative Inference. The null hypothesis will be rejected by showing that a given observation would be quite improbable, if the hypothesis was true.
• Want to see if we can reject the null.

Steps in Hypothesis Testing
1. State the hypothesis in Null and Alternative Form
– Ho: There is no relationship between gender and MBA acceptance
– Ha1: Gender and Acceptance are related (2-sided)
– Ha2: Fewer Women are Accepted (1-sided)
2. Choose a test statistic
3. Construct a decision rule

Chi-Square Test
Used for nominal data, to compare the observed frequency of responses to what would be “expected” under the null hypothesis.
Two types of tests Contingency (or Relationship) – tests if the variables are independent – i.e., no significant relationship exists between the two variables Goodness of fit test – Compare whether the data sampled is proportionate to some standard Chi-Square Test (Oi  Ei )   Ei i 1 k 2 2 With (r-1)*(c-1) degrees of freedom number in cell i Oi Observed number in cell i Ei Expected under independence i k number of cells r number of rows c number of columns Ei = Column Proportion * Row Proportion * total number observed MBA Acceptance Data Contingency A. Observed Frequencies Accept Reject M 140 860 1000 F 60 740 800 200 1600 1800 C. B. Cell Percentages Accept Reject M .078 .478 .556 F .033 .411 .444 .111 .889 1.0 Expected Frequencies Accept Reject M .111*.556*1800=111 .889*.556*1800=890 F .111*.444*1800= 89 .889*.444*1800=710 Chi-Square Test (Oi  Ei )   Ei i 1 k 2 2  With (r-1)*(c-1) degrees of freedom 2 =(140-111)2/111 + (860-890)2/890 + (60-89)2/89 + (740-710)2/710 = 19.30 So? i 3. Construct a decision rule Decision Rule 1. Significance Level -   .05 Probability of rejecting the Null Hypothesis, when it is true 2. Degrees of freedom - number of unconstrained data used in calculating a test statistic - for Chi Square it is (r-1)*(c-1), so here that would be 1. When the number of cells is larger, we need a larger test statistic to reject the null. 3. Two-tailed or One-tailed test – Significance tables are (unless otherwise specified) two tailed tables. Chi-Sq is on pg 517 Ha1: Gender and Acceptance are related (2-sided) Critical Value = 3.84 Ha2: Fewer Women are Accepted (1-sided) Critical Value = 2.71 4. Decision Rule: Reject the Ho if calculated Chi-sq value (19.3) > the test critical value (3.84) for Ha1 or (2.71) for Ha2 Chi-Square Table Chi-Square Test Used for nominal data, to compare the observed frequency of responses to what would be “expected” under some specific null hypothesis. 
Two types of tests Contingency (or Relationship) – tests if the variables are independent – i.e., no significant relationship exists Goodness of fit test – Compare whether the data sampled is proportionate to some standard Goodness of fit – Chi-Square Ho: Car Color Preferences have not shifted Ha: Car color Preferences have shifted Data Red 680 Green 520 Black 675 White 625 Tot (n) 2500 Historic Distribution Expected # = Prob*n 30% 25% 25% 20% Do we observe what we expected? 750 625 625 500 Chi-Square Test (Oi  Ei )   Ei i 1 k 2 2  With (k-1) degrees of freedom 2 =(680-750)2/750 + (520-625)2/625 + (675-625)2/625 + (625-500)2/500 = 59.42 i So? 3. Construct a decision rule Decision Rule 1. Significance Level -   .05 Probability of rejecting the Null Hypothesis, when it is true 2. Degrees of freedom - number of unconstrained data used in calculating a test statistic - for Chi Square it is (k-1), so here that would be 3. When the number of cells is larger, we need a larger test statistic to reject the null. 3. Two-tailed or One-tailed test – Significance tables are (unless otherwise specified) two tailed tables. Chi-Sq is on pg 517 Ha: Preference have changed (2-sided) Critical Value = 7.81 4. Decision Rule: Reject the Ho if calculated Chi-sq value (59.42) > the test critical value (7.81). Chi-Square Table Recap Finding & Evaluating Secondary Data Measure Types permissible transformations Meaningful statistics Index #s Crosstabs Casting right direction Chi-square statistic Contingency Test Goodness of Fit Test
