Basics of Statistical Analysis

Basics of Analysis
• The process of data analysis (diagram labels: Observation, Encode, Data, Information, Analysis)

Example 1:
– Gift catalog marketer
– Mails 4 times a year to its customers
– Company has 1 million customers on its file

Example 1
• The cataloger would like to know whether new customers buy more than old customers.
• Classify as a new customer anyone who bought for the first time within the last twelve months.
• The analyst takes a sample of 100,000 customers and notices the following.

Example 1
• 5,000 orders received in the last month
• 3,000 (60%) were from new customers
• 2,000 (40%) were from old customers
• So it looks like the new customers are doing better

Example 1
• Is there a catch here?
• Data at this gross level does not discriminate between customers within either group.
– A customer who bought within the last 11 days is treated exactly the same as a customer who bought within the last 11 months.

Example 1
• Can we use some other variable to distinguish between old and new customers?
• Answer: actual dollars spent!
• What can we do with this variable?
– Find its mean and variation.
• We might find that the average purchase amount for old customers is two or three times larger than the average among new customers

Numerical Summaries of data
• The two basic concepts are the center and the spread of the data
• Center of data
– Mean, which is given by \( \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \)
– Median
– Mode

Numerical Summaries of data
• Forms of variation
– Sum of differences about the mean: \( \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}) \) (always zero, which is why the differences are squared in the variance)
– Variance: \( \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \)
– Standard deviation: square root of the variance

Confidence Intervals
• In the catalog example, the analyst wants to know the average purchase amount of customers
• He draws two samples of 75 customers each and finds the means to be $68 and $122
• Since the difference is large, he draws another 38 samples of 75 each
• The mean of the means of the 40 samples turns out to be $94.85
• How confident should he be of this mean of means?
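Before turning to how confident the analyst can be, the center and spread measures from the "Numerical Summaries of data" slides can be sketched in Python. This is a minimal illustration only: the `summarize` helper and the `purchases` list are hypothetical and are not data from the catalog example.

```python
import math
from statistics import median, mode


def summarize(amounts):
    """Center and spread of a list of purchase amounts (illustrative helper)."""
    n = len(amounts)
    x_bar = sum(amounts) / n                       # mean
    deviations = [x - x_bar for x in amounts]      # differences about the mean
    variance = sum(d ** 2 for d in deviations) / (n - 1)  # sample variance
    return {
        "mean": x_bar,
        "median": median(amounts),
        "mode": mode(amounts),
        "mean_deviation": sum(deviations) / n,     # zero, up to rounding
        "variance": variance,
        "std_dev": math.sqrt(variance),            # square root of the variance
    }


# Hypothetical purchase amounts in dollars (an assumption for illustration).
purchases = [20, 40, 40, 60, 90]
summary = summarize(purchases)
print(summary)
```

The differences about the mean cancel out, so only the squared differences (the variance) and their square root (the standard deviation) describe the spread.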
Confidence Intervals
• The analyst calculates the standard deviation of the sample means, called the Standard Error (SE). (For our example, SE is 12.91.)
• Basic premise for confidence intervals
– 95 percent of the time, the true mean purchase amount lies within plus or minus 1.96 standard errors of the mean of the sample means.
• C.I. = Mean ± 1.96 × Standard Error

Confidence Intervals
• However, if the CI is calculated with only one sample, then
  Standard error of sample mean = \( \frac{\text{standard deviation of sample}}{\sqrt{n}} \)
• Basic premise for confidence intervals with one sample
– 95 percent of the time, the true mean lies within plus or minus 1.96 standard errors of the sample mean.

Example 2: Confidence Intervals for response rates
• You are the marketing analyst for Online Apparel Company
• You want to run a promotion for all customers on your database
• In the past you have run many such promotions
• Historically, you needed a 4.5% response for the promotions to break even
• You want to test the viability of the full-scale promotion by running a small test promotion

Example 2: Confidence Intervals for response rates
• Test 1,000 names selected at random from the full list.
• You construct a CI based on the required rate of 4.5% and n = 1,000
• Confidence interval = Expected response ± 1.96 × SE
• The SE = \( \sqrt{(.045)(.955)/1000} \approx .00656 \), and the CI is (.0322, .0578)
• In our case, C.I. = 3.22% to 5.78%. Thus any response between 3.22% and 5.78% supports the hypothesis that the true response rate is 4.5%

© 2007 Prentice Hall

Example 2: Confidence Intervals for response rates
• The promo is mailed to the test sample. The sample response rate is 3.5%.
• Since 3.5% lies inside the interval, the true response rate may be 4.5%
• What if the sample response rate were 5%?
• Regression toward the mean: the phenomenon of a test result being different from the true result
• Give more thought to lists whose cutoff rates lie within the confidence interval
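The interval arithmetic in both examples can be checked with a short Python sketch. It assumes the response-rate standard error is the usual binomial form sqrt(p(1 - p)/n); the slides do not state the formula, but this form reproduces the quoted interval (.0322, .0578). The function names are illustrative, not taken from the slides.

```python
import math

Z_95 = 1.96  # multiplier for a 95 percent interval, as used in the slides


def ci_from_se(center, se, z=Z_95):
    """Return (lower, upper) = center -/+ z * standard error."""
    return center - z * se, center + z * se


def se_of_mean(sample_sd, n):
    """Standard error of a single sample mean: sd / sqrt(n)."""
    return sample_sd / math.sqrt(n)


def se_of_proportion(p, n):
    """Assumed binomial standard error of a response rate."""
    return math.sqrt(p * (1 - p) / n)


# Example 1: mean of 40 sample means = 94.85, SE = 12.91
low, high = ci_from_se(94.85, 12.91)
print(f"Example 1: ${low:.2f} to ${high:.2f}")

# Example 2: break-even response rate 4.5%, test size 1,000
se_rate = se_of_proportion(0.045, 1000)
low_rate, high_rate = ci_from_se(0.045, se_rate)
print(f"Example 2: {low_rate:.4f} to {high_rate:.4f}")


def supports_break_even(observed_rate):
    """True if an observed test rate falls inside the Example 2 interval."""
    return low_rate <= observed_rate <= high_rate


print(supports_break_even(0.035), supports_break_even(0.05))
```

Both test outcomes discussed on the last slide, 3.5% and 5%, fall inside the interval, so neither one alone rules out a true response rate of 4.5%.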