Basics of Statistical Analysis

Basics of Analysis
• The process of data analysis
– Observation, Encode, Data, Information, Analysis

Example 1:
– Gift Catalog Marketer
– Mails 4 times a year to its customers
– Company has 1 million customers on its file

Example 1
• Cataloger would like to know whether new customers buy more than old customers.
• Classify as new customers anyone who bought within the last twelve months.
• Analyst takes a sample of 100,000 customers and notices the following.

Example 1
• 5,000 orders received in the last month
• 3,000 (60%) were from new customers
• 2,000 (40%) were from old customers
• So it looks like the new customers are doing better

Example 1
• Is there a catch here?
• Data at this gross level does not discriminate between customers within either group.
– A customer who bought within the last 11 days is treated exactly like a customer who bought within the last 11 months.

Example 1
• Can we use some other variable to distinguish between old and new customers?
• Answer: actual dollars spent!
• What can we do with this variable?
– Find its mean and variation.
• We might find that the average purchase amount for old customers is two or three times larger than the average among new customers

Numerical Summaries of Data
• The two basic concepts are the center and the spread of the data
• Center of data
– Mean, which is given by \( \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \)
– Median
– Mode

Numerical Summaries of Data
• Forms of variation
– Sum of differences about the mean: \( \sum_{i=1}^{n} (x_i - \bar{x}) \) (this always equals zero, so it cannot measure spread by itself)
– Variance: \( \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \)
– Standard deviation: square root of the variance

Confidence Intervals
• In the catalog example, the analyst wants to know the average purchase amount of customers
• He draws two samples of 75 customers each and finds the means to be $68 and $122
• Since the difference is large, he draws another 38 samples of 75 each
• The mean of the means of the 40 samples turns out to be $94.85
• How confident should he be of this mean of means?
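The center and spread measures from the Numerical Summaries slides can be sketched in a few lines of Python. This is an illustration only: the purchase amounts are made-up values, not data from the catalog example, and the function name `summarize` is hypothetical.

```python
import math
import statistics


def summarize(values):
    """Return the center and spread measures for a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]
    # Sample variance: squared deviations divided by n - 1.
    variance = sum(d * d for d in deviations) / (n - 1)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Deviations about the mean cancel out, so this is zero up to rounding.
        "sum_of_deviations": sum(deviations),
        "variance": variance,
        "std_dev": math.sqrt(variance),
    }


# Hypothetical purchase amounts in dollars (assumed for illustration).
purchases = [20.0, 35.0, 35.0, 50.0, 110.0]
summary = summarize(purchases)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The single large purchase pulls the mean well above the median, which is why the slides ask for both a center and a spread before comparing customer groups.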
Confidence Intervals
• Analyst calculates the standard deviation of the sample means, called the standard error (SE)
• Basic premise for confidence intervals
– 95 percent of the time, the true mean purchase amount lies within plus or minus 1.96 standard errors of the mean of the sample means.
• C.I. = Mean ± 1.96 × Standard Error

Confidence Intervals
• However, if the C.I. is calculated with only one sample, then
– Standard error of sample mean = \( \frac{\text{standard deviation of sample}}{\sqrt{n}} \)
• Basic premise for confidence intervals with one sample
– 95 percent of the time, the true mean lies within plus or minus 1.96 standard errors of the sample mean.

C.I. for Response Rates
• Standard error for response rates is \( \text{S.E.} = \sqrt{\frac{p(1 - p)}{n}} \)
– where p = sample response rate and n = sample size

Example 2:
• Test 1,000 names selected at random from a new list.
• To break even, the list must be expected to have a response rate of 4.5 percent
• Confidence interval = expected response ± 1.96 × SE = p ± 1.96 × SE
• In our case C.I. = 3.22% to 5.78%. Thus any response rate between 3.22% and 5.78% supports the hypothesis that the true response rate is 4.5%

Example 2:
• The list is mailed and actually pulls in 3.5%
• Since 3.5% lies inside the interval, the true response rate may still be 4.5%
• What if the actual rate pulled in were 5%?
• Regression towards the mean: the phenomenon of a test result being different from the true result
• Give more thought to lists whose cutoff rates lie within the confidence interval
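The Example 2 interval can be checked with a short Python sketch. It uses the figures from the slides (p = 4.5%, n = 1,000, multiplier 1.96); the function name `response_rate_interval` is hypothetical and chosen only for this illustration.

```python
import math


def response_rate_interval(p, n, z=1.96):
    """Return (lower, upper) bounds of the interval p +/- z * SE.

    SE = sqrt(p * (1 - p) / n), the standard error for a response rate.
    """
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


# Break-even response rate and test size from Example 2.
lower, upper = response_rate_interval(p=0.045, n=1000)
print(f"C.I. = {lower:.2%} to {upper:.2%}")  # C.I. = 3.22% to 5.78%

# Both observed rates discussed in the slides fall inside the interval,
# so neither contradicts a true response rate of 4.5%.
for observed in (0.035, 0.05):
    inside = lower <= observed <= upper
    print(f"observed {observed:.1%}: inside interval = {inside}")
```

Because both 3.5% and 5% fall inside the interval, a single test of 1,000 names cannot distinguish either result from the 4.5% break-even rate.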