Basics of Statistical Analysis
Basics of Analysis
• The process of data analysis:
Observation → (Encode) → Data → (Analysis) → Information
Example 1:
– Gift Catalog Marketer
– Mails 4 times a year to its customers
– Company has 1 million customers on its file
Example 1
• The cataloger would like to know whether new customers buy more than old customers.
• New customers are classified as anyone who bought for the first time within the last twelve months.
• The analyst takes a sample of 100,000 customers and notices the following.
Example 1
• 5,000 orders were received in the last month
• 3,000 (60%) were from new customers
• 2,000 (40%) were from old customers
• So it looks like new customers are doing better.
Example 1
• Is there a catch here?
• Data at this aggregate level does not discriminate between customers within either group.
– A customer who bought within the last 11 days is treated exactly the same as a customer who bought within the last 11 months.
Example 1
• Can we use some other variable to distinguish between old and new customers?
• Answer: actual dollars spent!
• What can we do with this variable?
– Find its mean and variation.
• We might find that the average purchase amount for old customers is two or three times the average among new customers.
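To make "find its mean and variation" concrete, here is a minimal sketch that computes the mean and sample standard deviation of dollars spent for each customer group. It assumes, hypothetically, that orders arrive as `(group, amount)` pairs such as `("new", 42.0)`; the function name `summarize_by_group` and the toy orders are illustrative only and are not data from the example.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_group(orders):
    """Return {group: (mean, sample standard deviation)} of dollars spent.

    `orders` is an iterable of (group, amount) pairs, e.g. ("new", 42.0).
    Each group needs at least two amounts, because the sample standard
    deviation divides by n - 1.
    """
    amounts = defaultdict(list)
    for group, amount in orders:
        amounts[group].append(amount)
    return {g: (mean(v), stdev(v)) for g, v in amounts.items()}


# Hypothetical toy orders, for illustration only.
orders = [("new", 40.0), ("new", 60.0), ("old", 120.0), ("old", 180.0)]
for group, (m, s) in summarize_by_group(orders).items():
    print(f"{group}: mean ${m:.2f}, sd ${s:.2f}")
```

Comparing the per-group means (and their spread) is what lets the analyst see whether one group spends more per purchase, which the raw order counts cannot show.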
Numerical Summaries of data
• The two basic concepts are the Center
and the Spread of the data
• Center of data
- Mean, which is given by $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
- Median
- Mode
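A short sketch of the three measures of center, computing the mean directly from the formula above and taking the median and mode from Python's standard `statistics` module. The purchase amounts are hypothetical, chosen only to show how the measures can differ.

```python
from statistics import median, mode


def center(values):
    """Return the mean, median and mode of a list of numbers."""
    return {
        "mean": sum(values) / len(values),  # x-bar: sum of x_i divided by n
        "median": median(values),  # middle value of the sorted data
        "mode": mode(values),  # most frequent value
    }


# Hypothetical purchase amounts in dollars, for illustration only.
purchases = [20, 35, 35, 50, 160]
print(center(purchases))
```

With these toy numbers the single $160 purchase pulls the mean (60.0) well above the median and mode (both 35), which is one reason to look at more than one measure of center.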
Numerical Summaries of data
• Forms of Variation
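– Sum of differences about the mean: $\sum_{i=1}^{n} (x_i - \bar{x})$, which always equals zero, so on its own it cannot measure spread
– Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
– Standard Deviation: square root of the variance, $s = \sqrt{s^2}$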
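The forms of variation can be checked numerically. This sketch applies the formulas by hand (deviations about the mean, squared deviations divided by n − 1, then the square root) and cross-checks the result against `statistics.variance` and `statistics.stdev`, which also use the n − 1 denominator. The data values are hypothetical.

```python
import math
from statistics import stdev, variance


def spread(values):
    """Return (sum of deviations, sample variance, standard deviation)."""
    n = len(values)
    x_bar = sum(values) / n
    deviations = [x - x_bar for x in values]
    sum_of_deviations = sum(deviations)  # zero, apart from rounding error
    s_squared = sum(d * d for d in deviations) / (n - 1)  # divide by n - 1
    return sum_of_deviations, s_squared, math.sqrt(s_squared)


data = [20, 35, 35, 50, 160]  # hypothetical purchase amounts
total_dev, s2, s = spread(data)
# Cross-check against the standard library's sample variance and stdev.
assert math.isclose(s2, variance(data)) and math.isclose(s, stdev(data))
print(total_dev, s2, round(s, 2))
```

The zero sum of deviations is why the variance squares each deviation before adding them up: positive and negative deviations would otherwise cancel.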
Confidence Intervals
• In the catalog example, the analyst wants to know the average purchase amount of customers.
• He draws two samples of 75 customers each and finds the means to be $68 and $122.
• Since the difference is large, he draws another 38 samples of 75 each.
• The mean of the means of the 40 samples turns out to be $94.85.
• How confident should he be of this mean of means?
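The repeated-sampling procedure can be simulated to see how sample means vary. This is a sketch under a loud assumption: the customer population is synthetic, with purchase amounts drawn from an exponential distribution with a $95 mean (not the cataloger's data). The helper names `draw_sample_means` and `draw_purchase` are hypothetical.

```python
import random
from statistics import mean, stdev


def draw_sample_means(draw, num_samples=40, sample_size=75, seed=1):
    """Draw num_samples samples of sample_size values; return each sample's mean."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [mean(draw(rng) for _ in range(sample_size)) for _ in range(num_samples)]


def draw_purchase(rng):
    # Assumed population: exponential purchase amounts with a $95 mean.
    return rng.expovariate(1 / 95)


means = draw_sample_means(draw_purchase)
print(f"mean of the 40 sample means: ${mean(means):.2f}")
print(f"SD of the sample means (standard error): ${stdev(means):.2f}")
```

Because each sample holds 75 customers, the SD of the sample means should land near the population SD divided by √75 (about $11 for this assumed population), far smaller than the spread of individual purchases.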
Confidence Intervals
• The analyst calculates the standard deviation of the sample means, called the Standard Error (SE). (For our example, SE is 12.91.)
• Basic premise for confidence intervals
– 95 percent of the time, the true mean purchase amount lies within plus or minus 1.96 standard errors of the mean of the sample means.
• C.I. = Mean ± 1.96 × Standard Error
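Plugging the catalog example's figures into C.I. = Mean ± 1.96 × SE gives the interval directly. The function name `confidence_interval` is illustrative; the inputs $94.85 and 12.91 come from the slides.

```python
def confidence_interval(center, standard_error, z=1.96):
    """Return (lower, upper) = center minus/plus z standard errors."""
    margin = z * standard_error
    return center - margin, center + margin


# Mean of the 40 sample means and the SE quoted in the catalog example.
low, high = confidence_interval(94.85, 12.91)
print(f"95% CI: ${low:.2f} to ${high:.2f}")
```

This works out to about $69.55 to $120.15 (a margin of 1.96 × 12.91 ≈ $25.30 on each side).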
Confidence Intervals
• However, if the CI is calculated from only one sample, then
Standard Error of the sample mean = $\frac{s}{\sqrt{n}}$, where $s$ is the standard deviation of the sample and $n$ is the sample size
• Basic premise for confidence intervals with one sample
– 95 percent of the time, the true mean lies within plus or minus 1.96 standard errors of the sample mean.
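For the one-sample case, the standard error comes from the sample's own standard deviation, SE = s/√n. A minimal sketch, with a hypothetical helper `one_sample_ci` and made-up purchase amounts:

```python
import math
from statistics import mean, stdev


def one_sample_ci(sample, z=1.96):
    """95% CI for the mean from a single sample: mean -/+ z * s / sqrt(n)."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)  # s / sqrt(n)
    m = mean(sample)
    return m - z * standard_error, m + z * standard_error


# Hypothetical purchase amounts from one sample of 8 customers.
sample = [68, 122, 90, 75, 110, 54, 101, 83]
low, high = one_sample_ci(sample)
print(f"95% CI: ${low:.2f} to ${high:.2f}")
```

The 1.96 multiplier follows the slides; with a sample this small, a Student's t multiplier (about 2.36 for 7 degrees of freedom) would normally replace it and widen the interval.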
Example 2: Confidence Intervals for response rates
• You are the marketing analyst for Online Apparel
Company
• You want to run a promotion for all customers in your database
• In the past you have run many such promotions
• Historically, you needed a 4.5% response rate for the promotions to break even
• You want to test the viability of the full-scale
promotion by running a small test promotion
Example 2: Confidence Intervals for response rates
• Test 1,000 names selected at random from the full list.
• Construct the CI around the required break-even rate of 4.5%, with n = 1,000.
• Confidence Interval = Expected Response ± 1.96 × SE, where for a response rate $p$ the standard error is $SE = \sqrt{p(1-p)/n}$
• SE = .00655, and the CI is (.0322, .0578)
• In our case, the C.I. runs from 3.22% to 5.78%. Thus any test response between 3.22% and 5.78% is consistent with the hypothesis that the true response rate is 4.5%.
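The response-rate interval uses the standard error of a proportion, √(p(1 − p)/n). This sketch reproduces the slide's figures from p = 0.045 and n = 1,000; the function name `response_rate_ci` is illustrative.

```python
import math


def response_rate_ci(p, n, z=1.96):
    """Return (SE, lower, upper) for a response rate p with test size n."""
    standard_error = math.sqrt(p * (1 - p) / n)  # SE of a sample proportion
    margin = z * standard_error
    return standard_error, p - margin, p + margin


# Break-even rate of 4.5% and a test of 1,000 names, as in Example 2.
se, low, high = response_rate_ci(0.045, 1000)
print(f"SE = {se:.5f}, CI = ({low:.4f}, {high:.4f})")
```

The bounds round to the slide's (.0322, .0578); the SE is about 0.006556, which the slide reports as .00655.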
© 2007 Prentice Hall
Example 2: Confidence Intervals for response rates
• The promotion is mailed to the test sample. The sample response rate is 3.5%.
• Because 3.5% lies inside the confidence interval, the true response rate may still be 4.5%.
• What if the sample response rate were 5%?
• Regression toward the mean: the phenomenon of a test result differing from the true result
• Give more thought to lists whose cutoff rates lie within the confidence interval.
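A small simulation illustrates why a single test result can differ from the true rate. It assumes, for illustration, that the true response rate is exactly the 4.5% break-even rate and that each of the 1,000 names responds independently; the constant and function names are hypothetical. It also checks the 3.5% and 5% cases against the interval.

```python
import math
import random

P_TRUE, N, Z = 0.045, 1000, 1.96  # true rate assumed equal to break-even
SE = math.sqrt(P_TRUE * (1 - P_TRUE) / N)
LOW, HIGH = P_TRUE - Z * SE, P_TRUE + Z * SE


def within_ci(rate):
    """True if an observed test response rate lies inside the 95% CI."""
    return LOW <= rate <= HIGH


def simulate_test_rates(trials=2000, seed=7):
    """Response rates from repeated hypothetical tests of N names each."""
    rng = random.Random(seed)
    rates = []
    for _ in range(trials):
        responses = sum(rng.random() < P_TRUE for _ in range(N))
        rates.append(responses / N)
    return rates


print(within_ci(0.035), within_ci(0.05))  # the 3.5% and 5% cases above
rates = simulate_test_rates()
share = sum(within_ci(r) for r in rates) / len(rates)
print(f"{share:.1%} of simulated tests land inside the CI")
print(f"simulated rates range from {min(rates):.1%} to {max(rates):.1%}")
```

Even with the true rate fixed at 4.5%, individual test results scatter around it and roughly 1 test in 20 falls outside the interval, so one test far from 4.5% is weak evidence on its own, and a retest will tend to land closer to the true rate.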