Basics of Statistical Analysis
Basics of Analysis
• The process of data analysis
– Observation, Encode, Data, Information, Analysis
Example 1:
– Gift Catalog Marketer
– Mails 4 times a year to its customers
– Company has 1 million customers on its file
Example 1
• Cataloger would like to know whether new
customers buy more than old customers.
• Classify new customers as anyone who
bought within the last twelve months.
• Analyst takes a sample of 100,000
customers and notices the following.
Example 1
• 5000 orders received in the last month
• 3000 (60%) were from new customers
• 2000 (40%) were from old customers
• So it looks like the new customers are
doing better
Example 1
• Is there a catch here?
• Data at this gross level does not discriminate
between customers within either group.
– A customer who bought within the last 11 days is
treated exactly the same as a customer who bought
within the last 11 months.
Example 1
• Can we use some other variable to distinguish between
old and new customers?
• Answer: actual dollars spent.
• What can we do with this variable?
– Find its mean and variation.
• We might find that the average purchase amount for old
customers is two or three times larger than the average
among new customers.
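A minimal sketch of the comparison suggested above: average dollars spent per customer group. The order amounts and the helper name `mean_by_group` are hypothetical, made up purely for illustration; they are not figures from the catalog example.

```python
import statistics

# Hypothetical (group, dollars spent) records, invented for illustration only.
orders = [
    ("new", 25.0), ("new", 30.0), ("new", 35.0),
    ("old", 80.0), ("old", 100.0),
]

def mean_by_group(records):
    """Return the average amount spent for each customer group."""
    groups = {}
    for group, amount in records:
        groups.setdefault(group, []).append(amount)
    return {group: statistics.mean(amounts) for group, amounts in groups.items()}

summary = mean_by_group(orders)
print(summary)
```

In this made-up data the new customers place more orders, yet the old customers spend more per order, which is the kind of pattern a raw order count would hide.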
Numerical Summaries of data
• The two basic concepts are the Center
and the Spread of the data
• Center of data
- Mean, which is given by $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
- Median
- Mode
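The three measures of center can be sketched with Python's standard `statistics` module. The purchase amounts below are hypothetical values chosen for illustration, not data from the slides.

```python
import statistics

# Hypothetical purchase amounts in dollars (illustrative only).
purchases = [20, 35, 35, 50, 110]

mean = statistics.mean(purchases)      # sum of the values divided by n
median = statistics.median(purchases)  # middle value once the data are sorted
mode = statistics.mode(purchases)      # most frequently occurring value

print(mean, median, mode)
```

One large purchase pulls the mean above the median, which is why it can help to look at more than one measure of center.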
Numerical Summaries of data
• Forms of Variation
– Sum of differences about the mean: $\sum_{i=1}^{n} (x_i - \bar{x})$
– Variance: $\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
– Standard Deviation: Square Root of Variance
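A short sketch of the three measures of variation, using hypothetical purchase amounts (not data from the slides). It also shows why the plain sum of differences about the mean is not useful on its own: positive and negative deviations cancel, so it always comes out to zero, which is what motivates squaring them in the variance.

```python
import math

# Hypothetical purchase amounts in dollars (illustrative only).
purchases = [20, 35, 35, 50, 110]
n = len(purchases)
mean = sum(purchases) / n

# Differences of each observation from the mean.
deviations = [x - mean for x in purchases]

sum_of_differences = sum(deviations)                 # deviations cancel out
variance = sum(d ** 2 for d in deviations) / (n - 1)  # sample variance, n - 1 divisor
std_dev = math.sqrt(variance)                        # square root of the variance

print(sum_of_differences, variance, std_dev)
```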
Confidence Intervals
• In the catalog example, the analyst wants to know the average
purchase amount of customers
• He draws two samples of 75 customers each
and finds the means to be $68 and $122
• Since the difference is large, he draws another 38
samples of 75 each
• The mean of the means of the 40 samples turns out
to be $94.85
• How confident should he be of this mean of
means?
Confidence Intervals
• Analyst calculates the standard deviation of
the sample means, called the Standard Error (SE)
• Basic premise for confidence intervals
– 95 percent of the time the true mean purchase
amount lies within plus or minus 1.96 standard
errors of the mean of the sample means.
• C.I. = Mean ± 1.96 × Standard Error
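The calculation above can be sketched as follows. Only the first two sample means ($68 and $122) come from the example; the remaining values are hypothetical stand-ins, since the slides do not list all 40 sample means. The function name `ci_from_sample_means` is also an assumption, and the standard deviation is taken with the usual n - 1 divisor.

```python
import statistics

def ci_from_sample_means(sample_means, z=1.96):
    """95% confidence interval around the mean of several sample means."""
    grand_mean = statistics.mean(sample_means)
    # Standard error here is the standard deviation of the sample means.
    standard_error = statistics.stdev(sample_means)
    return grand_mean - z * standard_error, grand_mean + z * standard_error

# 68 and 122 are from the example; 90, 100 and 95 are hypothetical.
sample_means = [68, 122, 90, 100, 95]
low, high = ci_from_sample_means(sample_means)
print(f"{low:.2f} to {high:.2f}")
```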
Confidence Intervals
• However, if the CI is calculated with only one
sample, then
Standard Error of sample mean = $\frac{\text{Standard deviation of sample}}{\sqrt{n}}$
• Basic premise for confidence intervals with one sample
– 95 percent of the time the true mean lies within plus or minus
1.96 standard errors of the sample mean.
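A minimal sketch of the one-sample case, where the standard error is estimated as the sample standard deviation divided by the square root of n. The sample values and the function name `ci_from_one_sample` are hypothetical; a real sample would be far larger (75 customers in the example).

```python
import math
import statistics

def ci_from_one_sample(sample, z=1.96):
    """95% confidence interval for the mean, estimated from a single sample."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation over sqrt(n).
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return sample_mean - z * standard_error, sample_mean + z * standard_error

# Hypothetical purchase amounts in dollars (illustrative only).
one_sample = [20, 35, 35, 50, 110]
low, high = ci_from_one_sample(one_sample)
print(f"{low:.2f} to {high:.2f}")
```

Because the standard error shrinks with the square root of n, quadrupling the sample size only halves the width of the interval.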
C.I. For Response Rates
• Standard error for response rates is
S.E. = $\sqrt{\frac{p (1 - p)}{n}}$
where
p = sample response rate
n = sample size
Example 2:
• Test 1,000 names selected at random from a new list.
• To break even, the list must be expected to have a
response rate of 4.5 percent
• Confidence Interval = Expected Response ± 1.96 × SE
= p ± 1.96 × SE
• In our case C.I. = 3.22% to 5.78%. Thus any response
between 3.22% and 5.78% supports the hypothesis that the true
response rate is 4.5%
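The interval quoted in this example can be reproduced with a few lines of Python. The function name `response_rate_ci` is an assumption for illustration; the inputs (p = 4.5%, n = 1,000, multiplier 1.96) are taken from the example.

```python
import math

def response_rate_ci(p, n, z=1.96):
    """95% confidence interval for a response rate p tested on n names."""
    standard_error = math.sqrt(p * (1 - p) / n)
    return p - z * standard_error, p + z * standard_error

low, high = response_rate_ci(0.045, 1000)
print(f"{low:.2%} to {high:.2%}")  # prints "3.22% to 5.78%"
```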
Example 2:
• The list is mailed and actually pulls in
3.5%
• Since 3.5% lies within the interval, the true response rate may still be 4.5%
• What if the actual rate pulled in were 5%?
• Regression towards the mean: phenomenon
of a test result being different from the true
result
• Give more thought to lists whose cutoff
rates lie within the confidence interval
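As a rough sketch of the decision rule, the check below asks whether an observed test result falls inside the 95% interval around the break-even rate. The function name `supports_break_even` and the extra 3.0% case are assumptions added for illustration; 3.5% and 5% are the rates discussed in the example.

```python
import math

def supports_break_even(observed, break_even=0.045, n=1000, z=1.96):
    """True if the observed rate lies within the 95% interval around break-even."""
    standard_error = math.sqrt(break_even * (1 - break_even) / n)
    return abs(observed - break_even) <= z * standard_error

# 3.5% and 5% come from the example; 3.0% is a hypothetical extra case.
for observed in (0.035, 0.05, 0.03):
    print(f"{observed:.1%}: {supports_break_even(observed)}")
```

Both 3.5% and 5% fall inside the interval, so neither test result alone rules out a true rate of 4.5%; a result of 3.0% would fall outside it.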