IS ‘STUP_ID’ REALLY “STUPID”?
Sergey Sian, PreVision Marketing, Lincoln, MA
INTRODUCTION
Direct mail promotion is a pretty complicated
process: you need to extract information from
different sources in a database (customer,
transactions, promotion tables, etc.). Then you
must merge some of those files with “external”
files, which have additional information
necessary for analysis. In many cases you don’t
catch everyone you want on your mailing list by
merging data sets just by customer ID (cust_ID).
You may want to get more prospects for direct
mail by merging files by ‘stup_ID’, which really is
a combination (concatenation) of last name, city,
ZIP code and, say, the first 9 bytes of an
address field.
GETTING MORE CUSTOMERS
For example, let’s say there is a chain of grocery
stores “Best European Food,” which maintains a
database of all customer information (cust_ID –
let’s call it food_ID, first name, last name,
address, city, state, ZIP code, etc.) as well as
transactions. At the same time there is a
company “Fast Delivery,” which distributes and
delivers goods purchased from “Best European
Food” either by phone or via the Internet. The
“Fast Delivery” company keeps its own database
(cust_ID – let’s call it fast_ID, first name, last
name, address, city, state, ZIP code, etc.). The
objective of the mail tape (direct mail) is to
reach customers who spent on average
more than $45/week (during the last 25-week
period) and who, at the same time, use the “Fast
Delivery” company to have food delivered to their homes.
To solve this problem, the first approach that comes to mind is to:
- Extract transaction data from both
  databases (“Best European Food” and “Fast
  Delivery”) for a defined time period.
- Roll both sets of transaction data up to
  “customer” level by week and
  keep only customers who spent
  on average more than $45/week.
- Later on, merge both files by “cust_ID”
  to get a “mailable” audience.
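The roll-up step in the list above could be sketched in SAS roughly as follows. This is only an illustration: the data set name trans and the variables cust_id, week and amount are hypothetical (the paper does not show the transaction layout), and the average is taken over the weeks in which a customer actually shopped, which is just one possible reading of "on average" (dividing by all 25 weeks would be another).

```sas
/* Hypothetical transaction extract: one row per purchase. */
data trans;
   input cust_id week amount;
   datalines;
101 1 30
101 1 25
101 2 50
102 1 20
102 2 40
;
run;

proc sql;
   /* Step 1: roll transactions up to customer-by-week level. */
   create table weekly as
   select cust_id, week, sum(amount) as week_total
   from trans
   group by cust_id, week;

   /* Step 2: keep customers who average more than $45/week. */
   create table keepers as
   select cust_id, avg(week_total) as avg_week
   from weekly
   group by cust_id
   having calculated avg_week > 45;
quit;
```

With these made-up rows, customer 101 has weekly totals of 55 and 50 and stays in keepers, while customer 102 (20 and 40) is dropped.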
With this logic we get the following numbers:
- 813,842 records (customer level) have
been extracted from the “Best European
Food” database.
- 45,768 records (customer level) have
been extracted from the “Fast Delivery”
database.
- After merging the two files by ‘cust_ID’ we
get 26,541 customers for the direct mail
promotion (customers exist in both files).
But let’s keep in mind that those two
companies created and maintain their
databases differently (different systems,
database structure, update process, etc.). In
our case they even allocated different
numbers of bytes to ‘cust_ID’; that’s why
before the merge we manipulated this field.
To increase the number of “matched”
customers we decided to implement a slightly
different approach. We put aside those
26,541 customers, whom we’ve found in both
files based on the ‘cust_ID’ field. Then we
standardized the customer’s “First” and “Last”
names as well as all “address” components
(street, city, ZIP code, etc.), using PostalSoft
software (DataRight and ACE modules) for
the rest of both files. Later on in SAS we
defined a new customer ID as:
stup_id=left(compress(lastn))||
        left(compress(city))||
        left(compress(zip))||
        left(compress(substr(address,1,9)));
The order of variables in this concatenation is
very important. After merging the two files by
‘stup_ID’ we get an extra 8,489 customers for
the direct mail promotion, for a total of 35,030.
We could run merge/purge in PostalSoft using
different criteria (exact, tight, medium, etc.) to
match different fields, but the merge runs much
faster in SAS (at the price of allowing only an
“exact” match).
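As a rough illustration of the second-pass match, the sketch below builds ‘stup_ID’ in two small files and merges them on it. The sample rows, the variable lengths and the helper macro %addkey are all made up for this example; only the key expression itself comes from the paper.

```sas
/* Made-up, already standardized leftovers from each database. */
data food;
   infile datalines dlm=',' truncover;
   length lastn $15 city $15 zip $5 address $30;
   input food_id lastn city zip address;
   datalines;
1,SMITH,LINCOLN,01773,55 OLD BEDFORD RD
2,JONES,CONCORD,01742,12 MAIN ST
;
run;

data fast;
   infile datalines dlm=',' truncover;
   length lastn $15 city $15 zip $5 address $30;
   input fast_id lastn city zip address;
   datalines;
9,SMITH,LINCOLN,01773,55 OLD BEDFORD ROAD
8,BROWN,BOSTON,02108,1 BEACON ST
;
run;

/* Hypothetical helper: add the key and sort by it. */
%macro addkey(ds);
data &ds;
   set &ds;
   length stup_id $44;   /* 15 + 15 + 5 + 9 bytes */
   stup_id=left(compress(lastn))||left(compress(city))||
           left(compress(zip))||left(compress(substr(address,1,9)));
run;
proc sort data=&ds;
   by stup_id;
run;
%mend addkey;

%addkey(food)
%addkey(fast)

/* Keep only customers found in both files. */
data matched;
   merge food(in=a keep=stup_id food_id)
         fast(in=b keep=stup_id fast_id);
   by stup_id;
   if a and b;
run;
```

Here the two SMITH records differ in the tail of the address ("RD" versus "ROAD") but agree on the first 9 bytes, so they receive the same key and match. If the same key occurs several times in both files, a DATA step merge pairs the records in order rather than forming every combination, so duplicates are worth checking before relying on the result.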
FINDING OUT WHETHER CUSTOMERS MADE PURCHASES IN STORES
OTHER THAN THEIR PRIMARY STORES
This helps to better understand customers’ behavior as
well as their purchase activity. It is also important to
assign labels such as the “first” store, the “recent”
store, etc. The following code might be helpful.

%macro ckstore(store);
/* From the database, pull out a list of customers who */
/* shopped in the store we are interested in, as well  */
/* as a list of those who shopped in any other store   */
/* (both in one connection).                           */
proc sql;
   connect to oracle(user=username
                     pw=password path="path");
   create table stor&store(compress=yes)
   as select * from connection to oracle
      (select distinct
              transa.CUST_ID as custid
       from srt.trans transa
       where transa.store_id=&store);
   create table ozestore(compress=yes)
   as select * from connection to oracle
      (select distinct
              transo.CUST_ID as custid,
              transo.store_id as store
       from srt.trans transo
       where transo.store_id ^= &store);
   disconnect from oracle;
quit;
proc sort data=stor&store;
   by custid;
run;
/* Get the number of distinct customers who shopped */
/* in that store.                                   */
data _NULL_;
   set stor&store END=last;
   if last then do;
      call symput ('storeobs', _N_);
   end;
run;
proc sort data=ozestore;
   by custid;
run;
/* Determine how many customers shopped in other stores. */
data stor&store(compress=yes);
   merge ozestore(IN=other)
         stor&store(IN=mystore);
   by custid;
   if other AND mystore;
run;
/* ABORT EXECUTION IF THERE ARE NO OBSERVATIONS */
/* IN "stor&store" FILE AND PUT A NOTE          */
proc sql;
   create table two as
   select memname, nobs
   from dictionary.tables
   where libname='WORK' AND
         memname="STOR&store";
quit;
data two;
   set two;
   if nobs=0 then do;
      put "Nobody whose Primary store=&store shopped in other stores.";
      put /;
      abort;
   end;
run;
/* Get the number of distinct customers who shopped */
/* in other stores.                                 */
data _NULL_;
   set stor&store END = last;
   if last then do;
      call symput ('mergeobs', _N_);
   end;
run;
/* Calculate the percent of those who shopped in other stores. */
data stor&store;
   set stor&store;
   percet=round((&mergeobs/&storeobs)*100,.01);
   drop custid;
run;
/* Create a list of "other" stores. */
proc transpose data=stor&store
               (keep=percet store) out=stor&store;
   id store;
run;
proc datasets;
   delete ozestore two;
run;
options nodate pageno=1;
proc print data=stor&store;
   title "The number of customers whose primary";
   title2 "store=&store is equal to &storeobs..";
   title3 "But &mergeobs shopped in provided";
   title4 "list of stores";
   footnote "PREVISION &sysdate";
run;
%mend ckstore;
/* Call macro. */
%ckstore(1207)
%ckstore(1306)
%ckstore(1323)

CLEANING UP THE FILE TO BE MAILED
After the final “mailable” file has been created in SAS
(sometimes it has millions of records), we should convert
it into ASCII [“Dump a SAS data set to a flat file to use
it with any PostalSoft products for business and marketing
analysis,” S. Sianissian (1999)] and send it out to a
lettershop to be presorted. The presort module is very
sensitive to any “unexpected” values, such as slashes (/)
or backslashes (\). It simply stops working when it finds
them in fields like “First Name”, “Last Name”, or “City”.
The following code will help solve this problem.

/* Check if we have "unexpected" values */
data one(compress=yes);
   set mydata(keep=custid fname lname city);
   by custid;
   where (fname like '%/%') OR (lname like '%/%')
      OR (fname like '%\%') OR (lname like '%\%');
run;
/* Delete slashes (/) */
data mydata(compress=yes);
   set mydata;
   by custid;
   fname=scan(fname,1,"/");
   lname=scan(lname,1,"/");
run;
/* Delete backslashes (\) */
data mydata(compress=yes);
   set mydata;
   by custid;
   fname=scan(fname,1,"\");
   lname=scan(lname,1,"\");
run;
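Note that the SCAN calls in the clean-up code keep only the text before the first slash, so anything after it is lost. If the goal is only to delete the offending characters, one possible alternative is COMPRESS with a list of characters to remove. This is a swapped-in technique rather than the paper's own code, it also covers the city field as an assumption, and the sample rows are made up.

```sas
/* Made-up sample rows containing slashes and backslashes. */
data mydata;
   infile datalines dlm=',' truncover;
   length fname $12 lname $12 city $12;
   input custid fname lname city;
   datalines;
1,JO/HN,SMITH,LIN\COLN
2,MARY,O/BRIEN\,BOSTON
;
run;

data mydata;
   set mydata;
   /* Remove every / and \ wherever it occurs in the field. */
   fname=compress(fname,'/\');
   lname=compress(lname,'/\');
   city =compress(city,'/\');
run;
```

After this step the first row reads JOHN, SMITH, LINCOLN and the second MARY, OBRIEN, BOSTON. Because a second argument is supplied, COMPRESS removes only the listed characters and leaves embedded blanks alone.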
LAST TIP FOR TODAY
As you get more and more experienced in SAS, your programs
will become more efficient, but at the same time more
sophisticated and time consuming (especially if you need
to sort a huge file several times, or call macros many
times). If a program runs for more than half an hour, you
are likely to submit it in batch, or while it is running
you might want to concentrate on something else. In this
case it would be nice to get a note saying that the
program has finished (for instance, via e-mail). You can
do this as follows.

/* The body of your main sophisticated program. */
data test;
   input a b c;
   total=a+b+c;
   mean=total/3;
   cards;
1 2 3
;
run;
/* Create a batch executable file on your H: drive.      */
/* The command line written to it consists of:           */
/*   - the body of the e-mail (the echoed text);         */
/*   - sendeml, an executable application which can be   */
/*     easily downloaded from the Internet (sendeml.exe);*/
/*   - pm-svr03, the name of the server where e-mail     */
/*     "lives" in your company;                          */
/*   - server@pm-svr05, from whom the e-mail is sent;    */
/*   - the address to whom the e-mail is sent;           */
/*   - FYI:, the subject of the e-mail.                  */
data _null_;
   file 'H:\send_e_mail.bat';
   put
      'H:'
      /
      'echo Your Program just finished!!!| sendeml pm-svr03 server@pm-svr05 [email protected] FYI:'
      /
      'exit'
   ;
run;
/* Switch to the H: drive and execute the created batch file. */
data _NULL_;
   X 'H:';
   call system ("send_e_mail.bat");
run;

CONCLUSION
These SAS code samples demonstrate some advanced
techniques you can use while working with a big data
warehouse to select an audience for a direct mail
promotion and to better understand customers’ behavior as
well as their purchase activity. The paper also touches on
how to “jump” between SAS and your PC (running DOS or
Windows NT).

ACKNOWLEDGMENTS
Song Jungdong and Chen Zhao contributed extensively to the
development of this paper. Their support and suggestions
are greatly appreciated.

PreVision Marketing is a customer relationship management
agency specializing in the development and implementation
of relational database and relationship marketing
strategies, including comprehensive customer loyalty,
upgrade and acquisition programs.

PreVision Marketing provides strategic, analytic, creative
and mail production services, along with support in the
selection and efficient use of the newest database
technologies.
AUTHOR CONTACT INFORMATION
Sergey Sian
PreVision Marketing, Inc
55 Old Bedford Road
Lincoln, MA 01773
Direct: (781) 259-5169
Fax: (781) 259-1548
E-mail: [email protected]

TRADEMARK INFORMATION
SAS is a registered trademark of SAS Institute Inc. in
the USA and other countries. ® indicates USA
registration. PostalSoft is a registered trademark of
FirstLogic Incorporation.