IS ‘STUP_ID’ REALLY “STUPID”?

Sergey Sian, PreVision Marketing, Lincoln, MA

INTRODUCTION

Direct mail promotion is a fairly complicated process. You need to extract information from different sources in a database (customer, transaction, and promotion tables, etc.). Then you must merge some of those files with “external” files that hold additional information needed for analysis. In many cases you will not catch everyone you want on your mailing list by merging data sets by customer ID (cust_ID) alone. You may get more prospects for direct mail by merging files by ‘stup_ID’, which is really a concatenation of last name, city, ZIP code and, say, the first 9 bytes of an address field.

GETTING MORE CUSTOMERS

For example, let’s say there is a chain of grocery stores, “Best European Food,” which maintains a database of all customer information (cust_ID, which we will call food_ID, plus first name, last name, address, city, state, ZIP code, etc.) as well as transactions. At the same time there is a company, “Fast Delivery,” which distributes and delivers goods purchased from “Best European Food” either by phone or via the Internet. The “Fast Delivery” company keeps its own database (cust_ID, which we will call fast_ID, plus first name, last name, address, city, state, ZIP code, etc.).

The objective of a mail tape (direct mail) is to reach customers who spent on average more than $45/week during the last 25-week period and who, at the same time, use the “Fast Delivery” company to get food at home.

To solve this problem, the first intention is to:

- Extract transaction data from both databases (“Best European Food” and “Fast Delivery”) for a defined time period.
- Roll both sets of transaction data up to the “customer” level by week and keep only customers who spent on average more than $45/week.
- Merge both files by ‘cust_ID’ to get a “mailable” audience.
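As a rough illustration of the roll-up and merge steps just listed, the sketch below works on toy data. The data set names (food_trans, fast_trans, food_cust, fast_cust, mailable), the column names (cust_id, week, amount) and the macro variable nweeks are assumptions made for this example and do not come from the paper. The spend threshold is applied to the grocery purchases only, which is one possible reading of the stated objective.

```sas
/* Hypothetical toy data: one row per purchase. All names are assumptions. */
data food_trans;
   input cust_id week amount;
   datalines;
101 1 60
101 2 50
102 1 20
103 1 90
103 2 40
;
run;

data fast_trans;
   input cust_id week amount;
   datalines;
101 1 15
103 2 10
104 1 30
;
run;

/* Length of the analysis period in weeks: 25 in the paper, 2 for the toy data. */
%let nweeks = 2;

proc sql;
   /* Roll grocery transactions up to customer level and keep */
   /* customers whose average weekly spend exceeds $45.       */
   create table food_cust as
   select cust_id,
          sum(amount) / &nweeks as avg_week
   from food_trans
   group by cust_id
   having sum(amount) / &nweeks > 45
   order by cust_id;

   /* One row per delivery customer. */
   create table fast_cust as
   select distinct cust_id
   from fast_trans
   order by cust_id;
quit;

/* Keep only customers who appear in both files. */
data mailable;
   merge food_cust(in=in_food) fast_cust(in=in_fast);
   by cust_id;
   if in_food and in_fast;
run;
```

With this toy input, customers 101 and 103 pass the spend filter and also appear in fast_trans, so they are the two rows left in mailable. The sketch assumes both files already share a comparable cust_id, which the real databases did not.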
With this logic we get the following numbers:

- 813,842 records (customer level) were extracted from the “Best European Food” database.
- 45,768 records (customer level) were extracted from the “Fast Delivery” database.
- After merging the two files by ‘cust_ID’ we get 26,541 customers for the direct mail promotion (customers who exist in both files).

But let’s keep in mind that those two companies created and maintain their databases differently (different systems, database structure, update process, etc.). In our case they even allocated different numbers of bytes to ‘cust_ID’, which is why we manipulated this field before the merge.

To increase the number of “matched” customers we decided to implement a slightly different approach. We put aside those 26,541 customers whom we found in both files based on the ‘cust_ID’ field. Then, for the rest of both files, we standardized the customer’s first and last names as well as all address components (street, city, ZIP code, etc.), using PostalSoft software (DataRight and ACE modules). Later on, in SAS, we defined a new customer ID as:

    stup_id=left(compress(lastn))||left(compress(city))||
            left(compress(zip))||left(compress(substr(address,1,9)));

The order of variables in this concatenation is very important. After merging the two files by ‘stup_ID’ we get an extra 8,489 customers for the direct mail promotion.

We could run merge/purge in PostalSoft using different criteria (exact, tight, medium, etc.) to match different fields, but in SAS it runs much faster, although it matches only exactly.

FINDING OUT WHETHER CUSTOMERS MADE PURCHASES IN STORES OTHER THAN THEIR PRIMARY STORES

This helps to better understand the customer’s behavior as well as their purchase activity. It is also important for assigning labels such as the “first” store, the “recent” store, etc. The following code might be helpful.

    %macro ckstore(store);

    /* From a database, pull out a list of customers who shopped in  */
    /* the store we are interested in, as well as a list of those    */
    /* who shopped elsewhere (in one query fetch).                   */
    proc sql;
       connect to oracle(user=username pw=password path="path");

       create table stor&store(compress=yes) as
       select * from connection to oracle
          (select distinct transa.CUST_ID as custid
           from srt.trans transa
           where transa.store_id=&store);

       create table ozestore(compress=yes) as
       select * from connection to oracle
          (select distinct transo.CUST_ID as custid,
                  transo.store_id as store
           from srt.trans transo
           where transo.store_id ^= &store);

       disconnect from oracle;
    quit;

    proc sort data=stor&store;
       by custid;
    run;

    /* Get the number of distinct customers who shopped in that store. */
    data _NULL_;
       set stor&store END=last;
       if last then do;
          call symput ('storeobs', _N_);
       end;
    run;

    proc sort data=ozestore;
       by custid;
    run;

    /* Define how many customers shopped in other stores. */
    data stor&store(compress=yes);
       merge ozestore(IN=other) stor&store(IN=mystore);
       by custid;
       if other AND mystore;
    run;

    /* ABORT EXECUTION IF THERE ARE    */
    /* NO OBSERVATIONS IN "stor&store" */
    /* FILE AND PUT A NOTE             */
    proc sql;
       create table two as
       select memname, nobs
       from dictionary.tables
       where libname='WORK' AND memname="STOR&store";
    quit;

    data two;
       set two;
       if nobs=0 then do;
          put "Nobody whose Primary store=&store shopped in other stores.";
          put /;
          abort;
       end;
    run;

    /* Get the number of distinct customers who shopped in other stores. */
    data _NULL_;
       set stor&store END=last;
       if last then do;
          call symput ('mergeobs', _N_);
       end;
    run;

    /* Calculate the percent of those who shopped in other stores. */
    data stor&store;
       set stor&store;
       percet=round((&mergeobs/&storeobs)*100,.01);
       drop custid;
    run;

    /* Create a list of "other" stores. LET keeps the last row when */
    /* several customers share the same store (duplicate ID value). */
    proc transpose data=stor&store(keep=percet store)
                   out=stor&store let;
       id store;
    run;

    proc datasets;
       delete ozestore two;
    run;

    options nodate pageno=1;

    proc print data=stor&store;
       title  "The number of customers whose primary";
       title2 "store=&store is equal to &storeobs..";
       title3 "But &mergeobs shopped in provided";
       title4 "list of stores";
       footnote "PREVISION &sysdate";
    run;

    %mend ckstore;

    /* Call macro. */
    %ckstore(1207)
    %ckstore(1306)
    %ckstore(1323)

CLEANING UP THE FILE TO BE MAILED

After the final “mailable” file has been created in SAS (sometimes it has millions of records), we should convert it into ASCII [“Dump a SAS data set to a flat file to use it with any PostalSoft products for business and marketing analysis,” S. Sianissian (1999)] and send it out to a lettershop to be presorted. The presort module is very sensitive to any “unexpected” values, such as slashes (/) or back slashes (\). It simply stops working when it finds them in fields like “First Name”, “Last Name”, or “City”. The following code will help solve this problem.

    /* Check if we have "unexpected" values */
    data one(compress=yes);
       set mydata(keep=custid fname lname city);
       by custid;
       where (fname like '%/%') OR (lname like '%/%') OR
             (fname like '%\%') OR (lname like '%\%');
    run;

    /* Delete slashes (/) */
    data mydata(compress=yes);
       set mydata;
       by custid;
       fname=scan(fname,1,"/");
       lname=scan(lname,1,"/");
    run;

    /* Delete back slashes (\) */
    data mydata(compress=yes);
       set mydata;
       by custid;
       fname=scan(fname,1,"\");
       lname=scan(lname,1,"\");
    run;

LAST TIP FOR TODAY

As you get more and more experienced in SAS, your programs will become more efficient, but at the same time more sophisticated and time consuming (especially if you need to sort a huge file several times, or call macros many times). If a program runs for more than half an hour, you are likely to submit it in batch, or while it is running you might want to concentrate on something else. In this case it would be nice to get a note saying that the program has finished (for instance, via e-mail). You can do this as follows.

    /* The body of your main sophisticated program. */
    data test;
       input a b c;
       total=a+b+c;
       mean=total/3;
       cards;
    1 2 3
    ;
    run;

    /* Create a batch executable file on your H: drive. */
    data _null_;
       file 'H:\send_e_mail.bat';
       put 'H:' /
           'echo Your Program just finished!!!| sendeml pm-svr03 server@pm-svr05 [email protected] FYI:' /
           'exit';
    run;

    /* Switch to the H: drive and execute the created batch file. */
    data _NULL_;
       X 'H:';
       call system ("send_e_mail.bat");
    run;

The parts of the command written to the batch file are:

- echo Your Program just finished!!! : the e-mail’s body.
- sendeml : the executable application (sendeml.exe), which can easily be downloaded from the Internet.
- pm-svr03 : the name of the server where e-mail “lives” in your company.
- server@pm-svr05 : from whom the e-mail will be sent.
- [email protected] : to whom the e-mail will be sent.
- FYI: : the e-mail’s subject.

CONCLUSION

These SAS code samples demonstrate some advanced techniques you can use while working with a big data warehouse to select an audience for direct mail promotion and to better understand the customer’s behavior as well as their purchase activity. The paper also touches on how to “jump” between SAS and your PC (with DOS or Windows NT on it).

ACKNOWLEDGMENTS

Song Jungdong and Chen Zhao contributed extensively to the development of this paper. Their support and suggestions are greatly appreciated.

PreVision Marketing is a customer relationship management agency specializing in the development and implementation of relational database and relationship marketing strategies, including comprehensive customer loyalty, upgrade and acquisition programs. PreVision Marketing provides strategic, analytic, creative and mail production services, along with support in the selection and efficient use of the newest database technologies.

AUTHOR CONTACT INFORMATION

Sergey Sian
PreVision Marketing, Inc
55 Old Bedford Road
Lincoln, MA 01773
Direct: (781) 259-5169
Fax: (781) 259-1548
E-mail: [email protected]

TRADEMARK INFORMATION

SAS is a registered trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. PostalSoft is a registered trademark of FirstLogic Incorporation.