Data Mining Techniques
For Marketing, Sales, and Customer Relationship Management
Third Edition

Gordon S. Linoff
Michael J. A. Berry

Published by Wiley Publishing, Inc., 10475 Crosspoint Boulevard, Indianapolis, IN 46256, www.wiley.com

Copyright © 2011 by Wiley Publishing, Inc., Indianapolis, Indiana
Published simultaneously in Canada
ISBN: 978-0-470-65093-6
ISBN: 978-1-118-08745-9 (ebk)
ISBN: 978-1-118-08747-3 (ebk)
ISBN: 978-1-118-08750-3 (ebk)
Manufactured in the United States of America
10 9 8 7 6 5 4 3 2 1

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation.
This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Web site is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or recommendations it may make. Further, readers should be aware that Internet websites listed in this work may have changed or disappeared between when this work was written and when it is read.

For general information on our other products and services please contact our Customer Care Department within the United States at (877) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Library of Congress Control Number: 2011921769

Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. All other trademarks are the property of their respective owners. Wiley Publishing, Inc. is not associated with any product or vendor mentioned in this book.

To Stephanie, Sasha, and Nathaniel. Without your patience and understanding, this book would not have been possible.
- Michael

To Puccio. Grazie per essere paziente con me. Ti amo.
- Gordon

About the Authors

Gordon S. Linoff and Michael J. A. Berry are well known in the data mining field.
They are the founders of Data Miners, Inc., a boutique data mining consultancy, and they have jointly authored several influential and widely read books in the field. The first of their jointly authored books was the first edition of Data Mining Techniques, which appeared in 1997. Since that time, they have been actively mining data in a wide variety of industries. Their continuing hands-on analytical work allows the authors to keep abreast of developments in the rapidly evolving fields of data mining, forecasting, and predictive analytics. Gordon and Michael are scrupulously vendor-neutral. Through their consulting work, the authors have been exposed to data analysis software from all of the major software vendors (and quite a few minor ones as well). They are convinced that good results are not determined by whether the software employed is proprietary or open-source, command-line or point-and-click; good results come from creative thinking and sound methodology. Gordon and Michael specialize in applications of data mining in marketing and customer relationship management — applications such as improving recommendations for cross-sell and up-sell, forecasting future subscriber levels, modeling lifetime customer value, segmenting customers according to their behavior, choosing optimal landing pages for customers arriving at a website, identifying good candidates for inclusion in marketing campaigns, and predicting which customers are at risk of discontinuing use of a software package, service, or drug regimen. Gordon and Michael are dedicated to sharing their knowledge, skills, and enthusiasm for the subject. When not mining data themselves, they enjoy teaching others through courses, lectures, articles, on-site classes, and of course, the book you are about to read. They can frequently be found speaking at conferences and teaching classes. The authors also maintain a data mining blog at blog.data-miners.com. 
Gordon lives in Manhattan. His most recent book before this one is Data Analysis Using SQL and Excel, which was published by Wiley in 2008.

Michael lives in Cambridge, Massachusetts. In addition to his consulting work with Data Miners, he teaches Marketing Analytics at the Carroll School of Management at Boston College.

Credits

Executive Editor: Robert Elliott
Senior Project Editor: Adaobi Obi Tulton
Production Editor: Daniel Scribner
Copy Editor: Paula Lowell
Editorial Director: Robyn B. Siesky
Editorial Manager: Mary Beth Wakefield
Freelancer Editorial Manager: Rosemarie Graham
Marketing Manager: Ashley Zurcher
Production Manager: Tim Tate
Vice President and Executive Group Publisher: Richard Swadley
Vice President and Executive Publisher: Barry Pruett
Associate Publisher: Jim Minatel
Project Coordinator, Cover: Katie Crocker
Proofreaders: Word One New York
Indexer: Ron Strauss
Cover Designer: Ryan Sneed
Cover Image: © PhotoAlto/Alix Minde/GettyImages

Acknowledgments

We are fortunate to be surrounded by some of the most talented data miners anywhere, so our first thanks go to our colleagues, past and present, at Data Miners, Inc., from whom we have learned so much: Will Potts, Dorian Pyle, and Brij Masand. There are also clients with whom we work so closely that we consider them our colleagues and friends as well: Harrison Sohmer, Stuart E. Ward, III, and Michael Benigno are in that category. Our editor, Bob Elliott, kept us (more or less) on schedule and helped us maintain a consistent style.

SAS Institute and the Data Warehouse Institute have given us unparalleled opportunities over the past 12 years for teaching. We owe special thanks to Herb Edelstein (now retired), Herb Kirk, Anne Milley, Bob Lucas, Hillary Kokes, Karen Washburn, and many others who have made these classes possible.
Over the past year, while we were writing this book, several friends and colleagues have been very supportive. We would like to acknowledge Diane and Savvas Mavridis, Steve Mullaney, Lounette Dyer, Maciej Zworski, John Wallace, Paul Rosenblum, and Don Wedding.

We also want to acknowledge all the people with whom we have worked in scores of data mining engagements over the years. We have learned something from every one of them. Among the many who have helped us throughout the years:

Alan Parker, Dave Waltz, Craig Stanfill, Dirk De Roos, Michael Alidio, Michael Cavaretta, Dave Duling, Jeff Hammerbacher, Andrew Gelman, Gary King, Tim Manns, Jeremy Pollock, Richard James, Georgia Tourasi, Avery Wang, Eric Jiang, Bruce Rylander, Daryl Berry, Doug Newell, Ed Freeman, Erin McCarthy, Josh Goff, Karen Kennedy, Ronnie Rowton, Kurt Thearling, Mark Smith, Nick Radcliffe, Patrick Surry, Ronny Kohavi, Terri Kowalchuk, Victor Lo, Yasmin Namini, Zai Ying Huang, Amber Batata, Adam Schwebber, Tiha Ghyczy, Usama Fayyad, Patrick Ott, John Muller, Frank Travisano, Jim Stagnito, Stephen Boyer, Yugo Kanazawa, Xu He, Kiran Nagarur, Ramana Thumu, Jacob Hauskens, Jeremy Pollock, Lutz Hamel

And, of course, all the people we thanked in the first edition are still deserving of acknowledgment:

Bob Flynn, Bryan McNeely, Claire Budden, David Isaac, David Waltz, Dena d’Ebin, Diana Lin, Don Peppers, Ed Horton, Edward Ewen, Fred Chapman, Gary Drescher, Gregory Lampshire, Janet Smith, Jerry Modes, Jim Flynn, Kamran Parsaye, Karen Stewart, Larry Bookman, Larry Scroggins, Lars Rohrberg, Lounette Dyer, Marc Goodman, Marc Reifeis, Marge Sherold, Mario Bourgoin, Prof. Michael Jordan, Patsy Campbell, Paul Becker, Paul Berry, Rakesh Agrawal, Ric Amari, Rich Cohen, Robert Groth, Robert Utzschnieder, Roland Pesch, Stephen Smith, Sue Osterfelt, Susan Buchanan, Syamala Srinivasan, Wei-Xing Ho, William Petefish, Yvonne McCollin

Finally, we would like to thank our family and friends, particularly Stephanie and Giuseppe, who have endured with grace the sacrifices in writing this book.

Contents at a Glance

Introduction
Chapter 1 What Is Data Mining and Why Do It?
Chapter 2 Data Mining Applications in Marketing and Customer Relationship Management
Chapter 3 The Data Mining Process
Chapter 4 Statistics 101: What You Should Know About Data
Chapter 5 Descriptions and Prediction: Profiling and Predictive Modeling
Chapter 6 Data Mining Using Classic Statistical Techniques
Chapter 7 Decision Trees
Chapter 8 Artificial Neural Networks
Chapter 9 Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering
Chapter 10 Knowing When to Worry: Using Survival Analysis to Understand Customers
Chapter 11 Genetic Algorithms and Swarm Intelligence
Chapter 12 Tell Me Something New: Pattern Discovery and Data Mining
Chapter 13 Finding Islands of Similarity: Automatic Cluster Detection
Chapter 14 Alternative Approaches to Cluster Detection
Chapter 15 Market Basket Analysis and Association Rules