Data Mining Techniques
Third Edition
Data Mining Techniques
For Marketing, Sales, and Customer
Relationship Management
Third Edition
Gordon S. Linoff
Michael J. A. Berry
Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management
Published by
Wiley Publishing, Inc.
10475 Crosspoint Boulevard
Indianapolis, IN 46256
www.wiley.com
Copyright © 2011 by Wiley Publishing, Inc., Indianapolis, Indiana
Published simultaneously in Canada
ISBN: 978-0-470-65093-6
ISBN: 978-1-118-08745-9 (ebk)
ISBN: 978-1-118-08747-3 (ebk)
ISBN: 978-1-118-08750-3 (ebk)
Manufactured in the United States of America
10 9 8 7 6 5 4 3 2 1
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means,
electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or
108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive,
Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed
to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201)
748-6008, or online at http://www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with
respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including
without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or
promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work
is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional
services. If professional assistance is required, the services of a competent professional person should be sought.
Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or
Web site is referred to in this work as a citation and/or a potential source of further information does not mean that
the author or the publisher endorses the information the organization or website may provide or recommendations
it may make. Further, readers should be aware that Internet websites listed in this work may have changed or disappeared between when this work was written and when it is read.
For general information on our other products and services please contact our Customer Care Department within the
United States at (877) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
Library of Congress Control Number: 2011921769
Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or
its affiliates, in the United States and other countries, and may not be used without written permission. All other
trademarks are the property of their respective owners. Wiley Publishing, Inc. is not associated with any product
or vendor mentioned in this book.
To Stephanie, Sasha, and Nathaniel. Without your patience and
understanding, this book would not have been possible.
— Michael
To Puccio.
Grazie per essere paziente con me.
Ti amo.
— Gordon
About the Authors
Gordon S. Linoff and Michael J. A. Berry are well known in the data mining field.
They are the founders of Data Miners, Inc., a boutique data mining consultancy,
and they have jointly authored several influential and widely read books in the
field. The first of their jointly authored books was the first edition of Data Mining
Techniques, which appeared in 1997. Since that time, they have been actively mining data in a wide variety of industries. Their continuing hands-on analytical
work allows the authors to keep abreast of developments in the rapidly evolving
fields of data mining, forecasting, and predictive analytics. Gordon and Michael
are scrupulously vendor-neutral. Through their consulting work, the authors
have been exposed to data analysis software from all of the major software
vendors (and quite a few minor ones as well). They are convinced that good
results are not determined by whether the software employed is proprietary or
open-source, command-line or point-and-click; good results come from creative
thinking and sound methodology.
Gordon and Michael specialize in applications of data mining in marketing
and customer relationship management — applications such as improving recommendations for cross-sell and up-sell, forecasting future subscriber levels,
modeling lifetime customer value, segmenting customers according to their
behavior, choosing optimal landing pages for customers arriving at a website,
identifying good candidates for inclusion in marketing campaigns, and predicting
which customers are at risk of discontinuing use of a software package, service,
or drug regimen. Gordon and Michael are dedicated to sharing their knowledge,
skills, and enthusiasm for the subject. When not mining data themselves, they
enjoy teaching others through courses, lectures, articles, on-site classes, and of
course, the book you are about to read. They can frequently be found speaking
at conferences and teaching classes. The authors also maintain a data mining
blog at blog.data-miners.com.
Gordon lives in Manhattan. His most recent book before this one is Data
Analysis Using SQL and Excel, which was published by Wiley in 2008.
Michael lives in Cambridge, Massachusetts. In addition to his consulting
work with Data Miners, he teaches Marketing Analytics at the Carroll School
of Management at Boston College.
Credits
Executive Editor: Robert Elliott
Senior Project Editor: Adaobi Obi Tulton
Production Editor: Daniel Scribner
Copy Editor: Paula Lowell
Editorial Director: Robyn B. Siesky
Editorial Manager: Mary Beth Wakefield
Freelancer Editorial Manager: Rosemarie Graham
Marketing Manager: Ashley Zurcher
Production Manager: Tim Tate
Vice President and Executive Group Publisher: Richard Swadley
Vice President and Executive Publisher: Barry Pruett
Associate Publisher: Jim Minatel
Project Coordinator, Cover: Katie Crocker
Proofreaders: Word One New York
Indexer: Ron Strauss
Cover Designer: Ryan Sneed
Cover Image: © PhotoAlto/Alix Minde/GettyImages
Acknowledgments
We are fortunate to be surrounded by some of the most talented data miners
anywhere, so our first thanks go to our colleagues, past and present, at Data
Miners, Inc., from whom we have learned so much: Will Potts, Dorian Pyle,
and Brij Masand. There are also clients with whom we work so closely that
we consider them our colleagues and friends as well: Harrison Sohmer, Stuart
E. Ward, III, and Michael Benigno are in that category. Our editor, Bob Elliott,
kept us (more or less) on schedule and helped us maintain a consistent style.
SAS Institute and the Data Warehouse Institute have given us unparalleled
opportunities over the past 12 years for teaching. We owe special thanks to Herb
Edelstein (now retired), Herb Kirk, Anne Milley, Bob Lucas, Hillary Kokes, Karen
Washburn, and many others who have made these classes possible.
Over the past year, while we were writing this book, several friends and colleagues have been very supportive. We would like to acknowledge Diane and
Savvas Mavridis, Steve Mullaney, Lounette Dyer, Maciej Zworski, John Wallace,
Paul Rosenblum, and Don Wedding.
We also want to acknowledge all the people with whom we have worked in
scores of data mining engagements over the years. We have learned something from
every one of them. Among the many who have helped us throughout the years:
Alan Parker
Dave Waltz
Craig Stanfill
Dirk De Roos
Michael Alidio
Michael Cavaretta
Dave Duling
Jeff Hammerbacher
Andrew Gelman
Gary King
Tim Manns
Jeremy Pollock
Richard James
Georgia Tourasi
Avery Wang
Eric Jiang
Bruce Rylander
Daryl Berry
Doug Newell
Ed Freeman
Erin McCarthy
Josh Goff
Karen Kennedy
Ronnie Rowton
Kurt Thearling
Mark Smith
Nick Radcliffe
Patrick Surry
Ronny Kohavi
Terri Kowalchuk
Victor Lo
Yasmin Namini
Zai Ying Huang
Amber Batata
Adam Schwebber
Tiha Ghyczy
Usama Fayyad
Patrick Ott
John Muller
Frank Travisano
Jim Stagnito
Stephen Boyer
Yugo Kanazawa
Xu He
Kiran Nagarur
Ramana Thumu
Jacob Hauskens
Jeremy Pollock
Lutz Hamel
And, of course, all the people we thanked in the first edition are still deserving of acknowledgment:
Bob Flynn
Bryan McNeely
Claire Budden
David Isaac
David Waltz
Dena d’Ebin
Diana Lin
Don Peppers
Ed Horton
Edward Ewen
Fred Chapman
Gary Drescher
Gregory Lampshire
Janet Smith
Jerry Modes
Jim Flynn
Kamran Parsaye
Karen Stewart
Larry Bookman
Larry Scroggins
Lars Rohrberg
Lounette Dyer
Marc Goodman
Marc Reifeis
Marge Sherold
Mario Bourgoin
Prof. Michael Jordan
Patsy Campbell
Paul Becker
Paul Berry
Rakesh Agrawal
Ric Amari
Rich Cohen
Robert Groth
Robert Utzschnieder
Roland Pesch
Stephen Smith
Sue Osterfelt
Susan Buchanan
Syamala Srinivasan
Wei-Xing Ho
William Petefish
Yvonne McCollin
Finally, we would like to thank our family and friends, particularly Stephanie
and Giuseppe, who have endured with grace the sacrifices in writing this book.
Contents at a Glance
Introduction
Chapter 1 What Is Data Mining and Why Do It?
Chapter 2 Data Mining Applications in Marketing and Customer Relationship Management
Chapter 3 The Data Mining Process
Chapter 4 Statistics 101: What You Should Know About Data
Chapter 5 Descriptions and Prediction: Profiling and Predictive Modeling
Chapter 6 Data Mining Using Classic Statistical Techniques
Chapter 7 Decision Trees
Chapter 8 Artificial Neural Networks
Chapter 9 Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering
Chapter 10 Knowing When to Worry: Using Survival Analysis to Understand Customers
Chapter 11 Genetic Algorithms and Swarm Intelligence
Chapter 12 Tell Me Something New: Pattern Discovery and Data Mining
Chapter 13 Finding Islands of Similarity: Automatic Cluster Detection
Chapter 14 Alternative Approaches to Cluster Detection
Chapter 15 Market Basket Analysis and Association Rules