Product Profit Optimization
Jack Meyers, Bria Lambert, Dan Bonneville, Kelsey Frain

Outline
1. Problem Background
2. Our Approach
3. Data Analysis
4. Classification-Decision Tree
5. Validation
6. Future work

PROBLEM BACKGROUND

Special Pricing Agreement
SPA: Contract between customers and suppliers
Advantage:
– Low Cost
– Lower Sale Price
– Higher Quantity

Research Goal
1. Identify key drivers of existing SPAs
2. Distinguish new potential SPAs

OUR APPROACH

Our Approach
1. Clean the large datasets
2. Data exploration
3. Classification
4. Decision Tree
5. Cross-validate model

DATA ANALYSIS

Data Tables
We were given:
1. Quotes
2. Attributes: 5.2 M
3. Branch Inventory
4. Products: 1.3 M
5. Customers: 2.5 M
6. Hubs
7. Consumption
8. Invoices: 72 M

Data Cleaning
1. Found top 1000 customers based on quantity purchased
2. Combined sales data
3. Removed relationships:
– Sale price < 0
– Cost < 0
– Any missing values
– Quantity sold < 10

Creating Variables
From the invoice data we created the following variables:
– Quantity Sold
– Average Sale Price
– Average Cost
– High and Low Sale Price
– High and Low Cost
– Profit

Customer-Product relationships

           Number of relationships   Percent of data   Average unit profit   Quantity sold
SPA        1,577                     0.44%             $0.34                 151,083 units
Non-SPA    308,562                   99.56%            $0.21                 18,513 units

CLASSIFICATION – DECISION TREE

Classification
The process of categorizing an instance based on shared characteristics

Decision Tree
• 1 Outcome Variable
• 2 Predictor Variables

Why Decision Trees?
Advantages:
– Easy to implement
– Easy to identify how parameter values lead to an SPA creation
– Easy to discover potential SPAs
Challenges:
– Variable Selection

Overfit Tree
[Figure: overfit decision tree]

Random Forest Variable Selection
[Figure: trees 1, 2, ..., 100 combined to identify the Important Variables]

Selected Predictor Variables
1. Quantity Sold
2. High Sale Price
3. High Cost
4. Low Cost

[Figure: decision tree with SPA subsets #1, #2, #3]

[Figure: decision tree with SPA subset #4]
200 potential SPA relationships
– 42 potential SPA
– 15 potential SPA
– 52 potential SPA
– 91 potential SPA

VALIDATION

Profitability of Potential SPAs
Top 1,000 Customers, run through Decision Tree:
– SPAs not in Tree: 1577, $5K profit per SPA
– SPAs in Tree: 141, $362K profit per SPA
– Potential SPAs: 200, $42K profit per potential SPA

Comparing Relationships
Subset of 609 unique products that occur as both an SPA and Non-SPA for various customers.

Average...   Quantity   Sale     Cost    Profit per item   Profit per product
SPA          255,640    $9.30    $5.70   $3.60             $920,000
Non-SPA      18,279     $12.13   $7.25   $4.88             $89,201

Comparison Graph
Log( Quantity ) vs. Log( Profit per Unit )
[Figure: scatter plot; legend: • SPA • Non-SPA]
1. Each point is a product
2. Not readable
3. Regression lines: ratio between profit and quantity

Comparison Graph
Log( Quantity ) vs. Log( Profit per Unit )
[Figure: scatter plot; legend: • SPA • Non-SPA]
4. 2830 is the minimum threshold for potential SPA
5. Products with a quantity sold greater than 2830 average a profit increase

Comparing Tree with Next 1000 Customers
Top 1000 Customers (Original) vs. Next 1000 Customers
Initial Split:
• Items can't be too cheap (cost greater than 20¢ vs 38¢)
• Must sell reasonable volume (2.8k vs 312)
First Group of SPAs with higher volume:
• Items on cheaper side (cost of 33¢ vs 50¢)
• Higher Quantity Sold (300k vs 40k)
Second Group of SPAs with lower volume:
• Items have higher cost and sale price
• Items sell medium volume

SUMMARY

Business Action
1. Pick a new set of the data
2. Select variables with Random Forest
3. Create Decision Tree with random subset of the data (~30%)
– Including all SPAs in subset
4. Identify parameter values describing SPA subsets from tree

Subset of Tree   Low Cost   Quantity Sold    High Cost
#1               > $0.20    > 118,00 units   > $0.33

5. Extract relationships matching the parameter values from each subset that should be potential SPAs

FUTURE WORK

Suggestions for Future Work
– Continue to look at the next top 1,000 customer-product relationships based on quantity sold
– Discover weak SPAs and flag them for removal
– Examine a new model without weak SPAs
– Explore different classification methods such as Random Forests

ACKNOWLEDGMENTS

Acknowledgments
– Fastenal: Brain Keeling and Andrew Kores
– “PIC Math is a program of the Mathematical Association of America (MAA) and the Society for Industrial and Applied Mathematics (SIAM). Support is provided by the National Science Foundation (NSF grant DMS-1345499).”
– Dr. Song Chen and Dr. Chad Vidden

QUESTIONS?
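
Supplementary sketch
The Data Cleaning, Creating Variables, and Business Action steps can be illustrated with a small, self-contained Python sketch. It is not the team's actual code. The field names (customer, product, quantity, sale_price, cost), the function names, the quantity-weighted averages, and applying the removal rules per customer-product relationship are all assumptions made for illustration. The quantity threshold in the example call is a hypothetical value, not the one from the tree.

```python
from collections import defaultdict


def build_relationships(invoices, min_quantity=10):
    """Aggregate invoice rows into one record per (customer, product) pair.

    Assumed reading of the cleaning rules: a pair is dropped if any of its
    rows has a missing value, a negative sale price, or a negative cost,
    or if its total quantity sold is below min_quantity.
    """
    groups = defaultdict(list)
    for row in invoices:
        groups[(row["customer"], row["product"])].append(row)

    relationships = {}
    for key, rows in groups.items():
        fields = [(r.get("quantity"), r.get("sale_price"), r.get("cost")) for r in rows]
        if any(v is None for triple in fields for v in triple):
            continue  # any missing values
        if any(price < 0 or cost < 0 for _, price, cost in fields):
            continue  # sale price < 0 or cost < 0
        quantity = sum(q for q, _, _ in fields)
        if quantity < min_quantity or quantity <= 0:
            continue  # quantity sold below the minimum
        prices = [p for _, p, _ in fields]
        costs = [c for _, _, c in fields]
        relationships[key] = {
            "quantity_sold": quantity,
            # Quantity-weighted averages (an assumption; the slides do not say).
            "avg_sale_price": sum(q * p for q, p, _ in fields) / quantity,
            "avg_cost": sum(q * c for q, _, c in fields) / quantity,
            "high_sale_price": max(prices),
            "low_sale_price": min(prices),
            "high_cost": max(costs),
            "low_cost": min(costs),
            "profit": sum(q * (p - c) for q, p, c in fields),
        }
    return relationships


def matches_subset(rel, low_cost_min, quantity_min, high_cost_min):
    """Return True if a relationship exceeds all three thresholds of a tree subset."""
    return (
        rel["low_cost"] > low_cost_min
        and rel["quantity_sold"] > quantity_min
        and rel["high_cost"] > high_cost_min
    )


# Made-up invoice rows, for illustration only.
invoices = [
    {"customer": "C1", "product": "P1", "quantity": 600, "sale_price": 0.90, "cost": 0.40},
    {"customer": "C1", "product": "P1", "quantity": 400, "sale_price": 1.00, "cost": 0.50},
    {"customer": "C2", "product": "P1", "quantity": 5, "sale_price": 1.20, "cost": 0.50},
    {"customer": "C3", "product": "P2", "quantity": 50, "sale_price": None, "cost": 0.30},
]
rels = build_relationships(invoices)

# Low Cost > $0.20 and High Cost > $0.33 come from the slide; 500 is hypothetical.
potential = [k for k, r in rels.items() if matches_subset(r, 0.20, 500, 0.33)]
```

Only the C1/P1 relationship survives cleaning here. C2/P1 sells fewer than 10 units and C3/P2 has a missing sale price. In practice the candidate list would also exclude relationships that already have an SPA.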