Product Profit Optimization
Jack Meyers, Bria Lambert, Dan Bonneville, Kelsey Frain
1
Outline
1. Problem Background
2. Our Approach
3. Data Analysis
4. Classification-Decision Tree
5. Validation
6. Future work
2
PROBLEM BACKGROUND
3
Special Pricing Agreement
SPA: Contract between customers and suppliers
Advantage:
– Low Cost
– Lower Sale Price
– Higher Quantity
4
Research Goal
1. Identify key drivers of existing SPAs
2. Distinguish new potential SPAs
5
OUR APPROACH
6
Our Approach
1. Clean the large datasets
2. Data exploration
3. Classification
4. Decision Tree
5. Cross validate model
7
DATA ANALYSIS
8
Data Tables
We were given:
1. Quotes
2. Attributes: 5.2 M
3. Branch Inventory
4. Products: 1.3 M
5. Customers: 2.5 M
6. Hubs
7. Consumption
8. Invoices: 72 M
9
Data Cleaning
1. Found top 1000 customers based on quantity purchased
2. Combined sales data
3. Removed relationships with:
– Sale price < 0
– Cost < 0
– Any missing values
– Quantity sold < 10
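The cleaning rules above could be sketched as follows. This is a minimal illustration in plain Python, not the team's actual pipeline: the record layout and the field names (`customer`, `product`, `quantity`, `sale_price`, `cost`) are assumptions, and the top-customer cutoff is a parameter so the toy data can use 2 instead of 1,000.

```python
from collections import defaultdict

# Assumed field names; the slides do not show the real schema.
REQUIRED = ("customer", "product", "quantity", "sale_price", "cost")

def top_customers(rows, n):
    """Return the n customers with the largest total quantity purchased."""
    totals = defaultdict(int)
    for r in rows:
        if r.get("customer") is not None and r.get("quantity") is not None:
            totals[r["customer"]] += r["quantity"]
    return set(sorted(totals, key=totals.get, reverse=True)[:n])

def is_clean(rel, min_quantity=10):
    """Keep a relationship only if it passes every removal rule."""
    if any(rel.get(f) is None for f in REQUIRED):
        return False                      # any missing values
    if rel["sale_price"] < 0 or rel["cost"] < 0:
        return False                      # negative sale price or cost
    return rel["quantity"] >= min_quantity  # drop quantity sold < 10

def clean(rows, n_top=1000):
    keep = top_customers(rows, n_top)
    return [r for r in rows if r.get("customer") in keep and is_clean(r)]

# Toy data, invented for illustration.
rows = [
    {"customer": "A", "product": "bolt", "quantity": 500, "sale_price": 0.40, "cost": 0.25},
    {"customer": "A", "product": "nut", "quantity": 5, "sale_price": 0.10, "cost": 0.05},
    {"customer": "B", "product": "bolt", "quantity": 300, "sale_price": -1.0, "cost": 0.25},
    {"customer": "B", "product": "washer", "quantity": 80, "sale_price": 0.05, "cost": None},
    {"customer": "C", "product": "bolt", "quantity": 20, "sale_price": 0.45, "cost": 0.25},
]
cleaned = clean(rows, n_top=2)
```

In the toy data only customer A's bolt relationship survives: the others fail the quantity, price, or missing-value rule, or belong to a customer outside the top group.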
10
Creating Variables
From the invoice data we created the following variables:
– Quantity Sold
– Average Sale Price
– Average Cost
– High and Low Sale Price
– High and Low Cost
– Profit
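One way these variables might be derived from invoice lines is sketched below. The field names are hypothetical, and two definitions are assumptions, since the slides do not state them: the averages are simple means over invoice lines, and profit is the sum of (sale price minus cost) times quantity.

```python
from collections import defaultdict

def build_variables(invoice_lines):
    """Aggregate invoice lines into one record per (customer, product) relationship."""
    groups = defaultdict(list)
    for line in invoice_lines:
        groups[(line["customer"], line["product"])].append(line)

    variables = {}
    for key, lines in groups.items():
        prices = [l["sale_price"] for l in lines]
        costs = [l["cost"] for l in lines]
        variables[key] = {
            "quantity_sold": sum(l["quantity"] for l in lines),
            "avg_sale_price": sum(prices) / len(prices),   # assumed: unweighted mean
            "avg_cost": sum(costs) / len(costs),           # assumed: unweighted mean
            "high_sale_price": max(prices),
            "low_sale_price": min(prices),
            "high_cost": max(costs),
            "low_cost": min(costs),
            # Assumed definition of profit: margin times quantity, summed over lines.
            "profit": sum((l["sale_price"] - l["cost"]) * l["quantity"] for l in lines),
        }
    return variables

# Toy invoice lines, invented for illustration.
invoices = [
    {"customer": "A", "product": "bolt", "quantity": 100, "sale_price": 0.50, "cost": 0.25},
    {"customer": "A", "product": "bolt", "quantity": 300, "sale_price": 0.75, "cost": 0.25},
]
result = build_variables(invoices)[("A", "bolt")]
```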
11
Customer-Product relationships
          Number of relationships   Percent of data   Average unit profit   Quantity sold
SPA       1,577                     0.44%             $0.34                 151,083 units
Non-SPA   308,562                   99.56%            $0.21                 18,513 units
12
CLASSIFICATION – DECISION TREE
13
Classification
The process of categorizing an instance based on shared characteristics
14
Decision Tree
• 1 Outcome Variable
• 2 Predictor Variables
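A tree with one outcome variable and two predictor variables can be sketched in a few lines of plain Python. The example below is illustrative only: the predictor names (quantity sold, low cost), the outcome (SPA or not), and the toy data are invented, and Gini impurity is used as the split criterion because it is a common choice, not because the slides say the team used it.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best split, or None."""
    best, n = None, len(rows)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best

def build_tree(rows, labels, depth=2):
    """Grow a tree; leaves are the majority label (0 or 1)."""
    majority = int(sum(labels) * 2 >= len(labels))
    split = best_split(rows, labels) if depth > 0 and gini(labels) > 0 else None
    if split is None:
        return majority
    j, t, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth - 1),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth - 1),
    }

def predict(tree, row):
    """Follow the splits until a leaf label is reached."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

# Toy data: (quantity_sold, low_cost) -> is_spa. Invented for illustration.
rows = [(50, 0.10), (80, 0.30), (150, 0.45), (200, 0.05),
        (5000, 0.25), (8000, 0.40), (9000, 0.10), (12000, 0.35)]
labels = [0, 0, 0, 0, 1, 1, 0, 1]
tree = build_tree(rows, labels, depth=2)
```

On this toy data the tree first splits on quantity sold and then on cost, so each path from root to leaf reads as a plain rule on the two predictors.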
15
Why Decision Trees?
Advantages:
– Easy to implement
– Easy to identify how parameter values lead to SPA creation
– Easy to discover potential SPAs
Challenges:
– Variable Selection
16
Overfit Tree
17
Random Forest Variable Selection
18
[Figure: trees 1, 2, ..., 100 feeding into Important Variables]
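The idea of ranking variables with a forest can be illustrated with a deliberately simplified sketch. It is not the team's procedure: each "tree" here is a single split (a stump) fitted to a bootstrap sample with a random subset of features, and importance is the total decrease in Gini impurity credited to each feature. The data is synthetic, with the outcome depending only on the first column, and all names are hypothetical.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_decrease(rows, labels, features):
    """Largest impurity decrease from one threshold split on the given features."""
    n, parent = len(rows), gini(labels)
    best = (0.0, None)
    for j in features:
        # Skip the maximum value so the right-hand side is never empty.
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            if parent - child > best[0]:
                best = (parent - child, j)
    return best

def forest_importance(rows, labels, n_trees=100, n_try=2, seed=0):
    """Normalized importance per feature from a forest of one-split trees."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    totals = [0.0] * d
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample of rows
        feats = rng.sample(range(d), n_try)          # random subset of features
        dec, j = best_decrease([rows[i] for i in idx], [labels[i] for i in idx], feats)
        if j is not None:
            totals[j] += dec
    total = sum(totals) or 1.0
    return [t / total for t in totals]

# Synthetic data: only the first column drives the outcome.
names = ["quantity_sold", "noise_a", "noise_b", "noise_c"]
data_rng = random.Random(1)
rows = [tuple(data_rng.random() for _ in names) for _ in range(100)]
labels = [int(r[0] > 0.7) for r in rows]

importance = forest_importance(rows, labels)
ranking = sorted(zip(names, importance), key=lambda pair: pair[1], reverse=True)
```

A full random forest grows deep trees and usually comes from a library; the stump version only shows why a variable that truly separates the classes accumulates most of the importance.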
19
Selected Predictor Variables
1. Quantity Sold
2. High Sale Price
3. High Cost
4. Low Cost
20
#1
#2
#3
21
#4
200 potential SPA relationships (42 + 15 + 52 + 91 potential SPAs across the tree subsets)
22
VALIDATION
23
Profitability of Potential SPAs
Top 1,000 Customers, run through Decision Tree:
– SPAs not in Tree: 1,577 ($5K profit per SPA)
– SPAs in Tree: 141 ($362K profit per SPA)
– Potential SPAs: 200 ($42K profit per potential SPA)
24
Comparing Relationships
Subset of 609 unique products that occur as both an SPA and Non-SPA for various customers.
Average...   Quantity   Sale     Cost    Profit per item   Profit per product
SPA          255,640    $9.30    $5.70   $3.60             $920,000
Non-SPA      18,279     $12.13   $7.25   $4.88             $89,201
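The figures in this comparison are consistent with two simple relations: profit per item equals sale price minus cost, and profit per product is roughly quantity times profit per item. The slides do not state these formulas, so the check below treats them as assumptions; the dictionary layout and function names are hypothetical.

```python
from decimal import Decimal

# Averages copied from the comparison table.
table = {
    "SPA": {"quantity": 255640, "sale": Decimal("9.30"), "cost": Decimal("5.70")},
    "Non-SPA": {"quantity": 18279, "sale": Decimal("12.13"), "cost": Decimal("7.25")},
}

def profit_per_item(row):
    """Assumed relation: sale price minus cost."""
    return row["sale"] - row["cost"]

def profit_per_product(row):
    """Assumed relation: quantity times profit per item."""
    return row["quantity"] * profit_per_item(row)

for name, row in table.items():
    print(name, profit_per_item(row), profit_per_product(row))
```

Under these assumptions the SPA row gives $3.60 per item and about $920,304 per product, and the Non-SPA row gives $4.88 per item and about $89,201.52 per product, which agrees with the table after rounding. The lower per-item margin under an SPA is outweighed by the much higher quantity.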
25
Comparison Graph
Log( Quantity ) vs. Log( Profit per Unit )
Legend: • SPA • Non-SPA
1. Each point is a product
2. Not readable
3. Regression lines: ratio between profit and quantity
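A regression line on a log-log plot like this one can be obtained with ordinary least squares on the logged values. The sketch below assumes that is what the lines represent; the function name and the synthetic data are invented, and both quantity and profit per unit must be positive for the logarithm to exist.

```python
import math

def loglog_fit(quantities, profits_per_unit):
    """Least-squares fit of log(profit) = intercept + slope * log(quantity)."""
    xs = [math.log(q) for q in quantities]
    ys = [math.log(p) for p in profits_per_unit]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic products following profit = 2 * quantity ** -0.5 exactly.
quantities = [10, 100, 1000, 10000]
profits = [2 * q ** -0.5 for q in quantities]
slope, intercept = loglog_fit(quantities, profits)
```

The slope of such a line is the percentage change in profit per unit associated with a one percent change in quantity, so fitting SPA and Non-SPA products separately gives two slopes that can be compared directly.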
26
Comparison Graph
Log( Quantity ) vs. Log( Profit per Unit )
Legend: • SPA • Non-SPA
4. 2,830 is the minimum quantity threshold for a potential SPA
5. Products with a quantity sold greater than 2,830 average a profit increase
Comparing Tree with Next 1000 Customers
Top 1000 Customers (Original) vs. Next 1000 Customers
Initial Split:
• Items can't be too cheap (cost greater than 20¢ vs 38¢)
• Must sell reasonable volume
First Group of SPAs with higher volume:
• Items on cheaper side (cost of 33¢ vs 50¢)
• Higher Quantity Sold (300k vs 40k)
Second Group of SPAs with lower volume:
• Items have higher cost and sale price
• Items sell medium volume (2.8k vs 312)
28
SUMMARY
29
Business Action
1. Pick a new set of data
2. Select variables with Random Forest
3. Create Decision Tree with random subset of the data (~30%)
– Including all SPAs in subset
4. Identify parameter values describing SPA subsets from tree
   Subset of Tree   Low Cost   Quantity Sold    High Cost
   #1               > $0.20    > 118,00 units   > $0.33
5. Extract relationships matching the parameter values from each subset that should be potential SPAs
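The final extraction step might look like the sketch below, which expresses one tree subset as a list of conditions and keeps the non-SPA relationships that satisfy all of them. The field names and toy records are hypothetical. The two cost thresholds follow subset #1 above; the quantity threshold is a placeholder, because the value printed on the slide ("118,00 units") is ambiguous.

```python
import operator

# One tree subset as (variable, comparison, threshold) conditions.
SUBSET_1 = [
    ("low_cost", operator.gt, 0.20),
    ("quantity_sold", operator.gt, 100_000),  # placeholder, not the slide's value
    ("high_cost", operator.gt, 0.33),
]

def matches(relationship, rule):
    """True when the relationship satisfies every condition in the rule."""
    return all(op(relationship[var], threshold) for var, op, threshold in rule)

def potential_spas(relationships, rule):
    """Relationships that match the rule but do not already have an SPA."""
    return [r for r in relationships if not r["is_spa"] and matches(r, rule)]

# Toy relationships, invented for illustration.
relationships = [
    {"id": 1, "low_cost": 0.25, "quantity_sold": 150_000, "high_cost": 0.40, "is_spa": False},
    {"id": 2, "low_cost": 0.15, "quantity_sold": 150_000, "high_cost": 0.40, "is_spa": False},
    {"id": 3, "low_cost": 0.25, "quantity_sold": 150_000, "high_cost": 0.40, "is_spa": True},
    {"id": 4, "low_cost": 0.25, "quantity_sold": 5_000, "high_cost": 0.40, "is_spa": False},
]
candidates = potential_spas(relationships, SUBSET_1)
```

Keeping each subset as data rather than hard-coded conditions means the same function can be reused for every subset read off a newly trained tree.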
30
FUTURE WORK
31
Suggestions for Future Work
– Continue to look at the next top 1,000 customer-product relationships based on quantity sold
– Discover weak SPAs and flag them for removal
– Examine a new model without weak SPAs
– Explore different classification methods such as Random Forests
32
ACKNOWLEDGMENTS
33
Acknowledgments
– Fastenal: Brain Keeling and Andrew Kores
– “PIC Math is a program of the Mathematical Association of America (MAA) and the Society for Industrial and Applied Mathematics (SIAM). Support is provided by the National Science Foundation (NSF grant DMS-1345499).”
– Dr. Song Chen and Dr. Chad Vidden
34
QUESTIONS?
35