Applying Analytics to Search Engine Marketing (SEM):
(i) Modeling KW Revenue Per Click (RPC)
(ii) Harnessing the Long Tail of KWs
Sameer Chopra
VP, Marketing Analytics
Orbitz Worldwide
March 2011
Proprietary and Confidential
• Notes/Caveats
• Importance of deriving Rev Per Click (RPC) for KWs
– Recap: (i) Google AdWords Ad Ranking logic
(ii) Max CPC Bid → Actual CPC relationship
• Why is the Long Tail of keywords important?
• Tapping into the Long Tail opportunity: Mining the tail for KW addition
• Decision Tree approach for RPCs: Meta-data driven matrix-of-rules
• Efficiency Curves: Measuring Efficiency Gains
• Q&A
• Speaking to a blend of different experiences and approaches over the years and across different firms
• This is just one of many approaches in use (e.g., (i) two-stage modeling, (ii) predicting
Revenue separately from Clicks, etc.)
• Re: title of the presentation: analytics is applied to many other areas of SEM as well:
– Match Type Analysis
– Day Parting Analysis
– Geo Targeting Analysis
– Spend Allocation Optimization
These topics cannot be covered in the time available for this talk.
Importance of Deriving Rev-Per-Click (RPC) for KWs
• Given a bid (Max CPC), actual cost (Actual CPC) is well understood:
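$$\text{Actual CPC}_A = \frac{\text{Max CPC}_B \times QS_B}{QS_A} + \$0.01 \quad \text{(rounded up to the nearest penny)}$$
where A is the advertiser being charged, B is the advertiser ranked immediately below A, and QS is Quality Score.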
• But how much to bid in the first place? – this is where KW valuation (RPC) comes
into play.
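A minimal sketch of the Actual CPC relationship above, for readers who want to check the arithmetic. The function name `actual_cpc` and the sample bids and Quality Scores are hypothetical; only the formula itself comes from the slide.

```python
from decimal import Decimal, ROUND_CEILING

def actual_cpc(max_cpc_below, qs_below, qs_own):
    """Actual CPC paid by advertiser A, given advertiser B ranked just below.

    Implements Actual CPC_A = (Max CPC_B * QS_B) / QS_A + $0.01,
    rounded up to the nearest penny. Decimal avoids float rounding noise.
    """
    ad_rank_below = Decimal(str(max_cpc_below)) * Decimal(str(qs_below))
    raw = ad_rank_below / Decimal(str(qs_own)) + Decimal("0.01")
    return raw.quantize(Decimal("0.01"), rounding=ROUND_CEILING)

# Hypothetical example: B bids $1.00 with QS 5; A has QS 7.
# 5.00 / 7 = 0.714..., plus 0.01 = 0.724..., rounded up to 0.73.
print(actual_cpc("1.00", 5, 7))
```

Note that a higher own Quality Score lowers the price paid for the same position, which is why the bid itself still has to come from a separate valuation (RPC).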
Google AdWords Ad Ranking Logic
Source: Google AdWords Learning Center
Why is the Long Tail of KWs important?
Long Tail: search terms with 4+ words, which now represent the majority of user queries (compare with
short, broad phrases for head KWs: “Nike shoes” vs. “Red Nike shoes for men size 12”)
• Increasing competition (more ads per keyword) leaves advertisers facing higher bid prices
and an uphill task in maximizing ROI; the long tail can be a very handy lever
• Long Tail KWs do not have high search volume individually, but in aggregate the Long Tail provides the
vast majority of a site’s search traffic and conversions
• Long-tail user queries are often very specific and highly qualified, with searchers further
along the purchase cycle
• LT KWs are more targeted and more likely to convert than generic, non-branded head KWs
• Long Tail keywords can offer incredible ROI because they're less competitive and less
expensive to bid on for PPC
• ~70% of user queries have no exact-matched KWs: a big ROI opportunity in the long tail.
Tapping into the Long Tail Opportunity: Mining the Tail for KW Addition
• A common mistake: bidding almost all KWs on Broad Match on auto-pilot
– Precludes being relevant & targeted (copy/LP); a “one size fits all queries” max bid
• Use Broad Match as a 'net' to fish for LT queries, the diamonds in the rough:
– Analyze which queries result in higher conversion, bid on these on exact match, and
group them into Ad Groups
– [Exact Matching is more targeted and generally results in higher Quality Score, lower CPCs, and
higher ROI]
• Analyze which queries do not convert: perfect fodder for negative keywords
• Note: Utilizing Broad Match is a good strategy if done effectively; not using BM at all is
akin to 'throwing the baby out with the bath water' given the value in the long tail queries
• Account Structure Organization: There will be an added necessity to get smart about
grouping/clustering the Long Tail KWs into tightly themed ad groups with associated copy &
LPs, but it is well worth the effort since you capture searcher intent effectively.
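The mining step above could be sketched roughly as follows. This is an illustration only: the `QueryStats` record, the `triage_queries` function, the thresholds (`min_clicks`, `promote_cvr`, `negative_cvr`) and the sample rows are all hypothetical, not values from the talk.

```python
from dataclasses import dataclass

@dataclass
class QueryStats:
    query: str        # raw user query that matched a Broad Match KW
    clicks: int
    conversions: int
    cost: float

def triage_queries(rows, min_clicks=50, promote_cvr=0.05, negative_cvr=0.0):
    """Split broad-matched queries into exact-match and negative-KW candidates.

    Thresholds are assumptions for illustration; tune them to your own data.
    """
    promote, negatives = [], []
    for r in rows:
        if r.clicks < min_clicks:
            continue  # too little data to judge either way
        cvr = r.conversions / r.clicks
        if cvr >= promote_cvr:
            promote.append(r.query)      # candidate for an exact-match KW
        elif cvr <= negative_cvr:
            negatives.append(r.query)    # candidate for a negative keyword
    return promote, negatives

rows = [
    QueryStats("red nike shoes for men size 12", 100, 8, 40.0),
    QueryStats("free shoes", 200, 0, 60.0),
    QueryStats("nike shoes", 30, 5, 20.0),           # below min_clicks
    QueryStats("nike running shoes", 100, 2, 50.0),  # in between: leave on broad
]
promote, negatives = triage_queries(rows)
print(promote, negatives)
```

Queries in the middle band stay on Broad Match, which keeps the 'net' in place while the clear winners and losers are pulled out.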
Decision Tree approach for RPCs: Meta-data driven matrix of rules
• TOKENIZATION: Categorization/mapping of KW tokens to capture intent
– Useful variables for segmentation & tree splitting, and for KW generation
KW = “Red Nike Shoes”
• Product = 'Shoes'
• Color = 'Red'
• Brand Name = 'Nike'
KW = “Charming Bed & Breakfast in Boston”
• Product = 'B&B'
• Modifier = 'Charming'
• City = 'Boston'
• VARIABLE CREATION: Creative derivation of additional variables is a very important step
in building an effective D-tree model
• # of Tokens in KW
• Length of KW
• KW contains <XYZ>
• Brand Y/N Indicator, etc.
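As a rough sketch of the tokenization and variable-creation steps above: the lookup table `TOKEN_MAP`, the `BRANDS` set and both function names are hypothetical stand-ins for whatever meta-data dictionary is actually maintained; a production mapping would be far larger.

```python
import re

# Hypothetical phrase -> (category, value) meta-data; illustration only.
TOKEN_MAP = {
    "shoes": ("Product", "Shoes"),
    "bed & breakfast": ("Product", "B&B"),
    "b&b": ("Product", "B&B"),
    "red": ("Color", "Red"),
    "nike": ("Brand Name", "Nike"),
    "charming": ("Modifier", "Charming"),
    "boston": ("City", "Boston"),
}
BRANDS = {"nike"}

def tokenize(keyword):
    """Map known phrases in a KW to category/value pairs."""
    text = keyword.lower()
    attrs = {}
    # Longest phrases first, so multi-word phrases win over their parts.
    for phrase in sorted(TOKEN_MAP, key=len, reverse=True):
        # Whole-phrase match only, so "red" does not fire inside "hundred".
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        if re.search(pattern, text):
            category, value = TOKEN_MAP[phrase]
            attrs.setdefault(category, value)
            text = re.sub(pattern, " ", text)
    return attrs

def derive_variables(keyword):
    """Derived variables of the kind listed above."""
    tokens = keyword.split()
    return {
        "num_tokens": len(tokens),
        "kw_length": len(keyword),
        "is_brand": any(t.lower() in BRANDS for t in tokens),
    }

print(tokenize("Charming Bed & Breakfast in Boston"))
print(derive_variables("Red Nike Shoes"))
```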
Regression Tree: pseudo-CART approach
[Figure: illustration of a Decision Tree model for RPCs]
Example: KW = “Looking for a Discounted B&B in SFO that accepts Pets” would get segmented with the KWs in
the bottom-right leaf node, which has a Dtree Model RPC = $0.35
The final prediction blends the Dtree Model RPC with the KW's actual RPC, where the weight α_kw is used to
incorporate actuals for KWs with lots of information.
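The blending formula itself is not reproduced in this text, so the following is only one plausible form, assumed for illustration: a convex combination of the KW's actual RPC and the leaf-node RPC, with a weight that grows with click volume. The function name `blended_rpc`, the shape of the weight and the constant `k` are all assumptions.

```python
def blended_rpc(actual_rpc, tree_rpc, clicks, k=100):
    """Blend a KW's observed RPC with its decision-tree leaf RPC.

    ASSUMPTION: alpha_kw = clicks / (clicks + k), so KWs with lots of
    click data lean on their own actuals and sparse KWs lean on the tree.
    """
    if clicks < 0 or k <= 0:
        raise ValueError("clicks must be >= 0 and k must be > 0")
    alpha = clicks / (clicks + k)
    return alpha * actual_rpc + (1 - alpha) * tree_rpc

# A brand-new KW (no clicks) inherits the leaf value of $0.35.
print(blended_rpc(actual_rpc=0.80, tree_rpc=0.35, clicks=0))
# A KW with as many clicks as k sits halfway between actual and tree RPC.
print(blended_rpc(actual_rpc=0.80, tree_rpc=0.35, clicks=100))
```

Whatever the exact weighting, the effect is that predictions are no longer limited to one value per leaf.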
Decision Tree approach (continued)
• Some Pros:
– Trees represent rules, which are easily understood by humans and expressed in SQL/SAS/PMML etc.
– Easy to follow the path from root node to a leaf; the explanation for any prediction is easy to understand
– Trees carve up and completely cover the multi-dimensional space, enabling us to assign any new record to
an outcome based on which region it falls into
– Robust/insensitive to outliers, missing values, & skewed data distributions
– Non-parametric: do not assume that the dependent variable follows any given distribution
– Relatively little data preparation needed (e.g., for a categorical field with hundreds of values, groups are formed by the tree)
– Data exploration: can relatively quickly (no more than 1 pass for each level of the tree) explore large datasets
to determine useful input variables
• Some Cons:
– Over-fitting: could result in large, unstable trees with many leaf nodes, which do not lend themselves to easy
interpretation or prediction; need to prune the tree and validate against a holdout/validation data set
– 'Greedy' algorithm for splitting: looks at variables hierarchically/sequentially rather than simultaneously
– Not all data fits neatly into rectangular regions (a deep tree would be needed to approximate a
required diagonal split, for instance)
– Trees are usually used for classification; other data mining techniques are generally better suited to continuous estimation
– A D-tree can only generate as many discrete values as there are leaves in the tree, so we get discrete, 'lumpy',
step-function-like discontinuous estimates [Note: Since we use dampening with weights, final predictions
are a continuous spectrum of RPCs]
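To make the 'greedy' splitting and the 'lumpy' estimates concrete, here is a minimal sketch of a single regression-tree split chosen by minimizing the sum of squared errors (SSE). It is not a full CART implementation (no recursion, pruning or validation), and the sample rows and field names are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, features, target="rpc"):
    """Greedy step: try every feature/threshold, keep the lowest total SSE.

    Returns (feature, threshold, total_sse); rows with value < threshold
    go left, the rest go right. Each side predicts its own mean.
    """
    best = None
    for f in features:
        # Skip the smallest value: it would leave the left side empty.
        for threshold in sorted({r[f] for r in rows})[1:]:
            left = [r[target] for r in rows if r[f] < threshold]
            right = [r[target] for r in rows if r[f] >= threshold]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (f, threshold, total)
    return best

rows = [  # hypothetical KW-level data
    {"num_tokens": 2, "is_brand": 0, "rpc": 0.10},
    {"num_tokens": 2, "is_brand": 1, "rpc": 0.12},
    {"num_tokens": 3, "is_brand": 0, "rpc": 0.11},
    {"num_tokens": 5, "is_brand": 1, "rpc": 0.40},
    {"num_tokens": 6, "is_brand": 0, "rpc": 0.42},
    {"num_tokens": 6, "is_brand": 1, "rpc": 0.38},
]
feature, threshold, total = best_split(rows, ["num_tokens", "is_brand"])
print(feature, threshold)
```

Each variable is evaluated on its own at each step, and every KW on one side of the split receives the same mean RPC, which is the step-function behaviour noted in the cons.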
Efficiency Curves: Measuring Efficiency Gains
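[Figure: efficiency curves. x-axis: daily spend (S), $; y-axis: daily revenue (R), $.
Lower curve: R = A·S^n1, 0 < n1 < 1. Upper curve: R = B·S^n2, 0 < n2 < 1.
The efficiency gain is the gap between the two curves, compared over the common spend region.]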
• Note: Model fit vs. business lift: the two don't always go hand in hand.
Accuracy by itself is not a sufficient measure of a model's usefulness to business performance.
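One way the efficiency curves could be estimated, sketched under assumptions: each curve R = A·S^n is fitted by ordinary least squares on log R vs. log S, and the gain is read off at a spend level inside the common spend region. The function names and the sample spend/revenue points are hypothetical; the talk does not state how its curves were fitted.

```python
import math

def fit_power_curve(spend, revenue):
    """Least-squares fit of log R = log A + n * log S; returns (A, n)."""
    xs = [math.log(s) for s in spend]
    ys = [math.log(r) for r in revenue]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx
    a = math.exp(mean_y - n * mean_x)
    return a, n

def efficiency_gain(curve_before, curve_after, spend):
    """Revenue difference between the two fitted curves at one spend level."""
    a, n1 = curve_before
    b, n2 = curve_after
    return b * spend ** n2 - a * spend ** n1

# Hypothetical daily (spend, revenue) observations before and after a change.
before = fit_power_curve([100, 400, 900], [100, 200, 300])
after = fit_power_curve([100, 400, 900], [120, 240, 360])
print(before, after, efficiency_gain(before, after, 400))
```

Comparing at the same spend level keeps the gain from being confounded with a change in budget.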