Download PDF

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Data assimilation wikipedia , lookup

Linear regression wikipedia , lookup

Regression analysis wikipedia , lookup

Least squares wikipedia , lookup

Coefficient of determination wikipedia , lookup

Transcript
Is Chocolate Milk the New-Age Energy\Sports Drink in the United States?
Senarath Dharmasena*
Oral Capps, Jr.*
*Agribusiness, Food and Consumer Economics Research Center (AFCERC)
Texas A&M University
Emails: [email protected]
[email protected]
Phone: (979) 845-8491
Selected Paper prepared for presentation at the Southern Agricultural Economics
Association Annual Meeting, Corpus Christi, TX, February 5-8, 2011
Copyright 2011 by Senarath Dharmasena and Oral Capps, Jr. All rights reserved. Readers
may make verbatim copied of this document for non-commercial purposes by any means,
provided that this copyright notice appears on all such copies.
Is Chocolate Milk the New-Age Energy\Sports Drink in the United States?
Senarath Dharmasena and Oral Capps, Jr.
Abstract
Data from U.S. households for calendar year 2008 were used in examining
demographic and economic factors affecting demand for chocolate milk using Heckman twostep procedure. Price, income, age, education, region, race, Hispanic status, and presence of
children were significant drivers of consumption of chocolate milk. Sample selection bias
was statistically significant.
Key Words: Chocolate milk, Nielsen HomeScan data, Heckman two-step
JEL Classification: D11, D12
Is Chocolate Milk the New-Age Energy\Sports Drink in the United States?
Senarath Dharmasena and Oral Capps, Jr.
Background:
Energy and sports drinks are sold in the market as a functional beverage serving to
boost on-the-go lifestyles and to rejuvenate individuals after workouts (Beverage Marketing
Corporation, 2010). However, Karp et al., (2006) and Thomas et al., (2009) suggest
consumption of chocolate milk vis-à-vis sports and energy drinks is an effective recovery aid
after prolonged workouts. To strengthen the role of chocolate milk as a new-age
sports/energy drink, the “Got Milk?” campaign with the participation of U.S Olympic
celebrities promotes chocolate milk as an “easy, effective and cost efficient” way to fuel up
the body after an intense workout (Brandweek, 2010). According to NPD Group (2010) and
Nielsen (2010), U.S. consumption of chocolate milk is growing and servings of plain
chocolate milk grew from 1.2 billion in 2009 to 1.4 billion in 2010.
Given this backdrop, knowledge of price sensitivity, substitutes/complements and
demographic profiling with respect to consumption of chocolate milk is important for
manufacturers, retailers and advertisers of chocolate milk from a competitive intelligence
perspective as well as from a strategic decision-making perspective. We could not find any
past study pertaining to demand for chocolate milk in the extant literature. Therefore, to our
knowledge, our study is the first to examine the economic and demographic factors
determining U.S. demand for chocolate milk.
Objectives
A thorough and a complete analysis of demand for chocolate milk is important due to
increasing growth in consumption in recent times as an alternative beverage to sports and
energy drinks and to the lack of information in the literature. In this light, specific objectives
are: (1) to determine the factors affecting the decision to purchase chocolate milk, and (2)
once the decision to purchase chocolate milk is made, to determine the drivers of purchase
volume.
Methodology
At first, household purchases of chocolate milk (expenditure and quantity) and socioeconomic-demographic characteristics are generated for each household in the Nielsen
HomeScan Panel for calendar year 2008 (total of 61,440 households). Only 15,078 unique
households purchased chocolate milk. Quantity data are standardized in terms of liquid
ounces and expenditure data are expressed in terms of dollars. Then taking the ratio of
expenditure to volume, we generate unit values (prices in dollars per ounce). Using this data
set, we estimate demand for drinkable yogurts with adjustment to sample selection bias
(Heckman, 1979).
Factors hypothesized to affect the decision to buy chocolate milk and volume of
chocolate milk purchased are: price of chocolate milk, and host of demographic
characteristics such as, gender, employment and education status of the household head;
region; race; Hispanic origin; age and presence of children, and income of the household
head.
Model Development, Procedures and Variables
Choice to purchase or not to purchase chocolate milk could be affected by price of
chocolate milk and various demographic factors. This type of choice is a dichotomous
discrete (buy or not-to buy or “one” if buy and “zero” if do not buy) and a probit model is
used generally to model such a choice decision. The dependent variable is a zero one type
dummy variable which is created to reflect the non-purchase or purchase respectively of
chocolate milk. It is regressed on price and a host of demographic factors. Probit analysis
will provide statistically significant findings of the decision to purchase chocolate milk.
Demographic and economic factors hypothesized to be affecting the decision to buy
chocolate milk are listed on Table 1. Also, we provide different categories used in each factor
along with base category for dummy variables.
The probit model for chocolate milk can be written as follows:
Pr(Y = 1 | xi′β ) = β 1 + β 2 PRICE i + β 3 AGEHH 2529 i + β 4 AGEHH 3034 i +
β 5 AGEHH 3544 i + β 6 AGEHH 4554 i + β 7 AGEHH 5564 i + β 8 AGEHHGT 64 i +
β 9 EMPHHPTi + β 10 EMPHHFTi + β 11 EDUHHHS i + β 12 EDUHHU i +
β 13 EDUHHPC i + β 14 MIDWEST i + β 15 SOUTH i +
β 16WESTi + β 17 BLACK i + β 18 ASIAN i +
(1)
β 19 OTHER i + β 20 HISP _ YES i + β 21 AGEPCLT 6 _ ONLYi +
β 22 AGEPC 6 _ 12ONLYi + β 23 AGEPC13 _ 17ONLYi +
β 24 AGEPCLT 6 _ 6 _ 12ONLYi + β 25 AGEPCLT 6 _ 13 _ 17ONLYi +
β 26 AGEPC 6 _ 12 AND13 _ 17ONLYi + β 27 AGEPCLT 6 _ 6 _ 12 AND13 _ 17 i +
β 28 MHONLYi + β 29 FHONLYi + β 30 INCOME i
where i = 1,......, n is the number of households. Y corresponds to the decision to buy chocolate
milk. Variables are defined in Table 1.
A common characteristic in micro level data (data gathered at consumer level such as
at the individual or household level) is a situation where some consumers do not purchase
some items during the sampling period and presence of them in the sample creates a zero
consumption level for that data period. The data used in this study are gathered at household
level and due to that it suffers from zero consumption data. As such we face a censored
sample of data. Application of ordinary least squares (OLS) to estimate a regression with a
limited dependent variable (such as in a censored sample like ours) usually give rise to biased
estimates, even asymptotically (Kennedy, 2003). Removing all observations pertaining to
zero purchases and estimating regression functions only for non-zero purchases too creates a
bias in the estimates. This phenomenon also is known as sample selection bias. Heckman
(1979) stated that not adjusting for sample selection may result in biased estimates of the
demand parameters. Furthermore, Heckman (1979), discussed the sample selection bias as a
specification error, and developed a simple consistent estimation method that eliminates the
specification error for the case of censored samples. It is known as Heckman-type correction
procedure.
The first stage of the Heckman-two-step sample selection procedure, involves in
decision to purchase chocolate milk. It is modeled through a probit model. A binary
dependent variable is observed (purchase or not purchase), where purchase is represented by
one (1) and not purchase is given by a zero (0). The latent selection equation can be written
as follows;
Z h = w′h γ + ε h
(2)
where Z k represents a latent selection variable (buy or not to buy type dichotomous
variable),
1 if Z h > 0
Zh = 
0 if Z h < or = 0
(3),
wh is a vector of explanatory variables in the latent decision making variable, γ h is a
vector of parameters to be estimated in the decision making equation, ε h is the error term,
and h = 1,2,....., N is the number of observations (in our work the number of households in
the sample) in the sample. Modeling above equation 2 through probit model gives us
following relationships;
Pr[ Z h = 1] = φ ( wh , γ )
Pr[ Z h = 0] = 1 − φ ( wh , γ )
(4) and
(5)
where φ is the normal cumulative probability distribution function (cdf). The first
stage estimation provides estimates of γ and the inverse of the Mills Ratio (IMR hereinafter).
We also generate the associated probability density function (pfd). Inverse of Mills Ratio is
calculated taking the ratio of pdf to cdf. Mathematically, it is as follows;
for Z k = 1 , IMRh =
ϕ ( wh γˆ )
φ ( wh γˆ )
(6),
where ϕ represents the probability density function. Inverse mills ratio is a monotone
decreasing function of the probability that an observation is selected into the sample,
φ ( wk γˆ k ) (Heckman, 1979). In particular,
limφ ( Z hi )→1 IMRh = 0
(7)
lim φ ( Z hi )→0 IMRh = ∞
(8)
∂IMRh
<0
∂φ ( Z h )
(9)
The calculated IMR, will be used as an additional explanatory variable in the second
stage volume equation, which takes care of the sample selection bias in the data. Second
stage equation is given as follows;
E[Yh | Z h = 1] = X h′ β + α
ϕ ( wh γˆ )
φ ( wh γˆ )
(10)
E[Yh | Z h = 1] = X h′ β + αIMRˆ h
(11)
where X k is a vector of explanatory variables considered in the second stage. Importantly,
only observations associated with non-zero observations on Yk are considered here. The IMR
calculated using information retrieved from first stage probit model is used as an explanatory
variable in the second stage (see equations 10 and 11 above). Presence of a sample selection
bias in data will be communicated through statistical significance of the coefficient
associated with IMR, i.e. α k . If α k is statistically not different from zero, we conclude that
there is no sample selection bias in the data and result in the following regression model;
E[Yh | Z h = 1] = X h′ β i
(12)
It is important to know that the explanatory variables in first stage and second stage
equations may or may not be the same. In our work, the price variables in both equations do
not. However, rest of the demographic variables is exactly the same in the first stage and
second stage.
Choice of explanatory variables in the first and second stage has an implication on the
derivation and interpretation of marginal effects associated with variables in the second
stage. This is because in the second stage, we have the IMR term augmenting the regular
regression function with other explanatory variables. Therefore, in calculating marginal
effects, the influence of IMR and its associated regression coefficient on other regression
coefficients have to be taken into consideration.
Suppose X kj denote the jth regressor that is common to both first stage regressors, wk
and, second stage regressors, X j . Differentiating equation 11 with respect to jth regressor,
the marginal effect is given by the following relationship (following explanation is borrowed
from Saha, Capps and Byrne (1997));
∂E[Yh | Z h = 1]
∂ ( IMRˆ hi )
= β ij + α i
∂X kj
∂X hj
(13)
It is evident from 13 that marginal effect of the jth regressor on Yki consists of two parts: a
change in X j which affects the probability of consuming the commodity (this effect is
represented by
∂ ( IMRˆ hi )
in 13); a change in X j which affects the level of consumption (or
∂X hj
expenditure of consumption) which is conditional upon the household choosing to consume
the ith commodity (this is represented by β ij in 13). The former of the above two expression is
important, because the sign and magnitude of the marginal effect depends not only on the β ij ,
but also that of the
∂ ( IMRˆ hi )
. According to Saha, Capps and Byrne (1997), after some
∂X hj
simplification we get arrive at the following relationship for the Heckman second stage
marginal effects,
MEˆ kj =
∂E[ y k | Z = 1]
= β j − αγ ij {WγIMRk + ( IMRk ) 2 }
∂X kj
(14)
In general the marginal effect MEˆ kj ≠ βˆ j ; however the only case where MEˆ kj = β̂ j is where
αˆ = 0 which is a situation where the errors in the first-stage and second-stage estimation
equations have zero covariance. It must be noted that the MEˆ kj estimation depends on a local
set of co-ordinates. Therefore, we estimate the MEˆ kj at the sample means. Following equation
14 shows this result. For simplicity, let us denote IMR in the letter λ .
MEˆ kj | samplemean = βˆ j − αˆ i γˆ j {(W γˆ )λˆ + λˆ 2 }
(15)
where W denotes the vector of regressor sample means in the probit equation (the first stage
equation of the Heckman two-step model and
λˆ =
ϕ (W γˆ )
φ (W γˆ )
(16)
is the inverse Mills ratio evaluated at those means.
The Heckman two-step demand model for chocolate milk can be written as follows:
q i = β 1 + β 2 Pi + β 3 AGEHH 2529 i + β 4 AGEHH 3034 i +
β 5 AGEHH 3544 i + β 6 AGEHH 4554 i + β 7 AGEHH 5564 i + β 8 AGEHHGT 64 i +
β 9 EMPHHPTi + β 10 EMPHHFTi + β 11 EDUHHHS i + β 12 EDUHHU i +
β 13 EDUHHPC i + β 14 REG _ CENTRAL i + β 15 REG _ SOUTH i +
β 16 REG _ WESTi + β 17 RACE _ BLACK i + β 18 RACE _ ORIENTAL i +
(17)
β 19 RACE _ OTHER i + β 20 HISP _ YES i + β 21 AGEPCLT 6 _ ONLYi +
β 22 AGEPC 6 _ 12ONLYi + β 23 AGEPC13 _ 17ONLYi +
β 24 AGEPCLT 6 _ 6 _ 12ONLYi + β 25 AGEPCLT 6 _ 13 _ 17ONLYi +
β 26 AGEPC 6 _ 12 AND13 _ 17ONLYi + β 27 AGEPCLT 6 _ 6 _ 12 AND13 _ 17 i +
β 28 MHONLYi + β 29 FHONLYi + β 30 INCOME i + α i IMR + ε i
where i = 1,......, n is the number of observations (households in our work) in the model. q i
corresponds to the quantity of purchase of chocolate milk and Pi variable represent the price
of chocolate milk. We have defined the variables in the above equation 17 in Table 1. In the
equation 17, IMR stands for the inverse Mills ratio and α i corresponds to the coefficient
associated with IMR. Presence of sample selection bias is determined looking at the
significance of α i . If we have sample selection bias, we have to do an adjustment to the
coefficient estimates in the second stage estimation in trying to get at correct marginal
effects. Procedure to adjust for marginal effects was elaborated in the preceding section.
As such, we will calculate marginal effects associated with each explanatory variable.
The level of significance we will be using in this study is 0.05. We further conduct an F-test
for demographic variable categories to find statistically significant demographics.
Results and Discussion
Market penetration for chocolate milk is 25 percent. The average at-home quantity of
chocolate milk consumed is 404 ounces per household per year and the average price is $0.04
per ounce. Factors affecting the probability of purchase of chocolate milk (the decision to
buy) are, price of chocolate milk, household income, age of household head, education status
of household head, region, race, Hispanic household head, age and presence of children, and
gender of household head. The factors affecting the volume of purchase of chocolate milk are
price of chocolate milk, household income, age of household head, education status of the
household head, region, race, Hispanic household head, age and presence of children in the
household, and gender of household head. The own-price elasticity of demand for chocolate
milk was estimated to be -0.04. Sample selection bias was statistically significant.
References:
Beverage Marketing Corporation, 2010
Brandweek. (2010).”Got Chocolate Milk? Apolo Ohno Does”. www.brandweek.com
(internet accessed on August 13, 2010)
Heckman, J. J., (1979). “Sample Selection Bias as a Specification Error”. Econometrica 47:
153-161
Karp, J.S., J.D. Johnston, S. Tecklenburg, T.M. Mickleborough, A.D. Fly, and J.M. Stager.
(2006). “Chocolate Milk as a Post-Exercise Recovery Aid”. International Journal
of Sport Nutrition and Exercise Metabolism 16: 78-91
Nielsen. (2010). www.nielsen.com (internet accessed on September 13, 2010)
NPD Group. (2010). www.npd.com (internet accessed on September 13, 2010)
Thomas, K., P. Morris, and Emma Stevenson. (2009). “Improved Endurance Capacity
following Chocolate Milk Consumption compared with 2 Commercially Available
Sport Drinks”. Applied Physiological Nutrition Metabolism 34: 78-82
Saha, A., O.Capps. Jr, et al. (1997). "Calculating Marginal Effects in DichotomousContinuous Models." Applied Economic Letters 4(3): 181-185.
Kennedy, P. (2003). Limited Dependent Variables. A Guide to Econometrics, MIT Press.
Table 1 Description of the Right-Hand Side Variables Used in the Econometric Analysis
Variable
PRICE
Explanation
Price of Chocolate Milk
AGEHHLT25
AGEHH2529
AGEHH3034
AGEHH3544
AGEHH4554
AGEHH5564
AGEHHGT64
EMPHHNFP
EMPHHPT
EMPHHFT
Age of Household Head less than 25 years (Base category)
Age of Household Head between 25-29 years
Age of household Head between 30-34 years
Age of household Head between 35-44 years
Age of household Head between 45-54 years
Age of household Head between 55-64 years
Age of household Head greater than 64 years
Household Head not employed for full pay (Base category)
Household Head Part-time Employed
household Head Full-time Employed
EDUHHLTHS
EDUHHHS
EDUHHU
EDUHHPC
Education
Education
Education
Education
EAST
MIDWEST
SOUTH
WEST
WHITE
BLACK
ASIAN
RACE_OTHER
HISP_NO
HISP_YES
of
of
of
of
Household
Household
Household
Household
Head:
Head:
Head:
Head:
Less than high school (Base category)
High school only
Undergraduate only
Some post-college
Region: East (Base category)
Region: Central (Midwest)
Region South
Region West
Race
Race
Race
Race
White (Base category)
Black
Oriental
Other (non-Black, non-White, non-Oriental)
Non-Hispanic Ethnicity (Base category)
Hispanic Ethnicity
Table 1 Continued….
Variable
NPCLT_18
AGEPCLT6_ONLY
AGEPC6_12ONLY
AGEPC13_17ONLY
AGEPCLT6_6_12ONLY
AGEPCLT6_13_17ONLY
AGEPC6_12AND13_17ONLY
AGEPCLT6_6_12AND13_17
FHMH
MHONLY
FHONLY
Explanation
No Child less than 18 years (Base category)
Age and Presence of Children less than 6-years
Age and Presence of Children between 6-12 years
Age and Presence of Children between 13-17 years
Age and Presence of Children less than 6 and 6-12 years
Age and Presence of Children less than 6 and 13-17 years
Age and Presence of Children between 6-12 and 13-17 years
Age and Presence of Children less than 6, 6-12 and 13-17 years
Household Head both Male and Female (Base category)
Household Head Male only
Household Head Female only