Is web data capable of detecting firms’ activity status?
Desamparados Blazquez ([email protected])1 and Josep Domenech ([email protected])12
Keywords: business demography statistics, firms’ activity status, web data, website
1. INTRODUCTION
Business demography is one of the economic aspects that attracts the most attention from
governments and policy makers. Indeed, most official statistics institutions carry out
detailed surveys to monitor the active population of firms, their birth, survival and death.
Interest in business demography statistics stems from the important role that business
dynamics play in economic growth, productivity and employment [1].
In the Digital Era, the important role of the Internet in the economy and society, together with
the development of advanced computer systems, opens up new ways of monitoring
economic activities and, thus, business demography. The Internet and the World Wide
Web (WWW) have become basic tools in the daily activities of individuals and
companies. Companies have widely developed their own websites, where they describe their
main activities, products and strategies, in order to have a presence in digital channels.
Therefore, corporate website contents are necessarily connected to business activity to
some extent, as has recently been verified for topics such as technology adoption [2],
innovation activities [3], firm growth [4] and firm export orientation [5]. Given that a firm's
activities are reflected on its website, the question arises of whether a firm's inactivity is
also manifested there.
Since keeping a website updated requires firms to commit financial or human
resources, it is plausible that only active and healthy firms would invest their resources to
that end. Therefore, if a company dies, this event is likely to show on its website
as a lack of updates or, simply, as the site being down.
2. METHODS
The sample for this study included 780 companies with a website from manufacturing,
services and other sectors (NACE Rev.2 codes 10-95) that were established in Spain and
active in 2008. The dataset consists of a panel of economic and online data for these firms
between 2008 and 2014. The economic data were retrieved from the SABI database
[6]. The online information was obtained by accessing the corporate websites with the
Wayback Machine of the Internet Archive [7], which is a free public repository of
snapshots of about 484 billion web pages. It captures and stores websites on a daily basis,
allowing users to access them and track their evolution over time.
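As an illustration of how archived snapshots could be located programmatically, the
following minimal sketch builds one query per year of the 2008-2014 panel against the
Wayback Machine availability endpoint and reads the closest snapshot from a response.
It is not the procedure used in this study: the function names, the mid-year reference
date (1 July) and the example domain are assumptions made for the example.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Wayback Machine availability endpoint (returns JSON for the closest snapshot).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(site_url: str, year: int) -> str:
    """Build a query for the snapshot closest to 1 July of `year`.

    The mid-year reference date is an assumption of this sketch.
    """
    params = {"url": site_url, "timestamp": f"{year}0701"}
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def yearly_queries(site_url: str, first_year: int = 2008,
                   last_year: int = 2014) -> Dict[int, str]:
    """Return one availability query per year of the panel."""
    return {year: availability_query(site_url, year)
            for year in range(first_year, last_year + 1)}


def closest_snapshot(response: dict) -> Optional[str]:
    """Extract the archived URL from a decoded availability response.

    Returns None when the response lists no available snapshot.
    """
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest.get("url")


if __name__ == "__main__":
    # "example.com" is a placeholder domain, not a firm from the sample.
    for year, query in yearly_queries("example.com").items():
        print(year, query)
```

Fetching each query and comparing consecutive yearly snapshots would still be needed to
judge whether a homepage was down, unchanged or modified.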
To account for changes in the corporate websites, we queried the Wayback Machine with
the URL of each company’s website and checked the homepage for each year under study.
The observed changes were coded into the variable Web_status, which could take four
different values: “1” if the website was down; “2” if the website was unchanged; “3” if the
website had undergone minor changes, such as including or removing pictures or
sections; and “4” if the website had undergone major changes, such as having a
completely renewed design compared to the previous year’s version.
1 Department of Economics and Social Sciences, Universitat Politècnica de València, Valencia (Spain)
2 Department of Economics and Social Sciences, Universitat Politècnica de València, Valencia (Spain)
To assess the relation between the WWW and firms’ activity status, a multi-period
logistic regression was used, where the dependent variable was the firm’s status (Active or
Inactive) and the variable Web_status was used as an explanatory variable along with
period-specific effects.
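One way to write such a specification is sketched below. The notation is an assumption
of this sketch rather than the authors' own: i indexes firms, t indexes years, γ_t denotes
the period-specific effect, and Web_status = 1 (website down) is taken as the reference
category.

```latex
\[
\operatorname{logit}\,\Pr(\mathrm{Active}_{it} = 1)
  = \beta_0
  + \sum_{k=2}^{4} \beta_k \,\mathbf{1}\!\left[\mathrm{Web\_status}_{it} = k\right]
  + \gamma_t ,
\qquad
\mathrm{OR}_k = e^{\beta_k}
\]
```

Under this form, each OR_k compares the odds of being active for a firm in category k with
those of a firm whose website is down in the same period.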
3. RESULTS
Table 1 shows the estimation results, including the estimated regression coefficients (β),
Odds Ratios (OR), Standard Errors (SE) and p values. As can be seen, the observed web
status has a statistically significant effect on the probability of being active.
Table 1 - Multi-period logistic regression with website status

Variables        β        OR        SE       p value
(Intercept)      3.483    32.557    0.598    0.000
Web_status(2)    1.628     5.094    0.227    0.000
Web_status(3)    3.423    30.661    0.390    0.000
Web_status(4)    3.970    52.985    0.727    0.000

Log-likelihood: −349.349
Notes: Time dummies were included.
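The relation between the β and OR columns of Table 1 can be checked directly, since each
odds ratio is the exponential of its coefficient. The short sketch below recomputes the
ORs from the reported coefficients and also derives the Wald z statistic (β divided by
SE); the variable and function names are illustrative only.

```python
import math

# (beta, reported OR, SE) copied from Table 1.
TABLE_1 = {
    "(Intercept)": (3.483, 32.557, 0.598),
    "Web_status(2)": (1.628, 5.094, 0.227),
    "Web_status(3)": (3.423, 30.661, 0.390),
    "Web_status(4)": (3.970, 52.985, 0.727),
}


def odds_ratio(beta: float) -> float:
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)


def wald_z(beta: float, se: float) -> float:
    """Wald z statistic: coefficient divided by its standard error."""
    return beta / se


for name, (beta, reported_or, se) in TABLE_1.items():
    print(f"{name:<14} exp(beta)={odds_ratio(beta):8.3f} "
          f"reported OR={reported_or:8.3f} z={wald_z(beta, se):6.2f}")
```

Small differences in the last decimal are expected because the coefficients in the table
are rounded to three decimals.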
As the activity on the website increases, the likelihood of a firm being active increases too.
The estimate that corresponds to having an unchanged website (Code 2) is positive, which
means that merely having a working website increases the probability that a firm is active with
respect to having a down website (Code 1, which was used as the baseline level).
Concretely, the odds that a firm with an unchanged website is active are about 5 times those
of a firm whose website is down, as the OR indicates.
Updating the website, whether to a minor (Code 3) or major (Code 4) extent, also increases the
probability that a firm is active, as expected. The odds of a firm being active when
it moderately changed its website are about 30 times those of a firm whose website is down,
and more than 50 times when the website has been totally renewed. These results are
in line with what was hypothesized: healthy firms invest more in maintaining and updating
their websites, so the more activity they show on their website, the more probable it is
that they are active. It is important to remark that this does not mean that updating websites
contributes to firms remaining active, but rather that it is a strong reflection of the firm's
active status.
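Because an odds ratio scales odds rather than probabilities, the implied change in the
probability of being active depends on the baseline. The sketch below converts the ORs
reported in Table 1 into probabilities for two baseline probabilities of a firm with a
down website being active. These baselines (0.50 and 0.90) are hypothetical values chosen
for illustration and are not estimates from this study.

```python
def prob_after_odds_ratio(p_baseline: float, odds_ratio: float) -> float:
    """Probability obtained by multiplying the baseline odds by an odds ratio."""
    odds = p_baseline / (1.0 - p_baseline) * odds_ratio
    return odds / (1.0 + odds)


# Odds ratios reported in Table 1, keyed by Web_status code.
REPORTED_OR = {2: 5.094, 3: 30.661, 4: 52.985}

# Hypothetical baseline probabilities for a firm whose website is down.
for p_down in (0.50, 0.90):
    for code, or_value in REPORTED_OR.items():
        p_active = prob_after_odds_ratio(p_down, or_value)
        print(f"baseline={p_down:.2f} Web_status={code} -> P(active)={p_active:.3f}")
```

For instance, with a 0.50 baseline an OR of 5.094 corresponds to a probability of about
0.84, so the probability itself rises by well under a factor of five.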
4. CONCLUSIONS
This work analyzed and confirmed the connection between a company’s activity status and
its corporate website’s activity status. Multi-period logistic regression estimates indicated
that the corporate website clearly reflects the firm’s activity status.
These results open up new possibilities for monitoring business demography. Web data
capture a firm’s status, and access to corporate websites is open and inexpensive, making
it possible to build online indicators to monitor and forecast business death rates. This
could be done in a very short period thanks to the digital nature of the WWW, which allows
firms’ information to be automatically gathered and analyzed. In this way, policy-makers
and other consumers of official statistics would be able to obtain short-term
estimates of business demography, which could lead to better-informed decisions.
REFERENCES
[1] Eurostat, and OECD. 2007. Eurostat-OECD Manual on Business Demography
Statistics. Luxembourg: Office for Official Publications of the European Communities.
[2] Arora, S. K., Youtie, J., Shapira, P., Gao, L., and Ma, T. (2013). Entry strategies in an
emerging technology: a pilot web-based study of graphene firms. Scientometrics, 95,
1189-1207.
[3] Gök, Abdullah, Alec Waterworth, and Philip Shapira. 2015. “Use of web mining in
studying innovation.” Scientometrics, 102, 653-671.
[4] Li, Yin, Sanjay Arora, Jan Youtie, and Philip Shapira. In Press. “Using web mining to
explore Triple Helix influences on growth in small and mid-size firms.” Technovation.
[5] Blazquez, Desamparados, and Josep Domenech. In Press. “Web Data Mining for
Monitoring Business Export Orientation.” Technological and Economic Development of
Economy.
[6] Bureau van Dijk. 2010. “SABI: Sistema de Análisis de Balances Ibéricos.” CD-ROM
(Version 36.1).
[7] Kahle, Brewster, and Bruce Gilliat. 2016. “Wayback Machine.”
http://archive.org/web/.