Is web data capable of detecting firms’ activity status?

Desamparados Blazquez and Josep Domenech
Department of Economics and Social Sciences, Universitat Politècnica de València, Valencia (Spain)

Keywords: business demography statistics, firms’ activity status, web data, website

1. INTRODUCTION

Business demography is one of the economic aspects that attracts the most attention from governments and policy makers. Indeed, most official statistics institutions carry out detailed surveys to monitor the active population of firms, their birth, survival and death. The interest in business demography statistics stems from the important role they play in economic growth, productivity and employment [1].

In the Digital Era, the important role of the Internet in economy and society, together with the development of advanced computer systems, opens up new ways of monitoring economic activities and, thus, business demography. The Internet and the World Wide Web (WWW) have become basic tools within the daily activities of individuals and companies. Companies have widely developed websites, where they describe their main activities, products and strategies, in order to have a presence in the digital channels. Therefore, corporate website contents are necessarily connected to the business activity to some extent, as has been recently verified for topics such as technology adoption [2], innovation activities [3], firm growth [4] and firm export orientation [5].

Given that a firm’s activities emerge on its website, the question arises of whether a firm’s inactivity is also manifested there. Since keeping the website updated requires firms to mobilize some financial or working resources, it is plausible that only active and healthy firms would invest their resources to that end. Therefore, if a company dies, this event is likely to be manifested in its website as a lack of updates or, simply, as the website being down.

2. METHODS

The sample for this study included 780 companies with a website from manufacturing, services and other sectors (NACE Rev.2 codes 10-95) that were established in Spain and active in 2008. The dataset consists of a panel of economic and online data for these firms between 2008 and 2014. The economic data were retrieved from the SABI database [6]. The online information was obtained by accessing the corporate websites with the Wayback Machine of the Internet Archive [7], which is a public and free repository of snapshots of about 484 billion web pages. It captures and stores websites on a daily basis, allowing users to access them and track their evolution over time.

To account for changes in the corporate websites, we queried the Wayback Machine with the URL of each company’s website and checked the homepage for each year under study. The observed changes were coded into the variable Web_status, which could take four different values: “1” if the website was down; “2” if the website was unchanged; “3” if the website had undergone minor changes, such as including or removing pictures or sections; and “4” if the website had undergone major changes, such as having a completely renewed design compared to the previous year’s version.

To assess the relation between the WWW and firms’ activity status, a multi-period logistic regression was used, where the dependent variable was the firm’s status (Active or Inactive) and the variable Web_status was used as the explanatory variable along with the period-specific effects.

3. RESULTS

Table 1 shows the estimation results, including the estimated regression coefficients (β), odds ratios (OR), standard errors (SE) and p values.
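As a reading aid for the β and OR columns: in a logistic regression, each odds ratio is the exponential of the corresponding coefficient, OR = exp(β). The following minimal Python sketch illustrates this relationship with the coefficient values reported in Table 1. The names `BETAS`, `odds_ratio` and `odds_to_probability` are illustrative assumptions and are not part of the original study.

```python
import math

# Coefficients (beta) as reported in Table 1. Web_status code 1 (website down)
# is the baseline level, so it has no coefficient of its own.
BETAS = {
    "(Intercept)": 3.483,
    "Web_status(2)": 1.628,
    "Web_status(3)": 3.423,
    "Web_status(4)": 3.970,
}


def odds_ratio(beta: float) -> float:
    """Return the odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)


def odds_to_probability(odds: float) -> float:
    """Convert odds p / (1 - p) back to the probability p."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Print each coefficient next to the odds ratio it implies.
    for name, beta in BETAS.items():
        print(f"{name:<14} beta={beta:.3f}  OR={odds_ratio(beta):.3f}")
```

The helper `odds_to_probability` is included because an odds ratio compares odds rather than probabilities. Turning the estimates into a predicted probability for a given year would also require the time-dummy coefficients, which are not reported in the table.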
The observed web status has a statistically significant effect on the probability of being active.

Table 1 - Multi-period logistic regression with website status

Variables        β        OR       SE      p value
(Intercept)      3.483    32.557   0.598   0.000
Web_status(2)    1.628     5.094   0.227   0.000
Web_status(3)    3.423    30.661   0.390   0.000
Web_status(4)    3.970    52.985   0.727   0.000
Log-likelihood   −349.349

Notes: Time dummies were included.

As the activity in the website increases, the probability of a firm being active increases too. The estimate that corresponds to having an unchanged website (Code 2) is positive, which means that having just a working website increases the probability that a firm is active with respect to having a down website (Code 1, which was used as the baseline level). Concretely, the odds that a firm with an unchanged website is active are about 5 times those of a firm whose website is down, as the OR indicates. Updating the website, whether to a minor (Code 3) or major (Code 4) extent, also increases the probability that a firm is active, as expected. The odds of a firm being active when it moderately changed its website are about 30 times those of a firm whose website is down, while they are more than 50 times as high when the website has been totally renewed.

These results are in line with what was hypothesized: healthy firms invest more in maintaining and updating their websites, so the more activity they show on their website, the more probable it is that they are active. It is important to remark that this does not mean that updating websites contributes to firms remaining active; rather, website activity is a strong reflection of the firm’s active status.

4. CONCLUSIONS

This work analyzed and confirmed the connection between a company’s activity status and the corporate website’s activity status. Multi-period logistic regression estimations indicated that the corporate website clearly reflects the firm’s activity status. These results open up new possibilities for monitoring business demography.
Web data capture a firm’s status, and access to corporate websites is open and inexpensive, making it possible to build online indicators to monitor and forecast business death rates. This could be done in a very short period thanks to the digital nature of the WWW, which allows firms’ information to be automatically gathered and analyzed. This way, policy makers and other consumers of official statistics would have the chance to obtain short-term estimates of business demography, which could translate into more informed decisions.

REFERENCES

[1] Eurostat, and OECD. 2007. Eurostat-OECD Manual on Business Demography Statistics. Luxembourg: Office for Official Publications of the European Communities.
[2] Arora, S. K., J. Youtie, P. Shapira, L. Gao, and T. Ma. 2013. “Entry strategies in an emerging technology: a pilot web-based study of graphene firms.” Scientometrics, 95, 1189-1207.
[3] Gök, Abdullah, Alec Waterworth, and Philip Shapira. 2015. “Use of web mining in studying innovation.” Scientometrics, 102, 653-671.
[4] Li, Yin, Sanjay Arora, Jan Youtie, and Philip Shapira. In Press. “Using web mining to explore Triple Helix influences on growth in small and mid-size firms.” Technovation.
[5] Blazquez, Desamparados, and Josep Domenech. In Press. “Web Data Mining for Monitoring Business Export Orientation.” Technological and Economic Development of Economy.
[6] Bureau van Dijk. 2010. “SABI: Sistema de Análisis de Balances Ibéricos.” CD-ROM (Version 36.1).
[7] Kahle, Brewster, and Bruce Gilliat. 2016. “Wayback Machine.” http://archive.org/web/.