Information Technology and Firm-Level Clustering: An Exploratory Point-Pattern Comparison of Services and Manufacturing

Matt Wimble, Eli Broad School of Business, Michigan State University
Harminder Singh, Eli Broad School of Business, Michigan State University
Vallabh Sambamurthy, Eli Broad School of Business, Michigan State University

This exploratory study examines the impact of Information Technology (IT) on the geographic clustering of establishments within firms, comparing service establishments to manufacturing establishments. Prior empirical studies reported conflicting results on the impact of IT on clustering, possibly because they analyzed areal (area-aggregated) data. Point-pattern analysis avoids many of the problems that arise from area aggregation, known collectively as the Modifiable Areal Unit Problem (MAUP). Using a sample of 36,405 establishments from 844 firms, we decompose the spatial clustering impacts of IT-intensive and non-IT-intensive firms using point-pattern analysis. While the methods used in this study are applied to within-firm clustering, the methodological insights should be useful for any researcher interested in the spatial impacts of IT.

The study focuses on the relation between IT and spatial clustering, which requires a discussion of how clustering is conceptualized. Central to the concept of clustering is spatial scale, or distance. In this study a cluster is defined in relation to a specific spatial scale: it indicates that an abnormally large number of points exists within a given area. The notion of an area is integral to the notion of clustering, and areas have actual physical dimensions expressed as distances. The implication is that when referring to a cluster one must also state the size of the area in terms of distance. The opposite of spatial clustering is spatial regularity (Bailey and Gatrell, 1995).
MAUP occurs in spatial data and is defined as “a problem arising from the imposition of artificial units of spatial reporting on geographical phenomenon resulting in the generation of artificial spatial patterns” (Heywood, 1998). MAUP results from two issues: 1) scale effects and 2) aggregation effects. Scale effects refer to the fact that spatial scales vary widely between zones. For example, a zip code in Manhattan is a radically different size from a zip code in North Dakota. Aggregation effects refer to the fact that statistical results vary when smaller units are grouped into larger units, or simply when an arbitrarily different zoning scheme is used. Table 1 gives an example.

This study uses modified point-pattern measures, which have been used in astronomy, plant biology, criminology, and epidemiology to examine geospatial clustering (Bailey and Gatrell, 1995). Point-pattern measures have two advantages: they 1) do not give rise to MAUP issues and 2) do not assume that the process is spatially continuous.

Table 1. Aggregation effects from MAUP
mean: 3.75; var: 2.6
mean: 3.75; var: 0.50
mean: 3.75; var: 0.00
mean: 3.75; var: 0.93
mean: 3.17; var: 2.11

Using a sample of 36,405 establishments for 844 firms, the paper finds evidence that IT results in greater clustering at one spatial scale in services and the opposite effect, greater spatial regularity, at a different spatial scale for manufacturing establishments. Evidence is presented that increased IT is correlated with less clustering among manufacturing establishments within firms at spatial scales of 30-40 miles, 240-350 miles and 940-1340 miles. The study also shows increased clustering for service establishments within firms at 2-4 miles. This methodological approach provides a more detailed decomposition of the relationship between IT and location, which could be used to explain prior empirical research that appears to be in conflict.
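The aggregation effect can be illustrated with a small Python sketch. The cell values and both zoning schemes below are hypothetical and are not the data behind Table 1; the point is only that the same underlying cells produce the same overall mean but a different variance of zone means once the zone boundaries are redrawn.

```python
# Hypothetical counts for eight small cells (NOT the paper's data).
cells = [1, 5, 2, 6, 3, 7, 2, 4]


def average(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def population_variance(values):
    """Population variance (divides by n, not n - 1)."""
    m = average(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def zone_means(values, zones):
    """Average the cell values inside each zone (a tuple of cell indices)."""
    return [average([values[i] for i in zone]) for zone in zones]


# Two arbitrary zoning schemes over the same eight cells (assumed for illustration).
zoning_a = [(0, 1), (2, 3), (4, 5), (6, 7)]  # adjacent cells paired
zoning_b = [(0, 2), (1, 3), (4, 6), (5, 7)]  # alternate cells paired

means_a = zone_means(cells, zoning_a)
means_b = zone_means(cells, zoning_b)

for label, means in (("A", means_a), ("B", means_b)):
    print(label, "mean:", average(means), "var:", population_variance(means))
```

Because every zone here has the same number of cells, the mean of the zone means is unchanged by rezoning, while the variance is not. With unequal zone sizes the mean of zone means would generally shift as well.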
Methodology

Data

The data for this study are at the establishment (building) level. The sample includes establishments that belong to for-profit private commercial firms with more than 10,000 employees and more than 10 establishments. Data are from the Harte-Hanks database, a commercial data source used for marketing to businesses, for the year 2000. The Harte-Hanks database includes detailed measures of IT assets at the establishment level as well as location information geocoded as a specific longitude and latitude. The study region is the continental United States, chosen to minimize boundary effects by using a region with a significant real water boundary. The sample includes 36,405 establishments for 844 firms. The services sample includes 29,829 sites for 554 firms. The manufacturing sample includes 6,516 sites for 290 firms. Consistent with prior literature, IT intensity is measured as PCs per employee at the firm level and then binary coded to indicate whether the firm is above or below the mean.

General Methodological Issues

Most work to date on the geospatial impacts of information technology has used areal data, which can give rise to erroneous conclusions as a result of the Modifiable Areal Unit Problem (MAUP). Of concern to research on clustering are second-order spatial effects, that is, the tendency of points to lie in close spatial proximity to other points. The first-order intensity λ(s) is the mean number of events per unit area at s. The spatial dependence between point pairs can be expressed more formally as the second-order intensity:

$$\gamma(s_i, s_j) = \lim_{ds_i,\, ds_j \to 0} \frac{E\left[ Y(ds_i)\, Y(ds_j) \right]}{ds_i \, ds_j}$$

where Y(ds) is the number of events in region ds, ds_i and ds_j are small regions around s_i and s_j, and ds_i, ds_j in the denominator are their respective areas (Bailey and Gatrell, 1995).

Point-Pattern Measures

The point-pattern measure used in this study is the Ghat nearest-neighbor measure, implemented using the SPLANCS library in R.
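The IT-intensity coding described in the Data section can be sketched in Python as follows. The firm names and figures are hypothetical, and two details are assumptions not stated in the paper: the mean is taken across firms (not weighted by employees), and a firm exactly at the mean is coded as not IT-intensive.

```python
# Hypothetical firms: name -> (number of PCs, number of employees).
firms = {
    "firm_a": (6000, 12000),
    "firm_b": (3000, 15000),
    "firm_c": (9000, 10000),
}


def code_it_intensity(firm_data):
    """Return name -> True if the firm's PCs per employee exceeds the cross-firm mean."""
    ratios = {name: pcs / employees for name, (pcs, employees) in firm_data.items()}
    cutoff = sum(ratios.values()) / len(ratios)  # unweighted mean across firms (assumed)
    return {name: ratio > cutoff for name, ratio in ratios.items()}


it_intensive = code_it_intensity(firms)
print(it_intensive)
```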
Because this measure computes distances in Euclidean space, the data must first be transformed out of spherical longitude and latitude coordinates. The Albers conic projection was used because it provides a projection with little distortion for the continental US (Weisstein, 2004). Prior to analysis the data are sorted by sector, then by IT intensity, and then by firm. An index file is created in order to calculate the nearest-neighbor statistics on a firm-by-firm basis. Ghat is calculated using the nndistG and Ghat commands in SPLANCS, and the resulting vector is recorded in a matrix. This method was chosen to deal with the severely non-homogeneous spatial distribution of the background data.

The point-pattern measure used is Ghat, a nearest-neighbor measure. Ghat measures point-to-point neighbor distance and is typically displayed as a cumulative distribution function (CDF) with distance on the x axis. Confidence intervals can be calculated using relatively simple sampling strategies to indicate whether or not a given set of events displays significant second-order spatial effects at a given distance. Ghat is calculated as

$$\hat{G}(w) = \frac{\#(w_i \le w)}{n}$$

where W is the event-event distance (X is the corresponding point-event distance), w (or x) is an arbitrary distance, w_i is the nearest-neighbor distance of event i, #( ) is the number of nearest-neighbor distances within w (or x), and n is the number of events. For w = 0, Ghat = 0. As w increases, Ghat(w) rises monotonically to 1.

Interpretation of Ghat: clustering is indicated by a rapid rise for small w and a slower rise for larger w, which indicates that many nearest-neighbor events are quite close. Regularity is indicated by a flat or very slow increase for small w and a more rapid increase for larger w, which indicates that nearest-neighbor points are more distant. For the point-event distance x, regularity is instead indicated by a rapid rise for small x and a slower rise for larger x, which indicates that events fill the study region.

The use of point-pattern measures for this analysis is not without serious issues.
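As an illustration of the nearest-neighbor measure described above, the following is a minimal Python sketch of an empirical Ghat for one firm's establishments. It is not the SPLANCS code used in the study: the function names are hypothetical, the coordinates are assumed to be already projected into planar units such as miles, no edge correction is applied, and each firm is assumed to have at least two establishments.

```python
import math


def nearest_neighbor_distances(points):
    """Distance from each point to its closest other point (brute force, O(n^2))."""
    distances = []
    for i, (xi, yi) in enumerate(points):
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        distances.append(nearest)
    return distances


def g_hat(points, w):
    """Share of points whose nearest neighbor lies within distance w."""
    nn = nearest_neighbor_distances(points)
    return sum(1 for d in nn if d <= w) / len(nn)


# Hypothetical planar coordinates, in miles.
tight = [(0, 0), (1, 0), (0, 1), (1, 1)]       # every nearest neighbor is 1 mile away
spread = [(0, 0), (10, 0), (0, 10), (10, 10)]  # every nearest neighbor is 10 miles away

print(g_hat(tight, 2.0), g_hat(spread, 2.0))
```

Evaluating `g_hat` over a grid of distances gives the step-shaped cumulative curve: it rises early for the tightly grouped pattern and only at larger distances for the evenly spaced one.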
Point-pattern measures were originally developed to analyze data where it could reasonably be assumed that events can occur anywhere in the spatial region in question, such as tree locations on a plot of land or stars in the sky; this assumption is known as a spatially homogeneous background population. Point-pattern measures have been adapted to situations where this assumption does not hold, such as epidemiologists examining cancer clusters or criminologists examining crime clusters. Ghat measures were calculated at 2-mile increments for the first 100 miles, at 10-mile increments from 100 to 500 miles, and at 20-mile increments from 500 to 2500 miles, for a total of 190 different spatial scales.

Table 2. Summary of findings

Proposition | Support? | Comment
IT-intensive firms will exhibit both greater spatial clustering and greater spatial regularity | Partial | Find support for clustering in services and spatial regularity in manufacturing
Spatial clustering of IT-intensive firms will manifest at a smaller spatial scale than the scale at which spatial regularity will manifest | Partial | Find support for clustering in service establishments at 2-4 miles and support for spatial regularity at 30-40 miles, 240-350 miles and 940-1340 miles
Spatial clustering will vary significantly from manufacturing to services | Yes | Clustering occurs in services establishments, but spatial regularity occurs in manufacturing

References

Available upon request
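The grid of spatial scales described in the methodology can be enumerated with a short Python sketch as a check on the count of 190. The function name is hypothetical, and the treatment of band endpoints (each band includes its upper bound and excludes its lower bound) is an assumption chosen because it reproduces the stated total.

```python
def spatial_scales():
    """Distances in miles at which Ghat is evaluated, under the assumed endpoint rule."""
    scales = list(range(2, 101, 2))        # 2, 4, ..., 100 in 2-mile steps
    scales += list(range(110, 501, 10))    # 110, 120, ..., 500 in 10-mile steps
    scales += list(range(520, 2501, 20))   # 520, 540, ..., 2500 in 20-mile steps
    return scales


scales = spatial_scales()
print(len(scales), scales[0], scales[-1])
```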