Information Technology and Firm-Level Clustering: An Exploratory
Point-Pattern Comparison of Services and Manufacturing
Matt Wimble
Eli Broad School of Business
Michigan State University
[email protected]
Harminder Singh
Eli Broad School of Business
Michigan State University
[email protected]
Vallabh Sambamurthy
Eli Broad School of Business
Michigan State University
[email protected]
This exploratory study examines the impact of Information Technology (IT) on the geographic
clustering of establishments within firms, comparing service establishments to manufacturing
establishments. Prior empirical studies reached conflicting results on the impact of IT on clustering,
possibly because they used areal (area) data for analysis. Point-pattern analysis resolves many
of the problems that arise from area aggregation, known as the Modifiable Areal Unit Problem
(MAUP). Using a sample of 36,405 establishments from 844 firms, we decompose the
spatial clustering impacts of IT-intensive and non-IT-intensive firms using point-pattern analysis.
While the methods in this study are applied to within-firm clustering, the
methodological insights should be useful for any researcher interested in the spatial impacts of IT.
The study focuses on the relation between IT and spatial clustering, which requires a
discussion of how clustering is conceptualized here. Central to the concept of clustering is
the concept of spatial scale, or distance. In this study, clusters are defined in relation to a specific
spatial scale: a cluster indicates that an abnormally large number of points exists within a given area. It is
important to note that the notion of an area is integral to the notion of clustering and that areas have
actual physical dimensions expressed in distance measures. The implication is that when referring to a
cluster one must state the size of the area in terms of distance. The opposite of spatial
clustering is spatial regularity (Bailey and Gatrell, 1995).
MAUP occurs in spatial data and is defined as “a problem arising from the imposition of
artificial units of spatial reporting on geographical phenomenon resulting in the generation of
artificial spatial patterns” (Heywood, 1998). MAUP results from two issues: 1) scale effects and
2) aggregation effects. Scale effects refer to the idea that spatial scales vary widely between
zones. For example, a zip code in Manhattan is a radically different size from a zip code in North
Dakota. Aggregation effects refer to the idea that statistical results vary when smaller groups
are combined into larger groups, or simply when an arbitrarily different zoning scheme is used. An
example is shown in Table 1. This study uses modified point-pattern measures, which
have been used in astronomy, plant biology, criminology, and epidemiology, to examine
geospatial clustering (Bailey and Gatrell, 1995). Point-pattern measures have two advantages:
they 1) do not give rise to MAUP issues and 2) do not assume that the process is
spatially continuous.
Table 1. Aggregation effects from MAUP
mean: 3.75; var: 2.6
mean: 3.75; var: 0.50
mean: 3.75; var: 0.00
mean: 3.75; var: 0.93
mean: 3.17; var: 2.11
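The aggregation effect can be illustrated with a minimal Python sketch. The 4x4 grid of counts and the four zoning schemes below are hypothetical and are not the data behind Table 1; they only show that regrouping the same cells into different zones changes the variance of the zone-level values. Because every zone here has the same size, the mean stays fixed; with unequal zones the mean can shift as well.

```python
from statistics import mean, pvariance

# Hypothetical 4x4 grid of counts (illustrative values, not the paper's data).
grid = [
    [1, 5, 2, 6],
    [3, 7, 4, 8],
    [2, 6, 1, 5],
    [4, 8, 3, 7],
]

def zone_means(grid, zone_rows, zone_cols):
    """Average the cells inside each zone_rows x zone_cols block."""
    n_rows, n_cols = len(grid), len(grid[0])
    means = []
    for r in range(0, n_rows, zone_rows):
        for c in range(0, n_cols, zone_cols):
            cells = [grid[i][j]
                     for i in range(r, r + zone_rows)
                     for j in range(c, c + zone_cols)]
            means.append(mean(cells))
    return means

# Four arbitrary zoning schemes applied to the same underlying cells.
schemes = {
    "cells (1x1)": (1, 1),
    "horizontal pairs (1x2)": (1, 2),
    "vertical pairs (2x1)": (2, 1),
    "quadrants (2x2)": (2, 2),
}

results = {}
for name, (zone_rows, zone_cols) in schemes.items():
    zm = zone_means(grid, zone_rows, zone_cols)
    # Mean and population variance of the zone-level values.
    results[name] = (mean(zm), pvariance(zm))
    print(f"{name}: mean={results[name][0]:.2f}, var={results[name][1]:.2f}")
```

The two pairings cover the same area per zone yet produce different variances, which is the arbitrary-zoning problem that point-level data avoid.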
Using a sample of 36,405 establishments from 844 firms, the paper finds evidence that IT results in
greater clustering at one spatial scale in services and the opposite effect, greater spatial regularity,
at different spatial scales for manufacturing establishments. Evidence is presented that increased
IT is correlated with less clustering among manufacturing establishments within firms at spatial
scales of 30-40 miles, 240-350 miles and 940-1340 miles. The study also shows increased
clustering for service establishments within firms at 2-4 miles. This methodological approach
provides a more detailed decomposition of the relationship between IT and location, which could
be used to explain prior empirical research that appears to be in conflict.
Methodology
Data
The data for this study are at the establishment (building) level. The sample includes establishments that are
part of for-profit private commercial firms with more than 10,000 employees and
more than 10 establishments. The data come from the Harte-Hanks database, a
commercial data source used for marketing to businesses, for the year 2000. The Harte-Hanks
database includes detailed measures of IT assets at the establishment level as well as location
information geocoded as a specific longitude and latitude. The study region is the continental
United States, chosen to minimize boundary effects by using a region with a significant real
water boundary. The sample includes 36,405 establishments for 844 firms. The services sample
includes 29,829 sites for 554 firms. The manufacturing sample includes 6,516 sites for 290 firms.
Consistent with prior literature, IT intensity is measured as PCs per employee at the firm level and
then binary coded to indicate whether the firm is above or below the mean.
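The IT-intensity coding can be sketched as follows. This is a minimal illustration in Python, not the authors' procedure: the firm records are hypothetical, the mean is taken unweighted across firms, and a firm exactly at the mean is coded 0. All three are assumptions the text does not settle.

```python
from statistics import mean

# Hypothetical firm records: (firm_id, total PCs, total employees).
firms = [
    ("A", 9000, 12000),
    ("B", 4000, 20000),
    ("C", 15000, 15000),
    ("D", 3000, 30000),
]

def it_intensity_flags(firms):
    """Return ({firm_id: flag}, threshold).

    flag is 1 when the firm's PCs per employee exceeds the unweighted
    mean ratio across firms, else 0 (ties coded 0 by assumption).
    """
    ratios = {firm_id: pcs / employees for firm_id, pcs, employees in firms}
    threshold = mean(list(ratios.values()))
    flags = {firm_id: int(ratio > threshold) for firm_id, ratio in ratios.items()}
    return flags, threshold

flags, threshold = it_intensity_flags(firms)
print(flags, round(threshold, 4))
```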
General Methodological Issues
Most work to date on the geospatial impacts of information technology has used areal data to
draw conclusions, which can give rise to erroneous conclusions as a result of the Modifiable
Areal Unit Problem (MAUP).
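Of concern to research on clustering are second-order spatial effects, that is, the tendency of
a point to lie in closer spatial proximity to other points. First-order behavior is described by the
intensity λ(s), the mean number of events per unit area at s. The spatial dependence between point pairs can
be expressed more formally as the second-order intensity:
\[ \gamma(s_i, s_j) = \lim_{ds_i, ds_j \to 0} \frac{E[Y(ds_i)\, Y(ds_j)]}{ds_i \, ds_j} \]
where Y(ds) is the number of events in region ds, ds_i and ds_j are small regions around s_i and s_j, and ds_i, ds_j in the denominator are their respective areas (Bailey
and Gatrell, 1995).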
Point-Pattern Measures
The point-pattern measure used in this study is the Ghat nearest-neighbor measure, implemented
using the SPLANCS library in R. Since this measure makes distance calculations in Euclidean
space, the data must first be transformed out of spherical longitude and latitude coordinates
into a planar projection. The Albers conic projection was used because it provides a projection with little
distortion for continental US projections (Weisstein, 2004).
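The projection step can be sketched in Python using the standard spherical formulas for the Albers equal-area conic projection. The paper does not report its projection parameters, so the standard parallels (29.5 and 45.5 degrees north), the origin (23 degrees north, 96 degrees west), and the spherical Earth radius in miles are assumptions chosen as common values for the continental US. The function name is hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean spherical radius

def albers_conic(lat_deg, lon_deg, lat1=29.5, lat2=45.5,
                 lat0=23.0, lon0=-96.0, radius=EARTH_RADIUS_MILES):
    """Project (lat, lon) in degrees to planar (x, y) in miles.

    Spherical Albers equal-area conic; all default parameters are
    assumptions, not values taken from the paper.
    """
    phi, lam = math.radians(lat_deg), math.radians(lon_deg)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    phi0, lam0 = math.radians(lat0), math.radians(lon0)

    n = (math.sin(phi1) + math.sin(phi2)) / 2          # cone constant
    c = math.cos(phi1) ** 2 + 2 * n * math.sin(phi1)
    rho = radius * math.sqrt(c - 2 * n * math.sin(phi)) / n
    rho0 = radius * math.sqrt(c - 2 * n * math.sin(phi0)) / n
    theta = n * (lam - lam0)

    return rho * math.sin(theta), rho0 - rho * math.cos(theta)

# Usage: planar Euclidean distance between two projected locations.
x_ny, y_ny = albers_conic(40.7128, -74.0060)    # New York City
x_la, y_la = albers_conic(34.0522, -118.2437)   # Los Angeles
distance_miles = math.hypot(x_ny - x_la, y_ny - y_la)
print(round(distance_miles))
```

Once establishments are in planar coordinates, ordinary Euclidean distance can stand in for distance on the sphere, which is what the nearest-neighbor calculation requires.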
Prior to analysis the data are sorted by sector, then by IT intensity, and then by firm. An index file
is created in order to calculate the nearest-neighbor statistics on a firm-by-firm basis. Ghat is
calculated using the nndistG and Ghat commands in SPLANCS, and the resulting vector is recorded
in a matrix. This method was chosen to deal with the severely non-homogeneous spatial distribution of
the background data. Ghat, known as a nearest-neighbor measure, summarizes point-to-point nearest-neighbor distances. It is typically displayed
as a cumulative distribution function (CDF) with the distance on the x axis. Confidence intervals can
be calculated using relatively simple sampling strategies to indicate whether or not a given set of
events displays significant second-order spatial effects at a given distance. Ghat is calculated as
\[ \hat{G}(w) = \frac{\#(w_i \le w)}{n} \]
where w_i is the distance from event i to its nearest neighboring event and n is the number of events.
X is the point-event distance and W is the event-event distance; w or x is an arbitrary
distance; #( ) is the number of nearest-neighbor distances within w (or x). For w = 0, Ghat = 0. As w
increases, Ghat(w) rises monotonically to 1. Clustering is indicated by a
rapid rise for small w and a slower rise for larger w, which shows that many nearest-neighbor events are quite close.
Regularity is indicated by a flat or very slow increase for small w and a more rapid increase for larger
w, which shows that nearest-neighbor points are more distant. For the corresponding point-event distance x, regularity is instead indicated by a rapid rise for small x and a slower rise
for larger x, which shows that events fill the study region.
The use of point-pattern measures for this
analysis is not without serious issues. Point-pattern measures were originally developed to
analyze data where it could be reasonably assumed that events occur anywhere in the spatial
region in question, such as tree locations on some plot of land or stars in the sky. This assumption
is known as a spatially homogeneous background population. Point-pattern measures have been
adapted to situations where this assumption does not hold, such as epidemiologists examining
cancer clusters or criminologists examining crime clusters.
Ghat measures were calculated at 2
mile increments for the first 100 miles, at 10 mile increments from 100 to 500 miles and at 20
mile increments from 500 to 2500 miles, for a total of 190 different spatial scales.
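The firm-by-firm Ghat calculation over the 190 spatial scales can be sketched in plain Python. This is an illustration of the general technique and not the SPLANCS implementation: the function names and the two example firms are hypothetical, coordinates are assumed to be already projected to miles, and counting a distance exactly equal to w as within w is an assumption.

```python
import math
from collections import defaultdict

def nearest_neighbor_distances(points):
    """Distance from each event to its nearest other event (brute force)."""
    return [
        min(math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points) if j != i)
        for i, (xi, yi) in enumerate(points)
    ]

def ghat(points, scales):
    """Share of events whose nearest neighbor lies within each distance w."""
    nn = nearest_neighbor_distances(points)
    return [sum(d <= w for d in nn) / len(nn) for w in scales]

def distance_grid():
    """2-mile steps to 100, 10-mile steps to 500, 20-mile steps to 2500."""
    return (list(range(2, 101, 2))
            + list(range(110, 501, 10))
            + list(range(520, 2501, 20)))

def ghat_by_firm(records, scales):
    """records: iterable of (firm_id, x, y); returns {firm_id: Ghat vector}."""
    by_firm = defaultdict(list)
    for firm_id, x, y in records:
        by_firm[firm_id].append((x, y))
    # A nearest neighbor needs at least two establishments per firm.
    return {f: ghat(pts, scales) for f, pts in by_firm.items() if len(pts) >= 2}

# Hypothetical data: one tightly clustered firm, one with sites 50 miles apart.
records = [
    ("clustered", 0, 0), ("clustered", 1, 0),
    ("clustered", 0, 1), ("clustered", 1, 1),
    ("spread", 0, 0), ("spread", 50, 0),
    ("spread", 0, 50), ("spread", 50, 50),
]
scales = distance_grid()
ghat_matrix = ghat_by_firm(records, scales)
print(len(scales), ghat_matrix["clustered"][0], ghat_matrix["spread"][0])
```

The clustered firm's curve reaches 1 at the first scale, while the spread firm's curve stays at 0 until the 50-mile scale, which matches the interpretation of a rapid versus a delayed rise.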
Table 2. Summary of findings
Proposition | Support? | Comment
IT-intensive firms will exhibit both greater spatial clustering and greater spatial regularity | Partial | Find support for clustering in services and spatial regularity in manufacturing
Spatial clustering of IT-intensive firms will manifest at a smaller spatial scale than the scale at which spatial regularity will manifest | Partial | Find support for clustering in service establishments at 2-4 miles and support for spatial regularity at 30-40 miles, 240-350 miles and 940-1340 miles
Spatial clustering will vary significantly from manufacturing to services. | Yes | Clustering occurs in services establishments, but spatial regularity occurs in manufacturing
References
Available upon request