Computational Social Science
1. A Knowledge Discovery and Data Mining (KDD) workshop was held in Anaheim in 1991.
The field has since grown considerably and now has two journals.
2. Why such success in KDD?
a. Exploratory
b. Ecumenical
i. Not fixated on a single class of techniques
c. Externally Focused
d. Computational Social Science (CSS) is the interdisciplinary study of complex social
systems and their investigation through computational modeling and related
techniques
e. CSS is a convergence of
i. New data collections and availability
ii. New methods, languages and formalisms
iii. New questions and theory that have associations with many of the social
sciences
f. Social Science is central
i. Science: How are social systems formed?
ii. Engineering: How do we build systems so that people's interactions
with them go well?
iii. Business: What do customers want, and how can we best determine
their purchasing patterns?
iv. Government
g. Examples of existing work
i. Six degrees of separation (Milgram's work)
ii. TRANSIMS: a city-level transportation simulator from the 1990s
3. London in 1854
a. Very filthy place
b. Cholera epidemic in the 1850s
i. The cause of cholera was unknown, but John Snow thought sewage was
mixing with the drinking water. People who used the contaminated water
got sick.
ii. His theory was not popular.
iii. Snow attempted to map spatially the data he collected on deaths
related to cholera.
iv. He had no computational tools to analyze the data.
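The spatial mapping in item 3 can be sketched with a few lines of Python. This is only a minimal illustration, not Snow's method or data: the coordinates are made up, and the idea of attributing each death to its nearest water source is an assumption added here (the notes say only that Snow mapped the deaths). The names PUMPS, DEATHS, nearest_pump and deaths_per_pump are hypothetical.

```python
from collections import Counter
from math import hypot

# Hypothetical water-source locations on an arbitrary grid (not real data).
PUMPS = {"pump_a": (0.0, 0.0), "pump_b": (10.0, 0.0), "pump_c": (5.0, 8.0)}

# Hypothetical death locations, one (x, y) pair per recorded death.
DEATHS = [(1.0, 1.0), (0.5, -0.5), (2.0, 0.0), (9.0, 1.0), (5.0, 7.0), (1.5, 2.0)]


def nearest_pump(point, pumps):
    """Return the name of the pump closest to point (Euclidean distance)."""
    x, y = point
    return min(pumps, key=lambda name: hypot(x - pumps[name][0], y - pumps[name][1]))


def deaths_per_pump(deaths, pumps):
    """Count how many deaths lie nearest to each pump."""
    return Counter(nearest_pump(d, pumps) for d in deaths)


counts = deaths_per_pump(DEATHS, PUMPS)
for name, n in counts.most_common():
    print(name, n)
```

A cluster of deaths around one source suggests where to look, but it is still only an association. The causal question is taken up under Themes.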
4. Themes
a. Relational, Temporal and Spatial Models
i. What does Snow need to represent?
1. Entities
2. Attributes
3. Relationships
4. Spatio-temporal extent and variation
ii. What can we represent?
1. Probabilistic models
2. Issue: Modeling social systems
3. Explain algorithm behavior
a. Association rules
4. Issue: acceptance by social scientists
b. Causal Analysis
i. Statistical association alone is insufficient to distinguish among different
causal models
c. Performing Quasi-Experiments to find causality
i. Snow did this by comparing two different water companies that were
supplying water to a specific region
ii. One company’s water proved to be eight times more deadly
d. Local Methods and Global Models
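The comparison in 4c amounts to a ratio of death rates between the two companies' customers. A minimal sketch follows, assuming deaths are counted per household. The counts are hypothetical and chosen only so that the ratio comes out to eight, matching the figure in the notes; they are not Snow's numbers. The function names death_rate and rate_ratio are also made up for illustration.

```python
def death_rate(deaths, households, per=10_000):
    """Deaths per `per` households served."""
    if households <= 0:
        raise ValueError("households must be positive")
    return deaths * per / households


def rate_ratio(deaths_a, households_a, deaths_b, households_b):
    """How many times higher company A's death rate is than company B's."""
    return death_rate(deaths_a, households_a) / death_rate(deaths_b, households_b)


# Hypothetical counts (NOT historical data), picked to give a ratio of 8.
ratio = rate_ratio(deaths_a=80, households_a=1_000, deaths_b=10, households_b=1_000)
print(f"Company A's water was {ratio:.0f} times more deadly than company B's")
```

The ratio supports a causal reading only to the extent that the two groups of households are otherwise comparable, which is what makes the setting a quasi-experiment and not just an observed association.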
5. CSS and Privacy
a. Snow had much freer access to data due to the lack of privacy laws