Mapping Ideas from Cyberspace to Realspace
Funded by the NSF Cyber-Enabled Discovery and Innovation (CDI) program. Award # 1028177 (2010-2014).
http://mappingideas.sdsu.edu/

Cyberspace and realspace: Distilling useful information
Li An, San Diego State University
[email protected]
Presented at the 2013 NSF CDI Project Meeting, San Diego, California, August 8, 2013

Principal Investigator: Dr. Ming-Hsiang Tsou [email protected] (Geography). Co-PIs: Dr. Dipak K Gupta (Political Science), Dr. Jean Marc Gawron (Linguistics), Dr. Brian Spitzberg (Communications), Dr. Li An (Geography). San Diego State University, USA.

Cyberspace data: climate change and global warming
• Search on Yahoo
• Record
  • website location (IP registered address)
  • rank (r) of the website (measure of popularity)
• Map all websites: (x, y, r)
• Geoprocess: kriging/interpolation (county)
• Correct background noise

Kriging maps of climate change
[Figure: kriging maps of climate change over time]

What do such data represent?
[Diagram labels: Cyberspace data; Total population (county)]
Why?
• Some people do not have or use a website
• Some do not express their opinion on the web, …

Classification of cyber search data
• Blog
• Commercial websites
• Educational
• Entertainment
• Forum
• Governmental
• Informational
• News
• NGO
• Social media website
• Special Interest Groups
• Offline

An example of classified data (Climate change)
[Figure: maps of Edu-classified search results; panels dated 11-11-11, 03-04-12, 07-01-12, 11-04-12, 03-05-13]

An example of classified data (Global warming)
[Figure: maps of Edu-classified search results; panels dated 11-10-11, 03-03-12, 07-01-12, 11-04-12, 03-03-13]

What do such data represent?
[Diagram labels: Cyberspace data (Edu, Gov, News); Total population (county)]
Edu: Why?
• Some educators do not have a website
• Some do not express their opinion on the web, …
Gov: Why?
• Some government agencies do not have a website
• Some do not express their opinion on the web, …

What happens on the ground?
• Socioeconomic and demographic data
• Climate data
• Political membership data at county level (partially available)

The link between real and cyber space
[Diagram labels: Cyberspace data (Edu, Gov, News); Total population (county)]
Y = f (independent variables)
• if we use all realspace data
• if we use cyberspace data for Y

Challenge
Data type         Reliability   Large coverage   Return time
Realspace data    High          Difficult        Slow
Cyberspace data   Variable      Easy             Quick

Regression “backward”: explore data usefulness
• Assume realspace data = f (independent variables) gives reasonable results (with good fit)
  • Often unavailable
• If cyberspace data = f (independent variables) gives reasonable results (with good fit)
  • Shall we increase our confidence in the reliability of cyberspace data (in place of realspace data)?

Discrete-time stepwise regression: hypothesis
• For some topics or keywords (such as climate change and global change), people’s attitudes and perceptions are largely constant over a reasonable time scale.
Therefore:
• Predictor variables should be largely the same over time
• If the predictor variables that are selected change over time
  • either attitudes and perceptions change
  • or the measure of attitudes and perceptions is questionable
[Note: values of predictor variables are largely constant]

Measure: consistency index (CI)
[Table: candidate variables X1 to X10 (rows) by Time 1 to Time 5 (columns); an x marks a variable selected as a predictor at that time]
* Variable X1 is selected as a significant predictor at the 0.05 level.
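As a rough illustration of the consistency index, the sketch below treats the selection table as a grid of True/False cells (rows are candidate variables, columns are time points) and returns the share of cells that carry a mark. The function name `consistency_index` and the small 3 x 5 grid are hypothetical and are not data from the project.

```python
def consistency_index(selected):
    """Share of (variable, time) cells in which the variable was selected.

    selected[i][j] is True when candidate variable i was picked as a
    significant predictor by the stepwise regression at time j.
    """
    n_vars = len(selected)
    n_times = len(selected[0])
    if any(len(row) != n_times for row in selected):
        raise ValueError("every variable needs one entry per time point")
    marks = sum(1 for row in selected for cell in row if cell)
    return marks / (n_vars * n_times)


# Hypothetical selection table: 3 candidate variables, 5 time points.
example = [
    [True, True, True, True, True],       # selected at every time
    [True, False, True, False, False],    # selected at two times
    [False, False, False, False, False],  # never selected
]
ci = consistency_index(example)  # 7 marks out of 15 cells
print(round(ci, 2))
```

A CI near 1 means the same predictors are selected at every time point; a CI near 0 means few variables are selected at any time.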
$$\mathrm{CI} = \frac{\text{counts of check marks}}{\text{total possible counts}} = \frac{19}{10 \times 5} = 0.38$$

Consistency index example (Pro-GW)
[Table: x marks by variable (rows) and time T1 to T5 (columns). Variables: COMMLONE, DemocraticPercEvan, EMPRATIO, FOREIGN, HIGHPLUS, Intercept, MaxDrySpel, N10_OWNVAC, N10ASIAN, N10AVGHHLD, N10FAMHHLD, N10HU_OCC, N10M65OVER, N10NONREL, N10ONERACE, N10W65PLUS, PrecipIndx, STATEBORN, SummerDays, VETERANS. Counts: 57; CI: 0.57]

Measure: consistency index (CI)
CI value         Edu    Gov    Locational accurate   Pro    Unclassified data
Climate change   0.59   0.45   0.57                  0.45   0.41
Global warming   0.85   0.58   0.50                  0.57   0.36

Tentative conclusion:
1) Search results from educational websites for “global warming” are most consistent;
2) Data classification helps in extracting useful information.

Why this regression-based approach?
• Incorporate mechanism (independent variables)
• Capture big trend and pattern
• Some degree of fuzziness allowed
• Allow for data from multiple locations (all US counties) and multiple times (5 here)
• Allow for calculation of one overall index (CI)

Acknowledgement
• NSF CDI Program
• SDSU Department of Geography
• The whole SDSU CDI team. In particular:
  • Evan Casey, Elias Issa, Ninghua Wang, and Sarah Wandersee

Thank you

Tentative decision
Work on search results from education websites on “global warming” (“climate change” is okay)
[Figure: map labeled GW030313, Edu]

Point data interpolation/kriging
• A good way to create spatially contiguous data (surface)?
• What other methods?

Hazard
• An indicator of the instantaneous risk or potentiality for an event to take place
• Related to internal and external features
• Great flexibility
  • At a time POINT or average at a time interval
  • At individual or aggregate levels

How to derive hazard?
• Theoretical hazard curves
  • Weibull Distribution
  • Gompertz Distribution
  • Exponential Distribution…
[Figure: Weibull Distribution hazard curves for α = 1.5, α = 1, α = 0, and α = -0.5; axes: hazard vs. time]
• Empirical hazard calculation
  • Counts of events
  • Timing of events

Hazard of change
• Encapsulate historical events
[Figure: event timelines for Unit A, Unit B, and Unit C; axis: 0 to 80 time units]

What is survival analysis?
A class of statistical methods for studying the occurrence and timing of events
• Unique dependent variable: hazard
• Great at handling imprecise time measures
• Great at handling time-dependent variables
• Great at handling information uncertainty and imbalanced data
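The empirical hazard calculation mentioned above (counts and timing of events) can be sketched as a simple discrete-time estimate: within each time interval, the hazard is the number of events divided by the number of units still at risk at the start of that interval. This is only one common convention and not necessarily the one used in the project. The function name `empirical_hazard`, the interval width, and the five sample durations are hypothetical; units whose observation ends without an event (censored) are counted as at risk through the interval in which they drop out.

```python
from collections import Counter


def empirical_hazard(durations, observed, width):
    """Discrete-time hazard per interval of the given width.

    durations: time at which each unit had its event or left observation.
    observed:  True if the unit's event occurred, False if censored.
    Returns a list of (interval_start, at_risk, events, hazard) tuples.
    """
    n_bins = int(max(durations) // width) + 1
    events = Counter()  # events per interval
    exits = Counter()   # units leaving the risk set per interval
    for t, happened in zip(durations, observed):
        k = int(t // width)
        exits[k] += 1
        if happened:
            events[k] += 1

    at_risk = len(durations)
    rows = []
    for k in range(n_bins):
        hazard = events[k] / at_risk if at_risk else 0.0
        rows.append((k * width, at_risk, events[k], hazard))
        at_risk -= exits[k]  # these units are gone in later intervals
    return rows


# Hypothetical data: five units observed over 0 to 80 time units.
table = empirical_hazard(
    durations=[10, 25, 25, 50, 70],
    observed=[True, True, False, True, False],
    width=20,
)
for start, at_risk, n_events, hazard in table:
    print(f"[{start}, {start + 20}): at risk={at_risk}, events={n_events}, hazard={hazard:.2f}")
```

Treating censored units as at risk for their whole final interval keeps the arithmetic simple; actuarial life-table methods instead count them as at risk for half of that interval.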