Big Data and Complex Networks Analytics
Timos Sellis, CSIT
Kathy Horadam, MGS

Big Data – What is it?
Most commonly accepted definition, by Gartner (the 3 Vs):
“Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”

Big Data – some stats
• high-volume, high-velocity and high-variety
Every minute… (http://www.domo.com/blog/blog/2012/06/08/how-much-data-iscreated-every-minute/)
– > 2 million emails sent
– 100,000 tweets
– 571 websites added
– 250,000 items sold on Amazon
– 34,722 “likes”
– $272,020 spent on web shopping

Complex Networks – What is it?
Networks with significant topological features common in real-world networks, e.g. most technological, biological and social networks.
Rapidly expanding field bringing together mathematics, engineering, computer science, sociology, epidemiology, physics, biology.

Big Data and Complex Network Synergies
• Both share interesting properties
– Large scale (volume)
– Complexity (variety)
– Dynamics (velocity)
• Interesting analytics algorithms
• Many applications with both characteristics (social networks, utility networks, security, etc.)

Big Data - Research Issues (1)
• Main stream
– Infrastructure and Architectures (New large scale data architectures, Cloud architectures)
– Models (Data representation, storage, and retrieval)
– Data Access (Query processing and optimization, Privacy, Security)

Big Data - Research Issues (2)
• Complex Data Analytics
– Computational, mathematical, statistical, and algorithmic techniques for modelling high dimensional data, large graphs, and complex (interrelated) data
– Learning, inference, prediction, and knowledge discovery for large volumes of dynamic data sets
– Data retrieval and data mining to facilitate pattern discovery, trend analysis and anomaly detection
– Dimensionality reduction, sparse data

Big Data - Research Issues (3)
• Highly Streaming Data
– Positional streams
– Social network data
– Mobile app data
– Game data

Big Data - Research Issues (4)
• Data Integration
– Findability and search
– Information fusion of multiple data sources
– Semantic integration
– Recommendation systems

Networks - Research Issues (1)
Analytics
Mathematical models of simpler networks do not show the significant topological features.
– Network structure and community detection
– Knowledge discovery, especially of characteristic small communities (motifs) in large networks
– Bipartite networks

Networks - Research Issues (2)
Dynamics
– Algorithm development: machine learning, high dimensional data, large networks
– New topological, statistical techniques
– E.g. persistent homology: track connectivity changes
RMIT could be a national leader if we could develop this further

Networks - Research Issues (3)
Detection and Prediction
– Identification of influential or hidden nodes or communities across networks
– Structural anomaly detection (via supervised or unsupervised learning)
– Model transmission or flow through network
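
Two of the Detection and Prediction items above (identifying influential nodes, modelling transmission through a network) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the slides: the toy edge list, the use of node degree as a stand-in for "influence", the simple susceptible-infected (SI) spreading rule, and the helper names build_graph, degree_ranking and simulate_spread.

```python
import random
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def degree_ranking(adj):
    """Rank nodes by degree, highest first; ties are broken by node label."""
    return sorted(adj, key=lambda n: (-len(adj[n]), n))


def simulate_spread(adj, seed_node, p, steps, rng):
    """Discrete-time SI spread (illustrative assumption, not the slides' model).

    At each step every infected node infects each susceptible neighbour
    independently with probability p. Returns the infected count per step,
    starting with the initial state.
    """
    infected = {seed_node}
    history = [len(infected)]
    for _ in range(steps):
        newly = {
            v
            for u in infected
            for v in adj[u]
            if v not in infected and rng.random() < p
        }
        infected |= newly
        history.append(len(infected))
    return history


# Hypothetical toy network: a hub "a" attached to a short chain d-e-f.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e"), ("e", "f")]
adj = build_graph(edges)

ranking = degree_ranking(adj)  # most connected node comes first
history = simulate_spread(adj, ranking[0], p=1.0, steps=3, rng=random.Random(0))
```

With p=1.0 the spread is deterministic and reduces to a breadth-first expansion from the seed node. Lower values of p give a stochastic process, and degree could be swapped for another centrality measure if a different notion of influence is wanted.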
[Figure: data and fitted curve by year, 1 June 2001 to 1 June 2011, with a fitting period followed by extrapolation; correlation = 94%]

Networks - Research Issues (4)
• Location and Spatial Networks
– Prioritised habitats

Possible Research Themes (1)
• Situation Awareness applications (Disaster Management, Fault detection)
• Resource Management applications (Ecology, environment, power network management)
• Public Health applications (Epidemics, medical records)
• Financial and Forensic applications (Fraud detection, money laundering)
• Smart cities applications (Transport, Energy)

Possible Research Themes (2)
• Security applications (Biometrics, computer and information security)
• Positioning Technologies applications (Agriculture, Forest health, real-time tracks, large mobile networks)
• Education (Learning analytics)

RMIT today
• High-interest, cutting-edge and well-funded research in:
• Large scale Data Integration
– Data quality, etc.
• Sensor networks
– Data driven complex networks, Sensor network data, Distributed Sensor Networks
• Complex Networks/Graphs
– network/graph models and structure detection, graph mining, network/graph analysis, prediction, identification and security
• Positioning apps/technologies
• Power and Transport networks, network analysis for detecting possible problems, streamed metering data, real time analytics

RMIT today - Examples
[Figure labels] Insiders: Former Employees, Current Employees, Contractors, Trusted Business Partners, Cloud Providers
[Figure labels] Anomaly detection, Smart metering, Money laundering, Epidemic spread, Biometric Identification

RMIT tomorrow
• Foster collaboration between many disciplines towards large scale information management. For example, planners, designers and technologists can collaborate on designing buildings fitted with sensors using intelligent optimisation techniques.
• Plan for a major collaborative effort, like a CRC.
• Build long term partnerships with key international and national public and private organizations.

Preliminary SWOT analysis

Strengths
1. Infrastructure/data management
2. Complex network dynamics
3. Location based services
4. Information retrieval
5. Optimization
6. Theoretical analysis

Weaknesses
1. No major results/history in the area
2. Big data and complex networks on its own is not recognised as an RMIT strength

Opportunities
1. NICTA funding potential for RMIT centre
2. Cover different application areas, compared to on-going activities
3. Identify a short term impact opportunity
4. Identify an opportunity that can attract an industry sector (e.g. logistics, energy and positioning/mobile applications)

Threats
1. A couple of CoE proposals submitted
2. Some other on-going efforts (CRCs, government CoE)
3. Fragmentation based on disciplines, due to cultural difference