Aggregating and Managing Big rEaltime Data in the Cloud: application to intelligent transport for Smart Cities
Parisa Ghodous
U. Claude Bernard, LIRIS
Saint Martin d’Hères, 10th April, 2015

Urbanization’s rapid progress has modernized many people’s lives and engendered big issues: traffic congestion, energy consumption, and pollution.

Urban computing
• Uses data generated in cities (e.g., traffic flow, human mobility, and geographical data) for a continuous improvement of people’s lives, city operation systems, and the environment
• Connects urban sensing, data management, data analytics, and service providing into a recurrent process

Key challenges

Urban data
• Environmental monitoring data: meteorological data (humidity, temperature, barometric pressure, wind speed, and weather conditions) crawled from websites
• Mobile phone signals: identifying behaviours and citywide human mobility for detecting urban anomalies, a city’s functional regions, and urban planning
• Geographical and commuting data: traffic monitoring and prediction, urban planning, routing, and energy consumption analysis; POI, land use
• Traffic data: loop sensors, surveillance cameras, and floating cars (floating car data)
• Social network data: social structure (a graph denoting relationship, interdependency, or interaction between users); user-generated social media, texts, photos, and videos, which contain users’ behaviour and interests
• Economy: a city’s economic dynamics (transaction records of credit cards, stock prices, housing prices, and people’s incomes)
• Energy: a city’s energy consumption, obtained directly from sensors or inferred implicitly from other data sources, e.g., from the GPS trajectory of a vehicle

Applications in urban computing
Urban planning
• Gleaning underlying problems in transportation networks
• Discovering functional regions
• Detecting a city’s boundary
Transportation
• Improving driving experiences
• Improving taxi services: dispatching, recommendation, ridesharing
• Improving public transportation systems: bus, subway, bike
Environment
• Air quality
• Noise pollution
Social & entertainment
• Estimating user similarity
• Finding local experts in a region
• Location recommendation
• Itinerary planning
• Understanding life patterns and styles
Energy
• Gas consumption
• Electricity consumption
Economy
• Finding trends of city economy
• Business placement
Safety & security
• Detecting traffic anomalies: distance based, statistics based
• Disaster detection and evacuation

Urban computing framework
• Service providing: urban planning, easing traffic, saving energy, reducing air pollution
• Urban data analytics: data mining, machine learning, visualization
• Urban data management: spatio-temporal index, stream, trajectory, and graph data management
• Urban sensing & data acquisition: participatory sensing, crowd sensing, mobile sensing

Urban sensing & data acquisition
• Energy consumption & privacy
• Loosely controlled and non-uniformly distributed sensors
• Unstructured, implicit & noisy data
Computing with heterogeneous data
• Learn mutually reinforced knowledge from heterogeneous data
• Both effective and efficient learning ability
• Visualization
Hybrid systems blending the physical and virtual worlds
[Figure labels: social media, meteorology, human mobility, traffic, air quality, energy, POI, road networks]

Overview of existing approaches

Current transport projects and apps
http://www.itsoverview.its.dot.gov
April 11, 2015

TRAFFIC MANAGEMENT → Plan urban mobility
Sensors: C-S (e.g., Google)
Notification: push/pull
Monitoring
Collaborative (e.g., Waze, Copenhagen Wheel)
- Traffic thermometer (e.g., incident detection, Insight, Dublin); information exchange using crowdsourcing
- Public transport monitor (e.g., Industrial transport, Urban insight, Cubic; smart ticketing, AllbikesNow, Buzzcar)
Recommendation
- Public transport monitor (Optimod, Lyon, LUTB) (urban logistics)
- Parking places
- Infrastructure (ETINA intelligent traffic lights, LAPI license plate reading, Télépéage electronic toll collection)
- Guidance: Waze, Google Maps, Walkscore
STANDARDIZATION OF ITS: urban ITS architectures

Big data and intelligent transport
• Transdec: big data for transportation, http://imsc.usc.edu/intelligent-transportation.html
• How big data drives intelligent transportation, Rocky Mountain Institute, http://www.greenbiz.com/blog/2012/08/15/how-big-data-drives-intelligent-transportation
• Real-Time Data Capture and Management, http://www.its.dot.gov/data_capture/data_capture.htm
• Traffic analytics

Tailoring urban big data storage services
• Vehicles’ position & energy levels; communication of unexpected events
• Queue length at the recharging station locations
• Decision making for the autonomous vehicles, to help pilot the vehicles to their destination
• Ensuring vehicle availability and service continuity; avoiding accidents
• Ensuring optimal recharging through mobile recharging units

Real-time problems with greedy tasks requiring heavy processing
• Lots of data (volume)
• Continuous (velocity)
• Image, sound, compass, energy level, location… (variety)

Problem statement
• Data collection (what sources? compass, video stream, LADAR…)
• Data storage (whether to keep data and for how long, e.g., a missed parked car or someone crossing the road)
• Data communication strategy to optimise network use (rate of communication, whose initiative)
• Scalability (if extra vehicles are needed: make it work with 100 vehicles and with 1,000)
• Polyglot programming (different programming languages for different needs)
• Data → information (image, video → location information)

Objectives
• Develop a service using big data for decision making
• Use cloud and streaming as tools
• Ensure that big data, cloud, and streaming work well together

Managing transport big data in Smart Cities

Our vision: everything as a service
• Decision making support services
• Data analytics services
• Integration & aggregation: extended UnQL platform
• Data storage services: Neo4J, CouchDB, MongoDB
• Clean data collections
• Data cleaning & processing services: PIG, HADOOP
• Data harvesting services: REST, FLUME

Making global transport decisions
[Diagram labels: application server (receiving thousands of requests); Storage as a Service; fragmented and duplicated data; data streams; data storage demands; on-demand data; recurrent data]

Looking for a taxi

Disseminating events

Making global transport decisions
• Data streams (clients’ positions/requests)
• Predict the crowd of requests in a city region at a given hour
• Compute a recommendation: battery charge place, target region according to traffic
[Diagram labels: on-demand data; prediction; recommendation requests; state of the traffic organized by region; traffic situation; recommendation]

Research milestones

Ongoing work
• QDB benchmark extends YCSB: FaultTolerance, Recoverability, and TimeBehaviour
  - Pivot data model for representing the data models of NoSQL stores
  - Sample application: Shopping system [1] (ProductInfo)
  - Document data stores: MongoDB, Couchbase, VoltDB, Redis, Neo4J
  - Cluster of four Ubuntu 12.04 servers deployed with extra large VM instances (8 virtual cores and 14 GB of RAM) in Windows Azure [2]
• Distributed polyglot (big) database engineering
  - Model2Roo: engineering data storage solutions for given data collections
  - ExSchema for supporting the maintenance of a polyglot storage solution

[1] McMurtry, D., Oakley, A., Sharp, J., Subramanian, M., Zhang, H.: Data Access for Highly-Scalable Solutions: Using SQL, NoSQL, and Polyglot Persistence. Microsoft patterns & practices, Microsoft (2013)
[2] http://www.windowsazure.com/
[3] http://forge.puppetlabs.com/puppetlabs/
[4] Yahoo Cloud Serving Benchmark, https://github.com/brianfrankcooper/YCSB/wiki

Future directions
• Balanced crowd-sensing
  - Data is non-uniformly distributed in geographical and temporal spaces. In some locations, we may have much more data than we really need.
  - In places where we do not have enough data, or have no data at all, incentives that motivate users to contribute data should be considered.
  - How to configure the incentive for different locations and time periods, so as to maximize the quality of the received data (e.g., coverage or accuracy) for a specific application, is yet to be explored.
  - A down-sampling method, e.g., compressive sensing, could be useful to reduce a system’s communication load.
• Skewed data distribution
• Managing and indexing multimode data sources
• Knowledge fusion
• Exploratory and interactive visualization for multiple data sources
• Algorithm integration
• Intervention-based analysis and prediction

Future directions (continued)
• Balanced crowdsensing
• Skewed data distribution
  - Having the entire dataset may always be infeasible in an urban computing system.
  - Some information is transferable from partial data to the entire dataset: the travel speed of taxis on a road can be transferred to other vehicles traveling on the same road segment.
  - Some information is not: the traffic volume of taxis on a road may differ from that of private vehicles.
• Managing and indexing multimode data sources
  - A hybrid index that can simultaneously manage multiple types of data (e.g., spatial, temporal, and social media)
• Knowledge fusion
  - Learn mutually reinforced knowledge from multiple data sources
  - Deep understanding of each data source and effective usage of different data sources in different parts of a computing framework
• Exploratory and interactive visualization for multiple data sources
  - Investigate the implicit relationships among multiple data sources through exploratory visualization in spatial and spatio-temporal spaces
  - Which factor is more prominent in impacting the air quality of a given location or in a given time period?
  - What is the major root cause of PM2.5 in the winter of São Paulo?
• Algorithm integration: to provide an end-to-end urban computing scenario
  - Combine data management techniques with machine learning algorithms to provide a knowledge discovery ability that is both efficient and effective.
  - Integrate spatio-temporal data management algorithms with optimization methods to solve the large-scale dynamic ridesharing problem.
  - Visualization techniques should be involved in the knowledge discovery process, working with machine learning and data mining algorithms.
• Intervention-based analysis and prediction: predict the impact of a change in a city’s setting
  - How will a region’s traffic change if a new road is built in the region?
  - To what extent will air pollution be reduced if a factory is removed from a city?
  - How will people’s travel patterns be affected if a new subway line is launched?

Parisa Ghodous, Genoveva Vargas-Solar, Catarina Ferreira, Christine Collet, Gavin R. Kemp

Urban sensing & data acquisition
• Traditional sensing and measurement: installing sensors dedicated to some applications
• Passive crowd sensing
• Participatory sensing

Urban sensing & data acquisition (continued)
• Traditional sensing and measurement
• Passive crowd sensing: infrastructures such as wireless cellular networks, built for mobile communication between individuals, can also be used to sense city dynamics (e.g., to predict traffic conditions and improve urban planning)
  - Sensing city dynamics with GPS-equipped vehicles: mobile sensors continually probe the traffic flow on road surfaces, and infrastructures process the readings into data representing citywide human mobility patterns
  - Ticketing systems of public transportation (e.g., modelling city-wide human mobility using transaction records of RFID-based card swipes)
  - Wireless communication systems (e.g., call detail records, CDR)
  - Social networking services (e.g., geo-tagged posts/photos, and posts on natural disasters analysed to detect anomalous events and mobility patterns in the city)
• Participatory sensing: people obtain information around them and contribute it to form collective knowledge that solves a problem (i.e., human as a sensor)
  - Human crowd-sensing: users willingly contribute information gathered from sensors embedded in their own devices (e.g., GPS data from a user’s mobile phone used to estimate real-time bus arrivals)
  - Human crowd-sourcing: users are proactively engaged in generating data, such as reports on accidents, police traps, or any other road hazard (e.g., Waze), or citizens turning into cartographers to create open maps of their cities

Urban data management
Harness a variety of heterogeneous data to quickly answer users’ instant queries, e.g., predicting traffic conditions and forecasting air pollution
• Stream and trajectory data management
  - Data reduction techniques for trajectories
  - Noise filtering techniques for trajectories
  - Techniques for indexing and querying trajectories
  - Techniques dealing with the uncertainty of a trajectory
  - Trajectory pattern mining
• Graph data management: represent urban data, such as road networks, subway systems, social networks, and sensor networks
  - Find the top-k tourist attractions around a user that were most popular in the past three months
  - Graphs are usually associated with a spatial property, resulting in many spatial graphs [Angles and Gutierrez 2008]. For example, each node of a road network has a spatial coordinate, and each edge denoting a road segment has a spatial length.
  - Graphs also contain temporal information. For instance, the traffic volume traversing a road segment changes over time, and the travel time between two landmarks is time dependent: st-graphs [Hong and Zheng et al. 2014]
• Hybrid indexing structures: harness a variety of data and integrate them into a data mining model using hybrid indexing structures that organize different data sources well
  - Combining POIs, road networks, traffic, and human mobility data simultaneously
  - A city is partitioned into grids using a quad-tree-based spatial index, where each leaf node (grid) of the spatial index maintains two lists storing the POIs and road segments
  - Each road segment ID points to two sorted lists: a list of taxi IDs sorted by their arrival time t_a at the road segment, and a list of passenger drop-off and pick-up points sorted by pick-up time (t_p) and drop-off time (t_d)

Knowledge fusion across heterogeneous data
Harness a variety of heterogeneous data sources to fuse knowledge effectively
• Fuse different data sources at the feature level: put together the features extracted from different data sources into one feature vector before feeding it into a data analytics model
• Use different data at different stages (e.g., first partition a city into disjoint regions by major roads and then use human mobility data to glean the problematic configuration of a city’s transportation network)
• Feed different data sets into different parts of a model simultaneously
• Infer the functional regions in a city using road network data, points of interest, and human mobility learned from a large number of taxi trips

Urban data visualization
Not solely about displaying raw data and presenting results, but also about detecting and describing patterns, trends, and relations in data, motivated by certain purposes of investigation
• Spatial distributions changing over time (i.e., spaces in time)
• Profiles of local temporal variation distributed over space (i.e., time in spaces) [Andrienko 2010]
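
Example sketch: feature-level fusion
The feature-level strategy from the knowledge fusion slide (put the features extracted from different data sources into one feature vector) can be illustrated with a small Python sketch. It is not the authors' implementation. The region IDs, the three source dictionaries, their values, and the function names `fuse_features` and `min_max_scale` are all hypothetical. The min-max scaling step is an added assumption, included only because sources measured in different units are usually normalized before being fed to an analytics model.

```python
from typing import Dict, List

FeatureTable = Dict[str, List[float]]

# Hypothetical per-region features from three urban data sources.
# All names and numbers are illustrative assumptions, not data from the slides.
ROAD_FEATURES: FeatureTable = {"r1": [12.0, 3.0], "r2": [4.0, 1.0]}
POI_FEATURES: FeatureTable = {
    "r1": [30.0, 5.0, 2.0],
    "r2": [8.0, 0.0, 1.0],
    "r3": [1.0, 1.0, 1.0],  # present in this source only, so it is dropped
}
MOBILITY_FEATURES: FeatureTable = {"r1": [420.0], "r2": [95.0]}


def fuse_features(sources: List[FeatureTable]) -> FeatureTable:
    """Concatenate the feature vectors of every source, region by region.

    Only regions described by all sources are kept, so every fused
    vector has the same length and the same column meaning.
    """
    if not sources:
        return {}
    common = set(sources[0]).intersection(*sources[1:])
    fused: FeatureTable = {}
    for region in sorted(common):
        vector: List[float] = []
        for source in sources:
            vector.extend(source[region])
        fused[region] = vector
    return fused


def min_max_scale(table: FeatureTable) -> FeatureTable:
    """Rescale each column to [0, 1]; constant columns become 0.0."""
    regions = list(table)
    if not regions:
        return {}
    width = len(table[regions[0]])
    scaled: FeatureTable = {region: [] for region in regions}
    for j in range(width):
        column = [table[region][j] for region in regions]
        low, span = min(column), max(column) - min(column)
        for region in regions:
            value = 0.0 if span == 0 else (table[region][j] - low) / span
            scaled[region].append(value)
    return scaled


if __name__ == "__main__":
    fused = fuse_features([ROAD_FEATURES, POI_FEATURES, MOBILITY_FEATURES])
    for region, vector in min_max_scale(fused).items():
        print(region, vector)
```

Keeping only regions present in every source is one possible design choice. Imputing missing features would be an alternative when coverage is skewed across sources.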