bigDAARE: Big Data Analytics for Renewable Energy
Mark J. Embrechts
Dept. Industrial and Systems Engineering
Rensselaer Polytechnic Institute, Troy, NY, USA

• What is Data Mining?
• Data Mining, Big Data Analytics
• Data-Driven Science, Data-Driven Engineering
• Renewable Energy Challenges
• Promise of Big Data Analytics for Renewable Energy

Science: what is possible
Engineering: turn science into an everyday commodity (cheap, safe, reliable, resilient, …)

CFES 2012-2013 Annual Conference, January 25, 2013

Data Mining as a Structured Process
[Figure: process flow, starting with data prospecting and surveying] database -> select -> selected data -> preprocess & transform -> transformed data -> make model -> interpretation & rule formulation

Early Motivating Applications for Data Mining
• Database Marketing
• Algo-Trading
• Market Basket Analysis
www.information-drivers.com/market_basket_analysis.php

How is data mining different?
• Different from what? (statistics)
• Data can be unstructured (e.g., text)
• Data can come from different sources and have conflicts/missing data/outliers
• Usually a data information step is required
• Data pre-processing takes most of the time
• Data might not fit in memory
Data mining is the process of the automated discovery of new, interesting, potentially useful information from large amounts of data.

Data Mining Pyramid
Levels, top to bottom: Wisdom, Understanding, Knowledge, Information, Data
Information: reduction and structuring of data without loss of domain-specific knowledge

Data Mining Challenges
• Large/huge/humongous data sets
  - Data sets can be rich in the number of data points
  - Data sets can be rich in the number of attributes
  - Unlabeled data (data labeling might be expensive)
  - Data quality and data uncertainty
• Data preprocessing and feature definition for structuring data
  - Data representation
  - Attribute/feature selection
  - Transforms and scaling
• Scientific data mining
  - Classification, multiple classes, regression
  - Continuous and binary attributes
  - Large datasets
  - Nonlinear problems
• Erroneous data, outliers, novelty, and rare events
  - Erroneous & conflicting data
  - Outliers
  - Rare events
  - Novelty detection
• Smart visualization techniques
• Feature selection & rule formulation
• Special outcomes: associations (e.g., NETFLIX), nuggets, enrichment
• Recent challenges: causality, active learning, multi-classes, big data

Docking ligands is a nonlinear problem
Example: crowdsourcing for anthrax drug

Data-Driven Science, Data-Driven Engineering
• Example of data-driven science: cholera epidemic root causes
• Examples of data-driven engineering
  - Denoising of windmill sensor data
  - Sensors for the early detection of levee failure
  - Sensing network instability
  - Load forecasting
  - Forecasting reactive power
  - Histology and electrophysiology for early failure detection of brain-implant devices
Science: what is possible
Engineering: turn science into an everyday commodity (cheap, safe, reliable, resilient, …)

Design of Capacitor Batteries with QSPR

Data Mining, Big Data Analytics
© NYT May 5, 2012: "Mystery of Big Data’s Parallel Universe Brings Fear, and a Thrill"

A brief history of Big Data
http://whatsthebigdata.com/2012/06/06/a-very-short-history-of-big-data/
• 4-19-2010 - Danah Boyd, “Privacy and Publicity in the context of Big Data.” Keynote WWW 2010. http://www.danah.org/papers/talks/2010/WWW2010.html
• May 2011 - James Manyika et al., “Big Data: The next frontier for innovation, competition, and productivity.” (McKinsey Global Institute Report). http://www.mckinsey.com/Insights/MGI/Research/Technology_and_Innovation/Big_data_The_next_frontier_for_innovation
• 6-6-2011 - A very short history of Big Data.
• March 27, 2012 - WIRED Magazine: Webcast: Obama goes big on big data.
http://www.wired.com/cloudline/2012/03/obama-big-data/

Big Data Analytics Pyramid
Levels, top to bottom: Wisdom, Understanding, Knowledge, Information, Data
Base: Data Fusion/Sensor Processing/Crowd Sourcing/Open Source

How is Big Data Analytics Different?
• Different from what? (data mining)
• Data is unstructured
• Data comes from different sources and has conflicts/missing data/outliers
• Usually a data fusion step is required
• Data are dynamic
• Often has a crowdsourcing component (e.g., Twitter)
• Often sensor processing steps are required (domain-specific)
• Because of the size of processed data things have to be done differently
• Involves high-performance computing and specialized algorithms
Big data analytics is the process of the automated discovery of potentially actionable/auctionable knowledge from diverse large data sources, where some of these data sources often have a crowdsourcing aspect.

Orders of Magnitude of Data (source: Wikipedia)
Typical for Big Data: 1 petabyte = 1000 terabytes = 1000 x
[Figure: scale of data sizes. Labels: capacity of human memory; size of Internet Archive, 2004; ….; all climate data; entire Library of Congress; Google server farm, 2004]

Specific algorithms for big data analytics
Modified algorithms
• Neural network infrastructure for regression/classification models
• Deep belief networks for ICA/feature detection
• Labeling of unlabeled data
• Semi-supervised learning and active learning
New algorithms
• Outlier detection
• Data cleansing with ICA and deep belief networks
• Data completion algorithms for partially filled data
• Data anonymization algorithms

Denoising wind turbine sensor data with stacked auto-encoders
• Calculate ICA components
• Make filter by identifying & removing noise ICAs
• Do inverse with noise ICAs removed with rectangular pseudo-inverse
[Figure: processing flow] Original Data + Noise -> ICA Components -> Filter to Remove Noise -> Cleansed Data

Smart grid
• Can we find wasteful customers?
• Can we detect early indications for grid instability?
http://emileglorieux.blogspot.de/2011/04/smart-grid-report-1.html

Big Data and Renewable Energy Challenges
• Dynamic integration of renewable energy sources
• Net stability, reactive sources and control … predictive optimization
• Wind farm/turbine allocation and control
• Solar panel direction control
• Energy storage (e.g., capacitor batteries, flywheels, …)
• Impact of unusual weather patterns
• The weather is … unpredictable
• Energy pricing
• Environmental impact (noise, earthquakes, microclimates, …)
• Data anonymization and data privacy
• Identifying wasteful customers
• Improving energy efficiency …
• Load/wind/weather forecasting
• New energy source paradigms
  - cogeneration superheated steam from solar energy
  - windmill/flywheel combo units
  - smart transformers for PDAs, cell phones …
  - charging of gadgets from body-powered energy

Thank you for your attention!!!
[email protected]

History of data mining in a nutshell
Data mining is the process of automatically extracting valid, novel, potentially useful, and ultimately comprehensible information from very large databases.
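
Sketch: ICA denoising steps
The three steps on the slide "Denoising wind turbine sensor data" (calculate ICA components, filter out the noise components, invert with a rectangular pseudo-inverse) can be sketched roughly as below. This is only an illustration under stated assumptions, not the method used in the talk. The data are synthetic, the small FastICA routine is a generic textbook variant, and the rule for spotting noise components (low lag-1 autocorrelation) is a hypothetical stand-in, since the slide does not say how noise components are identified. The stacked auto-encoder part of the slide title is not covered. All function names (`whiten`, `fast_ica`, `ica_denoise`) are made up for this example.

```python
import numpy as np


def whiten(X, k):
    """PCA-whiten centred data X (channels x samples) down to k dimensions."""
    cov = X @ X.T / X.shape[1]
    eigval, eigvec = np.linalg.eigh(cov)
    top = np.argsort(eigval)[::-1][:k]               # k largest eigenvalues
    K = (eigvec[:, top] / np.sqrt(eigval[top])).T    # whitening matrix (k x channels)
    return K @ X, K


def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W so that the rows of W become orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W


def fast_ica(Z, n_iter=500, tol=1e-8, seed=0):
    """Symmetric FastICA with a tanh nonlinearity on whitened data Z (k x samples)."""
    k, n = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((k, k)))
    for _ in range(n_iter):
        Y = W @ Z
        G = np.tanh(Y)
        G_prime = 1.0 - G ** 2
        W_new = (G @ Z.T) / n - np.diag(G_prime.mean(axis=1)) @ W
        W_new = sym_decorrelate(W_new)
        change = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if change < tol:
            break
    return W


def ica_denoise(X, n_components, acf_threshold=0.5):
    """Remove noise-like independent components from X (channels x samples).

    Assumption (not from the slides): a component counts as noise when its
    lag-1 autocorrelation is below acf_threshold.
    """
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean
    Z, K = whiten(Xc, n_components)
    W = fast_ica(Z) @ K                  # rectangular unmixing matrix (k x channels)
    S = W @ Xc                           # step 1: ICA components
    Sn = S / S.std(axis=1, keepdims=True)
    acf = np.mean(Sn[:, 1:] * Sn[:, :-1], axis=1)
    keep = acf > acf_threshold           # step 2: filter out noise components
    A = np.linalg.pinv(W)                # step 3: rectangular pseudo-inverse
    return A[:, keep] @ S[keep] + mean, keep


# Synthetic stand-in for sensor data: two smooth signals plus one noise source,
# mixed into four channels. None of this is real wind turbine data.
rng = np.random.default_rng(0)
t = np.arange(2000)
signals = np.vstack([np.sin(2 * np.pi * t / 100),
                     np.sign(np.sin(2 * np.pi * t / 163))])
noise = rng.uniform(-1.0, 1.0, size=(1, t.size))
A_true = rng.standard_normal((4, 3))
clean = A_true[:, :2] @ signals
noisy = clean + A_true[:, 2:] @ noise

denoised, keep = ica_denoise(noisy, n_components=3)
mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
print("kept components:", int(keep.sum()))
print("MSE ratio (denoised / noisy):", mse_denoised / mse_noisy)
```

The sketch reconstructs with the kept columns of the pseudo-inverse of the full unmixing matrix. The matrix is rectangular here because four channels are reduced to three components. Real sensor data would need a domain-specific criterion for deciding which components are noise.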