bigDAARE: Big Data Analytics for Renewable Energy
Mark J. Embrechts
Dept. Industrial and Systems Engineering
Rensselaer Polytechnic Institute, Troy, NY, USA
• What is Data Mining?
• Data Mining → Big Data Analytics
• Data-Driven Science → Data-Driven Engineering
• Renewable Energy Challenges
• Promise of Big Data Analytics for Renewable Energy
Science: what is possible
Engineering: turn science into an everyday commodity
(cheap, safe, reliable, resilient, …)
CFES 2012-2013 Annual Conference
January 25, 2013
bigDAARE
Data Mining as a Structured Process
data prospecting and surveying → database → select → selected data → preprocess & transform → transformed data → make model → interpretation & rule formulation
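The stages on this slide (select, preprocess & transform, make model, interpretation & rule formulation) can be sketched as a small pipeline. This is only an illustration under stated assumptions: the wind-speed/power records, the function names, and the choice of a least-squares line as the "model" are all hypothetical and not taken from the talk.

```python
from statistics import mean, pstdev

# Hypothetical toy "database": (wind_speed_m_s, power_kw); None marks a missing value.
database = [
    (3.0, 30.0), (5.0, 52.0), (None, 40.0),
    (7.0, 69.0), (9.0, 91.0), (11.0, 108.0),
]

def select(records):
    """Select: keep only complete records."""
    return [r for r in records if None not in r]

def preprocess(records):
    """Preprocess & transform: standardize the input attribute."""
    xs = [x for x, _ in records]
    mu, sigma = mean(xs), pstdev(xs)
    return [((x - mu) / sigma, y) for x, y in records], (mu, sigma)

def make_model(records):
    """Make model: least-squares line y = a * z + b (a stand-in for any model)."""
    zs = [z for z, _ in records]
    ys = [y for _, y in records]
    z_bar, y_bar = mean(zs), mean(ys)
    a = (sum((z - z_bar) * (y - y_bar) for z, y in records)
         / sum((z - z_bar) ** 2 for z in zs))
    return a, y_bar - a * z_bar

def interpret(model, scaling):
    """Interpretation & rule formulation: report the slope in original units."""
    a, _ = model
    _, sigma = scaling
    return f"Power rises by about {a / sigma:.2f} kW per 1 m/s of wind speed"

selected = select(database)
transformed, scaling = preprocess(selected)
model = make_model(transformed)
rule = interpret(model, scaling)
print(rule)
```

Each stage consumes the output of the previous one, mirroring the arrows in the diagram; in practice the preprocessing stage is usually far larger than the modeling stage.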
Early Motivating Applications for Data Mining
• Database Marketing
• Algo-Trading
• Market Basket Analysis
www.information-drivers.com/market_basket_analysis.php
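As a rough illustration of market basket analysis, the sketch below scores one association rule with the standard support, confidence, and lift measures. The baskets and the rule are invented for illustration and are not data from the talk.

```python
# Hypothetical baskets, invented for illustration (not data from the talk).
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def support(itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) over the baskets."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to the consequent's baseline support."""
    return confidence(antecedent, consequent) / support(consequent)

antecedent, consequent = {"diapers"}, {"beer"}
print(f"support={support(antecedent | consequent):.2f}",
      f"confidence={confidence(antecedent, consequent):.2f}",
      f"lift={lift(antecedent, consequent):.2f}")
```

A lift above 1 suggests the two items are bought together more often than their individual frequencies would predict.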
How is data mining different?
• Different from what? (statistics)
• Data can be unstructured (e.g., text)
• Data can come from different sources and contain conflicts, missing data, and outliers
• Usually a data → information step is required
• Data preprocessing takes most of the time
• Data might not fit in memory
Data mining is the process of the automated discovery of new, interesting,
potentially useful information from large amounts of data
Data Mining Pyramid
Wisdom
Understanding
Knowledge
Information
Data
Information: reduction and structuring of data without loss of domain-specific knowledge
Data Mining Challenges
• Large/huge/humongous data sets
- Data sets can be rich in the number of records
- Data sets can be rich in the number of attributes
- Unlabeled data (data labeling might be expensive)
- Data quality and data uncertainty
• Data preprocessing and feature definition for structuring data
- Data representation
- Attribute/Feature selection
- Transforms and scaling
• Scientific data mining
- Classification, multiple classes, regression
- Continuous and binary attributes
- Large datasets
- Nonlinear Problems
• Erroneous data, outliers, novelty, and rare events
- Erroneous & conflicting data
- Outliers
- Rare events
- Novelty detection
• Smart visualization techniques
• Feature selection & Rule formulation
• Special outcomes: Associations (e.g., NETFLIX), nuggets, enrichment
• Recent challenges: causality, active learning, multi-classes, big data
Docking ligands is a nonlinear problem. Example: crowdsourcing for an anthrax drug
Data-Driven Science → Data-Driven Engineering
• Example of data-driven science: Cholera epidemic root causes
• Examples of data-driven engineering
- Denoising of windmill sensor data
- Sensors for the early detection of levee failure
- Sensing network instability
- Load forecasting
- Forecasting reactive power
- Histology and electrophysiology for early failure detection of brain-implant devices
Science: what is possible
Engineering: turn science into an everyday commodity
(cheap, safe, reliable, resilient, …)
Design of Capacitor Batteries with QSPR
Data Mining → Big Data Analytics
© NYT May 5, 2012
Mystery of Big Data’s Parallel Universe Brings Fear, and a Thrill
A brief history of Big Data
http://whatsthebigdata.com/2012/06/06/a-very-short-history-of-big-data/
• 4-19-2010 - Danah Boyd, “Privacy and Publicity in the context of Big Data.” Keynote WWW 2010
http://www.danah.org/papers/talks/2010/WWW2010.html
• May 2011 - James Manyika et al. “Big Data: The next frontier for innovation, competition, and productivity.” (McKinsey Global Institute Report).
http://www.mckinsey.com/Insights/MGI/Research/Technology_and_Innovation/Big_data_The_next_frontier_for_innovation
• 6-6-2011 - A very short history of Big Data.
• March 27, 2012 – WIRED Magazine: Webcast: Obama goes big on big data.
http://www.wired.com/cloudline/2012/03/obama-big-data/
Big Data Analytics Pyramid
Wisdom
Understanding
Knowledge
Information
Data
Data Fusion/Sensor Processing/
Crowd Sourcing/Open Source
How is Big Data Analytics Different?
• Different from what? (data mining)
• Data is unstructured
• Data comes from different sources and has conflicts/missing data/outliers
• Usually a data fusion step is required
• Data are dynamic
• Often has a crowdsourcing component (e.g., Twitter)
• Often sensor processing steps are required (domain-specific)
• Because of the size of processed data things have to be done differently
• Involves high-performance computing and specialized algorithms
Big data analytics is the process of the automated discovery
of potentially actionable/auctionable knowledge from diverse large data sources,
where some of these data sources often have a crowdsourcing aspect
Orders of Magnitude of Data (source: Wikipedia)
Typical for Big Data: 1 petabyte = 1000 terabytes = 1000 x
[Figure: data-size scale with reference points: capacity of human memory; size of Internet Archive, 2004; ….; all climate data; entire Library of Congress; Google server farm, 2004]
Specific algorithms for big data analytics
Modified algorithms
• Neural network infrastructure for regression/classification models
• Deep belief networks for ICA/feature detection
• Labeling of unlabeled data
• Semi-supervised learning and active learning
New Algorithms
• Outlier detection
• Data cleansing with ICA and deep belief networks
• Data completion algorithms for partially filled data
• Data anonymization algorithms
Denoising wind turbine sensor data with stacked auto-encoders
Original Data + Noise → (calculate ICA components) → ICA Components → (make filter by identifying & removing noise ICAs) → Filter to Remove Noise → (do inverse with noise ICAs removed, with rectangular pseudo-inverse) → Cleansed Data
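The last step of this slide, inverting with the noise ICAs removed by means of a rectangular pseudo-inverse, can be sketched in a few lines. This is a toy under loud assumptions: the unmixing matrix `W` is hand-picked (and orthogonal) rather than estimated by ICA, the index of the noise component is simply given, and all data and helper names are hypothetical. It does not show the stacked auto-encoder part of the method.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def pseudo_inverse(w):
    """Right pseudo-inverse W^T (W W^T)^-1 of a full-row-rank 2 x n matrix."""
    wt = transpose(w)
    return matmul(wt, inverse_2x2(matmul(w, wt)))

# ASSUMPTION: unmixing matrix as if already estimated by ICA (hand-picked, orthogonal).
W = [[2 / 3, -1 / 3, 2 / 3],
     [2 / 3, 2 / 3, -1 / 3],
     [-1 / 3, 2 / 3, 2 / 3]]

# Hypothetical sources (rows): two signal components, one noise component, 4 samples.
sources = [[1.0, 2.0, 3.0, 4.0],
           [0.5, -0.5, 0.5, -0.5],
           [0.3, -0.2, 0.1, -0.4]]
mixing = transpose(W)              # inverse of an orthogonal W
noisy = matmul(mixing, sources)    # 3 sensors x 4 samples: original data + noise

components = matmul(W, noisy)      # calculate ICA components
noise_rows = {2}                   # ASSUMPTION: component 2 was identified as noise
w_keep = [row for i, row in enumerate(W) if i not in noise_rows]   # rectangular 2 x 3
kept = [row for i, row in enumerate(components) if i not in noise_rows]
cleansed = matmul(pseudo_inverse(w_keep), kept)   # inverse via pseudo-inverse
```

Because `W` is orthogonal in this toy case, `cleansed` equals the sensor data that the two signal components alone would have produced. With a non-orthogonal unmixing matrix the pseudo-inverse gives a minimum-norm reconstruction instead, which is only an approximation of that noise-free data.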
Smart grid
• Can we find wasteful customers?
• Can we detect early indications for grid instability?
http://emileglorieux.blogspot.de/2011/04/smart-grid-report-1.html
Big Data and Renewable Energy Challenges
• Dynamic integration of renewable energy sources
• Net stability, reactive sources and control … predictive optimization
• Wind farm/turbine allocation and control
• Solar panel direction control
• Energy storage (e.g., capacitor batteries, flywheels, …)
• Impact of unusual weather patterns
• The weather is … unpredictable
• Energy pricing
• Environmental impact (noise, earthquakes, microclimates, …)
• Data anonymization and data privacy
• Identifying wasteful customers
• Improving energy efficiency …
• Load/wind/weather forecasting
• New energy source paradigms
- cogeneration superheated steam from solar energy
- windmill/flywheel combo units
- smart transformers for PDAs, cell phones …
- charging of gadgets from body-powered energy
Thank you for your attention!!!
History of data mining in a nutshell
Data mining is the process of automatically extracting valid, novel, potentially useful,
and ultimately comprehensible information from very large databases