Big data, smart grids, and smart meters
Ke Wang
Simon Fraser University
www.cs.sfu.ca/~wangk

Smart meters
• From one reading a month to a smart meter reading every 5 minutes.
• Better understanding of customer segmentation, behavior, and how pricing influences usage
  – Time-of-use pricing saves money and reduces energy generation.
• Improved efficiency of electrical generation and scheduling.

Power signatures of three different residential appliance categories
• Kettles and light bulbs are mostly resistive loads.
• Motors (e.g., fans, heaters) are inductive loads.
• Devices containing a power supply (e.g., laptops) are capacitive loads.
• The problem: break down the total power demand $P(t)$ measured by a smart meter into components $P_i(t)$ that are attributed to specific appliances $i$: $P(t) = P_1(t) + P_2(t) + \dots + P_n(t)$.

From big data to big value
• Increased profitability, reduced carbon footprint, increased safety, enhanced regulatory interaction, and improved customer satisfaction.

A wide range of forecasts using smart meter data
• When and where equipment downtime and power failures are most likely to occur
• Which customers are most likely to feed energy back to the grid, and under what circumstances
• Which customers are most likely to respond to energy conservation and demand reduction incentives
• How much excess energy will be available, and when to sell it

KDD Process
Social Media Mining: Data Mining Essentials

Data Mining
• The process of discovering hidden patterns in large data sets
• It uses methods at the intersection of artificial intelligence, machine learning, statistics, and database systems
• Extracting or “mining” knowledge from large amounts of data, or big data
• Data-driven discovery and modeling of hidden patterns in big data
• Extracting implicit, previously unknown, unexpected, and potentially useful information/knowledge from data

Supervised Learning: classification/prediction
• F(x): true class function (usually not known)
• Input: D, training examples (x, F(x))
    57,M,195,0,125,95,39,25,0,1,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0   → 0
    78,M,160,1,130,100,37,40,1,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0  → 1
    69,F,180,0,115,85,40,22,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0 → 0
    18,M,165,0,110,80,41,30,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0   → 0
    54,F,135,0,115,95,39,35,1,1,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0 → 1
• Output: G(x), a class model learned from D
    71,M,160,1,130,105,38,20,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0    → ?
• Goal: minimize the error $E[(F(x) - G(x))^2]$ for future examples x drawn from the same distribution as in D.

Classification
Training Set
    Tid  Attrib1  Attrib2  Attrib3  Class
    1    Yes      Large    125K     No
    2    No       Medium   100K     No
    3    No       Small    70K      No
    4    Yes      Medium   120K     No
    5    No       Large    95K      Yes
    6    No       Medium   60K      No
    7    Yes      Large    220K     No
    8    No       Small    85K      Yes
    9    No       Medium   75K      No
    10   No       Small    90K      Yes
Test Set
    Tid  Attrib1  Attrib2  Attrib3  Class
    11   No       Small    55K      ?
    12   Yes      Medium   80K      ?
    13   Yes      Large    110K     ?
    14   No       Small    95K      ?
    15   No       Large    67K      ?
• Induction: a learning algorithm learns a model from the training set.
• Deduction: the model is applied to the test set to predict the unknown classes.

Linear Regression
• In linear regression, the class attribute y is a continuous label. We use a linear function to model the relation between y and the feature set x: $y = w^\top x + \epsilon$, where w represents the vector of regression coefficients.
• We search for w and $\epsilon$ using the provided dataset and the labels y
  – Least squares is often used to solve the problem.

Unsupervised Learning: clustering
• Clustering is a form of unsupervised learning
  – Unlike in supervised learning, the examples do not have class labels, i.e., the data are unlabeled.
• The goal is to group similar examples together and keep dissimilar examples apart
  – This needs a similarity measure for a pair of examples.

Four potential values of data analytics
1. Managing smart meter data
2. Monitoring the distribution grid
3. Optimizing unit commitment
4. Forecasting and scheduling loads

1. Managing smart meter data
• Challenges
  – Data storage costs can explode due to increased data volumes.
  – Corrupted and noisy data must be dealt with.
  – Report generation and analytics can be slow.
  – Privacy must be ensured.

2. Monitoring the distribution grid
• Identify abnormal conditions and take action to prevent power delivery disruptions and optimize overall grid reliability.
• Challenges
  – Real-time monitoring involves large volumes of high-velocity data.
  – Find correlations between network events and network failures.
  – Pinpoint fault locations and identify solutions.

3. Optimizing unit commitment
• Optimize the scheduling of generation assets.
  – Wind and solar energy sources are heavily weather-dependent and intermittent, requiring analysis of large weather data sets to forecast output.
• Challenges
  – Predict which units need to be operational to meet but not exceed demand.
  – Optimize the energy source mix and avoid both unanticipated excess capacity and costly market purchasing.

4. Forecasting and scheduling loads
• Accurate demand forecasting is essential to energy planning and trading.
• Challenges
  – Understand which parameters (weather, day of the week or month, holidays, prior usage, price incentives, and others) actually drive demand.
  – Large volumes of historical information must be analyzed and correlations identified.

Case Studies at BC Hydro
• Load curve data cleansing
• Identify contaminated equipment
• Outage prediction

Case Study 1: Load curve data cleansing
• Power consumption data are collected at high frequency; they are the heartbeat of power utilities and a valuable asset for smart meter applications.
• The data are often missing, corrupted, or noisy due to transmission errors, device faults, random events, unknown factors, etc.
• Corrupted data do not repeat in the future; thus, they do not represent “patterns” and should be repaired.
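The repair idea in Case Study 1 can be illustrated with a small sketch. It assumes a very simple smoother, a centered rolling median, and flags a reading as corrupted when it is missing or lies far from the median of its neighbours. The function name `clean_load_curve`, the parameters `window` and `k`, and the sample readings are all hypothetical; this is an illustrative stand-in, not the cleansing method used at BC Hydro.

```python
from statistics import median

def clean_load_curve(readings, window=5, k=3.0):
    """Repair missing (None) and outlying readings with a centered rolling median.

    Returns the cleaned curve and the indices that were repaired.
    """
    n = len(readings)
    half = window // 2

    # Baseline for each point: median of its observed neighbours (the point itself excluded).
    baseline = []
    for i in range(n):
        neighbours = [readings[j]
                      for j in range(max(0, i - half), min(n, i + half + 1))
                      if j != i and readings[j] is not None]
        baseline.append(median(neighbours) if neighbours else None)

    # Robust scale: median absolute deviation of observed readings from their baseline.
    residuals = [abs(r - b) for r, b in zip(readings, baseline)
                 if r is not None and b is not None]
    mad = median(residuals) if residuals else 0.0

    cleaned, repaired = [], []
    for i, (r, b) in enumerate(zip(readings, baseline)):
        # Repair a point if it is missing or deviates from the baseline by more than k * MAD.
        if b is not None and (r is None or abs(r - b) > k * max(mad, 1e-9)):
            cleaned.append(b)
            repaired.append(i)
        else:
            cleaned.append(r)
    return cleaned, repaired

# Hypothetical readings (kW): one missing value and one spike.
curve = [10.0, 10.5, 11.0, None, 11.5, 95.0, 12.0, 12.5, 13.0]
cleaned, repaired = clean_load_curve(curve)
print(repaired)  # [3, 5]
print(cleaned)   # [10.0, 10.5, 11.0, 11.25, 11.5, 12.0, 12.0, 12.5, 13.0]
```

A median is used rather than a mean so that the spike itself does not drag the baseline of its neighbours upward; a real load curve would also need a smoother that respects daily and weekly cycles.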
Smoothing curve techniques

References
• [1] Jiyi Chen, Wenyuan Li, Adriel Lau, Jiguo Cao, and Ke Wang. Automated Load Curve Data Cleansing in Power Systems. IEEE Transactions on Smart Grid, Vol. 1, No. 2, September 2010, pp. 213-221.
• [2] Zhihui Guo, Wenyuan Li, Adriel Lau, Tito Inga-Rojas, and Ke Wang. Detecting X-Outliers in Load Curve Data in Power Systems. IEEE Transactions on Power Systems, Vol. 27, No. 2, May 2012, pp. 875-884.

Case Study 2: Identify contaminated transformers
• Polychlorinated biphenyl (PCB) based dielectric fluids were used for heat insulation in old transformers.
• PCBs were later found to be harmful to humans and the environment.
• The problem: identify PCB-contaminated transformers and replace their oil.
• Oil sampling is very expensive
  – Hermetically sealed bushing structure without a drainage valve.
  – Power transmission to the affected areas must be shut down.

Solution 1
• Given a set of transformers, sample a few transformers and build a classifier to predict the PCB status of the remaining transformers.
  – Active learning: interactively request sampling of carefully chosen transformers.
• Must minimize the sum of the false positive cost (i.e., sampling cost) and the false negative cost (i.e., leaving contaminated objects unidentified).

[Figure: sealed bushing structure without drainage valve]

Active learning
• Step 1: Get initial labeled data L and unlabeled data U.
• Step 2: Build a classifier M using L.
• Step 3: If M is satisfactory, stop.
• Step 4: Choose the most uncertain examples from U, label them, update L and U, and go to Step 2.

Solution 1
• Drawback: it is hard to specify the false positive cost and the false negative cost.

Solution 2
• Clearance threshold
  – Instead of exact costs, specify the maximum allowed probability that a transformer contains PCB when a method clears it as non-PCB.
  – E.g., at most 1 hazard in 100 transformers.
• Given a collection of transformers with known PCB status and a clearance threshold t, we want to clear as many transformers as possible by dividing them into groups.
  – A group of transformers is cleared if a randomly chosen transformer from the group has a probability of less than t of containing PCB.

Recursive grouping
• Do (N=0, n=1) and (N=100, n=10) give the same estimate?

References
• [3] Yin Chu Yeh, Wenyuan Li, Adriel Lau, and Ke Wang. Identifying PCB Contaminated Transformers through Active Learning. IEEE Transactions on Power Systems, Vol. 28, No. 4, November 2013, pp. 3999-4006.
• [4] Ryan McBride, Ke Wang, and Wenyuan Li. Classification by CUT: Clearance Under Threshold. IEEE ICDM 2014 conference, December 2014.

Case Study 3: Outage Prediction
• Every winter, the greater Vancouver area experiences some power outages caused by storms.
• Outage prediction
  – Build software to predict whether feeders/equipment are at high risk of outage during a forecasted storm, by
    • accessing historical data about outages and feeders;
    • accessing relevant external data (e.g., weather conditions during previous outages).

General Approach
• Use BC Hydro data and external data to derive records of tree/storm-related outages:

    BC Hydro Data                      External Data
    Entity      Length  Result        Vegetation  Wind Speed  In Storm
    Feeder F1   5 km    Outage        0.2 NDVI    25 m/s N    Yes
    Feeder F2   15 km   NonOutage     0.4 NDVI    20 m/s W    Yes
    Feeder F2   24 km   NonOutage     0.5 NDVI    10 m/s E    No

• Then build a model to examine a new case such as:

    Entity      Length  Result        Vegetation  Wind Speed  In Storm
    Feeder F3   4 km    ?             0.4 NDVI    50 m/s N    Yes

• The model infers the chance of an outage for the “?” entry, to better distribute resources and mitigate risk.

Conclusion
• There are many opportunities for big data analytics to help in the power and energy industries.
• The keys: historical data, “know-how” in domain applications, and techniques in data analytics.
• Collaboration between data scientists and engineers in the power industry is crucial.

Acknowledgements
• 3 CRD grants supported by NSERC and BC Hydro, Canada.
• Collaboration between BC Hydro and Simon Fraser University from 2009 to present.
• 3 MSc students graduated, and 1 PhD student; employed by Shanghai Stock Exchange, Microsoft (Seattle), and Google.
• Publications in IEEE Transactions on Smart Grid, IEEE Transactions on Power Systems, the International Conference on Data Mining, and several other papers.
• Software being used by BC Hydro.
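
Returning to the clearance threshold of Case Study 2, the rule "a group is cleared if its PCB probability is below t" can be sketched as follows. The sketch assumes the simplest possible estimate, the observed fraction of PCB-positive transformers among those sampled in a group; the function name `clear_groups`, the group names, and the labels are hypothetical, and this is not the CUT algorithm of reference [4].

```python
def clear_groups(groups, t):
    """Return the names of groups whose observed PCB fraction is below t.

    groups maps a group name to a list of 0/1 labels (1 = PCB found)
    for the transformers sampled from that group.
    """
    cleared = []
    for name, labels in groups.items():
        if not labels:
            continue  # no sampled transformers, so there is no evidence to clear on
        fraction = sum(labels) / len(labels)
        if fraction < t:  # strictly less than the clearance threshold
            cleared.append(name)
    return cleared

# Hypothetical sampled groups; t = 0.01 corresponds to "at most 1 hazard in 100".
groups = {
    "A": [0] * 200,        # 0 of 200 contaminated
    "B": [0] * 99 + [1],   # 1 of 100 contaminated, exactly at the threshold
    "C": [0, 0, 1, 0],     # 1 of 4 contaminated
    "D": [0],              # 0 of 1 contaminated
}
print(clear_groups(groups, t=0.01))  # ['A', 'D']
```

Group D is cleared on a single clean sample, exactly like group A with 200 clean samples. The observed fraction alone says nothing about how much evidence backs it, so a practical method also has to take the number of sampled transformers into account.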