DAT204: Real-time Data Management with Microsoft StreamInsight

The Value of Timely Analytics
[Figure: $ value of analytics versus time of interest (years, months, days, hrs, min, sec, present). Labeled regions: Historical Trend Analysis; Forecasting in Enterprises; Web Analytics (ad placement), Financial Services, Smart Grids, Monitoring (systems mgmt), Health Care, Manufacturing, etc.]
Current Products for Analytics
Load barrier is dictated by current choices of the solution, e.g., loading into databases or persisting into files. This is intrinsic because in current approaches no processing can be done until the data is loaded.
Custom-built solutions carry huge development and customization costs.
[Figure: facts/sec. (100, 1000, 10000, 100000) versus time of interest (years, months, days, hrs, min, sec, present). Labeled regions: Traditional DW Analytics; Active DW analytics; custom-built solutions. Annotations: ET time in ETL; Load time in ETL.]
Operational Intelligence Platform
[Architecture diagram. Stages: Sources, Data Bus, Caching, Processing, Distribution, Visualization. Components: Devices, Sensors; Web servers; Stock tickers & News feeds; Reference Data; Message Bus; Service Broker; Cache; In-memory Database; Microsoft StreamInsight; ETL; Intra-Day Cubes; Historic Cubes; Operational Analytics; Automated Decisions; Operational Dashboard (Ticking - Snapshot) with Refresh (Push); Reporting Dashboard (Refreshed) with Re-compute (Pull); Static Reports; Mining, Validation, "What-If" Scenarios.]
The Need for an Event-Driven Platform
Analytical results need to reflect important changes in business reality
immediately and enable responses to them with minimal latency
                   Database Applications              Event-driven Applications
Query Paradigm     Ad-hoc queries or requests         Continuous standing queries
Latency            Seconds, hours, days               Milliseconds or less
Data Rate          Hundreds of events/sec             Tens of thousands of events/sec or more
Query Semantics    Declarative relational analytics   Declarative relational and temporal analytics
[Figure: a database application answers a request with a response; an event-driven application consumes an input stream of events and produces an output stream.]
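The difference between an ad-hoc request and a continuous standing query can be sketched in plain C#. This is an illustration only and does not use the StreamInsight API: the `RunningAverage` and `StandingQueryDemo` names are hypothetical, and the "query" is simply an object that stays registered and updates its result as each event arrives, instead of recomputing over stored data on request.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration (not StreamInsight API): a "standing query" that
// keeps a running average and produces an updated result for every input event.
public sealed class RunningAverage
{
    private long _count;
    private double _sum;

    // Called once per arriving event; returns the average over all events so far.
    public double OnEvent(double value)
    {
        _count++;
        _sum += value;
        return _sum / _count;
    }
}

public static class StandingQueryDemo
{
    // Feeds the input stream through the standing query and yields the output stream.
    public static IEnumerable<double> Run(IEnumerable<double> input)
    {
        var query = new RunningAverage();
        foreach (var value in input)
        {
            yield return query.OnEvent(value);
        }
    }

    public static void Main()
    {
        foreach (var average in Run(new[] { 10.0, 20.0, 30.0 }))
        {
            Console.WriteLine(average);
        }
    }
}
```

Each output is computed incrementally from the previous state, so nothing has to be loaded or re-scanned before a result is available.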
Scenarios for Event-Driven Applications
[Figure: latency (months, days, hours, minutes, seconds, 100 ms, < 1 ms) versus aggregate data rate (0, 10, 100, 1000, 10000, 100000, ~1 million events/sec.). Application classes plotted: Relational Database Applications; Data Warehousing Applications; Operational Analytics Applications, e.g., Logistics, etc.; Web Analytics Applications; Monitoring Applications; Manufacturing Applications; Financial trading Applications. A region is marked as CEP Target Scenarios.]
Overview: Microsoft StreamInsight
[Architecture diagram. Application development: .NET, C#, LINQ. Application at runtime: event sources (Devices, Sensors; Web servers; Stock tickers & News feeds; Event stores & Databases) send events through input adapters into the StreamInsight engine, which runs standing queries; output adapters deliver events to event targets (Pagers & Monitoring devices; KPI Dashboards, SharePoint UI; Trading stations; Event stores & Databases). Static reference data (C_ID, C_NAME, C_ZIP) is shown alongside the engine.]
Virtuous Cycle: Monitor, Manage, Mine
Industry trends
• Data acquisition costs are negligible
• Raw storage costs are small and continue to decrease
• Processing costs are non-negligible
• Data loading costs continue to be significant
CEP advantage
• Process data incrementally, i.e., while it is in flight
• Avoid loading while still doing the processing you want
• Seamless querying for monitoring, managing and mining
[Cycle diagram: Monitor KPIs; Manage business via KPI-triggered actions; Record raw data (history); Mine historical data; Devise new KPIs.]
Example Scenarios
Manufacturing:
• Sensor on plant floor
• React through device
controllers
• Aggregated data
• 10,000 events/sec
Web Analytics:
• Click-stream data
• Online customer behavior
• Page layout
• 100,000 events/sec
Financial Services:
• Stock & news feeds
• Algorithmic trading
• Patterns over time
• Super-low latency
• 100,000 events/sec
Power, Utilities:
• Energy consumption
• Outages
• Smart grids
• 100,000 events/sec
Visual trend-line and KPI monitoring
Batch & product management
Automated anomaly detection
Real-time customer segmentation
Algorithmic trading
Proactive condition-based maintenance
[Diagram components: Asset Instrumentation for Data Acquisition, Subscriptions to Data Feeds; Data Streams; Event Processing Engine; Lookup; Asset Specs & Parameters; Stream Data Store & Archive.]
Query types run by the event processing engine:
• Threshold queries
• Event correlation from multiple sources
• Pattern queries
Power Utilities
Scenario: Smart grid
Instrument households with smart power meters
Continuous, up-to-date insight into your grid, including generation,
distribution, and demand
StreamInsight advantage
Scales to smart grid requirements
Scale to millions of meters
Hundreds of thousands of meter readings per second
Write validation, editing, estimation (VEE) rules declaratively in LINQ
Scale to the high data volumes expected in smart grids
React in almost real-time to changing grid conditions to avoid power
outages
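What a validation and estimation rule might look like can be sketched in plain C# with LINQ to Objects. Everything here is an assumption made for illustration: the `MeterReading` type, the `MaxKwhPerInterval` bound, and the "reuse the last valid reading" estimation rule are hypothetical and are not taken from the session or from the StreamInsight API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical payload for a smart meter reading.
public sealed class MeterReading
{
    public long MeterId { get; set; }
    public DateTime Timestamp { get; set; }
    public double Kwh { get; set; }
    public bool Estimated { get; set; }
}

public static class VeeDemo
{
    // Hypothetical plausibility bound for one metering interval.
    public const double MaxKwhPerInterval = 50.0;

    // Validation rule: a reading must be non-negative and below the bound.
    public static bool IsValid(MeterReading r)
    {
        return r.Kwh >= 0 && r.Kwh <= MaxKwhPerInterval;
    }

    // Estimation rule: replace an invalid reading with the last valid value
    // of the same meter; drop it if no valid value has been seen yet.
    public static IEnumerable<MeterReading> Apply(IEnumerable<MeterReading> readings)
    {
        var lastValid = new Dictionary<long, double>();
        foreach (var r in readings.OrderBy(x => x.Timestamp))
        {
            if (IsValid(r))
            {
                lastValid[r.MeterId] = r.Kwh;
                yield return r;
            }
            else if (lastValid.TryGetValue(r.MeterId, out var previous))
            {
                yield return new MeterReading
                {
                    MeterId = r.MeterId,
                    Timestamp = r.Timestamp,
                    Kwh = previous,
                    Estimated = true
                };
            }
        }
    }

    public static void Main()
    {
        var t0 = new DateTime(2010, 6, 7, 12, 0, 0);
        var input = new[]
        {
            new MeterReading { MeterId = 1, Timestamp = t0, Kwh = 10 },
            new MeterReading { MeterId = 1, Timestamp = t0.AddMinutes(15), Kwh = -5 },
            new MeterReading { MeterId = 1, Timestamp = t0.AddMinutes(30), Kwh = 12 }
        };
        foreach (var r in Apply(input))
        {
            Console.WriteLine(r.Timestamp.ToString("HH:mm") + " " + r.Kwh + (r.Estimated ? " (estimated)" : ""));
        }
    }
}
```

The sketch processes an in-memory list; in a streaming engine the same rules would run continuously over the meter stream.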
Financial Services
Scenario: Real-time Risk
Continuous insight into market conditions and exposure
Continuous low-latency market monitoring
Manage risks across traders and per desk with aggregate and
individual thresholds
StreamInsight advantage:
Implement risk monitoring declaratively in LINQ
Detect and notify in near real-time on risk
No change to models or LINQ code necessary for back-testing over
historical data
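Aggregate and individual thresholds can be expressed as two group-and-filter queries. The following is a minimal sketch in plain C# with LINQ to Objects, not StreamInsight code; the `Position` type, the limits, and the alert strings are hypothetical. Because the query is written against `IEnumerable<Position>`, the same method can be called with live or historical positions, which loosely mirrors the back-testing point above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical payload: one open position of a trader on a desk.
public sealed class Position
{
    public string Desk { get; set; }
    public string Trader { get; set; }
    public double Exposure { get; set; }
}

public static class RiskDemo
{
    // Returns one alert per trader whose total exposure exceeds traderLimit,
    // followed by one alert per desk whose total exposure exceeds deskLimit.
    public static List<string> Breaches(
        IEnumerable<Position> positions, double traderLimit, double deskLimit)
    {
        var snapshot = positions.ToList();

        var traderAlerts =
            from p in snapshot
            group p by p.Trader into g
            let total = g.Sum(x => x.Exposure)
            where total > traderLimit
            orderby g.Key
            select "trader:" + g.Key;

        var deskAlerts =
            from p in snapshot
            group p by p.Desk into g
            let total = g.Sum(x => x.Exposure)
            where total > deskLimit
            orderby g.Key
            select "desk:" + g.Key;

        return traderAlerts.Concat(deskAlerts).ToList();
    }

    public static void Main()
    {
        var positions = new[]
        {
            new Position { Desk = "A", Trader = "alice", Exposure = 60 },
            new Position { Desk = "A", Trader = "bob", Exposure = 50 },
            new Position { Desk = "B", Trader = "carol", Exposure = 30 }
        };
        foreach (var alert in Breaches(positions, 55, 100))
        {
            Console.WriteLine(alert);
        }
    }
}
```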
Web Analytics
Scenario: Real-time Behavioral Targeting
Continuously analyze online behavior per user
Identify relevant content before the next click
Define content behind next click based on detected online
behavior
StreamInsight advantage:
Scale to millions of concurrent online users
Immediate insight - real time analytics
Web logs no longer processed offline in batches
Correlate across your web farms and applications
Retail (Online and Traditional)
Scenario: Real-Time Coupon
Provide most relevant/appealing coupon
Maximize expected individual customer revenue
Correlate current sales transaction with customer purchase history
StreamInsight advantage
Track current market basket as a real-time stream
Use StreamInsight lookup pattern to correlate current market basket
with purchase history
Easily scale to internet retail with millions of concurrent sessions
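The lookup idea, correlating the current market basket with stored purchase history, can be sketched as a join in plain C# with LINQ to Objects. This is an illustration under stated assumptions rather than the StreamInsight lookup pattern itself: the `BasketEvent` and `PurchaseRecord` types and the "suggest previously bought products missing from the basket" rule are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical event: a product added to a customer's current basket.
public sealed class BasketEvent
{
    public string CustomerId { get; set; }
    public string Product { get; set; }
}

// Hypothetical reference data: a product the customer bought in the past.
public sealed class PurchaseRecord
{
    public string CustomerId { get; set; }
    public string Product { get; set; }
}

public static class CouponDemo
{
    // Joins basket events with purchase history per customer and suggests
    // products bought before that are not in the current basket.
    public static List<string> Suggest(
        IEnumerable<BasketEvent> basket, IEnumerable<PurchaseRecord> history)
    {
        var current = basket.ToList();
        var inBasket = new HashSet<string>(
            current.Select(b => b.CustomerId + "|" + b.Product));

        return (from b in current
                join h in history on b.CustomerId equals h.CustomerId
                where !inBasket.Contains(h.CustomerId + "|" + h.Product)
                select h.CustomerId + ":" + h.Product)
               .Distinct()
               .ToList();
    }

    public static void Main()
    {
        var basket = new[] { new BasketEvent { CustomerId = "c1", Product = "milk" } };
        var history = new[]
        {
            new PurchaseRecord { CustomerId = "c1", Product = "milk" },
            new PurchaseRecord { CustomerId = "c1", Product = "coffee" },
            new PurchaseRecord { CustomerId = "c2", Product = "tea" }
        };
        foreach (var suggestion in Suggest(basket, history))
        {
            Console.WriteLine(suggestion);
        }
    }
}
```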
Event Types
StreamInsight events use the .NET type system
Events are structured and can have multiple fields
Fields are typed using the .NET framework types
Timestamp fields provisioned by the StreamInsight engine
capture all the different temporal event characteristics
Event sources populate time stamp fields
All calculations done based on “business time”
Example event layout:
Timestamps/Metadata | Long pumpID | String Type | String Location | Double flow | Double pressure
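The payload fields shown on the slide map naturally onto a .NET type. The sketch below is an assumption about how such a type could be declared: the class name `PumpReading` is hypothetical, and the timestamp fields are left out because the slide says they are provisioned by the engine rather than declared in the payload.

```csharp
using System;

// Hypothetical payload type with the fields from the slide.
// Timestamps/metadata are not declared here; per the slide they are
// provided by the engine around the payload.
public class PumpReading
{
    public long pumpID { get; set; }
    public string Type { get; set; }
    public string Location { get; set; }
    public double flow { get; set; }
    public double pressure { get; set; }
}

public static class PayloadDemo
{
    public static void Main()
    {
        var reading = new PumpReading
        {
            pumpID = 7,
            Type = "centrifugal",
            Location = "plant floor 2",
            flow = 1.5,
            pressure = 3.2
        };
        Console.WriteLine(reading.pumpID + " " + reading.Location);
    }
}
```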
Event Streams & Adapters
A stream is a sequence of events
Defined over a .NET type
Possibly infinite
Stream characteristics:
Event/data arrival patterns (steady, bursty)
Out of order events: Order of arrival of events does not match the order
of their application timestamps
Adapters
Receive/get events from the data source
Enqueue events for processing in the engine
Insertions of new events
Changes to event durations
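Out-of-order arrival can be made concrete with a small check over application timestamps listed in arrival order. This is a plain C# sketch, not adapter code; the `OutOfOrderDemo` class and `FindLateArrivals` method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class OutOfOrderDemo
{
    // Takes application timestamps in order of arrival and returns the arrival
    // positions of events whose timestamp is earlier than the latest timestamp
    // seen before them, i.e. events that arrived out of order.
    public static List<int> FindLateArrivals(IList<DateTime> appTimes)
    {
        var late = new List<int>();
        if (appTimes.Count == 0)
        {
            return late;
        }

        var highWaterMark = appTimes[0];
        for (int i = 1; i < appTimes.Count; i++)
        {
            if (appTimes[i] < highWaterMark)
            {
                late.Add(i);
            }
            else
            {
                highWaterMark = appTimes[i];
            }
        }
        return late;
    }

    public static void Main()
    {
        var t0 = new DateTime(2010, 6, 7, 12, 0, 0);
        var arrivals = new[] { t0, t0.AddSeconds(2), t0.AddSeconds(1), t0.AddSeconds(3) };
        foreach (var index in FindLateArrivals(arrivals))
        {
            Console.WriteLine("late arrival at position " + index);
        }
    }
}
```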
StreamInsight Query Features
Operators over streams
Calculations (PROJECT)
Correlation of streams from different data sources (JOIN)
Check for absence of activity with a data source (EXISTS)
Selection of events from streams (FILTER)
Stream partitioning (GROUP & APPLY)
Aggregation (SUM, COUNT, …)
Ranking and heavy hitters (TOP-K)
Temporal operations: hopping window, sliding window
Extensibility – to add new domain-specific operators
LINQ Query Examples
LINQ Example – JOIN, PROJECT, FILTER:
    from e1 in MyStream1
    join e2 in MyStream2
      on e1.ID equals e2.ID                // Join
    where e1.f2 == "foo"                   // Filter
    select new { e1.f1, e2.f4 };           // Project
LINQ Example – GROUP&APPLY, WINDOW:
    from e3 in MyStream3
    group e3 by e3.i into SubStream        // Grouping
    from win in SubStream.HoppingWindow(
        FiveMinutes, ThreeSeconds)         // Window
    select new { i = SubStream.Key,
                 a = win.Avg(e => e.f) };  // Project & Aggregate
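To make the GROUP&APPLY, WINDOW example easier to follow, here is a rough emulation of its semantics over an in-memory list, written in plain C# with LINQ to Objects. It is not the StreamInsight API: the `Reading` and `WindowAverage` types are hypothetical, events are treated as point events, and window alignment is simplified so that the first window starts at the earliest event timestamp.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical point event with a grouping key (I) and a value (F).
public sealed class Reading
{
    public DateTime Time { get; set; }
    public int I { get; set; }
    public double F { get; set; }
}

public sealed class WindowAverage
{
    public DateTime WindowStart { get; set; }
    public int I { get; set; }
    public double A { get; set; }
}

public static class HoppingWindowDemo
{
    // For each window [start, start + windowSize), advancing by hopSize,
    // groups the contained events by I and averages F per group.
    public static List<WindowAverage> Averages(
        IEnumerable<Reading> events, TimeSpan windowSize, TimeSpan hopSize)
    {
        if (hopSize <= TimeSpan.Zero)
        {
            throw new ArgumentException("hopSize must be positive", nameof(hopSize));
        }

        var all = events.ToList();
        var result = new List<WindowAverage>();
        if (all.Count == 0)
        {
            return result;
        }

        var first = all.Min(e => e.Time);
        var last = all.Max(e => e.Time);
        for (var start = first; start <= last; start += hopSize)
        {
            var windowStart = start;
            var windowEnd = start + windowSize;
            result.AddRange(
                from e in all
                where e.Time >= windowStart && e.Time < windowEnd
                group e by e.I into g
                orderby g.Key
                select new WindowAverage
                {
                    WindowStart = windowStart,
                    I = g.Key,
                    A = g.Average(x => x.F)
                });
        }
        return result;
    }

    public static void Main()
    {
        var t0 = new DateTime(2010, 6, 7, 12, 0, 0);
        var events = new[]
        {
            new Reading { Time = t0, I = 1, F = 10 },
            new Reading { Time = t0.AddSeconds(1), I = 1, F = 20 },
            new Reading { Time = t0.AddSeconds(4), I = 1, F = 60 }
        };
        foreach (var w in Averages(events, TimeSpan.FromSeconds(3), TimeSpan.FromSeconds(2)))
        {
            Console.WriteLine(w.WindowStart.ToString("HH:mm:ss") + " i=" + w.I + " a=" + w.A);
        }
    }
}
```

Because the hop is shorter than the window, consecutive windows overlap and a single event can contribute to more than one result, which is the defining property of a hopping window.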
Extensibility SDK
Built-in operators do not cover all functionality
Need for domain-specific extensions
Integrate with functionality from existing libraries
Support for extensions in the CEP platform:
User-defined operators, functions, aggregates
Code written in .NET, deployed as .NET assembly
Query operators and LINQ can refer to functionality of the assembly
Temporal snapshot operator framework
Interface to implement user-defined operators
Manages operator state and snapshot changes
Framework does the heavy lifting to deal with intricate temporal
behavior such as out-of-order events
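As an example of the kind of domain-specific logic a user-defined aggregate might carry, the sketch below computes a median over the payload values of one window. It is plain C# and deliberately does not use the extensibility SDK's base classes or registration mechanism, which are not shown in this transcript; the `MedianAggregate` class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MedianAggregate
{
    // Computes the median of the payload values that fall into one window.
    // In a real deployment the engine, not the caller, would decide which
    // events belong to the window; here the caller passes them in directly.
    public static double Compute(IEnumerable<double> payloads)
    {
        var sorted = payloads.OrderBy(v => v).ToArray();
        if (sorted.Length == 0)
        {
            throw new ArgumentException("window contains no events", nameof(payloads));
        }

        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void Main()
    {
        Console.WriteLine(Compute(new[] { 5.0, 1.0, 3.0 }));
        Console.WriteLine(Compute(new[] { 4.0, 1.0, 3.0, 2.0 }));
    }
}
```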
StreamInsight Deployment Alternatives
Event processing engines are deployed at multiple places
on different scales
• At the edge – close to the data source
• In the mid-tier – consolidate related data sources
• In the data center – historical archive, mining, large scale
correlation
StreamInsight roles shown in the diagram:
• CEP for lightweight processing and filtering
• CEP for aggregation and correlation of in-flight events
• CEP for complex analytics including historical data
[Diagram: data sources (web servers, sensors, devices, feeds) connect to StreamInsight instances; further StreamInsight instances are labeled Aggregation & Correlation and Complex Analytics & Mining.]
StreamInsight Deployment
Lightweight embedded engine
StreamInsight is available as a set of DLLs
StreamInsight can be included into your applications
Low footprint, small overhead
Facilitates deployments close to the data source
StreamInsight Windows service
Runs the engine as a Windows service
Applications can share incoming streams
Well-suited for more centralized deployments
Installation
Small, lightweight MSI
Installs in 2 minutes
SQL Server 2008 R2 Capabilities by Edition
Editions compared: Standard, Enterprise, Datacenter, Parallel Data Warehouse

Custom/Packaged OLTP Apps
  Standard: 4 procs, 64 GB RAM, Backup Compression
  Enterprise: 8 procs, 2 TB RAM, Adv. Security, Backup Compression
  Datacenter: >8 procs, OS Max, Adv. Security, Backup Compression
  Parallel Data Warehouse: N/A
Server Consolidation
  Standard: 1 VM/license
  Enterprise: 4 VMs/license, Resource Governor, App & Multi-Server Mgmt (up to 25 instances)
  Datacenter: Unlimited Virtualization, Resource Governor, App & Multi-Server Mgmt (> 25 instances)
  Parallel Data Warehouse: N/A
Data Warehousing
  Enterprise: Scale-Up DW, Data Compression
  Datacenter: Scale-Up DW, Data Compression
  Parallel Data Warehouse: Scale-Out DW
  Capacities shown on the slide: 10s of TBs, up to 30 TB with FastTrack; 10s of TBs; 10s - 100s of TBs
Business Intelligence
  Standard: Dept/Team BI
  Enterprise: Enterprise-Scale BI, Master Data Services, PowerPivot Mgmt
  Datacenter: Enterprise-Scale BI, Master Data Services, PowerPivot Mgmt
  Parallel Data Warehouse: Integrated with SSIS, SSAS and SSRS
Complex Event Processing (StreamInsight)
  Standard: <5000 events/sec & > 5 s latency
  Enterprise: <5000 events/sec & > 5 s latency
  Datacenter: >5000 events/sec & < 5 s latency
  Parallel Data Warehouse: Future coverage
StreamInsight Solutions
Scenarios: Manufacturing, Utilities, Oil & Gas, Financial Services, Web Analytics, Telco
Use cases: Alarming, AMI/SmartGrid, Well Monitoring, Risk Management, Behavioral Targeting, CDR Aggregation, Notifications, Outage Management, Operational Intelligence, Market Monitoring, Load Monitoring, Real-Time Analysis
ISV and SI names listed on the slide: OSIsoft, Matrikon, Telvent, ICONICS, Lab49, MSFT AdCenter, XBox, DPE, Hitachi Consulting
Recap: Microsoft StreamInsight
CEP platform from Microsoft to build event-driven applications
Development experience with .NET, C#, LINQ and Visual Studio 2008
Event-driven applications are fundamentally different from traditional database applications: queries are continuous, consume and produce streams, and compute results incrementally
Flexible adapter SDK with high performance to connect to different event sources and sinks
The CEP platform does the heavy lifting for you to deal with temporal characteristics of event stream data
[Diagram: CEP application development; event sources send events through input adapters to the CEP engine, which runs standing queries; output adapters deliver events to event targets. Static reference data: C_ID, C_NAME, C_ZIP.]
For More Information
StreamInsight main page & download:
http://www.microsoft.com/sqlserver/2008/en/us/R2-complexevent.aspx
StreamInsight blog: http://blogs.msdn.com/streaminsight/
StreamInsight MSDN documentation:
http://msdn.microsoft.com/en-us/library/ee362541(SQL.105).aspx
StreamInsight E-clinics on Microsoft e-learning
https://www.microsoftelearning.com/eLearning
ASI07-INT | Real-Time Event Integration with Microsoft SQL Server 2008 R2 StreamInsight and Microsoft BizTalk Server
BIE202 | Data Integration at Microsoft: Technologies and Solution Patterns
BIP302 | Enabling Real-time Business Insight, Analytics and Reporting
DAT23-HOL | Querying Events in Microsoft SQL Server 2008 R2 StreamInsight Using LINQ
DAT20-HOL | Working with the Microsoft SQL Server 2008 R2 StreamInsight Event Flow Debugger
www.microsoft.com/teched
www.microsoft.com/learning
http://microsoft.com/technet
http://microsoft.com/msdn