IBM Cloud Architecture Center
Cyber threat intelligence
Clients are looking for new ways to combat cyber crime in real time,
especially since 100K new malware items are introduced every week.
Signature learning methods are not enough; a scalable machine
learning architecture is required to analyze data both in real time and in
batch to detect advanced persistent threats.
This solution is based on the Data and Analytics Reference Architecture.
Runtime flow
1. Data is collected both from internal sources such as network probes, DNS, NetFlow, AD logs, and network logs and from external sources such as blacklist/whitelist providers.
2. Most structured sources of data are sent first to the Security Information and Event Manager (SIEM), which acts as an integration layer and converts all incoming data into a single format.
3. InfoSphere® Streams picks up both the streaming flow data (such as DNS and NetFlow) and the processed data from the SIEM system. It computes simple analytics (such as traffic in/out per server and number of requests made/failed to a DNS domain), which are used in developing machine learning models.
4. All raw data and output from Streams is sent to the data repository (IBM BigInsights®). Machine learning models are run against longer data sets to detect advanced persistent threats. Additional models from the stats language R are deployed in BigInsights.
5. Machine learning models that have been developed in SPSS are deployed in InfoSphere Streams, which scores them in real time to analyze network, user, and traffic behavior.
6. Custom blacklists from the client and other data sources such as AD logs are used for lookup to enrich and pinpoint user activity. IBM i2® Analyst’s Notebook® Big Sheets interface supplied by BigInsights is used for visualization by security analysts.
7. User lookup information is ingested from the enterprise Active Directory to establish exactly which user was involved in a particular traffic flow.
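The simple analytics in step 3 and the lookups in steps 6 and 7 can be sketched in a few lines of Python. This is only an illustration, not InfoSphere Streams code: the record types `FlowRecord` and `DnsEvent`, their field names, and the three helper functions are hypothetical, and a real deployment would compute the same aggregates over sliding windows in the streaming engine.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical normalized NetFlow-style record (field names are assumptions)."""
    src_ip: str
    dst_ip: str
    byte_count: int


@dataclass(frozen=True)
class DnsEvent:
    """Hypothetical normalized DNS event (field names are assumptions)."""
    client_ip: str
    domain: str
    succeeded: bool


def traffic_per_server(flows):
    """Step 3: total bytes in and out per IP address."""
    totals = defaultdict(lambda: {"in": 0, "out": 0})
    for flow in flows:
        totals[flow.src_ip]["out"] += flow.byte_count
        totals[flow.dst_ip]["in"] += flow.byte_count
    return dict(totals)


def dns_requests_per_domain(events):
    """Step 3: number of requests made and failed per DNS domain."""
    totals = defaultdict(lambda: {"made": 0, "failed": 0})
    for event in events:
        totals[event.domain]["made"] += 1
        if not event.succeeded:
            totals[event.domain]["failed"] += 1
    return dict(totals)


def enrich(events, blacklist, ip_to_user):
    """Steps 6 and 7: flag blacklisted domains and attach the user behind each IP.

    `blacklist` is a set of domains and `ip_to_user` a dict standing in for
    an Active Directory lookup; both are assumptions for this sketch.
    """
    return [
        {
            "domain": event.domain,
            "client_ip": event.client_ip,
            "blacklisted": event.domain in blacklist,
            "user": ip_to_user.get(event.client_ip, "unknown"),
        }
        for event in events
    ]
```

The aggregates are plain dictionaries so that they can be used directly as features when developing models, which is the role step 3 gives them.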
Components
COMPONENT | DEFINITION | PRODUCT
Data sources | Includes different information sources that may contain data of interest for cyber security such as NetFlow, DNS, network logs, and more. |
Streaming computing | Includes stream processing systems that ingest and process large volumes of highly dynamic, time-sensitive continuous data streams. | InfoSphere Streams
Data integration | Copies and correlates information from disparate sources to produce meaningful associations related to primary business dimensions. | InfoSphere Information Server
Data repositories | Organizes the data stored in the cloud environment into repositories supplied by the cloud provider. | IBM BigInsights for Apache Hadoop
Enterprise user directory | Provides storage for and access to user information to support authentication, authorization, or profile data. |
Actionable insight | Includes SaaS or on-premises applications which are used to derive information from the data in a convincing and understandable manner upon which an organization can take an action. | IBM SPSS family
Enterprise data | Includes metadata about the data and systems of record for enterprise applications. | IBM DB2®
This is one product mapping for this scenario. For other applicable products, see the Data and Analytics Reference Architecture and the
other scenarios that are based on it.
Business drivers
01 Improve situational awareness of network security from both external and internal sources.
02 Introduce a new service in the marketplace which capitalizes on leveraging the network backbone provided by the telecom provider and analytics to provide per-client and per-industry threat analysis.
03 Monetize their investment in a cyber threat intelligence solution by delivering it as a managed service to their enterprise clients.
Requirements
Functional requirements
• The solution must support both structured and unstructured data sources.
• The solution must be able to process real-time flow data such as NetFlow, DNS, and IP Flow.
• The vendor should have predefined parsers to parse network flow traffic for the data sources mentioned above.
• The solution must be able to provide real-time threat indicators without relying on signature evaluation of malware.
• The solution must have analytics capabilities that can help users identify, correlate, and dynamically exploit emerging trends in data sets and data flows.
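To make the idea of a threat indicator that does not rely on signatures concrete, here is one possible behavioral check, written as a minimal Python sketch. It is not the SPSS or R models the solution uses: the function names, the input shape (`{host: (made, failed)}`), and the z-score threshold are all assumptions chosen for illustration. It flags hosts whose share of failed DNS requests is far above that of their peers, a pattern sometimes associated with malware probing generated domain names.

```python
from statistics import mean, pstdev


def failure_ratio(made, failed):
    """Fraction of DNS requests that failed; 0.0 when no requests were made."""
    return failed / made if made else 0.0


def anomalous_hosts(dns_counts, z_threshold=2.0):
    """Return hosts whose DNS failure ratio is an outlier among all hosts.

    dns_counts maps host -> (requests_made, requests_failed). A host is
    flagged when its z-score (population standard deviation) exceeds
    z_threshold. Both the input shape and the threshold are assumptions.
    """
    ratios = {host: failure_ratio(made, failed)
              for host, (made, failed) in dns_counts.items()}
    values = list(ratios.values())
    if len(values) < 2:
        return []  # not enough hosts to compare against
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every host behaves identically
    return sorted(host for host, ratio in ratios.items()
                  if (ratio - mu) / sigma > z_threshold)
```

With a population standard deviation, a single outlier among n hosts can reach a z-score of at most the square root of n - 1, so a threshold of 2.0 only becomes meaningful once more than five hosts are compared.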
Non-functional requirements
Elasticity
• Provide ability to connect with multiple data sources
(such as DNS, firewall, NetFlow) within the network.
Scalability
• Ability to scale to process very large volumes of data
(from 5 TB/day to 20 TB/hour).
Security
• Provide support for segmenting access to data to users
with different levels of privilege and mission scope.
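The scalability range above is easier to judge once both ends are expressed as sustained throughput. The short calculation below is a sketch that assumes decimal terabytes (10^12 bytes) and perfectly even arrival of data; real traffic is bursty, so peak rates would be higher.

```python
TB = 10**12  # assumption: decimal terabytes


def bytes_per_second(terabytes, period_seconds):
    """Average sustained ingest rate for a volume spread over a period."""
    return terabytes * TB / period_seconds


low = bytes_per_second(5, 24 * 3600)   # 5 TB/day
high = bytes_per_second(20, 3600)      # 20 TB/hour

print(f"5 TB/day   = {low / 1e6:.1f} MB/s")   # 57.9 MB/s
print(f"20 TB/hour = {high / 1e9:.2f} GB/s")  # 5.56 GB/s
print(f"growth factor = {high / low:.0f}x")   # 96x
```

The two ends of the range differ by a factor of 96, so the requirement covers almost two orders of magnitude of growth.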
For more information and for the latest solutions and other assets, visit developer.ibm.com/architecture
© Copyright IBM Corporation 2016