IBM Cloud Architecture Center

Cyber threat intelligence

Clients are looking for new ways to combat cyber crime in real time, especially since 100K new malware items are introduced every week. Signature learning methods are not enough, and a scalable machine learning architecture is required to analyze data in real time and in batch to detect advanced persistent threats. This solution is based on the Data and Analytics Reference Architecture.

Runtime flow

1. Data is collected both from internal sources such as network probes, DNS, NetFlow, AD logs, and network logs and from external sources such as blacklist/whitelist providers.
2. Most structured sources of data are sent first to the Security Information and Event Manager (SIEM), which acts as an integration layer and converts all incoming data into a single format.
3. InfoSphere® Streams picks up both the streaming flow data (such as DNS and NetFlow) and the processed data from the SIEM system. It computes simple analytics (such as traffic in/out per server and number of requests made/failed to a DNS domain), which are used in developing machine learning models.
4. All raw data and output from Streams is sent to the data repository (IBM BigInsights®). Machine learning models are run against longer data sets to detect advanced persistent threats. Additional models written in the R statistical language are deployed in BigInsights.
5. Machine learning models that have been developed in SPSS are deployed in InfoSphere Streams, which scores them in real time to analyze network, user, and traffic behavior.
6. Custom blacklists from the client and other data sources such as AD logs are used for lookup to enrich and pinpoint user activity. IBM i2® Analyst’s Notebook® Big Sheets interface supplied by BigInsights is used for visualization by security analysts.
7. User lookup information is ingested from the enterprise Active Directory to establish exactly which user was involved in a particular traffic flow.

Components

COMPONENT | DEFINITION | PRODUCT
Data sources | Includes different information sources that may contain data of interest for cyber security such as NetFlow, DNS, network logs, and more. |
Streaming computing | Includes stream processing systems that ingest and process large volumes of highly dynamic, time-sensitive continuous data streams. | InfoSphere Streams
Data integration | Copies and correlates information from disparate sources to produce meaningful associations related to primary business dimensions. | InfoSphere Information Server
Data repositories | Organizes the data stored in the cloud environment into repositories supplied by the cloud provider. | IBM BigInsights for Apache Hadoop
Enterprise user directory | Provides storage for and access to user information to support authentication, authorization, or profile data. |
Actionable insight | Includes SaaS or on-premises applications which are used to derive information from the data in a convincing and understandable manner upon which an organization can take an action. | IBM SPSS family
Enterprise data | Includes metadata about the data and systems of record for enterprise applications. | IBM DB2®

This is one product mapping for this scenario. For other applicable products, see the Data and Analytics Reference Architecture and the other scenarios that are based on it.

Business drivers

01 Improve situational awareness of network security from both external and internal sources.
02 Introduce a new service in the marketplace which capitalizes on leveraging the network backbone provided by the telecom provider and analytics to provide per-client and per-industry threat analysis.
03 Monetize their investment in a cyber threat intelligence solution by delivering it as a managed service to their enterprise clients.
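To make steps 3 and 7 of the runtime flow more concrete, the following Python sketch computes the two simple analytics named in step 3 (traffic in/out per server and DNS requests made/failed per domain) and then attributes a flow to a user, as in step 7. It is an illustration only and is not InfoSphere Streams code: the record layouts (FlowRecord, DnsRecord), their field names, and the IP-to-user dictionary standing in for an Active Directory lookup are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical NetFlow-like record; the field names are assumptions."""
    src_ip: str
    dst_ip: str
    num_bytes: int


@dataclass(frozen=True)
class DnsRecord:
    """Hypothetical DNS log record; the field names are assumptions."""
    domain: str
    succeeded: bool


def traffic_per_server(flows: Iterable[FlowRecord]) -> Dict[str, Dict[str, int]]:
    """Sum bytes sent ("out") and received ("in") for every IP address seen."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: {"in": 0, "out": 0})
    for flow in flows:
        totals[flow.src_ip]["out"] += flow.num_bytes
        totals[flow.dst_ip]["in"] += flow.num_bytes
    return dict(totals)


def dns_requests_per_domain(records: Iterable[DnsRecord]) -> Dict[str, Dict[str, int]]:
    """Count requests made and requests failed for every DNS domain."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: {"made": 0, "failed": 0})
    for rec in records:
        counts[rec.domain]["made"] += 1
        if not rec.succeeded:
            counts[rec.domain]["failed"] += 1
    return dict(counts)


def user_for_flow(flow: FlowRecord, ip_to_user: Dict[str, str]) -> Optional[str]:
    """Return the user mapped to the flow's source IP, or None when unknown."""
    return ip_to_user.get(flow.src_ip)


# Made-up sample data, using documentation-reserved addresses and domains.
flows = [
    FlowRecord("10.0.0.5", "10.0.0.9", 1200),
    FlowRecord("10.0.0.9", "10.0.0.5", 300),
    FlowRecord("10.0.0.5", "192.0.2.7", 500),
]
dns = [
    DnsRecord("example.com", True),
    DnsRecord("example.com", False),
    DnsRecord("example.org", True),
]

traffic = traffic_per_server(flows)
dns_counts = dns_requests_per_domain(dns)
print(traffic["10.0.0.5"])         # {'in': 300, 'out': 1700}
print(dns_counts["example.com"])   # {'made': 2, 'failed': 1}
print(user_for_flow(flows[0], {"10.0.0.5": "alice"}))  # alice
```

A streaming deployment would presumably compute these aggregates over time windows rather than over a finite list; the batch form is used here only to keep the sketch short.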
Requirements

Functional requirements

• The solution must support both structured and unstructured data sources.
• The solution must be able to process real-time flow data such as NetFlow, DNS, and IP Flow.
• The vendor should have pre-defined parsers to parse network flow traffic for the data sources mentioned above.
• The solution must be able to provide real-time threat indicators without relying on signature evaluation of malware.
• The solution must have analytics capabilities that can help users identify, correlate, and dynamically exploit emerging trends in data sets and data flows.

Non-functional requirements

Elasticity
• Provide the ability to connect with multiple data sources (such as DNS, firewall, NetFlow) within the network.

Scalability
• Provide the ability to scale to process substantially large volumes of data (from 5 TB/day to 20 TB/hour).

Security
• Provide support for segmenting access to data to users with different levels of privilege and mission scope.

For more information and for the latest solutions and other assets, visit developer.ibm.com/architecture

© Copyright IBM Corporation 2016
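The scalability requirement quotes its two bounds in different units (5 TB/day and 20 TB/hour), which hides how wide the range is. The short Python calculation below converts both to a sustained ingest rate. It assumes decimal terabytes (10^12 bytes), since the source does not say whether TB or TiB is meant, and the function name is made up for this example.

```python
TB = 10**12  # bytes per terabyte; decimal units are an assumption


def bytes_per_second(volume_tb: float, period_seconds: int) -> float:
    """Average ingest rate needed to absorb volume_tb within period_seconds."""
    return volume_tb * TB / period_seconds


low = bytes_per_second(5, 24 * 3600)   # 5 TB per day
high = bytes_per_second(20, 3600)      # 20 TB per hour

print(f"{low / 1e6:.1f} MB/s")   # 57.9 MB/s
print(f"{high / 1e9:.2f} GB/s")  # 5.56 GB/s
print(f"{high / low:.0f}x")      # 96x
```

In other words, 20 TB/hour corresponds to 480 TB/day, so the upper bound is roughly two orders of magnitude above the lower one.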