Operational-Log Analysis for Big Data Systems: Challenges and Solutions
ABSTRACT
Big data systems (BDSs) are complex, consisting of multiple interacting hardware and software
components, such as distributed computing nodes, databases, and middleware. Any of these
components can fail. Finding the failures' root causes is extremely laborious. Analysis of BDS-generated logs can speed up this process. The logs can also help improve testing processes,
detect security breaches, customize operational profiles, and aid with any other tasks requiring
runtime-data analysis. However, practical challenges hamper log analysis tools' adoption. The
logs emitted by a BDS can be thought of as big data themselves. When working with large logs,
practitioners face seven main issues: scarce storage, unscalable log analysis, inaccurate capture
and replay of logs, inadequate log-processing tools, incorrect log classification, a variety of log
formats, and inadequate privacy of sensitive data. Some practical solutions exist, but serious
challenges remain. This article is part of a special issue on Software Engineering for Big Data
Systems.
EXISTING SYSTEM
Essentially, BDSs designed to process big data usually emit big data (captured in logs)
themselves. Of course, not all BDSs generate large volumes of logs. Also, small systems might
generate big data. However, most BDS-emitted logs will exhibit at least one big data
characteristic. To leverage log data, developers need ways to effectively deliver, store, and
crunch large volumes of data. Each of these processes poses challenges. Analysis of large
logs for industrial projects at IBM and Ericsson revealed the seven challenges listed below.
Disadvantages of Existing System:
1. Scarce storage,
2. Unscalable log analysis,
3. Inaccurate capture and replay of logs,
4. Inadequate log-processing tools,
5. Incorrect log classification,
6. A variety of log formats, and
7. Inadequate privacy of sensitive data.
PROPOSED SYSTEM
In the proposed system, to pinpoint a problem’s root cause, analysts typically examine operational
data (logs and traces) generated by the BDS components. A log or trace is a sequence of
temporal events captured during a particular execution of a system. For example, a log can
contain software execution paths, events triggered during software execution, or user activities.
No clear distinction exists between logs and traces. Often, the term “log” represents how a
program is used (such as security logs), whereas “tracing” captures a program’s elements that are
invoked in a given execution of the system. Tracing is used for debugging and program
understanding. In this article, we primarily use the term “log.”
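The definition of a log as a sequence of temporal events can be illustrated with a minimal Java sketch. It is not taken from the article: the class name LogEvent, its fields, and the line format "<ISO-8601 timestamp> <component> <message>" are all assumptions made for illustration.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration: one timestamped event in an operational log.
final class LogEvent {
    final Instant timestamp;   // when the event occurred
    final String component;    // which BDS component emitted it (assumed field)
    final String message;      // free-text description of the event

    LogEvent(Instant timestamp, String component, String message) {
        this.timestamp = timestamp;
        this.component = component;
        this.message = message;
    }

    // Parses a line in the assumed format "<ISO-8601 instant> <component> <message>".
    static LogEvent parse(String line) {
        String[] parts = line.split(" ", 3);
        if (parts.length < 3) {
            throw new IllegalArgumentException("Malformed log line: " + line);
        }
        return new LogEvent(Instant.parse(parts[0]), parts[1], parts[2]);
    }

    // Builds a log: the parsed events ordered by time, regardless of arrival order.
    static List<LogEvent> toLog(List<String> lines) {
        List<LogEvent> events = new ArrayList<>();
        for (String line : lines) {
            events.add(parse(line));
        }
        events.sort(Comparator.comparing((LogEvent e) -> e.timestamp));
        return events;
    }
}
```

Calling LogEvent.toLog on lines collected from several components returns one list ordered by timestamp, which is the form an analyst would typically read when tracing a failure back to its cause. Real BDS logs come in many formats, so the single fixed format assumed here is a simplification.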
Advantages of Proposed System:
1. We provide practical solutions for some of these challenges.
SYSTEM REQUIREMENTS
Hardware Requirements:
Processor - Pentium IV
Speed - 1.1 GHz
RAM - 256 MB
Hard Disk - 20 GB
Keyboard - Standard Windows Keyboard
Mouse - Two or Three Button Mouse
Monitor - SVGA
Software Requirements:
Operating System : Windows XP
Coding Language : Java