Ojulari Moshood
Cameron University - IT4444 Capstone
2013
"BIG DATA A PROLIFIC USE OF INFORMATION"
Abstract: The idea of big data is to make better use of the information generated by individuals
to remake and improve our businesses, security, health care, and economy. Some people
consider big data a meme and a marketing term, but it also opens the door to a new approach to
understanding the world and making decisions [7]. Big data presents vast new opportunities.
Giving individuals rights to their data turns that data into an asset that people own, which in
turn gives them the ability to trade it for services [3]. This creates a new environment in which
people treat their data the way they treat their money. It also enables the next generation of
interactive data analysis with real-time answers [6]. The goal of this paper is to inform readers
about Big Data and its benefits.
Keywords: Big Data, Privacy, Security, Big Data Analytics, Data Mining, Data Warehousing,
Hadoop.
Introduction: Big data is data that is too big, too fast, or too complex for existing tools to
capture, manage, store, and process. People often equate big data with data from social media or
search engines, but the real big data comes from sources such as credit card transactions,
photographs, mobile phones, GPS logs, web-browsing trails, network data, sensors, email, and so
on [1]. These are sources that we tend to neglect, yet they reveal trends in people's behavior,
which is one of the key values of big data. The promise of big data is to help engineer the
systems in our society to work more efficiently by using information obtained from data analysis.
This brings indisputable facts into the mix, enabling managers to base vital decisions on solid
information accumulated from a rich variety of sources and delivered in real time [3]. Big data
faces several challenges: current technology cannot yet handle the velocity, volume, and variety
of the data or the algorithms needed to analyze such massive amounts of data, and there are many
privacy concerns about who will be using the information.
Section 2 discusses big data analytics and warehousing and gives examples of how data analytics
can help a business maximize profit. Section 3 discusses big data mining and its importance.
Section 4 discusses big data security. Section 5 discusses big data and the privacy issues that
arise when individual information is shared by multiple sources around the world. Section 6 gives
insight into the future of big data and how it will help engineer a more information-driven
society.
Big Data Analytics & Warehousing: Big data analytics is a fast-growing and
influential practice [5]. It applies advanced analytic techniques to various data types (e.g.,
video, GPS tracking, web-browser trails, sensors, email, social media) to process, clean, and
transform acquired structured and unstructured data in order to unearth patterns, hidden
correlations, and other useful information. Such information can be used to support
decision-making and to improve business performance, security, and so on. Big data analytics can
be done using Apache Hadoop, an open source software framework that enables the distributed
processing of large data sets across clusters of commodity servers [9]. It is designed to scale
from a single server to thousands of machines with a very high degree of fault tolerance. Rather
than relying on high-end hardware, the resiliency of the clusters comes from the software's
ability to detect and handle failures at the application layer [8]. Hadoop provides two basic
services, MapReduce and the Hadoop Distributed File System (HDFS). MapReduce is a programming
model used to simplify data processing across large data sets, while HDFS is a distributed file
system with high fault tolerance that is designed to be deployed on low-cost hardware and
provides high-throughput access to application data for large data sets [9]. Retailers like
Wal-Mart and Kohl's analyze sales, pricing, demographic, and weather data to tailor product
selections at particular stores and to determine the timing of price markdowns [7].
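The MapReduce programming model mentioned above can be illustrated with a minimal single-machine sketch. This is not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample records are assumptions made for illustration, and a real Hadoop job would distribute these steps across a cluster and read its input from HDFS.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group every emitted value under its key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single total."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input records; a real job would read blocks from HDFS.
    print(run_job(["big data", "big ideas"]))
```

Because each map call looks at only one record and each reduce call at only one key, the work can in principle be split across many machines, which is the property the model relies on.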
A data warehouse is a database used for storing data, reporting, and data analysis [11]. It stores
current as well as historical data, which is used to create trend reports for decision-making
or for predicting the future. It is usually a central repository of data, built by combining data
from one or more sources and organized to facilitate management decision-making. Data warehouses
are constructed through data cleaning, data integration, data transformation, data loading, and
periodic data refreshing [10].
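The construction steps listed above (cleaning, integration, transformation, loading) can be sketched with Python's built-in `sqlite3` module standing in for the warehouse. Everything specific here is an assumption for illustration: the two source layouts, the `sales` table, and the sample rows are invented, and a production warehouse would use a dedicated database and far richer cleaning rules.

```python
import sqlite3

# Hypothetical raw extracts from two source systems with different layouts.
STORE_SALES = [
    {"store": "north", "date": "2013-01-05", "amount": "120.50"},
    {"store": "north", "date": "2013-01-06", "amount": None},  # incomplete row
    {"store": "south", "date": "2013-02-10", "amount": "80.00"},
]
ONLINE_SALES = [
    {"channel": "web", "day": "2013-01-20", "total": 45.25},
]


def clean(rows, required):
    """Data cleaning: drop rows that are missing a required field."""
    return [row for row in rows if row.get(required) is not None]


def integrate(store_rows, online_rows):
    """Data integration: map both sources onto one (source, sale_date, amount) schema."""
    unified = [(r["store"], r["date"], float(r["amount"])) for r in store_rows]
    unified += [(r["channel"], r["day"], float(r["total"])) for r in online_rows]
    return unified


def transform(rows):
    """Data transformation: derive a YYYY-MM month column for trend reports."""
    return [(source, date, date[:7], amount) for source, date, amount in rows]


def load(conn, rows):
    """Data loading: append the prepared rows to the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(source TEXT, sale_date TEXT, month TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def monthly_report(conn):
    """Trend report: total sales per month across all sources."""
    query = "SELECT month, SUM(amount) FROM sales GROUP BY month ORDER BY month"
    return conn.execute(query).fetchall()


def build_warehouse():
    """Run the full pipeline once against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    rows = integrate(clean(STORE_SALES, "amount"), clean(ONLINE_SALES, "total"))
    load(conn, transform(rows))
    return conn


if __name__ == "__main__":
    print(monthly_report(build_warehouse()))
```

In this sketch, periodic refreshing would amount to calling `load` again with newly extracted rows, since the table is created only if it does not already exist.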
Image source: http://whatsthebigdata.com/2012/12/06/the-future-of-big-data-infographic/
Big Data Mining: Big data mining is a sophisticated technique that analyzes large and varied
volumes of data to determine patterns and relations, using advanced statistical analysis and
modeling techniques. The main objective is to find relations and patterns that can be leveraged
to improve the business [11]. It helps unveil puzzling but useful associations and gives a better
understanding of known associations [12]. For example, a retailer discovered that almost half of
the customers who bought cigarettes on Friday also bought cologne, an association that led the
retailer to display the two products side by side. Another example is the observation that people
tend to sign in to various social networks when it rains. Associations like the first help
retailers determine the most effective sales floor layout [11].
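An association such as the cigarettes and cologne example above is commonly quantified with two standard measures, support and confidence. The sketch below computes both over a handful of made-up baskets; the basket contents and the names `support`, `confidence`, and `FRIDAY_BASKETS` are assumptions for illustration and are not data from the cited sources.

```python
def support(transactions, itemset):
    """Fraction of all transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing antecedent, the fraction that also
    contain consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [b for b in transactions if antecedent <= b]
    if not with_antecedent:
        return 0.0
    both = sum(1 for b in with_antecedent if consequent <= b)
    return both / len(with_antecedent)


# Hypothetical Friday baskets, invented purely for illustration.
FRIDAY_BASKETS = [
    {"cigarettes", "cologne"},
    {"cigarettes", "gum"},
    {"milk"},
    {"cigarettes", "cologne", "milk"},
    {"cigarettes"},
]

if __name__ == "__main__":
    print(support(FRIDAY_BASKETS, {"cigarettes", "cologne"}))
    print(confidence(FRIDAY_BASKETS, {"cigarettes"}, {"cologne"}))
```

With these invented baskets, half of the cigarette buyers also bought cologne, which mirrors the "almost half" pattern in the retailer example. A real analysis would scan millions of transactions and consider many candidate item pairs.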
Big Data Security: Big data security analytics deals with a collection of security data sets
so large and complex that it becomes difficult (or impossible) to process using on-hand database
management tools or traditional security data processing applications [13]. It can help lower
cybersecurity risks by analyzing large amounts of behavioral data to distinguish between
legitimate and malicious activities. In-depth analysis of forensics and network traffic, fraud
detection reports or logs, and customer data helps identify both the known and the unknown and
helps create new security systems or enhance current ones. Big Data analytics can also help
detect DDoS attacks by running a MapReduce-based detection algorithm in Hadoop. The algorithm can
simply count the total number of web page requests from each client, or it can calculate the time
spent and the byte count for each URL request and compare the access sequence and time spent
with those of other clients accessing the same server to detect whether a client is infected
[14]. For example, such analysis can reveal that one person, James, used six IP addresses, five
user IDs, and 14 different accounts. With big data security analytics techniques, security
experts will be able to make more accurate security decisions based on information extracted from
real-time analysis of various data sets, or to create an entirely new set of security
capabilities.
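The simpler of the two DDoS detection approaches described above, counting page requests per client, can be sketched as follows. This is a single-machine illustration under stated assumptions, not the algorithm from [14]: the log format, the threshold value, and the names `count_requests`, `flag_suspects`, and `ACCESS_LOG` are hypothetical.

```python
from collections import Counter


def count_requests(log):
    """Count page requests per client, like a map step emitting (client, 1)
    followed by a reduce step that sums the ones."""
    return Counter(client for client, _url in log)


def flag_suspects(counts, threshold):
    """Return clients whose request count exceeds the threshold, sorted by address."""
    return sorted(client for client, total in counts.items() if total > threshold)


# Hypothetical access log of (client_ip, url) pairs for one time window.
ACCESS_LOG = [("10.0.0.1", "/index.html")] * 5 + [("10.0.0.2", "/about.html")]

if __name__ == "__main__":
    counts = count_requests(ACCESS_LOG)
    # The threshold of 3 requests per window is an arbitrary illustrative value.
    print(flag_suspects(counts, threshold=3))
```

A fixed threshold is the crudest possible rule. The second approach in the text, comparing time spent and byte counts across clients, would need per-request timing fields that this sketch does not model.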
Big Data & Privacy: Protecting data privacy becomes harder as information is
shared widely among different parties around the world. Protecting data that is stored in
distributed systems or shared between parties is very important because there can be serious
consequences if such data is released without the data owner's knowledge [16]. As more
information regarding individuals' health, finances, location, and online activity spreads,
concerns arise about profiling, tracking, discrimination, exclusion, government surveillance, and
loss of control of such information. Big data challenges some of the most fundamental concepts
of privacy law, including the definition of "personally identifiable information", the role of
individual control, and the principles of data minimization and purpose limitation [15].
Future of Big Data: Big data has the potential to re-engineer the marketplace,
security, schools, businesses, public health, and society at large through its power of
prediction, which helps reveal obscured trends in structured and unstructured data sets in real
time. This helps businesses match customers' online activity with the right promotions and
marketing campaigns, track sales transactions, and predict system failures beforehand. It can
also help the government in areas like security, economic advancement, public schools, the
census, and much more, fostering a better and more information-driven society. For more
information, see http://datadrivendetroit.org/projects/. The challenges of big data will help
create new career fields and technologies, such as data scientists, data analysts, and new
analytic platforms, products, and systems. Putting Big Data to work to drive innovation or to
reform current innovation processes will not be trouble-free, but if appropriate investment is
made, I believe that Big Data will usher in a new surge of technological advancement that will
help change the world.
Conclusion: As the world becomes more information driven, Big Data will continue to grow at a
fast pace and will help engineer a new universe driven by data. Big Data holds the potential for
enormous advances in many scientific disciplines and for expanding the profitability and success
of many small and big enterprises [6]. Some companies are already using some of the power of
Big Data analytics in crucial decision-making to gain a competitive advantage. The challenges of
Big Data are not just about data scale. They also include a diverse range of other issues, such
as lack of structure, algorithms, privacy, provenance, timeliness, and visualization. Big Data
will help us uncover knowledge that no one has discovered before. I encourage everyone to
participate in this great journey to a world where data means vivacity.
References
1. Liyakasa, Kelly. "Big Data Analytics Can Help Improve Information Security." CRM
Magazine 16.11 (2012): 11. Academic Search Premier. Web. 1 Feb. 2013.
2. Johnson, Jeanne E. "Big Data + Big Analytics = Big Opportunity." Financial
Executive 28.6 (2012): 50-53. Business Source Premier. Web. 1 Feb. 2013.
3. Huwe, Terence K. "Big Data, Big Future." Computers in Libraries 32.5 (2012): 20-
22. Academic Search Premier. Web. 1 Feb. 2013.
4. Ritchey, Diane. "Big Data, Big Security." Security: Solutions for Enterprise Security
Leaders 49.7 (2012): 28-30. Business Source Premier. Web. 1 Feb. 2013.
5. Russom, Philip. "Big Data Analytics." Tdwi.org. Tdwi Research, 2011. Web. 15 Feb. 2013.
6. "Challenges and Opportunities with Big Data." Http://www.cra.org. N.p., n.d. Web. 09 Feb.
2013. <http://www.cra.org/ccc/docs/init/bigdatawhitepaper.pdf>.
7. Lohr, Steve. "The Age of Big Data." The New York Times, 11 Feb. 2012.
<http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html>.
8. "What Is Hadoop?" IBM. N.p., n.d. Web. 25 Mar. 2013.
<http://www-01.ibm.com/software/data/infosphere/Hadoop/>.
9. Apache Hadoop. <http://hadoop.apache.org>.
10. Han, Jiawei, and Micheline Kamber. Data Mining: Concepts and Techniques. Boston, Mass:
Elsevier, 2006. Web. 1 Apr. 2013.
11. Khan, Arshad. Data Warehousing 101: Concepts and Implementation. San Jose, Calif: Khan
Consulting and Publishing, 2003. Web. 1 Apr. 2013.
12. Kamath, Chandrika. "Large Scale Data Mining and Pattern Recognition: Overview." Large
Scale Data Mining and Pattern Recognition: Overview. Lawrence Livermore National
Laboratory, 30 Aug. 2000. Web. 01 Apr. 2013.
<https://computation.llnl.gov/casc/sapphire/overview/overview.html>.
13. Oltsik, Jon. "Defining Big Data Security Analytics." Network World. Network World, Inc.,
01 Apr. 2013. Web. 02 Apr. 2013. <http://www.networkworld.com/community/node/82758>.
14. Lee, Y., and Y. Lee. "Detecting DDoS Attacks with Hadoop." ACM CoNEXT Student
Workshop, Dec. 2011.
15. Tene, Omer. "Big Data for All: Privacy and User Control in the Age of Analytics." Center for
Internet and Society. N.p., 20 Sept. 2012. Web. 16 Mar. 2013.
<http://cyberlaw.stanford.edu/blog/2012/09/big-data-all-privacy-and-user-control-age-analytics>.
16. Wong, R. C. "Big Data Privacy." J Inform Tech Softw Eng 2 (2012): e114.