STEGANOGRAPHY: Data Mining

SOUNDARARAJAN EZEKIEL
Department of Computer Science
Indiana University of Pennsylvania
Indiana, PA 15705

Steganography, Cryptography, Data Mining
– Steganography: the art of hiding information in ways that prevent detection of the hidden message
  • The existence of the message is not known
– Cryptography: the science of writing in secret code
  • It encodes a message so that it cannot be understood
– Data mining: discovering hidden value in your data warehouse
  • That is, the extraction of hidden predictive information from large databases
  • Knowledge discovery method: extraction of implicit and interesting patterns from large data collections

Data Mining: Introduction
– It started when we (businesses) started to store data on computers
– Continued improvements: technology that navigates through data in real time
– Examples:
  • Single case: a web server collects data for every single click
    Logs are too big and contain gibberish; there is a lot of data and statistics, but what we collected is not really useful
  • Multiple case: a collection of web servers with large bandwidth
    Think about the size of the data we collect

Data Mining: Continued
– It helps to design better and more intelligent businesses (e-learning environments) because it is supported by
  • Massive data collection
  • Powerful multiprocessor computers
  • Good data mining algorithms
– It has existed for at least 10 years, but it has become popular only recently
– Example: Winter Corporation report
  • Data warehouses with as much as 100 to 200 terabytes of raw data will be operational by next year, performing nearly 2,000 concurrent queries and occupying nearly 1 petabyte (1,000 terabytes) of disk space.
    In the same time period, transaction-processing databases will handle workloads of nearly 66,000 transactions per second.

Evolution of Data Mining
(evolutionary step, question, technology, product providers, characteristics)
– Data collection (1960s)
  • Question: What was my total revenue in the last few years?
  • Technology: computers, tapes, disks
  • Product providers: IBM, CDC
  • Characteristics: retrospective, static data delivery
– Data access (1980s)
  • Question: What were unit sales in India in January of last year?
  • Technology: RDBMS (relational databases), SQL (Structured Query Language), ODBC
  • Product providers: Oracle, Sybase, Informix, IBM, Microsoft
  • Characteristics: dynamic data delivery
– Data warehouse and decision support (1990s)
  • Question: What was the unit sales price in India last March?
  • Technology: on-line analytic processing (OLAP), multidimensional databases, data warehouses
  • Product providers: Pilot, Comshare, Arbor, Cognos, Microstrategy
  • Characteristics: dynamic data delivery at multiple levels
– Data mining (now)
  • Question: What will the unit price be in India next month? Why?
  • Technology: advanced algorithms, multiprocessor computers, massive databases
  • Product providers: Pilot, Lockheed, IBM, SGI, many more…
  • Characteristics: prospective, proactive information delivery

The Scope of Data Mining
– It is similar to sifting gold from an immense amount of dirt: searching for valuable information in gigabytes of data
– Automated prediction of trends and behaviors: data mining automates the process of finding predictive information in a large database.
  • Example: a question related to target marketing. Data mining can use mailing list data and other previous data to identify the solution
  • Another example: forecasting bankruptcy and identifying segments of a population likely to respond similarly to given events
– Automated discovery of previously unknown patterns: it sweeps through the database and identifies previously hidden patterns in one step
  • Example: unrelated items purchased together in a store.
  • Detecting fraudulent credit card transactions, etc.
– Databases can be larger in both depth and breadth
  • High-performance data mining needs to analyze the full depth of a database without pre-selecting subsets
  • Larger samples yield lower estimation errors and variances

Research Rank
– 2001: according to MIT's Technology Review, data mining is a top 10 research area
– Recently: according to a Gartner Group Advanced Technology Research Note, data mining and AI are among the top 5 key research areas
– Multi-disciplinary field with broad applicability
– Has several applications
  • Market-based analysis
  • Customer relationship management
  • Fraud detection
  • Network intrusion detection
  • Non-destructive evaluation
  • Astronomy ("look up" data)
  • Remote sensing ("look down" data)
  • Text and multimedia mining
  • Medical imaging
  • Automated target recognition

My Point of View of Data Mining
– Borrowing ideas from
  • Machine learning
  • Artificial intelligence
  • Statistics
  • High-performance computing
  • Signal and image processing
  • Mathematical optimization
  • Pattern recognition
  • Natural language processing
  • Steganography
  • Cryptography
– Combined ideas from several different fields
  • Steganography, cryptography

General View of Data Mining
– Data stages: raw data, target data, preprocessed data, transformed data, patterns
– Data processing: data fusion, sampling, MRA, de-noising, object identification, feature extraction, normalization
– Pattern recognition
– Dimension reduction, classification, clustering, regression
– Knowledge
– Interpreting results: visualization, validation
– An iterative and interactive process

Our Research Is Based On
– Data preprocessing
  • Multiresolution analysis
  • De-noising (wavelet-based methods)
  • Object classification
  • Feature extraction
– Pattern recognition
  • Classification
  • Clustering
– Visualization and validation
  • Steganography
  • Cryptography

Where We Are Going from Here
– More robust, accurate, and scalable algorithms
  • For pre-processing and pattern recognition
  • Wavelets and fractals
– Newer data types
  • Video and multimedia
  • Multi-sensor data
– More complex problems
  • Dynamic tracking in video
  • Mining text, audio, video, images
– Investigating steganography in images, analysis of data hiding methods, attacks against hidden information, and countermeasures to attacks against digital watermarking (detection and distortion)

How Does Data Mining Work?
– How exactly is data mining able to tell you important things that you did not know, or what is going to happen next?
– The technique used to perform these feats in data mining is called modeling
  • Modeling is simply the act of building a model in one situation where you know the answer and then applying it to another situation where you don't
  • Example 1: a sunken treasure ship off the Bermuda shore. Keep all the information about other ships and their paths and build the model; if the model is good, you find the treasure in the ocean
  • Example 2: identifying telephone customers. Suppose the model says that 98% of customers who make $60K per year spend more than $80 per month on long distance. With this model, new customers can be selectively targeted

Most Commonly Used Techniques
– Artificial neural networks: non-linear predictive models that learn through training and resemble biological neural networks in structure
– Decision trees: tree-shaped structures that represent sets of decisions.
  These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi-Square Automatic Interaction Detection (CHAID)
– Genetic algorithms: optimization techniques that use processes such as genetic combination, mutation, and selection in a design based on the concept of evolution
– Nearest neighbor method:
– Rule induction:

OUR METHODS WILL BE BASED ON WAVELETS, FRACTALS, STEG, AND CRYPT

Steganography Methods
Let us discuss a few methods and their advantages and disadvantages
1. Least Significant Bit method
  – Idea: hide the message in the least significant bits (LSBs) of the pixels
  – Example:
  – Advantage: quick and easy; works well in gray-scale images
  – Disadvantage: inserting in 8-bit images changes color (a noticeable change); vulnerable to image processing such as cropping and compression
Redundant method
  – Store the message more than once so that it withstands cropping
Spread spectrum
  – Store the hidden message everywhere

STEGANALYSIS
– Detection: the analyst observes various relationships between the cover, the message, the stego-media, and the steganography tool ("Seeing the Unseen")
– Distortion: the analyst manipulates the stego-media to render the embedded information useless or remove it altogether

DCT: Discrete Cosine Transformation
– Encode
  • Take the image
  • Divide it into 8x8 blocks
  • Apply the 2-D DCT to obtain the DCT coefficients
  • Apply a threshold value
  • Store the hidden message in that place
  • Take the inverse and store the result as an image
– Decode
  • Start with the modified image
  • Apply the DCT
  • Find the coefficients less than T
  • Extract the bits
  • Combine the bits to make the message

[Figure: an 8x8 block of pixel values and its 2-D DCT coefficients (DC coefficient 1720)]
219 219 217 215 217 216 215 215
216 217 215 216 216 214 214 216
218 215 214 215 210 216 216 216
215 216 214 210 218 215 212 211
215 215 211 218 215 212 212 215
215 215 217 215 213 214 217 215
215 216 215 215 216 218 216 216
218 215 211 211 213 214 216 216
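The encode and decode steps on the DCT slide can be sketched as follows. This is only an illustrative sketch, not the author's implementation, and it rests on several assumptions that the slides do not state: "coefficients less than T" is read as |c| < T; a bit is stored in the sign of such a coefficient (+T/2 for 1, -T/2 for 0); the stego block is kept as floating-point values, since rounding to integer pixels would perturb the coefficients and is not handled here; and the pixel block from the slide is taken in row-major reading order. The names `embed`, `extract`, and `small_positions` are hypothetical.

```python
import math

N = 8  # block size used on the slide

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix C, so that coeffs = C * block * C^T."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

C = dct_matrix()
CT = [list(col) for col in zip(*C)]  # transpose of C

def dct2(block):
    """2-D DCT of an 8x8 block."""
    return matmul(matmul(C, block), CT)

def idct2(coeffs):
    """Inverse 2-D DCT (C is orthonormal, so its inverse is its transpose)."""
    return matmul(matmul(CT, coeffs), C)

def small_positions(coeffs, t):
    """Positions whose coefficient magnitude is below the threshold t."""
    return [(u, v) for u in range(N) for v in range(N) if abs(coeffs[u][v]) < t]

def embed(block, bits, t):
    """Encode: write each bit into the sign of a below-threshold coefficient."""
    coeffs = dct2(block)
    slots = small_positions(coeffs, t)
    if len(bits) > len(slots):
        raise ValueError("message too long for this block")
    for (u, v), bit in zip(slots, bits):
        coeffs[u][v] = t / 2 if bit else -t / 2  # assumed encoding, still below t
    return idct2(coeffs)  # float-valued stego block

def extract(stego_block, n_bits, t):
    """Decode: recompute the DCT and read the signs of the same positions."""
    coeffs = dct2(stego_block)
    slots = small_positions(coeffs, t)
    return [1 if coeffs[u][v] > 0 else 0 for (u, v) in slots[:n_bits]]

# 8x8 pixel block from the slide (row-major order is an assumption)
BLOCK = [
    [219, 219, 217, 215, 217, 216, 215, 215],
    [216, 217, 215, 216, 216, 214, 214, 216],
    [218, 215, 214, 215, 210, 216, 216, 216],
    [215, 216, 214, 210, 218, 215, 212, 211],
    [215, 215, 211, 218, 215, 212, 212, 215],
    [215, 215, 217, 215, 213, 214, 217, 215],
    [215, 216, 215, 215, 216, 218, 216, 216],
    [218, 215, 211, 211, 213, 214, 216, 216],
]

if __name__ == "__main__":
    message_bits = [1, 0, 1, 1, 0, 0, 1, 0]
    stego = embed(BLOCK, message_bits, t=1.0)
    print(extract(stego, len(message_bits), t=1.0))
```

Because the modified coefficients stay below the threshold and the others are untouched, the decoder finds the same positions in the same order. A practical scheme would also need to survive pixel rounding and compression, which this sketch does not attempt.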
Wavelet Transformation
– Wavelets are basis functions $w_{jk}(t)$ in continuous time
– A basis is a set of linearly independent functions that can be used to produce all admissible functions $f(t)$:
  $f(t) = \text{combination of basis functions} = \sum_{j,k} b_{jk} w_{jk}(t)$
– The special feature of a wavelet basis is that all functions $w_{jk}(t)$ are constructed from a single mother wavelet $w(t)$
– This wavelet is a small wave (a pulse). Normally it starts at time $t = 0$ and ends at time $t = N$
– Shifted $k$ times: $w_{0k}(t) = w(t - k)$
– Compressed $j$ times: $w_{j0}(t) = w(2^j t)$
– Combining both, we have $w_{jk}(t) = w(2^j t - k)$

Haar Wavelet
– 1909: Haar
– 1984: theory
– 1988: Daubechies
– 1989: Mallat (2-D, MRA)
– 1992: bi-orthogonal

[Figure: wavelet-based hiding pipeline. Labels: Carrier, Stego image, Wavelet Transformation, Thresholding, Compression, Message to be Hidden, Error Image, Inverse Transformation, Extract the Hidden Message]

Information Security and Data Mining
– Goal of intrusion detection: discover intrusions into a computer or network
– With the Internet and the available tools for attacking networks, security becomes a critical component of a network
– Misuse detection: finds intrusions by looking for activity corresponding to known intrusion techniques
– Anomaly detection: the system defines the expected behavior of the network in advance

What We Want
– Tools to filter and classify information
– Tools to find and retrieve the relevant information when you need it
– Tools that adapt to your pace and needs
– Tools to predict information needs
– Tools to recommend tasks and information sources
– Tools that can be personalized, manually or automatically

The Tools Should Be…
– Non-intrusive
– Secure
– Integrated
– Adaptable
– Controllable
– Automatic or semi-automatic
– Useful
  • For learners
  • For educators

Integrate operational data with customers, suppliers, and market -- Profitable
application
– A wide range of companies have deployed successful applications of data mining
– Some application areas include
  • A pharmaceutical company can analyze its recent sales force activity and results to improve its targeting of high-value physicians and determine which marketing activities will have the greatest impact in the next few months
  • A credit card company can leverage its vast warehouse of customer transaction data to identify the customers most likely to be interested in a new credit product
  • A diversified transportation company with a large direct sales force can apply data mining to identify the best prospects for its services
  • A large consumer packaged goods company can apply data mining to improve its sales process to retailers

Conclusion
– In this talk, we have discussed data mining related topics
– Our goals
  • Research
  • Software and algorithms
  • Application
– Our main focus is science data, though the methods are applicable to other data sets as well
– More information: check out the website http://www.cosc.iup.eud/sezekiel
– Contact: [email protected]
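Supplementary sketch for the Least Significant Bit method listed under Steganography Methods. The slides give only the idea (hide the message in the LSBs of the pixels), so everything below is an assumption made for illustration: the cover is a flat list of 8-bit gray-scale pixel values, message bits are written most significant bit first, one bit per pixel starting at the first pixel, and the receiver knows the message length. The function names `embed_lsb` and `extract_lsb` are hypothetical.

```python
def embed_lsb(pixels, message):
    """Return a copy of `pixels` with the bits of `message` in the LSBs."""
    # Expand the message bytes into individual bits, MSB first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for message")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it to the bit
    return stego

def extract_lsb(pixels, n_bytes):
    """Rebuild `n_bytes` bytes from the LSBs of the first 8 * n_bytes pixels."""
    out = bytearray()
    for b in range(n_bytes):
        byte = 0
        for p in pixels[8 * b: 8 * b + 8]:
            byte = (byte << 1) | (p & 1)
        out.append(byte)
    return bytes(out)

if __name__ == "__main__":
    cover = list(range(100, 180))  # stand-in for gray-scale pixel values
    stego = embed_lsb(cover, b"hi")
    print(extract_lsb(stego, 2))
```

Each pixel changes by at most 1, which is why the method is hard to notice in gray-scale images. It also illustrates the disadvantage noted on the slide: cropping or recompressing the image disturbs exactly the bits that carry the message.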