Using Neural Networks in Database Mining
Tino Jimenez
CS157B MW 9-10:15
February 19, 2009

Data Mining Overview
- What is data mining?
  - The process of extracting valuable information from a database
- Why do we need/use it?
  - Predictive technology
  - Allows for automated decision making

Data Mining Overview (continued)
- What problems does it solve?
  - Stock market prediction
  - Credit card fraud
  - Loan approval/denial
- How does it work?
  - Data analysis of a given set of information

Data Mining Tools
- Decision Trees
  - A series of rules that allows for automated decisions
  - Common use: credit card and health insurance approvals
- Regression
  - Analysis of the association between a dependent variable and an independent variable
  - Common use: prediction
- Neural Networks

The Basis of Neural Networks
- Adapted from research in artificial intelligence
- Based loosely on the biological functionality of neurons
- Mimics the ability to “learn”
- A neuron is a specialized cell that sends an electrochemical signal

The Basis of Neural Networks (cont.)
- Each neuron has a specific function and is grouped with other neurons to perform complex tasks
- Each neuron has a “weight”, which determines the importance of the specific function being processed

How Neural Networks Work
- An individual neuron has a step activation function, which means that it can take a value of -1, 0, or 1
- A value of -1 means that it is an inhibitor and will lessen the weight of the combined neurons
- The individual neurons are then connected to each other as inputs and outputs
  - The inputs carry the values of the variables of interest
  - The outputs form predictions or control signals

How Neural Networks Work (cont.)
- Feedforward structure
  - The most useful in solving real-world problems
  - Signals flow from the inputs through the hidden units, eventually reaching the output units
  - The input layer is used only to introduce the values of the input variables
  - The hidden and output layer neurons are each connected to all of the units of the preceding layer

How Neural Networks Work (cont.)
- When the network is used, the variable values are placed in the input units. Each subsequent layer then calculates the weighted sum of the outputs of the preceding layer, until the final layer is reached.

How Do You Apply a Neural Network?
- The exact nature of the relationship between inputs and outputs is unknown
- Large quantities of data are necessary
- Data can be “noisy”
- Two ways to set up the network
  - Supervised learning
  - Unsupervised learning

Supervised Learning
- Involves historical data sets containing input variables, each of which corresponds to a known output
- Uses training and testing data to build a model
- The training data is what the neural network uses to “learn” how to predict the known output
  - The data is also used for validation
- A well-known algorithm is backpropagation
  - Uses the data to adjust the weights so as to minimize the error in the network's predictions

Unsupervised Learning
- Rarely used
- Attempts to locate clusters within the input data, regardless of variable
- Unsupervised learning uses only the input variables from a training set

Advantages to Using a Neural Network
- High accuracy
  - Able to approximate complex non-linear mappings
- Noise tolerance
  - Flexible with respect to missing and noisy data
- Ease of maintenance
  - Can be implemented in parallel hardware
  - Can be updated with new data, making them dynamic

Disadvantages to Using a Neural Network
- Poor transparency
  - Operate as “black boxes”, with little or no knowledge of the algorithms used
- Trial-and-error design
  - The selection of hidden nodes and training parameters is heuristic
- Data hungry
  - Requires large amounts of data to be accurate, which also means more computing power

Applications of Neural Networks
- Detection of medical phenomena
  - Recognizes predictive patterns to prescribe appropriate treatment
- Stock market prediction
  - Large numbers of factors are introduced and used by technical analysts
- Credit assignment
  - Identifies the most relevant characteristics and classifies applicants as good or bad credit risks
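
Example: a feedforward pass

The slides on how neural networks work describe a feedforward pass: the input values are placed in the input units, and each later layer takes the weighted sum of the preceding layer's outputs and applies a step activation. The Python sketch below is one minimal way to write that down; it is not taken from the slides. The function names, the layer sizes, and the hand-picked weights are all assumptions made for illustration, and no training (such as backpropagation) is shown.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one row of weights per neuron in a layer


def step(x: float) -> int:
    """Three-valued step activation: returns -1, 0, or 1."""
    if x > 0:
        return 1
    if x < 0:
        return -1
    return 0


def layer_output(inputs: Vector, weights: Matrix) -> Vector:
    """Each neuron sums the weighted outputs of ALL units in the preceding layer."""
    return [float(step(sum(w * v for w, v in zip(row, inputs))))
            for row in weights]


def feedforward(inputs: Vector, layers: List[Matrix]) -> Vector:
    """Pass the signal from the input units through each layer in turn."""
    signal = list(inputs)  # the input layer only introduces the values
    for weights in layers:
        signal = layer_output(signal, weights)
    return signal


# Hypothetical weights, chosen by hand for illustration (not learned).
hidden = [[1.0, -1.0],   # positive when the first input is larger
          [-1.0, 1.0]]   # positive when the second input is larger
output = [[1.0, -1.0]]   # the second hidden unit acts as an inhibitor

print(feedforward([0.9, 0.2], [hidden, output]))
```

With these assumed weights the single output unit reports which of the two inputs is larger (1 for the first, -1 for the second, 0 for a tie). In supervised learning the weights would instead be adjusted from training data rather than set by hand.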