Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Bioinformatics and Data Mining with an Ant Colony Algorithm The aim of the project is to discover new knowledge about post synaptic protein activity with a new data mining method Proteins are made up from a sequence of amino acids. Predicting the functionality of a protein from its sequence is very difficult. A synapse is the point where two nerve cells communicate with each other by transmission of a chemical known as neurotransmitter. Study of post synaptic protein activity is important in understanding the nervous system. Data mining is performed on a large set of bioinformatics data Classification in data mining is the process of discovering, from training data, a set of rules predicting the class of a record. In this project a record consists of data describing a protein, and the class to be predicted is the presence or absence of post-synaptic activity in a protein. These rules can then be applied to new unclassified data to classify it. Rules are of the form: IF <Term1, Term2, ... ,Termn> THEN <class value> An example of a rule predicting post synaptic activity: IF (NEUROTR_ION_CHANNEL is present) THEN (post-synaptic activity is present) The new data mining algorithm is based on the behaviour of ants To find the shortest path to food, or around an obstacle, ants release pheromones as they move towards their goal, and other ants are attracted to this pheromone. Although there may be more than one path to a goal, the shortest path will accumulate the largest amount of pheromone within a given time, therefore becoming more attractive to subsequent ants. An abstract model of this principle has been used in data mining to discover classification rules. Results The algorithm discovered a set of classification rules that were able to classify the data with an accuracy of over 98%.More importantly, the rules that were discovered are very easily interpretable biologists as they have few terms and can be understood independently from one another. Creating comprehensible rules is a key aim of data mining. Mean Accuracy Run (%) Mean Rule Count Term To Rule Ratio 1 98.26+-0.16 101.5+-0.19 1.21 2 98.26+-0.12 101.3+-0.17 1.21 This project was conducted by James Smaldon and Dr. Alex A. Freitas, for more details please contact James Smaldon ([email protected])