GAs and Feature Weighting
Rebecca Fiebrink
MUMT 611
31 March 2005

Outline
• Genetic algorithms
  – History of GAs
  – How GAs work (simple model, variations)
• Feature selection and weighting
  – Feature selection
  – Feature weighting
• Using GAs for feature weighting
  – Siedlecki and Sklansky
  – Punch
  – The ACE project

Part 1: Genetic Algorithms

History of GAs
• 1859: Charles Darwin, Origin of Species
• 1950s-60s: Computers simulate evolution
• 1960s-70s: John Holland invents genetic algorithms
  – Adaptation in Natural and Artificial Systems, book published 1975
  – Context of "adaptive systems," not just "optimization"
• 1980s-present: Further exploration, widespread adoption
  – Kenneth De Jong
  – Goldberg: Genetic Algorithms in Search, Optimization, and Machine Learning ("classic" textbook published 1989)
• Related areas: evolutionary programming, genetic programming
(Carnegie Mellon GA online FAQ, Cantu-Paz 2000)

How GAs work
• The problem is a search space
  – Variable parameters are the dimensions
  – The problem has minima and maxima within this space
  – These minima and maxima are hard to find
• Potential solutions describe points in this space
  – It is "easy" to calculate the "goodness" of any one solution and compare it to other solutions
• Maintain a set of potential solutions
  – Guess a set of initial solutions
  – Combine these solutions with each other
  – Keep "good" solutions and discard "bad" solutions
  – Repeat the combination and selection process for some time

A simple GA (flowchart): Start → Evaluate population → Terminate? If yes, stop; if no, select parents → produce offspring → mutate offspring → evaluate again. (A minimal code sketch of this loop appears at the end of Part 1.)

GA terminology
• Population: The set of all current potential solutions
• Deme: A subset of the population that interbreeds (might be the whole population)
• Chromosome: A solution structure (often binary); contains one or more genes
• Gene: A feature or parameter
• Allele: A specific value for a gene
• Phenotype: A member of the population; a real-valued solution vector
• Generation: A complete cycle of evaluation, selection, crossover, and mutation in a deme
• Locus: A particular position (bit) on the chromosome string
• Crossover: The process of combining two chromosomes; the simplest method is single-point crossover, where two chromosomes swap the parts on either side of a random locus
• Mutation: A random change to a phenotype (i.e., changing an allele)

Crossover illustrated (single-point crossover at locus 6):
  Parent 1: 101101|0111101    Parent 2: 110101|0010100
  Child 1:  101101|0010100    Child 2:  110101|0111101

GA Variations
• Variables:
  – Population size
  – Selection method
  – Crossover method
  – Mutation rate
  – How to encode a solution in a chromosome
• No single choice is best for all problems (Wolpert & Macready 1997, cited in Cantu-Paz 2000)

More variations: Parallel GAs
• Categories:
  – Single-population master/slave (a master process distributes fitness evaluations to slave processes)
  – Multiple population
  – Fine-grained
  – Hybrids
(Cantu-Paz 2000)

Applications of GAs
• NP-hard problems (e.g., Traveling Salesman)
• Airfoil design
• Noise control
• Fluid dynamics
• Circuit partitioning
• Image processing
• Liquid crystals
• Water networks
• Music
(Miettinen et al. 1999, Coley 1999)
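To make the simple GA loop above concrete, here is a minimal sketch of one possible implementation: binary chromosomes, tournament selection, single-point crossover, and bit-flip mutation. The fitness function, population size, tournament size, and mutation rate are placeholders chosen for illustration, not values from the presentation or the cited papers.

```python
import random

def evaluate(chrom):
    """Placeholder fitness: count of 1-bits (replace with a real objective)."""
    return sum(chrom)

def tournament_select(population, fitnesses, k=3):
    """Pick the fittest of k randomly chosen individuals."""
    contenders = random.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitnesses[i])
    return population[best]

def single_point_crossover(p1, p2):
    """Swap the parts of two chromosomes on either side of a random locus."""
    locus = random.randrange(1, len(p1))
    return p1[:locus] + p2[locus:], p2[:locus] + p1[locus:]

def mutate(chrom, rate=0.01):
    """Flip each bit with a small probability."""
    return [bit ^ 1 if random.random() < rate else bit for bit in chrom]

def simple_ga(chrom_length=13, pop_size=30, generations=50):
    # Guess a set of initial solutions
    population = [[random.randint(0, 1) for _ in range(chrom_length)]
                  for _ in range(pop_size)]
    for _ in range(generations):   # terminate after a fixed number of generations
        fitnesses = [evaluate(c) for c in population]
        offspring = []
        while len(offspring) < pop_size:
            # Select parents, produce offspring, mutate offspring
            p1 = tournament_select(population, fitnesses)
            p2 = tournament_select(population, fitnesses)
            c1, c2 = single_point_crossover(p1, p2)
            offspring.extend([mutate(c1), mutate(c2)])
        population = offspring[:pop_size]
    return max(population, key=evaluate)

if __name__ == "__main__":
    print(simple_ga())
```

Each choice in this sketch (selection method, mutation rate, encoding) is one of the "GA variations" listed above; per Wolpert and Macready, none of them is best for every problem.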
Part 2: Feature Selection and Weighting

Features
• What are they?
  – A classifier's "handle" on the problem
  – Numerical or binary values representing independent or dependent attributes of each instance to be classified
• What are they in music?
  – Spectral centroid
  – Number of instruments
  – Composer
  – Presence of banjo
  – Beat histogram
  – ...

Feature Selection – Why?
• "Curse of dimensionality": the size of the training set must grow exponentially with the dimensionality of the feature space
• The set of best features to use may vary according to classification scheme, goal, or data set
• We need a way to select only a good subset of all possible features
  – We may or may not be interested in the single "best" subset

Feature Selection – How?
• Common approaches:
  – Dimensionality reduction (principal component analysis or factor analysis)
  – Experimentation: choose a subset, train a classifier with it, and evaluate its performance
    • Example: <centroid, length, avg_dynamic>
      – A piece might have the value vector <3.422, 523, 98>
      – Try selections <1, 1, 0>, <0, 1, 1>, ... (1 = include, 0 = exclude); which works best?
• Drawbacks of experimentation:
  – For n potential features, there are 2^n possible subsets: a huge search space!
  – Evaluating each classifier takes time
  – Choose candidate subsets using sequential search, branch and bound, or GAs

Feature Weighting
• Among a group of selected (useful) features, some may be more useful than others
• Weight the features for optimal classifier performance
• Experimental approach, similar to feature selection:
  – Example: say <1, 0, 1> is the optimal selection for the <centroid, length, avg_dynamic> feature set
    • Try weights like <.5, 0, 1>, <1, 0, .234983>, ...
• The practical and theoretical constraints of feature selection are magnified

Part 3: Using GAs for Feature Weighting

GAs and Feature Weighting
• A chromosome is a vector of weights; each gene corresponds to a feature weight
  – E.g., <1, 0, 1, 0, 1, 1, 1> for selection
  – E.g., <.3, 0, .89, 0, .39, .91, .03> for weighting
• The fitness of a chromosome is a measure of its performance when training and validating an actual classifier (a sketch follows the Punch et al. slide below)

Siedlecki and Sklansky
• 1989: "A note on genetic algorithms for large-scale feature selection"
• Choose a feature subset for a k-NN classifier
• Propose using GAs when the feature set has more than 20 features
• Binary chromosomes (0/1 for each feature)
• Goal: seek the smallest/least costly subset of features for which classifier performance stays above a certain level

Results
• Compared GAs with exhaustive search, branch and bound, and (p,q)-search on a sample problem
  – Branch and bound and (p,q)-search did not come close enough (within 1% error) to finding a set that performed optimally
  – Exhaustive search used 2^24 evaluations
  – The GA found optimal or near-optimal solutions with 1000-3000 evaluations, a huge savings over both exhaustive search and branch and bound
• The GA exhibits slightly more than linear growth in time complexity as features are added
• The GA also outperforms the other methods on a real problem with 30 features

Punch et al.
• 1993: "Further research on feature selection and classification using genetic algorithms"
• Approach: use GAs for feature weighting for a k-NN classifier ("dimension warping" of the classifier's feature space)
• Each chromosome is a vector of real-valued weights
  – Mapped exponentially to the range [.01, 10), or to 0
• Also experimented with "hidden features" representing the combination (multiplication) of two other features
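The following is a minimal sketch, not the authors' code, of how a weight chromosome could be turned into a fitness value in the spirit of Punch et al.: raw genes in [0, 1] are mapped exponentially into roughly [0.01, 10) (or to 0, dropping the feature), the weights rescale each feature dimension, and the "warped" space is scored with a plain k-NN classifier on held-out data. The drop threshold, the exact mapping constants, the value of k, and the NumPy-only k-NN are all assumptions made for illustration.

```python
import numpy as np

def decode_weights(genes, drop_threshold=0.05):
    """Map raw genes in [0, 1] to weights: 0 below a threshold (feature dropped),
    otherwise exponentially into about [0.01, 10). The exact mapping is an assumption."""
    genes = np.asarray(genes, dtype=float)
    weights = 0.01 * (10.0 / 0.01) ** genes       # 0.01 * 1000^g spans [0.01, 10]
    weights[genes < drop_threshold] = 0.0
    return weights

def knn_accuracy(train_X, train_y, val_X, val_y, k=5):
    """Plain k-NN accuracy with Euclidean distance (no external ML library)."""
    dists = np.linalg.norm(val_X[:, None, :] - train_X[None, :, :], axis=2)
    neighbors = np.argsort(dists, axis=1)[:, :k]          # indices of k nearest
    preds = []
    for row in neighbors:
        labels, counts = np.unique(train_y[row], return_counts=True)
        preds.append(labels[np.argmax(counts)])            # majority vote
    return float(np.mean(np.array(preds) == val_y))

def fitness(genes, train_X, train_y, val_X, val_y):
    """Fitness of a weight chromosome = validation accuracy of the 'warped' k-NN."""
    w = decode_weights(genes)
    return knn_accuracy(train_X * w, train_y, val_X * w, val_y)
```

A GA like the one sketched at the end of Part 1 (with real-valued rather than binary genes) would evolve the weight vectors using this fitness in place of the placeholder. Siedlecki and Sklansky's binary selection is the special case where every weight is forced to 0 or 1, typically with an added penalty on the number of features kept.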
Results
• Feature weighting can outperform simple selection, especially on a large, noisy data set
  – Best performance when feature weighting follows binary selection
• 14 days to compute results
• Parallel version: nearly linear speedup when strings are passed to other processors for fitness evaluation

Recent work
• Minaei-Bidgoli, Kortemeyer, and Punch, 2004: "Optimizing classification ensembles via a genetic algorithm for a web-based educational system"
  – Incorporate GA feature weighting in a multiple-classifier (voting) system
  – Results show over 10% improvement in the accuracy of the classifier ensemble when GA optimization is used

MIR Application
• The ACE project
  – Accepts an arbitrary type and number of features, an arbitrary taxonomy, training and testing data, and a target classification time
  – Classifies data using the best means available; e.g., makes use of multiple classifiers and feature weighting
  – Parallel GAs for feature weighting can improve solution time and quality

Conclusions
• GAs are powerful and interesting tools, but they are complex and not always well understood
• Feature selection/weighting involves choosing from a huge space of possibilities, but it is a necessary task
• GAs are useful for tackling the feature weighting problem
• Faster machines, parallel computing, and a better understanding of GA behavior can all benefit the feature weighting problem
• This is relevant to MIR: we want to do good and efficient classification!
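Both Punch et al.'s parallel run and ACE's parallel GAs rely on farming fitness evaluations out to other processors. As a closing illustration, here is a minimal sketch of that single-population master/slave pattern using Python's multiprocessing module; the pool size is a placeholder, and this is not code from either project.

```python
from multiprocessing import Pool

def parallel_evaluate(population, fitness_fn, processes=4):
    """Master/slave evaluation: the master farms chromosomes out to worker
    processes and collects one fitness value per chromosome.
    fitness_fn must be a top-level (picklable) function."""
    with Pool(processes) as pool:
        return pool.map(fitness_fn, population)
```

In the GA loop sketched at the end of Part 1, `parallel_evaluate(population, evaluate)` would replace the serial list comprehension over the population; since each evaluation trains and validates a classifier, this is where nearly linear speedup is available.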