Information about Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine...

Published on July 18, 2007

Author: xllora

Source: slideshare.net

A byproduct benefit of using probabilistic model-building genetic algorithms is the creation of cheap and accurate surrogate models. Learning classifier systems---and genetics-based machine learning in general---can greatly benefit from such surrogates which may replace the costly matching procedure of a rule against large data sets. In this paper we investigate the accuracy of such surrogate fitness functions when coupled with the probabilistic models evolved by the x-ary extended compact classifier system (xeCCS). To achieve such a goal, we show the need that the probabilistic models should be able to represent all the accurate basis functions required for creating an accurate surrogate. We also introduce a procedure to transform populations of rules based into dependency structure matrices (DSMs) which allows building accurate models of overlapping building blocks---a necessary condition to accurately estimate the fitness of the evolved rules.

Motivation • Competent GBML – Use competent GAs to approach GBML problems – Take advantage of competent GA scalability – Provide insight about problem structure – χeCCS by Llorà, Sastry, Goldberg & de la Ossa (2006) • Rule matching may thread practical applications – Even for small dimensional problems (MUX 20), rule matching may take more than 85% of the execution time in XCS – As dimensionality or cardinality of training sets increase, rule matching rules the overall execution time – Efficient implementation (Llora & Sastry, 2006) still require matching rules GECCO 2007 Llorà, Sastry, Yu & Goldberg 2

Motivation • Competent GAs – Byproduct: Models and problem structure insight – Revision of the fitness relaxation for expensive fitness evaluations – Idea: Build a cheap surrogate fitness accurate enough – Successfully applied to GA (Sastry, Lima & Goldberg, 2006) – Help cut down the number of fitness evaluations • GBML – Can we transfer the same ideas to GBML approaches? – What are the requirements needed for competent GBML to benefit from fitness relaxation? GECCO 2007 Llorà, Sastry, Yu & Goldberg 3

Outline • Overview χeCCS • Evaluation relaxation • Fitness inheritance using least squares fitting • Fitness inheritance and χeCCS • Results • Conclusions GECCO 2007 Llorà, Sastry, Yu & Goldberg 4

χ-ari Extended Compact Classifier System • No reinforcement learning is used • A competent GA is in charge of the learning • The idea: – A population of single rules – For each rule we compute its fitness – The χ-ari extended compact genetic algorithm – Niching to maintain different accurate rules (restricted tournament replacement) GECCO 2007 Llorà, Sastry, Yu & Goldberg 5

Maximally Accurate and General Rules • Accuracy and generality can be compute as n t + (r) + n t# (r) n t + (r) quot;(r) = quot;(r) = nt nm • Fitness should combine accuracy and generality f (r) = quot;(r) # $(r)% ! ! • Such measure can be either applied to rules or rule sets ! GECCO 2007 Llorà, Sastry, Yu & Goldberg 6

Maximally Accurate and General Rules GECCO 2007 Llorà, Sastry, Yu & Goldberg 7

Extended Compact Genetic Algorithm • A Probabilistic model building GA (Harik, 1999) – Builds models of good solutions as linkage groups • Key idea: – Good probability distribution → Linkage learning • Key components: – Representation: Marginal product model (MPM) • Marginal distribution of a gene partition – Quality: Minimum description length (MDL) • Occam’s razor principle • All things being equal, simpler models are better – Search Method: Greedy heuristic search GECCO 2007 Llorà, Sastry, Yu & Goldberg 8

Marginal Product Model (MPM) • Partition variables into disjoint sets • Product of marginal distributions on a partition of genes • Gene partition maps to linkage groups MPM: [1, 2, 3], [4, 5, 6], … [l-2, l -1, l] ... xl-2 xl-1 xl x1 x2 x3 x4 x5 x6 {p000, p001, p00#, p010, p011, p01#, p100, p101, p10#, p110, p111, p11# … }(27 probabilities) GECCO 2007 Llorà, Sastry, Yu & Goldberg 9

Minimum Description Length Metric • Hypothesis: For an optimal model – Model size and error is minimum • Model complexity, Cm – # of bits required to store all marginal probabilities • Compressed population complexity, Cp – Entropy of the marginal distribution over all partitions • MDL metric, Cc = Cm + Cp GECCO 2007 Llorà, Sastry, Yu & Goldberg 10

Building an Optimal MPM • Assume independent genes ([1],[2],…,[l]) • Compute MDL metric, Cc • All combinations of two subset merges Eg., {([1,2],[3],…,[l]), ([1,3],[2],…,[l]), ([1],[2],…,[l-1,l])} • • Compute MDL metric for all model candidates • Select the set with minimum MDL, • If , accept the model and go to step 2. • Else, the current model is optimal GECCO 2007 Llorà, Sastry, Yu & Goldberg 11

χeCCS Models for Different Multiplexers Building Block Size Increases

Fitness Inheritance using Least Squares • Proposed by Sastry, Lima & Goldberg (2006) • Surrogate is a regression using basis identified by BBs • A simple example: [1,3] [2] [4] • The schemas represented are – {0*0*, 0*1*, 0*#*, 1*0*, 1*1*, 1*#*, #*0*, #*1*, #*#*, *0**, *1**, *#**, ***0 , ***1 , ***#} • Recode the individuals by GECCO 2007 Llorà, Sastry, Yu & Goldberg 13

Fitness Inheritance using Least Squares • Recoding defines matrix A • Normalize the fitness GECCO 2007 Llorà, Sastry, Yu & Goldberg 14

Fitness Inheritance using Least Squares • Solve using least squares • Once solved, the fitness surrogate take the following form GECCO 2007 Llorà, Sastry, Yu & Goldberg 15

Fitness Inheritance and χeCCS • Two different problems Hidden XOR 6-input multiplexer GECCO 2007 Llorà, Sastry, Yu & Goldberg 16

Hidden XOR • Evolved rules and model • Surrogate accuracy GECCO 2007 Llorà, Sastry, Yu & Goldberg 17

6-input Multiplexer • The evolved solution and model • The surrogate is totally off GECCO 2007 Llorà, Sastry, Yu & Goldberg 18

6-input Multiplexer • The key = missing basis • χeCCS is able to solve the problem quickly, reliably, and accurately • However, the model basis are not accurate enough to build a proper surrogate GECCO 2007 Llorà, Sastry, Yu & Goldberg 19

Overlapping BBs using DSMGA • Proposed by Yu, Yassine, Goldberg and Chen (2003) • Based on organizational theory • Main property = DSMGA model builder (DSMcluster) deals with overlapping building blocks • The main issue = translate a populations or rules into a dependency structure matrix (DSM) • The intuition = specific bits are the ones responsible for the kind of linkage we seek GECCO 2007 Llorà, Sastry, Yu & Goldberg 20

Jumping to the Results • DSMcluster model for the hidden XOR – [i0 i1 i2] [i3] [i4] [i5] • DSMcluster model for the 6-input multiplexer – [i0 i1] <i2 i3 i4 i5> – It identifies a BB [i0 i1] of variables interacting with a bus <i2 i3 i4 i5> – Translated into χeCCS language: [i0 i1 i2] [i0 i1 i3] [i0 i1 i4] [i0 i1 i5] – The right model which provides the right set of basis GECCO 2007 Llorà, Sastry, Yu & Goldberg 21

Conclusions • The matching process is crucial and expensive • Efficient implementations can take us far to a point • Relaxation can get rid of the need of matching • For some types of problems overlapping BBs are required • DSMGA provides the proper machinery to identify the proper basis for such a surrogate GECCO 2007 Llorà, Sastry, Yu & Goldberg 22

Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning Techniques Xavier Llorà1,2, Kumara Sastry2, Tian-Li Yu3, David E. Goldberg2 1 National Center for Supercomputing Applications. University of Illinois at Urbana-Champaign 2 Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign 3 Department of Electrical Engineering, National Taiwan University Supported by AFOSR FA9550-06-1-0370, NSF at ISS-02-09199 GECCO 2007 HUMIES 23

Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning Techniques Xavier Llora`1, Kumara Sastry2, Tian-Li Yu3, and David E. Goldberg2

Read more

Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning Techniques Xavier Llor`a1, Kumara Sastry2, Tian-Li Yu2, and David E. Goldberg2

Read more

Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning Techniques (2007)

Read more

« Previous post - - Next post » Do not match, inherit: Fitness surrogates for genetics-based machine learning techniques 28 March 2007 . Llorà X. Sastry ...

Read more

... Fitness Surrogates for Genetics-Based Machine Learning Techniques. ... genetics-based machine learning ... Do not Match, Inherit: Fitness Surrogates ...

Read more

Publication » Population Size and Genetic Drift in Fitness ... Do not match, inherit: fitness surrogates for genetics-based machine learning techniques.

Read more

... over two specific Genetics-based Machine Learning ... Do not match, inherit: fitness surrogates for genetics-based machine learning techniques ...

Read more

Machine Learning Techniques. Articles, experts, ... Machine Learning Scientist at Amazon, Software Development Engineer at Microsoft Corporation, ...

Read more

Linkage Learning, Rule Representation, and the ...

Read more

## Add a comment