Information about Modeling XCS in class imbalances: Population sizing and parameter settings

Published on July 14, 2007

Author: kknsastry

Source: slideshare.net

This paper analyzes the scalability of the population size required in XCS to maintain niches that are infrequently activated. Facetwise models have been developed to predict the effect of the imbalance ratio—ratio between the number of instances of the majority class and the minority class that are sampled to XCS—on population initialization, and on the creation and deletion of classifiers of the minority class. While theoretical models show that, ideally, XCS scales linearly with the imbalance ratio, XCS with standard configuration scales exponentially.

The causes that are potentially responsible for this deviation from the ideal scalability are also investigated. Specifically, the inheritance procedure of classifiers’ parameters, mutation, and subsumption are analyzed, and improvements in XCS’s mechanisms are proposed to handle imbalanced problems effectively and efficiently. Once the recommendations are incorporated into XCS, empirical results show that the population size in XCS indeed scales linearly with the imbalance ratio.

Slide 2: Framework

[Diagram: a domain yields data consisting of examples and counter-examples; a learner performs knowledge extraction on the data to build a model (information based on experience), which maps a new instance to a predicted output.]

In real-world domains, typically:
– Higher cost to obtain examples of the concept to be learnt
– So, the distribution of examples in the training dataset is usually imbalanced

Applications:
– Fraud detection
– Medical diagnosis of rare illnesses
– Detection of oil spills in satellite images

Illinois Genetic Algorithms Laboratory and Group of Research in Intelligent Systems, GECCO’07

Slide 3: Framework

Do learners suffer from class imbalances?
– The learner is trained to minimize the global error on the training set:
  $\text{error} = \dfrac{\text{num. errors}_{c_1} + \text{num. errors}_{c_2}}{\text{number of examples}}$
– This biases the learner towards the overwhelming (majority) class: its accuracy is maximized to the detriment of the minority class.

And what about incremental learning?
– Instances of the minority class are sampled less frequently.
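The bias towards the majority class can be illustrated with a minimal Python sketch. It computes the global error of a trivial learner that always predicts the majority class; the function name and the example counts are illustrative assumptions, not values from the slides.

```python
def global_error(errors_c1: int, errors_c2: int, num_examples: int) -> float:
    """Global error: total misclassifications divided by total examples."""
    return (errors_c1 + errors_c2) / num_examples

# Hypothetical dataset: 990 majority-class examples (c1), 10 minority (c2).
n_majority, n_minority = 990, 10

# A learner that always answers "c1" makes no errors on c1
# and misclassifies every c2 example.
err = global_error(0, n_minority, n_majority + n_minority)
minority_accuracy = 0 / n_minority  # no minority example is ever recognized

print(f"global error = {err:.3f}, minority accuracy = {minority_accuracy:.1f}")
```

The global error is small even though the minority class is never predicted, which is why a learner that only minimizes global error can ignore the minority class.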

Slide 4: Aim

Facetwise analysis of XCS for class imbalances:
– How can XCS create rules of the minority class?
– When will XCS remove these rules?
– Population size bound with respect to the imbalance ratio

Until which imbalance ratio would XCS be able to learn from the minority class?

Slide 5: Outline

1. Description of XCS
2. Facetwise Analysis
3. Design of Test Problems
4. XCS on the One-Bit Problem
5. Analysis of Deviations
6. Results
7. Conclusions

Slide 6: Description of XCS

In single-step tasks:

[Diagram of the XCS cycle: the environment supplies a problem instance (of the minority or the majority class); the population [P] of classifiers, each with a condition C, an action A, and the parameters P, ε, F, num, as, ts, exp, is matched against the instance to generate the match set [M]; a prediction array over the classes c1 … cn is built and an action is selected (a random action is also shown); the action set [A] is formed; the environment returns a reward (1000/0); the classifier parameters are updated; the genetic algorithm applies selection, reproduction, and mutation, and deletion acts on the population. Two match sets are shown, one for nourished niches and one for starved niches.]

Problem niche: the schema defines the relevant attributes for a particular problem niche. E.g.: 10**1*

Slide 8: Facetwise Analysis

Study the capabilities of XCS to provide representatives of starved niches:
– Population covering
– Generation of correct representatives of starved niches
– Time of extinction of these correct classifiers

Derive a bound on the population size to guarantee that XCS will learn starved niches.

Start from the theory developed for XCS:
– (Butz, Kovacs, Lanzi & Wilson, 04): Model of generalization pressures of XCS
– (Butz, Goldberg & Lanzi, 04): Learning time bound
– (Butz, Goldberg, Lanzi & Sastry, 07): Population size bound to guarantee niche support
– (Butz, 2006): Rule-Based Evolutionary Online Learning Systems: A Principled Approach to LCS Analysis and Design

Slide 9: Facetwise Analysis

Assumptions:
– Problems consisting of n classes
– One class sampled with a lower frequency: the minority class
– Imbalance ratio:
  $ir = \dfrac{\text{num. instances of any class other than the minority class}}{\text{num. instances of the minority class}}$
– Probability of sampling an instance of the minority class:
  $P_s(\text{min}) = \dfrac{1}{1 + ir}$
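The sampling assumption can be made concrete with a short Python sketch that evaluates Ps(min) = 1 / (1 + ir) for a few imbalance ratios. The function name and the chosen ratios are illustrative; the mean waiting time printed alongside assumes that instances are drawn independently.

```python
def minority_sampling_probability(ir: float) -> float:
    """Ps(min) = 1 / (1 + ir), as defined on the slide."""
    return 1.0 / (1.0 + ir)

for ir in (1, 16, 64, 256, 1024):
    p = minority_sampling_probability(ir)
    # With independent draws, a minority instance arrives on average
    # once every 1/p = 1 + ir samples.
    print(f"ir = {ir:5d}   Ps(min) = {p:.5f}   mean draws per minority instance = {1 / p:.0f}")
```

As ir grows, the niches of the minority class are activated ever more rarely, which is the situation the facetwise models analyze.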

Slide 10: Facetwise Analysis

Steps of the facetwise analysis:
– Population initialization
– Generation of correct representatives of starved niches
– Time of extinction of these correct classifiers
– Population size bound

Slide 11: Population Initialization

Covering procedure:
– Covering: generalize over the input with probability P#
– P# needs to satisfy the covering challenge (Butz et al., 01)

Would covering be triggered on minority class instances?
– The probability that one instance is covered by at least one rule is given in (Butz et al., 01).

[Equation not preserved in this transcript. Its annotated terms are the population specificity (initially 1 – P#), the input length, and the population size.]
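The covering equation itself did not survive in this transcript. The sketch below uses the form commonly attributed to Butz et al., P(cover) = 1 − [1 − ((2 − σ)/2)^l]^N, where σ is the population specificity, l the input length, and N the population size. This form is an assumption here, and it relies on uniformly random binary inputs and independently generated conditions: a single position of a condition then matches with probability (1 − σ) + σ/2 = (2 − σ)/2. The function name and the population sizes are illustrative.

```python
def cover_probability(specificity: float, length: int, pop_size: int) -> float:
    """Probability that a random binary input is matched by at least one
    of pop_size random classifiers (assumed form, see the text above)."""
    # One position matches if it is a don't-care, or if it is specified
    # and happens to agree with the input bit.
    match_one_classifier = (1.0 - specificity / 2.0) ** length
    return 1.0 - (1.0 - match_one_classifier) ** pop_size

p_hash = 0.6   # P# from the XCS configuration slide
l = 20         # input length used in the plot on the next slide

for pop_size in (100, 1000, 10000):
    p_cov = cover_probability(1.0 - p_hash, l, pop_size)
    # Covering is triggered on an instance only if no classifier matches it.
    print(f"N = {pop_size:6d}   P(cover) = {p_cov:.4f}   P(covering triggered) = {1 - p_cov:.4f}")
```

Under these assumptions, a population that has already been filled by majority class instances makes it less likely that covering fires when the first minority class instance arrives.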

Slide 12: Population Initialization

[Plot: probability of applying covering on the first minority class instance, for l = 20.]

Slide 14: Creation of Representatives of Starved Niches

Assumptions:
– Covering has not provided any representative of starved niches
– Simplified model: only mutation is considered

How can we generate representatives of starved niches?
– In the population there are:
  • Representatives of nourished niches
  • Overgeneral classifiers
– By correctly specifying all the bits of the schema that represents the starved niche

Slide 15: Creation of Representatives of Starved Niches

Summing up, time to get the first representative of a starved niche:

[Equation not preserved in this transcript. Its terms are n: number of classes; μ: mutation probability; km: order of the schema.]

Time to extinction:

[Equation not preserved in this transcript.]

Slide 17: Bounding the Population Size

Population size bound to guarantee that there will be representatives of starved niches:
– Require that: [condition not preserved in this transcript]
– Bound: [equation not preserved in this transcript]

where n: number of classes; μ: mutation probability; km: order of the schema.

Slide 18: Bounding the Population Size

Population size bound to guarantee that representatives of starved niches will receive a genetic opportunity:
– Consider θGA = 0
– We require that the best representative of a starved niche receives a genetic event before being removed
– Population size bound: [equation not preserved in this transcript]

where n: number of classes; ir: imbalance ratio.

Slide 20: Design of Test Problems

One-bit problem:
– Input of condition length l; the class is the value of the left-most bit. Example: 000110 : 0
– Only two schemas of order one: 0***** and 1*****

Parity problem:
– Input of condition length l with k relevant bits; the class is the number of 1s mod 2. Example: 01001010 : 1
– The k bits of parity form a single building block

In both problems, instances of the class labeled as 1 are undersampled:
  $P_s(\text{min}) = \dfrac{1}{1 + ir}$
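The two test problems and the undersampling of class 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' generator: the function names are hypothetical, the k relevant parity bits are assumed to be the left-most ones, and undersampling is implemented by rejection sampling.

```python
import random

def one_bit_label(bits: str) -> int:
    """One-bit problem: the class is the value of the left-most bit."""
    return int(bits[0])

def parity_label(bits: str, k: int) -> int:
    """Parity problem: number of 1s among the relevant bits, mod 2.
    Assumption: the k relevant bits are the k left-most ones."""
    return bits[:k].count("1") % 2

def sample_instance(length, ir, label_fn, rng):
    """Draw one instance so that class 1 appears with probability 1/(1+ir)."""
    target = 1 if rng.random() < 1.0 / (1.0 + ir) else 0
    while True:  # rejection sampling: retry until the label matches the target
        bits = "".join(rng.choice("01") for _ in range(length))
        if label_fn(bits) == target:
            return bits, target

rng = random.Random(0)  # fixed seed, for reproducibility only
samples = [sample_instance(6, 9, one_bit_label, rng) for _ in range(1000)]
minority_fraction = sum(label for _, label in samples) / len(samples)
print(f"fraction of class-1 instances: {minority_fraction:.3f} (target 0.100)")

parity_bits, parity_class = sample_instance(8, 9, lambda b: parity_label(b, 3), rng)
print(parity_bits, parity_class)
```

With ir = 9, roughly one instance in ten belongs to class 1, so the niches that predict class 1 are the starved ones.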

Slide 22: XCS on the One-Bit Problem

XCS configuration:
  α=0.1, ν=5, ε0=1, θGA=25, χ=0.8, μ=0.4, θdel=20, θsub=200, δ=0.1, P#=0.6
  selection=tournament, mutation=niched, [A]sub=false, N = 10,000 ir

Evaluation of the results:
– Minimum population size to achieve: TP rate * TN rate > 95%
– Results are averages over 25 seeds
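The evaluation criterion, TP rate * TN rate, can be computed as in the following Python sketch. The function name and the toy labels are illustrative assumptions; the minority class is treated as the positive class here, although the product is the same whichever class is called positive.

```python
def tp_tn_product(y_true, y_pred, positive=1):
    """Product of the true positive rate and the true negative rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tp_rate = tp / (tp + fn) if (tp + fn) else 0.0
    tn_rate = tn / (tn + fp) if (tn + fp) else 0.0
    return tp_rate * tn_rate

# Toy test set with 99 majority instances (class 0) and 1 minority (class 1).
y_true = [0] * 99 + [1]
always_majority = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
print(f"accuracy = {accuracy:.2f}, TP rate * TN rate = {tp_tn_product(y_true, always_majority):.2f}")
```

Unlike plain accuracy, the product drops to zero when the minority class is never predicted, so a population only passes the 95% threshold if both classes are learned.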

Slide 23: XCS on the One-Bit Problem

– N remains constant up to ir = 64
– N increases linearly from ir = 64 to ir = 256
– N increases exponentially from ir = 256 to ir = 1024
– Higher ir could not be solved

Slide 25: Analysis of the Deviations

Inheritance error of classifiers’ parameters:
– New promising representatives of starved niches are created from classifiers that belong to nourished niches.
– These new promising rules inherit parameters from these classifiers. This is especially delicate for the action set size (as).
– Approach: initialize as=1.

Subsumption:
– An overgeneral classifier of the majority class may receive ir positive rewards before receiving the first negative reward.
– Approach: set θsub>ir.

Stabilizing the population before testing:
– Overgeneral classifiers are poorly evaluated.
– Approach: introduce some extra runs at the end of learning with the GA switched off.
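The subsumption recommendation can be illustrated with a rough Python calculation. It estimates how likely it is that θsub consecutive sampled instances all belong to the majority class, in which case an overgeneral classifier that predicts the majority class would have seen only positive rewards. This is a simplification that assumes independent draws with Ps(min) = 1 / (1 + ir) and ignores action selection; the function name is hypothetical.

```python
def prob_only_majority(theta_sub: int, ir: float) -> float:
    """Probability that theta_sub independent draws are all majority class."""
    p_majority = ir / (1.0 + ir)
    return p_majority ** theta_sub

theta_sub = 200  # value from the standard configuration slide
for ir in (16, 64, 256, 1024):
    p = prob_only_majority(theta_sub, ir)
    print(f"ir = {ir:5d}   P(no minority instance in {theta_sub} draws) = {p:.3f}")
```

Under these assumptions the probability is far from negligible once ir approaches or exceeds θsub, which is consistent with the recommendation to set θsub > ir.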

Slide 27: XCS+PCM in the One-Bit Problem

– N remains constant up to ir = 128
– For higher ir, N slightly increases
– We only have to guarantee that a representative of the starved niche will be created

Slide 28: XCS+PCM in the Parity Problem

– Building blocks of size 3 need to be processed
– Empirical results agree with the theory
– The relevant bound is the population size bound that guarantees that a representative of the niche will receive a genetic event

Slide 30: Conclusions and Further Work

– We derived models that analyzed the representatives of starved niches provided by covering and mutation
– A population size bound was derived
– We saw that the empirical observations met the theory if four aspects were considered:
  – as initialization
  – Subsumption
  – Stabilization of the population
– XCS is really robust to class imbalances
– Further work: further analysis of the covering operator

Modeling XCS in Class Imbalances: Population Size and Parameter Settings

Albert Orriols-Puig (1,2), David E. Goldberg (2), Kumara Sastry (2), Ester Bernadó-Mansilla (1)

(1) Research Group in Intelligent Systems, Enginyeria i Arquitectura La Salle, Ramon Llull University
(2) Illinois Genetic Algorithms Laboratory, Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana Champaign

Slide 32: Motivation

And what about incremental learning?
– Instances of the minority class are sampled less frequently
– This influences the mechanisms of XCS (Orriols & Bernadó, 2006)

Slide 33: Analysis of the Deviations

Niched mutation vs. free mutation:
– Classifiers can only be created if minority class instances are sampled

Inheritance error of classifiers’ parameters:
– New promising representatives of starved niches are created from classifiers that belong to nourished niches
– These new promising rules inherit parameters from these classifiers. This is especially delicate for the action set size (as).
– Approach: initialize as=1.

Slide 34: Analysis of the Deviations

Subsumption:
– An overgeneral classifier of the majority class may receive ir positive rewards before receiving the first negative reward
– Approach: set θsub>ir

Stabilizing the population before testing:
– Overgeneral classifiers are poorly evaluated
– Approach: introduce some extra runs at the end of learning with the GA switched off.

We gather all these little tweaks in XCS+PMC.
