Published on July 14, 2007
Towards Billion Bit Optimization via Efficient Estimation of Distribution Algorithms Kumara Sastry1,2 David E. Goldberg1, Xavier Llorà1,3 1Illinois Genetic Algorithms Laboratory (IlliGAL) 2Materials Computation Center (MCC) 3National Center for Super Computing Applications (NCSA) University of Illinois at Urbana-Champaign, Urbana, IL 61801 email@example.com, firstname.lastname@example.org, email@example.com http://www.illigal.uiuc.edu Supported by AFOSR FA9550-06-1-0096 and NSF DMR 03-25939. Computational results were obtained using CSE’s Turing cluster.
Billion-Bit Optimization? Strides w/ genetic algorithm (GA) theory/practice. Solving large, hard problems in principled way. Moving to practice in important problem domains. Still GA boobirds claim: (1) no theory, (2) too slow, and (3) just voodoo. How demonstrate results achieved so far in dramatic way? DEG lunch questions: A million? Sure. A billion? Maybe. Naïve GA approach/implementation goes nowhere: ~100 terabytes memory for population storage. ~272 random number calls. 2
Roadmap Motivation Robust, scalable, and efficient GA designs Toward billion-variable optimization Theory Keys Implementation Keys Efficiency Keys Results Why this matters in practice? Challenges to using this in real-world. Summary and Conclusions 3
Three Os and Million/Billion Decisions The Os all have many decisions to make: Nano, Bio, and Info Modern systems increasingly complex: ~105 parts in a modern automobile ~107 parts in commercial jetliner. Increased complexity increases appetite for large optimization. Will be driven toward routine million/billion variable problems. “We get the warhead and then hold the world ransom for... 1 MILLION dollars!” 4
Competent and Efficient GAs Robust, scalable and efficient GA designs available. Competence: Solve hard problems quickly, reliably, and accurately (Intractable to tractable). Efficiency: Develop speedup procedures (tractability to practicality). Principled design: [Goldberg, 2002] Relax rigor, emphasize scalability/quality. Use problem decomposition. Use facetwise models, and patchquilt integration using dimensional analysis. Test algorithms on adversarial problems 5
Aiming for a Billion Theory & algorithms in place. Focus on key theory, implementation, & efficiency enhancements. Theory keys: Problem difficulty. Parallelism. Implementation key: compact GA. Efficiency keys: Various speedup. Memory savings. Results on a billion-variable noisy OneMax. 6
Theory Key 1: Master-Slave Linear Speedup Speed-up: Max speed-up at Near linear speed-up until [Cantu-Paz & Goldberg, 1997; Cantú-Paz, 2000] 7
Theory Key 2: Noise Covers Most Problems Adversarial problem design [Goldberg, 2002] Fluctuating R P Noise Deception Scaling Blind noisy OneMax 8
Implementation Key: Compact GA Simplest probabilistic model building GA [Harik, Lobo & Goldberg, 1997; Baluja, 1994; Mühlenbein & Paaß, 1996] Represent population by probability vector Probability that ith bit is 1 Replace recombination with probabilistic sampling Selectionist scheme New population evolution through probability updates Equivalent to GA with steady-state tournament selection and uniform crossover 9
Compact Genetic Algorithm (cGA) Random initialization: Set probabilities to 0.5 Model Sampling: Generate two candidate solutions by sampling the probability vector Evaluation: Evaluate the fitness of two sampled solutions Selection: Select the best among the sampled solutions Probabilistic model update: Increase the proportion of winning alleles by 1/n 10
Parallel cGA Architecture Processor #1 Processor #2 Processor #np Sample bits 1- l/np Sample bits l/np+1- 2l/np Sample bits (np-1)l/np+1- l Collect partial sampled solutions and combine Parallel fitness evaluation of sampled solutions Broadcast fitness values of sampled solutions Select best individual Select best individual Select best individual Update probabilities Update probabilities Update probabilities 11
cGA is Memory Efficient: O(l) vs. O(l1.5) Simple GA: Compact GA: Frequencies instead of probabilities (4 bytes) Parallelization reduces memory per processor by factor of np Orders of magnitude memory savings via efficient GA Example: ~32 MB per processor on a modest 128 processors for billion-bit optimization 12
Vectorization Yields Speedup of 4 Vectorize costly code segments with AltiVec/SSE2 Generate 4 random numbers at a time Sample 4 bits at a time Update 4 probabilities at a time SIMD instruction set allows vector operations on 128-bit registers Equivalent to 4 processors per processor 13
Other Efficiencies Yield Speedup of 15 Bitwise operations Limited floating-point operations Inline functions Avoid using mod and division operations Precomputing bit sums and indexing Parallel, vectorized, and efficient GA: Memory scales as Θ(l/np); Speedup scales as 60np ~32 MB memory, and ~104 speedup with 128 processors Solves 65,536-bit noisy OneMax problem in ~45 minutes on a 3GHz PC. 14
Experimental Procedure 128 – 256 processor partition of 1280-processor Apple G5 Xserve Population was doubled till cGA converged to at least l-1 out of l bits set to optimal values For l > 223; Population size fixed according to theory. Number of independent runs l ≤ 218 (262,144): 50 l ≤ 225 (33, 554, 432): 10 l > 225 (33, 554, 432): 1 Compare cGA performance with Sequential hillclimber (sHC) Random hillclimber (rHC) 15
Compact GA Population Sizing Noise-to-fitness variance ratio Error tolerance # Components (# BBs) Signal-to-Noise ratio # Competing sub-components Additive Gaussian noise with variance σ2N Population sizing scales: O(l0.5 log l) [Harik, et al, 1997] 16
Compact GA Convergence Time Problem size (m·k ) Selection Intensity Convergence time scales: O(m0.5) GA scales as: O(m log m) [Miller & Goldberg, 1995; Goldberg, 2002; Sastry & Goldberg, 2002] 17
Scalability on OneMax 18
EDA Solves Billion-Bit Noisy OneMax GA scales Θ(l·logl·(1+σ2N/σ2f)) Solved 33 million (225) bit problem to optimality. Solved 1.1 billion (230) bit problem with relaxed, but guaranteed convergence 19
Do Problems Like This Matter? Yes, for three reasons: Many GAs no more sophisticated than cGA. Inclusion of noise was important because it covers an important facet of difficulty. Know how to handle deception and other problems through EDAs like hBOA. Compact GA-like algorithms can solve tough problems: Material science [Sastry et al, 2004; Sastry et al, 2005]* Chemistry [Sastry et al, 2006]**. Complex versions of these kinds of problems need million/billion-bit optimization. *chosen by the AIP editors as focused article of frontier research in Virtual Journal of Nanoscale Science & Technology, 12(9), 2005; **Best paper and Silver “Humies” award. GECCO 2006] 20
Challenges to Routine Billion-Bit Optimization What if you have large nonlinear solver (PDE, ODE, FEM, KMC, MD, whatever)? Need efficiency enhancement: Parallelization: Effective use of computational “space” Time continuation: Effective use of computational “time” Hybridization: Effective use of global and local searchers Evaluation relaxation: Effective use of expensive-accurate & cheap-inaccurate evaluations Need more powerful solvers Need highly efficient implementations 21
Summary and Conclusions Parallel and efficient implementation of compact GA. Memory and computational efficiency enhancements Solved 33 million bit noisy OneMax problem to optimality Solved 1.1 billion bit noisy OneMax problem to relaxed, but guaranteed convergence Big optimization is a frontier today: Take extant cluster computing Mix in robustness, scalability and efficiency lessons Integrate into problems. Nano, bio, and info systems are increasingly complex: Call for routine mega/giga-variable optimization Need robust, scalable and efficient methods. 22
Towards billion bit optimization via parallel estimation of distribution algorithm (2007)
... A parallel algorithm ... Towards billion bit optimization via parallel estimation of distribution algorithm ... Towards billion bit optimization via ...
—Estimation of Distribution Algorithm ... Towards billion-bit optimization via a parallel estimation of distribution algorithm.
... of discrete estimation of distribution algorithms. ... Towards billion-bit optimization via a parallel estimation of distribution algorithm ...
Towards billion bit optimization via efficient genetic algorithms 15 February 2007 . Sastry, K., Goldberg, D. E., Llorà, X. (2007). IlliGAL Report No ...
Towards billion-bit optimization via a parallel estimation of distribution algorithm. Kumara Sastry, David E. Goldberg, Xavier Llor ...
... of Technology Kanpur,Artificial Intelligence,Engineering ... Towards billion-bit optimization via a parallel estimation of distribution algorithm ...
View Paul Winward's ... Towards billion-bit optimization via a parallel estimation of distribution algorithm ... Towards Billion Bit Optimization via ...
Towards billion-bit optimization via a parallel estimation of distribution ... Multiobjective genetic algorithms for ... Constrained Optimization ...
Bibliographic content of GECCO 2007. ... Towards billion-bit optimization via a parallel estimation of distribution algorithm.