Published on June 16, 2008
Molecular Models, Threads and You
Optimizing the TINKER classical molecular dynamics code while maintaining code readability
Jiahao Chen, Martínez Group, Dept. Chemistry, CATMS, MRL and Beckman
CS 498 MG presentation: 2007-12-07
Molecular models/force fields
Typical energy function: E = covalent bond effects + noncovalent interactions
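Typical energy function, term by term:

$$
E = \sum_{b \in \mathrm{bonds}} k_b \left(r_b - r_{\mathrm{eq},b}\right)^2
  + \sum_{a \in \mathrm{angles}} \kappa_a \left(\theta_a - \theta_{\mathrm{eq},a}\right)^2
  + \sum_{d \in \mathrm{dihedrals}} \sum_n l_{nd} \cos(n \phi_d)
  + \sum_{i<j \in \mathrm{atoms}} \frac{q_i q_j}{r_{ij}}
  + \sum_{i<j \in \mathrm{atoms}} \epsilon_{ij} \left[ \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6} \right]
$$

The five terms are, in order: bond stretch, angle torsion, dihedrals, electrostatics, and dispersion, where \phi_d is the dihedral angle and r_{ij} is the distance between atoms i and j.
Computational cost = O(N^2), from the two sums over atom pairs.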
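The O(N^2) cost comes from the electrostatics and dispersion sums, which visit every pair of atoms. A minimal Python sketch of that double loop follows. It is not taken from TINKER: the function names are hypothetical, a single `sigma` and `epsilon` are assumed for all pairs, units are chosen so that the Coulomb prefactor is 1, and no cutoff is applied.

```python
import math


def nonbonded_energy(positions, charges, sigma, epsilon):
    """Return (electrostatic, dispersion) energies summed over all pairs i < j.

    Assumptions for illustration only: one sigma/epsilon for every pair,
    Coulomb prefactor of 1, no distance cutoff.
    """
    n = len(positions)
    e_elec = 0.0
    e_disp = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])
            e_elec += charges[i] * charges[j] / r
            s6 = (sigma / r) ** 6
            e_disp += epsilon * (s6 * s6 - s6)
    return e_elec, e_disp


def pair_count(n):
    """Number of i < j pairs visited by the double loop: n(n-1)/2."""
    return n * (n - 1) // 2
```

Doubling the number of atoms roughly quadruples `pair_count`, which is why these two terms dominate the run time for large systems.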
Problem description
• The state of the system is given by the position and momentum of every atom (of mass m_i):
$$(x_1, p_1, x_2, p_2, \cdots, x_N, p_N) \in \mathbb{R}^{3 \times 2 \times N}$$
• Solve the system of partial differential equations
$$\frac{\partial x_i}{\partial t} = \frac{p_i}{m_i}, \qquad \frac{\partial p_i}{\partial t} = -\frac{\partial E}{\partial x_i}, \qquad i = 1, \cdots, N$$
• with user-specified initial conditions (e.g. with constant temperature and pressure)
• subject to (user-specified) constraints, e.g. fixed bond angles
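One common way to integrate these equations of motion numerically is the velocity Verlet scheme, sketched below in Python. The slides do not say which integrator TINKER uses, so the choice of scheme is an assumption, as are the function name `velocity_verlet_step` and the flat-list data layout. Constraints, temperature control, and pressure control are left out.

```python
def velocity_verlet_step(x, p, m, force, dt):
    """Advance positions x and momenta p by one time step dt.

    x, p, m : lists of coordinates, momenta and masses (one entry per degree
              of freedom; a flat layout is assumed for brevity)
    force   : callable returning the list of forces -dE/dx at positions x
    """
    f = force(x)
    # half-step momentum update using forces at the old positions
    p_half = [pi + 0.5 * dt * fi for pi, fi in zip(p, f)]
    # full-step position update: dx/dt = p/m
    x_new = [xi + dt * ph / mi for xi, ph, mi in zip(x, p_half, m)]
    # second half-step momentum update using forces at the new positions
    f_new = force(x_new)
    p_new = [ph + 0.5 * dt * fi for ph, fi in zip(p_half, f_new)]
    return x_new, p_new
```

For clarity this sketch evaluates the forces twice per step. A production loop would usually reuse `f_new` as the next step's `f`, since the force evaluation is the expensive part.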
Many parallel and serial implementations
[Table: parallelization support by package; columns Package name, Threads, MPI, Global Arrays]
NAMD CHARM++ GROMACS ✓ ✓ TINKER AMBER partly ✓ ✓ CHARMM ✓ LAMMPS ✓ NWChem ✓ ✓
Things I tried
• Compiler flag optimization
• Cache miss reduction
• Lookup tables
• Parallelization with OpenMP
Compiler flag optimization

flags       gfortran 4.1.2                    ifort 10.0.023
            time         speedup              time         speedup
-O0         29.95(2) s   -                    36.30(2) s   -
-Os         29.92(3) s   +0.77(3) %           32.59(4) s   +10.22(2) %
-O1         30.22(1) s   -0.90(4) %           32.12(3) s   +11.51(1) %
-O2         29.66(3) s   +0.96(1) %           30.30(2) s   +16.54(2) %
-O3         29.84(2) s   +0.38(2) %           30.83(2) s   +15.06(2) %
CE search   28.77(2) s   +3.62(3) % [1]       28.96(2) s   +20.22(1) % [2]

[1] FFLAGS="-falign-functions -falign-jumps -falign-labels -falign-loops -fvpt -fcse-skip-blocks -fdelete-null-pointer-checks -ffast-math -fforce-addr -fgcse -fgcse-lm -fgcse-sm -floop-optimize -fkeep-static-consts -fmerge-constants -fno-defer-pop -fno-guess-branch-probability -fno-math-errno -funsafe-math-optimizations -fno-trapping-math -foptimize-register-move -fregmove -freorder-blocks -freorder-functions -frerun-cse-after-loop -fno-sched-spec -fsched-spec-load -fsched-stalled-insns -fsignaling-nans -fsingle-precision-constant -fstrength-reduce -fthread-jumps -funroll-all-loops"
[2] FFLAGS="-xN -no-prec-div -static -inline-level=1 -ip -fno-alias -fno-fnalias -fno-omit-frame-pointer -fkeep-static-consts -nolib-inline -heap-arrays 1 -pad -O3 -scalar-rep -funroll-loops -complex-limited-range"
Algorithm and time profile (N = 6, gfortran 4.1.2)
[Flowchart; legend distinguishes O(N) and O(N^2) steps]
Top level: Initialize model and parameters; for each time step (>98% of execution time): Remove unphysical motions, Move one time step, Flush I/O; End
Move one time step: Update state by t/2; Calculate potential energy and forces; Update state by t/2; Calculate & record kinetic energy and properties; Enforce temp. & pressure (shown twice). Timing labels at this level: >59%, <31%
Calculate potential energy and forces: Calculate bond interactions (9%); Calculate angle interactions (12%); Calculate dihedral interactions (8%); Calculate dispersion interactions (37%); Calculate charge interactions (26%); ...; Add up all components
An unexpected cost
[Flowchart repeated from "Algorithm and time profile" (N = 6), with the "Add up all components" step called out]
Q: Why is 15% of total execution time spent adding numbers!?
A: many L2 cache misses
L2 cache misses per time step are marked in the trailing comments.

    c     zero out each of the first derivative components
          do i = 1, n                          ! 7 misses
             do j = 1, 3
                deb(j,i) = 0.0d0               ! 42 misses
                ...                            ! 22 other terms
             end do
          end do
          ...
    c     sum up to get the total energy and first derivatives
          energy = eb + ...
          do i = 1, n
             do j = 1, 3
                desum(j,i) = deb(j,i) + ...    ! 22 other terms; 19 misses
                derivs(j,i) = desum(j,i)       ! 2 misses
             end do
          end do

70 of 91 cache misses per time step (n = 6) shown
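A simple solution
Accumulate the sum in a scalar temporary and store it to both arrays. L2 cache misses per time step are marked in the trailing comments, with the previous counts in parentheses.

    c     zero out each of the first derivative components
          do i = 1, n                          ! 7 misses
             do j = 1, 3
                deb(j,i) = 0.0d0               ! 26 misses (was 42)
                ...
             end do
          end do
          ...
    c     sum up to get the total energy and first derivatives
          energy = eb + ...
          do i = 1, n
             do j = 1, 3
                temp = deb(j,i) + ...          ! 6 misses
                desum(j,i) = temp              ! 1 miss (was 19)
                derivs(j,i) = temp             ! 1 miss (was 2)
             end do
          end do

reduced cache misses from 92 to 41 per time step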
Speedup from reducing L2 cache misses

flags                     gfortran 4.1.2   ifort 10.0.023
original                  29.95(2) s       28.96(2) s
with scalar replacement   27.43(3) s       28.95(1) s
speedup                   +8.44(1) %       +0.03(2) %

ifort already called with scalar replacement flag
Lookup tables (LUTs)
• Calculations of sqrt() and exp() take up 23.8% of execution time
• Idea: pre-compute values of sqrt() and exp() in an array and recall them from memory when needed
• Caution: LUT should not displace too much data from L2 cache
sqrt() with LUT
[Figure: two panels, "direct LUT" and "LUT with linear interpolation"]
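The two sqrt() lookup variants can be sketched in Python as below. This is an illustration, not the code from the talk: the class name `SqrtTable`, the uniform grid, and the table range are assumptions, and inputs outside the tabulated range are not handled.

```python
import math


class SqrtTable:
    """Uniformly spaced lookup table for sqrt(x) on [x_min, x_max]."""

    def __init__(self, x_min, x_max, size):
        self.x_min = x_min
        self.step = (x_max - x_min) / (size - 1)
        self.values = [math.sqrt(x_min + k * self.step) for k in range(size)]

    def direct(self, x):
        """Direct LUT: return the value stored at the nearest grid point."""
        k = int((x - self.x_min) / self.step + 0.5)
        return self.values[k]

    def interpolated(self, x):
        """LUT with linear interpolation between the two enclosing grid points."""
        t = (x - self.x_min) / self.step
        k = min(int(t), len(self.values) - 2)  # keep k + 1 inside the table
        frac = t - k
        return self.values[k] + frac * (self.values[k + 1] - self.values[k])
```

For a grid spacing h, the direct lookup has an error that shrinks in proportion to h, while linear interpolation shrinks in proportion to h^2, so interpolation reaches a given precision with a much smaller table.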
exp() with LUT
[Figure: two panels, "direct LUT" and "LUT with first-order Taylor series refinement*"]
* $$e^{x} = e^{x_0} + (x - x_0)\, e^{x_0} + O\left((x - x_0)^2\right)$$
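The first-order Taylor refinement for exp() needs no second table, because the derivative of e^x at the grid point x0 is the stored value itself. A minimal Python sketch follows. The class name `ExpTable`, the grid, and the range are assumptions rather than details from the talk, and out-of-range inputs are not handled.

```python
import math


class ExpTable:
    """Uniformly spaced lookup table for exp(x) on [x_min, x_max]."""

    def __init__(self, x_min, x_max, size):
        self.x_min = x_min
        self.step = (x_max - x_min) / (size - 1)
        self.values = [math.exp(x_min + k * self.step) for k in range(size)]

    def _nearest(self, x):
        """Index of the grid point closest to x."""
        return int((x - self.x_min) / self.step + 0.5)

    def direct(self, x):
        """Direct LUT: return exp(x0) at the nearest grid point x0."""
        return self.values[self._nearest(x)]

    def taylor(self, x):
        """LUT with first-order Taylor refinement: exp(x0) + (x - x0) exp(x0)."""
        k = self._nearest(x)
        x0 = self.x_min + k * self.step
        e0 = self.values[k]
        return e0 + (x - x0) * e0
```

The refinement costs one multiply and one add per lookup and reduces the error from first order to second order in the distance to the grid point.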
Choice of implementation

function   desired precision   table size (doubles)   refinement   expected speedup
sqrt()     10^-4               10,764                 none         +118%
exp()      10^-8               6,836                  Taylor       +151%

LUT aligned to 128 bits
L2 cache = 4 MB = 512K doubles
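A quick check of how much of the L2 cache the two tables occupy, using the sizes quoted above. The sketch assumes 8-byte doubles and 1 MB = 2^20 bytes; the variable names are illustrative.

```python
BYTES_PER_DOUBLE = 8            # assumption: IEEE double precision
L2_BYTES = 4 * 1024 * 1024      # 4 MB L2 cache, with 1 MB = 2**20 bytes

# capacity of the L2 cache in doubles: 524288, i.e. 512K
l2_doubles = L2_BYTES // BYTES_PER_DOUBLE

# combined size of the sqrt() and exp() tables
table_doubles = 10764 + 6836                    # 17600 doubles
table_bytes = table_doubles * BYTES_PER_DOUBLE  # 140800 bytes

# share of the L2 cache taken by both tables, about 3.4%
fraction_of_l2 = table_bytes / L2_BYTES
```

Under these assumptions both tables together take a few percent of the cache, which is in line with the caution that a LUT should not displace too much other data from L2.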
Speedup from LUT use

flags                gfortran 4.1.2   ifort 10.0.023
original             29.95(2) s       28.96(2) s
with lookup tables   26.89(1) s       25.87(2) s
speedup              +10.23(2) %      +7.22(3) %
Summary of serial improvements

Improvement               gfortran 4.1.2   ifort 10.0.023
Best compiler flags       +3.62(3) %       +20.22(1) %
L2 cache miss reduction   +8.44(2) %       +0.03(1) %
Lookup tables             +10.23(1) %      +7.22(2) %
Total                     23.91(3) s       26.86(2) s
                          +20.17(4) %      +26.00(2) %
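The total percentages appear to be relative reductions in run time against the unoptimized -O0 timings (29.95 s for gfortran, 36.30 s for ifort). That definition is inferred from the numbers rather than stated on the slides. A short Python sketch of the arithmetic, with a hypothetical helper name:

```python
def relative_speedup(t_ref, t_new):
    """Percentage reduction in run time relative to the reference timing."""
    return 100.0 * (t_ref - t_new) / t_ref


# totals against the -O0 baselines quoted in the compiler flag table
gfortran_total = relative_speedup(29.95, 23.91)  # close to the quoted +20.17 %
ifort_total = relative_speedup(36.30, 26.86)     # close to the quoted +26.00 %
```

With this definition a negative value, such as the -0.90 % reported for gfortran at -O1, means the run got slower than the baseline.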
Parallelization targets
[Flowchart repeated from "Algorithm and time profile" (N = 6): per-time-step work is >98% of execution time; interaction timings are bond 9%, angle 12%, dihedral 8%, dispersion 37%, charge 26%]
Parallelization strategy
Calculate potential energy and forces (100%): omp sections, split into two omp section blocks of 50% each
Interaction routines, each parallelized with omp parallel do:
• Calculate charge interactions: 50%
• Calculate angle interactions: 16%
• Calculate dihedral interactions: 2%
• Calculate dispersion interactions: 12%
• Calculate bond interactions: 11%
• ...
Add up all components
Parallelization results (gfortran 4.1.2)
[Plot: execution time (s) versus number of cores; series N=6, N=1000, and Ideal]
Summary
• Free software can sometimes be better than non-free software (here, gfortran's best total of 23.91 s beat ifort's 26.86 s)
• L2 cache misses can significantly degrade performance
• Lookup tables are an effective way to trade memory and precision for speed
• Simple OpenMP parallelization is effective for small numbers of processors