Advancing Fusion Science with CGYRO using GPU-based Leadership Systems


Published on June 3, 2019

Author: insideHPC

Source: slideshare.net

1. Advancing Fusion Science with CGYRO using GPU-Based Leadership Systems, by J. Candy (1), I. Sfiligoi (2) and E. Belli (1). (1) General Atomics, San Diego, CA; (2) San Diego Supercomputer Center, San Diego, CA. Presented at GTC 2019, San Jose, CA, 18-21 March 2019. ID: S9202. Slide footer: Candy/GTC/March 2019/S9202

2. Sincere thanks to • Chris Holland (UCSD) • Orso Meneghini, Sterling Smith, Ron Waltz, Gary Staebler (GA) • Nathan Howard, Alessandro Marinoni (MIT) • Walter Guttenfelder, Brian Grierson (PPPL) • George Fann (ORNL) • Klaus Hallatschek (IPP, Germany)

3-7. OUTLINE 1 Who is General Atomics? 2 The case for fusion energy 3 Mathematical formulation and GPU-based numerical solution 4 Simulation of turbulent energy loss in a tokamak plasma 5 GPU performance: development and results

8-11. Who is General Atomics? 1 General Atomics (GA) is a private contractor in San Diego 2 The GA Magnetic Fusion division does DOE-funded research 3 Hosts DIII-D National Fusion Facility

12. Founded on July 18, 1955 (photo 1957) The General Atomic Division of General Dynamics

13. Laboratory formally dedicated on June 25th, 1959 John Jay Hopkins Laboratory for Pure and Applied Science

14. Present-day Campus (2019) Retains feel of early architecture

15. Doublet III (1974)

16. DIII-D (Present day)

17. The case for fusion energy

18. Energy Use by Technology and Year energy.mit.edu/news/limiting-global-warming-aggressive-measures-needed

19. Surface Temperature Anomaly energy.mit.edu/news/limiting-global-warming-aggressive-measures-needed

20. Plasma theory in the closed field-line region is well understood

21. Helical field perfectly confines plasma (almost)

22. There is a small amount of radial energy/particle loss • Collisions (1970s): $\Gamma_{\rm collision}$ • Turbulence (1980s): $\Gamma_{\rm turbulence}$ • Both exhibit gyroBohm scaling: flux $\Gamma \sim v\,(\rho/a)^2$, confinement time $\tau = a/\Gamma \sim a^3/(v\rho^2)$ • $a$ = torus radius • $\rho$ = particle orbit size • $v$ = particle velocity
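The gyroBohm scaling on this slide can be checked numerically with a minimal Python sketch. The function names and the numerical values of `v` and `rho` are hypothetical, chosen only for illustration; order-unity constants are dropped, as on the slide.

```python
def gyrobohm_flux(v, rho, a):
    """Order-of-magnitude flux scaling: Gamma ~ v * (rho / a)**2."""
    return v * (rho / a) ** 2


def confinement_time(v, rho, a):
    """tau = a / Gamma, which reduces to a**3 / (v * rho**2)."""
    return a / gyrobohm_flux(v, rho, a)


# Hypothetical illustrative values (not taken from the talk)
v = 1.0e6     # particle velocity [m/s]
rho = 3.0e-3  # particle orbit size [m]

tau_small = confinement_time(v, rho, a=1.0)
tau_large = confinement_time(v, rho, a=2.0)
ratio = tau_large / tau_small  # doubling a at fixed v and rho
print(f"tau(2a) / tau(a) = {ratio:.3f}")
```

Because $\tau \sim a^3$ at fixed $v$ and $\rho$, doubling the torus radius in this sketch raises the confinement time by a factor of eight.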

23. Tokamak physics spans multiple space/timescales. Core-edge-SOL (CESOL) region coupling. [Figure labels: Ψ, Profile, Core, Edge, SOL, CESOL]

24. Tokamak confinement improves with LARGE PLASMA VOLUME

25. ITER Facility (35 nations) under construction in France. GOAL: Simulate turbulent plasma in core (magenta) region

26. Mathematical formulation and GPU-based numerical solution

27. Gyrokinetic Theory for Magnetized Plasma The Cooper/Kripke Inversion

28-29. Gyrokinetic equation for plasma species a. Typically: a = (deuterium, carbon, electron)

$$\frac{\partial h_a}{\partial \tau} - i\Omega_s X\, h_a - i\left(\Omega_\theta + \Omega_\xi + \Omega_d\right) H_a - i\Omega_* \Psi_a + \Omega_{\rm NL}(h_a, \Psi_a) = C_a$$

Symbol definitions. Particles:

$$H_a = h_a + \frac{z_a T_e}{T_a}\,\Psi_a$$

Fields:

$$\Psi_a = J_0(\gamma_a)\left(\delta\phi - \frac{v_\parallel}{c}\,\delta A_\parallel\right) + \frac{v_\perp^2}{\Omega_{ca}\, c}\,\frac{J_1(\gamma_a)}{\gamma_a}\,\delta B_\parallel$$

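30. Electromagnetic GK-Maxwell Equations. Coupling to fields is a MAJOR complication!

$$\left(k_\perp^2 \lambda_D^2 + \sum_a z_a^2\,\frac{T_e}{T_a}\int d^3v\,\frac{f_{0a}}{n_e}\right)\delta\phi = \sum_a z_a \int d^3v\,\frac{f_{0a}}{n_e}\,J_0(\gamma_a)\,H_a$$

$$\frac{2}{\beta_{e,\rm unit}}\,k_\perp^2 \rho_s^2\,\delta A_\parallel = \sum_a z_a \int d^3v\,\frac{f_{0a}}{n_e}\,\frac{v_\parallel}{c_s}\,J_0(\gamma_a)\,H_a$$

$$-\frac{2}{\beta_{e,\rm unit}}\,\frac{B}{B_{\rm unit}}\,\delta B_\parallel = \sum_a \int d^3v\,\frac{f_{0a}}{n_e}\,\frac{m_a v_\perp^2}{T_e}\,\frac{J_1(\gamma_a)}{\gamma_a}\,H_a$$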

31. Gyrokinetic equation for plasma species a: E×B flow term

$$-i\Omega_s = -i\,\frac{k_\theta L}{2\pi}\,\frac{a}{c_s}\,\gamma_E$$

32. Gyrokinetic equation for plasma species a: streaming term

$$-i\Omega_\theta = \frac{v_\parallel}{w_s}\,\frac{\partial}{\partial\theta}$$

33. Gyrokinetic equation for plasma species a: trapping term

$$-i\Omega_\xi = -\frac{v_{ta}}{w_s}\,\frac{u_a}{\sqrt{2}}\left(1-\xi^2\right)\frac{\partial \ln B}{\partial\theta}\,\frac{\partial}{\partial\xi} - \frac{1}{2u_a}\,\frac{\partial\lambda_a}{\partial\theta}\left[\frac{v_\parallel}{w_s}\,\frac{\partial}{\partial u_a} + \frac{\sqrt{2}\,v_{ta}}{w_s}\left(1-\xi^2\right)\frac{\partial}{\partial\xi}\right]$$

34. Gyrokinetic equation for plasma species a: drift motion term

$$-i\Omega_d = \frac{a\,v_{ta}}{c_s}\,\mathbf{b}\times\left[u_a^2\left(1+\xi^2\right)\frac{\nabla B}{B} + u_a^2\xi^2\,\frac{8\pi}{B^2}\,(\nabla p)_{\rm eff}\right]\cdot i\mathbf{k}_\perp\rho_a + M_a\,\frac{2a\,v_\parallel}{c_s R_0}\,\mathbf{b}\times\left[\frac{R}{J_\psi B}\,\frac{\partial R}{\partial\theta}\,\nabla\varphi - \frac{B_t}{B}\,\nabla R\right]\cdot i\mathbf{k}_\perp\rho_a + \frac{a}{c_s}\,\mathbf{b}\times\left[-\frac{v_{ta}}{T_a}\,\mathbf{F}_c + \frac{c}{B}\,\nabla\Phi_*\right]\cdot i\mathbf{k}_\perp\rho_a$$

35. Gyrokinetic equation for plasma species a: gradient drive term

$$-i\Omega_* = \left[\frac{a}{L_{na}} + \frac{a}{L_{Ta}}\left(u_a^2 - \frac{3}{2}\right) + \gamma_p\,\frac{v_\parallel\, a}{v_{ta}^2}\,\frac{R B_t}{R_0 B}\right] i k_\theta\rho_s + \left\{\frac{a}{L_{Ta}}\left[\frac{z_a e}{T_a}\,\Phi_* - \frac{M_a^2}{2R_0^2}\left(R^2 - R(\theta_0)^2\right)\right] + M_a^2\,\frac{a R(\theta_0)}{R_0^2}\,\frac{dR(\theta_0)}{dr} + M_a\gamma_p\,\frac{a}{v_{ta} R_0^2}\left(R^2 - R(\theta_0)^2\right)\right\} i k_\theta\rho_s$$

36. Gyrokinetic equation for plasma species a: nonlinearity

$$\Omega_{\rm NL}(h_a, \Psi_a) = \frac{a\,c_s}{\Omega_{cD}} \sum_{\mathbf{k}'_\perp + \mathbf{k}''_\perp = \mathbf{k}_\perp} \left(\mathbf{b}\cdot\mathbf{k}'_\perp\times\mathbf{k}''_\perp\right) \Psi_a(\mathbf{k}'_\perp)\, h_a(\mathbf{k}''_\perp)$$
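The nonlinear term is a sum over wavenumber pairs that add up to the target wavenumber, that is, a convolution. The talk's GPU roadmap equates the nonlinearity with FFTs, which suggests the usual route of evaluating such a sum as a pointwise product in real space. A minimal one-dimensional Python sketch of that equivalence follows. It is a simplification under stated assumptions: it is periodic and 1D, it omits the geometric factor $\mathbf{b}\cdot\mathbf{k}'_\perp\times\mathbf{k}''_\perp$, and it uses a naive $O(N^2)$ transform instead of an FFT library. All function names are hypothetical.

```python
import cmath


def dft(x, sign):
    """Naive O(N^2) discrete Fourier transform (unnormalized).

    sign = +1 maps spectral coefficients to real space, sign = -1 maps back.
    """
    n = len(x)
    return [sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def convolve_direct(a, b):
    """c[k] = sum over k1 + k2 = k (mod N) of a[k1] * b[k2]."""
    n = len(a)
    return [sum(a[k1] * b[(k - k1) % n] for k1 in range(n)) for k in range(n)]


def convolve_spectral(a, b):
    """Same sum, evaluated as a pointwise product in real space."""
    n = len(a)
    a_x = dft(a, +1)
    b_x = dft(b, +1)
    product = [p * q for p, q in zip(a_x, b_x)]
    return [c / n for c in dft(product, -1)]


# Hypothetical test spectra
a = [1.0, 2.0j, 0.0, -1.0]
b = [0.5, 0.0, 1.0, 1.0j]
direct = convolve_direct(a, b)
spectral = convolve_spectral(a, b)
max_error = max(abs(d - s) for d, s in zip(direct, spectral))
print(f"max difference: {max_error:.2e}")
```

The direct sum costs $O(N^2)$ per evaluation, while an FFT-based version costs $O(N \log N)$. This sketch computes a circular convolution; production pseudo-spectral codes typically zero-pad the spectra to avoid aliasing.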

37. Gyrokinetic equation for plasma species a: cross-species collision operator

$$C_a = \sum_b C^L_{ab}\left(H_a, H_b\right)$$

$$C^L_{ab}(H_a, H_b) = \frac{\nu^D_{ab}}{2}\,\frac{\partial}{\partial\xi}\left(1-\xi^2\right)\frac{\partial H_a}{\partial\xi} + \frac{1}{v^2}\,\frac{\partial}{\partial v}\left[\frac{\nu^\parallel_{ab}}{2}\left(v^4\,\frac{\partial H_a}{\partial v} + \frac{m_a}{T_b}\,v^5 H_a\right)\right] - H_a\,k_\perp^2\rho_a^2\,\frac{v^2}{4 v_{ta}^2}\left[\nu^D_{ab}\left(1+\xi^2\right) + \nu^\parallel_{ab}\left(1-\xi^2\right)\right] + R_{\rm mom}(H_b) + R_{\rm ene}(H_b)$$

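38. Sonic Transport Fluxes. These are inputs to an independent TRANSPORT CODE.

Particle flux: $$\Gamma_a = \sum_{\mathbf{k}_\perp} \int d^3v\, H_a^*\, c_{1a}\Psi_a$$

Energy flux: $$Q_a = \sum_{\mathbf{k}_\perp} \int d^3v\, H_a^*\, c_{2a}\Psi_a$$

Momentum flux: $$\Pi_a = \sum_{\mathbf{k}_\perp} \int d^3v\, H_a^*\, c_{3a}\Psi_a$$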

39. What do we solve for? A 5-dimensional distribution for every plasma species, stored as a six-dimensional array (mapped into an internal 2D array in CGYRO):

$$H_a(\underbrace{k_x, k_y, \theta, \xi, v}_{\text{5D mesh}},\, t)$$

The spatial coordinates are: $k_x$ → radial wavenumbers; $k_y$ → binormal wavenumbers; $\theta$ → field-line coordinate. The velocity-space coordinates are: $\xi = v_\parallel/v$ → cosine of the pitch angle, $\in [-1, 1]$; $v$ → speed, $\in [0, \infty)$.
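The mapping of a multi-dimensional mesh into a 2D array can be sketched in Python as below. The index names `ic` and `iv` echo those in the OpenACC collision loop shown later in the deck (`cmat(iv,ivp,ic_loc)`), but the grouping used here is an assumption made for illustration, not CGYRO's documented layout: (kx, θ) collapse into a configuration index and (species, ξ, v) into a velocity index. The $k_y$ and time indices are left out, and the mesh sizes are small hypothetical values.

```python
def flatten_indices(ikx, itheta, ispec, ixi, ispeed, n_theta, n_xi, n_v):
    """Collapse five mesh indices into a (configuration, velocity) index pair."""
    ic = ikx * n_theta + itheta               # (kx, theta) -> ic
    iv = (ispec * n_xi + ixi) * n_v + ispeed  # (species, xi, v) -> iv
    return ic, iv


def unflatten_indices(ic, iv, n_theta, n_xi, n_v):
    """Inverse of flatten_indices."""
    ikx, itheta = divmod(ic, n_theta)
    rest, ispeed = divmod(iv, n_v)
    ispec, ixi = divmod(rest, n_xi)
    return ikx, itheta, ispec, ixi, ispeed


# Small hypothetical mesh, chosen only to exercise the mapping
n_kx, n_theta, n_species, n_xi, n_v = 4, 3, 3, 2, 2
pairs = {
    flatten_indices(ikx, it, isp, ixi, isv, n_theta, n_xi, n_v)
    for ikx in range(n_kx)
    for it in range(n_theta)
    for isp in range(n_species)
    for ixi in range(n_xi)
    for isv in range(n_v)
}
print(len(pairs))  # one distinct (ic, iv) pair per mesh point
```

Every mesh point lands on a distinct `(ic, iv)` pair, so the 2D array holds the same data as the multi-dimensional one; only the addressing changes.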

40. Visual representation of computational mesh. [Figure: multiscale mesh with $k_x^0$ 1024, $k_y$ 256, $\theta$ 32; ion-scale mesh with $k_x^0$ 128, $k_y$ 32; velocity-space mesh with $\xi$ 24, $v$ 8; species deuterium (a = 1), carbon (a = 2), electron (a = 3)]
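To give a feel for the mesh sizes quoted on this slide, the Python sketch below multiplies them out. Several assumptions are made: the larger spatial mesh (1024 × 256) is taken to be the multiscale case, θ = 32 is applied to both meshes, all three species share the same velocity mesh, and each value is stored as a 16-byte double-precision complex number. The function name is hypothetical.

```python
from math import prod

BYTES_PER_COMPLEX = 16  # assumed: double-precision complex


def mesh_points(n_kx, n_ky, n_theta, n_xi, n_v, n_species):
    """Total number of distribution-function values at one time level."""
    return prod((n_kx, n_ky, n_theta, n_xi, n_v, n_species))


multiscale = mesh_points(1024, 256, 32, 24, 8, 3)
ion_scale = mesh_points(128, 32, 32, 24, 8, 3)

print(f"multiscale points: {multiscale:,}")
print(f"ion-scale points:  {ion_scale:,}")
print(f"ratio:             {multiscale // ion_scale}")
print(f"one multiscale copy: {multiscale * BYTES_PER_COMPLEX / 2**30:.0f} GiB")
```

Under these assumptions the multiscale mesh is 64 times larger than the ion-scale one, and a single copy of the distribution function already exceeds the memory of any single GPU, which is why the problem must be distributed.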

41. CGYRO optimized for challenging multiscale turbulence COMPLETE REDESIGN of world-renowned GYRO code

42. Simulation of turbulent energy loss in a tokamak plasma

43-44. CGYRO computes the turbulent flux DIII-D Tokamak at General Atomics in San Diego, CA

45. Multiscale DIII-D Simulation at r/a = 0.92. ITER baseline discharge (Haskey, Grierson) 164988. [Figure: fractional $Q_e$ versus $k_y\rho_s$ (0 to 30), with the long-wavelength (global, full-F, etc.) regime marked.] Resolution: $k_x\rho_s$ 124.0, $k_y\rho_s$ 31.8. Time: 9 hrs on 32K cores.

             Qi/QGB   Qe/QGB
    pwrbal    2.5      8.2
    NEO       2.7      0.0
    CGYRO     0.0      8.0
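One way to read the table is to add the NEO and CGYRO fluxes and compare the sum with the "pwrbal" row. The Python sketch below does that arithmetic. It rests on assumptions not stated on the slide: that "pwrbal" is the power-balance target, and that the NEO (neoclassical) and CGYRO (turbulent) contributions are simply additive. The function names are hypothetical.

```python
# Values transcribed from the slide table, in units of Q_GB
power_balance = {"Qi": 2.5, "Qe": 8.2}
neo = {"Qi": 2.7, "Qe": 0.0}
cgyro = {"Qi": 0.0, "Qe": 8.0}


def total_flux(*models):
    """Sum the flux in each channel over the given models."""
    return {channel: sum(m[channel] for m in models) for channel in models[0]}


def relative_difference(predicted, target):
    """(predicted - target) / target for each channel."""
    return {ch: (predicted[ch] - target[ch]) / target[ch] for ch in target}


total = total_flux(neo, cgyro)
rel = relative_difference(total, power_balance)
for channel in ("Qi", "Qe"):
    print(f"{channel}: total = {total[channel]:.1f}, "
          f"relative difference = {rel[channel]:+.1%}")
```

Under that reading, the combined prediction is within roughly 10% of the power-balance values in both channels.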

46. Simulation underway on Titan (NCCS) 4986 nodes = 4986 Tesla K20X GPUs

47. Important locations for CGYRO • Source code: github.com/gafusion/gacode • DOI: www.osti.gov/doecode/biblio/20298 • User Documentation: gafusion.github.io/doc • Documentary Video (for GYRO): www.youtube.com/watch?v=RLI6QW2x4Lg

48. Fidelity Hierarchy (Pyramid). Range of models all the way up to leadership codes. [Figure labels: Leadership-class computing: highest fidelity simulations, one-off heroic simulation; Reduced models for validation; Machine-learning models for optimization & real-time control; arrows: Calibrate, Train, Inform, Inform; Physics Development, Physics Validation, Physics Application]

49. Create TGLF-NN neural net from TGLF reduced model • 23 inputs → 4 outputs • Each dataset has 500K cases from 2300 multi-machine discharges • Trained with TensorFlow • Must be retrained as TGLF model is updated • TGLF itself derived from HPC CGYRO simulation

50. GPU performance: development and results

51-53. CGYRO: Roadmap for efficient GPU implementation 1 Numerical algorithms selected to allow intensive threading/acceleration − Nonlinearity (nl) = FFT − Collisions (coll) = Matrix-vector multiply 2 Key kernels have threaded (default) and accelerated variations − Smart loop order and good memory management keep kernels similar 3 Implemented GPU-aware MPI (utilizes GPUDirect and GPU-InfiniBand RDMA)

54. Initial thought was that nonlinearity (nl) would dominate

55. Acceleration of nl exposed cost of other kernels. Titan K20 GPU too small to store collision matrix
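A rough size estimate shows why the collision matrix strains GPU memory. The Python sketch below assumes the `cmat(iv,ivp,ic_loc)` layout from the OpenACC loop on slide 57 (an `nv × nv` real block per local configuration point), 8-byte reals, and a velocity dimension `nv` built from the slide-40 mesh (3 species × 24 ξ × 8 v). The 6 GB figure is the Tesla K20X memory size, which is not stated in the talk. The function name is hypothetical.

```python
def collision_matrix_bytes(nv, nc_loc, bytes_per_real=8):
    """Storage for an nv x nv real block at each of nc_loc configuration points."""
    return nv * nv * nc_loc * bytes_per_real


nv = 3 * 24 * 8                  # species x xi x v (assumed grouping)
per_point = collision_matrix_bytes(nv, 1)
gpu_bytes = 6 * 10**9            # Tesla K20X memory (assumed 6 GB)
max_points = gpu_bytes // per_point

print(f"nv = {nv}")
print(f"bytes per configuration point: {per_point:,}")
print(f"configuration points that fit in 6 GB: {max_points:,}")
```

Even with the whole GPU devoted to the matrix, only a couple of thousand configuration points fit under these assumptions, while the multiscale mesh has 1024 × 32 (kx, θ) points per $k_y$ mode, so the matrix must be spread over many devices.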

57. CGYRO: Roadmap for efficient GPU implementation

    !$acc loop seq
    do ivp=1,nv
       cvec_re = real(cvec(ivp))
       cvec_im = aimag(cvec(ivp))
       !$acc loop vector
       do iv=1,nv
          cval = cmat(iv,ivp,ic_loc)
          bvec(iv) = bvec(iv) + cmplx(cval*cvec_re,cval*cvec_im)
       enddo
    enddo
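For readers less familiar with Fortran, the loop on this slide can be transliterated into plain Python. This is only a serial sketch of the same arithmetic, not CGYRO code: it takes a single real `nv × nv` block in place of the `ic_loc` slice of `cmat`, and the function name is hypothetical.

```python
def collision_matvec(cmat, cvec, bvec):
    """Accumulate bvec += cmat @ cvec for real cmat and complex cvec.

    The outer loop walks over columns (ivp), mirroring the sequential loop
    on the slide; the inner loop over rows (iv) is the vectorized one.
    """
    nv = len(cvec)
    for ivp in range(nv):
        cvec_re = cvec[ivp].real
        cvec_im = cvec[ivp].imag
        for iv in range(nv):
            cval = cmat[iv][ivp]
            bvec[iv] += complex(cval * cvec_re, cval * cvec_im)
    return bvec


# Hypothetical 2 x 2 example
cmat = [[1.0, 2.0],
        [3.0, 4.0]]
cvec = [1 + 1j, 2 - 1j]
result = collision_matvec(cmat, cvec, [0j, 0j])
print(result)
```

Splitting the complex vector entry into real and imaginary parts lets each update use two real multiplications, since the matrix itself is real.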

58. CGYRO: Roadmap for efficient GPU implementation

    #ifdef DISABLE_GPUDIRECT_MPI
    !$acc update host(fsendr)
    #else
    !$acc host_data use_device(fsendr,f)
    #endif
    call MPI_ALLTOALL(fsendr,nsend,MPI_DOUBLE_COMPLEX, &
         f,nsend,MPI_DOUBLE_COMPLEX,lib_comm,ierr)
    #ifdef DISABLE_GPUDIRECT_MPI
    !$acc update device(f)
    #else
    !$acc end host_data
    #endif
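Both preprocessor branches on this slide perform the same all-to-all exchange; they differ only in whether the buffers are first copied to the host or handed to MPI as device pointers. The data movement itself can be illustrated without MPI by a small Python simulation, in which ranks are modelled as list entries. The function name is hypothetical, and the sketch says nothing about performance.

```python
def alltoall(send_buffers):
    """Simulate an all-to-all exchange among n ranks.

    send_buffers[src][dst] is the block that rank src sends to rank dst.
    The result recv[dst][src] is the block that rank dst received from src.
    """
    n = len(send_buffers)
    return [[send_buffers[src][dst] for src in range(n)] for dst in range(n)]


# Three hypothetical ranks, each holding one labelled block per destination
send = [[f"r{src}->r{dst}" for dst in range(3)] for src in range(3)]
recv = alltoall(send)
for rank, blocks in enumerate(recv):
    print(rank, blocks)
```

The exchange is a transpose of the rank-by-block layout, which is why it is commonly used to switch which mesh dimension is distributed across processes.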

59. Power9 (CPU) versus Power9 + 4X V100 (GPU)

60. CPU systems versus 4X V100

61. GPU type comparison Stampede2, GA, Piz Daint, Titan

62. Google Cloud Partition Comparison Santa Fe (last week)

63. Cloud V100 compared to Summit and Cori

64. OUTLINE 1 History of General Atomics? 2 The case for fusion energy 3 Mathematical formulation and GPU-based numerical solution 4 Simulation of turbulent energy loss in a tokamak plasma 5 GPU performance: development and results

65. Disclaimer This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof, or those of the European Commission.
