G1GC

50 %
50 %
Information about G1GC

Published on January 6, 2017

Author: kojilin

Source: slideshare.net

1. G1GC 介紹 2016/12/24 kojilin

2. 關於我 •koji lin •TWJUG member •LINE Fukuoka •Software Engineer •現在主要都在寫後端

3. G1GC •Java SE 7 開始 •Java SE 9 成為預設的 GC (JEP 248) •減少 pause time •但可能伴隨較低的 throughput •比起 CMS,比較不需要去調整參數比較結果 •CMS 也被認為較不適合非常大 heap 的情況

4. •heap 用 region 分割 •region 的大小會根據 Xmx 變動

5. •region 有分 young gen 和 old gen O O O E E E S S

6. •Young GC 發生時 •包含所有的 young region O O O E E E S S

7. •Young GC 發生時 •依照狀況變成 Survivor 或 Old O O O S S S O

8. 51.078: [GC pause (G1 Evacuation Pause) (young) 51.078: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 0, predicted base time: 29.00 ms, remaining time: 171.00 ms, target pause time: 200.00 ms] 51.078: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 1079 regions, survivors: 42 regions, predicted young region time: 170.99 ms] 51.078: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1079 regions, survivors: 42 regions, old: 0 regions, predicted pause time: 199.99 ms, target pause time: 200.00 ms] , 0.1092812 secs] … [Eden: 2158.0M(2158.0M)->0.0B(1796.0M) Survivors: 84.0M->102.0M Heap: 2242.0M(4096.0M)->102.0M(4096.0M)] [Times: user=0.35 sys=0.03, real=0.11 secs]

9. Collection Set •Young GC •包含所有 young 世代的 region

10. old gen > InitiatingHeapOccupancyPercent

11. InitiatingHeapOccupancyPercent •決定 concurrent marking 開始時機 •找出適合回收的 old gen region •-XX:InitiatingHeapOccupancyPercent=45 •超過總 heap 45% 時開始

12. 24963.770: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 3013607424 bytes, allocation request: 0 bytes, threshold: 3006477040 bytes (70.00 %), source: end of GC]

13. Concurrent Marking •Initial marking(STW) •Root region scanning •Concurrent marking •Final mark(STW) •Clean up(part STW)

14. Initial Marking •STW •用來 marking heap 內所有直接被 root objects 直 接參照到的物件 •跟著 young gc 一起做

15. Root region scanning •Concurrently •將前一步 young gc 後的 survivor regions 參照到 的物件 mark 起來 •所以被稱作 root regions •必須在下一次 gc 前完成,不然又會出現新的 survivor region

16. Concurrent Marking •Concurrently •Live data accounting •可以透過 ConcGCThreads 調整效率

17. Remark •STW •完成 live data accounting •可以透過 ParallelGCThreads 調整效率

18. Cleanup •STW •找出有沒有完全都是垃圾的 region •依照垃圾清除效率排序 old regions •Concurrent •清空沒有 live data 的 region

19. Concurrent Marking 2016-11-30T23:12:17.318+0900: 24975.061: [GC pause (G1 Evacuation Pause) (young) (initial-mark) … 24975.097: [GC concurrent-root-region-scan-start] 24975.115: [GC concurrent-root-region-scan-end, 0.0183740 secs] 24975.115: [GC concurrent-mark-start] 24975.544: [GC concurrent-mark-end, 0.4295200 secs] 24975.545: [GC remark 2016-11-30T23:12:17.803+0900: 24975.545: [Finalize Marking, 0.0004352 secs] 24975.586: [GC cleanup 2909M->2763M(4096M), 0.0184095 secs] 24975.604: [GC concurrent-cleanup-start] 24975.604: [GC concurrent-cleanup-end, 0.0002595 secs]

20. •Mixed GC 發生時 •All young gen + some old O O O S S S O E E E

21. •Mixed GC 發生時 •All young gen + some old O O O S S S O E E E

22. •Mixed GC 發生時 •All young gen + some old O S O S O SO

23. 24988.167: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 11490, predicted base time: 11.34 ms, remaining time: 188.66 ms, target pause time: 200.00 ms] 24988.167: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 92 regions, survivors: 10 regions, predicted young region time: 17.58 ms] 24988.167: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: predicted time is too high, predicted time: 1.28 ms, remaining time: 0.00 ms, old: 169 regions, min: 169 regions] 24988.167: [G1Ergonomics (CSet Construction) added expensive regions to CSet, reason: old CSet region num not reached min, old: 169 regions, expensive: 26 regions, min: 169 regions, remaining time: 0.00 ms] 24988.167: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 92 regions, survivors: 10 regions, old: 169 regions, predicted pause time: 232.41 ms, target pause time: 200.00 ms] 24988.205: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: candidate old regions available, candidate old regions: 1179 regions, reclaimable: 2358748744 bytes (54.92 %), threshold: 4.00 %]

24. Collection Set •Young GC •包含所有 young 世代的 region •Mixed GC •包含所有 young 世代的 region 加上一部份垃圾 比例最多的 old 世代的 region

25. 25000.324: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: candidate old regions available, candidate old regions: 185 regions, reclaimable: 311893632 bytes (7.26 %), threshold: 4.00 %] … 25006.772: [G1Ergonomics (Mixed GCs) do not continue mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 113 regions, reclaimable: 170151528 bytes (3.96 %), threshold: 4.00 %]

26. Parameters for mixed gc •InitiatingHeapOccupancyPercent = 45 •G1HeapWastePercent = 5 •可以浪費的比例,mixed gc 會決定如果垃圾沒超過這個比例就不做 •主要是為了讓 mixed collection cycle 可以快一點結束 •G1MixedGCCountTarget = 8 •決定一次 concurrent mark 後最多跑幾次 mixed gc •所有對象 old regions / 8 = 每次 mixed gc CSet 最小數量 •G1MixedGCLiveThresholdPercent = 85 •尋找預計回收 region 時,live data 低於此值才會被當作對象

27. Humongous Object •Object < 50% of region size •會放在 young region •Object >= 50% of region size •直接放在 old region,並且 region 內剩餘的空 間不會再被使用

28. [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: requested by GC cause, GC cause: G1 Humongous Allocation]

29. •加大 region size •但要注意因為加大而造成 region 數減少的效能 影響 •找出 root cause •代表有物件佔著連續的記憶體 •e.g. 很大的陣列,超多欄位的類別

30. Evacuation Failure •Survivor 或 old 不夠時 •to-space exhausted •出現數次後接著就會出現 Full GC •如果是 survivor •增加 G1ReservePercent

31. •-Xms16g -Xmx16g -XX:+UseG1GC •InitiatingHeapOccupancyPercent: 45 •G1HeapWastePercent: 5 •G1MixedGCLiveThresholdPercent: 85 •G1HeapRegionSize: 8MB

32. •-Xms16g -Xmx16g -XX:+UseG1GC •InitiatingHeapOccupancyPercent: 45 •G1HeapWastePercent: 5 •G1MixedGCLiveThresholdPercent: 85 •G1HeapRegionSize: 8MB 16GB x 0.45 = 7.2GB

33. •-Xms16g -Xmx16g -XX:+UseG1GC •InitiatingHeapOccupancyPercent: 45 •G1HeapWastePercent: 5 •G1MixedGCLiveThresholdPercent: 85 •G1HeapRegionSize: 8MB Full GC

34. •因為 old gen 擠壓到 young gen 的空間,young gen collection 惡化

35. •為什麼 Full GC 可以清到只剩 < 500MB ? 但是 mixed GC 無法? •加上 log 觀察 -XX:+PrintAdaptiveSizePolicy -XX:+PrintGC -XX:+PrintGCDetails

36. 32631.357: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 7725907968 bytes, allocation request: 10047216 bytes, threshold: 7730941095 bytes (45.00 %), source: concurrent humongous allocation] .... 32654.979: [G1Ergonomics (Mixed GCs) start mixed GCs, reason: candidate old regions available, candidate old regions: 457 regions, reclaimable: 2956666176 bytes (17.21 %), threshold: 5.00 %], 0.1106810 secs] .... [Eden: 6680.0M(6680.0M)->0.0B(536.0M) Survivors: 344.0M->280.0M Heap: 14.0G(16.0G)->7606.6M(16.0G)] [Times: user=2.31 sys=0.01, real=0.11 secs] [GC pause (G1 Evacuation Pause) (mixed) 32656.876: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: old CSet region num reached max, old: 205 regions, max: 205 regions] 32656.876: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 67 regions, survivors: 35 regions, old: 205 regions, predicted pause time: 173.84 ms, target pause time: 200.00 ms] 32656.992: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: candidate old regions available, candidate old regions: 252 regions, reclaimable: 1321193600 bytes (7.69 %), threshold: 5.00 %] [Eden: 536.0M(536.0M)->0.0B(720.0M) Survivors: 280.0M->96.0M Heap: 8142.6M(16.0G)->6029.9M(16.0G)] [Times: user=2.49 sys=0.01, real=0.12 secs] [GC pause (G1 Evacuation Pause) (mixed) ... 32659.727: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: reclaimable percentage not over threshold, old: 66 regions, max: 205 regions, reclaimable: 857822432 bytes (4.99 %), threshold: 5.00 %] 32659.727: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 90 regions, survivors: 12 regions, old: 66 regions, predicted pause time: 120.51 ms, target pause time: 200.00 ms]

37. ... 32659.727: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: reclaimable percentage not over threshold, old: 66 regions, max: 205 regions, reclaimable: 857822432 bytes (4.99 %), threshold: 5.00 %] 32659.727: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 90 regions, survivors: 12 regions, old: 66 regions, predicted pause time: 120.51 ms, target pause time: 200.00 ms] 32659.785: [G1Ergonomics (Mixed GCs) do not continue mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 186 regions, reclaimable: 857822432 bytes (4.99 %), threshold: 5.00 %] [Eden: 720.0M(720.0M)->0.0B(9064.0M) Survivors: 96.0M->64.0M Heap: 6749.9M(16.0G)->5572.0M(16.0G)] [Times: user=1.20 sys=0.00, real=0.06 secs]

38. •發現 humongous allocation,尤其在逼近 old gen 使用到近滿時會頻繁觸發 mixed GC. •找到沒設計好的 http rest api,會取得完整後先 轉換成 byte[] -> json. •底層使用 okio 的 Buffer,會將大的 byte[] 轉換 成 byte[] 的 linked list. •但仍不是 root cause.

39. •調整看看 VM 參數 •InitiatingHeapOccupancyPercent = 30 •G1HeapWastePercent = 3 •G1MixedGCLiveThresholdPercent = 90 •SoftRefLRUPolicyMSPerMB = 50 •不太有用而且因為提早做而降低了 throughput •超過 30 % 後總是觸發 concurrent marking

40. •最後沒招了,heap dump 看看

41. •最後沒招了,heap dump 看看

42. •在我們的環境下,問題出在 lettuce 使用的 LatencyUtils 的 LatencyStats & PauseTracker(Weakreference) •lettuce 使用 LatencyUtils 來做 command tracking ,並且每 10 分鐘會清掉過去的 LatencyStats 的 reference *https://github.com/mp911de/lettuce/wiki/Command-Latency-Metrics

43. 該不該用 G1GC ? •照理上沒問題,但 ElasticSearch & Lucene 尚未 建議切換 •G1GC 有些 bug 會造成 index 毀損 •套用後還是要記得觀察結果 •避免過度的調整參數 •盡量使用 MaxGCPausMillis,Xmx,Xms

44. Java 9 •預設就會是 G1GC •以前的 JVM log 被廢除,參數跟格式都改變 •-Xlog:[<selection>]:[<output>]:[<decorators>] e.g. -Xlog:gc*:file=gc.log

Add a comment