So You Want To Write Your Own Benchmark


Published on December 25, 2008

Author: drorbr



Performance has always been a major concern in software development, and it should not be taken lightly even now that commodity computers have multicore CPUs and a few gigabytes of RAM. One of the handiest, simplest tools for performance testing is the microbenchmark. Unfortunately, developing correct Java microbenchmarks is a complex task with many pitfalls along the way. This presentation is about the do's and don'ts of Java microbenchmarking and about what tools are out there to help with this tricky task.


Agenda
• Introduction
• Java™ micro benchmarking pitfalls
• Writing your own benchmark
• Micro benchmarking tools
• Summary

Microbenchmark – simple definition
1. Start the clock
2. Run the code
3. Stop the clock
4. Report

Better microbenchmark definition
• Small program
• Goal: measure something about a few lines of code
• All other variables should be removed
• Returns some kind of a numeric result

Why do I need microbenchmarks?
• Discover something about my code:
  • How fast is it?
  • Calculate throughput – TPS, KB/s
• Measure the result of changing my code:
  • Should I replace a HashMap with a TreeMap?
  • What is the cost of synchronizing a method?
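As a minimal sketch of that last question (deliberately naive; the class and method names are hypothetical, and every pitfall covered in the next section applies to it as written):

```java
// Deliberately naive sketch: time an unsynchronized vs. a synchronized
// method. All the HotSpot pitfalls discussed later apply to it as-is.
public class SyncCostSketch {
    private static int counter;

    static void plain() { counter++; }
    static synchronized void synced() { counter++; }

    static long timeNanos(Runnable body, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            body.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 10 * 1000 * 1000;
        Runnable plain = new Runnable() { public void run() { plain(); } };
        Runnable synced = new Runnable() { public void run() { synced(); } };
        System.out.println("plain:  " + timeNanos(plain, n) + " ns");
        System.out.println("synced: " + timeNanos(synced, n) + " ns");
    }
}
```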

Why are you talking about this?
• It's hard to write a robust microbenchmark
• It's even harder to do it in Java™
• There are not enough Java microbenchmarking tools
• There are too many flawed microbenchmarks out there

Agenda
• Introduction
• Java micro benchmarking pitfalls
• Writing your own benchmark
• Micro benchmarking tools
• Summary

A microbenchmark story: the problem
The boss asks you to solve a performance issue in one of the components.

A microbenchmark story: the cause
You find out that the cause is excessive use of Math.sqrt().

A microbenchmark story: a solution?
• You decide to develop a state-of-the-art square root approximation
• After developing the square root approximation, you want to benchmark it against the java.lang.Math implementation

SQRT approximation microbenchmark
Let's run this little piece of code in a loop and see what happens …

```java
public static void main(String[] args) {
    long start = System.currentTimeMillis(); // start the clock
    for (double i = 0; i < 10 * 1000 * 1000; i++) {
        mySqrt(i); // little piece of code
    }
    long end = System.currentTimeMillis(); // stop the clock
    long duration = end - start;
    System.out.format("Test duration: %d (ms) %n", duration);
}
```

SQRT microbenchmark results
Wow, this is really fast!

Test duration: 0 (ms)

Flawed microbenchmark

SQRT microbenchmark: what's wrong? The Java™ HotSpot virtual machine:
• Dynamic compilation
• Dynamic optimizations
• Dead code elimination
• On stack replacement
• Classloading
• Garbage collection

The HotSpot: a mixed mode system
1. Code is interpreted
2. Profiling
3. Dynamic compilation
4. Stuff happens
5. Interpreted again or recompiled

Dynamic compilation
• Dynamic compilation is unpredictable
• Don't know when the compiler will run
• Don't know how long the compiler will run
• Same code may be compiled more than once
• The JVM can switch to compiled code at will

Dynamic compilation cont.
• Dynamic compilation can seriously influence microbenchmark results:

Interpreted execution + dynamic compilation + compiled/interpreted code execution + continuous recompilation ≠ steady-state compiled code execution

Dynamic optimizations
• The HotSpot server compiler performs a large variety of optimizations:
  • loop unrolling
  • range check elimination
  • dead-code elimination
  • code hoisting
  • …

Code hoisting? Did he just say "code hoisting"?

What the heck is code hoisting?
• Hoist = to raise or lift
• A size optimization
• Eliminates duplicated pieces of code in method bodies by hoisting expressions or statements

Code hoisting example
a + b is a busy expression. After hoisting the expression a + b, a new local variable t has been introduced.

(Source: Optimizing Java for Size: Compiler Techniques for Code Compaction, Samuli Heilala)
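As an illustration (hand-written Java, not actual compiler output), hoisting the busy expression a + b out of two branches looks like:

```java
// Illustration of code hoisting: the busy expression a + b appears
// in both branches; the optimization computes it once into a
// temporary t before the branch.
public class HoistingExample {
    static int beforeHoisting(int a, int b, boolean flag) {
        if (flag) {
            return (a + b) * 2;   // a + b computed here ...
        } else {
            return (a + b) - 1;   // ... and duplicated here
        }
    }

    static int afterHoisting(int a, int b, boolean flag) {
        int t = a + b;            // hoisted: computed once
        if (flag) {
            return t * 2;
        } else {
            return t - 1;
        }
    }

    public static void main(String[] args) {
        System.out.println(beforeHoisting(3, 4, true)); // 14
        System.out.println(afterHoisting(3, 4, true));  // 14
    }
}
```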

Dynamic optimizations cont.
• Most of the optimizations are performed at runtime
• Profiling data is used by the compiler to improve optimization decisions
• You don't have access to the dynamically compiled code

Example: Very fast square root?
10,000,000 calls to Math.sqrt() ~ 4 ms

```java
public static void main(String[] args) {
    long start = System.nanoTime();
    int result = 0;
    for (int i = 0; i < 10 * 1000 * 1000; i++) {
        result += Math.sqrt(i);
    }
    long duration = (System.nanoTime() - start) / 1000000;
    System.out.format("Test duration: %d (ms) %n", duration);
}
```

Example: not so fast?
Now it takes ~ 2000 ms ?!?

```java
public static void main(String[] args) {
    long start = System.nanoTime();
    int result = 0;
    for (int i = 0; i < 10 * 1000 * 1000; i++) {
        result += Math.sqrt(i);
    }
    System.out.format("Result: %d %n", result); // single line of code added
    long duration = (System.nanoTime() - start) / 1000000;
    System.out.format("Test duration: %d (ms) %n", duration);
}
```

DCE - Dead Code Elimination
• Dead code - code that has no effect on the outcome of the program execution

```java
public static void main(String[] args) {
    long start = System.nanoTime();
    int result = 0;
    for (int i = 0; i < 10 * 1000 * 1000; i++) {
        result += Math.sqrt(i); // dead code: result is never used
    }
    long duration = (System.nanoTime() - start) / 1000000;
    System.out.format("Test duration: %d (ms) %n", duration);
}
```

OSR - On Stack Replacement
• Methods are HOT if they cumulatively execute more than 10,000 loop iterations
• Older JVM versions did not switch to the compiled version until the method exited and was re-entered
• OSR - switching from interpreted to compiled code in the middle of a loop

OSR and microbenchmarking
• OSR'd code may be less performant - some optimizations are not performed
• OSR usually happens when you put everything into one long method
• Developers tend to write long main() methods when benchmarking
• Real-life applications are hopefully divided into more fine-grained methods
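One way to sidestep OSR, sketched below: keep the measured loop in its own small method and call it repeatedly, so the whole method gets compiled and re-entered normally instead of a loop buried in a long main() being OSR-compiled (class and method names here are hypothetical):

```java
// Sketch: extract the hot loop into its own method so the JIT can
// compile and re-enter the whole method, avoiding on-stack replacement
// of a loop inside one long main().
public class AvoidOsr {
    static double runIteration(int n) {
        double result = 0;
        for (int i = 0; i < n; i++) {
            result += Math.sqrt(i);
        }
        return result; // returned so the loop is not dead code
    }

    public static void main(String[] args) {
        double sink = 0;
        for (int rep = 0; rep < 20; rep++) {   // method is exited and
            sink += runIteration(1000 * 1000); // re-entered: normal compilation
        }
        long start = System.nanoTime();
        sink += runIteration(10 * 1000 * 1000);
        long durationMs = (System.nanoTime() - start) / 1000000;
        System.out.format("Test duration: %d (ms), sink=%f %n", durationMs, sink);
    }
}
```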

Classloading
• Classes are usually loaded only when they are first used
• Class loading takes time: I/O, parsing, verification
• May skew your benchmark results

Garbage Collection
• The JVM automatically reclaims resources by garbage collection and object finalization
• Outside of the developer's control, unpredictable
• Should be measured if invoked as a result of the benchmarked code

Time measurement
How long is one millisecond?

```java
public static void main(String[] args) throws InterruptedException {
    long start = System.currentTimeMillis();
    Thread.sleep(1);
    final long end = System.currentTimeMillis();
    final long duration = (end - start);
    System.out.format("Test duration: %d (ms) %n", duration);
}
```

Test duration: 16 (ms)

System.currentTimeMillis()
• Accuracy varies with platform:

Resolution | Platform                 | Source
55 ms      | Windows 95/98            | Java Glossary
10–15 ms   | Windows NT, 2K, XP, 2003 | David Holmes
1 ms       | Mac OS X                 | Java Glossary
1 ms       | Linux (2.6 kernel)       | Markus Kobler
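If you want to see the granularity on your own machine, a small sketch that spins until the reported value changes and reports the size of the jump:

```java
// Sketch: estimate System.currentTimeMillis() granularity by spinning
// until the reported value changes; the difference is one observable tick.
public class ClockGranularity {
    static long measureGranularityMillis() {
        long t0 = System.currentTimeMillis();
        long t1;
        do {
            t1 = System.currentTimeMillis();
        } while (t1 == t0);   // spin until the clock ticks
        return t1 - t0;       // size of one observable tick
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println("tick: " + measureGranularityMillis() + " ms");
        }
    }
}
```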

Wrong target platform
• Choosing the wrong platform for your microbenchmark:
  • Benchmarking on Windows when your target platform is Linux
  • Benchmarking a highly threaded application on a single-core machine
  • Benchmarking on a Sun JVM when the target platform is Oracle (BEA) JRockit

Caching
• Hardware – CPU caching
• Operating system – file system caching
• Database – query caching

Caching: CPU L1 and L2 caches
• The farther the accessed data is from the CPU, the higher the access latency
• The size of the dataset affects access cost:

Array size | Time (us) | Cost (ns)
16 KB      | 413451    | 9.821
8192 KB    | 5743812   | 136.446

(Jcachev2 results for an Intel® Core™2 Duo T8300; L1 = 32 KB, L2 = 3 MB)

Busy environment
• Running in a busy environment – CPU, I/O, memory

Agenda
• Introduction
• Java micro benchmarking pitfalls
• Writing your own benchmark
• Micro benchmarking tools
• Summary

Warm up your code

Warm up your code
• Let the JVM reach a steady-state execution profile before you start benchmarking
• All classes should be loaded before benchmarking
• Usually executing your code for ~10 seconds is enough

Warm up your code – cont.
• Detect JIT compilations by using:
  • CompilationMXBean.getTotalCompilationTime()
  • -XX:+PrintCompilation
• Measure classloading time using the ClassLoadingMXBean

CompilationMXBean usage

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

long compilationTimeTotal;
CompilationMXBean compBean = ManagementFactory.getCompilationMXBean();
if (compBean.isCompilationTimeMonitoringSupported())
    compilationTimeTotal = compBean.getTotalCompilationTime();
```
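Putting the bean to work, a hedged warm-up sketch: repeat the workload until total JIT compilation time stops growing between rounds. The workload() method is a hypothetical stand-in for the benchmarked code, and "compilation time stopped growing" is only a heuristic, not a guarantee that the JVM is fully warmed up.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Sketch: warm up until JIT activity settles, watched through
// CompilationMXBean.getTotalCompilationTime().
public class WarmUp {
    // Hypothetical stand-in for the code being benchmarked
    static double workload() {
        double r = 0;
        for (int i = 0; i < 100 * 1000; i++) {
            r += Math.sqrt(i);
        }
        return r;
    }

    public static void main(String[] args) {
        CompilationMXBean comp = ManagementFactory.getCompilationMXBean();
        double sink = 0;
        if (comp != null && comp.isCompilationTimeMonitoringSupported()) {
            long previous = -1;
            long current = comp.getTotalCompilationTime();
            // Repeat until total compilation time stops growing between
            // rounds - a simple "JIT has settled" heuristic.
            while (current != previous) {
                sink += workload();
                previous = current;
                current = comp.getTotalCompilationTime();
            }
        }
        System.out.println("warmed up, sink = " + sink);
        // ... start measuring here ...
    }
}
```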

Dynamic optimizations
• Avoid on-stack replacement: don't put all your benchmark code in one big main() method
• Avoid dead code elimination: print the final result
• Report unreasonable speedups

Garbage Collection
• Measure garbage collection time
• Force garbage collection and finalization before benchmarking
• Perform enough iterations to reach garbage collection steady state
• Gather GC stats: -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
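A sketch of forcing collection and finalization before the measured section. Note that System.gc() and System.runFinalization() are only requests to the JVM, so this is best-effort; the pause length is an arbitrary choice:

```java
// Sketch: encourage the JVM to reclaim garbage and run finalizers
// before the measured section, so earlier allocations are not
// collected (and billed) inside the benchmark window.
public class GcBefore {
    static void cleanPause() {
        System.gc();               // request (not guarantee) a collection
        System.runFinalization();  // request pending finalizers
        System.gc();               // second pass for objects freed by finalizers
        try {
            Thread.sleep(100);     // give background GC threads a moment
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        cleanPause();
        long start = System.nanoTime();
        // ... benchmarked code here ...
        long durationMs = (System.nanoTime() - start) / 1000000;
        System.out.format("Test duration: %d (ms) %n", durationMs);
    }
}
```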

Time measurement
• Use System.nanoTime()
• Microsecond accuracy on modern operating systems and hardware
• Not worse than currentTimeMillis() (note: Windows users)
• Executes in microseconds – don't overuse it!

JVM configuration
• Use JVM options similar to your target environment:
  • -server or -client JVM
  • Enough heap space (-Xmx)
  • Garbage collection options
  • Thread stack size (-Xss)
  • JIT compiler options

Other issues
• Use fixed-size data sets: too-large data sets can cause L1 cache blowout
• Watch system load: don't play GTA while benchmarking!

Agenda
• Introduction
• Java micro benchmarking pitfalls
• Writing your own benchmark
• Micro benchmarking tools
• Summary

Java™ benchmarking tools
• Various specialized benchmarks:
  • SPECjAppServer®
  • SPECjvm™
  • CaffeineMark 3.0™
  • SciMark 2.0
• Only a few benchmarking frameworks

Japex Micro-Benchmark framework
• Similar in spirit to JUnit
• Measures throughput – work over time:
  • Transactions per second (default)
  • KBs per second
• XML-based configuration
• XML/HTML reports

Japex: Drivers
• Encapsulates knowledge about a specific algorithm implementation
• Must extend JapexDriverBase

```java
public interface JapexDriver extends Runnable {
    public void initializeDriver();
    public void prepare(TestCase testCase);
    public void warmup(TestCase testCase);
    public void run(TestCase testCase);
    public void finish(TestCase testCase);
    public void terminateDriver();
}
```

Japex: Writing your own driver

```java
public class SqrtNewtonApproxDriver extends JapexDriverBase {
    private long tmp;
    …
    @Override
    public void warmup(TestCase testCase) {
        tmp += sqrt(getNextRandomNumber());
    }
    …
}
```

Japex: Test suite

```xml
<testSuite name="SQRT Test Suite" xmlns= …>
    <param name="libraryDir" value="C:/java/japex/lib"/>
    <param name="japex.classPath" value="./target/classes"/>
    <param name="japex.runIterations" value="1000000"/>
    <driver name="SqrtApproxNewtonDriver">
        <param name="Description" value="Newton Driver"/>
        <param name="japex.driverClass"
               value="com.alphacsp.javaedge.benchmark.japex.driver.SqrtNewtonApproxDriver"/>
    </driver>
    <testCase name="testcase1"/>
</testSuite>
```

Japex: HTML Reports

Japex: more chart types
• Scatter chart
• Line chart

Japex: pros and cons
• Pros:
  • Similar to JUnit
  • Nice HTML reports
• Cons:
  • Last stable release in March 2007
  • HotSpot issues are not handled
  • XML configuration

Brent Boyer's Benchmark framework
• Part of the "Robust Java benchmarking" article by Brent Boyer
• Automates as many aspects as possible:
  • Resource reclamation
  • Class loading
  • Dead code elimination
  • Statistics

Benchmark framework example

```java
Benchmark.Params params = new Benchmark.Params(true);
params.setExecutionTimeGoal(0.5);
params.setNumberMeasurements(50);

Runnable task = new Runnable() {
    public void run() {
        sqrt(getNextRandomNumber());
    }
};

Benchmark benchmark = new Benchmark(task, params);
System.out.println(benchmark.toString());
```

Benchmark single line summary

Benchmark output:
first = 25.702 us, mean = 91.070 ns (CI deltas: -115.591 ps, +171.423 ps)
sd = 1.451 us (CI deltas: -461.523 ns, +676.964 ns)
WARNING: execution times have mild outliers, SD VALUES MAY BE INACCURATE

Outlier and serial correlation issues
• The framework records outlier and serial correlation issues
• Outliers indicate that a major measurement error happened:
  • Large outliers - some other activity started on the computer during measurement
  • Small outliers might hint that DCE occurred
• Serial correlation indicates that the JVM has not reached its steady-state performance profile
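The statistics behind such a check can be sketched with a simple mean/standard-deviation rule. This is a hypothetical helper, not the framework's actual API, and the 2-sigma threshold is an arbitrary illustration:

```java
// Sketch (not the framework's API): flag measurements lying more than
// two sample standard deviations from the mean as outliers.
public class OutlierCheck {
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    static double stdDev(double[] xs) {
        double m = mean(xs);
        double ss = 0;
        for (double x : xs) ss += (x - m) * (x - m);
        return Math.sqrt(ss / (xs.length - 1)); // sample standard deviation
    }

    static int countOutliers(double[] xs) {
        double m = mean(xs), sd = stdDev(xs);
        int count = 0;
        for (double x : xs) {
            if (Math.abs(x - m) > 2 * sd) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Nine stable timings plus one huge spike (e.g. background activity)
        double[] times = {90, 91, 89, 92, 90, 91, 90, 89, 91, 2500};
        System.out.println("outliers: " + countOutliers(times));
    }
}
```

Note that with small samples a single huge spike inflates the standard deviation and can mask itself, which is one reason the framework's more careful statistics are preferable to a hand-rolled rule.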

Benchmark: pros and cons
• Pros:
  • Handles HotSpot-related issues
  • Detailed statistics
• Cons:
  • Each run takes a lot of time
  • Not a formal project
  • Lacks documentation

Agenda
• Introduction
• Java micro benchmarking pitfalls
• Writing your own benchmark
• Micro benchmarking tools
• Summary

Summary 1
• Micro benchmarking is hard when it comes to Java™
• Define what you want to measure and how you want to do it; pick your goals
• Know what you are doing
• Always warm up your code
• Handle DCE, OSR, and GC issues
• Use fixed-size data sets and fixed work

Summary 2
• Do not rely solely on microbenchmark results
• Sanity-check results
• Use a profiler
• Test your code in real-life scenarios under realistic load (macro-benchmark)

Summary: resources
• y/j-benchmark1.html
• 02/microbenchmarks.pdf
• y/j-jtp12214/

Thank You!
