JMH Core Benchmarks, Validation Tests
----------------------------------------------------------------------------------------------------------

# JMH 1.13 (released 9 days ago)
# JDK 1.8.0_102, VM 25.102-b14
# Windows 10, amd64, 10.0

These tests assess the current benchmarking environment health, including hardware, OS, JVM, and JMH
itself. While the failure on these tests does not immediately means the problem with environment,
it is instructive to understand and follow up on oddities in these tests.

If you are sharing this report, please share it in full, including the JVM version, OS flavor and
version, plus some data on used hardware.

  Use -h to get help on available options.

--------- TIMING MEASUREMENTS TEST

This test shows the minimal individual timings possible to measure.
This normally affects only SampleTime and SingleShot benchmark modes.
Throughput/AverageTime tests can do better since they do only a few
timestamps before and after the complete iteration.

  System.nanoTime latency:          11.63 ± 0.03 ns
  System.nanoTime granularity:     293.11 ± 0.15 ns

--------- COMPILER HINTS TEST

This tests verifies compiler hints are working as expected. Two baseline tests run the workload
in inlined and non-inlined regiments. When the workload is inlined, the optimizations should kill
the workload body. Compiler hints should successfully survive in both regiments: INLINE should
always inline, and DONT_INLINE should always break inlining. EXCLUDE should be neutral to inlining
policy completely.

  Default inline policy:               0.26 ± 0.00 ns
    + @CompilerControl(INLINE):        0.26 ± 0.00 ns
    + @CompilerControl(DONT_INLINE):  16.87 ± 0.01 ns
    + @CompilerControl(EXCLUDE):      28.23 ± 0.13 ns

  Default no inline policy:           16.87 ± 0.01 ns
    + @CompilerControl(INLINE):        0.26 ± 0.00 ns
    + @CompilerControl(DONT_INLINE):  16.87 ± 0.01 ns
    + @CompilerControl(EXCLUDE):      28.07 ± 0.04 ns

--------- THERMAL RUNDOWN TEST

This test tries to heat the machine up, trying to kick in the thermal throttling.
If you see the diminishing performance over time, then your system throttles, and many benchmark
experiments are unreliable.

     1.84 ± 0.02 ms/op
     1.85 ± 0.02 ms/op
     1.82 ± 0.02 ms/op
     1.83 ± 0.02 ms/op
     1.83 ± 0.03 ms/op
     1.82 ± 0.04 ms/op
     1.83 ± 0.02 ms/op
     1.83 ± 0.03 ms/op
     1.82 ± 0.01 ms/op
     1.83 ± 0.02 ms/op
     1.82 ± 0.01 ms/op
     1.82 ± 0.02 ms/op
     1.83 ± 0.00 ms/op
     1.83 ± 0.04 ms/op
     1.83 ± 0.02 ms/op
     1.83 ± 0.02 ms/op
     1.83 ± 0.02 ms/op
     1.82 ± 0.02 ms/op

--------- SCORE STABILITY TEST

This test verifies the performance for a large busy benchmark is the same, regardless
of the benchmark mode, and delay before the iteration. The performance should be the same
across all delay values, and comparable across different benchmark modes. If there is a
significant difference on different delay levels, this is usually indicative of power-saving
features enabled, making bursty benchmarks unreliable.

  Scores are milliseconds per benchmark operation, or the reciprocal to it.
  Delays are injected before each iteration, and are measured in milliseconds.

                              0             1            10           100          1000
      Throughput:   0.56 ± 0.00   0.56 ± 0.00   0.56 ± 0.00   0.52 ± 0.01   0.51 ± 0.01 ops/ms
     AverageTime:   1.80 ± 0.00   1.80 ± 0.00   1.80 ± 0.00   1.96 ± 0.03   1.95 ± 0.05 ms/op
      SampleTime:   1.80 ± 0.00   1.80 ± 0.00   1.80 ± 0.00   1.97 ± 0.04   1.96 ± 0.03 ms/op
  SingleShotTime:   1.80 ± 0.01   1.81 ± 0.02   2.22 ± 0.64   8.30 ± 0.90   8.60 ± 0.65 ms/op

--------- THREAD SCALING TEST

This test verifies the performance when scaling in multiple threads. In Throughput mode, the
benchmark should scale almost linearly, at least before the number of physical cores is reached.
In other modes, the timings for individual ops should stay roughly the same, at least before the
number of physical cores is reached. The departure from the expected behavior might be indicative
of scheduling irregularities, power saving features being enabled, process affinity enforced in
virtualized environments, etc. -- these may potentially disrupt multi-threaded benchmarks
correctness.

  Scores are relative to a single-threaded case.
  Threads are scaled from 1 to the number of hardware threads.

                                1               2               4
      Throughput:   1.00x ± 0.00x   2.00x ± 0.00x   3.95x ± 0.03x
     AverageTime:   1.00x ± 0.00x   1.00x ± 0.00x   1.01x ± 0.01x
      SampleTime:   1.00x ± 0.00x   1.00x ± 0.00x   1.02x ± 0.00x
  SingleShotTime:   1.00x ± 0.01x   1.00x ± 0.01x   1.00x ± 0.01x

--------- HELPER METHOD TEST

These tests show the overheads of using the benchmark methods. Normally, only Level.Invocation
helpers should affect the benchmark performance, since other helpers execute outside the
benchmark path.

                       running empty benchmark:   0.26 ± 0.00 ns
          Scope.Benchmark, Level.Trial, @Setup:   0.26 ± 0.00 ns
       Scope.Benchmark, Level.Trial, @TearDown:   0.26 ± 0.00 ns
      Scope.Benchmark, Level.Iteration, @Setup:   0.26 ± 0.01 ns
   Scope.Benchmark, Level.Iteration, @TearDown:   0.26 ± 0.00 ns
     Scope.Benchmark, Level.Invocation, @Setup:   7.03 ± 0.95 ns
  Scope.Benchmark, Level.Invocation, @TearDown:   8.30 ± 3.42 ns
              Scope.Group, Level.Trial, @Setup:   0.27 ± 0.01 ns
           Scope.Group, Level.Trial, @TearDown:   0.26 ± 0.01 ns
          Scope.Group, Level.Iteration, @Setup:   0.26 ± 0.00 ns
       Scope.Group, Level.Iteration, @TearDown:   0.26 ± 0.00 ns
         Scope.Group, Level.Invocation, @Setup:  10.58 ± 0.60 ns
      Scope.Group, Level.Invocation, @TearDown:   9.88 ± 2.53 ns
             Scope.Thread, Level.Trial, @Setup:   0.26 ± 0.00 ns
          Scope.Thread, Level.Trial, @TearDown:   0.26 ± 0.00 ns
         Scope.Thread, Level.Iteration, @Setup:   0.26 ± 0.00 ns
      Scope.Thread, Level.Iteration, @TearDown:   0.26 ± 0.01 ns
        Scope.Thread, Level.Invocation, @Setup:  15.16 ± 0.13 ns
     Scope.Thread, Level.Invocation, @TearDown:  15.02 ± 0.10 ns

--------- BLACKHOLE CONSUME CPU TEST

This test assesses the Blackhole.consumeCPU performance, that should be linear to the number of
tokens. The performance can be slightly different on low token counts. Otherwise, the backoffs
with consumeCPU are not reliable.

  Scores are (normalized) nanoseconds per token.

  #Tokens:
           1:   2.32 ± 0.00 ns
           2:   1.44 ± 0.00 ns
           3:   1.21 ± 0.00 ns
           4:   1.10 ± 0.04 ns
           5:   1.04 ± 0.00 ns
           6:   1.03 ± 0.00 ns
           7:   1.03 ± 0.00 ns
           8:   1.03 ± 0.00 ns
           9:   1.03 ± 0.00 ns
          10:   1.03 ± 0.00 ns
          20:   1.21 ± 0.00 ns
          30:   1.46 ± 0.01 ns
          40:   1.56 ± 0.00 ns
          50:   1.61 ± 0.01 ns
          60:   1.64 ± 0.00 ns
          70:   1.66 ± 0.00 ns
          80:   1.70 ± 0.02 ns
          90:   1.70 ± 0.00 ns
         100:   1.71 ± 0.00 ns
         500:   1.79 ± 0.00 ns
        1000:   1.80 ± 0.01 ns
        5000:   1.81 ± 0.01 ns
       10000:   1.80 ± 0.00 ns
       50000:   1.81 ± 0.00 ns
      100000:   1.82 ± 0.01 ns
      500000:   1.80 ± 0.00 ns
     1000000:   1.80 ± 0.00 ns
     5000000:   1.80 ± 0.00 ns
    10000000:   1.80 ± 0.00 ns

--------- BLACKHOLE SINGLE INVOCATION TEST

This test shows the Blackhole overheads, when using a single invocation in the method, whether
implicitly via return from @Benchmark, or explicitly via consume(). The performance should be the
same for implicit and explicit cases, and comparable across all data types.

  Scores are nanoseconds per benchmark op.

                      implicit          explicit
  boolean:      1.93 ± 0.13 ns    1.84 ± 0.00 ns
     byte:      1.86 ± 0.06 ns    1.88 ± 0.13 ns
    short:      1.84 ± 0.00 ns    1.85 ± 0.01 ns
     char:      1.91 ± 0.15 ns    1.85 ± 0.01 ns
      int:      1.86 ± 0.01 ns    1.85 ± 0.01 ns
    float:      1.93 ± 0.00 ns    1.94 ± 0.01 ns
     long:      1.86 ± 0.02 ns    1.86 ± 0.04 ns
   double:      1.93 ± 0.00 ns    1.94 ± 0.01 ns
   Object:      2.02 ± 0.01 ns    2.03 ± 0.02 ns
    Array:      2.03 ± 0.03 ns    2.02 ± 0.02 ns

--------- BLACKHOLE PIPELINED TEST (NORMAL)

This test shows the Blackhole performance in a loop with a given number of iterations. We should
normally see the uniform numbers across most data types and number of iterations. If the numbers
are wildly non-uniform across the number of iteration, this is indicative of Blackhole failure,
and may point to a serious JMH issue. Scores are nanoseconds per loop iteration.

  Scores are nanoseconds per (normalized) benchmark op.
  Trying loops with [1, 10, 100, 1000, 10000] iterations.

                       1            10           100          1000         10000
  boolean:   2.98 ± 0.02   2.34 ± 0.01   2.38 ± 0.05   2.29 ± 0.00   2.29 ± 0.01
     byte:   2.99 ± 0.03   2.37 ± 0.06   2.39 ± 0.03   2.29 ± 0.00   2.30 ± 0.05
    short:   2.85 ± 0.00   2.33 ± 0.01   2.36 ± 0.10   2.31 ± 0.12   2.27 ± 0.00
     char:   2.85 ± 0.00   2.33 ± 0.01   2.35 ± 0.05   2.28 ± 0.01   2.29 ± 0.05
      int:   2.85 ± 0.00   2.28 ± 0.01   2.27 ± 0.00   2.21 ± 0.00   2.22 ± 0.03
    float:   2.96 ± 0.00   2.44 ± 0.08   2.42 ± 0.06   2.33 ± 0.00   2.34 ± 0.01
     long:   2.85 ± 0.00   2.29 ± 0.02   2.27 ± 0.00   2.21 ± 0.00   2.21 ± 0.01
   double:   2.96 ± 0.00   2.40 ± 0.00   2.41 ± 0.01   2.33 ± 0.00   2.35 ± 0.04
   Object:   2.87 ± 0.00   2.38 ± 0.05   2.38 ± 0.00   2.32 ± 0.06   2.30 ± 0.01
    Array:   2.87 ± 0.00   2.36 ± 0.01   2.39 ± 0.03   2.30 ± 0.00   2.30 ± 0.01

--------- BLACKHOLE PIPELINED TEST + REAL PAYLOAD (NORMAL)

This test shows the Blackhole performance in a loop with a given number of iterations. We should
normally see the uniform numbers across most data types and number of iterations. If the numbers
are wildly non-uniform across the number of iteration, this is indicative of Blackhole failure,
and may point to a serious JMH issue. Scores are nanoseconds per loop iteration.

Real payload is being injected into the benchmark.

  Scores are nanoseconds per (normalized) benchmark op.
  Trying loops with [1, 10, 100, 1000, 10000] iterations.

                        1             10            100           1000          10000
  boolean:   34.40 ± 0.05   22.42 ± 0.07   20.04 ± 0.15   19.74 ± 0.01   19.81 ± 0.01
     byte:   38.34 ± 0.40   25.24 ± 0.01   23.09 ± 0.01   29.80 ± 0.03   29.87 ± 0.02
    short:   38.28 ± 0.04   25.41 ± 0.01   23.10 ± 0.04   22.74 ± 0.02   23.13 ± 0.01
     char:   38.01 ± 0.05   25.34 ± 0.01   23.15 ± 0.02   22.75 ± 0.03   23.14 ± 0.02
      int:   38.05 ± 0.22   25.20 ± 0.02   23.16 ± 0.01   22.90 ± 0.06   23.12 ± 0.02
    float:   34.25 ± 0.11   26.23 ± 0.02   24.62 ± 0.03   24.58 ± 0.18   24.46 ± 0.02
     long:   37.91 ± 0.02   25.16 ± 0.01   23.08 ± 0.02   22.82 ± 0.01   22.71 ± 0.01
   double:   33.31 ± 0.07   21.59 ± 0.02   19.57 ± 0.10   19.19 ± 0.03   18.96 ± 0.02
   Object:   35.35 ± 0.03   23.77 ± 0.12   21.39 ± 0.04   21.08 ± 0.03   20.79 ± 0.19
    Array:   39.41 ± 0.68   26.08 ± 0.16   23.38 ± 0.04   23.11 ± 0.09   22.96 ± 0.03

--------- BLACKHOLE PIPELINED TEST (INLINE HINTS BROKEN)

This test shows the Blackhole performance in a loop with a given number of iterations. We should
normally see the uniform numbers across most data types and number of iterations. If the numbers
are wildly non-uniform across the number of iteration, this is indicative of Blackhole failure,
and may point to a serious JMH issue. Scores are nanoseconds per loop iteration.

This particular test mode forces the inline of Blackhole methods, and so demolishes one of the
layers in defence in depth. If this layer is broken, Blackhole should also survive. If it isn't,
then JMH will have to provide more contingencies.

  Scores are nanoseconds per (normalized) benchmark op.
  Trying loops with [1, 10, 100, 1000, 10000] iterations.

                       1            10           100          1000         10000
  boolean:   1.55 ± 0.00   1.08 ± 0.00   1.11 ± 0.00   1.04 ± 0.00   1.03 ± 0.00
     byte:   1.55 ± 0.00   1.08 ± 0.00   1.10 ± 0.00   1.04 ± 0.00   1.03 ± 0.00
    short:   1.55 ± 0.00   1.02 ± 0.00   1.09 ± 0.00   1.04 ± 0.00   1.03 ± 0.00
     char:   1.55 ± 0.00   1.02 ± 0.00   1.09 ± 0.00   1.04 ± 0.00   1.03 ± 0.00
      int:   1.55 ± 0.00   1.02 ± 0.00   1.08 ± 0.00   1.04 ± 0.00   1.03 ± 0.00
    float:   1.68 ± 0.00   1.24 ± 0.00   1.28 ± 0.03   1.21 ± 0.00   1.16 ± 0.08
     long:   1.55 ± 0.00   1.03 ± 0.00   1.09 ± 0.00   1.04 ± 0.00   1.03 ± 0.00
   double:   1.68 ± 0.00   1.24 ± 0.00   1.27 ± 0.00   1.21 ± 0.00   1.21 ± 0.00
   Object:   1.83 ± 0.02   1.92 ± 0.00   1.81 ± 0.00   1.80 ± 0.00   1.80 ± 0.00
    Array:   1.83 ± 0.02   1.92 ± 0.00   1.82 ± 0.03   1.80 ± 0.00   1.80 ± 0.00

--------- BLACKHOLE PIPELINED TEST + REAL PAYLOAD (INLINE HINTS BROKEN)

This test shows the Blackhole performance in a loop with a given number of iterations. We should
normally see the uniform numbers across most data types and number of iterations. If the numbers
are wildly non-uniform across the number of iteration, this is indicative of Blackhole failure,
and may point to a serious JMH issue. Scores are nanoseconds per loop iteration.

Real payload is being injected into the benchmark.

This particular test mode forces the inline of Blackhole methods, and so demolishes one of the
layers in defence in depth. If this layer is broken, Blackhole should also survive. If it isn't,
then JMH will have to provide more contingencies.

  Scores are nanoseconds per (normalized) benchmark op.
  Trying loops with [1, 10, 100, 1000, 10000] iterations.

                        1             10            100           1000          10000
  boolean:   32.09 ± 0.06   20.74 ± 0.02   18.28 ± 0.03   17.93 ± 0.02   17.90 ± 0.01
     byte:   35.69 ± 0.32   24.74 ± 0.01   23.23 ± 0.02   28.44 ± 0.07   28.67 ± 0.01
    short:   35.77 ± 0.03   24.47 ± 0.01   22.76 ± 0.02   22.63 ± 0.03   23.19 ± 0.02
     char:   35.78 ± 0.03   24.44 ± 0.02   22.82 ± 0.09   22.67 ± 0.02   23.19 ± 0.01
      int:   35.25 ± 0.07   24.71 ± 0.03   23.03 ± 0.01   22.93 ± 0.01   22.93 ± 0.01
    float:   32.44 ± 0.03   25.38 ± 0.01   24.05 ± 0.01   23.74 ± 0.31   22.95 ± 0.01
     long:   35.17 ± 0.02   24.53 ± 0.03   22.82 ± 0.01   22.69 ± 0.01   22.67 ± 0.02
   double:   31.03 ± 0.06   20.20 ± 0.01   18.04 ± 0.01   17.81 ± 0.02   17.67 ± 0.01
   Object:   33.72 ± 0.15   20.81 ± 0.03   18.40 ± 0.01   18.18 ± 0.04   18.29 ± 0.02
    Array:   35.10 ± 0.02   22.64 ± 0.02   20.27 ± 0.02   20.17 ± 0.02   20.22 ± 0.02

--------- BLACKHOLE MERGING TEST (NORMAL)

This test verifies that calling the Blackhole.consume with the same result is not susceptible for
merging. We expect the similar performance across all data types, and the number of consecutive
calls. If there are significant differences, this is indicative of Blackhole failure, and it is a
serious JMH issue.

  Scores are nanoseconds per Blackhole call.
  Trying [1, 4, 8] consecutive Blackhole calls.

                       1             4             8
  boolean:   4.12 ± 0.01   3.60 ± 0.01   3.48 ± 0.02
     byte:   3.78 ± 0.01   3.61 ± 0.00   3.58 ± 0.00
    short:   3.77 ± 0.00   3.62 ± 0.00   3.58 ± 0.00
     char:   3.86 ± 0.00   3.63 ± 0.00   3.59 ± 0.00
      int:   3.68 ± 0.00   3.47 ± 0.01   3.43 ± 0.00
    float:   4.40 ± 0.02   4.16 ± 0.00   4.10 ± 0.00
     long:   3.79 ± 0.00   3.53 ± 0.00   3.48 ± 0.00
   double:   4.40 ± 0.01   4.16 ± 0.00   4.10 ± 0.00
   Object:   4.58 ± 0.01   4.79 ± 0.01   4.65 ± 0.01
    Array:   6.58 ± 0.02   6.19 ± 0.01   6.09 ± 0.01

--------- BLACKHOLE MERGING TEST (INLINE HINTS BROKEN)

This test verifies that calling the Blackhole.consume with the same result is not susceptible for
merging. We expect the similar performance across all data types, and the number of consecutive
calls. If there are significant differences, this is indicative of Blackhole failure, and it is a
serious JMH issue.

This particular test mode forces the inline of Blackhole methods, and so demolishes one of the
layers in defence in depth. If this layer is broken, Blackhole should also survive. If it isn't,
then JMH will have to provide more contingencies.

  Scores are nanoseconds per Blackhole call.
  Trying [1, 4, 8] consecutive Blackhole calls.

                       1             4             8
  boolean:   2.57 ± 0.00   2.57 ± 0.00   2.48 ± 0.00
     byte:   2.83 ± 0.00   2.53 ± 0.00   2.61 ± 0.00
    short:   2.83 ± 0.00   2.53 ± 0.01   2.61 ± 0.00
     char:   2.83 ± 0.00   2.55 ± 0.00   2.57 ± 0.00
      int:   3.09 ± 0.00   2.43 ± 0.00   2.64 ± 0.00
    float:   3.30 ± 0.01   2.83 ± 0.00   2.81 ± 0.00
     long:   2.83 ± 0.01   2.76 ± 0.00   2.58 ± 0.00
   double:   3.31 ± 0.01   2.84 ± 0.00   2.88 ± 0.00
   Object:   3.93 ± 0.19   3.75 ± 0.01   3.95 ± 0.03
    Array:   5.56 ± 0.05   5.56 ± 0.05   5.43 ± 0.01
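
--------- NOTE: CROSS-CHECKING THE TIMING MEASUREMENTS

The System.nanoTime latency and granularity figures in the TIMING MEASUREMENTS TEST at the top of
this report can be roughly cross-checked outside of JMH. The sketch below is a plain-Java
approximation, not JMH's own implementation: the class name NanoTimeProbe, both method names, and
the call counts are hypothetical choices made for illustration. It takes latency as the average
cost of one call, and granularity as the average smallest step between two distinct readings.
Without JMH's forking and warmup controls the numbers will be noisier than the report's.

```java
public class NanoTimeProbe {

    // Written to after each run so the JIT cannot discard the timed calls.
    static volatile long sink;

    // Average cost of a single System.nanoTime() call, in nanoseconds.
    public static double latencyNs(int calls) {
        long acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            acc += System.nanoTime();
        }
        long end = System.nanoTime();
        sink = acc;
        return (double) (end - start) / calls;
    }

    // Average smallest observable step between two distinct readings, in nanoseconds.
    public static double granularityNs(int steps) {
        long total = 0;
        for (int i = 0; i < steps; i++) {
            long first = System.nanoTime();
            long next;
            do {
                next = System.nanoTime();   // spin until the clock value changes
            } while (next == first);
            total += next - first;
        }
        return (double) total / steps;
    }

    public static void main(String[] args) {
        // Warm up first so the timed runs execute compiled code.
        latencyNs(1_000_000);
        granularityNs(10_000);

        System.out.printf("System.nanoTime latency:     %8.2f ns%n", latencyNs(10_000_000));
        System.out.printf("System.nanoTime granularity: %8.2f ns%n", granularityNs(100_000));
    }
}
```

A granularity far above the latency, as in this report (293.11 ns against 11.63 ns), suggests that
individual operations shorter than the granularity cannot be timed reliably one by one, which is
why the report says the limitation mostly affects SampleTime and SingleShot modes.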