Wed 12 June 2024
Lately, I have been doing some work with the SIMD APIs from Java's Project Panama.
Not entirely surprisingly, I did not observe any performance improvement over scalar operations on a few artificial benchmarks.
Compilers can be very efficient with loop optimization (and autovectorization), and benchmarking hand-rolled SIMD methods is tricky. The best way to understand why things perform in a certain way is to look at the generated assembly code.
I've done this before, but only with C and C++. Generating assembly for a binary is just a matter of enabling a switch at compile time (or just using godbolt).
On the JVM, things are a bit trickier. Out of the box, the JDK can show the bytecode for a given class and method (for example with javap -c). But that's not very useful, since the optimizations I am interested in observing happen at a lower level.
OpenJDK bundles a plugin disassembler, hsdis, that when used in conjunction with the PrintAssembly option will disassemble and print the output of HotSpot's JIT. hsdis ships with OpenJDK's source but needs to be manually built and installed.
Instructions on how to do so are provided in https://github.com/openjdk/jdk/blob/master/src/utils/hsdis/README.md.
Jorn Vernee has an excellent article on building hsdis with recent JDKs using an ad hoc cmake script.
These days I manage all my development environments with Nix. While the nixpkgs repository does not provide a binary for hsdis, building it from scratch is pretty straightforward.
I wrote a small overlay for jdk22 that enables an LLVM backend option at compile time and appends the required make incantations to the package derivation build phase:
final: prev:
{
  jdk22 = prev.jdk22.overrideAttrs (old: {
    # LLVM provides the disassembler backend for hsdis.
    buildInputs = old.buildInputs ++ [ final.llvm ];
    configureFlags = old.configureFlags ++ [
      "--with-hsdis=llvm"
      "--with-llvm=${final.llvm.dev}"
    ];
    # Keep the original build phase (if any), then build and install hsdis.
    buildPhase = ''
      ${old.buildPhase or ""}
      make build-hsdis
      make install-hsdis
    '';
  });
}
This overlay is available as a flake at https://github.com/gmodena/hsdis-jdk22.
Caveat: as of 2024-06-12 only x86_64-linux targets are supported.
To use it in a project, it can be imported like this:
# flake.nix
{
  inputs.nixpkgs.url = "nixpkgs/nixpkgs-unstable";
  inputs.hsdis-jdk22.url = "github:gmodena/hsdis-jdk22";

  outputs = inputs:
    let
      system = "x86_64-linux";
      pkgs = inputs.nixpkgs.legacyPackages.${system};
      hsdis-jdk = inputs.hsdis-jdk22.packages.${system}.default;
    in
    {
      devShells.${system}.default = pkgs.mkShell {
        name = "java-shell";
        buildInputs = [ hsdis-jdk ];
        shellHook = ''
          export JAVA_HOME=${hsdis-jdk}
          export PATH="${hsdis-jdk}/bin:$PATH"
        '';
      };
    };
}
nix develop will drop us into a Java 22 development environment with hsdis available to the JDK.
We can now compile, run and disassemble some code with:
$ javac Main.java
$ java -Xbatch '-XX:-TieredCompilation' '-XX:CompileCommand=dontinline,Main::add*' '-XX:CompileCommand=PrintAssembly,Main::add*' Main
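The Main.java compiled above is not listed in this post. As a minimal sketch of what it could look like, assuming a static add(int, int) method (which matches the '(II)I' signature reported by the disassembler) and a hypothetical run helper that calls it in a loop so that C2 has a reason to compile it:

```java
// Main.java (hypothetical reconstruction; the original source is not shown in this post)
public class Main {

    // The method we want to disassemble. It is matched by the
    // CompileCommand pattern Main::add*.
    static int add(int a, int b) {
        return a + b;
    }

    // Hypothetical helper: calls add() repeatedly so that it becomes hot
    // enough for the JIT to compile it, and accumulates the results so the
    // calls are not trivially dead code.
    static long run(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += add(i, 1);
        }
        return sum;
    }

    public static void main(String[] args) {
        // The iteration count is an arbitrary choice for this sketch.
        System.out.println(run(100_000));
    }
}
```

The loop count is arbitrary. With tiered compilation disabled, the method has to be invoked often enough to cross C2's compile threshold, and 100,000 calls should be plenty with default settings. The dontinline directive keeps add as a standalone compiled method instead of letting it be folded into the loop in run.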
If all went well, output like the following should be printed to the terminal:
CompileCommand: dontinline Main.add* bool dontinline = true
CompileCommand: PrintAssembly Main.add* bool PrintAssembly = true
============================= C2-compiled nmethod ==============================
----------------------------------- Assembly -----------------------------------
Compiled method (c2) 201 1 Main::add (4 bytes)
total in heap [0x00007ffff0688f90,0x00007ffff06891a0] = 528
relocation [0x00007ffff06890e0,0x00007ffff06890f0] = 16
main code [0x00007ffff0689100,0x00007ffff0689150] = 80
stub code [0x00007ffff0689150,0x00007ffff0689168] = 24
oops [0x00007ffff0689168,0x00007ffff0689170] = 8
scopes data [0x00007ffff0689170,0x00007ffff0689178] = 8
scopes pcs [0x00007ffff0689178,0x00007ffff0689198] = 32
dependencies [0x00007ffff0689198,0x00007ffff06891a0] = 8
[Disassembly]
--------------------------------------------------------------------------------
[Constant Pool (empty)]
--------------------------------------------------------------------------------
[Verified Entry Point]
# {method} {0x00007fff764002d8} 'add' '(II)I' in 'Main'
# parm0: rsi = int
# parm1: rdx = int
# [sp+0x20] (sp of caller)
0x00007ffff0689100: subq $0x18, %rsp
0x00007ffff0689107: movq %rbp, 0x10(%rsp)
0x00007ffff068910c: cmpl $0x0, 0x20(%r15)
0x00007ffff0689114: jne 0x2c
0x00007ffff068911a: leal (%rsi,%rdx), %eax
0x00007ffff068911d: addq $0x10, %rsp
0x00007ffff0689121: popq %rbp
0x00007ffff0689122: cmpq 0x458(%r15), %rsp ; {poll_return}
0x00007ffff0689129: ja 0x1
0x00007ffff068912f: retq
0x00007ffff0689130: movabsq $0x7ffff0689122, %r10; {internal_word}
0x00007ffff068913a: movq %r10, 0x470(%r15)
0x00007ffff0689141: jmp -0x34146 ; {runtime_call SafepointBlob}
0x00007ffff0689146: callq -0x54cab ; {runtime_call StubRoutines (final stubs)}
0x00007ffff068914b: jmp -0x36
[Exception Handler]
0x00007ffff0689150: jmp -0x2a55 ; {no_reloc}
[Deopt Handler Code]
0x00007ffff0689155: callq 0x0
0x00007ffff068915a: subq $0x5, (%rsp)
0x00007ffff068915f: jmp -0x34ec4 ; {runtime_call DeoptimizationBlob}
0x00007ffff0689164: hlt
0x00007ffff0689165: hlt
0x00007ffff0689166: hlt
0x00007ffff0689167: hlt
--------------------------------------------------------------------------------
[/Disassembly]