nowave.it

Building the OpenJDK/HotSpot disassembler with Nix

Wed 12 June 2024

Introduction

Lately, I have been doing some work with Java's Panama Project SIMD APIs.

Not entirely surprisingly, I did not observe any performance improvement versus scalar operations on a few artificial benchmarks.

Compilers can be very efficient with loop optimization (and autovectorization), and benchmarking hand-rolled SIMD methods is tricky. The best way to understand why things perform in a certain way is to look at the generated assembly code.
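As an illustration, this is the kind of scalar loop I have in mind. The class and method names are made up for this sketch and are not taken from my benchmarks. HotSpot's C2 compiler may auto-vectorize a plain counted loop like this one, which is one reason a hand-written SIMD version does not necessarily come out ahead:

```java
// ScalarAdd.java: hypothetical example, names chosen for illustration only.
class ScalarAdd {
    // Element-wise sum written as a plain counted loop over int arrays.
    // Whether the JIT turns this into vector instructions depends on the
    // JVM version, flags and CPU; the assembly is the way to find out.
    static void addArrays(int[] a, int[] b, int[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        int n = 1024;
        int[] a = new int[n];
        int[] b = new int[n];
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }
        // Call the method repeatedly so that it becomes hot enough
        // for the JIT compiler to consider it.
        for (int iter = 0; iter < 20_000; iter++) {
            addArrays(a, b, out);
        }
        // Use the result so the work is not trivially dead.
        System.out.println(out[n - 1]);
    }
}
```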

I've done this before, but only with C and C++. Generating assembly for a binary is just a matter of enabling a switch at compile time (or just using godbolt).

On the JVM, things are a bit trickier. Out of the box, the JDK can print the bytecode for a given class and method (with javap). But that's not very useful, since the optimizations I am interested in observing happen at a lower level, in the machine code emitted by the JIT compiler.

Disassembling Java

OpenJDK bundles a plugin disassembler, hsdis, that, when used in conjunction with the PrintAssembly option, will disassemble and print the output of HotSpot's JIT. hsdis ships with OpenJDK's source but needs to be manually built and installed. Instructions on how to do so are provided in https://github.com/openjdk/jdk/blob/master/src/utils/hsdis/README.md.

Jorn Vernee has an excellent article on building hsdis with recent JDKs using an ad hoc cmake script.

I use nixos btw

These days I manage all my development environments with Nix. While the nixpkgs repository does not provide a binary for hsdis, building it from scratch is pretty straightforward.

I wrote a small overlay for jdk22 that enables an LLVM backend option at compile time and appends the required make incantations to the package derivation build phase:

final: prev:
{
  jdk22 = prev.jdk22.overrideAttrs (old: {
    # Make LLVM available to the build for the hsdis LLVM backend.
    buildInputs = old.buildInputs ++ [ final.llvm ];
    configureFlags = old.configureFlags ++ [
      "--with-hsdis=llvm"
      "--with-llvm=${final.llvm.dev}"
    ];
    # Keep the package's own build phase (if it defines one),
    # then build and install hsdis.
    buildPhase = ''
      ${old.buildPhase or ""}
      make build-hsdis
      make install-hsdis
    '';
  });
}

This overlay is available as a flake at https://github.com/gmodena/hsdis-jdk22. Caveat: as of 2024-06-12 only x86_64-linux targets are supported.

To use it in a project, import it as a flake input:

# flake.nix
{
  inputs.nixpkgs.url = "nixpkgs/nixpkgs-unstable";
  inputs.hsdis-jdk22.url = "github:gmodena/hsdis-jdk22";

  outputs = inputs:
    let
      system = "x86_64-linux";
      pkgs = inputs.nixpkgs.legacyPackages.${system};
      hsdis-jdk = inputs.hsdis-jdk22.packages.${system}.default;
    in
    {
      devShell.${system} = pkgs.mkShell rec {
        name = "java-shell";
        buildInputs = [ hsdis-jdk ];

        shellHook = ''
          export JAVA_HOME=${hsdis-jdk}
          PATH="${hsdis-jdk}/bin:$PATH"
        '';
      };
    };
}

nix develop will drop us into a Java 22 development environment with hsdis available to the JDK.
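The commands below expect a Main.java in the current directory. Its contents are not fixed. As a minimal sketch (an assumption for illustration), any class Main with a static method whose name starts with add will match the Main::add* pattern. The version here takes two ints and returns an int, and calls the method often enough for the JIT compiler to pick it up:

```java
// Main.java: hypothetical stand-in, any Main class with an add* method works.
class Main {
    // Static method with the JVM descriptor (II)I: two ints in, one int out.
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        int sink = 0;
        // Invoke add many times so that it crosses the JIT compilation
        // threshold. Integer overflow simply wraps around, which is fine here.
        for (int i = 0; i < 1_000_000; i++) {
            sink = add(sink, i);
        }
        // Print the accumulated value so the calls are not dead code.
        System.out.println(sink);
    }
}
```

In the java invocation, -Xbatch makes compilation happen in the foreground, and the dontinline command keeps add from being inlined into main, so that it gets compiled (and printed) as a method of its own.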

We can now compile, run and disassemble some code with:

$ javac Main.java
$ java -Xbatch '-XX:-TieredCompilation' '-XX:CompileCommand=dontinline,Main::add*' '-XX:CompileCommand=PrintAssembly,Main::add*' Main

If all went well, output like the following should be printed to the terminal:

CompileCommand: dontinline Main.add* bool dontinline = true
CompileCommand: PrintAssembly Main.add* bool PrintAssembly = true

============================= C2-compiled nmethod ==============================
----------------------------------- Assembly -----------------------------------

Compiled method (c2) 201    1             Main::add (4 bytes)
 total in heap  [0x00007ffff0688f90,0x00007ffff06891a0] = 528
 relocation     [0x00007ffff06890e0,0x00007ffff06890f0] = 16
 main code      [0x00007ffff0689100,0x00007ffff0689150] = 80
 stub code      [0x00007ffff0689150,0x00007ffff0689168] = 24
 oops           [0x00007ffff0689168,0x00007ffff0689170] = 8
 scopes data    [0x00007ffff0689170,0x00007ffff0689178] = 8
 scopes pcs     [0x00007ffff0689178,0x00007ffff0689198] = 32
 dependencies   [0x00007ffff0689198,0x00007ffff06891a0] = 8

[Disassembly]
--------------------------------------------------------------------------------
[Constant Pool (empty)]

--------------------------------------------------------------------------------

[Verified Entry Point]
  # {method} {0x00007fff764002d8} 'add' '(II)I' in 'Main'
  # parm0:    rsi       = int
  # parm1:    rdx       = int
  #           [sp+0x20]  (sp of caller)
  0x00007ffff0689100:       subq    $0x18, %rsp
  0x00007ffff0689107:       movq    %rbp, 0x10(%rsp)
  0x00007ffff068910c:       cmpl    $0x0, 0x20(%r15)
  0x00007ffff0689114:       jne 0x2c
  0x00007ffff068911a:       leal    (%rsi,%rdx), %eax
  0x00007ffff068911d:       addq    $0x10, %rsp
  0x00007ffff0689121:       popq    %rbp
  0x00007ffff0689122:       cmpq    0x458(%r15), %rsp   ;   {poll_return}
  0x00007ffff0689129:       ja  0x1
  0x00007ffff068912f:       retq
  0x00007ffff0689130:       movabsq $0x7ffff0689122, %r10;   {internal_word}
  0x00007ffff068913a:       movq    %r10, 0x470(%r15)
  0x00007ffff0689141:       jmp -0x34146            ;   {runtime_call SafepointBlob}
  0x00007ffff0689146:       callq   -0x54cab            ;   {runtime_call StubRoutines (final stubs)}
  0x00007ffff068914b:       jmp -0x36
[Exception Handler]
  0x00007ffff0689150:       jmp -0x2a55             ;   {no_reloc}
[Deopt Handler Code]
  0x00007ffff0689155:       callq   0x0
  0x00007ffff068915a:       subq    $0x5, (%rsp)
  0x00007ffff068915f:       jmp -0x34ec4            ;   {runtime_call DeoptimizationBlob}
  0x00007ffff0689164:       hlt
  0x00007ffff0689165:       hlt
  0x00007ffff0689166:       hlt
  0x00007ffff0689167:       hlt
--------------------------------------------------------------------------------
[/Disassembly]

References