Grind The Coffee Beans

2021-04-21

Ahead Class VM is one of my projects that I thought could become something.

My idea was to translate compiled JVM binaries, i.e. class files, into C++ code, which could then be compiled with a C++ compiler to produce static binaries. This would be a kind of ahead-of-time (AOT) compilation.

Of course I knew this would need a lot of work and some clever tricks. However, I wanted to see how far I could get.

Documentation

The first thing I remember is an excessive amount of documentation, and yet still not enough of it. Unfortunately I needed to figure out many things by just trying them out. To be honest, I’m sure there is documentation about every subject I encountered, but finding it can be hard.

I ended up googling a lot, doing a lot of trial and error, implementing things wrong at first by following my intuition, and then fixing them later on. If nothing else, this helped me understand a lot about the JVM internals and how it works.

Background research

Before starting all this, I wanted to check whether there were existing components I could use. I checked many things, but unfortunately needed to drop them for various reasons. Most of the time the problem was the license, the code quality or the ease of use.

Thus for my own (in)sanity, I decided to go with the NIH syndrome and do it myself. That said, if this were a real commercial product, I would still go through the available options and consider starting from there. If you just want to learn, doing it yourself is most probably the best way to go…

Class loader

First I had my Java app, compiled it with javac and got a class file. While investigating the binary, javap ended up being a really valuable tool. I used it a lot…

Implementing my class loader ended up being a big and frustrating job: implementing all the sections of the class file one by one, and trying to figure out what they mean. Of course the simple things were trivial, but some of the most complex things I couldn’t get implemented. There was always some pitfall to fall into.

But I think I finally got it. It might be buggy and ugly, but most of the time it works - at least on some level.
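
To give an idea of what "one section at a time" means, here is a minimal sketch of reading just the fixed header of a class file: the magic number, the minor and major version, and the constant pool count. It is not the loader from Ahead Class VM; the class name ClassHeaderSketch and its fields are made up for illustration, and everything after the header (the constant pool itself, fields, methods, attributes) is left out.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration, not taken from the Ahead Class VM sources.
final class ClassHeaderSketch {
    // Every class file starts with these four bytes.
    static final int MAGIC = 0xCAFEBABE;

    final int minor;
    final int major;
    final int constantPoolCount;

    private ClassHeaderSketch(int minor, int major, int constantPoolCount) {
        this.minor = minor;
        this.major = major;
        this.constantPoolCount = constantPoolCount;
    }

    // Reads u4 magic, u2 minor_version, u2 major_version, u2 constant_pool_count.
    static ClassHeaderSketch parse(byte[] classBytes) {
        // ByteBuffer is big-endian by default, which matches the class file format.
        ByteBuffer buf = ByteBuffer.wrap(classBytes);
        if (buf.remaining() < 10 || buf.getInt() != MAGIC) {
            throw new IllegalArgumentException("not a class file");
        }
        int minor = buf.getShort() & 0xFFFF;   // mask to read as unsigned 16-bit
        int major = buf.getShort() & 0xFFFF;
        int cpCount = buf.getShort() & 0xFFFF;
        return new ClassHeaderSketch(minor, major, cpCount);
    }
}
```

The bytes of a real class file can be fed in with java.nio.file.Files.readAllBytes, and the values compared against what javap -v prints for the same file.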

Interpreter and translator

The first thing to do with your code is to interpret it. And oh, that’s a really tempting road. You’ll get easy wins and results. But that was not what I wanted. I think I ditched that approach after I got “Hello World” working.

Thus I decided to focus on producing C++ code. By targeting C++ I would benefit from a few things:

  • C++ supports classes, an obvious benefit since Java is built on classes, so no need to reinvent the wheel
  • The STL provides certain functionality, so no need to focus on the basics
  • No need to focus on the code generator that much
  • Can compile easily for almost any target

Otherwise my design was a simple compiler that outputs C++ code, plus a support library which would be linked with the produced code.

Most of the JVM bytecode was trivial, and it was easy to produce a corresponding C++ representation. However the classes, and particularly invokevirtual, were complex. I think it’s still not working correctly, and does just the bare minimum.
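
To show what I mean by "trivial", here is a rough sketch of translating a handful of integer opcodes into C++ statements. This is not the actual translator: the class BytecodeToCppSketch, the stack slot names s0, s1, … and the local variable names local0, local1, … are assumptions made up for this example. It only handles straight-line code where the operand stack depth is known at every instruction, and it assumes the C++ declarations of the slots are emitted elsewhere.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch for illustration, not the Ahead Class VM translator.
final class BytecodeToCppSketch {

    // Turns a sequence of single-byte opcodes into C++ statements.
    // The JVM operand stack is mapped to C++ variables s0, s1, ...
    static List<String> translate(int[] code) {
        List<String> out = new ArrayList<>();
        int sp = 0; // operand stack depth, tracked at translation time
        for (int op : code) {
            if (op >= 0x03 && op <= 0x08) {          // iconst_0 .. iconst_5
                out.add("s" + sp + " = " + (op - 0x03) + ";");
                sp++;
            } else if (op >= 0x1a && op <= 0x1d) {   // iload_0 .. iload_3
                out.add("s" + sp + " = local" + (op - 0x1a) + ";");
                sp++;
            } else if (op >= 0x3b && op <= 0x3e) {   // istore_0 .. istore_3
                sp--;
                out.add("local" + (op - 0x3b) + " = s" + sp + ";");
            } else if (op == 0x60) {                 // iadd: pop two, push the sum
                sp -= 2;
                out.add("s" + sp + " = s" + sp + " + s" + (sp + 1) + ";");
                sp++;
            } else if (op == 0xac) {                 // ireturn
                sp--;
                out.add("return s" + sp + ";");
            } else {
                throw new IllegalArgumentException("unsupported opcode 0x" + Integer.toHexString(op));
            }
        }
        return out;
    }
}
```

For the sequence iconst_1, iconst_2, iadd, ireturn this yields the four statements s0 = 1; s1 = 2; s0 = s0 + s1; return s0;. An instruction like invokevirtual does not fit this pattern, since the method to call depends on the runtime class of the object.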

Dynamic loading

One issue I encountered was dynamic class loading. In the normal case you start with the class file containing the main method. For my simple hello, that’s all I have. In a more complex example, one may have multiple classes and .class files. Take for example MultiHello.java:

package multiclass;

class MultiHello {
    private String msg;

    MultiHello(String m) {
        this.msg = m;
    }

    void setMessage(String m) {
        this.msg = m;
    }

    void greet() {
        System.out.println("Hello " + msg);
    }
}

And MultiClass.java:

package multiclass;

class MultiClass {

    public static void main(String[] args) {
        MultiHello hello = new MultiHello("world");
        hello.greet();
        hello.setMessage("all");
        hello.greet();
    }
}

When compiled and run with the JVM it works as expected. We can just call: java multiclass.MultiClass and it works. The reason is that when the JVM is requested to create a new instance of the MultiHello class, and the class has not been loaded previously, it automatically searches the classpath for MultiHello.class and loads it dynamically. By default the JVM finds its own system classes, and the classpath covers the current working directory. Thus the JVM is able to load what it needs and make the app work. Worst of all, where classes are loaded from can be configured at run time.

However my compiler does not have dynamic loading, so it has to figure out all the other classes at compile time. That ended up being a hard thing to do. It is doable of course, but it ended up being way too complex to handle, and I ran into some limitations of my implementation.

In the best case it would scan the class files, resolve dependencies, and generate a separate output file for each of them. For now I was just lazy and printed everything to stdout. Thus the idea of producing multiple output files was doomed without bigger changes to the code base. It ended up being a mess trying to solve it so that everything would be output to the same file.
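
The dependency scan itself can be sketched as a simple worklist over class names. This is only an illustration, not what my compiler does: ClassClosureSketch and its reachable method are invented names, and the map of references is a stand-in for what a real tool would read out of each class file (for example the class entries in its constant pool).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch for illustration, not part of Ahead Class VM.
final class ClassClosureSketch {

    // Returns every class reachable from mainClass, in discovery order.
    // 'references' maps a class name to the classes it mentions.
    static Set<String> reachable(String mainClass, Map<String, List<String>> references) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.add(mainClass);
        while (!work.isEmpty()) {
            String cls = work.poll();
            if (!seen.add(cls)) {
                continue; // already handled, this also copes with cyclic references
            }
            for (String ref : references.getOrDefault(cls, List.of())) {
                if (!seen.contains(ref)) {
                    work.add(ref);
                }
            }
        }
        return seen;
    }
}
```

Starting from multiclass/MultiClass, with a map saying that it mentions multiclass/MultiHello, the result contains both classes, and each of them would then get its own generated C++ output file.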

At this point I think I lost my interest for the project.

Afterword

For now this has been abandoned for a few years. I wanted to look back on it, since I think it still has potential. It would just need refactoring and a bit of new design.

Back then GraalVM was coming, and it promised proper AOT as well. I think that is quite a remarkable project. However it’s a big behemoth, and getting a somewhat fresher and simpler view would be nice.

If nothing else, at least this made me understand better how the JVM works. At the same time it helped me improve my Java programming skills. I believe that understanding what is happening under the hood makes you better at whatever you’re doing. So it’s again time to celebrate the failure!