Devoxx - Day One

Devoxx - Day One

This year I’m returning to Devoxx. I’m planning to write some notes on interesting sessions or other content. Also, I’ll be delivering my talk on Transport Layer Security tomorrow. But today was a day of catching up with old friends and attending a talk or two.

One talk really caught my attention: “Implementing a simple JVM in Rust”.

Implementing a simple JVM in Rust

Ben Evans introduced his pet project, a Rust-implementation of the Java Virtual Machine dubbed Ocelotter. Although Ben works at New Relic, it’s his personal project, so he didn’t need to add a Safe Harbour statement; I like that 🙂.

The JVM actually consists of many components. Most of them we don’t see that often when developing Java programs:

  • (AOT) Compiler - javac, the Java Compiler that compiles Java source code into class files.
  • Class Loader - loads a class file and keeps it in the method cache.
  • Interpreter - interprets the methods that live in the cache
  • JIT Compiler - emits optimised code for hot methods. Those are pretty hard to write, so Ben decided to not have a JIT compiler in his project.
  • Garbage Collector - out of scope for this talk, as this is also pretty hard. Since the JDK itself also comes with a GC that doesn’t collect anything (Epsilon GC), Ben figured this would be fine.

Memory

The JVM interpreter is stack based. This means it has an evaluation area for methods to calculate temporary values. This area is private to the method that currently executes, and it’s implemented as a stack - hence the name. Contrary to what many people say, the JVM bytecode is not machine code for an imaginary CPU because of this. Real CPUs don’t have this temporary evaluation area; instead they have a register for that. The JVM interpreter doesn’t have such a register.

In fact, JVM bytecode is closer to source code than it is to machine code. Many of the Java constructs (e.g. loops) have gone away in the bytecode; only if and jump still exist. Java bytecode stays pretty close to the Java sourcecode from which it originated. This is different for languages like Scala, where the generated bytecode is more distant from the source code.

Rust

After this, Ben continued with a short intro of Rust. It has a few features that make it suitable for low-level stuff like bytecode interpretation, like unsigned types. Also, only programs that can be proven to be memory-safe will be compiled. That sounds to me like a very powerful feature for any programming language!

Interpreter

In essence, the simplest JVM interpreter is a switch statement inside a while loop, with a byte[] (the bytecode) as input. For each instruction it encounters, it looks up the opcode, and then evaluates that against the temporary evaluation stack.

And then, finally, it’s time for some Rust code! For a Rust n00b like me, the first snippets are pretty well readable. Bens approach has been test-driven, and the first test is pretty trivial:

#[test]
fn bc_adds_to_two() {
    let first_test = vec![
        opcode::Opcode::ICONST_1,
        opcode::Opcode::ICONST_1,
        opcode::Opcode::IADD,
        opcode::Opcode::IRETURN,
    ];
    let ret = match execute_simple_bytecode(&first_test) {
        JvmValue::Int { val: i } => i,
        _ => {
            println!("Unexpected, non-integer value encountered");
            0
        }
    };
    assert_eq!(2, ret);
}

This piece of codes creates some bytecode that declares four instructions:

  1. declare the constant integer value 1 and push it on the stack;
  2. declare the constant integer value 1 and push it on the stack;
  3. add those integer values that live on the stack; push the result back on the stack;
  4. return the value on top of the stack.

That bytecode is then fed into a façade method for running bytecode and evaluating the outcome. The outcome is an optional value; if it’s not a JvmValue it prints an error and uses the value 0. Finally, the test asserts that the returned value is equal to 2.

And the implementation of the façade method seems trivial as well:

fn execute_simple_bytecode(buf: &Vec<u8>) -> JvmValue {
    let mut repo = init_repo();
    let mut lvt = InterpLocalVars::of(10); // FIXME
    let opt_ret = exec_bytecode_method(&mut repo, "DUMMY".to_string(), &buf, &mut lvt);
    match opt_ret {
        Some(value) => value,
        None => JvmValue::ObjRef {
            val: 0, // object::OtObj::get_null(),
        },
    }
}

This piece of code creates the temporary evaluation area, defines a dummy method with the supplied bytecode and runs it. If the outcome is an empty optional, it translate that to a Java null value. If it is a non-empty optional, it will wrap it in a Some.

This is a façade to another method, and the actual implementation becomes more complex. The actual implementation loops through the bytes it got. It performs a lookup from the bytecode value to the corresponding opcode. The opcode is then evaluated against the data temporary evaluation area. The implementation of such an opcode is surprisingly easy - at least for the “add” operation:

pub fn iadd(&mut self) -> () {
    // For a runtime checking interpreter - type checks would go here...
    let i1 = match self.pop() {
        JvmValue::Int { val: i } => i,
        _ => panic!("Unexpected, non-integer value encountered"),
    };
    let i2 = match self.pop() {
        JvmValue::Int { val: i } => i,
        _ => panic!("Unexpected, non-integer value encountered"),
    };

    self.push(JvmValue::Int { val: i1 + i2 });
}

It reads two values from the stack, assuming they’re both integer values. If so, it adds the two values, wraps that to a Int value and puts that back on the stack. Since Int is one of the options for JvmValue, the Rust compiler can prove the correctness of this program.

Ben even implemented native method calling, so invocations of System.currentTimeInMillis() work as well. All in all an impressive effort, but Ben explained it in a very entertaining way. It actually made me curious if I could do a similar thing, but I’m afraid that’s going to take way to much time… 😉