Internal JVM architecture and Bytecode

Adamansky Anton

Simple application 2 + 2 = 4

Sum.java
public class Sum {
    public static void main(String[] args) {
        int a = 2;
        int b = 2;
        int c = a + b;
    }
}

Java code lifecycle

  1. javac Sum.java
  2. java Sum

Bytecode

The set of instructions executed by a virtual machine. The JVM is a stack-based machine, not a register-based one.

Why should you learn JVM bytecode?

Stack machine

Ask VM to compute 2 + 2

LIFO — Last-In-First-Out

  1. PUSH 2 operand
  2. PUSH 2 operand
  3. PUSH + operation

Now VM will do
  1. POP + operation
  2. POP 2 operand
  3. POP 2 operand
  4. Perform computation
  5. PUSH result on top of stack
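
The PUSH/POP steps above can be sketched in a few lines of Java. This is only an illustration of the conceptual model from these slides (operands and the operation all go through one LIFO stack). The class name StackMachineDemo and the method evaluate are made up for the example, and a real JVM encodes the operation as an instruction rather than pushing it on the stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StackMachineDemo {

    // Pops the operation and its two operands, computes, and pushes the result back.
    public static int evaluate(Deque<Object> stack) {
        String op = (String) stack.pop();    // POP + operation
        int right = (Integer) stack.pop();   // POP operand
        int left = (Integer) stack.pop();    // POP operand
        int result;
        switch (op) {                        // Perform computation
            case "+": result = left + right; break;
            case "-": result = left - right; break;
            case "*": result = left * right; break;
            default: throw new IllegalArgumentException("Unknown operation: " + op);
        }
        stack.push(result);                  // PUSH result on top of stack
        return result;
    }

    public static void main(String[] args) {
        Deque<Object> stack = new ArrayDeque<>();
        stack.push(2);    // PUSH 2 operand
        stack.push(2);    // PUSH 2 operand
        stack.push("+");  // PUSH + operation
        System.out.println("2 + 2 = " + evaluate(stack)); // prints "2 + 2 = 4"
    }
}
```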

Stack-based VMs

  1. JVM
  2. .Net CIL (Common Intermediate Language)
  3. Perl
  4. Adobe's PostScript
  5. ...

JVM Bytecode

Show me bytecode!

Hmm....

http://docs.oracle.com/javase/specs/jvms/se7/html/index.html

Java class file structure

  1. Magic 0xCAFEBABE (4 bytes)
  2. Minor + major versions of the class file (2 + 2 bytes)
  3. Constant pool
  4. Class access flags (2 bytes)
  5. Class name ref (2 bytes)
  6. Superclass name ref (2 bytes)
  7. Interfaces
  8. Fields
  9. Methods
  10. Attributes (e.g. SourceFile, annotations)

The ClassFile Structure

ClassFile {
    u4             magic;
    u2             minor_version;
    u2             major_version;
    u2             constant_pool_count;
    cp_info        constant_pool[constant_pool_count-1];
    u2             access_flags;
    u2             this_class;
    u2             super_class;
    u2             interfaces_count;
    u2             interfaces[interfaces_count];
    u2             fields_count;
    field_info     fields[fields_count];
    u2             methods_count;
    method_info    methods[methods_count];
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}

What are the minor and major class format versions of the following class file?

00000000  ca fe ba be 00 00 00 34  00 1d 0a 00 06 00 0f 09
00000010  00 10 00 11 08 00 12 0a  00 13 00 14 07 00 15 07
00000020  00 16 01 00 06 3c 69 6e  69 74 3e 01 00 03 28 29
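
One way to check an answer is to read the header fields programmatically. The sketch below assumes only the ClassFile layout shown earlier (u4 magic, u2 minor_version, u2 major_version, all big-endian). ClassHeaderDemo and readVersion are hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class ClassHeaderDemo {

    // Returns {minor_version, major_version} read from the start of a class file.
    public static int[] readVersion(byte[] classBytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            int magic = in.readInt();               // u4 magic
            if (magic != 0xCAFEBABE) {
                throw new IOException("Not a class file");
            }
            int minor = in.readUnsignedShort();     // u2 minor_version
            int major = in.readUnsignedShort();     // u2 major_version
            return new int[] {minor, major};
        }
    }

    public static void main(String[] args) throws IOException {
        // First eight bytes of the hexdump above.
        byte[] header = {(byte) 0xca, (byte) 0xfe, (byte) 0xba, (byte) 0xbe,
                         0x00, 0x00, 0x00, 0x34};
        int[] version = readVersion(header);
        System.out.println("minor=" + version[0] + " major=" + version[1]);
    }
}
```

DataInputStream reads multi-byte values in big-endian order, which matches the class file format, so no manual byte swapping is needed.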

Java class format MAJOR versions

javap -c Sum.class

javap -c Sum

public Sum(): // Sum constructor
    aload_0            // PUSH this on top of the stack
    invokespecial #1   // POP this and invoke method #1 on it

Constant pool:
   #1 = Methodref          #3.#12         //  java/lang/Object."<init>":()V
   #2 = Class              #13            //  Sum
   #3 = Class              #14            //  java/lang/Object
   #4 = Utf8               <init>
   #5 = Utf8               ()V
   #6 = Utf8               Code
   #7 = Utf8               LineNumberTable
   #8 = Utf8               main
   #9 = Utf8               ([Ljava/lang/String;)V
  #10 = Utf8               SourceFile
  #11 = Utf8               Sum.java
  #12 = NameAndType        #4:#5          //  "<init>":()V
  #13 = Utf8               Sum
  #14 = Utf8               java/lang/Object

javap -c Sum

public static void main(String[] args) {
    iconst_2    // PUSH const 2
    istore_1    // POP 2 → store into local var a (index 1)
    iconst_2    // PUSH const 2
    istore_2    // POP 2 → store into local var b (index 2)
    iload_1     // PUSH value of local var a
    iload_2     // PUSH value of local var b
    iadd        // POP b, POP a → PUSH a + b
    istore_3    // POP a + b → store into local var c (index 3)
    return
}

public static void main(String[] args) {
    int a = 2;
    int b = 2;
    int c = a + b;
}
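
To make the interplay between the operand stack and the local variable table concrete, here is a toy interpreter for exactly the nine instructions of the main bytecode listed above. It is a sketch, not how a real JVM is implemented: MiniInterpreter and run are invented names, only the opcodes used by Sum.main are handled, and the opcode byte values are taken from the JVM specification's instruction set chapter.

```java
public class MiniInterpreter {

    // Opcode byte values as defined by the JVM specification.
    public static final int ICONST_2 = 0x05;
    public static final int ILOAD_1  = 0x1b;
    public static final int ILOAD_2  = 0x1c;
    public static final int ISTORE_1 = 0x3c;
    public static final int ISTORE_2 = 0x3d;
    public static final int ISTORE_3 = 0x3e;
    public static final int IADD     = 0x60;
    public static final int RETURN   = 0xb1;

    // Executes the code array and returns the local variable table.
    public static int[] run(int[] code, int maxLocals, int maxStack) {
        int[] locals = new int[maxLocals];
        int[] stack = new int[maxStack];
        int sp = 0; // index of the next free operand stack slot
        for (int pc = 0; pc < code.length; pc++) {
            switch (code[pc]) {
                case ICONST_2: stack[sp++] = 2; break;
                case ILOAD_1:  stack[sp++] = locals[1]; break;
                case ILOAD_2:  stack[sp++] = locals[2]; break;
                case ISTORE_1: locals[1] = stack[--sp]; break;
                case ISTORE_2: locals[2] = stack[--sp]; break;
                case ISTORE_3: locals[3] = stack[--sp]; break;
                case IADD: {
                    int right = stack[--sp];
                    int left = stack[--sp];
                    stack[sp++] = left + right;
                    break;
                }
                case RETURN: return locals;
                default:
                    throw new IllegalStateException("Unsupported opcode: 0x"
                            + Integer.toHexString(code[pc]));
            }
        }
        return locals;
    }

    public static void main(String[] args) {
        int[] code = {ICONST_2, ISTORE_1, ICONST_2, ISTORE_2,
                      ILOAD_1, ILOAD_2, IADD, ISTORE_3, RETURN};
        // Slot 0 holds args; a, b and c live in slots 1..3. The stack never exceeds depth 2.
        int[] locals = run(code, 4, 2);
        System.out.println("c = " + locals[3]); // prints "c = 4"
    }
}
```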

Opcode type prefixes

iload, aload, lload ...

b byte
s short
c char
i int
l long
f float
d double
a object ref

http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html

Bytecode descriptors (Types)

Java type    Bytecode descriptor
int          I
long         J
float        F
double       D
char         C
boolean      Z
String       Ljava/lang/String;
void         V
int[]        [I
int[][]      [[I
...

Bytecode descriptors (Methods)

package foo.bar;
class C {
    int m1()
    void m2(int i)
    String m3(int x, String y)
}

Method                             Bytecode descriptor
int m1()                           foo/bar/C.m1()I
void m2(int i)                     foo/bar/C.m2(I)V
String m3(int x, String y)         foo/bar/C.m3(ILjava/lang/String;)Ljava/lang/String;
long[][] foo(Integer i, char j)    ????
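
Descriptors do not have to be written by hand: the standard java.lang.invoke.MethodType class can produce the parameter and return part of a method descriptor. The sketch below reproduces the descriptors of m1, m2 and m3 from the table. DescriptorDemo and descriptor are names invented for this example, and the owner class and method name (foo/bar/C.m1) are not part of what MethodType returns.

```java
import java.lang.invoke.MethodType;

public class DescriptorDemo {

    // Builds the "(parameters)return" descriptor for the given signature.
    public static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(descriptor(int.class));                             // ()I
        System.out.println(descriptor(void.class, int.class));                 // (I)V
        System.out.println(descriptor(String.class, int.class, String.class)); // (ILjava/lang/String;)Ljava/lang/String;
    }
}
```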

Method invocations

Bytecode manipulation tools

ASM http://asm.ow2.org/

ASM is an all purpose Java bytecode manipulation and analysis framework.

De facto standard bytecode manipulation library.

Bytecode manipulation tools

BiteScript http://github.com/headius/bitescript

BiteScript is a Ruby DSL for generating Java bytecode and classes

JVM bytecode documentation

Bytecode analysis

Analyze the bytecode of the following programs

Program 1:

package nsu.fit.javaperf.lab2;

public class P1 {
    public static void main(String[] args) {
        String s = "";
        for (int i = 0; i < 10000; ++i) {
            s += " " + String.valueOf(i);
        }
        System.out.println(s.length());
    }
}

Bytecode analysis

Program 2:

package nsu.fit.javaperf.lab2;

public class P2 {
    public static void main(String[] args) {
        StringBuilder s = new StringBuilder();
        for (int i = 0; i < 10000; ++i) {
            s.append(" ").append(String.valueOf(i));
        }
        System.out.println(s.length());
    }
}

Bytecode analysis

  1. Compile Lab3.java into Lab3.class
  2. Obtain a hexdump of Lab3.class. You can use http://www.fileformat.info/tool/hexdump.htm

package nsu.fit.javaperf.lab3;

public class Lab3 {

    int coins = 1;

    void muliplyCoins(int ratio) {
        int newcoins = coins * ratio;
        coins = newcoins;
    }
    public static void main(String[] args) {
        Lab3 l2 = new Lab3();
        l2.muliplyCoins(10);
        l2.muliplyCoins(20);
    }
}

Bytecode analysis

  1. Locate the muliplyCoins() bytecode in the hexdump
  2. Explain each octet of the muliplyCoins() bytecode in a table like this:

Octet    Related bytecode instruction
0x3d     istore_2
...      ...

Bytecode generation with ASM

Let's generate a simple Hello World program.

public class HelloWorld {
   public static void main(String[] args) {
        System.out.println("Hello, World!");
   }
}

Bytecode generation with ASM

Construct the ClassWriter instance:

ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_MAXS | ClassWriter.COMPUTE_FRAMES);

COMPUTE_MAXS tells ASM to automatically compute the maximum stack size and the maximum number of local variables of each method.
COMPUTE_FRAMES makes ASM automatically compute the stack map frames of methods from scratch (it implies COMPUTE_MAXS).

Bytecode generation with ASM

To define a class we must invoke the visit() method of ClassWriter:
cw.visit(Opcodes.V1_7, Opcodes.ACC_PUBLIC, "HelloWorld", null,
        "java/lang/Object", null);

Bytecode generation with ASM

Then generate default constructor:
MethodVisitor constructor =
    cw.visitMethod(Opcodes.ACC_PUBLIC, "<init>", "()V", null, null);

    constructor.visitCode();

    //call super()
    constructor.visitVarInsn(Opcodes.ALOAD, 0);

    constructor.visitMethodInsn(Opcodes.INVOKESPECIAL,
                                        "java/lang/Object", "<init>", "()V");

    constructor.visitInsn(Opcodes.RETURN);

    // must still be called; with COMPUTE_MAXS the arguments are ignored
    constructor.visitMaxs(0, 0);
    constructor.visitEnd();

Bytecode generation with ASM

Generate main method:
MethodVisitor mv = cw.visitMethod(Opcodes.ACC_PUBLIC + Opcodes.ACC_STATIC,
                                "main", "([Ljava/lang/String;)V", null, null);

    mv.visitCode();

    // push System.out
    mv.visitFieldInsn(Opcodes.GETSTATIC, "java/lang/System",
                  "out", "Ljava/io/PrintStream;");

    // push the string constant
    mv.visitLdcInsn("Hello, World!");

    // call println(String) on System.out
    mv.visitMethodInsn(Opcodes.INVOKEVIRTUAL, "java/io/PrintStream",
                       "println", "(Ljava/lang/String;)V");

    mv.visitInsn(Opcodes.RETURN);
    mv.visitMaxs(0, 0);
    mv.visitEnd();

Javassist Bytecode instrumentation

It is a class library for editing bytecodes in Java; it enables Java programs to define a new class at runtime and to modify a class file when the JVM loads it. http://www.csg.ci.i.u-tokyo.ac.jp/~chiba/javassist/

Javassist proxy

Dynamic proxying via inheritance

package nsu.fit.javaperf.lab4;
public class Lab4 {

    static class Calculator {
        public int sum(int x, int y) {
            return x + y;
        }
    }

    static Calculator createCalculator() throws Exception {
        //todo use Javassist to inherit from the Calculator class,
        //override the sum() method
        //and add 1 to the original return value, so 2 + 2 will be 5
        return new Calculator();
    }

    public static void main(String[] args) throws Exception {
        Calculator cal = createCalculator();
        System.out.println("2 + 2 = " + cal.sum(2, 2));
    }
}

Using java.lang.reflect.Proxy

public class Lab5 {

    static <T> T setLogging(final T obj) {
        //todo get all interfaces: obj.getClass().getInterfaces()
        //todo Intercept with java.lang.reflect.Proxy
        //todo Log each interface call into System.out
        //todo OUTPUT EXAMPLE:
        //todo      CALLING:  java.lang.Runnable#run()
        //todo      RETURN VAL: null

        return (T) obj;
    }
         ...
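
As background for this exercise, here is a minimal sketch of how java.lang.reflect.Proxy intercepts interface calls. It is not a solution to Lab5: the Greeter interface, the wrap method and the calls list are hypothetical and exist only to show the mechanism (an InvocationHandler that runs before delegating to the real object).

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class ProxyDemo {

    // Hypothetical interface used only for this illustration.
    public interface Greeter {
        String greet(String name);
    }

    // Names of intercepted methods, in call order.
    public static final List<String> calls = new ArrayList<>();

    public static Greeter wrap(Greeter target) {
        InvocationHandler handler = (proxy, method, args) -> {
            calls.add(method.getName());          // runs before the real call
            return method.invoke(target, args);   // delegate to the wrapped object
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] {Greeter.class},
                handler);
    }

    public static void main(String[] args) {
        Greeter greeter = wrap(name -> "Hello, " + name);
        System.out.println(greeter.greet("JVM")); // prints "Hello, JVM"
        System.out.println(calls);                // prints "[greet]"
    }
}
```

Proxy can only implement interfaces, which is why the exercise starts from obj.getClass().getInterfaces(). Proxying a concrete class such as Calculator requires generating a subclass instead.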
        

Java agent technology

Allows on-the-fly bytecode modification of classes as the JVM loads them.
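
A minimal agent skeleton, using only the standard java.lang.instrument API, might look like the sketch below. The names LoggingAgent, NameRecorder and seen are invented for the example, and the transformer only records class names without changing any bytecode. To run it as an agent it would need to be packaged in a jar whose manifest has a Premain-Class: LoggingAgent entry, then started with java -javaagent:agent.jar (agent.jar being a placeholder file name).

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class LoggingAgent {

    // Internal names (e.g. "java/lang/String") of classes passed to the transformer.
    public static final List<String> seen = new CopyOnWriteArrayList<>();

    public static class NameRecorder implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            seen.add(className);
            return null; // null tells the JVM to keep the class bytes unchanged
        }
    }

    // Invoked by the JVM before the application's main() when loaded via -javaagent.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new NameRecorder());
    }
}
```

Returning a modified byte array from transform() instead of null is where a library such as ASM or Javassist would plug in.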

Classloading

Suppose there is a directory /A containing arbitrary .class files. Write a program that takes the path to this directory as a command-line argument and checks each class in it for the method String getSecurityMessage(). If the method is present, the program creates an instance of the class (using its public no-argument constructor), calls the method, and prints the fully qualified class name and the method's result to the console. The directory with test classes can be downloaded here: http://ccfit.nsu.ru/~adamansky/java/task5.zip

Java agent + Javassist

package nsu.fit.javaperf;
public class TransactionProcessor {

    public void processTransaction(int txNum) throws Exception{
        System.err.println("Processing tx: " + txNum);
        int sleep = (int) (Math.random() * 1000);
        Thread.sleep(sleep);
        System.err.println(String.format("tx: %d completed", txNum));
    }

    public static void main(String[] args) throws Exception{
        TransactionProcessor tp = new TransactionProcessor();
        int tx = 0;
        tp.processTransaction(++tx);
        tp.processTransaction(++tx);
        tp.processTransaction(++tx);
    }
}

Java agent + Javassist

Using Java agent technology and class file transformations, answer the following questions:

  1. Total number of classes loaded by JVM during the application lifetime
  2. Total size of all classes loaded by JVM during the application lifetime
  3. Record all object/array allocations and their sizes in a separate file. Format: object class name => size in bytes
  4. Measure the time needed to complete each method in all classes defined in the package nsu.fit.javaperf