Yanyan's Wiki Introduction to Software Engineering Research (2022)

Tracing Memory Accesses

Deadline: Sunday 12 Dec 2021 23:59:59.

Submit via command line:

curl http://jyywiki.cn/upload \
  -F course=ISER2021 \
  -F module=PA2 \
  -F token={{your token}} \
  -F stuid={{student id}} \
  -F stuname={{name (chinese)}} \
  -F file=@{{path to your submission}}

ISER2021-PA2 submission results

1. Background

We have learned that a useful way to understand program execution is to view the program as a state machine. Logging mechanisms (e.g., printf calls in your code) record parts of that state as it evolves, and thus provide effective debugging aids. Most modern software systems come with logging, e.g., for performance diagnostics. For Java, there are plenty of widely adopted profilers:

  • JProfiler
  • YourKit
  • VisualVM

In this lab, we instrument a Java program (by hacking its bytecode) to trace all shared memory accesses. Memory tracing is useful in various tasks, e.g., data race detection. We recommend Prof. Xinyu Feng's video lectures on relaxed memory models and related issues.

2. The Assignment

Write a program that traces all shared memory accesses of the classes in a given Java jar package. For each shared memory access, print a line consisting of the following four parts:

  • R or W, to indicate whether the access is a read or a write.
  • A decimal number to indicate the thread number.
  • A 64-bit hex number to indicate the identifier of the object being accessed. Try your best to assign each object a unique identifier (e.g., using System.identityHashCode()). Note, however, that guaranteed uniqueness is not possible: your program could run for an unbounded amount of time and create an unbounded number of objects.
  • The member or array index being accessed.

Package your tool as a command-line program that resembles the java command:

$ jmtrace -jar something.jar "hello world"
R 1032 b026324c6904b2a9 cn.edu.nju.ics.Foo.someField
W 1031 e7df7cd2ca07f4f1 java.lang.Object[0]
W 1031 e7df7cd2ca07f4f2 java.lang.Object[1]
...
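The line format above can be produced with a single format string. The following is a minimal sketch: the class TraceLine and its method names are hypothetical, and zero-padding the identifier to 16 hex digits is an assumption based on the sample output.

```java
import java.util.Locale;

// Hypothetical helper; the class and method names are illustrative only.
final class TraceLine {
    private TraceLine() {}

    /** Formats a field access: kind ('R' or 'W'), thread, object id, Class.field. */
    static String field(char kind, long threadId, long objectId,
                        String className, String fieldName) {
        // %016x prints the long as 16 zero-padded hex digits
        // (negative values are shown in two's complement form).
        return String.format(Locale.ROOT, "%c %d %016x %s.%s",
                kind, threadId, objectId, className, fieldName);
    }

    /** Formats an array element access: kind, thread, object id, ElementType[index]. */
    static String arrayElement(char kind, long threadId, long objectId,
                               String elementType, int index) {
        return String.format(Locale.ROOT, "%c %d %016x %s[%d]",
                kind, threadId, objectId, elementType, index);
    }
}
```

Locale.ROOT is used so that the decimal thread number does not depend on the default locale of the machine running the tool.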

You can safely ignore memory accesses in the system libraries, e.g., java.lang and java.util; tracing them is troublesome in practice.

3. Memory Access Tracing

Memory tracing is somewhat harder than coarse-grained tracing, e.g., strace (which intercepts system calls via ptrace) or ltrace (which hacks the dynamic linker). This is because memory accesses are extremely frequent, so the tracing code must be highly optimized. We also have to modify the program quite intrusively to obtain the trace.

For example, we can instrument the program at the source-code level. Consider the following method:

static void foo() {
  int[] a = new int[10];
  for (int i = 0; i < a.length; i++) {
    a[i] = 0;
  }
  SomeClass.staticField = 1;
  someObj.otherField = someObj.field;
}

We can leverage an AST rewriter (e.g., Javassist) to identify all memory access nodes and rewrite them:

static void foo() {
  int[] a = new int[10];
  for (int i = 0; i < a.length; i++) {
    int $t0 = 0;
    a[i] = $t0;
    mtrace.traceArrayWrite(a, i, $t0); // instrumentation added
  }

  int $t1 = 1;
  SomeClass.staticField = $t1;
  mtrace.traceStaticWrite(SomeClass.class, "staticField", $t1); // instrumentation added

  SomeType $t2 = someObj.field;
  mtrace.traceFieldRead(someObj, "field", $t2); // instrumentation added

  someObj.otherField = $t2;
  mtrace.traceFieldWrite(someObj, "otherField", $t2); // instrumentation added
}

The program behaves identically before and after instrumentation, except that each memory access is now followed by a call to mtrace.
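The runtime side of these calls can be quite small. The sketch below is one possible shape for it, not a reference implementation: the class is named mtrace only to match the calls in the example, the swappable out field is a hypothetical convenience for testing, and using the runtime class name of the target (rather than the class that declares the field) is a simplifying assumption.

```java
import java.io.PrintStream;
import java.util.Locale;

// Hypothetical runtime class; lowercase name only to match the calls in the example.
final class mtrace {
    private mtrace() {}

    // Assumption: the trace goes to standard output. Replaceable for testing.
    static volatile PrintStream out = System.out;

    static void traceFieldRead(Object target, String field, Object value) {
        emit('R', target, target.getClass().getName() + "." + field);
    }

    static void traceFieldWrite(Object target, String field, Object value) {
        emit('W', target, target.getClass().getName() + "." + field);
    }

    // A static field belongs to no instance; this sketch uses the Class object as the target.
    static void traceStaticWrite(Class<?> owner, String field, Object value) {
        emit('W', owner, owner.getName() + "." + field);
    }

    static void traceArrayWrite(Object array, int index, Object value) {
        String elementType = array.getClass().getComponentType().getName();
        emit('W', array, elementType + "[" + index + "]");
    }

    private static void emit(char kind, Object target, String location) {
        // identityHashCode is only 32 bits wide and is not guaranteed to be unique.
        long id = System.identityHashCode(target) & 0xffffffffL;
        String line = String.format(Locale.ROOT, "%c %d %016x %s",
                kind, Thread.currentThread().getId(), id, location);
        // One println per access, under a lock, so that lines printed
        // by different threads are never interleaved.
        synchronized (mtrace.class) {
            out.println(line);
        }
    }
}
```

The value argument is accepted only to keep the signatures in line with the example; the required output format does not include it.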

Another approach is to instrument at the bytecode level. Bytecode is an even simpler abstraction, and one that is friendlier to tools such as compilers. You can leverage bytecode rewriting tools (e.g., ASM) to change bytecode at class-loading time. The only instructions you need to instrument are getstatic/putstatic/getfield/putfield/*aload/*astore, where *aload and *astore denote the array load and store instructions (e.g., iaload and aastore). We recommend that students read the Java Virtual Machine Specification. The manual provides sufficient details to implement your own Java runtime!
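To make the instruction list concrete, here is a small sketch that classifies an opcode as a traced read, a traced write, or neither. The class AccessOpcodes is hypothetical; the numeric values are the opcode numbers given in the Java Virtual Machine Specification, where the eight array loads and the eight array stores each occupy a contiguous range.

```java
// Hypothetical helper; opcode values are those listed in the JVM specification.
final class AccessOpcodes {
    private AccessOpcodes() {}

    // iaload, laload, faload, daload, aaload, baload, caload, saload
    static final int IALOAD = 0x2e, SALOAD = 0x35;
    // iastore, lastore, fastore, dastore, aastore, bastore, castore, sastore
    static final int IASTORE = 0x4f, SASTORE = 0x56;
    static final int GETSTATIC = 0xb2, PUTSTATIC = 0xb3;
    static final int GETFIELD = 0xb4, PUTFIELD = 0xb5;

    /** True for getstatic, getfield, and the array load instructions. */
    static boolean isRead(int opcode) {
        return opcode == GETSTATIC || opcode == GETFIELD
                || (opcode >= IALOAD && opcode <= SALOAD);
    }

    /** True for putstatic, putfield, and the array store instructions. */
    static boolean isWrite(int opcode) {
        return opcode == PUTSTATIC || opcode == PUTFIELD
                || (opcode >= IASTORE && opcode <= SASTORE);
    }
}
```

Note that the plain aload and astore instructions (0x19 and 0x3a) move references between local variables and the operand stack; they do not touch shared memory and are deliberately excluded.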

To hijack class loading, we can use JVMTI (JVM Tool Interface) to register a callback on class-loading events. Java also has a built-in package, java.lang.instrument, for conducting instrumentation. You are free to choose how to implement the instrumentation.
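As an illustration of the java.lang.instrument route, the skeleton below registers a ClassFileTransformer from a premain method and skips system classes. It is a sketch under assumptions: TraceAgent, SkippingTransformer, isSystemClass, and rewrite are hypothetical names, the list of skipped package prefixes is only a guess at what "system libraries" should cover, and rewrite is a placeholder where the actual bytecode rewriting (e.g., with ASM) would go.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

// Hypothetical agent skeleton. In a real agent jar this class would normally
// be public, in its own file, and named by the Premain-Class manifest attribute.
final class TraceAgent {
    private TraceAgent() {}

    // Called by the JVM before main() when started with -javaagent:<agent jar>.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new SkippingTransformer());
    }

    // Class names arrive in internal form, e.g. "java/lang/String".
    static boolean isSystemClass(String internalName) {
        return internalName == null
                || internalName.startsWith("java/")
                || internalName.startsWith("javax/")
                || internalName.startsWith("jdk/")
                || internalName.startsWith("sun/");
    }

    static final class SkippingTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            if (isSystemClass(className)) {
                return null; // null means "no transformation performed"
            }
            return rewrite(classfileBuffer);
        }
    }

    // Placeholder: a real tool would return the instrumented class file here.
    static byte[] rewrite(byte[] original) {
        return null;
    }
}
```

With such an agent packaged as a jar, a jmtrace wrapper script could simply forward its arguments to the java command with an additional -javaagent option.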

4. Submission

Upload the following as a single archive (zip or tar).

  • Source code for the tool. Make sure that only source code is included and that the libraries it depends on are readily available. Do not put files that can be obtained or generated from the source code (dependencies, binaries, etc.) into your archive; they may cause it to exceed the size limit.
  • Short compilation instructions, including how dependent libraries are obtained. (Preferably, use an existing dependency management system.)
  • A report in PDF format (in English), briefly describing your algorithm for implementing the tracing. The report should be no more than two A4 pages.

The assignment is graded mainly on correctness: you should not miss any shared memory access, nor should you print unnecessary debugging information. Multithreading is a built-in feature of the JVM, so your tool should work in a multi-threaded setting.

Creative Commons License