Reading List for ISER'24

📚How to use this list?

These papers are a starting point for your reading, chosen for their interesting and insightful perspectives. To fully grasp their contributions, you may need to explore additional papers, textbooks, or tutorials. For older papers, it also helps to identify the key milestones that later built upon these foundational works. The authors' conference talks on YouTube (if available) can also provide a high-level overview of the ideas presented in the papers.

0. Must-reads

1. Studies

  1. setuid demystified. USENIX Security'02.
  2. Do crosscutting concerns cause defects? IEEE Transactions on Software Engineering, 34(4), 2008.
  3. Learning from mistakes: A comprehensive study on real world concurrency bug characteristics. ASPLOS'08.
  4. Understanding integer overflow in C/C++. ICSE'12.
  5. Ad hoc transactions in Web applications: The good, the bad, and the ugly. SIGMOD'22.

2. Compilers

  1. LLVM: A compilation framework for lifelong program analysis & transformation. CGO'04.
  2. QEMU, a fast and portable dynamic translator. USENIX ATC'05.
  3. Stochastic superoptimization. ASPLOS'13.
  4. Bringing the Web up to speed with WebAssembly. PLDI'17.
  5. Copy-and-patch compilation: A fast compilation algorithm for high-level languages and bytecode. OOPSLA'21.

3. Static Analysis and Checking

  1. Introduction to set constraint-based program analysis. Science of Computer Programming, 35, 1999.
  2. Bugs as deviant behavior: A general approach to inferring errors in systems code. SOSP'01.
  3. Language-based information-flow security. IEEE Journal on Selected Areas in Communications, 21(1), 2003.
  4. Finding bugs is easy. OOPSLA'04.
  5. Safe systems programming in Rust. Communications of the ACM, 64(4), 2021.

4. Dynamic Analysis and Trace

  1. Efficient path profiling. MICRO'96.
  2. ReVirt: Enabling intrusion analysis through virtual-machine logging and replay. OSDI'02.
  3. Valgrind: A framework for heavyweight dynamic binary instrumentation. PLDI'07.
  4. AddressSanitizer: A fast address sanity checker. USENIX ATC'12.
  5. Debugging the OmniTable way. OSDI'22.

5. Debugging

  1. DDD - A free graphical front-end for UNIX debuggers. ACM SIGPLAN Notices, 31(1), 1996.
  2. Simplifying and isolating failure-inducing input. IEEE Transactions on Software Engineering (TSE), 28(2), 2002.
  3. Bug isolation via remote program sampling. PLDI'03.
  4. Debugging in the (very) large: Ten years of implementation and experience. SOSP'09.
  5. Automatically finding patches using genetic programming. ICSE'09.

6. Testing and Validation

  1. EXPLODE: A lightweight, general system for finding serious storage system errors. OSDI'06.
  2. CrystalBall: Predicting and preventing inconsistencies in deployed distributed systems. NSDI'09.
  3. Coverage is not strongly correlated with test suite effectiveness. ICSE'14.
  4. IJON: Exploring deep state spaces via fuzzing. SP'20.
  5. Fuzz4All: Universal fuzzing with large language models. ICSE'24.

7. Verification

  1. The existence of refinement mappings. Theoretical Computer Science (TCS), 82(2), 1991.
  2. An extensible SAT-solver. SAT'03.
  3. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. OSDI'08.
  4. Hyperkernel: Push-button verification of an OS kernel. SOSP'17.
  5. Verus: Verifying Rust programs using linear ghost types. OOPSLA'23.

8. Synthesis

  1. Synthesis: Dreams → programs. IEEE Transactions on Software Engineering (TSE), 5(4), 1979.
  2. Dynamically discovering likely program invariants to support program evolution. IEEE Transactions on Software Engineering (TSE), 27(2), 2001.
  3. Combinatorial sketching for finite programs. ASPLOS'06.
  4. Automating string processing in spreadsheets using input-output examples. POPL'11.
  5. Scaling enumerative program synthesis via divide and conquer. TACAS'17.

9. Neural-symbolic Computing

  1. Mastering the game of Go with deep neural networks and tree search. Nature 529, 2016. (See also: AlphaGeometry and AlphaProof)
  2. Multi-modal program inference: A marriage of pre-trained language models and component-based synthesis. OOPSLA'21.
  3. Visual programming: Compositional visual reasoning without training. CVPR'23.
  4. Prompting is programming: A query language for large language models. PLDI'23.
  5. A structured generation language designed for LLMs. ArXiv'24.

10. AI for Software Engineering

  1. Detecting large-scale system problems by mining console logs. SOSP'09.
  2. On the naturalness of software. Communications of the ACM, 59(5), 2016.
  3. Competition-level code generation with AlphaCode. Science, 378(6624), 2022.
  4. SWE-bench: Can language models resolve real-world GitHub issues? ICLR'24.
  5. Meta large language model compiler: Foundation models of compiler optimization. ArXiv'24.

📚To PhD Students

Read seriously and widely across your broader field rather than limiting your perspective to your specific research problem; this list deliberately includes PL, systems, and security papers. Remember, you are first and foremost a Computer Science PhD student, and comprehensive knowledge of the discipline is crucial to your success.

Reading papers can be challenging, especially when you lack essential background knowledge. Don't hesitate to ask large language models or your peers for assistance.