Introduction to Software Engineering Research &

How to Read


Yanyan Jiang

Overview

Check our homepage for assignments


Introduction to software engineering research

  • three easy pieces: reading, writing, and hacking
  • grading policies

How to read

  • why, what, and how to read a research paper

Why Did We Create This Course?

Raise the bar.


“Research” is serious and hard

  • NOT like your undergrad assignments

What is Software Engineering?

A Question

What is Software?


What is Software's Engineering?

Software Engineering Research

Tries to answer the fundamental question of how to build a piece of software


Make life easier for programmers (码农, "code farmers")

  • faster (productivity)
  • better (quality)

Ultimate goal: take over the human's role in software development

To Be More Specific...

See ICSE's call for papers

  • AI and software engineering
  • Testing and analysis
  • Software analytics
  • Software evolution
  • Social aspects of software engineering
  • Requirements, modeling, and design
  • Dependability

Examples

“Empirical and human study”


“Technologies”

  • FlashFill in Excel (POPL'11)
  • AddressSanitizer (ATC'12)

For fun and profit

  • Solve hard problems; build useful tools!
  • A quick test for evaluating your work: would real engineers (e.g., someone at Google/Facebook) find your work interesting?

Software Engineering Research: Three Easy Pieces


Reading, Writing, and Hacking

Reading

Read all papers in the list

  • Good papers (novel and significant) of good taste
  • Cover many SE areas
    • Empirical software engineering (EMSE)
    • Software engineering process (SEP)
    • Formal methods (FM)
    • Software testing and analysis (STA)
    • Software maintenance and evolution (SME)

(This is today's topic)

Writing

Be concise and precise.

  • Bad writing practices
    • broken logic
    • imprecise wording
    • verbose writing

SUGGESTIONS

  • Start writing as early as possible
  • Rush into your supervisor's office and have them revise your manuscript with you face to face

Hacking

RTFM, STFW, RTFSC.

  • Drop the course if you don't know how to use Git.

Coding should be a non-issue for you (it is just implementation work)

  • Intercept file system calls; capture block-device traces; create and compare file system snapshots
  • Modify the Android Runtime (ART) to expose a debugging interface
  • Modify the Java regex engine to obtain matching feedback
  • Parse source code via Clang/LLVM

Grading Policies

Reading (presentation sessions, 30%)

  • Four papers per session (15-minute talk + 3-minute Q&A, as if you were the authors); bid for papers today
  • Bring a laptop. The projector is 4:3 (VGA)

Writing (research proposal, 30%)

Hacking (programming assignments, 40%)

  • Two assignments (static/dynamic software analysis)
  • Choose one (freely) for artifact evaluation

How to Read

Why Read a Paper?

Personal experience: almost all of my ideas emerged from reading textbooks and papers.


Got some big ideas?

  • if your thoughts really matter
    • 99.9% have already been explored
    • find the related work

What is a Paper?

A piece of work that teaches the research community a lesson. (Hope everyone remembers this!)

  1. Motivation and observations
    • the most important part; why did the authors come up with the problem?
  2. Explanations and analyses
    • arguments, evidence, and defenses against potential threats
  3. Solution and evaluation
    • this is the least important part
    • if you have sufficient knowledge about the problem (and background), you are ready to propose your own solution

How to Find a Paper?

All papers form a citation graph

  • Your supervisor should give you some seed papers
    • if not, quit now
    • if they are of low quality, quit now
    • traverse the graph (from the seeds) to find more papers
  • Web resources: Google Scholar recommendations, Hacker News, blogs, tweets, GitHub repos (lots of paper lists), ...
  • Textbooks: further reading (usually awesome classical papers!)
  • Magazines: Communications of the ACM, IEEE Spectrum, IEEE Software, ...
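"Traverse the graph (from the seeds)" can be sketched as a breadth-first search. This is only an illustration: the paper ids, the `toy_graph` data, and the `max_hops` limit are all made-up assumptions, and a real citation graph would have to come from a bibliographic source.

```python
from collections import deque


def collect_papers(citations, seeds, max_hops=2):
    """Breadth-first traversal of a citation graph starting from seed papers.

    citations: dict mapping a paper id to the list of paper ids it cites
    (hypothetical toy data, not a real bibliographic API).
    Returns a dict mapping each reached paper to its hop distance from the seeds.
    """
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        paper = queue.popleft()
        if distance[paper] == max_hops:
            continue  # do not expand papers that sit at the hop limit
        for cited in citations.get(paper, []):
            if cited not in distance:
                distance[cited] = distance[paper] + 1
                queue.append(cited)
    return distance


# Hypothetical citation edges: each paper points to the papers it cites.
toy_graph = {
    "seed-A": ["paper-B", "paper-C"],
    "paper-B": ["paper-D"],
    "paper-C": ["paper-D", "paper-E"],
    "paper-D": ["paper-F"],
}

reading_list = collect_papers(toy_graph, ["seed-A"], max_hops=2)
for paper, hops in sorted(reading_list.items(), key=lambda item: item[1]):
    print(hops, paper)
```

Papers closer to the seeds come first in the printed list; the hop limit keeps the reading list from exploding.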

How Many Should I Read?

A LOT!

  • Papers: 100/year (~1000 pages, very dense)
    • 2 per week
  • Textbooks: 20/year (~10000 pages, very comprehensive)
    • better with online courses
  • Magazines: 24/year (~2400 pages, less dense)

How to read a paper

S. Keshav. How to read a paper, ACM SIGCOMM Computer Communication Review, 37(3), 2007.

Three-pass approach

  1. Quick scan
    • category, context, correctness, contributions, clarity
  2. Jot down the key points
    • read with care, but ignore details such as proofs
  3. Virtually re-implement
    • re-create the work as if you were the author

Reading Papers: My Approach

Just like back-propagation in training neural networks.

New problem

  • Ask myself: why didn't I notice this important problem?

Old problem, new solution

  • Ask myself: how would I solve it? Are there any tricky parts?

Finally

  • Watch the authors' presentation to see how they themselves understand the work

Traps in Paper Reading

Only read papers related to my topic

  • be a good problem solver first

This paper is limited in XXXX. It's a piece of junk!

  • every paper has its limitations
  • try to be constructive: what can I learn from it?

I cannot hold!

  • you're not ready for this paper.

Short Summary

  1. Read a lot of papers
  2. Read really good papers (to raise your bar)
    • 90% of “top-conference” papers have nearly zero “real” contributions
      • they may do a very good job, but they are not groundbreaking
    • many non-top-conference papers have negative contributions

Example

FlashFill

Sumit Gulwani. Automating string processing in spreadsheets using input-output examples (POPL'11, Most Influential Test-of-Time Paper Award in 2021)

  • The hardest-to-read paper of this semester

1st pass: the problem is to find a program $P$ that generalizes existing “examples” (and 90% of students stop here)
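To make "a program $P$ that generalizes examples" concrete, here is a toy sketch. It is NOT the paper's algorithm or its string language: the tiny DSL (split on a delimiter, pick a field, change case), the brute-force enumeration, and the e-mail examples are all assumptions made up for illustration.

```python
from itertools import product


def make_programs():
    """Enumerate a tiny, hypothetical DSL: split on a delimiter, pick a field, change case."""
    delimiters = [" ", ",", "@", "."]
    indices = [0, 1, -1]
    cases = {"keep": lambda s: s, "upper": str.upper, "lower": str.lower}
    for delim, index, (case_name, case_fn) in product(delimiters, indices, cases.items()):
        # Default arguments freeze the current loop values inside each program.
        def program(text, delim=delim, index=index, case_fn=case_fn):
            fields = text.split(delim)
            if not -len(fields) <= index < len(fields):
                return None  # the requested field does not exist
            return case_fn(fields[index])

        yield (delim, index, case_name), program


def synthesize(examples):
    """Return every DSL program that is consistent with all (input, output) examples."""
    return [
        (description, program)
        for description, program in make_programs()
        if all(program(inp) == out for inp, out in examples)
    ]


# Made-up input-output examples: extract the user name from an e-mail address.
examples = [("alice@example.com", "alice"), ("Bob@nju.edu.cn", "Bob")]
candidates = synthesize(examples)
description, program = candidates[0]
print(description)               # the program that survived the examples
print(program("carol@acm.org"))  # apply it to an unseen input
```

The second example rules out the "lower" variant, which shows how each extra example prunes the space of candidate programs. Enumeration like this only works for toy DSLs; handling a realistic string language efficiently is where the hard part of the paper lies.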

Ready for the Second Pass?

Ready to Read This Paper?

I Cannot Hold 😂😂😂

(Unix is user-friendly; it's just choosy about who its friends are.)

Ready for the Second Pass?

What's this?

  • (ICPC WF'08 Problem I: Password Suspects) count all length-$n$ passwords that “contain” every string in a given set of observations.
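A minimal sketch of the counting task as stated above, assuming "contain" means "has as a contiguous substring" and using a two-letter alphabet to keep the numbers small. The brute-force version serves as a specification; the memoized version is one possible speed-up, not the contest's reference solution.

```python
from functools import lru_cache
from itertools import product


def count_bruteforce(n, observations, alphabet="ab"):
    """Count length-n strings that contain every observation as a substring."""
    return sum(
        all(obs in "".join(chars) for obs in observations)
        for chars in product(alphabet, repeat=n)
    )


def count_dp(n, observations, alphabet="ab"):
    """Same count, memoized on (remaining length, recent suffix, set of seen observations)."""
    keep = max((len(obs) for obs in observations), default=1) - 1
    full = (1 << len(observations)) - 1

    @lru_cache(maxsize=None)
    def go(remaining, suffix, mask):
        if remaining == 0:
            return 1 if mask == full else 0
        total = 0
        for ch in alphabet:
            text = suffix + ch
            new_mask = mask
            for i, obs in enumerate(observations):
                if text.endswith(obs):  # an occurrence of obs ends at this position
                    new_mask |= 1 << i
            # Only the last `keep` characters can still matter for future matches.
            total += go(remaining - 1, text[-keep:] if keep else "", new_mask)
        return total

    return go(n, "", 0)


print(count_bruteforce(3, ["ab"]))  # aab, aba, abb, bab -> 4
print(count_dp(3, ["ab"]))          # same count via memoization
```

The brute force enumerates every candidate, so it is hopeless for a full lowercase alphabet; the memoized version shares work between strings with the same recent suffix and the same set of observations already seen.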

Read More.

Trust the authors: papers are self-contained. Reading them only requires textbook knowledge.

  • [Corollary] Go read (good enough) textbooks if you find you can't understand a paper
    • compilers, mathematical logic, algorithms, programming language theory (optional)

You can find open courses on YouTube (or Bilibili)

More Comments on Reading...

Finding it hard? See Manuel Blum's advice to a beginning graduate student.

Virtually Re-implement?

You may get a new paper (similar technique)!

End.


Drop the course (and quit your PhD) if you're not ready for the challenges.