[A5] Peer Review & Artifact Evaluation
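Deadline: 23:59:59 on December 23, 2020 (server time applies). The deadline is firm.
How to submit: run the following on the command line (make sure curl is installed), replacing STUDENT_ID, CHINESE_NAME, and PATH_TO_FILE with your own student ID, your name in Chinese, and the path of the file to submit:
curl http://jyywiki.cn/upload -F course=ISER2020 -F module=A5 -F stuid=STUDENT_ID -F stuname=CHINESE_NAME -F file=@PATH_TO_FILE
(A successful submission returns "SUCC" and a submission ID. Enter the submission ID at the top right of the web page and refresh the page to check the status of your submission. If you are not on the course roster, please send jyy your name and student ID.)
The Peer Review results can be viewed on this page.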
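Rigorous peer review is one of the foundations on which academia runs; top-tier conferences and journals in particular usually hold peer review to a very high standard. In this lab, you will review the Research Proposals and Artifacts of three other students. Their submissions will show you things worth learning from as well as mistakes to avoid, and both will help you improve. Note that the submissions you receive may not match your own research area exactly; this is normal. Program committee members often receive papers outside their own narrow specialty, but you can still evaluate the work as a knowledgeable or informed outsider (a peer from the broader field): is it easy to follow, is the logic sound, is the experimental evidence sufficient, and so on.
You can find the link to download your review package in the submission results of your research proposal (the package contains links to the 3 research proposals and 3 artifacts you need to review, plus the review form). Fill in your comments as plain text (Markdown is allowed) and submit them as a text file using the method on this page. If you have forgotten your submission ID, you can submit again (to get a new submission ID) or email the teaching assistants to retrieve it.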
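Writing Review Comments
Give your opinions and suggestions on the work in as much detail as possible, generally at least 300-500 words (longer is fine). Please also respect the submitter's work. Even if you think the research proposal has major flaws (for example, it lacks novelty or significance), offer your suggestions politely and do not make personal attacks on the authors.
If the research proposal you receive violates course policy (for example, it exceeds the page limit, uses the wrong template, is not anonymized, or is plagiarized), you may give a desk reject with a brief justification.
First, give the research proposal an overall score based on its novelty and significance. The bar is that of a top-tier conference: judge whether the value of the work is sufficient for a top conference. There are four possible scores: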
- (strong) accept (+2)
- weak accept (+1)
- weak reject (-1)
- (strong) reject (-2)
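In software engineering conference reviewing, all reviewers usually must agree to accept (at least weak accept) before a paper is accepted. A single objection can therefore get a paper rejected (at an acceptance rate of about 20%, even minor technical or contribution issues may lead to rejection). The five main evaluation criteria:
- significance: the importance of the paper to the research community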
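The unanimity rule described above can be illustrated with a minimal Python sketch. It assumes the four scores map to +2, +1, -1, and -2 as listed, and that a paper is accepted only when every reviewer gives at least weak accept. The names `SCORES` and `is_accepted` are hypothetical and are not part of the course tooling; real program committees also discuss papers and may not follow the rule mechanically.

```python
from typing import Iterable

# Score scale taken from the list above.
SCORES = {
    "accept": 2,
    "weak accept": 1,
    "weak reject": -1,
    "reject": -2,
}


def is_accepted(labels: Iterable[str]) -> bool:
    """Hypothetical helper: True only if every reviewer gives at least weak accept."""
    scores = [SCORES[label] for label in labels]
    if not scores:
        raise ValueError("at least one review is required")
    # Assumption: a single score below weak accept rejects the paper.
    return all(score >= SCORES["weak accept"] for score in scores)


if __name__ == "__main__":
    print(is_accepted(["accept", "weak accept", "accept"]))
    print(is_accepted(["accept", "accept", "weak reject"]))
```

Under this assumption, two strong accepts do not outweigh one weak reject, which is why a single dissenting review matters so much.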
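Besides the score, the comments consist of three parts, which mainly explain your assessment against the five criteria above, plus any additional comments for the authors (for example, suggestions for improvement or possible future directions):
- Paper summary: in about 100 words, summarize your understanding of the paper's contributions and highlights. Since you may agree or disagree with the claims in the paper (for example, you may think some points are not contributions, or you may uncover a research contribution the authors did not realize they had made), your summary need not match the paper's abstract.
- Strengths and weaknesses: list the paper's strengths and weaknesses as very brief points, for example sufficient/insufficient contribution, sound/unsound method design, or adequate/inadequate experimental evidence.
- Detailed comments for authors: comment on specific technical details, writing, and so on. The main purpose is to point out problems in the paper so that the authors can improve it in the future.
Below are the reviews received by two accepted papers, one positive and one negative, for your reference. You can follow this format, but the points you raise are not limited to these. As you submit more papers, you will receive more review comments.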
Automatic self-validation for code coverage profilers (ASE19)
Comments below are scored with “accept (+2)”.
Authors present Cod, a metamorphic testing approach towards validation of coverage profilers. Cod uses the metamorphic relation that, given a program and coverage profile statistic, removing an "uncovered" statement from the original program should not affect the coverage profile of the modified program. When applied to gcov and llvm-cov, this simple MR revealed multiple faults, some of which have been reported to the developers and acknowledged as faults.
- an important and relevant problem
- clearly effective technique, well described
- found real faults
- specific to C, perhaps, but not a major flaw in any way
Comments for Authors
I enjoyed reading this paper. The focus is very clear, and the problem is an important one. The proposed method has been described clearly and evaluated against widely used real world software tools, and eventually found real faults acknowledged by the developers. I think this is a clear step forward when compared to the differential testing approach that is the state-of-the-art.
There are a few, relatively minor issues that can be addressed to improve the paper even further.
I think it would be better for the reader if authors can be a bit more specific about the limitations of differential testing in the introduction. The current description under Section II.B is good, but it would be better to understand what the problem is before the "approach" subsection in the introduction.
EMI is used without being introduced properly with a citation. I think the footnote 2 should be moved to Section I, the paragraph after "weak inconsistency".
I find it a bit strange that coverage based grey-box fuzzing takes up one third of the related work, as coverage is much more prevalent than that. Grey-box fuzzers will do their own instrumentation (as authors acknowledge with the case of AFL) and therefore are somewhat irrelevant to the coverage profilers. I think a discussion of techniques that rely on accurate coverage information would be more appropriate, such as many of regression testing techniques, or Spectrum Based Fault Localisation techniques. I would also comment that DeepXplore is really irrelevant here, as neuron coverage is not a structural coverage at all: apart from the terminology, "coverage", I don't see how it is related.
I honestly do not have much to add here... some may argue that these faults will not affect coverage measurement in any major way, to which I'd say that coverage profilers are such a fundamental tooling that it cannot hurt to get them right. Others may argue that this is relatively specific to C due to the complicated optimisation, to which I'd say that does not hurt the generalisability of the technique itself.
Automating object transformations for dynamic software updating via online execution synthesis (ECOOP18)
Comments below are scored with “weak reject (-1)”.
This paper proposes a technique for in-vivo update of software. The authors propose a technique to synthesise a backward execution that would restore the program state at the end of the snippet's execution to that at the beginning. This is done by observing only the end state and working out a sequence of operations that invert the forward execution. The sequence of inverse operations is then used to rebuild the current state once again from the old state by synthesising a forward execution using operators from a new version of the code.
- It is a promising and elegant line of work that does away with tracing operations on objects. One only needs to inspect the current state and the source code to synthesise object histories.
- The formalisation of value node collection in Section 4 is nice but it can be improved (see comments in the next section)
- The authors compare their work with TOS which seems to be the state of the art in dynamic software update
- AOTES, though elegant, is subject to invertibility of operators and limitations of static analysis. This severely restricts its generalisability.
- The choice of updated classes for evaluation is not well documented. This could be due to difficulties in finding good use cases for this work.
- The authors do not discuss or address limitations of their technique. In a system under execution, dynamic software update must provide guarantees of safety and progress. Any spurious update is catastrophic.
Comments for Authors
I think the work has promise but it is currently applied to the wrong aspect of the problem. It would be much more useful to apply AOTES as a vetting engine to identify what changes can be pushed through at runtime, rather than to propose AOTES as a tool to automate object transformations, which is a harder problem. With formal guarantees of correctness, AOTES would be a more powerful and useful tool than it is now.
As noted, Section is well written, but can be improved. Even though I am comfortable with the formalism, I had trouble understanding as terms are overloaded. Values and value nodes need to be adequately disambiguated. I suggest you drop the adjective 'value' from 'value nodes'. Using 'value' to denote operations is inherently confusing. There is also insufficient separation between method-level configurations and intra-method configurations. I suggest using two different terms for them. Terms like pre-heap, post-heap and expression stack should be briefly discussed for the non-expert reader. The definition of a symbolic heap is ambiguous as described.
You spend significant effort explaining how a value graph is constructed but do not use it to help you explain Algorithm 1. This is a missed opportunity. I believe traversal of the graph is a core element in Algorithm 1 but you do not make this explicit either in the algorithm itself or the accompanying text.
There are sentences in the evaluation which are ambiguous and not backed by an explanation. For example, I did not understand what the authors mean when they say: "we also exclude rare cases in which the current state does not contain sufficient information to determine the new state." Clarifying exactly which sorts of program state AOTES can handle and which not will strengthen the work.
The evaluation is unconvincing. It rests on a small, heavily curated corpus, seemingly selected to be amenable to AOTES. What are the prospects for the general utility of AOTES?
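Writing Artifact Evaluation Comments
Install the required dependencies by following the authors' instructions, then run the authors' software and try a few simple runs (every student should have provided some test cases). Then write a short evaluation report in English; 100-300 words is enough. Comments you may consider including in the summary: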