My first experience as an Area Chair

I have reviewed hundreds of papers across dozens of venues (conferences, workshops, journals), pertaining to various fields (from cybersecurity to information systems) and of varying renown (from top-tier venues to lesser-known ones). At the same time, I have also (co-)authored many peer-reviewed papers. Suffice it to say, I have had my fair share of experience with the peer-review system.

In this post, I would like to talk about my recent experience as an area chair (AC) at NeurIPS. The reasons are many, but above all I want to discuss how being an AC enabled me to put into practice some ideas I had w.r.t. the “issues” of the peer-reviewing system, at least according to my own vision. Importantly, what follows reflects my thoughts as of September 2024. It is likely that other researchers see things differently, and it is also likely that I will change my mind in the future.

I firmly believe the goal of reviewing should be twofold. On the one hand, it should help the chairs/editors make a sound decision on what to do with the paper. On the other hand, it should help the authors improve their research. In theory, a good reviewer should accomplish both of these goals. Importantly: the subject, here, is the reviewer and not the review (a “review” is just one of the “deliverables” of a reviewer). This already outlines a crucial aspect: the reviewer must relate to two different entities, the authors and the chairs/editors. Hence, as an AC, I expected reviewers to “help me” while also helping the authors in the process.

Context

As an AC of NeurIPS 2024, I had to oversee 13 papers. I will not go over the process of selecting reviewers, which is not relevant here. I will merely discuss what I saw, and how I acted, in fulfilling my duties as an AC. Importantly, by the time the reviews were released (July 30th, 2024), all papers had at least 4 reviews (this took a lot of effort on my end, but it is not something I’ll discuss here); no reviews were added after the release.

Of the 13 papers, four were accepted; two were withdrawn after the reviews were released (and will not be covered in this post); the remaining seven were rejected.

What did I see (and what did I do)?

Among the 11 papers (accepted or rejected) in my batch, I witnessed…a lot of things. I will discuss the most relevant ones below; note that some of these may overlap with others.

First, I had cases of ChatGPT-written reviews. Two, to be exact. In both cases, I urged the reviewer to “provide more detail”. In one case, the review was changed substantially; in the other, the reviewer simply disappeared. Intriguingly, this “disappearing” reviewer had recommended an “8: Strong Accept”, but never participated in the discussion. The paper was eventually rejected (all other reviewers were negative). This was quite sad.

I had (many) cases of unresponsive reviewers. By this I mean reviewers who never participated in the discussion and did not acknowledge the authors’ rebuttal. In these cases, I simply disregarded their reviews: to me, it was almost as if these reviewers had never written their review at all. The NeurIPS guidelines clearly state that acknowledging the rebuttal is mandatory for reviewers, so failing to do so is equivalent to not fulfilling your duties as a reviewer.

I had cases of poorly-responsive reviewers. By this I mean cases in which the reviewer merely stated “I read the rebuttal. My score did not change”. In these cases, I did a deep investigation: I read the authors’ response, and then determined whether it made sense that the reviewer’s score did not change. This led to me “stealthily updating” (in my mind ☺) some scores that I thought should have been higher because the underlying concerns had been addressed (if a reviewer pointed out concerns that, despite being addressed, did not make them change their mind, then their review did not deserve much weight). However, this also led to me realising how convoluted some responses were, and I was not surprised that the reviewer simply wrote “my score did not change”.

I had cases of grumpy reviewers. By this I mean reviewers who were negative, and whom the authors could not sway no matter how hard they tried. These were very hard to deal with. My stance is that any paper can be improved, and clarifications can always be made. A reviewer who interacts with the authors but never changes their mind despite repeated interactions is, in my view, somewhat biased. Potentially, this is because they steer the discussion towards points that are ultimately meaningless, or because they keep finding flaws that were never mentioned before whenever a given concern is addressed (and maybe the “new” concern is much less relevant than the first one). In these cases, too, I investigated deeply. This led to me “updating” some scores when I felt that there was no ground to maintain such low scores. Moreover, as a subcategory of the “grumpy reviewer” cases, I also had “one grumpy reviewer” cases: those in which most reviewers were neutral or positive, and only one was negative. In these cases, I reached out to that reviewer, explicitly requesting them to provide valid reasons for rejection that would counter all the arguments of the “positive” reviewers. This never happened.

I had cases in which the authors pointed me to reviews that were “factually wrong”, or to the fact that “no reviewer responded”. I acknowledged these messages and reassured the authors that I was going to take action (if I hadn’t already), and that I would take their input into account when reaching my decision.

In every case, before writing my metareview, I posted a message to all reviewers summarizing the situation (likely reject or likely accept) and inviting them to speak up if they wanted to object to the decision.

Lessons Learned

Now, it may be surprising, but the four papers for which I proposed acceptance had relatively low scores (in absolute terms). Specifically: 4.8, 5.0, 5.0, and 6.3 (on a 1–10 scale; note that 4 is “Borderline Reject”, 5 is “Borderline Accept”, and 6 is “Weak Accept”). The reasons why I recommended acceptance are to be found in my explanations above. And I am glad that the SAC (Senior Area Chair) did not overrule my decisions (which were, by the way, justified in the metareview).

So, what did I learn from this? Essentially the following.

  • As an AC, you have the power to “do good”. Do you remember that adversarial/lazy/grumpy reviewer (e.g., those who simply write “poor novelty” without pointing out any flaw and without providing any guidance)? As an AC, you can “reasonably ignore them” (it takes effort, but it is very rewarding). This is not to say that “grumpy” reviewers are bad (I, too, am a “grumpy” reviewer at times!). The point is to avoid having the “grumpy reviewer” monopolize the discussion and, hence, dictate the outcome of the paper, unless, of course, their reasons are rooted in factual grounds. In any event, the authors should always have the chance to respond.
  • As an AC, you should listen to the authors. I believe a crucial gap in the peer-review system is the absence of a “third party” that the authors can interact with before a decision is made. An AC can fill this role: if the authors feel that something is wrong, the AC can and should listen.
  • Ultimately, the rejected papers were those for which (i) the majority of the reviewers agreed that the paper had issues, (ii) those reviewers participated in the discussion, and (iii) even after the authors’ responses, there was no clear path to acceptance (more below).
  • Authors can overcommit. For instance, in many cases the authors carried out new experiments, but the paper was still rejected. This was typically because either (a) it was hard to fit the new experiments into the paper in a way that “made sense” (without rewriting the entire paper); or (b) the experiments were so complex that it was impossible to gauge their correctness during the discussion phase.
  • Authors can “abuse” the responses. Even though I tried to side with the authors, in some cases it was simply impossible to understand “what” the responses were talking about. The problem is not “lengthy responses”, but rather, “lengthy responses that miss the point of the review”.

In summary, I liked being an AC. The responsibility was high, and it takes a lot of effort (including effort not described above, such as finding reviewers and soliciting reviews). However, it can be very rewarding, if only because it allowed me to “rectify” the many cases I see far too often in other venues, wherein a single negative review manages to dictate the decision on a paper, with the authors unable to do anything and the chairs, too, unable to respond (because “every decision is final” and “we must trust the PC members”). On this last point: I think that, especially today, given the high reviewing load, it is better not to blindly trust PC members and to instead have a “third party” that moderates the discussion.

Of course, this is not to say that the system is perfect. I managed to achieve the above only because I did a lot to ensure that each paper had at least 4 good reviews (some had 6): I sent personalized reminders to each reviewer and asked them to keep me updated on their status, if needed. Naturally, some people may disagree with my decisions. But I stand by them: if at least some reviewers found a paper “interesting”, then, unless other reviewers point out significant flaws, the paper deserves to be accepted.