My area of academia runs mainly on conferences, as opposed to journals. This means that a few times each year, hundreds of researchers simultaneously submit papers about their latest and greatest project, and a “program committee” decides which of those get to appear in the proceedings of the upcoming conference.
There is usually considerable overlap between the members of that program committee and the researchers who submit papers. This is partly because if conferences didn’t allow the members of their program committee also to submit their own papers, few would be willing to serve on that committee. (But of course, nobody reviews their own submission!)
It seems to me that if a member of the program committee has a submission of their own under review, and wishes to act purely selfishly, then they should be as negative as possible about all the papers they are reviewing. By doing so, the argument goes, they increase the chances of their own paper ending up near the top of the rankings and ultimately being accepted.
To what extent does this actually happen?
I accessed some data for a recent computer science conference. I partitioned the 73 members of the program committee into the 47 who had no submission of their own and the 26 who had at least one. (Only a handful of reviewers had more than one submission, so I lumped them all together.)
I also looked at the scores given by all those reviewers. Each reviewer reviewed between one and five papers, and scored each one on a 4-point scale (strong reject, weak reject, weak accept, strong accept). I computed each reviewer’s “mean review score” by mapping that scale onto numbers (strong reject = -1.5, weak reject = -0.5, weak accept = 0.5, strong accept = 1.5) and averaging across their reviews.
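In code, the scoring step amounts to nothing more than a lookup table and an average. Here’s a minimal sketch in Python; the variable names and data layout are my own, not those of the actual review system:

```python
from collections import defaultdict

# Numeric value for each point on the 4-point review scale
SCORE_VALUES = {
    "strong reject": -1.5,
    "weak reject": -0.5,
    "weak accept": 0.5,
    "strong accept": 1.5,
}

def mean_review_scores(reviews):
    """reviews: iterable of (reviewer, score_label) pairs.
    Returns a dict mapping each reviewer to their mean review score."""
    per_reviewer = defaultdict(list)
    for reviewer, label in reviews:
        per_reviewer[reviewer].append(SCORE_VALUES[label])
    return {r: sum(vals) / len(vals) for r, vals in per_reviewer.items()}
```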
I found that the mean review scores given by the two groups are as follows:
| Group | Mean review score |
| --- | --- |
| No paper under review | -0.117 |
| At least one paper under review | -0.478 |
Both numbers are negative, which means both groups of reviewers lean towards rejecting rather than accepting – what grumps! But those who have a paper of their own under review seem noticeably more negative than those who don’t.
Here’s a graphical version, which shows a bit more information:

The graph shows the distribution of mean review scores for the two groups (blue line = no submission, green line = at least one submission). The dots are the actual data points after dividing the review scores into buckets of width 0.2; the lines are moving averages (window = 2), which smooth out the noise a bit. We see that both distributions are vaguely Gaussian, peaking a bit below a score of 0, but that the green line is noticeably skewed to the left (more negative scores).
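In case anyone wants to reproduce the plot: the bucketing and smoothing can be done along the following lines (a sketch using NumPy; the bucket edges and function names are my own choices):

```python
import numpy as np

def bucket_counts(mean_scores, width=0.2):
    """Count how many reviewers fall into each score bucket of the given width."""
    edges = np.arange(-1.5, 1.5 + width, width)
    counts, _ = np.histogram(mean_scores, bins=edges)
    centres = (edges[:-1] + edges[1:]) / 2
    return centres, counts

def moving_average(values, window=2):
    """Smooth the bucket counts with a simple moving average."""
    return np.convolve(values, np.ones(window) / window, mode="valid")
```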
I also ran a Mann–Whitney U-test on this data, which quantifies whether this apparent skew is actually statistically significant. The number crunching says yes: the test gives a p-value of about 0.004, meaning that if the two groups of reviewers actually scored papers the same way, a skew this large would arise by chance only about 0.4% of the time.
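For what it’s worth, the test itself is a one-liner with SciPy. Something like the sketch below, where the two lists are placeholders standing in for the actual per-reviewer mean scores:

```python
from scipy.stats import mannwhitneyu

# Mean review scores for the two groups of reviewers
# (placeholder values; substitute the real per-reviewer scores)
no_submission = [-0.5, 0.0, 0.5, -0.25]
with_submission = [-1.0, -0.5, -0.75, 0.0]

stat, p_value = mannwhitneyu(no_submission, with_submission, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.4f}")
```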
So that’s quite interesting. Reviewers do seem to be harsher when they have a paper of their own under submission.
Of course, it’s perfectly possible that having a paper under submission doesn’t actually cause a reviewer to be harsher, but rather, that there is a third factor causing both of these things to be simply correlated.
One candidate factor is the seniority of the reviewer. Senior professors might give lower review scores because they’re more familiar with the research area and hence find it easier to spot missing citations, etc.; they may also be more likely to have papers under submission from their large research group. Or it might go the other way: senior professors might give higher scores because they gloss over the little flaws that junior folk pick up on, and they may be less likely to have papers under submission because they’re not focusing so urgently on building up their publication profile.
But no, it would seem not. Here’s a graph that shows the relationship between how many publications a reviewer has co-authored (according to their DBLP profile) and their review scores.

Visually, there seems to be no relationship, and sure enough, Spearman’s coefficient of rank correlation confirms that there is no statistically significant correlation here.
The same applies if we use “number of years since first publication” as another proxy measure for seniority, again using data from DBLP:

Again, there’s no visual correlation here, and Spearman agrees. So seniority doesn’t appear to be a confounding factor here.
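Both seniority checks boil down to a single call to scipy.stats.spearmanr. A sketch, with placeholder values standing in for the real DBLP numbers:

```python
from scipy.stats import spearmanr

# Entry i of each list refers to the same reviewer
# (placeholder values; the real data comes from DBLP and the review scores above)
publication_counts = [12, 85, 40, 150, 7, 60]
mean_scores = [-0.5, 0.25, -0.1, -0.3, 0.5, 0.0]

rho, p_value = spearmanr(publication_counts, mean_scores)
print(f"rho = {rho:.2f}, p = {p_value:.3f}")

# The same call, with "years since first publication" in place of the
# publication counts, covers the second seniority proxy.
```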
Another factor to consider is whether the reviewer is affiliated with a University or not. Perhaps reviewers who are affiliated with a company or a government organisation are not so focused on growing their publication count, and hence may feel less “competitive” when reviewing.
My data says that yes, this does indeed seem to be an important factor. The graph below plots the scores given by reviewers currently affiliated with a University (blue line) and reviewers not currently affiliated with a University (green line).

It looks like the University-affiliated reviewers are harsher (mean = -0.329) than their non-University-affiliated counterparts (mean = 0.027). The Mann–Whitney U-test gives a p-value of about 0.04, so the difference is just about statistically significant.
What’s certainly the case is that the non-University-affiliated reviewers are not submitting many papers: of those 17 reviewers, only one had a submission of their own, whereas the 56 University-affiliated reviewers had an average of 0.64 submissions each.
One way to proceed here would simply be to discount all the non-University-affiliated reviewers. Here’s the graph obtained from doing that:

We see that the University-affiliated reviewers with a submission of their own (green line) are still noticeably harsher than those without (blue line). And the Mann–Whitney U-test confirms that the effect is still statistically significant: it gives a p-value of about 0.02.
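The filtering step is straightforward: roughly something like this sketch, with made-up records standing in for the real reviewer data:

```python
from scipy.stats import mannwhitneyu

# One record per reviewer; the field names are mine and the values are
# placeholders, purely to make the sketch runnable.
reviewers = [
    {"university": True,  "has_submission": False, "mean_score": -0.10},
    {"university": True,  "has_submission": False, "mean_score":  0.25},
    {"university": True,  "has_submission": True,  "mean_score": -0.60},
    {"university": True,  "has_submission": True,  "mean_score": -0.45},
    {"university": False, "has_submission": False, "mean_score":  0.20},
    {"university": False, "has_submission": True,  "mean_score": -0.30},
]

# Keep only the University-affiliated reviewers, then compare the two groups again
uni = [r for r in reviewers if r["university"]]
no_sub = [r["mean_score"] for r in uni if not r["has_submission"]]
with_sub = [r["mean_score"] for r in uni if r["has_submission"]]

stat, p_value = mannwhitneyu(no_sub, with_sub, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.4f}")
```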
So, what to do about this?
One could imagine attaching a little “badge” to reviewers’ names in the discussion forums that indicates “I have at least one paper under submission myself”. That might discourage them from being overly negative about a paper. Then again, reviewers might overcompensate and become too positive in their reviews, for fear of being accused of acting in a self-interested manner!
A more subtle approach could be to make this information visible to program chairs while papers are being allocated to reviewers, so that they can try to ensure that each submission has a balanced number of reviewers from both groups.