EMNLP 2024 · Main Conference ORAL

AgentReview

Exploring Peer Review Dynamics with LLM Agents

1Georgia Institute of Technology 2University of Science and Technology of China 3Carnegie Mellon University 4UC Santa Barbara 5UC Los Angeles 6William & Mary
* Equal contribution
53.8K+ Generated peer-review documents
523 ICLR papers simulated (2020–2023)
37.1% Variation in paper decisions due to bias
27.7% Decision change under author-identity leak
01 · Overview

Abstract

Peer review is a cornerstone of academic publishing, ensuring the integrity, novelty, and accuracy of published research. However, peer review faces challenges such as reviewer biases, inconsistent assessments, and concerns regarding the design of review mechanisms. These issues can undermine the fairness and integrity of scientific evaluation, especially given the increasing volume of academic submissions.

Traditional studies of peer review often rely on statistical analyses of past reviews, which struggle to fully capture its multivariate nature: entangled factors such as reviewer expertise, motivation, and bias jointly shape review outcomes, making them difficult to study in isolation. Moreover, ethical and privacy concerns around investigating real-world peer review data further complicate such studies.

In our recent work, accepted as a main track (Oral) paper at EMNLP 2024, we introduce AgentReview, the first large language model (LLM)-based framework designed to simulate the peer review process. AgentReview allows for the controlled simulation of peer review dynamics using LLM agents, enabling researchers to explore biases, reviewer roles, and decision mechanisms in a way that respects privacy while providing actionable insights into how the peer review process can be improved.

Overview of the AgentReview framework, illustrating how LLM agents simulate the peer review process and how multiple latent variables are disentangled.

Figure 1. AgentReview is an open and flexible framework designed to realistically simulate the peer review process. It enables controlled experiments to disentangle multiple variables in peer review, allowing for an in-depth examination of their effects on review outcomes.

02 · Watch

Video Overview

A short walkthrough of the AgentReview framework and its key findings on peer review dynamics.

03 · Framework

Inside AgentReview

AgentReview provides a flexible, extensible testbed for studying the impact of different roles and decision mechanisms in peer review. It follows the procedures of popular NLP and ML conferences, simulating reviewers, authors, and Area Chairs as LLM agents.

Key Roles & Dimensions

Reviewers

Modeled along three orthogonal attributes that shape behavior throughout the review process:

Commitment: responsible ↔ irresponsible
Intention: benign ↔ malicious
Knowledgeability: knowledgeable ↔ unknowledgeable
Area Chairs (ACs)

Responsible for final decisions. We study three contrasting decision styles:

Authoritarian Prioritize their own evaluations over the collective input from reviewers.
Conformist Rely heavily on reviewers' evaluations, minimizing their own influence.
Inclusive Consider all available discussion: reviews, author rebuttals, and reviewer comments.
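The role configurations above can be sketched as a small data model. This is an illustrative sketch only; the names (`ReviewerProfile`, `ACStyle`, `system_prompt`) are hypothetical and not the framework's actual API.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product

class ACStyle(Enum):
    AUTHORITARIAN = "authoritarian"  # prioritizes the AC's own evaluation
    CONFORMIST = "conformist"        # defers to reviewer scores
    INCLUSIVE = "inclusive"          # weighs reviews, rebuttals, and discussion

@dataclass(frozen=True)
class ReviewerProfile:
    """Three orthogonal binary attributes that shape reviewer behavior."""
    responsible: bool    # commitment: responsible vs. irresponsible
    benign: bool         # intention: benign vs. malicious
    knowledgeable: bool  # knowledgeability: knowledgeable vs. unknowledgeable

    def system_prompt(self) -> str:
        # A controlled experiment varies exactly one attribute at a time.
        traits = [
            "responsible" if self.responsible else "irresponsible",
            "benign" if self.benign else "malicious",
            "knowledgeable" if self.knowledgeable else "unknowledgeable",
        ]
        return f"You are a {', '.join(traits)} reviewer."

# The three binary attributes yield 2**3 = 8 reviewer archetypes.
archetypes = [ReviewerProfile(*flags) for flags in product([True, False], repeat=3)]
```

Because the attributes are orthogonal, each can be toggled independently while the others are held fixed, which is what makes the latent factors separable in simulation.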

Peer Review Process Pipeline

Our simulation follows a structured 5-phase pipeline modeled after major ML and NLP venues.

Visualization of the 5-phase AgentReview peer review pipeline showing reviewer assessment, author rebuttal, reviewer-AC discussion, meta-review, and final decision.

Figure 2. The 5-phase pipeline for AgentReview.

1. Reviewer Assessment: Three reviewers independently evaluate each manuscript.
2. Author–Reviewer Discussion: Authors submit rebuttals addressing reviewer concerns.
3. Reviewer–AC Discussion: The AC facilitates discussion; reviewers may update their assessments.
4. Meta-Review Compilation: The AC synthesizes the full discussion into a meta-review.
5. Paper Decision: The AC makes the final accept/reject decision based on all inputs.

A fixed acceptance rate of 32% matches the actual average for ICLR 2020–2023.
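The five phases above can be sketched as a single loop. This is a minimal illustration under stated assumptions: `llm()` stands in for a real LLM call, scores are randomized placeholders, and the simple mean-score threshold is not the framework's actual decision rule (which calibrates decisions to the 32% acceptance rate).

```python
import random

def llm(prompt: str) -> str:
    """Stub standing in for an LLM completion call."""
    return f"[response to: {prompt[:40]}...]"

def review_paper(paper: str, n_reviewers: int = 3, accept_threshold: float = 6.0) -> dict:
    # Phase 1: independent reviewer assessments (scores stubbed at random).
    reviews = [{"text": llm(f"Review this paper: {paper}"),
                "score": random.uniform(3, 9)} for _ in range(n_reviewers)]
    # Phase 2: author rebuttal addressing reviewer concerns.
    rebuttal = llm(f"Rebut these reviews: {[r['text'] for r in reviews]}")
    # Phase 3: reviewer-AC discussion; reviewers may revise their scores.
    for r in reviews:
        r["score"] += random.uniform(-0.5, 0.5)  # stand-in for score revision
    # Phase 4: the AC compiles a meta-review from the full discussion.
    meta = llm(f"Summarize the reviews, rebuttal, and discussion: {rebuttal}")
    # Phase 5: final decision from all inputs (simple mean-score rule here).
    avg = sum(r["score"] for r in reviews) / n_reviewers
    return {"meta_review": meta,
            "decision": "accept" if avg >= accept_threshold else "reject"}
```

Each phase consumes the artifacts of the previous one, which is what lets the simulation trace how an intervention in one phase (e.g., a biased reviewer in Phase 1) propagates to the final decision.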

04 · Findings

Insights from Peer Review Simulations

We use AgentReview to simulate peer review with real ICLR papers (2020–2023) and probe sociological theories that explain how latent factors shape review outcomes.


Figure 3. Distribution of reasons to accept and reject, broken down across reviewer configurations.

F 01 Social Influence

Social Influence Theory suggests that individuals in a group revise their beliefs toward a common viewpoint. In AgentReview, reviewers shift their ratings toward those of their peers during the discussion phases (Phase 3: Reviewer–AC Discussion).

−27.2%  decrease in the standard deviation of ratings after Reviewer–AC discussion.
Average ratings as the number of biased reviewers varies.
F 02 Altruism Fatigue & Peer Effect

Peer review is unpaid and time-consuming. Altruism Fatigue describes how this demanding nature can lead to superficial assessments when reviewers feel their efforts go unrecognized.

−18.7%  drop in engagement during the Reviewer–AC discussion when just one irresponsible reviewer joins the panel.
F 03 Groupthink & Echo Chamber

When biased reviewers dominate, Groupthink can drive a panel to consensus without rigorous evaluation. Biased reviewers reinforce each other's negative opinions during Phase III, and their negativity spills over to unbiased peers.

−0.17 drop in ratings among biased reviewers  ·  −0.25 spillover decrease overall.
F 04 Authority Bias

Reviewers may give more favorable ratings to papers from well-known authors. When reviewers were aware of author identities, paper decisions shifted measurably, highlighting how prestige can unjustly influence the review process.

27.7%  of paper decisions change when author identities are revealed to reviewers.
Average ratings as the number of identity-aware reviewers varies.
05 · Implications

Implications for the Future of Peer Review

Better Design of Review Systems

AgentReview offers key insights into latent factors like bias, expertise, and motivation. Future review processes could incorporate checks on reviewer expertise and commitment, moving toward more equitable, transparent, and efficient peer review systems.

Addressing Privacy Concerns

By simulating the peer review process, AgentReview ensures privacy and ethical standards without accessing sensitive real-world data, allowing large-scale studies that respect reviewer anonymity while offering actionable insights into improving peer review mechanisms.

Adaptive Review Mechanisms

AgentReview's simulations enable adaptive mechanisms, such as adjusting the number of reviewers or the process length based on a paper's complexity. This flexibility maintains rigor while streamlining the review process.

Cross-Disciplinary Insights

AgentReview can simulate peer review dynamics in interdisciplinary fields, identifying challenges where reviewers from different domains evaluate the same paper. These insights could help design better processes for interdisciplinary journals, ensuring fair, balanced assessments.

06 · Conclusion

Toward Fairer Peer Review

AgentReview marks significant progress in understanding peer review dynamics. Through LLM-driven simulations, we reveal how biases, expertise, and social factors impact academic evaluations. These insights lay the groundwork for enhancing the fairness and integrity of peer review.

Call to action. We encourage researchers, editors, and policymakers to explore AgentReview's findings and consider how AI can help create a more transparent and equitable academic publishing process.

07 · Cite

BibTeX

If you find this work useful, please consider citing us.

@inproceedings{jin-etal-2024-agentreview,
  title     = "{A}gent{R}eview: Exploring Peer Review Dynamics with {LLM} Agents",
  author    = "Jin, Yiqiao and Zhao, Qinlin and Wang, Yiyang and Chen, Hao
               and Zhu, Kaijie and Xiao, Yijia and Wang, Jindong",
  editor    = "Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung",
  booktitle = "Proceedings of the 2024 Conference on Empirical Methods
               in Natural Language Processing",
  month     = nov,
  year      = "2024",
  address   = "Miami, Florida, USA",
  publisher = "Association for Computational Linguistics",
  url       = "https://aclanthology.org/2024.emnlp-main.70",
  pages     = "1208--1226",
}