Exploring Peer Review Dynamics with LLM Agents
Peer review is a cornerstone of academic publishing, ensuring the integrity, novelty, and accuracy of published research. However, peer review faces challenges such as reviewer biases, inconsistent assessments, and concerns regarding the design of review mechanisms. These issues can undermine the fairness and integrity of scientific evaluation, especially given the increasing volume of academic submissions.
Traditional studies of peer review often rely on statistical analyses of past reviews, which struggle to fully capture the multivariate nature of peer review — entangled factors like reviewer expertise, motivation, and bias contribute jointly to review outcomes, making them difficult to study in isolation. Moreover, ethical and privacy concerns further complicate the investigation of real-world peer review data.
In our recent work, accepted as a main track (Oral) paper at EMNLP 2024, we introduce AgentReview, the first large language model (LLM)-based framework designed to simulate the peer review process. AgentReview allows for the controlled simulation of peer review dynamics using LLM agents, enabling researchers to explore biases, reviewer roles, and decision mechanisms in a way that respects privacy while providing actionable insights into how the peer review process can be improved.
Figure 1. AgentReview is an open and flexible framework designed to realistically simulate the peer review process. It enables controlled experiments to disentangle multiple variables in peer review, allowing for an in-depth examination of their effects on review outcomes.
A short walkthrough of the AgentReview framework and its key findings on peer review dynamics.
AgentReview provides a flexible, extensible testbed for studying the impact of different roles and decision mechanisms in peer review. It follows the procedures of popular NLP and ML conferences, simulating reviewers, authors, and Area Chairs as LLM agents.
Reviewers are modeled along three orthogonal attributes that shape their behavior throughout the review process.
Area Chairs are responsible for final decisions. We study three contrasting decision styles.
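Put together, these roles might be sketched as follows. This is a minimal illustration, not the framework's actual API; the specific attribute values and decision-style names below are our own assumptions for the example.

```python
from dataclasses import dataclass


@dataclass
class ReviewerAgent:
    # Three orthogonal attributes shaping reviewer behavior.
    # The concrete values here are illustrative assumptions.
    commitment: str = "responsible"          # vs. e.g. "irresponsible"
    intention: str = "benign"                # vs. e.g. "malicious"
    knowledgeability: str = "knowledgeable"  # vs. e.g. "unknowledgeable"

    def system_prompt(self) -> str:
        # The attributes are injected into the system prompt that
        # conditions the LLM agent's reviews.
        return (f"You are a {self.commitment}, {self.intention}, and "
                f"{self.knowledgeability} reviewer for an ML conference.")


@dataclass
class AreaChairAgent:
    # One of three contrasting decision styles (name is an assumption).
    style: str = "inclusive"

    def system_prompt(self) -> str:
        return (f"You are an Area Chair with a '{self.style}' decision "
                f"style, responsible for the final paper decision.")
```

Each agent's system prompt fixes its persona for the whole simulation, so a single attribute can be varied while everything else is held constant.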
Our simulation follows a structured 5-phase pipeline modeled after major ML and NLP venues.
Figure 2. The 5-phase pipeline for AgentReview.
A fixed acceptance rate of 32% matches the actual average for ICLR 2020–2023.
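One way the decision phase can enforce this fixed rate is to rank papers by their average reviewer rating and accept the top 32%. A minimal sketch (the function name and data layout are ours, not the framework's):

```python
def decide_acceptance(avg_ratings: dict[str, float],
                      rate: float = 0.32) -> dict[str, bool]:
    """Accept the top `rate` fraction of papers, ranked by average rating."""
    n_accept = round(len(avg_ratings) * rate)
    # Rank paper IDs from highest to lowest average rating.
    ranked = sorted(avg_ratings, key=avg_ratings.get, reverse=True)
    accepted = set(ranked[:n_accept])
    return {paper_id: paper_id in accepted for paper_id in avg_ratings}
```

Fixing the rate rather than a rating threshold keeps the accept/reject ratio comparable across simulated configurations, so shifts in outcomes reflect the manipulated variable rather than rating inflation.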
We use AgentReview to simulate peer review with real ICLR papers (2020–2023) and probe sociological theories that explain how latent factors shape review outcomes.
Figure 3. Distribution of reasons to accept and reject, broken down across reviewer configurations.
Social Influence Theory suggests that individuals in a group revise their beliefs toward a common viewpoint. In AgentReview, reviewers align their ratings with peers during the rebuttal phase (Phase III. Reviewer–Author Discussion).
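One way to quantify this convergence is to measure how far each reviewer's rating sits from the mean of the other reviewers' ratings, before and after the discussion. A toy illustration (the ratings below are made up, not simulation results):

```python
def peer_divergence(ratings: list[float]) -> float:
    """Mean absolute gap between each rating and the mean of the others."""
    n = len(ratings)
    total = sum(ratings)
    return sum(abs(r - (total - r) / (n - 1)) for r in ratings) / n


before = [3.0, 5.0, 8.0]  # initial ratings from a 3-reviewer panel
after = [4.0, 5.0, 7.0]   # post-rebuttal ratings drifting toward the mean
```

A drop in this divergence after the rebuttal phase is the signature of reviewers revising their scores toward the panel's common viewpoint.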
Peer review is unpaid and time-consuming. Altruism Fatigue describes how this demanding nature can lead to superficial assessments when reviewers feel their efforts go unrecognized.
When biased reviewers dominate, Groupthink can drive a panel to consensus without rigorous evaluation. Biased reviewers reinforce each other's negative opinions during Phase III, and their negativity spills over to unbiased peers.
Reviewers may give more favorable ratings to papers from well-known authors. When reviewers were aware of author identities, paper decisions shifted measurably, highlighting how prestige can unjustly influence the review process.
AgentReview offers key insights into latent factors like bias, expertise, and motivation. Future review processes could incorporate checks for reviewer expertise and commitment, reducing biases and moving toward more equitable, transparent, and efficient peer review.
By simulating the peer review process, AgentReview ensures privacy and ethical standards without accessing sensitive real-world data, allowing large-scale studies that respect reviewer anonymity while offering actionable insights into improving peer review mechanisms.
AgentReview's simulations enable adaptive mechanisms, such as adjusting the number of reviewers or the process length based on a paper's complexity. This flexibility maintains rigor while streamlining the review process.
AgentReview can simulate peer review dynamics in interdisciplinary fields, identifying challenges where reviewers from different domains evaluate the same paper. These insights could help design better processes for interdisciplinary journals, ensuring fair, balanced assessments.
AgentReview marks significant progress in understanding peer review dynamics. Through LLM-driven simulations, we reveal how biases, expertise, and social factors impact academic evaluations. These insights lay the groundwork for enhancing the fairness and integrity of peer review.
Call to action. We encourage researchers, editors, and policymakers to explore AgentReview's findings and consider how AI can help create a more transparent and equitable academic publishing process.
If you find this work useful, please consider citing us.
@inproceedings{jin-etal-2024-agentreview,
title = "{A}gent{R}eview: Exploring Peer Review Dynamics with {LLM} Agents",
author = "Jin, Yiqiao and Zhao, Qinlin and Wang, Yiyang and Chen, Hao
and Zhu, Kaijie and Xiao, Yijia and Wang, Jindong",
editor = "Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods
in Natural Language Processing",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-main.70",
pages = "1208--1226",
}