AgentReview🔍: Exploring Academic Peer Review with LLM Agents

1Georgia Institute of Technology, 2University of Science and Technology of China,
3Carnegie Mellon University, 4University of California, Santa Barbara,
5University of California, Los Angeles, 6William & Mary

Abstract

Peer review is a cornerstone of academic publishing, ensuring the integrity, novelty, and accuracy of published research. However, peer review faces challenges such as reviewer biases, inconsistent assessments, and concerns regarding the design of review mechanisms. These issues can undermine the fairness and integrity of scientific evaluation, especially given the increasing volume of academic submissions.

Traditional studies of these issues often rely on statistical analyses of past reviews, which struggle to capture the multivariate nature of peer review: entangled factors such as reviewer expertise, motivation, and bias jointly shape review outcomes, making them difficult to study in isolation. Moreover, ethical and privacy concerns further complicate the investigation of real-world peer review data.

In our recent work, accepted as a main track (Oral) paper at EMNLP 2024, we introduce AgentReview, the first large language model (LLM)-based framework designed to simulate the peer review process. AgentReview allows for the controlled simulation of peer review dynamics using LLM agents, enabling researchers to explore biases, reviewer roles, and decision mechanisms in a way that respects privacy while providing actionable insights into how the peer review process can be improved.

Figure 1: AgentReview is an open and flexible framework designed to realistically simulate the peer review process. It enables controlled experiments to disentangle multiple variables in peer review, allowing for an in-depth examination of their effects on review outcomes.

AgentReview

AgentReview provides a flexible and extensible testbed for studying the impact of different roles and decision mechanisms in peer review. It follows the review procedures of popular Natural Language Processing (NLP) and Machine Learning (ML) conferences.

The framework simulates the roles of reviewers, authors, and Area Chairs (ACs) as LLM agents, allowing us to observe how various configurations lead to different simulation outcomes.

Key Roles in AgentReview

AgentReview integrates three roles, all powered by LLM agents: reviewers, authors, and ACs. A minimal configuration sketch follows the list below.

  1. Reviewers are modeled based on three key attributes:
    • Commitment measures the reviewer's dedication and sense of responsibility. (responsible vs. irresponsible reviewers)
    • Intention describes the motivation behind the reviews, i.e., whether the reviewer genuinely aims to help authors improve their papers. (benign vs. malicious reviewers)
    • Knowledgeability measures the reviewer's expertise in the manuscript's subject area. (knowledgeable vs. unknowledgeable reviewers)
  2. Authors participate by submitting rebuttals that address the reviewers' concerns.
  3. Area Chairs (ACs) are responsible for final decisions. We categorize ACs into three styles:
    • Authoritarian ACs prioritize their own evaluations over the collective input from reviewers;
    • Conformist ACs rely heavily on the reviewers' evaluations, minimizing their own influence;
    • Inclusive ACs consider all available discussion and feedback, including reviews, author rebuttals, and reviewer comments, for final decisions.
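
Concretely, each reviewer persona can be realized by conditioning an LLM agent on a system prompt assembled from these attributes. Below is a minimal sketch; the `ReviewerPersona` class and its fields are hypothetical illustrations, not the framework's actual API.

```python
from dataclasses import dataclass

@dataclass
class ReviewerPersona:
    """Hypothetical persona settings used to steer a reviewer agent."""
    commitment: str        # "responsible" or "irresponsible"
    intention: str         # "benign" or "malicious"
    knowledgeability: str  # "knowledgeable" or "unknowledgeable"

    def system_prompt(self) -> str:
        # Each attribute becomes part of the system prompt for the LLM agent.
        return (
            f"You are a {self.commitment}, {self.intention}, and "
            f"{self.knowledgeability} reviewer. Assess the manuscript accordingly."
        )

# Example: a responsible, benign, but unknowledgeable reviewer.
reviewer = ReviewerPersona("responsible", "benign", "unknowledgeable")
print(reviewer.system_prompt())
```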

Peer Review Process Pipeline

Figure 2: The Pipeline for AgentReview.

Our simulation adopts a structured, 5-phase pipeline:

  1. Reviewer Assessment. Each manuscript is evaluated by three reviewers independently.
  2. Author-Reviewer Discussion. Authors submit rebuttals to address reviewers' concerns.
  3. Reviewer-AC Discussion. The AC facilitates discussions among reviewers, prompting updates to their initial assessments.
  4. Meta-Review Compilation. The AC synthesizes the discussions into a meta-review.
  5. Paper Decision. The AC makes the final decision on whether to accept or reject the paper, based on all gathered inputs.

We adopt a fixed acceptance rate of 32%, matching the actual average acceptance rate for ICLR 2020--2023; a code sketch of the full pipeline follows.
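
Concretely, the five phases can be viewed as a short orchestration loop. The sketch below is illustrative only: the agent objects and method names (`assess`, `rebut`, `discuss`, `write_meta_review`) are hypothetical stand-ins for LLM calls, not AgentReview's actual API.

```python
import numpy as np

ACCEPTANCE_RATE = 0.32  # fixed rate, matching the ICLR 2020--2023 average

def review_paper(paper, reviewers, author, area_chair):
    """Run Phases 1-4 for a single paper (all agent methods are hypothetical)."""
    # Phase 1: three reviewers assess the manuscript independently.
    reviews = [r.assess(paper) for r in reviewers]
    # Phase 2: the author responds to the reviews with a rebuttal.
    rebuttal = author.rebut(paper, reviews)
    # Phase 3: the AC leads a reviewer discussion; ratings may be revised.
    revised = area_chair.discuss(reviews, rebuttal)
    # Phase 4: the AC synthesizes everything into a meta-review.
    meta_review = area_chair.write_meta_review(revised, rebuttal)
    return np.mean([rev.rating for rev in revised]), meta_review

def final_decisions(scores):
    """Phase 5: accept the top 32% of papers by final score."""
    cutoff = np.quantile(scores, 1 - ACCEPTANCE_RATE)
    return [score >= cutoff for score in scores]
```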

Insights from Peer Review Simulations

Figure 3: Distribution of Reasons to Accept and Reject.

We use AgentReview to simulate the peer review process on real conference papers from ICLR 2020--2023, and explore several sociological theories that explain how multiple factors affect peer review outcomes.

Social Influence

Social Influence Theory suggests that individuals in a group tend to revise their beliefs towards a common viewpoint. In AgentReview, we observe that reviewers often align their ratings with their peers' during the Reviewer-AC discussion (Phase 3), leading to a 27.2% decrease in the standard deviation of ratings.
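
The 27.2% figure is the relative drop in the per-paper standard deviation of ratings after discussion. A toy illustration of the metric (the ratings below are made up, not data from the paper):

```python
import statistics

# Hypothetical ratings for one paper, before and after the discussion phase.
before = [3, 6, 8]  # initial independent ratings
after = [5, 6, 7]   # ratings after reviewers see each other's assessments

drop = 1 - statistics.stdev(after) / statistics.stdev(before)
print(f"std-dev decrease: {drop:.1%}")  # ratings converge toward a common view
```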

Figure 4: Average ratings with varying numbers of biased reviewers.

Altruism Fatigue and Peer Effect

Peer review is typically unpaid and time-consuming. Altruism Fatigue describes how this demanding work can lead to superficial assessments when reviewers feel their efforts go unrecognized. Including just one irresponsible reviewer led to an 18.7% drop in engagement during the Reviewer-AC discussion phase, with reviewers providing shorter, less detailed feedback.
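
Engagement here can be proxied by the length of reviewer feedback. A simple way to quantify the drop, assuming hypothetical review texts (the strings below are placeholders, not simulation output):

```python
def mean_words(reviews):
    """Average word count across a set of review texts."""
    return sum(len(r.split()) for r in reviews) / len(reviews)

def engagement_drop(baseline_reviews, treated_reviews):
    """Relative drop in mean review length between two settings."""
    return 1 - mean_words(treated_reviews) / mean_words(baseline_reviews)

baseline = ["The method is sound, but the ablation in Section 4 needs more seeds."]
treated = ["Looks fine to me."]
print(f"engagement drop: {engagement_drop(baseline, treated):.1%}")
```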

Groupthink and Echo Chamber Effects

When biased reviewers dominate discussions, Groupthink can occur: the group reaches consensus without critically evaluating the manuscript. Biased reviewers often reinforce each other's negative opinions during the Reviewer-AC discussion (Phase 3), resulting in a 0.17 drop in ratings among the biased reviewers themselves. This can also cause a spillover effect, where their negativity influences the assessments of unbiased reviewers, ultimately causing a 0.25 decrease in overall ratings.

Authority Bias

Reviewers may give more favorable ratings to papers from well-known authors. When reviewers were aware of the authors' identities, 27.7% of paper decisions changed, highlighting how prestige can unjustly influence the review process.
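
The 27.7% is the fraction of papers whose accept/reject decision flips between the anonymized and identity-revealed settings. Computing such a flip rate is straightforward; the decision lists below are illustrative, not actual results:

```python
def decision_change_rate(anonymous, identified):
    """Fraction of papers whose accept/reject decision differs between settings."""
    flips = sum(a != b for a, b in zip(anonymous, identified))
    return flips / len(anonymous)

# Illustrative decisions for eight papers (True = accept).
anonymous = [True, False, False, True, False, True, False, False]
identified = [True, True, False, True, False, True, True, False]
print(f"decisions changed: {decision_change_rate(anonymous, identified):.1%}")
```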

Figure 5: Average ratings when varying numbers of reviewers are aware of the authors' identities.

Implications for the Future of Peer Review

Towards Better Design of Peer Review Systems

AgentReview offers key insights into latent factors like bias, expertise, and motivation, informing the design of peer review systems. Future processes could incorporate stronger checks on reviewer expertise and commitment, reducing bias and moving toward more equitable, transparent, and efficient peer review.

Addressing Privacy Concerns with Simulation

By simulating the peer review process, AgentReview upholds privacy and ethical standards without accessing sensitive real-world review data, enabling large-scale studies that respect reviewer anonymity while offering actionable insights into improving peer review mechanisms.

Adaptive Peer Review Mechanisms

AgentReview's simulations enable adaptive mechanisms, such as adjusting the number of reviewers or the process length based on a paper’s complexity. This flexibility maintains rigor while streamlining the review process.

Cross-Disciplinary Insights

AgentReview can simulate peer review dynamics in interdisciplinary fields, identifying challenges where reviewers from different domains evaluate the same paper. These insights could help design better processes for interdisciplinary journals, ensuring fair, balanced assessments.

Conclusion

AgentReview marks significant progress in understanding peer review dynamics. Through LLM-driven simulations, we reveal how biases, expertise, and social factors impact academic evaluations. These insights lay the groundwork for enhancing the fairness and integrity of peer review.

Call to Action: We encourage researchers, editors, and policymakers to explore AgentReview's findings and consider how AI can help create a more transparent and equitable academic publishing process.

BibTeX

@inproceedings{jin-etal-2024-agentreview,
  title = "{A}gent{R}eview: Exploring Peer Review Dynamics with {LLM} Agents",
  author = "Jin, Yiqiao  and
    Zhao, Qinlin  and
    Wang, Yiyang  and
    Chen, Hao  and
    Zhu, Kaijie  and
    Xiao, Yijia  and
    Wang, Jindong",
  editor = "Al-Onaizan, Yaser  and
    Bansal, Mohit  and
    Chen, Yun-Nung",
  booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
  month = nov,
  year = "2024",
  address = "Miami, Florida, USA",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2024.emnlp-main.70",
  pages = "1208--1226",
}