
      Substantial Agreement of Referee Recommendations at a General Medical Journal – A Peer Review Evaluation at Deutsches Ärzteblatt International

      research-article


          Abstract

          Background

          Peer review is the mainstay of editorial decision making for medical journals. There is a dearth of evaluations of journal peer review with regard to reliability and validity, particularly in the light of the wide variety of medical journals. Studies carried out so far indicate low agreement among reviewers. We present an analysis of the peer review process at a general medical journal, Deutsches Ärzteblatt International.

          Methodology/Principal Findings

554 reviewer recommendations on 206 manuscripts submitted between 7/2008 and 12/2009 were analyzed: 7% recommended acceptance, 74% revision, and 19% rejection. Concerning acceptance (with or without revision) versus rejection, there was substantial agreement among reviewers (74.3% of pairs of recommendations) that was not reflected by Fleiss' or Cohen's kappa (<0.2). The agreement rate amounted to 84% for acceptance but only 31% for rejection. An alternative kappa statistic, Gwet's AC1, however, indicated substantial agreement (0.63). Concordance between reviewer recommendation and editorial decision was almost perfect when reviewer recommendations were unanimous. The correlation of reviewer recommendations with citations as counted by Web of Science was low (partial correlation adjusted for year of publication: −0.03, n.s.).

          Conclusions/Significance

Although our figures are similar to those reported in the literature, our conclusion differs from the widely held view that reviewer agreement is low: based on overall agreement, we consider the concordance among reviewers sufficient for the purposes of editorial decision making. We believe that various measures, such as positive and negative agreement or alternative kappa statistics, are superior to Cohen's or Fleiss' kappa in the analysis of nominal- or ordinal-level data regarding reviewer agreement. Also, reviewer recommendations seem to be a poor proxy for citations because, for example, manuscripts may be changed considerably during the revision process.
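For readers who want to see how raw agreement, Cohen's kappa, and Gwet's AC1 can diverge in this way, the following minimal sketch (with purely hypothetical counts chosen to mimic the skewed marginals typical of peer review, not the study's data) computes all three for two reviewers making a binary accept/reject call on the same manuscripts:

```python
# Minimal sketch (not the authors' code): contrast raw agreement,
# Cohen's kappa, and Gwet's AC1 for two reviewers making a binary
# accept/reject recommendation. The counts are hypothetical.

def agreement_stats(a, b, c, d):
    """a = both accept, b = R1 accept/R2 reject,
    c = R1 reject/R2 accept, d = both reject."""
    n = a + b + c + d
    po = (a + d) / n                     # observed (raw) agreement

    # Cohen's kappa: chance agreement from each rater's own marginals
    p1_acc, p2_acc = (a + b) / n, (a + c) / n
    pe_cohen = p1_acc * p2_acc + (1 - p1_acc) * (1 - p2_acc)
    kappa = (po - pe_cohen) / (1 - pe_cohen)

    # Gwet's AC1 (binary case): chance agreement from the averaged marginal
    pi_acc = (p1_acc + p2_acc) / 2
    pe_gwet = 2 * pi_acc * (1 - pi_acc)
    ac1 = (po - pe_gwet) / (1 - pe_gwet)

    return po, kappa, ac1

po, kappa, ac1 = agreement_stats(a=170, b=12, c=14, d=4)
print(f"raw agreement = {po:.2f}, Cohen's kappa = {kappa:.2f}, Gwet's AC1 = {ac1:.2f}")
# With these skewed marginals, raw agreement (~0.87) and AC1 (~0.85) are
# substantial while Cohen's kappa stays below 0.2 -- the pattern the
# abstract describes.
```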

          Related collections

Most cited references (11)


          High agreement but low kappa: II. Resolving the paradoxes.

An omnibus index offers a single summary expression for a fourfold table of binary concordance among two observers. Among the available other omnibus indexes, none offers a satisfactory solution for the paradoxes that occur with p0 and kappa. The problem can be avoided only by using ppos and pneg as two separate indexes of proportionate agreement in the observers' positive and negative decisions. These two indexes, which are analogous to sensitivity and specificity for concordance in a diagnostic marker test, create the paradoxes formed when the chance correction in kappa is calculated as a product of the increment in the two indexes and the increment in marginal totals. If only a single omnibus index is used to compare different performances in observer variability, the paradoxes of kappa are desirable since they appropriately "penalize" inequalities in ppos and pneg. For better understanding of results and for planning improvements in the observers' performance, however, the omnibus value of kappa should always be accompanied by separate individual values of ppos and pneg.
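A minimal sketch (hypothetical counts, not from the paper) of the two separate indexes of proportionate agreement described above, often written p_pos and p_neg, computed from a fourfold table:

```python
# Minimal sketch: positive and negative specific agreement (p_pos, p_neg)
# for a 2x2 table of two raters, alongside raw agreement. Counts are
# hypothetical and chosen only for illustration.

def specific_agreement(a, b, c, d):
    """a = both raters positive, d = both negative, b/c = disagreements."""
    p_pos = 2 * a / (2 * a + b + c)   # agreement specific to positive calls
    p_neg = 2 * d / (2 * d + b + c)   # agreement specific to negative calls
    p_o = (a + d) / (a + b + c + d)   # overall (raw) agreement
    return p_pos, p_neg, p_o

p_pos, p_neg, p_o = specific_agreement(a=170, b=12, c=14, d=4)
print(f"p_pos = {p_pos:.2f}, p_neg = {p_neg:.2f}, overall = {p_o:.2f}")
# Reporting p_pos and p_neg side by side makes visible what a single
# omnibus kappa hides: agreement here is high for positive calls but
# poor for negative ones.
```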

            A Reliability-Generalization Study of Journal Peer Reviews: A Multilevel Meta-Analysis of Inter-Rater Reliability and Its Determinants

Background

This paper presents the first meta-analysis for the inter-rater reliability (IRR) of journal peer reviews. IRR is defined as the extent to which two or more independent reviews of the same scientific document agree.

Methodology/Principal Findings

Altogether, 70 reliability coefficients (Cohen's Kappa, intra-class correlation [ICC], and Pearson product-moment correlation [r]) from 48 studies were taken into account in the meta-analysis. The studies were based on a total of 19,443 manuscripts; on average, each study had a sample size of 311 manuscripts (minimum: 28, maximum: 1983). The results of the meta-analysis confirmed the findings of the narrative literature reviews published to date: the level of IRR (mean ICC/r2 = .34, mean Cohen's Kappa = .17) was low. To explain the study-to-study variation of the IRR coefficients, meta-regression analyses were calculated using seven covariates. Two covariates emerged as statistically significant in reaching approximate homogeneity of the intra-class correlations: first, the more manuscripts a study is based on, the smaller the reported IRR coefficients are; second, if a study reported the rating system used by reviewers, this was associated with a smaller IRR coefficient than if that information was not conveyed.

Conclusions/Significance

Studies that report a high level of IRR are to be considered less credible than those with a low level of IRR. According to our meta-analysis, the IRR of peer assessments is quite limited and needs improvement (e.g., reader system).
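As a rough illustration of what pooling reliability coefficients across studies involves, the sketch below applies a naive fixed-effect average of correlations via Fisher's z, weighted by study size. This is a simplification for illustration only, not the multilevel meta-analytic model used in the paper, and the (r, n) pairs are hypothetical:

```python
# Minimal sketch (not the authors' multilevel model): a naive fixed-effect
# pooling of inter-rater reliability correlations via Fisher's z, weighted
# by study size. The (r, n) pairs below are hypothetical.

import math

def pooled_r(studies):
    """studies: iterable of (r, n) pairs; returns the weighted mean r."""
    num = den = 0.0
    for r, n in studies:
        z = math.atanh(r)      # Fisher's z-transform of the correlation
        w = n - 3              # standard inverse-variance weight for z
        num += w * z
        den += w
    return math.tanh(num / den)

print(f"pooled IRR ~ {pooled_r([(0.20, 300), (0.45, 60), (0.30, 150)]):.2f}")
```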

              The philosophical basis of peer review and the suppression of innovation.

              Peer review can be performed successfully only if those involved have a clear idea as to its fundamental purpose. Most authors of articles on the subject assume that the purpose of peer review is quality control. This is an inadequate answer. The fundamental purpose of peer review in the biomedical sciences must be consistent with that of medicine itself, to cure sometimes, to relieve often, to comfort always. Peer review must therefore aim to facilitate the introduction into medicine of improved ways of curing, relieving, and comforting patients. The fulfillment of this aim requires both quality control and the encouragement of innovation. If an appropriate balance between the two is lost, then peer review will fail to fulfill its purpose.

                Author and article information

                Contributors
                Role: Editor
                Journal
PLoS ONE
Public Library of Science (San Francisco, USA)
ISSN: 1932-6203
Published: 2 May 2013
PLoS ONE 8(5): e61401
                Affiliations
                [1 ]Deutsches Ärzteblatt International, Editorial Offices, Cologne, Germany
                [2 ]Department of Psychiatry and Psychotherapy, University of Cologne Medical School, Cologne, Germany
                [3 ]Institute of Medical Statistics, Informatics, and Epidemiology, University of Cologne Medical School, Cologne, Germany
                The University of Edinburgh, United Kingdom
                Author notes

Competing Interests: Christopher Baethge and Stephan Mertens are employed by Deutsches Ärzteblatt International. This does not alter the authors' adherence to all the PLOS ONE policies on sharing data and materials, as detailed online in the guide for authors.

                Conceived and designed the experiments: CB JF SM. Performed the experiments: CB SM. Analyzed the data: CB JF. Contributed reagents/materials/analysis tools: CB JF. Wrote the paper: CB JF SM.

Article
Manuscript ID: PONE-D-12-35353
DOI: 10.1371/journal.pone.0061401
PMCID: PMC3642182
PMID: 23658692
Copyright © 2013

                This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History
Received: 10 November 2012
Accepted: 6 March 2013
                Page count
                Pages: 7
                Funding
                The authors have no support or funding to report.
                Categories
                Research Article
                Medicine
                Clinical Research Design
                Statistical Methods
                Non-Clinical Medicine
                Academic Medicine
                Medical Communication
                Medical Journals
                Science Policy
                Research Assessment
                Bibliometrics
                Peer Review
                Publication Practices
                Research Validity

