Calibrated Large Language Models for Binary Question Answering

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

Quantifying the uncertainty of predictions made by large language models (LLMs) in binary text classification tasks remains a challenge. Calibration, in the context of LLMs, refers to the alignment between the model's predicted probabilities and the actual correctness of its predictions. A well-calibrated model should produce probabilities that accurately reflect the likelihood of its predictions being correct. We propose a novel approach that utilizes the inductive Venn--Abers predictor (IVAP) to calibrate the probabilities associated with the output tokens corresponding to the binary labels. Our experiments on the BoolQ dataset using the Llama 2 model demonstrate that IVAP consistently outperforms the commonly used temperature scaling method for various label token choices, achieving well-calibrated probabilities while maintaining high predictive quality. Our findings contribute to the understanding of calibration techniques for LLMs and provide a practical solution for obtaining reliable uncertainty estimates in binary question answering tasks, enhancing the interpretability and trustworthiness of LLM predictions.

Related collections

Author and article information

Journal

Publication date Created: 01 July 2024

Article

ArXiV ID: 2407.01122

SO-VID: 517bb969-b937-4c12-91a2-4cf6624690a0

License:

http://creativecommons.org/licenses/by/4.0/

History

Custom metadata

Comments Accepted to COPA 2024 (13th Symposium on Conformal and Probabilistic Prediction with Applications)

Categories cs.CL cs.LG

ScienceOpen disciplines: Theoretical computer science,Artificial intelligence

Data availability:

ScienceOpen disciplines: Theoretical computer science, Artificial intelligence

Calibrated Large Language Models for Binary Question Answering

Read this article at

Abstract

Related collections

Semantic Knowledge Base

Author and article information

Journal

Article

History

Custom metadata

Comments

Comment on this article

Similar content 228