Relevance Judgment Convergence Degree—A Measure of Assessors Inconsistency for Information Retrieval Datasets
Conference paper

Dengya Zhu, S. L. Nimmagadda, Kok Wai Wong and T. Reiners
30th International Conference on Information Systems Development (ISD 2022) (Cluj-Napoca, Romania, 31/08/2022–02/09/2022)
2022

Abstract

The quality of training and testing datasets is critical when a model is trained and evaluated on annotated data. In Information Retrieval (IR), human experts annotate whether documents are relevant to a given query. The relevance judgment of human assessors is inherently subjective and dynamic. However, the relevance judgments of a small group of experts are usually taken as ground truth to “objectively” evaluate the performance of an IR system. A recent trend is to employ a group of judges, for example through outsourcing, to alleviate the potentially biased results that stem from relying on a single expert’s judgment. Nevertheless, judges may disagree with one another, and this inconsistency in human relevance judgment may affect IR system evaluation results. Further, previous research has focused mainly on the quality of documents rather than on the quality of the queries submitted to an IR system. In this research, we introduce the Relevance Judgment Convergence Degree (RJCD) to measure the quality of queries in evaluation datasets. Experimental results reveal a strong correlation between the proposed RJCD score and the performance differences between two IR systems.
