RELiC: Retrieving Evidence for Literary Claims
(ACL 2022)
About
RELiC is a large scale dataset of 79k excerpts of literary scholarship, each containing
a quotation from a primary source and the surrounding critical analysis. 79 public domain primary sources
and over 8,836 secondary sources are represented in RELiC.
We have fine-tuned a dense retriever, dense-RELiC, that outperforms both pre-trained dense retrievers (DPR, SIM, c-REALM)
and sparse retrievers (BM25). However, experiments and analysis by humans with English degrees indicate substantial room for improvement.
The Literary Evidence Retrieval Task
We describe the task of literary evidence retrieval as follows:
Given an excerpt of analysis that surrounds a masked out quote from a literary work, can a model retrieve the correct quote from the set of all passages in the work?
Paper
Link to paper@inproceedings{relic22,
author={Katherine Thai and Yapei Chang and Kalpesh Krishna and Mohit Iyyer},
Booktitle = {Association of Computational Linguistics},
Year = "2022",
Title={RELiC: Retrieving Evidence for Literary Claims}
}