RELiC: Retrieving Evidence for Literary Claims
(ACL 2022)


RELiC is a large scale dataset of 79k excerpts of literary scholarship, each containing a quotation from a primary source and the surrounding critical analysis. 79 public domain primary sources and over 8,836 secondary sources are represented in RELiC.

We have fine-tuned a dense retriever, dense-RELiC, that outperforms both pre-trained dense retrievers (DPR, SIM, c-REALM) and sparse retrievers (BM25). However, experiments and analysis by humans with English degrees indicate substantial room for improvement.

The Literary Evidence Retrieval Task

We describe the task of literary evidence retrieval as follows:

Given an excerpt of analysis that surrounds a masked out quote from a literary work, can a model retrieve the correct quote from the set of all passages in the work?

Example from Pride and Prejudice of the task of literary evidence retrieval.


Link to paper

  author={Katherine Thai and Yapei Chang and Kalpesh Krishna and Mohit Iyyer},
  Booktitle = {Association of Computational Linguistics},
  Year = "2022",
  Title={RELiC: Retrieving Evidence for Literary Claims}