
[L/R] next to the model name is the number of left and right context sentences the model was trained on. Models are ranked by recall@5.
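For reference, recall@k here is the fraction of queries for which the gold quote appears among a model's top-k retrieved candidates. A minimal sketch of the metric (function names and the percentage scaling are illustrative, not taken from the RELiC codebase):

```python
def recall_at_k(ranked_ids, gold_id, k):
    # 1 if the gold candidate appears in the top-k retrieved ids, else 0
    return int(gold_id in ranked_ids[:k])

def mean_recall_at_k(all_ranked, all_gold, k):
    # Average the per-query hit indicator and report it as a percentage,
    # matching the scale used in the tables below.
    hits = [recall_at_k(r, g, k) for r, g in zip(all_ranked, all_gold)]
    return 100.0 * sum(hits) / len(hits)
```

For example, with two queries where only the first gold id is retrieved in the top 2, `mean_recall_at_k([[1, 2], [3, 4]], [2, 5], 2)` gives 50.0.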

Zero-Shot Setting

These models were evaluated in a zero-shot manner (i.e., not trained on the RELiC dataset).

Rank  Model            [L/R]  Contributors                 recall@k
                                                            1     3     5    10    50    100
1     RankGen          [4/0]  Anonymous Razorbill
        PG-XL-inbk                                          6.0  12.2  15.4  20.7  37.3  46.1
        all-XL-both                                         4.9   9.2  11.9  16.5  31.5  39.9
        PG-XL-both                                          4.5   8.4  11.0  15.1  27.9  35.0
        PG-base-both                                        3.7   7.3   9.8  13.8  29.1  38.3
        PG-XL-gen                                           0.7   1.9   2.7   4.1   9.1  12.8
2     ColBERT          [1/1]  Khattab and Zaharia, 2020*    2.9   6.0   7.8  11.0  21.4  27.9
3     c-REALM          [1/1]  Krishna et al., 2021*         1.6   3.5   4.8   7.1  15.9  21.7
4     DPR              [1/1]  Karpukhin et al., 2020*       1.3   3.0   4.3   6.6  15.4  22.2
5     BM25             [1/1]  Robertson et al., 1995*       1.2   3.2   4.2   5.9  12.5  17.0
6     SIM              [1/1]  Wieting et al., 2019*         1.3   2.8   3.8   5.6  13.4  18.8

Trained Setting

These models were trained on the RELiC dataset.

Rank  Model            [L/R]  Contributors                 recall@k
                                                            1     3     5    10    50    100
1     dense-RELiC      [4/4]  RELiC team, 2022              9.4  18.3  24.0  32.4  51.3  60.8

* Baseline reported in the RELiC paper