The [L/R] next to a model name gives the number of left and right context sentences the model was trained with. Rank is ordered by recall@5.
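Since ranking is by recall@5, here is a minimal sketch of how recall@k is typically computed for a retrieval leaderboard like this one (the function name and the assumed definition, the fraction of queries whose gold passage appears in the model's top-k candidates, are ours, not from the RELiC codebase):

```python
def recall_at_k(ranked_ids, gold_ids, k):
    """Fraction of queries whose gold ID is in the top-k ranked candidates.

    ranked_ids: one ranked list of candidate IDs per query.
    gold_ids: the single correct candidate ID per query.
    """
    hits = sum(
        1 for ranked, gold in zip(ranked_ids, gold_ids)
        if gold in ranked[:k]
    )
    return hits / len(gold_ids)

# Toy example: 2 of 3 queries have the gold ID within the top 2.
ranked = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"]]
gold = ["b", "z", "p"]
print(round(recall_at_k(ranked, gold, 2), 3))  # → 0.667
```

Note that recall@k is non-decreasing in k, which is why each row's numbers grow from left to right in the tables below.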
Zero-Shot Setting
These models were deployed in a zero-shot manner (i.e. not trained on the RELiC dataset).
Rank | Model [L/R] | Contributors | recall@1 | recall@3 | recall@5 | recall@10 | recall@50 | recall@100
---|---|---|---|---|---|---|---|---
1 | RankGen [4/0] | Anonymous Razorbill | | | | | |
 | PG-XL-inbk | ^ | 6.0 | 12.2 | 15.4 | 20.7 | 37.3 | 46.1
 | all-XL-both | ^ | 4.9 | 9.2 | 11.9 | 16.5 | 31.5 | 39.9
 | PG-XL-both | ^ | 4.5 | 8.4 | 11.0 | 15.1 | 27.9 | 35.0
 | PG-base-both | ^ | 3.7 | 7.3 | 9.8 | 13.8 | 29.1 | 38.3
 | PG-XL-gen | ^ | 0.7 | 1.9 | 2.7 | 4.1 | 9.1 | 12.8
2 | ColBERT [1/1] | Khattab and Zaharia, 2020* | 2.9 | 6.0 | 7.8 | 11.0 | 21.4 | 27.9
3 | c-REALM [1/1] | Krishna et al., 2021* | 1.6 | 3.5 | 4.8 | 7.1 | 15.9 | 21.7
4 | DPR [1/1] | Karpukhin et al., 2020* | 1.3 | 3.0 | 4.3 | 6.6 | 15.4 | 22.2
5 | BM25 [1/1] | Robertson et al., 1995* | 1.2 | 3.2 | 4.2 | 5.9 | 12.5 | 17.0
6 | SIM [1/1] | Wieting et al., 2019* | 1.3 | 2.8 | 3.8 | 5.6 | 13.4 | 18.8
Trained Setting
These models were trained on the RELiC dataset.
Rank | Model [L/R] | Contributors | recall@1 | recall@3 | recall@5 | recall@10 | recall@50 | recall@100
---|---|---|---|---|---|---|---|---
1 | dense-RELiC [4/4] | RELiC team, 2022 | 9.4 | 18.3 | 24.0 | 32.4 | 51.3 | 60.8
* Baseline reported in the RELiC paper.