Tuning BM25F parameters
Watheq9 opened this issue
Hi @cmacdonald,
I was trying to tune the BM25F parameters. According to the documentation, BM25F is implemented as described by [Zaragoza TREC-2004]. In Zaragoza's paper, there are 'b' and 'w' parameters per field and one global 'k1' parameter. My questions are as follows:
- I figured out that the 'b' parameter is named 'c' in Terrier, and 'w' corresponds to 'w.i', where i is the field number (starting from 0). Is this mapping correct?
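```python
import pyterrier as pt
# `index` is assumed to be an already-loaded two-field Terrier index
b = 1
# 'w.i' is the weight and 'c.i' the length normalisation (BM25F's b) of field i, from 0
bm25f = pt.BatchRetrieve(index, wmodel='BM25F',
                         controls={'w.0': 1.0, 'w.1': 0.5,
                                   'c.0': b, 'c.1': b},
                         verbose=True)
```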
- For the 'k1' parameter, I could not find the corresponding name. Could you please let me know what it is?
Just CC-ing my supervisor, Dr. @JMMackenzie.
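To see where each parameter enters, here is a minimal from-scratch sketch of the per-term BM25F score as formulated by Zaragoza et al. (TREC 2004): each field's term frequency is length-normalised with that field's own b, multiplied by the field weight w, summed across fields, and only then saturated by the single global k1. The function name and argument layout are hypothetical, the idf is passed in rather than computed, and this is not Terrier's Java implementation, which may differ in details.

```python
def bm25f_term_score(tfs, lengths, avg_lengths, weights, bs, k1, idf):
    """Score one query term in one document, following the BM25F formulation of
    Zaragoza et al. (2004). All list arguments are per field, indexed from 0
    (mirroring the 'w.i' / 'c.i' numbering discussed in this thread)."""
    pseudo_tf = 0.0
    for tf, length, avg_len, w, b in zip(tfs, lengths, avg_lengths, weights, bs):
        if tf == 0:
            continue  # field contributes nothing; also avoids dividing by a zero norm
        # per-field length normalisation: (1 - b) + b * length / avg_len
        norm = 1.0 + b * (length / avg_len - 1.0)
        # the field weight scales the normalised frequency before fields are combined
        pseudo_tf += w * tf / norm
    # one global k1 saturates the combined pseudo-frequency
    return idf * pseudo_tf / (k1 + pseudo_tf)
```

For example, `bm25f_term_score([2, 1], [10, 100], [10.0, 100.0], [1.0, 0.5], [1.0, 1.0], k1=1.2, idf=1.0)` combines to a pseudo-frequency of 2.5 and returns 2.5 / 3.7 ≈ 0.676. Because k1 is applied after the fields are summed, it does not belong to any single field, which is why it appears as one global value rather than a 'k.i' per field.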
(1) Yes, this looks right.
(2) I don't think we have ever tuned k1 in BM25F. Six parameters were always enough!
Thanks for your reply, @cmacdonald!
What are the 6 parameters?
For each field, the normalisation value, i.e. b (called 'c' in Terrier), and the weight.
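With one normalisation value and one weight per field, the number of controls is twice the number of fields, e.g. six for a three-field index. The hypothetical helpers below (plain Python, not part of PyTerrier) sketch how one might build the controls dict from per-field lists using the 'w.i' and 'c.i' names confirmed above, and enumerate a naive grid of settings; each dict could then be passed as `controls=` to `pt.BatchRetrieve(index, wmodel='BM25F', ...)`.

```python
from itertools import product


def bm25f_controls(weights, bs):
    """Build a BM25F controls dict: 'w.i' is the weight and 'c.i' the length
    normalisation (BM25F's b) of field i, counting fields from 0."""
    if len(weights) != len(bs):
        raise ValueError("need exactly one weight and one b value per field")
    controls = {}
    for i, (w, b) in enumerate(zip(weights, bs)):
        controls[f"w.{i}"] = w
        controls[f"c.{i}"] = b
    return controls


def bm25f_grid(weight_values, b_values, num_fields):
    """Yield a controls dict for every combination of candidate values, with
    each field's weight and b chosen independently (an exhaustive grid)."""
    for weights in product(weight_values, repeat=num_fields):
        for bs in product(b_values, repeat=num_fields):
            yield bm25f_controls(weights, bs)
```

Usage: `list(bm25f_grid([0.5, 1.0, 2.0], [0.25, 0.5, 0.75], num_fields=3))` produces 3^6 = 729 settings. That growth is the practical argument against adding k1 as a seventh dimension; with more fields or finer grids, a coordinate-wise search over one parameter at a time is a common cheaper alternative.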
Hey Craig, thanks for the help!
Just double-checking: does this mean Terrier's BM25F doesn't include k at all, or is it just not exposed?
It's not exposed in BM25F, while it is in BM25.
See
https://github.com/terrier-org/terrier-core/blob/5.x/modules/core/src/main/java/org/terrier/matching/models/BM25.java#L45
vs
https://github.com/terrier-org/terrier-core/blob/5.x/modules/core/src/main/java/org/terrier/matching/models/basicmodel/BM.java#L48
Sorry @cmacdonald, but which normalisation parameters does PyTerrier expose, other than 'c'?
I tried setting 'b' and 'b.0' to several values, but neither changed the effectiveness at all.
If I'm not mistaken, the only exposed parameters are 'c' and the weight for each field. Please correct me if I'm wrong.
I'm not sure I follow the question. For BM25F, this is correct, right...?
```python
bm25f = pt.BatchRetrieve(index, wmodel='BM25F',
                         controls={'w.0': 1.0, 'w.1': 0.5,
                                   'c.0': b, 'c.1': b},
                         verbose=True)
```
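Since setting 'b' and 'b.0' reportedly changed nothing, a misspelled or unsupported control name can go unnoticed. The hypothetical client-side check below (not a PyTerrier feature) lists any keys that are not of the per-field form 'w.i' or 'c.i' for a valid field index. It also flags a bare 'c', which may or may not be meaningful to BM25F, so treat its output as a warning list rather than a verdict.

```python
import re

# Per-field control names discussed in this thread: weight 'w.i' and normalisation 'c.i'.
_PER_FIELD_KEY = re.compile(r"(w|c)\.(\d+)")


def unrecognised_bm25f_controls(controls, num_fields):
    """Return, in insertion order, the keys of `controls` that are not 'w.i' or
    'c.i' with 0 <= i < num_fields."""
    suspicious = []
    for key in controls:
        match = _PER_FIELD_KEY.fullmatch(key)
        if match is None or int(match.group(2)) >= num_fields:
            suspicious.append(key)
    return suspicious
```

For instance, checking `{'w.0': 1.0, 'c.0': 0.5, 'b': 0.75, 'b.0': 0.3, 'c.2': 1.0}` against a two-field index flags `['b', 'b.0', 'c.2']`.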
Any update, guys, or can I close the issue?
I think we've got it figured out now, thanks for the help! We'll get back to you if we need to re-open.