How to get raw content
BeastyZ opened this issue
Hi, I'm trying to print the raw content of a document in beir-v1.0.0-scifact.flat, but it fails. The same approach works for me on msmarco-v1-passage, as in HyDE. Below is my code:
from pyserini.search import FaissSearcher, LuceneSearcher
corpus = LuceneSearcher.from_prebuilt_index('beir-v1.0.0-scifact.flat')
print(corpus.doc("0").raw())
Then, I got the error:
AttributeError: 'NoneType' object has no attribute 'raw'
I've found the solution
Hey man, what was your solution? Did you index with Faiss or Lucene? I'm confused: it seems we can't get raw content from a Faiss index, so I guess a Lucene index is required even when we search with Faiss?
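I think it has nothing to do with Faiss or Lucene. Whether you use a prebuilt index or one you built yourself, you have to pass a docid that actually exists in the index to get the raw content. doc() returns None for a docid it cannot find, which is what triggers the AttributeError above. If you build the index yourself and want the raw content, maybe you can refer to usage-index.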
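To make the docid point concrete, here is a minimal sketch rather than a definitive fix. The helper names raw_or_none and first_docids are hypothetical, not part of pyserini. It assumes that LuceneSearcher.doc() treats an int as an internal Lucene docid and a str as the collection docid, and that the index stores raw content.

```python
from typing import Iterator, Optional


def raw_or_none(searcher, docid) -> Optional[str]:
    """Return the raw stored content for docid, or None if no such document exists."""
    doc = searcher.doc(docid)  # None when the docid is not in the index
    return None if doc is None else doc.raw()


def first_docids(searcher, limit: int = 5) -> Iterator[str]:
    """Yield the collection docids of the first few documents in the index."""
    for internal_id in range(min(limit, searcher.num_docs)):
        # Assumption: an int argument is looked up as an internal Lucene docid.
        doc = searcher.doc(internal_id)
        if doc is not None:
            yield doc.docid()


def main() -> None:
    # Imported lazily so the helpers above can be used without pyserini installed.
    from pyserini.search.lucene import LuceneSearcher

    searcher = LuceneSearcher.from_prebuilt_index('beir-v1.0.0-scifact.flat')
    # "0" is the docid from the original report; doc() returned None for it there.
    print(raw_or_none(searcher, "0"))
    # List a few docids that do exist, then fetch their raw content.
    for docid in first_docids(searcher):
        print(docid, raw_or_none(searcher, docid))
```

Calling main() downloads the prebuilt index on first use. The docids it prints can then be passed to doc() as strings, which avoids guessing ids such as "0".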
I get it, thanks!