How to get raw content

Question

BeastyZ opened this issue 7 months ago

Hi, I'm trying to print the raw content of a document in beir-v1.0.0-scifact.flat, but it fails. However, the same approach works on msmarco-v1-passage, as in HyDE. Below is my code:

from pyserini.search import FaissSearcher, LuceneSearcher
corpus = LuceneSearcher.from_prebuilt_index('beir-v1.0.0-scifact.flat')
print(corpus.doc("0").raw())

Then, I got the error:
AttributeError: 'NoneType' object has no attribute 'raw'

BeastyZ · Answer 1 · Wed Dec 27 2023 18:46:18 GMT+0800 (China Standard Time)

I've found the solution

Ting-Wen Ko · Answer 2 · Tue Feb 06 2024 21:23:58 GMT+0800 (China Standard Time)

Hey man, what's your solution? Did you index using faiss or lucene? I'm confused that we can't get raw content from a faiss index, so I guess a lucene index is a must whether we search with faiss or lucene? @@

BeastyZ · Answer 3 · Thu Feb 08 2024 16:29:55 GMT+0800 (China Standard Time)

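I think it has nothing to do with faiss or lucene. Whether the index is prebuilt or self-built, you have to use a docid that actually exists in it to get the raw content. In my code, corpus.doc("0") returned None because "0" is not a docid in that index, which is what caused the AttributeError. If you build the index yourself and want to get the raw content, maybe you can refer to usage-index.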
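
To make the point about docids concrete, here is a minimal sketch, not the exact fix used in this thread. It assumes only what the error above shows (doc() gives back None for an unknown docid) plus pyserini's search() returning hits with a .docid attribute. The helper names get_raw and first_hit_raw are hypothetical, made up for illustration.

```python
from typing import Optional


def get_raw(searcher, docid: str) -> Optional[str]:
    """Return the raw content for docid, or None if the index has no such docid.

    `searcher` is assumed to behave like pyserini's LuceneSearcher, whose
    doc() returns None when the docid is not in the index.
    """
    doc = searcher.doc(docid)
    if doc is None:
        # Unknown docid: this is the case that raised the AttributeError above.
        return None
    return doc.raw()


def first_hit_raw(searcher, query: str) -> Optional[str]:
    """Search first, then fetch raw content with a docid the index itself returned."""
    hits = searcher.search(query, k=1)
    if not hits:
        return None
    return get_raw(searcher, hits[0].docid)
```

With the searcher from the question (corpus = LuceneSearcher.from_prebuilt_index('beir-v1.0.0-scifact.flat')), get_raw(corpus, "0") would return None instead of raising, and first_hit_raw(corpus, "some query") avoids guessing a docid at all. Whether raw() has anything to return still depends on how the index was built.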

Ting-Wen Ko · Answer 4 · Sat Feb 17 2024 17:37:45 GMT+0800 (China Standard Time)

I get it, thanks!