castorini / pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.

Home Page: http://pyserini.io/

Unable to read topics: MIRACL_V10_FI_DEV java.io.IOException

wxywb opened this issue

I executed the following command:

python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --language fi \
  --topics miracl-v1.0-fi-dev \
  --index miracl-v1.0-fi \
  --output run.miracl.bm25.fi.dev.txt2 \
  --bm25 --hits 1000

Traceback (most recent call last):
  File "/home/xuyu/anaconda3/envs/pyserini/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/xuyu/anaconda3/envs/pyserini/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/xuyu/anaconda3/envs/pyserini/lib/python3.10/site-packages/pyserini/search/lucene/main.py", line 152, in <module>
    query_iterator = get_query_iterator(args.topics, TopicsFormat(args.topics_format))
  File "/home/xuyu/anaconda3/envs/pyserini/lib/python3.10/site-packages/pyserini/query_iterator.py", line 187, in get_query_iterator
    return mapping[topics_format].from_topics(topics_path)
  File "/home/xuyu/anaconda3/envs/pyserini/lib/python3.10/site-packages/pyserini/query_iterator.py", line 104, in from_topics
    topics = get_topics(topics_path)
  File "/home/xuyu/anaconda3/envs/pyserini/lib/python3.10/site-packages/pyserini/search/_base.py", line 583, in get_topics
    topics = JTopicReader.getTopicsWithStringIds(topics_mapping[collection_name])
  File "jnius/jnius_export_class.pxi", line 876, in jnius.JavaMethod.call
  File "jnius/jnius_export_class.pxi", line 1042, in jnius.JavaMethod.call_staticmethod
  File "jnius/jnius_utils.pxi", line 79, in jnius.check_exception
jnius.JavaException: JVM exception occurred: Unable to read topics: MIRACL_V10_FI_DEV java.io.IOException

I attempted to evaluate the Finnish ('fi') language in the MIRACL dataset, but encountered the error above. Can someone give me a clue about how topics are handled in Pyserini, so I can understand why the JVM fails to read them? I ran the same evaluation for the Arabic ('ar') language, and it worked fine. Thank you.
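For anyone trying to narrow this down: the traceback shows the failure happening inside `get_topics`, before any search is run, so loading the topics on their own may help isolate it. Below is a minimal sketch, not a confirmed fix. It assumes `get_topics` can be imported from `pyserini.search` (the traceback places it in `pyserini/search/_base.py`), and that the Arabic topics are named `miracl-v1.0-ar-dev` by analogy with the Finnish ones. `try_load_topics` is a hypothetical helper name.

```python
def try_load_topics(name, loader):
    """Call loader(name); return (count, None) on success or (None, error text) on failure."""
    try:
        topics = loader(name)
    except Exception as exc:  # jnius.JavaException is an Exception subclass, so it is caught here
        return None, f"{type(exc).__name__}: {exc}"
    return len(topics), None


if __name__ == "__main__":
    # Imported here so the helper above can be used without Pyserini installed.
    # Assumption: get_topics is exposed by pyserini.search.
    from pyserini.search import get_topics

    # Assumption: the Arabic topic name follows the same pattern as the Finnish one.
    for name in ("miracl-v1.0-ar-dev", "miracl-v1.0-fi-dev"):
        count, error = try_load_topics(name, get_topics)
        print(name, "->", f"{count} topics" if error is None else error)
```

If the Arabic topics load and the Finnish ones do not, the problem is likely in how that particular topic set is resolved in the installed release rather than in the search command itself.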

I just tried the command on master - works fine for me... what version are you on? A dev release? Or try v0.35.0?
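To answer the version question, something like the following could be used to see which release is installed and how it compares with v0.35.0. This is a sketch using only the standard library. `installed_version` and `parse_release` are hypothetical helper names, and the comparison only looks at the leading numeric components, so a dev suffix such as `.dev0` is ignored.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def parse_release(version):
    """Turn the leading numeric part of a version string into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. "dev0"
        parts.append(int(digits))
    return tuple(parts)


if __name__ == "__main__":
    found = installed_version("pyserini")
    if found is None:
        print("pyserini is not installed in this environment")
    else:
        older = parse_release(found) < parse_release("0.35.0")
        print(f"pyserini {found} is {'older than' if older else 'at least'} 0.35.0")
```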

Reopen the issue if you're still having trouble.