bitextor / bifixer

Tool to fix bitexts and tag near-duplicates for removal

Bifixer IndexError: list index out of range

jokinlasa opened this issue

After installing Bifixer, I ran the test suite to check that the installation works, and I got this error:

```
/bifixer-master/tests$ pytest
================================================ test session starts ================================================
platform linux -- Python 3.6.9, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /home/adminitzuli/ml-bifixer-jokin/bifixer-master/tests
collected 0 items / 1 error

====================================================== ERRORS =======================================================
_________________________________________ ERROR collecting test_bifixer.py __________________________________________
test_bifixer.py:116: in <module>
    class TestOrthoFix:
test_bifixer.py:118: in TestOrthoFix
    replacements_es = restorative_cleaning.getReplacements("es")
../bifixer/restorative_cleaning.py:612: in getReplacements
    replacements[field[0].strip()] = field[1].strip()
E   IndexError: list index out of range
============================================== short test summary info ==============================================
ERROR test_bifixer.py - IndexError: list index out of range
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================================= 1 error in 0.34s ==================================================
```

I get the same error when I try to clean my translation memory. The installation went through without problems, and my Python version meets the requirements.
Has anyone run into this error, or does anyone have suggestions for solving it?

Thank you,

Jokin

The tests pass for me. Have you changed any of the replacement files? The error says that a line in one of the replacement files is missing a column.

No, I haven't changed any replacement file. I deleted all the files and downloaded them again, but it is still not working.

Here is the error I get when I run bifixer.py on my corpus, in case it helps:

2021-05-20 09:30:39,633 - INFO - Arguments processed.
2021-05-20 09:30:39,633 - INFO - Executing main program...
2021-05-20 09:30:39,633 - INFO - Starting fixing text
2021-05-20 09:30:39,638 - ERROR - Traceback (most recent call last):
File "bifixer.py", line 242, in <module>
main(args) # Running main program
File "bifixer.py", line 234, in main
perform_fixing(args)
File "bifixer.py", line 218, in perform_fixing
fix_sentences(args)
File "bifixer.py", line 118, in fix_sentences
replacements_tlang = restorative_cleaning.getReplacements(args.trglang)
File "/home/adminitzuli/ml-bifixer-jokin/bifixer-master/bifixer/restorative_cleaning.py", line 612, in getReplacements
replacements[field[0].strip()] = field[1].strip()
IndexError: list index out of range

Thank you,

Jokin

Hello @ZJaume, I just solved the problem. I used the replacement files from a virtual machine where I had run some tests, and now it works. I don't know yet why it fails with the replacement files from the download.

I have another question, if you don't mind:
is something wrong when bifixer.py logs the following?

```
2021-05-20 09:40:53,390 - ERROR - Wrong column index on line 110576
2021-05-20 09:40:53,393 - ERROR - Traceback (most recent call last):
  File "bifixer.py", line 130, in fix_sentences
    target_sentence = parts[args.tcol - 1]
IndexError: list index out of range

2021-05-20 09:40:53,393 - ERROR - Wrong column index on line 110578
2021-05-20 09:40:53,394 - ERROR - Traceback (most recent call last):
  File "bifixer.py", line 130, in fix_sentences
    target_sentence = parts[args.tcol - 1]
IndexError: list index out of range

2021-05-20 09:40:53,394 - ERROR - Wrong column index on line 110580
2021-05-20 09:40:53,395 - ERROR - Traceback (most recent call last):
  File "bifixer.py", line 130, in fix_sentences
    target_sentence = parts[args.tcol - 1]
IndexError: list index out of range

2021-05-20 09:40:53,395 - ERROR - Wrong column index on line 110582
2021-05-20 09:40:53,968 - INFO - Text fixing finished
2021-05-20 09:40:53,969 - INFO - Finished
2021-05-20 09:40:53,969 - INFO - Input lines: 111303 rows
2021-05-20 09:40:53,969 - INFO - Output lines: 106245 rows
2021-05-20 09:40:53,969 - INFO - Elapsed time 57.37 s
2021-05-20 09:40:53,969 - INFO - Troughput: 1940 rows/s
```

Thank you Jaume,

Jokin

Hi Jokin,

Sorry for the misunderstanding, I didn't have the latest changes on my machine. After pulling them, I noticed that some replacements were separated by a space instead of a tab. I have pushed a commit that fixes it.
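
For anyone who hits the same traceback before updating, a small script along these lines can point to the offending lines in a replacements file. It is only a sketch: it assumes the file holds one replacement per line with two tab-separated columns (which is what the `field[0]` / `field[1]` access in the traceback suggests), and `find_malformed_lines` is a hypothetical helper, not part of Bifixer.

```python
import os
import tempfile


def find_malformed_lines(path, min_fields=2):
    """Return (line_number, text) pairs for lines with too few tab-separated fields."""
    malformed = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            text = line.rstrip("\n")
            # A line whose columns are separated by spaces splits into a single field.
            if len(text.split("\t")) < min_fields:
                malformed.append((number, text))
    return malformed


if __name__ == "__main__":
    # Hypothetical sample data: line 2 uses a space where a tab is expected.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".tsv", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("a\tb\nc d\ne\tf\n")
    try:
        for number, text in find_malformed_lines(tmp.name):
            print(f"line {number}: {text!r}")
    finally:
        os.remove(tmp.name)
```

Pointing the function at the real replacements file for your language should list every line that would trigger the `IndexError`. Note that blank lines are reported too, since they also split into fewer than two fields.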

Regarding your second question, the error is telling you that each reported line, such as 110582, probably does not contain the target sentence or has fewer fields than it is supposed to have.
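
To see which rows are affected before running Bifixer, a check like the following may help. It is a minimal sketch under two assumptions: the corpus is tab-separated, and the target column is given as a 1-based index, mirroring the `parts[args.tcol - 1]` access in the traceback. `find_short_rows` is a hypothetical name used for illustration, not a Bifixer function.

```python
def find_short_rows(lines, required_col):
    """Yield (line_number, field_count) for rows that lack the 1-based column required_col."""
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        # Accessing fields[required_col - 1] would raise IndexError on these rows.
        if len(fields) < required_col:
            yield number, len(fields)


if __name__ == "__main__":
    # Hypothetical rows: the second one is missing its fourth (target) column.
    rows = [
        "url1\turl2\tHello\tHola\n",
        "url1\turl2\tHello\n",
    ]
    for number, count in find_short_rows(rows, required_col=4):
        print(f"line {number} has only {count} field(s)")
```

To run it on a real corpus, pass an open file handle instead of the sample list, with `required_col` set to the same value you give Bifixer for the target column.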