facebookresearch / LASER

Language-Agnostic SEntence Representations

Unable to import LaserEncoderPipeline

sumedhan-r opened this issue

When I tried to use LaserEncoderPipeline for downstream NLP tasks, the first error raised was a ValueError:

mutable default <class 'fairseq.dataclass.configs.CommonConfig'> for field common is not allowed: use default_factory

I asked ChatGPT about it and got code that slightly modifies the lines where the Config classes are declared.

After making that change in all the Config-related classes, I got a different error, a ValidationError:

Object of unsupported type: '_MISSING_TYPE'
full_key:
reference_type=None
object_type=None

Some screenshots related to the errors are attached below. Please look into this at the earliest opportunity.

[Screenshots: LASER 1_1, LASER 1_2, LASER 2_1, LASER 2_2, LASER 3_1, LASER 3_2]

The suggestion given by ChatGPT is as follows:

@dataclass
class FairseqConfig(FairseqDataclass):
    common: CommonConfig = field(default_factory=CommonConfig)
    common_eval: CommonEvalConfig = field(default_factory=CommonEvalConfig)

@sumedhan-r can you please indicate the versions of fairseq and omegaconf that you are using, and the minimal code required to reproduce the problem?
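
One way to collect the requested version information is to query the installed package metadata from Python. The sketch below uses only the standard library; the helper name report_versions is hypothetical, and the package names are simply the ones mentioned in this thread.

```python
import sys
from importlib import metadata


def report_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The Python version matters here as much as the package versions.
    print("python", sys.version.split()[0])
    names = ["fairseq", "omegaconf", "hydra-core", "laser_encoders"]
    for name, version in report_versions(names).items():
        print(name, version or "not installed")
```

Pasting this output together with the failing import statement should be enough for a minimal report.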

I am getting this error too. I was able to recreate it in Python 3.11 right from the import statement:
from laser_encoders import LaserEncoderPipeline

My sense is that ChatGPT's suggestion is on the right track. Specifically, a number of statements toward the end of fairseq/dataclass/configs.py assign a mutable object as a field default. Assigning these with the pattern field(default_factory=x) makes these errors go away. (See https://stackoverflow.com/questions/53632152/why-cant-dataclasses-have-mutable-defaults-in-their-class-attributes-declaratio for an additional explanation.)
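
The behaviour described above can be reproduced without fairseq. The following is a minimal sketch, not fairseq's actual code: CommonConfig here is a hypothetical one-field stand-in, and BrokenConfig and FixedConfig are illustrative names. Python 3.11 started rejecting any unhashable default (which includes instances of ordinary dataclasses), whereas earlier versions only rejected list, dict, and set defaults.

```python
import sys
from dataclasses import dataclass, field


@dataclass
class CommonConfig:  # hypothetical stand-in, not fairseq's real class
    seed: int = 1


def define_broken():
    """Define a dataclass that uses a dataclass instance as a plain default."""
    @dataclass
    class BrokenConfig:
        common: CommonConfig = CommonConfig()  # rejected on Python 3.11+
    return BrokenConfig


@dataclass
class FixedConfig:
    # default_factory builds a fresh CommonConfig for every FixedConfig instance.
    common: CommonConfig = field(default_factory=CommonConfig)


def broken_raises() -> bool:
    """Return True if defining BrokenConfig raises ValueError on this interpreter."""
    try:
        define_broken()
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    print(sys.version_info[:2], "broken definition raises:", broken_raises())
    print("fixed default:", FixedConfig().common)
```

On Python 3.10 and older, the broken definition is accepted, which matches the reports that a default install works in some environments and fails in others.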

It appears the dubious pattern is still used through the latest version of fairseq, v0.12.2, which was released in 2022.

@dpdeb @sumedhan-r can you please indicate the versions of fairseq and omegaconf that cause the error?

When I install all packages from scratch (see this Colab notebook for a repro), I get fairseq-0.12.2, omegaconf-2.0.6, and laser_encoders-0.0.1 by default, and they work fine.

I got the same error:

raise ex # set end OC_CAUSE=1 for full backtrace
^^^^^^^^
omegaconf.errors.ValidationError: Object of unsupported type: '_MISSING_TYPE'
full_key:
reference_type=None
object_type=None

I also changed the classes in fairseq\dataclass\configs.py and hydra\conf\__init__.py to use default_factory, and then received the error above. Has anyone come up with a fix? :)
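
As background on the name in that message: _MISSING_TYPE is the type of the dataclasses.MISSING sentinel, which is what a field's default attribute holds once the field is declared with default_factory. The sketch below only shows where the name comes from; it does not pinpoint which code path in omegaconf 2.0.6 raises the ValidationError. InnerConfig and OuterConfig are hypothetical stand-ins.

```python
import dataclasses
from dataclasses import dataclass, field


@dataclass
class InnerConfig:  # hypothetical stand-in for a nested config class
    seed: int = 1


@dataclass
class OuterConfig:
    inner: InnerConfig = field(default_factory=InnerConfig)


# With default_factory, the field has no plain default: it holds the sentinel.
inner_field = dataclasses.fields(OuterConfig)[0]
default_is_missing = inner_field.default is dataclasses.MISSING
sentinel_type_name = type(dataclasses.MISSING).__name__

print(default_is_missing, sentinel_type_name)
```

Any library that reads the default attribute directly, rather than calling default_factory, would see this sentinel instead of a config object.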

@TheHappyLemon how can I reproduce your error?
Can you please share a Colab notebook or something that reproduces it?

@avidale
Well, I haven't done anything special. First, I installed laser_encoders through Anaconda with pip install laser_encoders. Then, when I just wanted to import the library, I got an error stating that I should use default_factory in some configs.py. So it is reproduced with just

from laser_encoders import LaserEncoderPipeline

Initially, I thought I had to install the newest version of the dependency fairseq. But I couldn't install fairseq at all, because I was getting the error FileNotFoundError: [Errno 2] No such file or directory: 'VERSION.txt'. There are multiple issues for this error, like skrub-data/skrub#476. I also tried to install it from a local clone, but then I was getting the errors described in facebookresearch/demucs#423. So I decided to abandon this idea and fix the error with default_factory. I made the fixes in miniconda3\Lib\site-packages\fairseq\dataclass\configs.py and in miniconda3\Lib\site-packages\hydra\conf\__init__.py. Then I finally received the error I commented above.

I have the following package versions (I ran pip install laser_encoders to get this info):
Requirement already satisfied: laser_encoders in c:\users\artem\miniconda3\lib\site-packages (0.0.1)
Requirement already satisfied: fairseq>=0.12.2 in c:\users\artem\miniconda3\lib\site-packages (from laser_encoders) (0.12.2)
Requirement already satisfied: omegaconf<2.1 in c:\users\artem\miniconda3\lib\site-packages (from fairseq>=0.12.2->laser_encoders) (2.0.6)

So I don't know what would be the best way to reproduce it. Maybe do a clean install? Idk :(

@avidale

I just created a fresh conda environment, ran pip install laser_encoders, and successfully installed the following packages:

Installing collected packages: tbb, sentencepiece, intel-openmp, bitarray, antlr4-python3-runtime, unicategories, portalocker, omegaconf, mkl, cython, torch, sacremoses, sacrebleu, hydra-core, torchaudio, fairseq, laser_encoders
Successfully installed antlr4-python3-runtime-4.8 bitarray-2.9.2 cython-3.0.10 fairseq-0.12.2 hydra-core-1.0.7 intel-openmp-2021.4.0 laser_encoders-0.0.1 mkl-2021.4.0 omegaconf-2.0.6 portalocker-2.8.2 sacrebleu-2.4.2 sacremoses-0.1.0 sentencepiece-0.2.0 tbb-2021.12.0 torch-2.3.0 torchaudio-2.3.0 unicategories-0.1.2

And after running from laser_encoders import LaserEncoderPipeline, I got:

raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'fairseq.dataclass.configs.CommonConfig'> for field common is not allowed: use default_factory

The error is reproduced on Windows 11 with this environment:
conda create -n test_env python=3.11.8 anaconda

Apparently, Fairseq does not support Python 3.11 and newer versions; see e.g. facebookresearch/fairseq#5191.

Thus, there are 3 possible solutions for you:

  1. Downgrade your Python to 3.10 or below.
  2. Fork Fairseq and fix the error (this pull request facebookresearch/fairseq#5359 might be what you need).
  3. Migrate from the LASER encoder (which uses Fairseq, which has already become pretty stale) to the SONAR encoder (which performs better and is based on Fairseq2, a package that currently enjoys better support).
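
Before choosing among these options, it can help to confirm which interpreter the environment is actually running. The guard below is a small sketch; the function name fairseq_import_expected_to_fail is hypothetical, and the 3.11 threshold is taken from the reports in this thread rather than from any fairseq documentation.

```python
import sys


def fairseq_import_expected_to_fail(version_info=sys.version_info) -> bool:
    """Return True for Python 3.11+, where this thread reports the import error."""
    return tuple(version_info[:2]) >= (3, 11)


if __name__ == "__main__":
    if fairseq_import_expected_to_fail():
        print("Python 3.11+ detected: fairseq 0.12.2 is reported to fail at import.")
        print("Consider Python 3.10 or below, a patched fairseq fork, or SONAR.")
    else:
        print("Python 3.10 or below detected: the import is reported to work.")
```

Running it inside the same conda environment that raises the error shows whether option 1 applies.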