WorksApplications / SudachiPy

Python version of Sudachi, a Japanese tokenizer.

Easily installable dictionary

izziiyt opened this issue

explosion/spaCy#3756 (comment)

Ask the PyPI organization for an exception to the 60MB size limit for the full and core dictionaries.
This issue is closely related to https://github.com/WorksApplications/SudachiDict

Any progress on this? I think all you need to do is open an issue here and explain why the package is going to be large.

Hi, thanks for the reminder. Let me have a look and take action in the next few days.

Okay, so I will confirm the dictionary file details with the team and then open an issue on the PyPA repo early next week.

I've added the following three dictionary packages on PyPI:

  1. small (40MB): https://pypi.org/project/SudachiDict-small/20191030/
  2. core (70MB): https://pypi.org/project/SudachiDict-core/0.0.0/
  3. full (150MB): https://pypi.org/project/SudachiDict-full/0.0.0/

The PyPI size limit is 60MB. For small, I have already uploaded the dictionary resource. For core and full, I have created PyPI package version 0.0.0 without the resource files and am waiting for the PyPA to increase the size limit.

So you can already do the following to start using the tokenizer:

$ pip install sudachipy sudachidict-small

I have filed an issue to request the size limit increase:
pypa/packaging-problems#299

Once they increase the limit, I will upload core and full resources on PyPI, notify you here, and update the SudachiPy readme.

@sorami Did you get a response from PyPI? I'm going to release the new version of GiNZA in the next two weeks. I'd like to make the GiNZA packages available via PyPI if SudachiDict-core will also be available from PyPI.

@hiroshi-matsuda
Unfortunately, I haven't heard anything from the PyPA team.

The same kind of request, made by someone else a day before ours ( PyPI package size limit for splice-beakerx · Issue #298 · pypa/packaging-problems ), is in the same situation.

I've added a comment to the issue just now to ping them.

As soon as they change the size limit, we are ready to release all three dictionaries (small, core, full) on PyPI.

Just poking this issue. It looks like the correct place to request a size increase is actually pypi-support, not the repo linked earlier. Issue #298 linked here was migrated there by the admin, but for some reason the SudachiPy one wasn't, so I guess you need to open a new issue.

Thank you very much for the information, @polm !!

I have opened an issue there: pypi/support#131

Limit Request: sudachidict-{core,full} - {75, 160MB} · Issue #131 · pypa/pypi-support

Okay, so we can't distribute the dictionaries on PyPI. I will consider the alternative approaches Jason suggested in the issue above.

I've (finally) set up the Python packages for the dictionaries. You can now install them via PyPI:

$ pip install sudachidict_core
$ pip install sudachidict_small
$ pip install sudachidict_full

The dictionary binary files are not bundled in the packages; they are downloaded during installation.
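
For readers curious how a small package can stand in for a large binary, here is a minimal sketch of the general "download the resource if it is missing" pattern. It is not the actual SudachiDict packaging code, which this thread does not show. The URL, the function names (`fetch_over_http`, `ensure_dictionary`), and the `.part` suffix are all hypothetical and chosen only for illustration.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlopen

# Hypothetical URL: the thread does not say where the binaries are hosted.
DICT_URL = "https://example.com/sudachi-dictionary-core.dic"


def fetch_over_http(url: str) -> bytes:
    """Download the resource at `url` and return its raw bytes."""
    with urlopen(url) as response:
        return response.read()


def ensure_dictionary(
    dest: Path,
    url: str = DICT_URL,
    fetch: Callable[[str], bytes] = fetch_over_http,
) -> Path:
    """Return `dest`, downloading the file first if it is not there yet."""
    if dest.exists():
        return dest  # already downloaded, nothing to do
    dest.parent.mkdir(parents=True, exist_ok=True)
    data = fetch(url)
    # Write under a temporary name, then rename, so an interrupted
    # write never leaves a truncated file at the final path.
    tmp = dest.with_name(dest.name + ".part")
    tmp.write_bytes(data)
    tmp.replace(dest)
    return dest
```

The `fetch` parameter is injectable so the logic can be exercised without network access, for example by passing `fetch=lambda url: b"test"`. A real installer would likely also verify a checksum of the downloaded file, which this sketch omits.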