cannot download model
peiyaoli opened this issue · comments
Is there an existing issue for this?
- I have searched the existing issues and found nothing
Bug description
I tried to search for a model with:
from molfeat.store.modelstore import ModelStore

store = ModelStore()
model_card = store.search(name="ChemBERTa-77M-MLM")[0]
but got:
AttributeError: 'ModelInfo' object has no attribute 'model_dump'
How to reproduce the bug
No response
Error messages and logs
# Error messages and logs here please
Environment
Current environment
#- Molfeat version (e.g., 0.1.0):
#- PyTorch Version (e.g., 1.10.0):
#- RDKit version (e.g., 2022.09.5):
#- scikit-learn version (e.g., 1.2.1):
#- OS (e.g., Linux):
#- How you installed Molfeat (`conda`, `pip`, source):
Additional context
No response
Hey @peiyaoli ! Thank you for reporting this bug! Before we investigate, it would be useful to have some additional information on your environment, specifically your Molfeat version, Python version, Pydantic version and how you installed your environment (i.e. pip, conda, source).
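The Pydantic version matters here because `model_dump` only exists on Pydantic v2 models; on v1, `BaseModel` exposes `.dict()` instead. A minimal version-agnostic sketch, using a hypothetical `dump_model` helper with duck typing (this is not molfeat's actual code):

```python
def dump_model(model) -> dict:
    """Serialize a Pydantic model to a dict on either major version.

    Hypothetical helper: Pydantic v2 renamed BaseModel.dict() to
    model_dump(), so we call whichever method the object provides.
    """
    if hasattr(model, "model_dump"):
        return model.model_dump()  # Pydantic v2
    return model.dict()            # Pydantic v1
```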
Something is wrong with the download at the hash-sum comparison stage.
The error is raised here: https://github.com/datamol-io/molfeat/blob/97855c6c7df2c46acb698d64eab60b08006c8936/molfeat/store/modelstore.py#L235C2-L235C2
@cwognum do you want to have a look ?
I'm not sure what's happening here.
I could reproduce the bug locally. As the error suggests, it appears the checksum computed when we created the artifact no longer matches the checksum we compute when downloading the artifacts locally. I'm not sure what causes this. Recreating the artifact using the ETL notebooks leads to an entirely new hashsum that doesn't match any of the artifacts in the thrown exception. I will investigate further!
@cwognum, did you manage to find the error? Looking at the code, my first take would be that the order of the files in the shasum changed for some reason ...
molfeat/molfeat/utils/commons.py, lines 45 to 55 at 97855c6
Can you generate every permutation of the file-list order and check whether you can recover the original shasum?
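That brute-force check can be sketched as follows. This assumes a simple concatenate-then-hash scheme; molfeat's actual helper in `molfeat/utils/commons.py` may differ, so treat `sha256_of_files` and `find_matching_order` as hypothetical stand-ins:

```python
import hashlib
import itertools
from pathlib import Path


def sha256_of_files(paths) -> str:
    """Hash the concatenated contents of the given files, in order."""
    h = hashlib.sha256()
    for p in paths:
        h.update(Path(p).read_bytes())
    return h.hexdigest()


def find_matching_order(paths, expected_hash):
    """Try every ordering of `paths`; return the first permutation that
    reproduces `expected_hash`, or None if no ordering matches."""
    for perm in itertools.permutations(paths):
        if sha256_of_files(perm) == expected_hash:
            return perm
    return None
```

Note that this is only feasible for small artifact bundles, since the number of permutations grows factorially with the number of files.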
That's indeed one of the things I considered. I haven't tried all combinations though, that's a good idea!
@maclandrol I tried the different permutations, but none of them produce the actual hash.
Ok, I think for now we should just sort the file names to ensure a consistent order, then recompute the hash and update it in the metadata.
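A sketch of that fix, again assuming a concatenate-then-hash scheme rather than molfeat's exact implementation:

```python
import hashlib
from pathlib import Path


def stable_sha256_of_files(paths) -> str:
    """Hash file contents in sorted-path order, so the result no longer
    depends on the order the filesystem happens to list the files in."""
    h = hashlib.sha256()
    for p in sorted(str(p) for p in paths):
        h.update(Path(p).read_bytes())
    return h.hexdigest()
```

With the sort in place, any two listings of the same artifact directory produce the same checksum.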
I looked into a couple of things, but I'm not sure what started causing the issue, which is a very unsatisfying conclusion.
Some further notes for future reference:
- The datamol fs module was updated: datamol-io/datamol#210
- Things seemed to start breaking when fsspec released 2023.9.2, but there is nothing suspicious in the changelogs. Maybe the changes related to caching?
I can imagine that the above changes affect the order of the files, but not the number of files or their contents. Since trying all permutations of the file paths to recover the expected hash doesn't work, I'm not sure what's happening here.
Anyway, the fix is luckily simple! I recreated the featurizer artifacts by simply rerunning the ETL notebooks and all seems to work again. I made a small PR to sort the to-be-hashed files: #86
Let me know if the issue persists or pops up again! If so, we will have to investigate further.
One final note: ChemGPT-1.2B and ChemGPT-19M are still running.
Thanks @cwognum, I'll check and let you know asap if it also fixes the issue on my side!
Bug description
Hello! @cwognum
I'm currently using molfeat in a conda env, installed by running pip install molfeat==0.8.8. When I try to fetch pretrained HF transformers like GPT2-Zinc480, Roberta-Zinc480M-102M, and MolT5, I get the following error message:
ModelStoreError: Can't retrieve model MolT5 from the store !
It seems related to caches and where the models are stored locally, what should I do?
How to reproduce the bug
from molfeat.trans.pretrained.hf_transformers import PretrainedHFTransformer

transformer = PretrainedHFTransformer(kind='MolT5', notation='selfies', dtype=float)
my_smiles_list = ["CCO", "c1ccccc1"]  # any list of SMILES strings
features = transformer(my_smiles_list)
Error messages and logs
ModelStoreError Traceback (most recent call last)
File ~\anaconda3\lib\site-packages\molfeat\store\loader.py:100, in PretrainedStoreModel._load_or_raise(cls, name, download_path, store, **kwargs)
99 modelcard = store.search(name=name)[0]
--> 100 artifact_dir = store.download(modelcard, download_path, **kwargs)
101 except Exception as e:
File ~\anaconda3\lib\site-packages\molfeat\store\modelstore.py:216, in ModelStore.download(self, modelcard, output_dir, chunk_size, force)
215 mapper.fs.delete(output_dir, recursive=True)
--> 216 raise ModelStoreError(
217 f"""The destination artifact at {model_dest_path} has a different sha256sum ({cache_sha256sum}) """
218 f"""than the Remote artifact sha256sum ({modelcard.sha256sum}). The destination artifact has been removed !"""
219 )
221 return output_dir
ModelStoreError: The destination artifact at C:\Users\dd\AppData\Local\molfeat\molfeat\Cache/MolT5/model.save has a different sha256sum (e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855) than the Remote artifact sha256sum (e0537549289bfffc9ba6a5fb17c5b8d031e1b04a17555fd8f6494ebe3ce79395). The destination artifact has been removed !
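One observation about that message (my own reading, not confirmed in the thread): the destination checksum is exactly the SHA-256 digest of zero bytes, which suggests the local hashing step saw no file contents at all:

```python
import hashlib

# SHA-256 of empty input; compare with the cache_sha256sum in the error above
print(hashlib.sha256(b"").hexdigest())
# → e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```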
Environment
#- Molfeat version (e.g., 0.1.0): 0.8.8
#- Python version (e.g., 1.10.0): 3.10.9
#- RDKit version (e.g., 2022.09.5): 2023.03.1
#- scikit-learn version (e.g., 1.2.1): 1.2.1
#- OS (e.g., Linux): Windows 10 22H2
#- How you installed Molfeat (`conda`, `pip`, source): pip
Hi @dawndarkmusic, thanks for reporting! Could you try upgrading to the latest molfeat version? We might have broken backwards compatibility with #86 by ordering the to-be-hashed files.
Ping @maclandrol - If we end up needing some more robust model / data versioning, I recently came across https://github.com/iterative/dvc which seems pretty powerful!
Hello @cwognum
I've upgraded the molfeat to version 0.9.5 but still got the same issue with using the same code.
Here are some screenshots from running the code: it shows the download progress bar every time and then raises the error. As I recall, PretrainedHFTransformer didn't show the progress bar before. Should I restart the computer and test again? Any help would be greatly appreciated! Thank you
> Ping @maclandrol - If we end up needing some more robust model / data versioning, I recently came across https://github.com/iterative/dvc which seems pretty powerful!
Yes, moving away from the custom model store would be a good idea. We are getting too many issues related to GCS.
@dawndarkmusic, can you follow the instructions here to delete your cache and try again?
In your case it's better to clear the whole cache directory and then restart your Python runtime.
import datamol as dm
import platformdirs
# Delete the whole molfeat cache dir, if it exists
path_dir = platformdirs.user_cache_dir("molfeat")
mapper = dm.fs.get_mapper(path_dir)
if mapper.fs.exists(path_dir):
    mapper.fs.delete(path_dir, recursive=True)
Hello @maclandrol
I've tried clearing the whole cache directory and restarting, but I still get the same issue.
Here is a screen recording of the situation: the pretrained transformer disappears from the cache folder once the error pops up. Sorry for bothering you and @cwognum.
(screen recording attached: Untitled.video.-.Made.with.Clipchamp.1.mp4)
Thanks @dawndarkmusic, this is very strange. Even after deleting the cache and restarting the interpreter, you are having the same issue?
Some of the functions are functools-cached, so data is saved in memory. Normally, restarting the interpreter should purge that data too. @cwognum, can you handle this?
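To illustrate the functools caching point with a toy stand-in (not molfeat's actual code): results cached with functools.lru_cache live in process memory, so they survive deletion of the on-disk cache directory but not an interpreter restart, and they can also be dropped explicitly with cache_clear():

```python
import functools

calls = {"n": 0}


@functools.lru_cache(maxsize=None)
def load_model_card(name: str) -> tuple:
    # Hypothetical stand-in for an expensive model-store lookup
    calls["n"] += 1
    return ("card", name)


load_model_card("MolT5")       # performs the lookup
load_model_card("MolT5")       # served from the in-memory cache
load_model_card.cache_clear()  # drop cached results without restarting Python
load_model_card("MolT5")       # performs the lookup again
```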
Hello @maclandrol
If deleting and restarting the interpreter means shutting down the whole Jupyter notebook and restarting the kernel, then yes, I'm still having the issue right now and I can't figure out the reason.
However, after I installed molfeat on another PC and ran the code again, it went fine and nothing happened. I guess it's just some trouble on my PC; I'll try to figure it out by myself. Sorry for bothering.
@cwognum @maclandrol It seems this issue can be closed.