google / sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Shared library use unsafe because of abseil linkage

danieldk opened this issue · comments

I am using my own Rust binding to sentencepiece. When I link against sentencepiece statically, things work as expected. However, when I use sentencepiece as a dynamic library, I get SIGBUS when a program exists and the destructor of absl::Flag is called:

Thread 1 "syntaxdot-0562c" received signal SIGBUS, Bus error.
0x00007fffe6a267f8 in absl::Flag<int>::~Flag() ()
   from /nix/store/yvdlp1n57b57fv0gf74v3adz3y2g4ghv-sentencepiece-0.1.94/lib/libsentencepiece.so.0
(gdb) bt
#0  0x00007fffe6a267f8 in absl::Flag<int>::~Flag() ()
   from /nix/store/yvdlp1n57b57fv0gf74v3adz3y2g4ghv-sentencepiece-0.1.94/lib/libsentencepiece.so.0
#1  0x00007fffe67c8cd6 in __cxa_finalize ()
   from /nix/store/kah5n342wz4i0s9lz9ka4bgz91xa2i94-glibc-2.32/lib/libc.so.6
#2  0x00007fffe69c9663 in __do_global_dtors_aux ()
   from /nix/store/yvdlp1n57b57fv0gf74v3adz3y2g4ghv-sentencepiece-0.1.94/lib/libsentencepiece.so.0
#3  0x00007fffffff8430 in ?? ()
#4  0x00007ffff7fe1d47 in _dl_fini ()
   from /nix/store/kah5n342wz4i0s9lz9ka4bgz91xa2i94-glibc-2.32/lib/ld-linux-x86-64.so.2
Backtrace stopped: frame did not save the PC

According to the abseil documentation, it's not safe to use Abseil in combination with loading/unloading of dynamic libraries.

It's likely that the compiler flags differ between Rust compilation and compiling of sentencepiece, leading to memory errors.

I modified src/CMakeList.txt to not compile third_party/absl/flags/flag.cc and init.cc into the (shared) library, but into the individual binaries, and I also modified the LOG macro to not use absl::GetFlag. With these changes the program terminates fine without SIGBUS (so apparently string_view does not suffer from the same problem?).

I don't have a ready-to use patch/PR, because I am not sure how the LOG macro should be adjusted to avoid linking absl::Flag into the shared library.

Ideally, we would like to get rid of the dependency to absl::Flags from sentencepiece library and remain them only *_main.cc.
I found that absl::Flags is used in sentencepiece_trainer API. I guess this dependency could be removed. Let me ping you when I can get rid of the dependency

In the latest main branch, I've removed the dependency from flags to C++ library. Could you try it? if there is no problem. it will be included in the next release? Please feel free to reopen this issue if necessary.

In the latest main branch, I've removed the dependency from flags to C++ library. Could you try it? if there is no problem. it will be included in the next release? Please feel free to reopen this issue if necessary.

No dice, still SIGBUS. I think the problem is that since init.cc is linked into the shared library, this will still cause the Flag destructor to be called when the library is unloaded.

Edit: seems like I cannot reopen this issue.

Removed absl::Flags completely from *so and *a library. Could you try the head?

Works like a charm now. Thanks a lot!

I am facing the same issue using the latest (v0.1.99) pip package. I also resolved it by building the pip package from source and removing the using absl::flags from src/init.h.