mk-fg / python-libraptorq

Python CFFI bindings for libRaptorQ (RaptorQ RFC6330 FEC implementation)

Error running rq

meteorza opened this issue · comments

Hi

I have installed libRaptorQ successfully, but get the following library error when running rq on Ubuntu 16.04 (64-bit). Is this a known issue, or am I doing something wrong?

root@ubuntu:~# rq --debug encode --repair-symbols-rate 0.5 --drop-rate 0.3 test.csv test.csv.enc
Traceback (most recent call last):
  File "/usr/local/bin/rq", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/libraptorq/__main__.py", line 116, in main
    opts.min_subsymbol_size, opts.symbol_size, opts.max_memory ) as enc:
  File "/usr/local/lib/python2.7/dist-packages/libraptorq/__init__.py", line 179, in __init__
    super(RQEncoder, self).__init__()
  File "/usr/local/lib/python2.7/dist-packages/libraptorq/__init__.py", line 126, in __init__
    self._lib = self._ffi.dlopen('libRaptorQ.so') # ABI mode for simplicity
  File "/usr/local/lib/python2.7/dist-packages/cffi/api.py", line 139, in dlopen
    lib, function_cache = _make_ffi_library(self, name, flags)
  File "/usr/local/lib/python2.7/dist-packages/cffi/api.py", line 769, in _make_ffi_library
    backendlib = _load_backend_lib(backend, libname, flags)
  File "/usr/local/lib/python2.7/dist-packages/cffi/api.py", line 758, in _load_backend_lib
    return backend.load_library(name, flags)
OSError: cannot load library libRaptorQ.so: /usr/local/lib/libRaptorQ.so: undefined symbol: LZ4_decompress_safe_continue

The key line is: cannot load library libRaptorQ.so: /usr/local/lib/libRaptorQ.so: undefined symbol: LZ4_decompress_safe_continue

You're running everything correctly, but I think the Ubuntu package you're using is missing a dependency on "liblz4" or something like that.
An easy workaround should be to simply install liblz4 manually.
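One quick way to check from Python itself whether a shared library loads and exposes a given symbol - the same checks cffi's dlopen performs - is via ctypes. This is a diagnostic sketch with a hypothetical `has_symbol` helper; it's demoed on libm (assumed present on Linux) since liblz4 may be the very thing that's missing:

```python
import ctypes, ctypes.util

def has_symbol(lib_name, symbol):
    """Return True if the shared library loads and exports the symbol,
    False if loading fails or the symbol is missing."""
    try:
        lib = ctypes.CDLL(lib_name)
        getattr(lib, symbol)
    except (OSError, AttributeError):
        return False
    return True

# Demo with libm; for the error above one would try
# has_symbol('libRaptorQ.so', 'LZ4_decompress_safe_continue') instead.
libm = ctypes.util.find_library('m') or 'libm.so.6'
print(has_symbol(libm, 'cos'))             # True
print(has_symbol(libm, 'no_such_symbol'))  # False
```

An OSError here corresponds to the "cannot load library" failure above, while AttributeError would mean the library loads but lacks the symbol.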

The lz4 dependency was added to libRaptorQ rather recently - only in the latest v0.1.7 release - so I'd suggest you file a bug in the Ubuntu bug tracker, so that the libRaptorQ package maintainer knows about it and can fix it.

It doesn't have much to do with the Python code, I think.

Though to be fair, I didn't really test the thing with v0.1.7; it might also happen to be some more generic cffi or libRaptorQ issue, but that seems unlikely.
Do let me know if installing liblz4 fixes it, or if maybe you already have it and the error is still there.

Thanks.

Hi

Yes, I believe liblz4 was already installed as part of the libRaptorQ 0.1.7 install. I have also installed it manually just to be sure (apt-get install liblz4-dev).

root@ubuntu:/usr/local# lz4 -V
*** LZ4 command line interface 64-bits r131, by Yann Collet (Aug  2 2016) ***

root@ubuntu:~# ls /usr/local/lib/libRaptorQ.so -hal
lrwxrwxrwx 1 root root 15 Aug  2 19:19 /usr/local/lib/libRaptorQ.so -> libRaptorQ.so.1

I have also tried with libRaptorQ 0.1.6, but libRaptorQ only creates:

-rwxrwxrwx  1 root root  2.4M Aug 30 12:07 libRaptorQ-0.1.6.a
-rwxrwxrwx  1 root root  249K Aug 30 12:03 libRaptorQ-0.1.6.so

So I tried to create a symlink from libRaptorQ-0.1.6.so to libRaptorQ.so, but rq then has problems accessing the library:
OSError: cannot load library libRaptorQ.so: libRaptorQ.so: cannot open shared object file: No such file or directory

Tried running ldconfig, but still no luck.

I have also tried with libRaptorQ 0.1.6, but libRaptorQ only creates:

Ah, yeah, it's the issue with 0.1.6 that I've raised here some time ago:
https://www.fenrirproject.org/Luker/libRaptorQ/issues/6

It's weird that you get a similar linking error though, given that 0.1.6 should only be linked against very basic stuff like libc. Maybe try ldd /usr/lib/libRaptorQ-0.1.6.so and see where it says "not found"?

Symlinking/moving the lib should totally work in general, if lib itself works, of course.

I'll probably try 0.1.7 here later to see if I can reproduce the issue, but seeing how you get a similar error for 0.1.6, I suspect it might be due to something that I don't have here, unfortunately.

My library on Ubuntu was installed in /usr/local/lib and not /usr/lib.

So I created a link in /usr/lib to the 0.1.6 lib in /usr/local/lib and rq works now with 0.1.6. I will reinstall 0.1.7 to see if the same fix works.

If that was the issue and moving stuff is a hassle, you can probably also fix it by putting e.g. a "local.conf" file with /usr/local/lib in it into /etc/ld.so.conf.d/.
If you're doing stuff like "make install" as root, it might be a bad idea to have it put stuff in /usr/lib.
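For example, the drop-in file could be as simple as this (followed by running ldconfig as root to refresh the linker cache):

```
# /etc/ld.so.conf.d/local.conf
/usr/local/lib
```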

Still getting the same error with 0.1.7, so it is probably not a library path issue.

OSError: cannot load library libRaptorQ.so: /usr/local/lib/libRaptorQ.so: undefined symbol: LZ4_decompress_safe_continue


root@ubuntu:/usr/local/lib# ldd libRaptorQ.so
        linux-vdso.so.1 =>  (0x00007fffa3940000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f71d31a0000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f71d2e97000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f71d2acd000)
        /lib64/ld-linux-x86-64.so.2 (0x0000559fb3df1000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f71d28b7000)


I can rq encode a file with 0.1.6, but with decoding I get:

root@ubuntu:~# rq --debug decode test.csv.enc test.txt
Floating point exception (core dumped)

Floating point exception (core dumped)

Looking into that, I found a few loosely-related things, all with the 0.1.6 that I have:

  • For some reason, sometimes (?) nothing was encoded - the --debug line even straight-up said "0 left in output" - and encoding parameters were left at 0.

  • Which crashed the decoder with that error, as it couldn't handle oti_common/oti_scheme being 0.

  • Which prompted me to refactor the rq code a bit to make it easier to work with, but when I finished, the error on the same exact input data was somehow gone.

    Maybe a random bug due to crappy code in the module leaving something for Python's gc to free before it's been used by libRaptorQ somewhere? Not sure.

  • Tried running the script a bunch more times and couldn't reproduce that error, but found that for larger (e.g. 0.5M) files the decoded data doesn't match the source data (!!!).

    It doesn't happen for small files (for $REASONS!?), which was actually my use-case for the thing.

    So I added a sha256 check to the rq script so that it'd spot these right away, and it seems to be 100% reproducible now for larger files, though I'm fairly sure I've run it with a 1M file before, comparing hashes, and it all worked fine...
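That kind of integrity check is straightforward to sketch - hash the source before encoding, recompute over the decoder output, and compare. This is illustrative code, not the actual rq script:

```python
import hashlib

def data_hash(data):
    """Hex sha256 of a bytes blob, to be stored alongside encoded symbols."""
    return hashlib.sha256(data).hexdigest()

# On encode: record the hash of the source data in the output.
src = b'some source data' * 4096
src_digest = data_hash(src)

# On decode: recompute over the decoder output and compare,
# so silent corruption is caught right away instead of going unnoticed.
decoded = src  # stand-in for actual decoder output
if data_hash(decoded) != src_digest:
    raise RuntimeError('decoded data does not match source sha256')
```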

Might need to look into the python wrapper code and check if I can spot something wrong there, but I doubt it can cause straight-up data corruption as in the latter case.

Also need to try 0.1.7, both for that other lz4 issue and to see if maybe the latter thing is also gone there.

But seeing the general pattern, I'd advise you against using this module unless you're totally ok with losing your data - apparently it's a load of badly-tested crap, sorry.
Must be a damn miracle that it somehow encoded/decoded data for me at all.

Will also put up a warning in the README.

Still getting same error with 0.1.7, so that is probably not library path issue.
OSError: cannot load library libRaptorQ.so: /usr/local/lib/libRaptorQ.so: undefined symbol: LZ4_decompress_safe_continue

Built libRaptorQ here with following PKGBUILD (nothing odd there, same process as in lib's README): https://github.com/mk-fg/archlinux-pkgbuilds/blob/master/libraptorq/PKGBUILD
I don't have any issues with lz4, so no idea what happens in your case, but I'm fairly sure it's related to either how you built it or quirks of the distro you're using.

If you find the cause of that issue, it's probably worth reporting to libRaptorQ here - https://www.fenrirproject.org/Luker/libRaptorQ/issues - as it might be easy to address in the build system or such.

Thank you, will do.

I should probably clarify my message above - while encoding itself works fine and raises no lz4-related errors, there is still a show-stopper issue in that anything roughly >100K fails to decode.
And some smaller chunks I get from /dev/urandom fail to encode as well, with RaptorQ_Enc() returning 0 blocks for that data for some reason.

I've now looked over the python code and couldn't spot anything wrong there (e.g. memory-management gc-related issues sometimes pop up when passing stuff to C code), so I'd still suggest avoiding the module.

One obvious way to look further into it would be to write a C/C++ tool from scratch that does the same thing; if it fails in exactly the same way, it's likely a bug in the lib, but I don't know if I'll get around to that anytime soon.

Asked whether maybe Luca would be up for the task, as I think such a tool would be a useful companion to the lib itself (e.g. maybe you won't need the python wrapper at all then), here: https://www.fenrirproject.org/Luker/libRaptorQ/issues/9

Hi @ALL, I'm the developer for libRaptorQ.
This might be kind of a long post, but it seems there are a couple of things to explain. You can find most of these in the docs, I think.

1- LZ4 is an optional dependency in the master branch, but not in the 0.1.X releases. The master branch is now working towards v0.2, with a new API, performance improvements and so on. I thought the "prealpha" in the version name would have been enough, along with the first lines of the readme... Any pointers on how to avoid future problems like this?

2- Sometimes the encoder gives back 0 blocks, or returns immediately: this probably means that there is something wrong in the encoder initialization. Symbol size must be a multiple of subsymbol_size, and max_memory is related to the maximum amount of memory.

Funny stuff: what does "related to" mean? Don't know. v0.1 was modelled with RFC6330 in mind, and that parameter is needed there. Might it be the maximum bytes for the internal algorithm? Close, but not really, as the actual amount of memory used internally is not that.
The amount of memory used is closer to: 2 * (symbols_per_block^2) + size_of_input

But max_mem is used to divide your input into blocks and to choose the block size. You are also limited to 256 blocks, so if you put a really small amount of memory here, RQ won't work.
It sucks, and it is unneeded complexity, but that's how the RFC goes.
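To illustrate the constraint just described - treating max_memory as an approximate per-block budget, which is an assumption; the exact split logic (and the exact meaning of max_memory) lives inside libRaptorQ - a small sketch:

```python
import math

MAX_BLOCKS = 256  # hard limit on source blocks, per the explanation above

def rough_block_count(input_size, max_memory):
    """Rough estimate of how many blocks the input would be split into
    if each block is capped at roughly max_memory bytes."""
    return math.ceil(input_size / max_memory)

def max_memory_ok(input_size, max_memory):
    """True if this max_memory keeps the estimated block count <= 256."""
    return rough_block_count(input_size, max_memory) <= MAX_BLOCKS

# With a 1 MiB input, a tiny max_memory needs far more than 256 blocks:
print(max_memory_ok(1024 * 1024, 512))    # False (2048 blocks)
print(max_memory_ok(1024 * 1024, 8192))   # True (128 blocks)
```

This is exactly the failure mode discussed later in the thread: a too-small max_memory silently pushes the block count over 256 and produces undecodable output.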

Which is why v0.2 will change APIs, along with the performance work. The API you are using will be under the RFC6330:: C++ namespace, and will be moved to an rfc6330_ prefix in the C functions.
The new API will be much simpler, work only with a single block, and use the RaptorQ:: namespace for C++ and the RaptorQ_ prefix for C.

So maybe you are using a low max_mem value, which would give more than 256 blocks?

3- I see you are using different sizes for subsymbols and symbols. That is only used to interleave the data (again, the RFC), but IMHO it's useless. Interleaving makes sense for example in audio streaming, so that losing a packet means losing a couple of bytes across multiple samples - basically reducing the quality without hearing stuttering. But it also means that the application has to actually de-interleave the data by itself, and be able to work with half-received packets, or wait for RQ to recover everything. Making this a requirement in the RFC for everyone was a really questionable thing to do, and it has a (low) performance impact. Suggestion: keep subsym_size == sym_size. Again, no such problems in v0.2.

4- I do not know python, so I can't really debug your code, sorry.

5- v0.1.X is bugfix-only; there are a couple of API problems and inefficiencies, and I'd rather just work on v0.2 now. But I might think about writing the binary you suggested - should be simple enough. I think one more is needed for streams of data, not just static files. Easy enough.
Problem: command line design - how to make it flexible enough? What are you looking for?

1- LZ4 is an optional dependency in the master branch, but not in the 0.1.X releases.

Right, my bad - I think I ran grep over the git log and checked the lz4-related commit date, then simply checked that against the dates on git tags, without taking diverging branches into account at all. A mistake.

2- Sometimes the encoder gives back 0 blocks, or returns immediately: this probably means that there is something wrong in the encoder initialization. Symbol size must be a multiple of subsymbol_size, and max_memory is related to the maximum amount of memory.

As mentioned in the issue on the libRaptorQ gitlab, parameters for that particular attached file are fairly fixed and seem to satisfy the symbol criteria (ENC_32, subsymbol=8, symbol_size=16)...

But the max-memory thing indeed makes a world of difference - running e.g. ./rq --debug encode -m 500 test.enc_fail test.enc_fail.enc && ./rq --debug decode test.enc_fail.enc test.enc_fail.dec works perfectly!

Same for larger files - bumping the max-mem value works and doesn't produce symbols that result in corrupted data upon decoding anymore.

So maybe you are using a low max_mem value, which would give more than 256 blocks?

Yeah, exactly what happened.

It would be cool if the lib did some sanity checks there, if at all possible, instead of silently producing stuff that won't be decodable, as that would clearly indicate where the error happens (in this case - in the script parameters).

I have to admit that I didn't actually read the RFC, so the script parameters have descriptions like this one:

-k bytes, --min-subsymbol-size bytes
                    No idea what it means, see RFC6330. Default: 8

And that's how the blame shifts onto the poor user who happens to run it ;)

I'll look over the RFC and try adding a bunch of sanity checks here in the module, I guess, and try to emphasize that unless one reads the RFC, garbage will be produced, as is clearly the case here.
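The kind of sanity checks meant here could look something like this hypothetical pre-flight helper (`check_enc_params` is illustrative, not the module's actual code), covering the two constraints identified in this thread:

```python
def check_enc_params(data_len, subsymbol_size, symbol_size, max_memory):
    """Raise ValueError for parameter combinations that this thread
    identified as producing garbage or undecodable output."""
    if symbol_size % subsymbol_size != 0:
        raise ValueError(
            'symbol_size (%s) must be a multiple of subsymbol_size (%s)'
            % (symbol_size, subsymbol_size))
    # Rough estimate, assuming max_memory roughly caps the per-block size;
    # the actual split is done inside libRaptorQ.
    blocks = -(-data_len // max_memory)  # ceiling division
    if blocks > 256:
        raise ValueError(
            'max_memory=%s would split a %sB input into %s blocks (max 256)'
            % (max_memory, data_len, blocks))

check_enc_params(1000, 8, 16, 100)  # ok: 16 % 8 == 0, ~10 blocks
try:
    check_enc_params(1000, 8, 12, 100)  # 12 is not a multiple of 8
except ValueError as err:
    print('rejected:', err)
```

Failing fast like this turns "silently produced garbage" into an immediate, attributable error at the call site.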

I see you are using different sizes for subsymbols and symbols. That is only used to interleave the data (again, RFC), but IMHO it's useless.
Suggestion: keep subsym_size == sym_size.

Yeah, will do, thanks for clarification.

I actually grabbed the current defaults (and used them myself) from the same test_c.c "practical API example", which I guess has them as some kind of "weirdest thing possible" case - i.e. "if that works, everything will" - and not as common defaults at all.

Again, reading through the RFC would probably have made me pick something saner there.

can't really debug your code

Actually, you've pretty much fixed it already ;)
Didn't mean to suggest anything like that anyway, of course.

v0.1.X is bugfix-only, there are a couple of API problems and inefficiencies, I'd rather just work on v0.2 now

Got it, will update the thing for the new API when it comes out, I guess - thanks for the tip.
And it definitely seems like a good idea, given all the quirks of the RFC you've mentioned above.

I think one other is needed for streams of data, not just static files. Easy enough.

Assuming you mean something like "read an endless stream from stdin, write to stdout", I'd think that it can be the same thing as for static files - split the input into blocks of pre-defined size, encode each one independently (with some fixed redundancy), and output them in the same order.
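A minimal sketch of that block-splitting scheme (illustrative - the encoder call itself is omitted, each yielded chunk would simply be encoded and emitted in order):

```python
import io

def iter_blocks(stream, block_size):
    """Yield successive fixed-size chunks from a binary stream;
    the last chunk may be shorter."""
    while True:
        chunk = stream.read(block_size)
        if not chunk:
            break
        yield chunk

# Each chunk would be passed to the encoder independently (with some
# fixed redundancy) and written out in order; decode is the reverse.
src = io.BytesIO(b'abcdefghij')
print(list(iter_blocks(src, 4)))  # [b'abcd', b'efgh', b'ij']
```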

Problem: command line design: how to make it flexible enough? what are you looking for?

I think the ideal option, both for a random person who wants to encode data and for me wanting a "reference encoder" tool: raptorq { encode | decode } [options] [--] [src] [dst]

  • src/dst are stdin/stdout if not specified or "-", otherwise paths to files.

  • "options" allow specifying all the algo parameters, i.e. --symbol-size, --subsymbol-size, --max-memory, --repair-symbol-rate (or something like that), --threads, --stream-input-block-size (for streaming encoding).

    For example, opts I've used in "rq" test-script for "encode" op are - https://gist.github.com/mk-fg/c9da03bcdd61991afcfbaaec3414ec7c
    Guess one would use something like getopt for these in C/C++; python's stdlib has argparse: https://github.com/mk-fg/python-libraptorq/blob/master/libraptorq/__main__.py#L143-L198

  • Any omitted option either:

    • Uses a sane default, if one can be picked for ALL inputs, e.g. default subsymbol size = symbol size, unless explicitly overridden by an option.
    • Auto-detects it from the input, e.g. --max-memory from the input file size (if it's a regular file) via stat().st_size.
    • If no explicit value is specified, and an implicit value is required but cannot be determined (e.g. --max-memory for a stdin stream of unknown size that should not be encoded as separate chunks) - exits immediately with an error, e.g. "ERROR: at least one of either --max-memory or --stream-input-block-size options must be specified explicitly for stream input".
    • Is simply required to be explicitly specified, if absolutely necessary or too important/dangerous to ever pick implicitly, e.g. the --repair-symbol-rate option (it has to be selected for the specific use-case by the user; any random value can be either insufficient or excessive).
  • Encoded data (symbols) format - something that is well-supported in most languages, can be encoded/decoded in streams, and can represent binary blobs efficiently.

    I've used JSON as it was the easiest thing to reach for, but it's kinda terrible for this kind of data - super-inefficient, has no "binary blob" type (I was using base64-encoded strings there), and everyone and their dog (case in point - python's stdlib) encodes/decodes it in one go, so it's a bad idea to store more than a few megs there.

    Any explicitly specified binary structure shouldn't be hard to parse though, as long as some short description of it is embedded right in the tool, accessible via e.g. a --help-format option.

  • Encoding parameters (e.g. all the symbol-size stuff) and a hash of the source data (or block) are also present in the output, so that reliable decoding doesn't require anything but that output.
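For illustration, that interface could be roughed out with python's argparse like this (option names come from the list above; the types, defaults, and metavars are assumptions, and nothing is actually wired to an encoder):

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog='raptorq')
    cmds = parser.add_subparsers(dest='cmd', required=True)
    for name in 'encode', 'decode':
        cmd = cmds.add_parser(name)
        cmd.add_argument('src', nargs='?', default='-',
            help='Source file; stdin if omitted or "-".')
        cmd.add_argument('dst', nargs='?', default='-',
            help='Destination file; stdout if omitted or "-".')
        if name == 'encode':
            cmd.add_argument('--symbol-size', type=int, metavar='bytes')
            cmd.add_argument('--subsymbol-size', type=int, metavar='bytes',
                help='Defaults to --symbol-size if omitted.')
            cmd.add_argument('--max-memory', type=int, metavar='n')
            cmd.add_argument('--repair-symbol-rate', type=float, metavar='ratio',
                help='Must always be picked explicitly for the use-case.')
    return parser

opts = build_parser().parse_args(
    ['encode', '--symbol-size', '16', 'mydata', 'mydata.enc'])
print(opts.cmd, opts.src, opts.dst, opts.symbol_size)
```

The positional src/dst with nargs='?' and a "-" default gives the stdin/stdout-or-file behavior from the first bullet above for free.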

That's an ideal though, as it should work for:

  • Random Joe (who didn't read the RFC, like me) - to simply run the thing as raptorq-tool encode -r 50% mydata{,.enc} and not have to read the RFC and all the parameters and caveats there - the tool will just do the obvious thing: encode data with 50% extra, picking all the sizes as needed for the source size, safely, and put all the necessary parameters in the output, which can then be split, stored or delivered as necessary.
  • App developer - use it in a pipeline, writing source data to stdin, reading stdout symbol-by-symbol and e.g. sending these over UDP as they're being read.
  • My testing use-case - specify all the parameters explicitly, get the output, see if it works (if this tool crashes, it's likely a reportable bug, or that RFC parameters thing) and (roughly) matches what one gets when using the lib via the wrapper.
  • Random testing/benchmarking use-case - dd from /dev/urandom, sha256 it, feed it to the "encode" op, run "decode", compare hashes, try with different parameters, etc.

For my "test if the C API works at all with this file" case (especially now that I'm aware that parameters can cause silent data corruption), a simple test.c with a hard-coded path to the file and parameters (which I can then tweak and re-compile) would work too, but probably not so much for a person who just wants to run it from a terminal.

Made all libRaptorQ encoder parameters like symbol-size and max-memory mandatory in the command-line interface of this module now, and added a bunch of warnings to the README about the dangers of these, so hopefully no one will rely on them to "just work" for a random use-case.

Guess both the lz4 issue and the undecodable data / crashes should be addressed by now, so closing the issue here.
Big thanks to both of you for pointing out these issues!