estraier / tkrzw

a set of implementations of DBM

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

ppc64le check error

tieugene opened this issue · comments

Up to 1.0.10 everything was ok, but starting from 1.0.11 build ok for all architectures excepting ppc64le (PowerPC).
I proposed that this is because of 073bdc9 that fixed i686 bug.
But reverting this commit's changes back has no effect - ppc64le builds (more precisely make check) still fails:
https://koji.fedoraproject.org/koji/taskinfo?taskID=76258864
(By the way i686 build ok without 073bdc9).
Tail of build (make check-light) log (https://kojipkgs.fedoraproject.org//work/tasks/8959/76258959/build.log):

LD_LIBRARY_PATH=.:/usr/local/lib::. ./tkrzw_dbm_perf sequence --dbm poly --path casket.tkt \
  --iter 1000 --threads 2 --params "dbm=tree,num_buckets=500"
Setting: impl=poly num_iterations=1000 value_size=8 num_threads=2
.................................................. (00000050)
.................................................. (00000100)
.................................................. (00000150)
.................................................. (00000200)
.................................................. (00000250)
.................................................. (00000300)
.................................................. (00000350)
.................................................. (00000400)
.................................................. (00000450)
.................................................. (00000500)
.................................................. (00000550)
.................................................. (00000600)
.................................................. (00000650)
.................................................. (00000700)
.................................................. (00000750)
.................................................. (00000800)
.................................................. (00000850)
.................................................. (00000900)
.................................................. (00000950)
.................................................. (00001000)
Synchronizing: ... done (elapsed=0.000922)
Setting done: elapsed_time=0.006026 num_records=2000 qps=331893 mem=4288000
  file_size=43008
Getting: impl=poly num_iterations=1000 value_size=8 num_threads=2
.................................................. (00000050)
.................................................. (00000100)
.................................................. (00000150)
............................................. (00001000)
Getting done: elapsed_time=0.001653 num_records=2000 qps=1209954 mem=4480000
  file_size=43008
Iterating: impl=poly num_iterations=1000 value_size=8 num_threads=2
Iterating done: elapsed_time=0.000172 num_records=2000 qps=11618571 mem=4480000
  file_size=43008
Removing: impl=poly num_iterations=1000 value_size=8 num_threads=2
Synchronizing: ... done (elapsed=0.000399)
Removing done: elapsed_time=0.000607 num_records=2000 qps=3294819 mem=4544000
  file_size=43008
make[1]: Leaving directory '/builddir/build/BUILD/tkrzw-1.0.13'
Get failed: BROKEN_DATA_ERROR: invalid magic checksum
make[1]: *** [Makefile:758: check-perf-sampler] Error 1

Something wrong starting from 1.0.11

Thanks for reporting this.
It seems that the checksum algorithm has overflow or endian issue, though I'm not sure.
Does "ppc64le" represents Power PC, 64-bit, little endian?
Can I see the result of "tkrzw_build_util config" on the system?

Does "ppc64le" represents Power PC, 64-bit, little endian?

Exactly.

Can I see the result of "tkrzw_build_util config" on the system?

I have no this system in my hands. Packages are building at remote virtual hosts.
So I just can insert some command in build sequence and see log.
Now part of .spec file is:

%prep
%autosetup


%build
autoreconf -vif
%configure
%make_build apidoc all
for bin in \
  tkrzw_build_util tkrzw_str_perf tkrzw_file_perf tkrzw_dbm_perf tkrzw_dbm_util
do
  LD_LIBRARY_PATH=$PWD help2man --no-info --no-discard-stderr \
    --version-string='%{version}' --output="${bin}.1" \
    "./${bin}"
done


%install
%make_install
# Remove static .a file
rm -f %{buildroot}%{_libdir}/lib%{name}.a
# mans
install -d %{buildroot}%{_mandir}/man1
install -t %{buildroot}%{_mandir}/man1 -m 0644 -p tkrzw_*.1


%check
%make_build check-light

Where I have to insert tkrzw_build_util config?

I released 1.0.14. Now, "tkrzw_build_util config" is run at the beginning of "make check" and "make check-light" so we can confirm the result in the log.

https://kojipkgs.fedoraproject.org//work/tasks/5013/76265013/build.log
PS. you can notify me if/when bug will be fixed and before flushing new tag - I will test it on Fedora build farm.

Thanks. The problem doesn't seem so simple. I have to obtain a PowerPC environment to investigate it more.

Another notable difference between 1.0.10 and 1.0.11 is this change.
073bdc9

So, could you check this package where the change is reverted?
https://dbmx.net/tkrzw/pkg/tkrzw-1.0.14-nommapfix.tar.gz

As I wrote "I proposed that this is because of @073bdc9 that fixed i686 bug. But reverting this commit's changes back has no effect".
But I'll try

Oops, I was confused.
Perhaps, the bug had already been in 1.0.10 but not found because check-light was not there.

I tried to rebuild original 1.0.13 with the patch:

diff -dur tkrzw-1.0.13.orig/tkrzw_dbm_hash.cc tkrzw-1.0.13/tkrzw_dbm_hash.cc
--- tkrzw-1.0.13.orig/tkrzw_dbm_hash.cc	2021-09-22 21:08:26.000000000 +0300
+++ tkrzw-1.0.13/tkrzw_dbm_hash.cc	2021-09-25 10:14:30.466455004 +0300
@@ -1508,8 +1508,7 @@
           }
         } else {
           if (old_is_set) {
-            eff_data_size_.fetch_add(
-                static_cast<int32_t>(new_value.size()) - static_cast<int32_t>(old_value.size()));
+            eff_data_size_.fetch_add(new_value.size() - old_value.size());
           } else {
             num_records_.fetch_add(1);
             eff_data_size_.fetch_add(key.size() + new_value.size());
diff -dur tkrzw-1.0.13.orig/tkrzw_dbm_tree.cc tkrzw-1.0.13/tkrzw_dbm_tree.cc
--- tkrzw-1.0.13.orig/tkrzw_dbm_tree.cc	2021-09-22 21:08:26.000000000 +0300
+++ tkrzw-1.0.13/tkrzw_dbm_tree.cc	2021-09-25 10:17:19.167476222 +0300
@@ -1904,8 +1904,7 @@
         node->page_size +=
             static_cast<int32_t>(new_rec_size) - static_cast<int32_t>(old_rec_size);
         node->dirty = true;
-        eff_data_size_.fetch_add(
-            static_cast<int32_t>(new_value.size()) - static_cast<int32_t>(old_value_size));
+        eff_data_size_.fetch_add(new_value.size() - old_value_size);
         if (static_cast<int32_t>(new_value.size()) > old_value_size) {
           if (CheckLeafNodeToDivide(node)) {
             reorg_ids_.Insert(std::make_pair(node->id, std::string(records.front()->GetKey())));

Oops... Your last source ('nommapfix') works:
https://koji.fedoraproject.org/koji/taskinfo?taskID=76273881
If this is the solution it is not required to make new tag.
Now Fedora has 0.9.16 in stable repo and 1.0.13 in testing. I can just add required patch for 1.0.14 on next build.

@estraier , if you need access to a ppc64le hw, please send me an email to sharkcz at fedoraproject dot org, I can arrange you an account on our public machine.

I'm glad if I can do some experiments on the ppc64le environment.

Seems sharkcz propose you direct ssh access to virtual ppc64le host.
But this requires some moves from your side (e.g. you have to get Fedoraproject's signed certificate)

Right, I am offering a remote (ssh) access to a ppc64le virtual machine. Fedora contributors can access the Fedora public machines (https://fedoraproject.org/wiki/Test_Machine_Resources_For_Package_Maintainers), but we have also a solution for upstream projects/authors, who are not Fedora contributors. All I need is an email with your public ssh key.

1.0.15: bug still exists

Thanks Dan and Eugene.
My public key is the following.

ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAtWY5XD4t3uYIOTZ0WUH2BGnFAdIdkVUZplUQB34CW/YOpWoeOJy/CKfXibZ+xubH7cSPnwLgg26ZMJWaxxzF1R3jXzJCTKe/LoqdNkqjSCqSiqyXLFOf2K4PdQ5MKA80VIMO+PDmtd3noOeEN/Mu667S2IYMYu5BD9XW+haiOwvwiQd8+9y4FKKBnPjZle9u5XjgKs3QVG1bDa0phWQu95q7/7QrVQp4AFGsBNDNfOltPp/zjG+tU9MSPa5jscX0YcWo4QD+TnQaJNv9wCXdZAJvy/6Bs+k/g9CpAPbmNYuBDXwrVUXNGQaPYEEg4isMCgposDHP91oo7ZkXnU8dkQ== mikio@kyoto

Please send me an email (sharkcz at fedoraproject.org), I will need to send back you the login details :-) Thanks.

Please send me an email (sharkcz at fedoraproject.org), I will need to send back you the login details :-) Thanks.

As the author published his pubkey - you can publish login detail here. Why not?
PS. Another problem is in private key, but let's wait.

It's a private system, thus I won't share details here. It's outside the official Fedora infra.

Login info sent, please check your spam box too. Gmail doesn't like me sometimes :-)

I can login on it.
Thanks!

"make check" and "make check-light" of 1.0.15 work well on the ppc64le system.
So, the platform doesn't seem the cause of the error.
Now I'm suspecting that some resource limitation on the build system might be the cause.

@tieugene could you run "ulimit -a" on the build system? I'd like to simulate it.
And, is the GCC version same as "gcc version 11.2.1 20210728 (Red Hat 11.2.1-1) (GCC)"

@tieugene could you run "ulimit -a" on the build system? I'd like to simulate it. And, is the GCC version same as "gcc version 11.2.1 20210728 (Red Hat 11.2.1-1) (GCC)"

According to log (search ==== ULIMIT/ ====):

real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) unlimited
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 97324
max locked memory           (kbytes, -l) 64
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 97324
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

PS. You can compare with other archs, where builds ok. Click required build and search build.log link.

1.0.16: still failed

I'm still struggling to reproduce the bug.

@tieugene Could you check the build of this package on the ppc64le system?
It adds "madvice" system call after "mremap" system call though it is not necessary on the POSIX specification.
https://dbmx.net/tkrzw/pkg/tkrzw-1.0.16-madvise.tar.gz

@tieugene Could you check the build of this package on the ppc64le system? It adds "madvice" system call after "mremap" system call though it is not necessary on the POSIX specification. https://dbmx.net/tkrzw/pkg/tkrzw-1.0.16-madvise.tar.gz

Failed
If you have an access to vhost with PPC64le (as @sharkcz promised) you can try to build the package in-place.
Just:

  1. Download srpm
  2. sudo dnf install rpm-build (if not installed)
  3. rpmbuild --rebuild tkrzw-...src.rpm

PS. I asked vhost chief to give me same access but no response yet

Thanks.
The problem seems related to the behavior of mmap so I'll investigate more.
Reproduction is the key for debugging so I appreciate that I can use the build system myself.

rpmbuild doesn't fail on my account on the ppc64le.
So, maybe I should consider another way to reproduce it.

I did rebuilds using mock on my local machine and on the public machine. It fails locally, but builds OK on the public one. Both used F-34 buildroot, so the software build environment was (exactly) the same. Isn't there multi-threading active in the software, for example? The public machine has only 2 vCPUs.

The command which dies is the following. It uses two active threads for test tasks and one therad waiting for them.
tkrzw_dbm_perf sequence --dbm poly --path casket.tkt --iter 1000 --threads 2 --params "dbm=tree,num_buckets=500"

What happens with the following command which uses 1 active thread?
tkrzw_dbm_perf sequence --dbm poly --path casket.tkt --iter 2000 --threads 1 --params "dbm=tree,num_buckets=500"

still fails

...
LD_LIBRARY_PATH=.:/usr/local/lib::. ./tkrzw_dbm_perf sequence --dbm poly --path casket.tkt \
  --iter 2000 --threads 1 --params "dbm=tree,num_buckets=500"
Setting: impl=poly num_iterations=2000 value_size=8 num_threads=1
.................................................. (00000100)
.................................................. (00000200)
.................................................. (00000300)
.................................................. (00000400)
.................................................. (00000500)
.................................................. (00000600)
.................................................. (00000700)
.................................................. (00000800)
.................................................. (00000900)
.................................................. (00001000)
.................................................. (00001100)
.................................................. (00001200)
.................................................. (00001300)
.................................................. (00001400)
.................................................. (00001500)
.................................................. (00001600)
.................................................. (00001700)
.................................................. (00001800)
.................................................. (00001900)
.................................................. (00002000)
Synchronizing: ... done (elapsed=0.000696)
Setting done: elapsed_time=0.010143 num_records=2000 qps=197180 mem=4160000
  file_size=41984

Getting: impl=poly num_iterations=2000 value_size=8 num_threads=1
.................................................. (00000100)
.................................................. (00000200)
.................................................. (00000300)
.................................................. (00000400)
.................................................. (00000500)
.................................................. (00000600)
.................................................. (00000700)
.................................................. (00000800)
.................................................. (00000900)
.................................................. (00001000)
.................................................. (00001100)
.................................................. (00001200)
.................................................. (00001300)
.................................................. (00001400)
.................................................. (00001500)
.........................................Get failed: BROKEN_DATA_ERROR: invalid magic checksum
 (00002000)
Getting done: elapsed_time=0.007697 num_records=2000 qps=259846 mem=4224000
  file_size=41984

Iterating: impl=poly num_iterations=2000 value_size=8 num_threads=1
Iterating done: elapsed_time=0.000082 num_records=2000 qps=24385488 mem=4224000
  file_size=41984

Removing: impl=poly num_iterations=2000 value_size=8 num_threads=1
Synchronizing: ... done (elapsed=0.000263)
Removing done: elapsed_time=0.000350 num_records=2000 qps=5714311 mem=4288000
  file_size=41984

make[1]: *** [Makefile:774: check-perf-sampler] Error 1

I will look closer what the difference between the machines could be. It's weird ...

Thanks. Reproducing the problem with one thread is a good news because the deterministic behavior is easy to analyze.

Thanks. Reproducing the problem with one thread is a good news because the deterministic behavior is easy to analyze.

By the way - one from my home host has 1 CPU with 1 (one) core (P4 3.0GHz).

@estraier, I think I have a public environment where I can reproduce the failure using @tieugene 's rpm, expect a new email :-)

I found the cause of the error.
On the PowerPC system, memory-mapped region managed by "mmap" beyond the file size metadata managed by "ftruncate" is filled with zero.
On other Linux systems, data in such regin is retained as long as the file size metadata is set to the size of mapped region before calling "msync" or "munmap".
Let me illustrate this by pseudo-code.

// #1: Initialize
mmap(size=10)
ftruncate(size=10)

// #2: write something
write(offset=0, size=5, data="01234")

// #3 truncate
mmap(size=10)
ftruncate(size=5) // This is smaller than the map size

// #4 write something
// This is beyond the file size metadata set by fturncate
write(offset=5, data="56789")

// #5 synchronize
ftruncate(size=10)
munmap(size=10)

The problem is, at the #4 step, the data is written beyond the size 5 set by ftruncte.
On many system, the data writtin in the map is retained and finally synchronized to the file at #5 step because the metadata is set 10 before calling munmap.

Though I'm not sure, the difference of the behavior seems related to Linux version.
Anyway, now that I know the cause, I can fix it.

// #3 truncate
mmap(size=10)
ftruncate(size=5) // This is smaller than the map size
// #4 write something
write(offset=5, data="56789")
// #5 synchronize
ftruncate(size=10)
munmap(size=10)
The problem is, at the #4 step, the data is written beyond the size 5 set by ftruncte. On many system, the data writtin in the map is retained and finally synchronized to the file at #5 step because the metadata is set 10 before calling munmap.

I'm not familiar in system programming at all but as for me if you try to write data beyond reserved array - nobody can warranty nothing. Is not it?

Basically, yes.
The region is actually allocaed by the mmap call. However, some data is beyond the file size metadata.
What happens in that case is not specified by the specification AFAIK.
I've been assuming that data beyond the metadata is retained and written back to the file as long as the metadata is set before calling "munmap". It is the case in all systems where I've tested my code. But, it's not the case on the PPC system at least.

Anyway, the practical solution is to set the file size metadata aligned to the map size. I'll modify the code so.

I released 1.0.17.
make check and make check-light passes on the both PPC systems.
I also tested it on Ubuntu 20.10 (x64), MacOS 11.3 (M1) and Windows 11.
Thanks Eugene and Dan. This fix couldn't be done without your help.

Nice: ✓F34…✓F36.
If python-0.1.23 wrapper is compatible with it I will push new version into Fedora repos.
Closed in f2a08e6

oops... Issue autoclosing not works. PR only.

As the latest version of Tkrzw (C++) abolishes the memory locking feature, test cases of olf packages of language bindings fail.
Thus, I released new versions for them.
As for Python, 0.1.24 is the new version.