jhelovuo / RustDDS

Rust implementation of Data Distribution Service

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Fragmentation issue

curiousmindflow opened this issue · comments

Hello,

We may have found a bug regarding the fragmentation, on the reading side.
You can try with a simple test here.

What is weird is that it's difficult to reproduce it when it runs without debugger, but one you debug it and stop at some breakpoint, the issue appear.

The way i can easily reproduce the bug is:

  • put a breakpoint at line 81
  • put a breakpoint at line 85
  • launch debug
  • the debugger will hit breakpoint line 85
  • continue at least 3 times
  • remove breakpoint line 85
  • continue

The issue should appear. If not, try to pass more time on breakpoint line 85.

Here the error message:

thread 'RustDDS Participant 0 event loop' panicked at 'attempt to subtract with overflow', /home/afalher/.cargo/registry/src/github.com-1ecc6299db9ec823/rustdds-0.7.11/src/messages/submessages/data_frag.rs:119:24
stack backtrace:
   0:     0x555555bd9b20 - std::backtrace_rs::backtrace::libunwind::trace::h1d00f3fcf4cb5ac4
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x555555bd9b20 - std::backtrace_rs::backtrace::trace_unsynchronized::h920a6ff332484ee2
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x555555bd9b20 - std::sys_common::backtrace::_print_fmt::hd7323920c925af6d
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys_common/backtrace.rs:65:5
   3:     0x555555bd9b20 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h3155a8c966b4beb5
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x555555bf94be - core::fmt::write::h062c617411b691df
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/fmt/mod.rs:1209:17
   5:     0x555555bd51c5 - std::io::Write::write_fmt::hb61fdf1275c61e1c
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/io/mod.rs:1682:15
   6:     0x555555bd98e5 - std::sys_common::backtrace::_print::hd1b4d9664ab500e0
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys_common/backtrace.rs:47:5
   7:     0x555555bd98e5 - std::sys_common::backtrace::print::hca896ae22beb06cb
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys_common/backtrace.rs:34:9
   8:     0x555555bdb19f - std::panicking::default_hook::{{closure}}::h0b5eeed5cf36ab5f
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:267:22
   9:     0x555555bdaeda - std::panicking::default_hook::h8932b573145a321b
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:286:9
  10:     0x555555bdb898 - std::panicking::rust_panic_with_hook::h4b1447a24e3e94f8
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:688:13
  11:     0x555555bdb5f1 - std::panicking::begin_panic_handler::{{closure}}::h8701da9995a3820c
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:577:13
  12:     0x555555bd9fcc - std::sys_common::backtrace::__rust_end_short_backtrace::hb696c5ed02a01598
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys_common/backtrace.rs:137:18
  13:     0x555555bdb352 - rust_begin_unwind
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:575:5
  14:     0x5555555c46c3 - core::panicking::panic_fmt::h8aa706a976963c88
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panicking.rs:65:14
  15:     0x5555555c479d - core::panicking::panic::h622b2a38bce78ff3
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panicking.rs:115:5
  16:     0x5555559fc22e - rustdds::messages::submessages::data_frag::DataFrag::deserialize::h1657510a3b0f5d3b
                               at /home/afalher/.cargo/registry/src/github.com-1ecc6299db9ec823/rustdds-0.7.11/src/messages/submessages/data_frag.rs:119:24
  17:     0x555555a21913 - rustdds::serialization::message::Message::read_from_buffer::h15096c4ba3e112a2
                               at /home/afalher/.cargo/registry/src/github.com-1ecc6299db9ec823/rustdds-0.7.11/src/serialization/message.rs:130:13
  18:     0x555555b1017f - rustdds::dds::message_receiver::MessageReceiver::handle_received_packet::h8146833a14a540f4
                               at /home/afalher/.cargo/registry/src/github.com-1ecc6299db9ec823/rustdds-0.7.11/src/dds/message_receiver.rs:198:30
  19:     0x5555558b4cb5 - rustdds::dds::dp_event_loop::DPEventLoop::event_loop::ha37ac809a01a7eba
                               at /home/afalher/.cargo/registry/src/github.com-1ecc6299db9ec823/rustdds-0.7.11/src/dds/dp_event_loop.rs:252:19
  20:     0x555555a817a3 - rustdds::dds::participant::DomainParticipantInner::new::{{closure}}::h0e1b96ffdd68e708
                               at /home/afalher/.cargo/registry/src/github.com-1ecc6299db9ec823/rustdds-0.7.11/src/dds/participant.rs:767:9
  21:     0x5555558da1cf - std::sys_common::backtrace::__rust_begin_short_backtrace::ha74386db2d1b19da
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys_common/backtrace.rs:121:18
  22:     0x5555559efc30 - std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}::ha405ebf69f9befcf
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/thread/mod.rs:551:17
  23:     0x555555946944 - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::hc5ef3e1ed2f23fb3
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panic/unwind_safe.rs:271:9
  24:     0x555555a51e60 - std::panicking::try::do_call::h2f507a99aeca0b4b
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:483:40
  25:     0x555555a8549b - __rust_try
  26:     0x555555a51d3c - std::panicking::try::hc4fee7c0a817bcd1
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:447:19
  27:     0x5555559ed464 - std::panic::catch_unwind::h0ba597f749600e23
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panic.rs:137:14
  28:     0x5555559ef943 - std::thread::Builder::spawn_unchecked_::{{closure}}::h4481641130f3a67e
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/thread/mod.rs:550:30
  29:     0x5555557d22bf - core::ops::function::FnOnce::call_once{{vtable.shim}}::h8a98e9d97e146f84
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/ops/function.rs:251:5
  30:     0x555555bddf93 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h6202d10b0224e7b0
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/alloc/src/boxed.rs:1987:9
  31:     0x555555bddf93 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h4dbea73c9cec160b
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/alloc/src/boxed.rs:1987:9
  32:     0x555555bddf93 - std::sys::unix::thread::Thread::new::thread_start::h0a2f4c32ba2f2278
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys/unix/thread.rs:108:17
  33:     0x7ffff7c94b43 - start_thread
                               at ./nptl/./nptl/pthread_create.c:442:8
  34:     0x7ffff7d26a00 - clone3
                               at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
  35:                0x0 - <unknown>
point_cloud received

Thank you for the reproduction example.

There were two bugs.

  • Subtraction underflow, which caused the panic. There was a check for the underflow condition, but for some reason only after the subtraction.
  • DATAFRAG packet contents were badly formed, which caused the underflow on receive.

Please test if the latest version in master fixes these.

Hello,

As always, thank you for your reactivity. It means a lot for us as we decided to heavily use your crate.

I just try the fix and so far it seems to be working.
I will wait for my colleagues to test it as well and as soon as i'm sufficiently confident i'll close this issue.

Thank you. That is fine.

The explanation above was very brief, but let me give some more details.

The root cause of this was that there was a disagreement between DATAFRAG message generation and decoding in RustDDS. Apparently, the implementations of the two parts were not written by the same person, or at least not at the same time.

But it was quite easy to make such a mistake. Let's see the RTPS spec , specifically Section "9.4.5.5 DataFrag Submessage", where a diagram defines the wire representation of DATAFRAG. The scale on the top of the diagram is bits, so a horizontal slice of the diagram represents 32 bits, or 4 bytes of data in a message.

Now, there is a field octetsToInlineQos. The value in this field must be a 16-bit integer declaring how many bytes of data there are after octetsToInlineQos, but before ParameterList. Someone had read this a part carelessly and had come up with a value of 24, but the correct value seems to be 28. Can you see how this happened?