caio / foca

mirror of https://caio.co/de/foca/

Home Page:https://caio.co/de/foca/

How to use BroadcastHandler in a user-friendly way

jeromegn opened this issue · comments

I've stumbled upon this crate and it looks like it's of very high quality and has all the features I need for a little project.

Everything has been pretty clear in the docs and examples, except I'm not sure how to correctly use BroadcastHandler.

I'm assuming I should be able to use it to send / receive an enum that may represent multiple message types. However, BroadcastHandler::Broadcast requires AsRef<[u8]>. It seems like the only thing I can then really broadcast are serialized bytes.

Now, that might be OK, but it means that if my messages have any kind of complexity to them, I have to constantly deserialize them inside my Invalidates implementation to determine if they should be invalidated.

I get that this was built to run in embedded environments without many resources. The serialization I'm doing is probably far heavier than intended for the use cases here. I can see how it would be entirely possible to read a u64 from the start of the message to decide whether to invalidate it, but in my implementation there will be many different message variants, and invalidating them might even require a database round-trip (TBD).

Any insight on how I could use the BroadcastHandler to disseminate more complex types of messages?

commented

Hello,

Glad to hear you're interested in this project, it's always nice to hear when something I build for fun is useful for someone else 😁

I half-expected to get questions about broadcasting; it's the clunkiest part of it all. I struggled to come up with a reasonable way of modelling it without making the type signature awfully large or imposing too many allocations. I also haven't done anything close to serious with broadcasts, so it's likely that the current API is too rigid. I'm very open to ideas, patches, issues, anything really.

That said, regarding disseminating complex types: if you are maximizing performance, I don't think you'll be able to get away from a tag at the beginning of the payload to help invalidate things quickly. This tag should probably be more complex than a primary key, something closer to a logical clock. If you're shipping this on conventional hardware, deserializing the whole payload every time may be perfectly fine.
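A minimal sketch of the tag idea, using only the standard library. The header layout (an 8-byte key followed by an 8-byte version acting as a logical clock) and the names `read_header`, `invalidates` and `encode` are assumptions made for illustration, not part of foca's API. An `Invalidates` implementation could delegate to a function like this so that the rest of the payload is never deserialized.

```rust
/// Hypothetical fixed-size header at the start of every broadcast:
/// bytes 0..8  = key (which piece of state the update is about)
/// bytes 8..16 = version (a logical clock, higher wins)
const HEADER_LEN: usize = 16;

/// Reads (key, version) from the front of the payload without touching
/// the body. Returns None when the payload is too short for a header.
fn read_header(payload: &[u8]) -> Option<(u64, u64)> {
    if payload.len() < HEADER_LEN {
        return None;
    }
    let mut key = [0u8; 8];
    let mut version = [0u8; 8];
    key.copy_from_slice(&payload[0..8]);
    version.copy_from_slice(&payload[8..16]);
    Some((u64::from_be_bytes(key), u64::from_be_bytes(version)))
}

/// True when `new` makes `old` obsolete: same key, strictly newer version.
/// Payloads without a readable header never invalidate anything.
fn invalidates(new: &[u8], old: &[u8]) -> bool {
    match (read_header(new), read_header(old)) {
        (Some((new_key, new_ver)), Some((old_key, old_ver))) => {
            new_key == old_key && new_ver > old_ver
        }
        _ => false,
    }
}

/// Builds a payload: header first, then the opaque body.
fn encode(key: u64, version: u64, body: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + body.len());
    out.extend_from_slice(&key.to_be_bytes());
    out.extend_from_slice(&version.to_be_bytes());
    out.extend_from_slice(body);
    out
}
```

With this layout, a newer update for the same key replaces the older one in the backlog, while updates for different keys leave each other alone.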

My main thought as I implemented the broadcast handler was using it for disseminating CRDT operations and then something outside of foca would periodically reconcile the state with the authoritative database. But as I said, I haven't done anything serious on that end yet.

Come this weekend I'll try to sit down and write a simple-but-not-simplistic example - thanks for the motivation! Meanwhile, would be super helpful to hear more about what kind of data you'll be disseminating if you are able to share 😊

I figured as much. I think I understand you have a pretty different use case compared to ours. We do intend to use CRDTs at some point. Still designing the schema at this point.

I ended up using rkyv as a "view" on top of Bytes. I just lazily parse my messages as I need them. I do need to parse them every time receive_item or invalidates is called, but at least it's cheap with rkyv.

I'm also prefixing each broadcast with a u64 length (to keep the following buffers aligned to 8 bytes).
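The length-prefix framing can be sketched with the standard library alone. The helper names `frame`, `split_frame` and `body` are hypothetical, and the sketch works on plain slices rather than `bytes::Buf`; it only illustrates the layout: an 8-byte big-endian length, then the body.

```rust
/// Size of the big-endian u64 length prefix.
const PREFIX_LEN: usize = 8;

/// Prepends the body length as a big-endian u64.
fn frame(body: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(PREFIX_LEN + body.len());
    out.extend_from_slice(&(body.len() as u64).to_be_bytes());
    out.extend_from_slice(body);
    out
}

/// Splits one complete frame (prefix included) off the front of `input`.
/// Returns None when the input does not yet hold a whole frame.
fn split_frame(input: &[u8]) -> Option<(&[u8], &[u8])> {
    if input.len() < PREFIX_LEN {
        return None;
    }
    let mut len_bytes = [0u8; PREFIX_LEN];
    len_bytes.copy_from_slice(&input[..PREFIX_LEN]);
    let body_len = u64::from_be_bytes(len_bytes);
    // Compare as u64 so an oversized length cannot wrap around on 32-bit targets.
    if ((input.len() - PREFIX_LEN) as u64) < body_len {
        return None;
    }
    Some(input.split_at(PREFIX_LEN + body_len as usize))
}

/// The body of a frame, skipping the length prefix.
fn body(frame: &[u8]) -> &[u8] {
    &frame[PREFIX_LEN..]
}
```

Keeping the prefix inside the stored frame, as the code further down does, means the body always starts 8 bytes into the buffer.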

Something like zerocopy would also work, but we're probably going to be using incompatible types here.

I kept running in circles trying to find a way to work within these constraints:

  • My deserialized type needed to fulfill AsRef<[u8]>
  • I needed to be able to implement Invalidates which required getting information from my messages (parsing)
  • Also needed that from the receive_item function to determine if I should keep disseminating
  • I had to advance the provided impl Buf for the correct length

Eventually I went with something sub-optimal, but still better than serializing and deserializing to msgpack (or almost anything with serde).

Message types

#[derive(rkyv::Archive, rkyv::Deserialize, rkyv::Serialize)]
#[archive_attr(repr(u8), derive(bytecheck::CheckBytes, Debug))]
enum CtMessage {
    V1(CtMessageV1),
}

#[derive(rkyv::Archive, rkyv::Deserialize, rkyv::Serialize)]
#[archive_attr(repr(C), derive(bytecheck::CheckBytes, Debug))]
struct CtMessageV1 {
    id: u64,
}

const SIZE_OF_U64: usize = std::mem::size_of::<u64>();

struct RawCtMessage(Bytes);

impl RawCtMessage {
    fn parse(&self) -> std::result::Result<&ArchivedCtMessage, CheckArchiveError<EnumCheckError<u8>, DefaultValidatorError>> {
        rkyv::check_archived_root::<CtMessage>(&self.0[SIZE_OF_U64..])
    }
}

impl AsRef<[u8]> for RawCtMessage {
    fn as_ref(&self) -> &[u8] {
        self.0.as_ref()
    }
}

impl Invalidates for RawCtMessage {
    fn invalidates(&self, other: &Self) -> bool {
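        // No message ever invalidates another yet; the parsing is kept as a
        // placeholder for a future comparison of the two messages.
        match (self.parse(), other.parse()) {
            (Ok(_a), Ok(_b)) => false,
            _ => false,
        }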
    }
}

BroadcastHandler implementation

impl BroadcastHandler for CtMessageBroadcaster {
    type Broadcast = RawCtMessage;

    type Error = BroadcastError;

    fn receive_item(&mut self, mut data: impl bytes::Buf) -> Result<Option<Self::Broadcast>, Self::Error> {
        info!("receive_item!");
        let remaining = data.remaining();
        info!("remaining: {remaining}");
        if remaining < SIZE_OF_U64 {
            return Err(BroadcastError::NotEnoughBytes);
        }

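        // Peek at the big-endian length prefix without advancing `data`.
        // Note: this assumes the first chunk holds at least SIZE_OF_U64 bytes.
        let len = { data.chunk().get_u64() } as usize;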
        info!("msg len: {len}");
        let full_len = SIZE_OF_U64 + len;

        if remaining < full_len {
            return Err(BroadcastError::NotEnoughBytes);
        }

        let msg = RawCtMessage(data.copy_to_bytes(full_len));
        // ...
    }
}

Sending a message

let msg = rkyv::util::to_bytes::<_, 512>(&CtMessage::V1(CtMessageV1 { id: 1 })).expect("boom");
let mut buf = BytesMut::with_capacity(SIZE_OF_U64 + msg.len());
buf.put_u64(msg.len() as u64);
buf.extend_from_slice(msg.as_ref());
if let Err(e) = foca.add_broadcast(buf.as_ref()) {
    error!("error adding broadcast: {e}");
}

If you think this can be improved, I'd love to know! :)

commented

Thanks a lot for the code samples, they definitely helped paint a better picture.

I'd say you're doing pretty much what I would do too. The only difference is that I would use a cheap type to help with the invalidation case, if invalidation is actually necessary.

I've sketched out a WIP example here: bb6bab0

It has two types of broadcast data:

  1. node configuration: pretty much a hashmap of address to $anything, with the condition that only the node of address A sends (as in: foca::add_broadcast) configuration updates about node A
  2. an arbitrary operation with a UUID, which always gets broadcast unless we received the operation before; I think your case is pretty close to this - if I'm right, your invalidation code is simply return false :)
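The second case (disseminate an operation unless it was received before) can be sketched as follows. `SeenOps`, its `receive` method and the 16-byte ID at the front of the payload are assumptions for illustration; the return value mirrors the `Option` returned by `receive_item`, where `None` is taken to mean "nothing new to keep disseminating".

```rust
use std::collections::HashSet;

/// Width of the hypothetical operation ID at the front of each payload.
const OP_ID_LEN: usize = 16;

/// Hypothetical tracker of operation IDs this node has already received.
struct SeenOps {
    seen: HashSet<u128>,
}

impl SeenOps {
    fn new() -> Self {
        SeenOps { seen: HashSet::new() }
    }

    /// Returns `Some(payload)` the first time an operation ID shows up
    /// (keep disseminating it) and `None` for duplicates or for payloads
    /// too short to carry an ID.
    fn receive(&mut self, payload: &[u8]) -> Option<Vec<u8>> {
        if payload.len() < OP_ID_LEN {
            return None;
        }
        let mut id = [0u8; OP_ID_LEN];
        id.copy_from_slice(&payload[..OP_ID_LEN]);
        // `insert` returns false when the ID was already in the set.
        if self.seen.insert(u128::from_be_bytes(id)) {
            Some(payload.to_vec())
        } else {
            None
        }
    }
}
```

Because duplicates are dropped on receipt, the invalidation check itself can stay a plain `false`.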

I won't be getting much more time for it this weekend, but I'll be touching it up as I go. Let me know what you think, hope it helps!

Thank you very much, that helps!

I eventually figured it out, but looking at your example, maybe I should improve on the data structure a little bit. I didn't want to use bincode because that required deserializing and serializing; it's also very Rust-specific. I'm going to stick with rkyv to avoid serializing and deserializing.

We are indeed returning false every time for invalidation. We might improve that at some point, but we are using CRDTs that make old operations no-ops (I believe). I could check if the changes I'm receiving are already in the "tree" and invalidate (or something like that).

One thing I noticed while using the broadcast: even the local node receives the broadcast. I'll have to add a way to identify the "gossiper" in my messages because that's not available within the BroadcastHandler::receive_item function. I assume that's due to a design limitation? I haven't looked too closely. We do not want to handle messages we are ourselves gossiping, usually because we've already applied the change before gossiping it.

commented

Glad to hear it helped! I used bincode simply for convenience. rkyv looks great for this, I will definitely check it out when I get the chance :)

I wouldn't spend much effort on invalidation if returning false suffices. A smarter receive_item that checks the "tree" would be more than enough to drastically reduce the backlog if it becomes a problem.

And yeah, you will receive an item you broadcast back. I should definitely make that clear in the docs, thanks! Foca doesn't really know where a broadcast originated from (it can only ever know who sent a message containing that broadcast), so if it's relevant you will indeed need to add it to the payload.
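One way to carry the originator in the payload, sketched with the standard library only. The u64 node ID and the names `tag_with_origin`, `split_origin` and `should_apply` are hypothetical; a real identity type would likely be richer than a bare integer.

```rust
/// Width of the hypothetical originator ID at the front of each payload.
const ORIGIN_LEN: usize = 8;

/// Prepends the originating node's ID to the broadcast body.
fn tag_with_origin(origin: u64, body: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(ORIGIN_LEN + body.len());
    out.extend_from_slice(&origin.to_be_bytes());
    out.extend_from_slice(body);
    out
}

/// Splits a payload into (origin, body). Returns None when the payload
/// is too short to contain an origin ID.
fn split_origin(payload: &[u8]) -> Option<(u64, &[u8])> {
    if payload.len() < ORIGIN_LEN {
        return None;
    }
    let (head, body) = payload.split_at(ORIGIN_LEN);
    let mut id = [0u8; ORIGIN_LEN];
    id.copy_from_slice(head);
    Some((u64::from_be_bytes(id), body))
}

/// True when the payload came from another node and should be applied
/// locally. Self-originated or malformed payloads are skipped.
fn should_apply(local_id: u64, payload: &[u8]) -> bool {
    match split_origin(payload) {
        Some((origin, _body)) => origin != local_id,
        None => false,
    }
}
```

A handler could call something like `should_apply` at the top of `receive_item` and skip the local state change when the broadcast is its own.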

commented

closing stale issues that seem resolved. feel free to reopen