PretendoNetwork / nex-go

Barebones PRUDP/NEX server library written in Go


[Bug]: PING ACKs can ACK DATA packets

wolfendale opened this issue

Checked Existing

  • I have checked the repository for duplicate issues.

What happened?

When testing some low-level connections, I found that if the client acknowledges a PING packet using a sequence ID that also has an in-flight DATA packet (really, any reliable packet), we acknowledge the DATA packet and never resend it. This is an incredibly unlikely scenario in most cases; it is most likely to happen on a newly established connection, where the reliable counter and the ping counter will still be similar.

What did you expect to happen?

I expect ACKs for PING packets not to interfere with how we handle resending other reliable packets.

Steps to reproduce?

  1. Start a minimal PRUDP server
  2. Establish a connection
  3. Send a reliable packet to the client with sequence ID X
  4. Prevent the client from ACKing the packet
  5. Send a PING packet with sequence ID X
  6. ACK the ping packet

This stops the resend scheduler for the DATA packet despite it never being ACKed.
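For illustration, here's a minimal self-contained Go model of the collision. It doesn't use the real nex-go types; the names are made up, but the key point matches the bug: pending resends are looked up by sequence ID alone.

package main

import "fmt"

// Simplified stand-ins for illustration; these are not the real nex-go types.
type packetType int

const (
	typeData packetType = iota
	typePing
)

type pendingPacket struct {
	kind       packetType
	sequenceID uint16
}

// pending mirrors the bug: packets awaiting ACKs are keyed on
// sequence ID alone, with the packet type not part of the key.
var pending = map[uint16]pendingPacket{}

func acknowledge(sequenceID uint16) {
	// An ACK for ANY packet type removes whatever is stored under
	// that sequence ID, reliable DATA included.
	delete(pending, sequenceID)
}

func main() {
	// 1. A reliable DATA packet with sequence ID 5 is in flight.
	pending[5] = pendingPacket{kind: typeData, sequenceID: 5}

	// 2. The client ACKs a PING that happens to share sequence ID 5.
	acknowledge(5)

	// 3. The DATA packet is gone and will never be resent.
	fmt.Println(len(pending)) // prints 0
}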

Other relevant information. (OPTIONAL)

No response

Reopening as the PR is causing issues with unreliable PINGs

My two ideas for fixing this:

  1. We can do what NintendoClients does and send PING packets with the RELIABLE flag set. This means they actually take up a reliable sequence ID, so we know it won't clash with another packet. I don't have access to any dumps showing what real servers' pings look like, so I can't tell whether that's the behaviour we've seen from actual Nintendo servers or not.
  2. Instead of checking for the RELIABLE flag on an ACK packet, we could check whether it's a PING packet instead, and then just make sure not to acknowledge the packet with that sequence ID. If dumps from the real servers show that they're not sending reliable pings, I think this must be what they do; I can't think of anything else. But it does feel hacky.

Neither of these options would deal with the fact that unreliable DATA packets would mess this up too. But I don't think unreliable DATA packets are actually used in NEX? Not sure if we've seen them in any other RDV stuff.

Of the two, I think I prefer (2), unless we have examples of reliable pings being sent by servers.

Actually, thinking about it, how would this even happen?

As per the docs, there are at minimum three sequence ID counters:

  1. Reliable packets (more than 1 if multiple substreams are used)
  2. Unreliable DATA packets
  3. Unreliable PING packets

The PING packets we send are unreliable, so they shouldn't touch either of the first two counters. If they were sent as reliable then they'd have a proper sequence ID set and would be safe to acknowledge in the SlidingWindow. I don't see how the issue would happen in the first place, since these are totally different counters?
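To make that concrete, a rough sketch of the separate outgoing counters (field and method names are invented for illustration; this is not the actual nex-go code):

// Hypothetical layout of the outgoing sequence ID counters described
// above; names are invented for illustration.
type outgoingCounters struct {
	reliable       []uint16 // one counter per substream
	unreliableData uint16
	unreliablePing uint16
}

func (c *outgoingCounters) nextReliableID(substream int) uint16 {
	c.reliable[substream]++
	return c.reliable[substream]
}

// PINGs draw from their own counter, so they never consume a
// reliable sequence ID.
func (c *outgoingCounters) nextPingID() uint16 {
	c.unreliablePing++
	return c.unreliablePing
}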

That being said, option 2 is what the original client seems to be doing by default? Some details based on a rough decomp:

Inside nn::nex::PRUDPEndPoint::ServiceIncomingPacket the client pulls out the incoming packet type and checks it at various points. Eventually there's a branch checking if (packet_type < 3) with two blocks. This essentially splits the logic into "SYN, CONNECT, or DATA packet" and "DISCONNECT, PING, USER, etc. packet" respectively.

Inside the first block the packet's ACK flag is checked, and if set then nn::nex::PRUDPEndPoint::PacketAcknowledged is called and then it jumps towards the end (where it starts to handle the payload and then dispatch the packet).

Inside the second block it first checks if (packet_type != 3) (any packet besides DISCONNECT)

If it is a DISCONNECT packet, then it first checks the packet's ACK flag. If set, then nn::nex::PRUDPEndPoint::PacketAcknowledged is called and then it jumps towards the end (same label used in the first block). If not set, then nn::nex::PRUDPEndPoint::SendAggregateACK is called and the PRUDPEndPoint is released from the PRUDPStream (this is the handling for server->client DISCONNECT packets, killing the connection).

If not a DISCONNECT packet then it immediately checks if (packet_type != 4) (any packet besides PING)

If the packet is a PING packet then it checks if the ACK flag is not set or if some field on the PRUDPEndPoint is a null pointer (it seems to be a pointer to a PacketOut?)

If either of those checks is true, then it sets some other field on the PRUDPEndPoint to the packet's sequence ID. Otherwise it compares the packet's sequence ID to the sequence ID of the packet stored on the PRUDPEndPoint, and if they aren't equal then nn::nex::PRUDPEndPoint::AdjustRTT is called. In both cases it eventually jumps towards the end (same label used in the first block).

If it's any packet besides a PING then it just immediately jumps towards the end (same label used in the first block) without any processing

I believe the issue is that we only have a single sequence ID counter being used for everything: the sliding window counter. Adding separate counters for unreliable PING and DATA packets would fix this issue


I'm on mobile right now so it's hard to get a permalink. But connections have counters for both unreliable DATA and PING packets

[Screenshot: the connection's counters for unreliable DATA and PING packets]


That is for outgoing packets, though. On incoming ACK packets we are throwing them all into the sliding window, which affects PINGs because they are unreliable and thus use a separate counter:

nex-go/prudp_endpoint.go

Lines 156 to 157 in f7b8a54

slidingWindow := connection.SlidingWindow(packet.SubstreamID())
slidingWindow.ResendScheduler.AcknowledgePacket(packet.SequenceID())


Ah, that's right! I missed that. Yes, there's the bug right there.

@jonbarrow looking at your decomp steps, I think we'd have to do quite a lot of refactoring of how we handle incoming packets to match this.

For now are we happy to just exclude PING packets from acknowledging packets in the resend scheduler?

we'd have to do quite a lot of refactoring

I don't see how it's quite a lot? The bug was pointed out by @DaniElectra and is fairly trivial to fix; we just need to add two more counters for unreliable PING and DATA packets and update the correct counter.

I don't see how it's quite a lot?

This was about whether you wanted to make this work in the same way as the decomp you mentioned above, as that doesn't match how we process packets right now. If you don't want to do that then it's fine.

The bug was pointed out by @DaniElectra and is fairly trivial to fix; we just need to add two more counters for unreliable PING and DATA packets and update the correct counter.

Not sure what you mean by this. We already have these counters for outgoing packets, and the only counter we have for incoming packets is purely for reliable packets. This covers any reliable packet and is handled by the PacketDispatchQueue. That counter is needed because we want to order those packets correctly when we piece together a multi-fragment payload. This is mostly DATA packets, but we need to handle reliable pings here too because of #58. What would we use the other counters for? We don't support reordering of unreliable DATA packets, and PING packets don't have a payload that we need to do anything with.

The bug here is that when we handle ACK packets, we remove packets from the ResendScheduler purely based on their sequenceID and not based on the type of packet it is.

To prevent PING packets from doing this we can do as I suggested and just check the packet type before we remove them from the ResendScheduler (this seems similar to what you mentioned in the decomp, where nn::nex::PRUDPEndPoint::PacketAcknowledged is never called for PING packets).

For unreliable DATA packets, I don't think there's any way to prevent this, as DATA ACK packets look the same for reliable and unreliable packets (since ACKs are always unreliable). But it's likely not an issue, as it would only be a problem if the same substream were used for both reliable and unreliable DATA packets, which I don't think would ever happen.
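As a sketch, the guard could wrap the two lines quoted above. The packet type accessor and constant name here are approximations; the real nex-go identifiers may differ:

// Sketch only: skip the ResendScheduler entirely for PING ACKs, so a
// PING can never acknowledge a reliable packet with the same sequence ID.
if packet.Type() != constants.PingPacket { // accessor/constant names are assumptions
	slidingWindow := connection.SlidingWindow(packet.SubstreamID())
	slidingWindow.ResendScheduler.AcknowledgePacket(packet.SequenceID())
}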

Correct, I don't see any paths where PacketAcknowledged is called for PING packets.

On this note though, PacketDispatchQueue::Queue will add packets to the queue based only on the RELIABLE flag, regardless of type, which tells me that the official clients seem unable to actually handle RELIABLE PING packets from the server correctly? Since PING packets use their own counter, this would throw off the PacketDispatchQueue. NintendoClients seems to be able to handle this simply because it doesn't implement PacketDispatchQueue

On this note though, PacketDispatchQueue::Queue will add packets to the queue based only on the RELIABLE flag, regardless of type

Do you mean in our implementation or on official clients?

NintendoClients seems to be able to handle this simply because it doesn't implement PacketDispatchQueue

There are two parts to the problem:

Handling reordering of reliable packets

We use PacketDispatchQueue for this. Originally we only added packets to it if they were DATA packets with the RELIABLE flag, which is why we had the issue with #58. Now we also add PING packets with the RELIABLE flag, but no other packet types.

NintendoClients uses SlidingWindow, but it's doing exactly the same thing as our PacketDispatchQueue: they add any packets with the RELIABLE flag to the SlidingWindow.

Handling ACKs for reliable packets

We use an instance of ResendScheduler stored on the SlidingWindow keyed on the substreamID, which stores a map of packets to resend with a key of the packet's sequenceID. When we receive an ACK packet, we look up the relevant ResendScheduler and then remove the packet based on sequenceID. This is where the problem is, as different types of packets with the same sequenceID can interfere with each other.
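Roughly, as a sketch of that shape (names invented for illustration; not the exact nex-go code):

// One scheduler per substream; pending packets are keyed on
// sequence ID alone.
type pendingResend struct{ /* packet, timer, attempt count, ... */ }

type resendScheduler struct {
	// The packet type is not part of the key, so a PING ACK and a DATA
	// ACK carrying the same sequence ID collide here.
	packets map[uint16]*pendingResend
}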

In NintendoClients the PRUDPClient instance has an ack_events field which stores packets awaiting acknowledgement in a map with a key of a tuple of:

  1. The packet type
  2. The packet substream (This is required as ack_events is stored on the client itself rather than stored per substream like our ResendScheduler)
  3. The packet sequenceID

This doesn't have the issue we have above because the key is more specific.
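In Go terms, the equivalent of that tuple key might look like this (types invented for illustration; neither library actually defines these):

// Hypothetical Go equivalent of NintendoClients' ack_events key.
type ackKey struct {
	packetType  uint16 // 1. the packet type
	substreamID uint8  // 2. the packet substream
	sequenceID  uint16 // 3. the packet sequence ID
}

type ackEvent struct{ /* packet awaiting acknowledgement, resend timer, ... */ }

// A PING ACK and a DATA ACK with the same sequence ID now map to
// different keys, so they can no longer cancel each other's resends.
var ackEvents = map[ackKey]*ackEvent{}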

Do you mean in our implementation or on official clients?

The official client. One of the very first things done in ServiceIncomingPacket is checking whether a packet is RELIABLE, and if so it's added to the queue regardless of type.

All (valid) packets end up at the same handling logic at the end of the function (there are some sanity checks at the beginning with early returns, but that's it), where GetNextToDispatch is called on the queue. Normally this wouldn't be an issue, since PING packets are always sent as unreliable. But in the case of an incoming RELIABLE PING the queue could be thrown off, since there's only a single "next ID" counter and the queue map is keyed on the sequence ID (which can overlap, since PING packets have their own counter).

We already handle this case by checking that packets are DATA packets when we get them from GetNextToDispatch (link). Is there anything like that happening in the official clients?

Yes, only DATA packets are dispatched by the official client (or if some value is true, though it seems to only ever get set to true when processing DATA packets). However, we aren't implementing GetNextToDispatch and Dispatched 100% correctly, which I missed in the initial PR. GetNextToDispatch is what increases the internal counter, not Dispatched. That's why it seems like the official client doesn't handle RELIABLE PING packets correctly: it will add the PING to the queue regardless of type, store it using its sequence ID (which may either collide or be out of sync with the RELIABLE DATA counter), and it will then get processed by GetNextToDispatch, at which point one of two things will happen:

  1. The PING may be processed rather than the expected DATA
  2. The PING may have a sequence ID smaller than the next expected packet, and thus never be removed from the queue (which over a long enough period of time could eat up resources on the client). Disregard this point: Queue does check that the incoming packet's sequence ID is greater than or equal to the queue counter, and if not it does nothing
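To make the counter distinction concrete, here's a rough Go sketch of those queue semantics as described (illustrative only; not the actual implementation in either codebase):

// Illustrative sketch of the dispatch queue semantics described above.
type packetIn struct {
	sequenceID uint16
	// type flags, payload, etc.
}

type packetDispatchQueue struct {
	queue  map[uint16]*packetIn
	nextID uint16 // the single "next ID" counter
}

// Queue ignores packets behind the counter, but stores anything at or
// ahead of it, regardless of packet type.
func (q *packetDispatchQueue) Queue(p *packetIn) {
	if p.sequenceID >= q.nextID {
		q.queue[p.sequenceID] = p
	}
}

// GetNextToDispatch is what advances the counter...
func (q *packetDispatchQueue) GetNextToDispatch() *packetIn {
	if p, ok := q.queue[q.nextID]; ok {
		q.nextID++
		return p
	}
	return nil
}

// ...while Dispatched only removes the packet from the queue.
func (q *packetDispatchQueue) Dispatched(p *packetIn) {
	delete(q.queue, p.sequenceID)
}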

Here's a rough decomp of ServiceIncomingPacket, isolating only what's done when a PING packet is being handled (heavily modified and trimmed):

void nn::nex::PRUDPEndPoint::ServiceIncomingPacket(nn::nex::PRUDPEndPoint *this, nn::nex::PacketIn *packet) {
	ushort type_flags = packet->type_flags;
	ushort packet_flags = type_flags >> 4;
	ushort packet_type = type_flags & 0xF;
	bool bVar5 = false;
	nn::nex::PacketDispatchQueue **dispatch_queue;

	// * Add packet to queue if it's reliable
	if (packet_flags & FLAG_RELIABLE) {
		this->field_0x330 = this->field_0x330 + 1;
		this->field_0x334 = this->field_0x334 + (uint)nn::nex::Packet::GetPayloadSize((nn::nex::Packet *)packet);
		dispatch_queue = (nn::nex::PacketDispatchQueue **)__CPR230____vc__Q2_3std112vector__tm__98_PQ3_2nn3nex19PacketDispatchQueueQ3_2nn3nex53MemAllocator__tm__33_PQ3_2nn3nexJ42JFQ4_3std25_Vector_val__tm__7_Z1ZZ2Z5_Alty9size_type_QJ130JdJ136JJ163J9reference(&this->dispatch_queues, packet->substream_id);
		nn::nex::PacketDispatchQueue::Queue(*dispatch_queue, packet);
	}

	if (packet_type == PING) {
		ushort sequence_id;

		// * this->current_ping_packet is set in nn::nex::PRUDPEndPoint::StartPing
		// * which is called when processing the CONNECT packet
	if (!(packet_flags & FLAG_ACK) || (this->current_ping_packet == (nn::nex::PacketOut *)0x0)) {
			// * If the packet is an incoming PING, store its sequence ID?
			// * Not used anywhere else in this function
			this->field_0x2f4 = this->field_0x2f4 + 1;
			__ct__Q3_2nn3nex27LogicalClockTmpl__tm__4_UssFRCQ3_2nn3nex30LogicalClockTmpl__tm__7_Z1ZZ2Z(sequence_id, &packet->sequence_id);
			*(undefined4 *)&this->field_0x344 = GetValue__Q3_2nn3nex27LogicalClockTmpl__tm__4_UssCFv_Z1Z(sequence_id);
			__dt__Q3_2nn3nex27LogicalClockTmpl__tm__4_UssFv(sequence_id, 2);
		} else {
			ushort current_ping_sequence_id;

			// * If a PING ACK, check whether its sequence ID matches the expected one
			// * and adjust the connection's RTT as needed
			__ct__Q3_2nn3nex27LogicalClockTmpl__tm__4_UssFRCQ3_2nn3nex30LogicalClockTmpl__tm__7_Z1ZZ2Z(current_ping_sequence_id, &this->current_ping_packet->sequence_id);
			__ct__Q3_2nn3nex27LogicalClockTmpl__tm__4_UssFRCQ3_2nn3nex30LogicalClockTmpl__tm__7_Z1ZZ2Z(sequence_id, &packet->sequence_id);
			bool sequence_ids_match = __eq__Q3_2nn3nex27LogicalClockTmpl__tm__4_UssCFRCQ3_2nn3nex30LogicalClockTmpl__tm__7_Z1ZZ2Z_b(current_ping_sequence_id, sequence_id);
			__dt__Q3_2nn3nex27LogicalClockTmpl__tm__4_UssFv(sequence_id, 2);
			__dt__Q3_2nn3nex27LogicalClockTmpl__tm__4_UssFv(current_ping_sequence_id, 2);

			if (sequence_ids_match) {
				nn::nex::PRUDPEndPoint::AdjustRTT(this, packet, this->current_ping_packet);
				goto joined_r0x02e8f1c8;
			}
		}

		goto joined_r0x02e8f6bc;
	}

// * ALL packet types "goto" one of these
joined_r0x02e8f1c8:
	// * This is only ever set to "true" when handling DATA packets. Effectively "always false" for PING packets
	bVar5 = false;
joined_r0x02e8f6bc:
	if (!(packet_flags & FLAG_RELIABLE)) {
		// * Statistics on how many packets and the total number of
		// * bytes that have been sent?
		this->field_0x2d4 = this->field_0x2d4 + 1;
		this->field_0x2e4 = this->field_0x2e4 + (uint)nn::nex::Packet::GetPayloadSize((nn::nex::Packet *)packet);

		if (!(packet_flags & FLAG_0x20) && (packet_flags & FLAG_0x40)) {
			this->field_0x2e0 = this->field_0x2e0 + 1;
			this->field_0x2f0 = this->field_0x2f0 + (uint)nn::nex::Packet::GetPayloadSize((nn::nex::Packet *)packet);
		} else {
			this->field_0x2dc = this->field_0x2dc + 1;
			this->field_0x2ec = this->field_0x2ec + (uint)nn::nex::Packet::GetPayloadSize((nn::nex::Packet *)packet);
		}

		if (bVar5) {
			this->field_0x2d8 = this->field_0x2d8 + 1;
			this->field_0x2e8 = this->field_0x2e8 + (uint)nn::nex::Packet::GetPayloadSize((nn::nex::Packet *)packet);
			nn::nex::PRUDPEndPoint::Dispatch(this, packet); // * The signature also says a nn::nex::Time is passed, but it's never used?
		}
	}

	// * Start processing the incoming packets.
	// * "GetNextToDispatch" is what increases the sequence ID
	// * counter, not "Dispatched"! "Dispatched" only removes
	// * the packet from the queue!
	dispatch_queue = (nn::nex::PacketDispatchQueue **)__CPR230____vc__Q2_3std112vector__tm__98_PQ3_2nn3nex19PacketDispatchQueueQ3_2nn3nex53MemAllocator__tm__33_PQ3_2nn3nexJ42JFQ4_3std25_Vector_val__tm__7_Z1ZZ2Z5_Alty9size_type_QJ130JdJ136JJ163J9reference(&this->dispatch_queues, packet->substream_id);
	nn::nex::PacketIn *next_packet = nn::nex::PacketDispatchQueue::GetNextToDispatch(*dispatch_queue);

	while (next_packet != (nn::nex::PacketIn *)0x0) {
		if ((next_packet->type_flags & 0xf) == DATA) {
			// * Only actually process DATA packets
			nn::nex::PRUDPEndPoint::Dispatch(this, next_packet); // * The signature also says a nn::nex::Time is passed, but it's never used?
		}

		nn::nex::PacketDispatchQueue::Dispatched(*dispatch_queue, next_packet);
		next_packet = nn::nex::PacketDispatchQueue::GetNextToDispatch(*dispatch_queue);
	}

	return;
}