k4mr4n / naga

Automatically exported from code.google.com/p/naga

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

PacketReader and PacketWriter use arrays instead of objects

GoogleCodeExporter opened this issue · comments

PacketWriter use byte[] to return result to its delegate, but sometimes u need 
to parse incoming xml/json data.
I propose to replace the bytearrays to objects, in this case the result of the 
PacketReader is XML object.

for example 

---------------------------------------------------------------------------
public class XMLPacketReader implements PacketReader
{
    XMLObject xml = null;

    private XMLPacketReader()
    {
    }

    public Object nextPacket(ByteBuffer byteBuffer) throws ProtocolViolationException
    {
        // parse xml here
        boolean isDone = XMLParser.parse(byteBuffer, xml);

        if(isDone)
                return xml;

        return null;
    }

}

---------------------------------------------------------------------------

Original issue reported on code.google.com by abuhar...@gmail.com on 16 Jan 2013 at 11:14

Don't use the packet reader in this manner. Instead, use a raw packet reader to 
return a byte[], and assemble this XML in the SocketObserver's 
packetReceived(NIOSocket socket, byte[] packet). The packet readers are 
supposed to be very low level factories that simply group the byte stream into 
packets.

Original comment by christof...@gmail.com on 16 Jan 2013 at 11:21

  • Changed state: Invalid
Well, it's double work: first from the buffer into a byte array, then the array 
of bytes in XML

or I am wrong?

Original comment by abuhar...@gmail.com on 16 Jan 2013 at 11:27

Implementation-wise, you only need to implement the XML-conversion in the 
listener, since the RawPacketReader is default.

If you're worried about the extra copy of bytes, then usually you'll still need 
to take the following steps:

bytebuffer -> byte[] -> string

If it's the XML parser that does bytebuffer -> byte[] or if it's done by the 
RawPacketReader should not be important.

Sure, it's possible for an XML reader to -as an optimization - reuse a byte 
array for the conversion to string, but the same amount of copying will still 
be necessary. So this is 1 memory allocation we potentially could save. 
Compared to the XML parsing, this is a very small cost.

In many cases it is more optimal to hand over heavy parsing to a different 
thread and leave the IO thread to do the next read. This is quite easy if you 
get raw bytes to the socket observer. It can then dispatch these bytes for 
parsing in the same or to a different thread. There is no such flexibility 
available for the packet reader.

Similarly it might be tempting to do decryption / encryption in the packet 
reader. Again this is not recommended for the same reasons.

Keep the readers / writers simple.

Original comment by christof...@gmail.com on 16 Jan 2013 at 11:42

Thank you for your response.

I wrote simple event-driven parsers for xml and json. Its reads byte by byte 
and, when when tag(for xml) and object(for json) reached parser send 
notification to it delegate. It works like SAX or maybe XMLStreamReader, but 
for byte array as source.

I really worried about the extra copy of bytes in this case.

I think, that the best way - it minimize buffer size for PacketReader.

Original comment by abuhar...@gmail.com on 16 Jan 2013 at 11:52

I understand. What you can do is a simple 1 byte raw reader, which extracts 1 
byte from the byte buffer at the time. Since you immediately use this byte, you 
can basically do:

- in the constructor, create a byte array of length 1
- read one byte from the byte buffer
- put that in the byte array
- return the byte array
- feed the single byte in the byte array to the parser in the SocketObserver 
callback.


Original comment by christof...@gmail.com on 16 Jan 2013 at 12:04

I understand, but with byte buffers I can use CharsetDecoders from java.nio 
package. ByteBuffers only available inside PacketReader. That is the problem. 

Without that I must read chars from byte arrays in SocketObserver with my own 
mechanisms or use multiple wrappers and extra copy of bytes to use 
CharsetDecoders in SocketObserver.

Original comment by abuhar...@gmail.com on 16 Jan 2013 at 12:13

I believe that if you feel you'll encounter performance issues from the byte 
copy, then the XML parse would be way too slow in itself to be allowed on the 
IO thread - which again would suggest that the you should simply copy the bytes.

In addition, doing this change would make it less obvious how the reader should 
be used. I would not want people to think that in general packets should parsed 
in the reader.

Since the code is free to use without attribution, why not copy it and use a 
custom implementation that works the way you want it?

Original comment by christof...@gmail.com on 16 Jan 2013 at 12:45

I understand all this and I am not worried about performance issues related to 
copying the byte arrays, it's just not right and not rational.

I can do everything myself, but I respect your opinion and it is important to 
me.
You developed this great library is a very simple and functional, so I could 
not consult with you.

Original comment by abuhar...@gmail.com on 16 Jan 2013 at 12:51

Consider the two possible implementations:

a) SocketObserver receives an Object, which may be a byte[] or something else.

In this case, we have designed for the eventuality that we want to return 
something other than a byte[]. As I've mentioned, this is unlikely to be 
desirable in most cases, due to scalability concerns.

But in order to offer this flexibility, we now ask every practical 
implementation of SocketObserver to first do:

byte[] bytePacket = (byte[]) packet;

It might also send the wrong signal, causing people to deserialize data in the 
Reader, which it's unsuitable for (not only for performance concerns, but in 
order to propagate error handling for invalid data, this reader might need to 
propagate things like xml parse errors to the rest of the system).

On the other hand, if the reader only handles the lowest level of the packet 
reading - as it is intended to -, any error detected will be in the basic 
transfer protocol, and not have anything to do with data validation and actual 
parsing so all practical error handling can take place at the same level.

b) SocketObserver uses generics.

In this case, we don't have to cast to byte[], but lots of the issues in (a) 
still remains. In addition, The code uses a sentinel value in form of an empty 
byte[]. That solution would be problematic in this case.

-

Think of the Readers as simple settings. I could have made some predefined ones 
and have it look like this.

socket.setReadFormat(ASCII_LINE_FORMAT);

By making them objects I allow for them to be easily extended and even have 
some internal state, but that shouldn't be misunderstood to mean that they are 
suitable for any heavy lifting. Again, this is a threading issue, and just not 
in the performance sense.

For instance, let's say a malicious user sends you "<xml><a><a><a>…" what 
happens when it runs out of memory? Hopefully it doesn't scramble the whole 
thread running IO, but it's hard to make guarantees. On the other hand, if you 
did the parsing on a thread, any problem will first take out that thread and 
the rest of the server might survive.

Yet another concern is how and when the reader is accessed. There is no 
guarantee that it is safe to close a connection while the reader is running, or 
manipulate anything else related to NIO. You don't know if it will run many 
times in a single run, or just once every pass etc. This is the old framework 
problem, where you must understand how and when the classes you plug into the 
framework will be used.

For SocketObserver it's simple. Whenever you get a callback you're safe to do 
anything to the sockets or the NIOService. With the Reader, there are no 
guarantees.


I know it might seem pigheaded not to allow Object, but I feel this version is 
much more obvious in how it should work. It is easy for the way it's supposed 
to work (the Reader chopping data into byte[]), and hard the way it's not 
supposed to work (the Reader deserializes the data).

If I would create a library where you plug in a packet deserializers, the 
design would necessarily have been different.

As it is, Naga serves two purposes:
1) As example code on how to handle the usual ByteBuffer/SelectionKey juggling 
one is always forced to do.
2) To be used as a building block for customized asynchronous socket 
communication

Original comment by christof...@gmail.com on 16 Jan 2013 at 4:07

I understand, thank you for such a detailed response.

Original comment by abuhar...@gmail.com on 18 Jan 2013 at 7:36

I would like to thank you for pressing the issue, because it is very worthwhile 
to revisit design decisions.

I realize it would be nice to get hold of the original bytebuffer in some 
cases. Especially when you look at implementing things like SSL.

However, it turns out that in those cases the general design of Naga is a bad 
fit.

For Naga, the following assumptions are made:
1. You don't want to think about how to properly prepare your bytebuffer before 
every read.
2. You don't want to manually control the selection keys on the selector.

If those are invalid, then obviously the library will be less than ideal.

I personally tried hard to create an SSL implementation using java's SSL engine 
- which violates both (1) and (2) - but I was so intensely dissatisfied with it 
that I ended up deciding not to use it.

The more flexible you want to be the more difficult the library is to use.

I think the reasonable solution (as an alternative to increased flexibility) is 
to create a different library that deliberately avoids making these 
assumptions. By exposing some complexity, we might be able to simplify other 
parts of the system.

For instance, if we entirely delegate the bytebuffer transfer to the user, then 
the SocketObserver method could look something like: dataReceived(ByteBuffer) 
instead of packetReceived(byte[]). We can then remove the "Reader" classes 
entirely. This would be trading increased complexity in reading (you need to 
handle converting the stream to bytes yourself, know how to flip the byte 
buffer correctly etc) for simplified configuration (no need to figure out what 
reader to use)

Original comment by christof...@gmail.com on 18 Jan 2013 at 10:37

I am happy, if a little could be helpful to you.

>> For Naga, the following assumptions are made:
>> 1. You don't want to think about how to properly prepare your bytebuffer 
before every read.
>> 2. You don't want to manually control the selection keys on the selector.

This simplicity is great.
Library, like the simplicity of naga must be in java.nio.

>> The more flexible you want to be the more difficult the library is to use.

I agree.
I do not like java for the fact that many things are very low-level and contain 
a lot of required settings. It is difficult and some errors occur later.

On the other hand, a reasonable compromise can always be found.

>> For instance, if we entirely delegate the bytebuffer transfer to the user, 
then the SocketObserver method could look something like: dataReceived 
(ByteBuffer) instead of packetReceived (byte []). We can then remove the 
"Reader" classes entirely. This would be trading increased complexity in 
reading (you need to handle converting the stream to bytes yourself, know how 
to flip the byte buffer correctly etc) for simplified configuration (no need to 
figure out what reader to use)

I often use the naga, and in all cases I had to use the received Packet 
byteArray as ByteBuffer to read byte by byte and parse xml and json. 

Original comment by abuhar...@gmail.com on 18 Jan 2013 at 1:08