[Java] Map serialization bug
Munoon opened this issue
Search before asking
- I had searched in the issues and found no similar issues.
Version
0.4.1
Component(s)
Java
Minimal reproduce step
import io.fury.Fury;
import io.fury.memory.MemoryBuffer;
import io.fury.serializer.Serializer;
import io.fury.serializer.collection.MapSerializers;
import java.util.HashMap;

public class Test {
    public static void main(String[] args) {
        Fury fury = Fury.builder().requireClassRegistration(false).build();
        fury.registerSerializer(Storage.class, StorageSerializer.class);

        HashMap<Key, String> map = new HashMap<>();
        map.put(new Key(1, 2), "abc");
        Storage storage = new Storage(map);

        byte[] data = fury.serializeJavaObject(storage);
        Storage deserializedStorage = fury.deserializeJavaObject(data, Storage.class);
        System.out.println(deserializedStorage.map().get(new Key(1, 0)));
    }

    public static class StorageSerializer extends Serializer<Storage> {
        public StorageSerializer(Fury fury) {
            super(fury, Storage.class);
        }

        @Override
        public void write(MemoryBuffer buffer, Storage value) {
            MapSerializers.HashMapSerializer mapSerializer = new MapSerializers.HashMapSerializer(fury);
            mapSerializer.setKeySerializer(new KeySerializer(fury));
            mapSerializer.write(buffer, value.map());
        }

        @Override
        public Storage read(MemoryBuffer buffer) {
            MapSerializers.HashMapSerializer mapSerializer;
            mapSerializer = new MapSerializers.HashMapSerializer(fury);
            mapSerializer.setKeySerializer(new KeySerializer(fury));
            HashMap<Key, String> map = mapSerializer.read(buffer);
            return new Storage(map);
        }
    }

    public static class KeySerializer extends Serializer<Key> {
        public KeySerializer(Fury fury) {
            super(fury, Key.class);
        }

        @Override
        public void write(MemoryBuffer buffer, Key value) {
            buffer.writeInt(value.a());
        }

        @Override
        public Key read(MemoryBuffer buffer) {
            int a = buffer.readInt();
            return new Key(a, 0);
        }
    }

    public record Storage(
        HashMap<Key, String> map
    ) {
    }

    public record Key(
        int a,
        int b
    ) {
    }
}
What did you expect to see?
Better performance. Also, I don't want to initialize MapSerializers.HashMapSerializer on each read/write.
What did you see instead?
If I reuse a MapSerializers.HashMapSerializer that was created, for example, in the serializer constructor, I get an exception.
Anything Else?
No response
Are you willing to submit a PR?
- I'm willing to submit a PR!
When I execute this code, Fury takes between 1673 ms and 3432 ms to initialize, which seems very slow to me.
Why not make MapSerializers.HashMapSerializer mapSerializer an instance variable of your serializer?
I have an idea: you can make the map attribute use a HashMap subclass and customize the serialization for that class. You can take org.apache.fury.serializer.collection.MapSerializers.StringKeyMapSerializer as an example.
Making MapSerializers.HashMapSerializer mapSerializer an instance variable of the serializer, as in the following refactoring:
public static class StorageSerializer extends Serializer<Storage> {
    private final MapSerializers.HashMapSerializer mapSerializer;

    public StorageSerializer(Fury fury) {
        super(fury, Storage.class);
        this.mapSerializer = new MapSerializers.HashMapSerializer(fury);
        this.mapSerializer.setKeySerializer(new KeySerializer(fury));
    }

    @Override
    public void write(MemoryBuffer buffer, Storage value) {
        mapSerializer.write(buffer, value.map());
    }

    @Override
    public Storage read(MemoryBuffer buffer) {
        HashMap<Key, String> map = mapSerializer.read(buffer);
        return new Storage(map);
    }
}
Leads to the following exception:
Exception in thread "main" java.lang.IndexOutOfBoundsException: readerIndex(13) + length(2) exceeds size(14): MemoryBuffer{size=14, readerIndex=13, writerIndex=0, heapMemory=len(14), heapData=[-1, 1, -1, 1, 0, 0, 0, -1, 44, 0, 3, 97, 98, 99], heapOffset=0, offHeapBuffer=null, address=16, addressLimit=30}
at io.fury.memory.MemoryBuffer.readShort(MemoryBuffer.java:2031)
at io.fury.resolver.EnumStringResolver.trySkipEnumStringBytes(EnumStringResolver.java:140)
at io.fury.resolver.EnumStringResolver.readEnumStringBytes(EnumStringResolver.java:110)
at io.fury.resolver.ClassResolver.readClassInfoFromBytes(ClassResolver.java:1628)
at io.fury.resolver.ClassResolver.readClassInfoFromBytes(ClassResolver.java:1606)
at io.fury.resolver.ClassResolver.readClassInfo(ClassResolver.java:1585)
at io.fury.Fury.readRef(Fury.java:814)
at io.fury.serializer.collection.AbstractMapSerializer.generalJavaRead(AbstractMapSerializer.java:569)
at io.fury.serializer.collection.AbstractMapSerializer.genericJavaRead(AbstractMapSerializer.java:444)
at io.fury.serializer.collection.AbstractMapSerializer.readElements(AbstractMapSerializer.java:436)
at io.fury.serializer.collection.MapSerializer.read(MapSerializer.java:47)
at Test$StorageSerializer.read(Test.java:38)
at Test$StorageSerializer.read(Test.java:22)
at io.fury.Fury.readDataInternal(Fury.java:899)
at io.fury.Fury.deserializeJavaObject(Fury.java:1060)
at io.fury.Fury.deserializeJavaObject(Fury.java:1042)
at Test.main(Test.java:18)
You need to invoke this.mapSerializer.setKeySerializer(keySerializer); in the write/read methods. The key serializer is reset to null every time to avoid a nested map serialization exception.
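The reset behaviour described above can be illustrated with a small, library-free model. This is only a sketch: OneShotMapEncoder, Wrapper, and the string format are hypothetical stand-ins and are not Fury classes or Fury's wire format. It assumes the hook is cleared after each call, which is consistent with the stack trace above, where the read path falls back to reading class info instead of using the custom key serializer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ResetAfterUseSketch {
    /** Hypothetical stand-in for a map serializer whose key hook is one-shot. */
    static class OneShotMapEncoder {
        private Function<Object, String> keyEncoder; // null means "use the default"

        void setKeyEncoder(Function<Object, String> keyEncoder) {
            this.keyEncoder = keyEncoder;
        }

        String encode(Map<?, ?> map) {
            Function<Object, String> enc =
                keyEncoder != null ? keyEncoder : k -> "default:" + k;
            keyEncoder = null; // assumption: the hook is cleared after every call
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<?, ?> e : map.entrySet()) {
                sb.append(enc.apply(e.getKey())).append('=').append(e.getValue()).append(';');
            }
            return sb.toString();
        }
    }

    /** Keeps the encoder and the hook as fields and re-applies the hook per call. */
    static class Wrapper {
        private final OneShotMapEncoder encoder = new OneShotMapEncoder();
        private final Function<Object, String> keyEncoder = k -> "custom:" + k;

        String encode(Map<?, ?> map) {
            encoder.setKeyEncoder(keyEncoder); // a field write, no new objects
            return encoder.encode(map);
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "abc");

        // Hook set once, as in the constructor-only refactoring.
        OneShotMapEncoder setOnce = new OneShotMapEncoder();
        setOnce.setKeyEncoder(k -> "custom:" + k);
        System.out.println(setOnce.encode(map)); // custom:1=abc;
        System.out.println(setOnce.encode(map)); // default:1=abc;

        // Hook re-applied before every call.
        Wrapper wrapper = new Wrapper();
        System.out.println(wrapper.encode(map)); // custom:1=abc;
        System.out.println(wrapper.encode(map)); // custom:1=abc;
    }
}
```

Applied to the issue's code, the same pattern would mean keeping both the HashMapSerializer and the KeySerializer as fields of StorageSerializer and calling setKeySerializer at the start of write and read, so nothing is allocated per call.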
Thank you, it works for me and solves my problem. But maybe it is worth adding an option to disable this logic?
Also, is such a long initialization normal for Fury? It looks like a bug to me.
Could you provide a profiler flame graph for your init? @LiangliangSui is optimizing this in #1482.
Sure. I've used the original code from the issue description.
Test__1__2024_04_08_192640.jfr.zip
I've also profiled code that creates Fury 100 times (it includes the StorageSerializer registration but doesn't serialize or deserialize anything). The execution takes 5 s, so the issue seems to be in some static code.
Test__1__2024_04_08_192846.jfr.zip
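One way to check the "static code" hypothesis without a profiler is to time each construction separately: if the first call dominates and the rest are cheap, the cost is one-time initialization rather than per-instance work. The sketch below is a generic harness. The class InitTimingSketch, the helper timeEach, and SlowStatics (a stand-in with a deliberately slow static initializer) are hypothetical names for illustration and not part of Fury.

```java
import java.util.function.Supplier;

public class InitTimingSketch {
    /** Returns the wall-clock time in nanoseconds of each of n calls to factory.get(). */
    static long[] timeEach(Supplier<?> factory, int n) {
        long[] nanos = new long[n];
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            factory.get();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Hypothetical stand-in: the static initializer runs once, on first use. */
    static class SlowStatics {
        static {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        long[] t = timeEach(SlowStatics::new, 100);
        long rest = 0;
        for (int i = 1; i < t.length; i++) {
            rest += t[i];
        }
        System.out.printf("first: %d ms, remaining %d calls: %d ms in total%n",
            t[0] / 1_000_000, t.length - 1, rest / 1_000_000);
    }
}
```

For the case in this thread, the supplier would be the builder call from the reproduction, for example `() -> Fury.builder().requireClassRegistration(false).build()`, with Fury on the classpath.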
Looks like a logger initialization issue (I have some log4j configuration files in the classpath). However, disabling logging still didn't help, as the logger is still initialized in ShimDispatcher. I guess it should be refactored to use Fury's LoggerFactory. WDYT?
Yes, all loggers should use Fury's LoggerFactory. Would you like to submit a PR to fix this?
Looks like this was fixed in #1485.