apache / incubator-fury

A blazingly fast multi-language serialization framework powered by JIT and zero-copy.

Home Page: https://fury.apache.org/

[Java] Map serialization bug

Munoon opened this issue

Search before asking

  • I had searched in the issues and found no similar issues.

Version

0.4.1

Component(s)

Java

Minimal reproduce step

import io.fury.Fury;
import io.fury.memory.MemoryBuffer;
import io.fury.serializer.Serializer;
import io.fury.serializer.collection.MapSerializers;

import java.util.HashMap;

public class Test {
    public static void main(String[] args) {
        Fury fury = Fury.builder().requireClassRegistration(false).build();
        fury.registerSerializer(Storage.class, StorageSerializer.class);

        HashMap<Key, String> map = new HashMap<>();
        map.put(new Key(1, 2), "abc");
        Storage storage = new Storage(map);
        byte[] data = fury.serializeJavaObject(storage);

        Storage deserializedStorage = fury.deserializeJavaObject(data, Storage.class);
        System.out.println(deserializedStorage.map().get(new Key(1, 0)));
    }

    public static class StorageSerializer extends Serializer<Storage> {
        public StorageSerializer(Fury fury) {
            super(fury, Storage.class);
        }

        @Override
        public void write(MemoryBuffer buffer, Storage value) {
            MapSerializers.HashMapSerializer mapSerializer = new MapSerializers.HashMapSerializer(fury);
            mapSerializer.setKeySerializer(new KeySerializer(fury));

            mapSerializer.write(buffer, value.map());
        }

        @Override
        public Storage read(MemoryBuffer buffer) {
            MapSerializers.HashMapSerializer mapSerializer;
            mapSerializer = new MapSerializers.HashMapSerializer(fury);
            mapSerializer.setKeySerializer(new KeySerializer(fury));

            HashMap<Key, String> map = mapSerializer.read(buffer);
            return new Storage(map);
        }
    }

    public static class KeySerializer extends Serializer<Key> {

        public KeySerializer(Fury fury) {
            super(fury, Key.class);
        }

        @Override
        public void write(MemoryBuffer buffer, Key value) {
            buffer.writeInt(value.a());
        }

        @Override
        public Key read(MemoryBuffer buffer) {
            int a = buffer.readInt();
            return new Key(a, 0);
        }
    }

    public record Storage(
            HashMap<Key, String> map
    ) {
    }

    public record Key(
            int a,
            int b
    ) {
    }
}

What did you expect to see?

Better performance. Also, I don't want to initialize MapSerializers.HashMapSerializer on each read/write.

What did you see instead?

If I reuse a MapSerializers.HashMapSerializer instance, for example one created in the serializer constructor, I get an exception.

Anything Else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!

When I execute this code, Fury takes from 1673 ms to 3432 ms to initialize, which seems super slow to me.

Why not make MapSerializers.HashMapSerializer mapSerializer an instance variable of your serializer?

I have an idea: you can make the map attribute use a HashMap subclass and customize the serialization for that class.
You can take org.apache.fury.serializer.collection.MapSerializers.StringKeyMapSerializer as an example.

The following refactoring:

public static class StorageSerializer extends Serializer<Storage> {
    private final MapSerializers.HashMapSerializer mapSerializer;

    public StorageSerializer(Fury fury) {
        super(fury, Storage.class);
        this.mapSerializer = new MapSerializers.HashMapSerializer(fury);
        this.mapSerializer.setKeySerializer(new KeySerializer(fury));
    }

    @Override
    public void write(MemoryBuffer buffer, Storage value) {
        mapSerializer.write(buffer, value.map());
    }

    @Override
    public Storage read(MemoryBuffer buffer) {
        HashMap<Key, String> map = mapSerializer.read(buffer);
        return new Storage(map);
    }
}

Leads to the following exception:

Exception in thread "main" java.lang.IndexOutOfBoundsException: readerIndex(13) + length(2) exceeds size(14): MemoryBuffer{size=14, readerIndex=13, writerIndex=0, heapMemory=len(14), heapData=[-1, 1, -1, 1, 0, 0, 0, -1, 44, 0, 3, 97, 98, 99], heapOffset=0, offHeapBuffer=null, address=16, addressLimit=30}
	at io.fury.memory.MemoryBuffer.readShort(MemoryBuffer.java:2031)
	at io.fury.resolver.EnumStringResolver.trySkipEnumStringBytes(EnumStringResolver.java:140)
	at io.fury.resolver.EnumStringResolver.readEnumStringBytes(EnumStringResolver.java:110)
	at io.fury.resolver.ClassResolver.readClassInfoFromBytes(ClassResolver.java:1628)
	at io.fury.resolver.ClassResolver.readClassInfoFromBytes(ClassResolver.java:1606)
	at io.fury.resolver.ClassResolver.readClassInfo(ClassResolver.java:1585)
	at io.fury.Fury.readRef(Fury.java:814)
	at io.fury.serializer.collection.AbstractMapSerializer.generalJavaRead(AbstractMapSerializer.java:569)
	at io.fury.serializer.collection.AbstractMapSerializer.genericJavaRead(AbstractMapSerializer.java:444)
	at io.fury.serializer.collection.AbstractMapSerializer.readElements(AbstractMapSerializer.java:436)
	at io.fury.serializer.collection.MapSerializer.read(MapSerializer.java:47)
	at Test$StorageSerializer.read(Test.java:38)
	at Test$StorageSerializer.read(Test.java:22)
	at io.fury.Fury.readDataInternal(Fury.java:899)
	at io.fury.Fury.deserializeJavaObject(Fury.java:1060)
	at io.fury.Fury.deserializeJavaObject(Fury.java:1042)
	at Test.main(Test.java:18)

You need to invoke this.mapSerializer.setKeySerializer(keySerializer); in the write/read methods. The key serializer is reset to null after every call to avoid exceptions when serializing nested maps.
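
The reset described above can be modelled without Fury on the classpath. The sketch below is only a toy: `ToyMapCodec` and `RawIntKeyCodec` are hypothetical stand-ins, not Fury classes, and it assumes nothing about Fury's internals beyond what the comment above states. It shows why configuring the key serializer once is not enough, and why setting it again before each call works.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Toy model only: these classes are hypothetical and are NOT Fury's implementation.
final class OneShotKeyCodecDemo {

    /** Hypothetical stand-in for a custom key serializer: writes the key as a bare int. */
    static final class RawIntKeyCodec {
        void write(ByteBuffer buf, int key) { buf.putInt(key); }
        int read(ByteBuffer buf) { return buf.getInt(); }
    }

    /** Hypothetical stand-in for a map serializer whose key codec is cleared after each call. */
    static final class ToyMapCodec {
        private static final byte GENERIC_TAG = 0x7F; // marks a key written by the generic path
        private RawIntKeyCodec keyCodec;               // null means "use the generic path"

        void setKeyCodec(RawIntKeyCodec codec) { this.keyCodec = codec; }

        void write(ByteBuffer buf, Map<Integer, Integer> map) {
            RawIntKeyCodec codec = keyCodec;
            keyCodec = null; // one-shot: reset after every call
            buf.putInt(map.size());
            for (Map.Entry<Integer, Integer> e : map.entrySet()) {
                if (codec != null) {
                    codec.write(buf, e.getKey());
                } else {
                    buf.put(GENERIC_TAG).putInt(e.getKey());
                }
                buf.putInt(e.getValue());
            }
        }

        Map<Integer, Integer> read(ByteBuffer buf) {
            RawIntKeyCodec codec = keyCodec;
            keyCodec = null; // one-shot: reset after every call
            int size = buf.getInt();
            Map<Integer, Integer> map = new HashMap<>();
            for (int i = 0; i < size; i++) {
                int key;
                if (codec != null) {
                    key = codec.read(buf);
                } else {
                    // The generic path expects a tag that the custom codec never wrote.
                    if (buf.get() != GENERIC_TAG) {
                        throw new IllegalStateException("generic key tag expected");
                    }
                    key = buf.getInt();
                }
                map.put(key, buf.getInt());
            }
            return map;
        }
    }

    /** Round-trips a map; the key codec is set again before read only if requested. */
    static Map<Integer, Integer> roundTrip(Map<Integer, Integer> in, boolean setBeforeEachCall) {
        ToyMapCodec mapCodec = new ToyMapCodec();
        RawIntKeyCodec keyCodec = new RawIntKeyCodec();
        mapCodec.setKeyCodec(keyCodec);   // "constructor-time" configuration
        ByteBuffer buf = ByteBuffer.allocate(64);
        mapCodec.write(buf, in);          // consumes the key codec
        buf.flip();
        if (setBeforeEachCall) {
            mapCodec.setKeyCodec(keyCodec);
        }
        return mapCodec.read(buf);
    }

    public static void main(String[] args) {
        Map<Integer, Integer> in = new HashMap<>();
        in.put(1, 42);
        System.out.println(roundTrip(in, true));
        try {
            roundTrip(in, false);
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

Applied to the StorageSerializer in this issue, the same idea would mean keeping the KeySerializer in a field and calling mapSerializer.setKeySerializer(keySerializer) at the start of both write and read.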

Thank you, it works for me and solves my problem. But maybe it is worth adding an option to disable this logic?
Also, is such a long initialization normal for Fury? It looks like a bug to me.

Could you provide a profiler flame graph for your init? @LiangliangSui is optimizing this in #1482.

Sure. I've used the original code from the issue description.
Test__1__2024_04_08_192640.jfr.zip

I've also profiled code that creates Fury 100 times (it includes the StorageSerializer registration, but doesn't serialize or deserialize anything). The execution takes 5 s, so the issue seems to be in some static code.
Test__1__2024_04_08_192846.jfr.zip

Looks like a logger initialization issue (I have some log4j configuration files on the classpath). However, disabling logging still didn't help, as the logger is still initialized in ShimDispatcher. I guess it should be refactored to use Fury's LoggerFactory. WDYT?

Yes, all loggers should use Fury's LoggerFactory. Would you like to submit a PR to fix this?

SLF4J seems to take too long to initialize.

Actually, I'm wondering whether we should remove SLF4J.

Looks like this was fixed in #1485.