mzaks / FlatBuffersSwift

This project brings FlatBuffers (an efficient cross platform serialization library) to Swift.


Introduce Padding and CPU friendliness

mzaks opened this issue · comments

This could be the last bit that brings us to C speed.
Right now structs are memory aligned, but we are not CPU-cache friendly, because we don't insert padding to keep every entry cache friendly. I would only introduce it as an option, and only if it actually makes decoding faster, because it will make the binaries larger.
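For reference, the amount of padding needed to bring a write position up to a given alignment is the usual power-of-two rounding. A minimal sketch, not the builder's actual code; the function name is made up for illustration:

// Rounds `offset` up to the next multiple of `alignment` (alignment must be a power of two)
// and returns the number of padding bytes the builder would have to insert.
func paddingNeeded(offset: Int, alignment: Int) -> Int {
    let aligned = (offset + alignment - 1) & ~(alignment - 1)
    return aligned - offset
}

// Example: an 8-byte-aligned value written at offset 12 needs 4 padding bytes:
// paddingNeeded(offset: 12, alignment: 8) == 4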

Isn't #4 still an issue, btw, that should be looked at first? The encoded message size of the FlatBuffers benchmarks differs from the original, so some layout is presumably different?

#4 is solved, as Swift aligns structs correctly; by memory-mapping the structs directly we solved it.
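To illustrate what direct memory mapping buys us (a rough sketch with a made-up struct, not the generated benchmark types): Swift pads a fixed-size struct to the alignment of its widest member, so it can be loaded straight out of the buffer at an aligned offset.

// Hypothetical fixed-size struct standing in for a FlatBuffers `struct` value.
struct BarData {
    let time: Int32    // 4 bytes
    let ratio: Float32 // 4 bytes
    let size: UInt16   // 2 bytes; stride rounds up to 12
}
// MemoryLayout<BarData>.alignment == 4, MemoryLayout<BarData>.stride == 12

// Loading it directly from the buffer at a known byte offset; the offset itself must be
// properly aligned for BarData, which is why the alignment issue in #4 mattered.
func readBar(from buffer: UnsafeRawPointer, at offset: Int) -> BarData {
    return buffer.load(fromByteOffset: offset, as: BarData.self)
}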

This CPU-friendly padding is not important for compatibility, as we jump to a position and then read x bytes from that position. Padding for CPU friendliness means that values should sit on the same "page" (cache line). For that we have to add padding when inserting values, which results in a different vTable. So reading is compatible whether we pad or not, because the vTable is set up accordingly.
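Roughly how that works (a conceptual sketch of a vTable lookup, not the library's actual reader code): the reader never assumes a fixed field layout, it asks the vTable where a field ended up, so any padding inserted while writing only changes the stored offsets.

// Conceptual sketch: resolve a field's absolute position through the vTable.
// tablePos points at the table; the Int32 stored there is the offset back to its vTable.
func fieldPosition(buffer: UnsafeRawPointer, tablePos: Int, fieldIndex: Int) -> Int? {
    let vTableOffset = Int(buffer.load(fromByteOffset: tablePos, as: Int32.self))
    let vTablePos = tablePos - vTableOffset
    // The first two UInt16 slots hold the vTable and table sizes; field slots follow.
    // (A real reader would also check fieldIndex against the vTable length.)
    let slotPos = vTablePos + 4 + fieldIndex * 2
    let fieldOffset = Int(buffer.load(fromByteOffset: slotPos, as: UInt16.self))
    return fieldOffset == 0 ? nil : tablePos + fieldOffset // 0 means field absent / default
}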

I think I may have found another issue - trying to investigate now, stay tuned.

Ok, I haven't been able to nail it down, but will share a couple of observations in case you have run into anything similar:

  1. 40ms baseline
private func flatuseStruct(buffer : UnsafePointer<UInt8>, start : Int) -> Int
{
    var sum:Int = start

    var foobarcontainer = FooBarContainer.Fast(buffer)

    sum = sum + Int(foobarcontainer.location!.count)
    sum = sum + Int(foobarcontainer.fruit!.rawValue)
    sum = sum + (foobarcontainer.initialized ? 1 : 0)
    let list = foobarcontainer.list

    for i in 0..<list.count { 
        let foobar = list[i]!
        sum = sum + Int(foobar.name!.count)
        sum = sum + Int(foobar.postfix)
        sum = sum + Int(foobar.rating)

        let bar = foobar.sibling!
        sum = sum + Int(bar.ratio)
        sum = sum + Int(bar.size)
        sum = sum + Int(bar.time) 

        let foo = bar.parent
        sum = sum + Int(foo.count) 
        sum = sum + Int(foo.id) 
        sum = sum + Int(foo.length)
        sum = sum + Int(foo.prefix) 

    }
    return sum
}

  2. 35ms, > 10% improvement on the whole run!
private func flatuseStruct(buffer : UnsafePointer<UInt8>, start : Int) -> Int
{
    var sum:Int = start

    var foobarcontainer = FooBarContainer.Fast(buffer)

    sum = sum + Int(foobarcontainer.location!.count)
    sum = sum + Int(foobarcontainer.fruit!.rawValue)
    sum = sum + (foobarcontainer.initialized ? 1 : 0)
    let list = foobarcontainer.list

    for i in 0..<list.count { 
        let foobar = list[i]!
        sum = sum + Int(foobar.name!.count)
        sum = sum + Int(foobar.postfix)
        sum = sum + Int(foobar.rating)

        let bar = foobar.sibling!
     //   sum = sum + Int(bar.ratio) Comment out this single line! ***********
        sum = sum + Int(bar.size)
        sum = sum + Int(bar.time) 

        let foo = bar.parent
        sum = sum + Int(foo.count) 
        sum = sum + Int(foo.id) 
        sum = sum + Int(foo.length)
        sum = sum + Int(foo.prefix) 

    }
    return sum
}

  3. 37ms, ~10% improvement by using Doubles!
private func flatuseStruct(buffer : UnsafePointer<UInt8>, start : Int) -> Int
{
    var sum:Double = Double(start)
    var foobarcontainer = FooBarContainer.Fast(buffer)

    sum = sum + Double(foobarcontainer.location!.count)
    sum = sum + Double(foobarcontainer.fruit!.rawValue)
    sum = sum + (foobarcontainer.initialized ? 1 : 0)
    let list = foobarcontainer.list
    for i in 0..<list.count {
        var foobar = list[i]!
        sum = sum + Double(foobar.name!.count)
        sum = sum + Double(foobar.postfix)
        sum = sum + Double(foobar.rating)

        let bar = foobar.sibling!

        sum = sum + Double(bar.ratio)
        sum = sum + Double(bar.size)
        sum = sum + Double(bar.time)

        let foo = bar.parent
        sum = sum + Double(foo.count)
        sum = sum + Double(foo.id)
        sum = sum + Double(foo.length)
        sum = sum + Double(foo.prefix)
    }
    return Int(sum)
}

Something seems fishy in the float-to-int conversion at least; comment out both ratio and rating and we drop to 32ms for the whole run.

As another reference, removing the compiler's safety checks gives us 35ms as well.

BinaryBuildConfig now has a fullMemoryAlignment property, which produces a fully memory-aligned binary. I could not measure any decoding performance increase for memory-aligned binaries, but they are 10% to 15% larger than the not fully aligned ones. This is why the option is turned off by default.
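For anyone who wants to try it, usage would look roughly like the sketch below. Only the fullMemoryAlignment property on BinaryBuildConfig is confirmed above; the initializer and builder calls shown here are assumptions for illustration and may not match the actual API.

// Hypothetical usage sketch — everything besides fullMemoryAlignment is an assumption.
let config = BinaryBuildConfig(fullMemoryAlignment: true)
let builder = FlatBufferBuilder(config: config)
// ... build the object graph as usual; the resulting binary is fully memory aligned,
// at the cost of being roughly 10%–15% larger.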