redis / rueidis

A fast Golang Redis client that supports Client Side Caching, Auto Pipelining, Generics OM, RedisJSON, RedisBloom, RediSearch, etc.

Performance testing: what is the fastest way, as a client, to connect to a custom-developed Redis server, e.g. redhub?

ouvaa opened this issue · comments

https://github.com/OpenAtomFoundation/pika

Can you provide the fastest way to push outgoing commands? I've developed my own Redis server and it outperforms the official Redis server by 100k requests per CPU core, but the rueidis client cannot keep up. How can I do this with code like the example below, running multiple client connections?

If possible I would prefer to use a single outgoing connection (as long as it does not seriously affect performance). I am running a codis proxy that will have multiple incoming Redis clients, and the server will have many open file descriptors because I have other services running on it.

Is this possible? If not, how can I force "auto pipelining", etc.?

P.S.: my multicore-enabled customized Redis server is not fully utilized by the rueidis client.

You can try to simulate the problem I'm having; rueidis is unable to fully utilize multiple cores:
https://github.com/IceFireDB/redhub

Please provide the fastest way to push the maximum throughput / RPS from the client. Thanks.

P.P.S.: thanks for the PipelineMultiplex: -1 code, but no matter what value I tried (I've read about the default of 8), it's not working as well as I hoped.

package main

import (
        "context"
        "fmt"
        "runtime"
        "sync"
        "time"

        "github.com/redis/rueidis"
        "golang.org/x/sys/unix"
)

const totalRequests = 1000000

func main() {
        runtime.GOMAXPROCS(runtime.NumCPU()) // Ensure all CPUs are used.


        numCores := uint16(runtime.NumCPU())
        requestsPerCore := totalRequests / int(numCores)

        var wg sync.WaitGroup
        start := time.Now()

        client, err := rueidis.NewClient(rueidis.ClientOption{
                InitAddress: []string{"localhost:6379"},
                DisableCache: true, // Disable the internal command cache
                PipelineMultiplex: -1,
        })
        if err != nil {
                panic(fmt.Sprintf("Failed to create Redis client: %v", err))
        }
        defer client.Close()



        for i := uint16(0); i < numCores; i++ {
                wg.Add(1)
                go func(coreID uint16) {
                        runtime.LockOSThread()
                        defer runtime.UnlockOSThread()
                        _ = SetCPUAffinity(coreID + 1) // best-effort pinning; error intentionally ignored
                        defer wg.Done()
                        ctx := context.Background()
                        for j := 0; j < requestsPerCore; j++ {
                                key := fmt.Sprintf("key%d_%d", coreID, j)
                                value := fmt.Sprintf("value%d_%d", coreID, j)

                                // Set
                                setCmd := client.B().Set().Key(key).Value(value).Build()
                                if err := client.Do(ctx, setCmd).Error(); err != nil {
                                        fmt.Printf("Failed to set key: %v\n", err)
                                        return
                                }

                                // Get
                                _, err := client.Do(ctx, client.B().Get().Key(key).Build()).ToString()
                                if err != nil {
                                        fmt.Printf("Failed to get key: %v\n", err)
                                        return
                                }
                        }
                }(i)
        }

        wg.Wait()
        elapsed := time.Since(start)
        fmt.Printf("Executed %d requests in %s\n", totalRequests, elapsed)
}


func SetCPUAffinity(cpu uint16) error {
        var newMask unix.CPUSet
        newMask.Zero()
        newMask.Set(int(cpu) - 1)
        return unix.SchedSetaffinity(0, &newMask)
}

Hi @ouvaa, please just use as many concurrent goroutines as you can, and don't use runtime.LockOSThread(). You can optionally use DoMulti() instead of Do() to further increase the outgoing throughput.
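For reference, a minimal sketch of the DoMulti() approach (the server address and the batch size of 100 are arbitrary assumptions; it expects a Redis-compatible server on localhost:6379):

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/rueidis"
)

func main() {
	client, err := rueidis.NewClient(rueidis.ClientOption{
		InitAddress: []string{"localhost:6379"},
	})
	if err != nil {
		panic(err)
	}
	defer client.Close()

	// Build a batch of commands and send them with a single DoMulti call,
	// so they are flushed to the socket together instead of paying one
	// round trip per command.
	cmds := make(rueidis.Commands, 0, 100)
	for i := 0; i < 100; i++ {
		cmds = append(cmds, client.B().Set().Key(fmt.Sprintf("key%d", i)).Value("value").Build())
	}
	for _, resp := range client.DoMulti(context.Background(), cmds...) {
		if err := resp.Error(); err != nil {
			fmt.Printf("command failed: %v\n", err)
		}
	}
}
```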

@rueian I was retesting and wondering which version is faster. Can you help run this and help us understand the problem? Both complete in the same time, but the one with the client declared inside the goroutine uses more CPU on a 12-core laptop.

(With my custom server, the server-side CPU usage is 4x higher, but the resulting TPS is the same for both.)

Can you help with this?

I understand using more concurrent goroutines, but my program runs with one goroutine per core; I'm simplifying it here to test rueidis under that model.

slower

package main

import (
        "context"
        "fmt"
        "runtime"
        "sync"
        "time"

        "github.com/redis/rueidis"
        "golang.org/x/sys/unix"
)

const totalRequests = 1000000

func main() {
        runtime.GOMAXPROCS(runtime.NumCPU()) // Ensure all CPUs are used.


        numCores := uint16(runtime.NumCPU())
        requestsPerCore := totalRequests / int(numCores)

        var wg sync.WaitGroup
        start := time.Now()



        for i := uint16(0); i < numCores; i++ {
                wg.Add(1)
                go func(coreID uint16) {
                        runtime.LockOSThread()
                        defer runtime.UnlockOSThread()
                        _ = SetCPUAffinity(coreID + 1) // best-effort pinning; error intentionally ignored
                        defer wg.Done()
                        ctx := context.Background()
                        client, err := rueidis.NewClient(rueidis.ClientOption{
                                InitAddress: []string{"localhost:6379"},
                                DisableCache: true, // Disable the internal command cache
                                PipelineMultiplex: -1,
                        })
                        if err != nil {
                                panic(fmt.Sprintf("Failed to create Redis client: %v", err))
                        }
                        defer client.Close()

                        for j := 0; j < requestsPerCore; j++ {
                                //key := fmt.Sprintf("key%d_%d", coreID, j)
                                //value := fmt.Sprintf("value%d_%d", coreID, j)

                                ///*
                                // Set
                                setCmd := client.B().Set().Key("key").Value("value").Build()
                                if err := client.Do(ctx, setCmd).Error(); err != nil {
                                        fmt.Printf("Failed to set key: %v\n", err)
                                        return
                                }
                                //*/

                                /*
                                // Get
                                _, err := client.Do(ctx, client.B().Get().Key("key").Build()).ToString()
                                if err != nil {
                                        fmt.Printf("Failed to get key: %v\n", err)
                                        return
                                }
                                //*/
                        }

                }(i)
        }

        wg.Wait()
        elapsed := time.Since(start)
        fmt.Printf("Executed %d requests in %s\n", totalRequests, elapsed)
}


func SetCPUAffinity(cpu uint16) error {
        var newMask unix.CPUSet
        newMask.Zero()
        newMask.Set(int(cpu) - 1)
        return unix.SchedSetaffinity(0, &newMask)
}

faster

package main

import (
        "context"
        "fmt"
        "runtime"
        "sync"
        "time"

        "github.com/redis/rueidis"
        "golang.org/x/sys/unix"
)

const totalRequests = 1000000

func main() {
        runtime.GOMAXPROCS(runtime.NumCPU()) // Ensure all CPUs are used.


        numCores := uint16(runtime.NumCPU())
        requestsPerCore := totalRequests / int(numCores)

        var wg sync.WaitGroup
        start := time.Now()


        client, err := rueidis.NewClient(rueidis.ClientOption{
                InitAddress:       []string{"localhost:6379"},
                DisableCache:      true, // Disable the internal command cache
                PipelineMultiplex: -1,
        })
        if err != nil {
                panic(fmt.Sprintf("Failed to create Redis client: %v", err))
        }
        defer client.Close()


        for i := uint16(0); i < numCores; i++ {
                wg.Add(1)
                go func(coreID uint16) {
                        runtime.LockOSThread()
                        defer runtime.UnlockOSThread()
                        _ = SetCPUAffinity(coreID + 1) // best-effort pinning; error intentionally ignored
                        defer wg.Done()
                        ctx := context.Background()
                        for j := 0; j < requestsPerCore; j++ {
                                //key := fmt.Sprintf("key%d_%d", coreID, j)
                                //value := fmt.Sprintf("value%d_%d", coreID, j)

                                ///*
                                // Set
                                setCmd := client.B().Set().Key("key").Value("value").Build()
                                if err := client.Do(ctx, setCmd).Error(); err != nil {
                                        fmt.Printf("Failed to set key: %v\n", err)
                                        return
                                }
                                //*/

                                /*
                                // Get
                                _, err := client.Do(ctx, client.B().Get().Key("key").Build()).ToString()
                                if err != nil {
                                        fmt.Printf("Failed to get key: %v\n", err)
                                        return
                                }
                                //*/
                        }

                }(i)
        }

        wg.Wait()
        elapsed := time.Since(start)
        fmt.Printf("Executed %d requests in %s\n", totalRequests, elapsed)
}


func SetCPUAffinity(cpu uint16) error {
        var newMask unix.CPUSet
        newMask.Zero()
        newMask.Set(int(cpu) - 1)
        return unix.SchedSetaffinity(0, &newMask)
}

Hi @ouvaa, both use cases are valid, but usually there is no need to have multiple rueidis client instances. You can just use one rueidis client instance and fire multiple concurrent Do() and DoMulti() calls at it.

Again, please don't use runtime.LockOSThread(); it will prevent background goroutines from being scheduled and can cause performance degradation.
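A minimal sketch of that pattern (one shared client, many plain goroutines, no LockOSThread; the server address, goroutine count, and per-goroutine request count are arbitrary assumptions):

```go
package main

import (
	"context"
	"fmt"
	"sync"

	"github.com/redis/rueidis"
)

func main() {
	client, err := rueidis.NewClient(rueidis.ClientOption{
		InitAddress: []string{"localhost:6379"},
	})
	if err != nil {
		panic(err)
	}
	defer client.Close()

	var wg sync.WaitGroup
	ctx := context.Background()

	// Many concurrent goroutines sharing one client: rueidis batches their
	// in-flight commands onto the same connections (auto pipelining), so
	// throughput scales with concurrency rather than with client instances.
	for i := 0; i < 128; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				key := fmt.Sprintf("key%d_%d", id, j)
				cmd := client.B().Set().Key(key).Value("value").Build()
				if err := client.Do(ctx, cmd).Error(); err != nil {
					fmt.Printf("set failed: %v\n", err)
					return
				}
			}
		}(i)
	}
	wg.Wait()
}
```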

@rueian thank you for your time answering these questions. I've been trying to push for performance, and if possible please take a closer look at both scripts above: my program needs LockOSThread, so rueidis naturally runs inside such a goroutine in this slimmed-down version of the program. I have a requirement to run LockOSThread this way.

Having said that, please take a look at both versions.

My findings:
a) Client outside the goroutines: faster, because something about this client runs "single-threaded", so it processes faster and uses less CPU while communicating with the single-threaded goroutines.

b) Client inside each goroutine: supposedly this should be much, much faster, but some "locks" or "variable sharing / contention" in the rueidis package makes it use more CPU without being faster.

Can you think about the internal architecture here and maybe consider redesigning it with this issue in mind? That could speed it up significantly and use the cores more appropriately.

I think you have contention on global variables within the rueidis package. Can you pprof it and find out what's causing the contention? Supposedly the slower first version should run faster with every other package except rueidis.

Can you please look into it? Thanks.

@ouvaa, there are no "locks" or "variable sharing / contention" in your second case. If there were, your first case would be even slower, not faster.

What makes you think the second case should be faster? Did you take IO into account?

@rueian have you tried running both? The slower version uses more CPU. IO of what, the network? I was wondering if you could use a non-standard Golang net library, but the only one currently supporting TLS is https://github.com/lesismal/nbio.

So I'm really wondering what the issue is here. You mentioned both are valid, but one of them is faster.

In theory the slower version should be faster, and the faster version should be slower, because the first case is isolated per-goroutine processing, which should have no context switching whatsoever. So I'm really puzzled why rueidis is slower with an isolated client instance per goroutine.

Yes, the cost of doing network IO operations should be taken into account. That includes system call costs.

@rueian thank you