panjf2000 / gnet

🚀 gnet is a high-performance, lightweight, non-blocking, event-driven networking framework written in pure Go.

Home Page: https://gnet.host

[Question]: The redis server service implemented using gnet has too many read tcp i/o timeouts in the production environment.

xingfu89 opened this issue · comments

Actions I've taken before I'm here

  • I've thoroughly read the documentation about this problem but still have no answer.
  • I've searched the GitHub Issues/Discussions but didn't find any similar problems that have been solved.
  • I've searched the internet for this problem but didn't find anything helpful.

Questions with details

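We previously implemented our Redis server with Go's net package, and clients in production never hit I/O timeouts. After switching to gnet, clients see many "read tcp i/o timeout" errors. What could be causing this?
Our server is limited to 8 cores with cgroup, NumEventLoop is set to 8, and every other option is left at its default.
In production, every Redis command is an MGET of 10 keys, and each key's value is about 1 KB long.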

Code snippets (optional)

No response

Please provide some example code.

One more question: should NumEventLoop match the value set with runtime.GOMAXPROCS to get the best performance? We set runtime.GOMAXPROCS to 8 and stress-tested GET with memtier_benchmark using 128 B values.
With NumEventLoop initially set to 64, p99.9 latency reached 130 ms; after changing NumEventLoop to 8, p99.9 latency dropped to 3.5 ms.
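
The numbers above suggest that running far more event loops than GOMAXPROCS hurt tail latency in this benchmark. A minimal sketch of one way to pick the loop count, assuming it should never exceed the GOMAXPROCS setting. The helper name eventLoopCount is hypothetical and not part of gnet; the result would be handed to gnet through its NumEventLoop option.

```go
package main

import (
	"fmt"
	"runtime"
)

// eventLoopCount is a hypothetical helper (not part of gnet). It returns
// requested when it lies in 1..maxProcs and falls back to maxProcs
// otherwise, on the assumption that loops beyond the number of schedulable
// threads only add contention.
func eventLoopCount(requested, maxProcs int) int {
	if maxProcs < 1 {
		maxProcs = 1
	}
	if requested < 1 || requested > maxProcs {
		return maxProcs
	}
	return requested
}

func main() {
	// GOMAXPROCS(0) reports the current setting without changing it.
	procs := runtime.GOMAXPROCS(0)
	n := eventLoopCount(64, procs)
	fmt.Printf("GOMAXPROCS=%d, event loops=%d\n", procs, n)
	// The value would then be passed along the lines of:
	//   gnet.Run(handler, addr, gnet.WithNumEventLoop(n))
}
```

Whether the cap is the right value for a given workload is still something to confirm with a benchmark such as the memtier_benchmark run described above.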

Logging in production shows that the response to a single MGET request can be around 100 KB, and that the call to the gnet.Conn Write method takes several hundred milliseconds. Why is this synchronous write so slow? The comments say that AsyncWrite must not be called from the OnTraffic event handler, so is there a good way to solve this?

func (s *Server) OnTraffic(conn gnet.Conn) (action gnet.Action) {
	connBuf, _ := conn.Next(-1)
	cmds, err := ReadCommands(connBuf)
	if err != nil {
		return gnet.Close
	}

	for _, cmd := range cmds {
		respData := HandleRequest(cmd.Args)
		conn.Write(respData) // this call can take several hundred milliseconds
	}

	return gnet.None
}

First, AsyncWrite() can be called in OnTraffic(); it is just that this method is usually meant for goroutines outside the event loop. Second, gnet.Conn.Write() simply invokes the write(2) system call directly, and all sockets are non-blocking. If there is still unsent data pending, the new data is written straight into a local buffer and the call returns without making any system call at all. A latency of several hundred milliseconds for writing 100 KB therefore definitely indicates a problem. I suggest checking your system's network, or your own code may be at fault: is something else consuming a lot of CPU so that the event loop does not get a chance to run? Run a performance test and look at how CPU usage is distributed. Finally, if you need to write multiple buffers at once, consider using gnet.Conn.Writev() to reduce system calls; see the documentation for details.
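
The Writev suggestion can be sketched as follows: collect every response produced for one batch of commands and send them with a single vectored write instead of one Write per response. This is a minimal sketch under stated assumptions. The vectorWriter interface, writeResponses, and countingWriter are hypothetical names introduced here; the interface mirrors the Writev method on gnet.Conn so that the example runs with the standard library alone.

```go
package main

import "fmt"

// vectorWriter is the subset of the connection API used here. gnet.Conn is
// assumed to satisfy it through its Writev method.
type vectorWriter interface {
	Writev(bs [][]byte) (n int, err error)
}

// writeResponses sends all responses with one Writev call instead of one
// Write call per response.
func writeResponses(w vectorWriter, responses [][]byte) (int, error) {
	if len(responses) == 0 {
		return 0, nil
	}
	return w.Writev(responses)
}

// countingWriter is a stand-in for a connection. It records how many times
// Writev was called and how many bytes it was given.
type countingWriter struct {
	calls int
	bytes int
}

func (c *countingWriter) Writev(bs [][]byte) (int, error) {
	c.calls++
	n := 0
	for _, b := range bs {
		n += len(b)
	}
	c.bytes += n
	return n, nil
}

func main() {
	// Three RESP-encoded replies, as a Redis-style server might produce.
	responses := [][]byte{
		[]byte("+OK\r\n"),
		[]byte("$5\r\nhello\r\n"),
		[]byte(":1\r\n"),
	}
	w := &countingWriter{}
	n, err := writeResponses(w, responses)
	if err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Printf("calls=%d bytes=%d\n", w.calls, n)
}
```

In the OnTraffic handler from the question, this would amount to appending each respData to a slice inside the loop and calling conn.Writev once after the loop, checking the returned error.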