etsy / logster

Parse log files, generate metrics for Graphite and Ganglia

DNS queries for the statsd host repeated for each socket.sendto call

elukey opened this issue

Hi everybody!

The Wikimedia Foundation is a happy user of logster: we use it to push Varnishkafka JSON logs to Graphite and visualize useful metrics. While reviewing our traffic via tcpdump for an unrelated issue, we noticed a ton of traffic towards our DNS resolvers for the statsd host passed to logster. From what I can see in https://github.com/etsy/logster/blob/master/logster/outputs/statsd.py#L23-L36, it seems that if you don't pass a raw IP to --statsd-host, then each udp_sock.sendto call triggers a DNS query to resolve the domain.

We are using version 0.0.1, but the issue still seems to be present in master. Would it be possible to add a command-line option to cache the resolved IP address for X seconds?

Luca

This might not really be possible, since the suggested use is in a crontab and each run is a separate process. I was convinced that there were different ways of tailing a log. Sorry for the spam! :)

Thinking about this issue again made me realize that a small improvement is possible for https://github.com/etsy/logster/blob/master/logster/outputs/statsd.py#L23-L36: calling gethostbyname before the for loop, so that sendto() receives an IP address rather than a domain. That would trigger far fewer DNS queries per run. Does it make sense?
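To make the suggestion concrete, here is a minimal sketch of the idea. It is not logster's actual code: the function name `send_metrics`, its parameters, and the default port 8125 are assumptions for illustration only. The host is resolved once with `socket.gethostbyname`, and every `sendto()` call then gets the resulting IP address.

```python
import socket


def send_metrics(lines, host, port=8125):
    """Send each metric line over UDP, resolving `host` only once.

    Hypothetical helper for illustration; not part of logster.
    """
    # One lookup per run instead of one per sendto() call.
    # gethostbyname returns an IPv4 address string and passes
    # a literal IPv4 address through unchanged.
    addr = (socket.gethostbyname(host), port)

    udp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for line in lines:
            # addr already holds an IP, so no further name resolution is needed.
            udp_sock.sendto(line.encode("ascii"), addr)
    finally:
        udp_sock.close()
    return addr
```

Since logster is meant to be run from a crontab, the lookup would still happen once per run, but no longer once per metric.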

Are these DNS queries expensive in your infrastructure? I would have expected them to be answered by the local DNS cache, and so result in very little impact to actual performance

Hi! We don't have local resolvers on the hosts, only "remote" resolvers behind an LVS load balancer. This is not heavily affecting our infrastructure, but I wanted to reduce (what I think is) unnecessary traffic. In this case, resolving the domain right before the loop doesn't seem like a big problem, and it instantly reduces traffic towards our resolvers.