cjmamo / kafka-web-console

A web console for Apache Kafka (retired)

Kafka Web Console release v2.0.0 is creating a high number of open file handles (against Kafka 0.8.1.1, ZooKeeper 3.3.4)

tonyfalabella opened this issue

I'm running Kafka Web Console release v2.0.0 against Kafka 0.8.1.1 and ZooKeeper 3.3.4.

I'm consistently seeing the number of open file handles increase when I launch Kafka Web Console and navigate to a topic. Once the file handles start to increase, they keep increasing without any further navigation in the browser: I only need to launch the web console and do nothing else besides monitor the number of open files, and I'll see it grow every few seconds.
I've confirmed there are no other producers or consumers connecting to Kafka or ZooKeeper.

After this runs for a while, you'll hit either of these errors:

  • Running a Kafka command such as:
$INSTALLDIR/kafka/bin/kafka-topics.sh --zookeeper localhost:2181 --create --replication-factor 1 --partitions 4 --topic test2

    fails with an error like:

Error: Exception thrown by the agent : java.rmi.server.ExportException: Port already in use: 0; nested exception is:
    java.net.BindException: Address already in use

  • Java clients might get an error like this (due to "Too many open files"):
java.io.FileNotFoundException: /src1/fos/dev-team-tools/var/kafka/broker-0/replication-offset-checkpoint.tmp

The ulimit for the user ID that my Kafka process runs under has a very large value for "open files".

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 610775
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 500000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 610775
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
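
For reference, the limit that matters is the one applied to the running broker process, which can differ from the current shell's ulimit. One way to confirm it on Linux (just a sketch, assuming a /proc filesystem and that the broker's main class is kafka.Kafka):

# Limit actually applied to the running broker (not the current shell):
grep "open files" /proc/$(pgrep -f kafka.Kafka)/limits

# Current number of descriptors the broker holds:
ls /proc/$(pgrep -f kafka.Kafka)/fd | wc -l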

Note: I've also tried the pull request from @ibanner56 (#40), which relates to issues #36 and #37 from @mungeol, but it did not fix the problem.

To reproduce on Linux, do the following.

  1. Launch ZooKeeper
  2. Launch Kafka
  3. Create a topic with 4 partitions and a replication factor of 1:
    $INSTALLDIR/kafka/bin/kafka-topics.sh --zookeeper localhost:2181 --create --replication-factor 1 --partitions 4 --topic test2
  4. Open a PuTTY session and run this script in that window:
while true; do
  date
  echo "zookeeper: $(ls /proc/$(ps -ef | grep zookeeper.server | grep -v grep | awk '{print $2}')/fd | wc -l)"
  echo "Kafka: $(ls /proc/$(ps -ef | grep kafka.Kafka | grep -v grep | awk '{print $2}')/fd | wc -l)"
  echo ""
  sleep 5
done
  5. Launch Kafka Web Console
  6. Browse to a topic
  7. Notice that the "Kafka" file-handle count in the PuTTY session increases
  8. Wait several seconds: the "Kafka" count increases again, without doing anything.

Sample output from the script in step 4 after running for a couple of hours (with 8 topics defined on the ZooKeeper instance, each with replication factor 1 and 4 partitions):
Wed Jan 21 18:44:29 EST 2015
zookeeper: 37
Kafka: 6013

Wed Jan 21 18:44:34 EST 2015
zookeeper: 37
Kafka: 6013

Wed Jan 21 18:44:39 EST 2015
zookeeper: 37
Kafka: 6045

...

Wed Jan 21 18:51:23 EST 2015
zookeeper: 37
Kafka: 6461
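
(A variant of the step-4 script that only reports when the broker's descriptor count actually grows makes the leak easier to spot; this is just a sketch, assuming a single broker whose main class is kafka.Kafka:)

#!/bin/bash
# Report only when the Kafka broker's open-descriptor count grows.
prev=0
while true; do
  pid=$(pgrep -f kafka.Kafka)
  cur=$(ls "/proc/$pid/fd" | wc -l)
  if [ "$cur" -gt "$prev" ]; then
    echo "$(date): Kafka fds grew from $prev to $cur"
  fi
  prev=$cur
  sleep 5
done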

It's like the files are not being closed.

I too experience this issue.
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 27424 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 28864 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 29600 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 29600 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 29600 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 29600 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 29760 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 30272 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 30272 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 30272 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 30272 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 30272 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 30976 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 30976 0 6552758
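
For context, the three fields of fs.file-nr are the number of allocated file handles, the number of allocated-but-unused handles, and the system-wide maximum, so it's the first column creeping toward the third that eventually takes the box down. A minimal way to watch the growth rate:

# Sample the system-wide handle count every 5 seconds;
# the first field of fs.file-nr is the allocated-handle total:
watch -n 5 sysctl fs.file-nr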

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 515011
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 60000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 515011
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I wrote up a Stack Overflow question, http://stackoverflow.com/questions/28549868/kafka-web-console-using-twitter-finagle-not-responding, before I found this thread. The only thing I could do was restart the server. What have you been doing?

Hey

What we have been doing is setting the number of open files on our system
to the max, "65355". The application no longer crashes.

Sean

On Mon, Feb 16, 2015 at 10:58 PM, Foo Lim notifications@github.com wrote:

I wrote up a Stack Overflow question,
http://stackoverflow.com/questions/28549868/kafka-web-console-using-twitter-finagle-not-responding,
before I found this thread. The only thing I could do was restart the
server. What have you been doing?


Reply to this email directly or view it on GitHub
#47 (comment)
.
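For anyone applying the same workaround: raising the limit in a login shell only lasts for that session. A sketch of making it persistent (assuming the process runs as a dedicated user, here called kafka, and a distro that uses pam_limits; paths can vary):

# One-off, in the shell that launches the process:
ulimit -n 65535

# Persistent, by adding entries to /etc/security/limits.conf:
#   kafka  soft  nofile  65535
#   kafka  hard  nofile  65535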

Hi,
I contemplated that as well. It's a good stopgap, but eventually it'll hit the limit (faster if there are more partitions). I was looking for a more permanent solution, but this'll have to do for now, I guess.
-F

Yeah, we have moved away and developed our own solution that's very similar
to kafka-web-console.

On Tue, Feb 17, 2015 at 9:01 AM, Foo Lim notifications@github.com wrote:

Hi,
I contemplated that as well. It's a good stopgap, but eventually it'll
hit the limit (faster if there are more partitions). I was looking for a
more permanent solution, but this'll have to do for now, I guess.
-F



This is really a major issue. Not only does Kafka become unstable, but hitting the "open files" limit can wreak havoc on any other process that needs to use ports. I've also observed instability even when that max has not been reached.

To fix the issue, we used to kill kafka-web-console. I can't remember whether we occasionally also had to rebuild some of the topic files.

You'll also notice a ton of messages being generated in your ZooKeeper log file, which can quickly grow quite large.

Due to this issue we've stopped using kafka-web-console and are also implementing our own solution. I love that @claudemamo created this and has offered it to others (it's a nice little GUI). Unfortunately, I don't think the Kafka wiki should suggest people consider kafka-web-console until this issue is closed. It really makes Kafka (and possibly your entire server) unstable.

Duplicate of #30

This isn't a duplicate.

On Wed, Feb 18, 2015 at 7:43 AM, Claude Mamo notifications@github.com
wrote:

Duplicate of #30
#30



I tried the fork in development, and open files are kept under control. I'll roll it out to production in the next few days to see if this helps.

With https://github.com/ibanner56/kafka-web-console the system still hangs, but it takes longer, and it's not due to too many connections to Kafka. I get a bunch of these when I do a sudo lsof:

java 16240 root 1535w FIFO 0,8 0t0 42163244 pipe
java 16240 root 1536u 0000 0,9 0 7808 anon_inode
java 16240 root 1537u 0000 0,9 0 7808 anon_inode
java 16240 root 1538u 0000 0,9 0 7808 anon_inode
java 16240 root 1539w FIFO 0,8 0t0 42193027 pipe
java 16240 root 1541r FIFO 0,8 0t0 42186896 pipe
java 16240 root 1542w FIFO 0,8 0t0 42186896 pipe
java 16240 root 1543r FIFO 0,8 0t0 42174664 pipe
java 16240 root 1544w FIFO 0,8 0t0 42174664 pipe
java 16240 root 1545u 0000 0,9 0 7808 anon_inode
java 16240 root 1546u 0000 0,9 0 7808 anon_inode
java 16240 root 1547r FIFO 0,8 0t0 42199219 pipe
java 16240 root 1548r FIFO 0,8 0t0 42176277 pipe
java 16240 root 1549w FIFO 0,8 0t0 42176277 pipe
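
(A quick way to tally those descriptors by type, assuming the same PID as in the listing above, is to group lsof's TYPE column:)

# Count the JVM's descriptors by type (FIFO, anon_inode, sock, ...);
# the fifth column of lsof output is TYPE:
sudo lsof -p 16240 | awk '{print $5}' | sort | uniq -c | sort -rn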

Eventually, the system runs out of open files. I don't have time to debug this at the moment.