scylladb / scylla-jmx

Scylla JMX proxy

scylla-jmx failed to start after offline installation

amoskong opened this issue

version: unified-package-0.20200824.9636a3399.tar.gz

Install steps:

start scylla

systemctl --user start scylla-server

systemctl --user status scylla-jmx -f |less

● scylla-jmx.service - Scylla JMX
   Loaded: loaded (/home/scylla-test/.config/systemd/user/../../../install_root/etc/systemd/scylla-jmx.service; linked; vendor preset: enabled)
  Drop-In: /home/scylla-test/.config/systemd/user/scylla-jmx.service.d
           └─nonroot.conf
   Active: failed (Result: exit-code) since Tue 2020-08-25 22:19:49 UTC; 25s ago
  Process: 48818 ExecStart=/home/scylla-test/install_root/jmx/scylla-jmx $SCYLLA_JMX_PORT $SCYLLA_API_PORT $SCYLLA_API_ADDR $SCYLLA_JMX_ADDR $SCYLLA_JMX_FILE $SCYLLA_JMX_LOCAL $SCYLLA_JMX_REMOTE $SCYLLA_JMX_DEBUG (code=exited, status=200/CHDIR)
 Main PID: 48818 (code=exited, status=200/CHDIR)

Aug 25 22:19:49 artifacts-centos8-jenkins-db-node-b7a4fdf8-0-1 systemd[5910]: Started Scylla JMX.
Aug 25 22:19:49 artifacts-centos8-jenkins-db-node-b7a4fdf8-0-1 systemd[48818]: scylla-jmx.service: Changing to the requested working directory failed: No such file or directory
Aug 25 22:19:49 artifacts-centos8-jenkins-db-node-b7a4fdf8-0-1 systemd[48818]: scylla-jmx.service: Failed at step CHDIR spawning /home/scylla-test/install_root/jmx/scylla-jmx: No such file or directory
Aug 25 22:19:49 artifacts-centos8-jenkins-db-node-b7a4fdf8-0-1 systemd[5910]: scylla-jmx.service: Main process exited, code=exited, status=200/CHDIR
Aug 25 22:19:49 artifacts-centos8-jenkins-db-node-b7a4fdf8-0-1 systemd[5910]: scylla-jmx.service: Failed with result 'exit-code'.

/CC @syuu1228 @roydahan

This blocked nonroot install testing. /CC @slivne @roydahan

I tried changing the scylla-jmx path in ExecStart (in nonroot.conf) to /home/scylla-test/install_root/jmx/symlinks/scylla-jmx, but it still failed.

● scylla-jmx.service - Scylla JMX
   Loaded: loaded (/home/scylla-test/.config/systemd/user/../../../install_root/etc/systemd/scylla-jmx.service; linked; vendor preset: enabled)
  Drop-In: /home/scylla-test/.config/systemd/user/scylla-jmx.service.d
           └─nonroot.conf
   Active: failed (Result: exit-code) since Tue 2020-08-25 22:54:07 UTC; 9s ago
  Process: 2567 ExecStart=/home/scylla-test/install_root/jmx/symlinks/scylla-jmx $SCYLLA_JMX_PORT $SCYLLA_API_PORT $SCYLLA_API_ADDR $SCYLLA_JMX_ADDR $SCYLLA_JMX_FILE $SCYLLA_JMX_LOCAL $SCYLLA_JMX_REMOTE $SCYLLA_JMX_DEBUG (code=exited, status=200/CHDIR)
 Main PID: 2567 (code=exited, status=200/CHDIR)

Aug 25 22:54:07 artifacts-centos8-jenkins-db-node-b7a4fdf8-0-1 systemd[1620]: Started Scylla JMX.
Aug 25 22:54:07 artifacts-centos8-jenkins-db-node-b7a4fdf8-0-1 systemd[2567]: scylla-jmx.service: Changing to the requested working directory failed: No such file or directory
Aug 25 22:54:07 artifacts-centos8-jenkins-db-node-b7a4fdf8-0-1 systemd[2567]: scylla-jmx.service: Failed at step CHDIR spawning /home/scylla-test/install_root/jmx/symlinks/scylla-jmx: No such file or directory
Aug 25 22:54:07 artifacts-centos8-jenkins-db-node-b7a4fdf8-0-1 systemd[1620]: scylla-jmx.service: Main process exited, code=exited, status=200/CHDIR
Aug 25 22:54:07 artifacts-centos8-jenkins-db-node-b7a4fdf8-0-1 systemd[1620]: scylla-jmx.service: Failed with result 'exit-code'.
[scylla-test@artifacts-centos8-jenkins-db-node-b7a4fdf8-0-1 ~]$ ls -l /home/scylla-test/install_root/jmx/symlinks/scylla-jmx
lrwxrwxrwx. 1 scylla-test scylla-test 13 Aug 25 08:25 /home/scylla-test/install_root/jmx/symlinks/scylla-jmx -> /usr/bin/java
[scylla-test@artifacts-centos8-jenkins-db-node-b7a4fdf8-0-1 ~]$ /home/scylla-test/install_root/jmx/symlinks/scylla-jmx -version
openjdk version "1.8.0_262"
OpenJDK Runtime Environment (build 1.8.0_262-b10)
OpenJDK 64-Bit Server VM (build 25.262-b10, mixed mode)

However, I can successfully start scylla-jmx from the command line:

[scylla-test@artifacts-centos8-jenkins-db-node-b7a4fdf8-0-1 ~]$ /home/scylla-test/install_root/jmx/symlinks/scylla-jmx -Xmx256m -XX:+UseSerialGC -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.host=localhost -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=7199 -Djava.rmi.server.hostname=localhost -Dcom.sun.management.jmxremote.rmi.port=7199 -Djavax.management.builder.initial=com.scylladb.jmx.utils.APIBuilder -jar /home/scylla-test/install_root/jmx/scylla-jmx-1.0.jar
Connecting to http://localhost:10000
Starting the JMX server
JMX is enabled to receive remote connections on port: 7199
[scylla-test@artifacts-centos8-jenkins-db-node-b7a4fdf8-0-1 ~]$ install_root/share/cassandra/bin/nodetool status
Using /home/scylla-test/install_root/etc/scylla/scylla.yaml as the config file
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns    Host ID                               Rack
UN  127.0.0.1  981.38 KB  256          ?       379f7230-d7a6-4f8d-bce9-d1bc852e5389  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless

This issue is solved by setting WorkingDirectory to an empty value in nonroot.conf.
I posted a fix: #131
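
The status=200/CHDIR exit code in the logs above means systemd could not change into the unit's WorkingDirectory before running ExecStart, which is why editing the ExecStart path made no difference. A minimal sketch of a drop-in that clears the setting follows. The file path is taken from the status output above, but the contents are an assumption for illustration, not a copy of the actual change in #131.

```ini
# ~/.config/systemd/user/scylla-jmx.service.d/nonroot.conf (contents assumed)
[Service]
# An empty assignment resets the WorkingDirectory= inherited from scylla-jmx.service.
WorkingDirectory=
# Alternative (an assumption, not proposed in this thread): a leading "-" makes a
# missing directory non-fatal, e.g. WorkingDirectory=-/path/that/may/not/exist
```

After editing the drop-in, `systemctl --user daemon-reload` followed by `systemctl --user restart scylla-jmx` should pick up the change.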

@amoskong WorkingDirectory was set so that the heap dump gets written to that directory:
be8f1ac
Where does the heap dump go when WorkingDirectory is empty?

> @amoskong WorkingDirectory was set so that the heap dump gets written to that directory:
> be8f1ac
> Where does the heap dump go when WorkingDirectory is empty?

Coredump setup requires privileges, so we can't do that for a nonroot install.

> @amoskong WorkingDirectory was set so that the heap dump gets written to that directory:
> be8f1ac
> Where does the heap dump go when WorkingDirectory is empty?

We could also set WorkingDirectory to $prefix/ or $prefix/var/lib/scylla.
Currently, though, the $prefix/var/lib/scylla directory is not created by the installation.

> > @amoskong WorkingDirectory was set so that the heap dump gets written to that directory:
> > be8f1ac
> > Where does the heap dump go when WorkingDirectory is empty?
>
> Coredump setup requires privileges, so we can't do that for a nonroot install.

No, I mean the JVM heap dump, not a coredump of native code, which is handled by the Linux kernel.
Related: scylladb/scylla-enterprise#1469

In that issue we found that without WorkingDirectory, scylla-jmx.service runs with PWD="/", so when the JVM tried to write a heap dump it failed with "Permission denied". We therefore changed the default WorkingDirectory to /var/lib/scylla:
be8f1ac

My question is: in --user mode with WorkingDirectory="", does the JVM have sufficient permissions to write the dump, and where is it written?
If it's $HOME, it should be okay, I think.
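
One way to answer this empirically is to look at the working directory of the running JMX process. The sketch below is a small helper that reads it from /proc (Linux only); the function name `proc_cwd` is made up for illustration. With -XX:+HeapDumpOnOutOfMemoryError and no -XX:HeapDumpPath option, the JVM writes java_pid<pid>.hprof into that directory. If I read systemd.exec(5) correctly, an unset WorkingDirectory defaults to the user's home directory for a user instance, so the expected result would be $HOME.

```shell
#!/bin/sh
# proc_cwd PID: print the current working directory of process PID.
# Linux only: resolves the /proc/<pid>/cwd symlink.
proc_cwd() {
    readlink "/proc/$1/cwd"
}
```

For the service in this thread, it could be called as `proc_cwd "$(systemctl --user show -p MainPID --value scylla-jmx)"` while scylla-jmx is running.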

> > @amoskong WorkingDirectory was set so that the heap dump gets written to that directory:
> > be8f1ac
> > Where does the heap dump go when WorkingDirectory is empty?
>
> We could also set WorkingDirectory to $prefix/ or $prefix/var/lib/scylla.
> Currently, though, the $prefix/var/lib/scylla directory is not created by the installation.

Right.

> Currently $prefix/var/lib/scylla directory won't be created after installation.

That's because we use $prefix as the data directory in nonroot mode:
https://github.com/scylladb/scylla/blob/master/install.sh#L374
So scylla will create commitlog/, data/, hints/ and view_hints/ under $prefix (if it doesn't work like that, it should be a bug)