No GPU activity when the job is running
patrick-douglas opened this issue · comments
First, I'm not using cgroups (are they required?).
My issue is:
I'm using Torque version 5.1.3 (because newer versions failed to install on Linux Mint).
It was configured with the following commands:
#me@root:
cd torque-5.1.3-1462984387_205d70d
./configure --with-debug --enable-nvidia-gpus --with-sendmail
make
make install
cp contrib/init.d/debian.pbs_server /etc/init.d/pbs_server
cp contrib/init.d/debian.pbs_sched /etc/init.d/pbs_sched
cp contrib/init.d/debian.trqauthd /etc/init.d/trqauthd
sysv-rc-conf pbs_server on
sysv-rc-conf trqauthd on
sysv-rc-conf pbs_sched on
echo '/usr/local/lib'>/etc/ld.so.conf.d/torque.conf
ldconfig
service trqauthd restart
echo '/usr/local/lib'>/etc/ld.so.conf.d/torque.conf
echo "master.lbn.com">/var/spool/torque/server_name
./torque.setup root
echo "node01.lbn.com np=12 gpus=1" > /var/spool/torque/server_priv/nodes
service trqauthd restart
service pbs_server restart
service pbs_sched start
qmgr -c 'set server auto_node_np = True'
make packages
#Then I ssh into node01.lbn.com and run the following:
#root@node02
apt-get update
apt-get install g++ libssl-dev libxml2-dev sysv-rc-conf libboost-all-dev -y
cd torque-5.1.3-1462984387_205d70d
./configure --with-debug --enable-nvidia-gpus
make -j 2
make install -j 2
./torque-package-clients-linux-x86_64.sh --install
./torque-package-devel-linux-x86_64.sh --install
./torque-package-doc-linux-x86_64.sh --install
./torque-package-mom-linux-x86_64.sh --install
./torque-package-server-linux-x86_64.sh --install
echo '/usr/local/lib'>/etc/ld.so.conf.d/torque.conf
ldconfig
cp contrib/init.d/debian.pbs_mom /etc/init.d/pbs_mom
cp contrib/init.d/debian.trqauthd /etc/init.d/trqauthd
sysv-rc-conf trqauthd on
sysv-rc-conf pbs_mom on
service trqauthd restart
echo '$pbsserver master'>/var/spool/torque/mom_priv/config
echo '$logevent 225'>>/var/spool/torque/mom_priv/config
echo '$usercp *:/home /home'>>/var/spool/torque/mom_priv/config
service pbs_mom start
After running this, I run the "pbsnodes" command and node01 looks OK: I can see all the GPU info. However, when I submit a job, nvidia-smi shows the GPU compute mode changing to Exclusive-process while GPU activity stays at 0%, even though the task is still running (according to "top").
My GPU is an NVIDIA Tesla K40c, but I have also tested with a GeForce GT 430, without success.
NOTE: I'm running CUDA 8.0 and the latest NVIDIA drivers.
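To narrow down the 0% GPU activity, a small test job along these lines could show which GPU Torque assigned and what the driver reports from inside the job. This is only a sketch, not the job I actually submitted: the job name `gpu-check` and the function name `gpu_job_report` are made-up placeholders, and it assumes the job is requested with `gpus=1` so that Torque sets `PBS_GPUFILE`. Whether `CUDA_VISIBLE_DEVICES` is set depends on the local configuration; the script only prints it.

```shell
#!/bin/bash
#PBS -N gpu-check
#PBS -l nodes=1:ppn=1:gpus=1
#PBS -j oe

# Hypothetical helper: report the GPU assignment as seen from inside the job.
gpu_job_report() {
    # Torque writes the assigned GPUs to the file named by PBS_GPUFILE
    # when the job requested gpus=N; outside a job it is normally unset.
    if [ -n "${PBS_GPUFILE:-}" ] && [ -r "$PBS_GPUFILE" ]; then
        echo "PBS_GPUFILE=$PBS_GPUFILE"
        cat "$PBS_GPUFILE"
    else
        echo "PBS_GPUFILE is not set or not readable"
    fi

    # Only printed for information; may be unset on this setup.
    echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"

    # Ask the driver for compute mode and utilization of each GPU.
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=index,name,compute_mode,utilization.gpu \
                   --format=csv || echo "nvidia-smi failed"
    else
        echo "nvidia-smi not found in PATH"
    fi
}

gpu_job_report
```

Saved as, say, `gpu-check.sh` and submitted with `qsub gpu-check.sh`, the job's output file would show whether the GPU listed in `PBS_GPUFILE` matches the device the application is actually trying to use.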
Please help me!
Thank you in advance