adaptivecomputing / torque

Torque Repository

every job stderr returns "permission denied"

wghilliard opened this issue

Hello, I have installed torque from source with the pam module (and without the pam module) and every time a non-root user submits a job, the stderr output file prints some variant of the following:

$USER@mybox: qsub myscript.sh
-bash: line 1: /var/spool/torque/mom_priv/jobs/0.mybox.SC: Permission denied

$USER@mybox: cat myscript.sh
echo 'hello world'

 $USER@mybox: stat myscript.sh
 File: 'myscript.sh'
 Size: 25        	Blocks: 1          IO Block: 512    regular file
 Device: 2bh/43d	Inode: 12718771    Links: 1
 Access: (0777/-rwxrwxrwx)  Uid: ( 1000/$USER)   Gid: ( 1000/$USER)
 Access: 2016-11-07 13:58:41.096290480 -0600
 Modify: 2016-11-07 11:50:04.890717155 -0600
 Change: 2016-11-07 11:50:04.994712220 -0600
 Birth: -

Torque Version:

root@mybox:~# pbs_server --version
Version: 6.0.2
Commit: d9a34839a0f975d5c487bbfcf5dcb10b6a8f1e79 

./configure output:

Building components: server=yes mom=yes clients=yes gui=no drmaa=no pam=yes
PBS Machine type    : linux
Remote copy         : /usr/bin/scp -rpB
PBS home            : /var/spool/torque
Default server      : mybox

Unix Domain sockets : 
Linux cpusets       : no
Tcl                 : disabled
Tk                  : disabled
Authentication      : classic (pbs_iff)

OS info

 root@mybox:~# uname -a
 Linux mybox 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Something must be wrong with the permissions on the machine. What are the permissions set to on /var/spool/torque/mom_priv/jobs/ on that machine? Can you run interactive jobs?

root@mybox:/var/spool/torque# stat ./mom_priv/jobs/
File: './mom_priv/jobs/'
Size: 2               Blocks: 1          IO Block: 131072 directory
Device: 2eh/46d Inode: 60          Links: 2
Access: (0751/drwxr-x--x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2016-11-11 16:14:31.270695877 -0600
Modify: 2016-11-11 16:14:47.781986313 -0600
Change: 2016-11-11 16:14:47.781986313 -0600
Birth: -

Yes! I can run interactively:

$USER@mybox:~$ qsub -I
qsub: waiting for job 6.mybox to start
qsub: job 6.mybox ready

$USER@mybox:~$

It seems that the permissions issue is just around the job script on the mom. Did you check the permissions on that directory and any files in it?

There are currently no files in the $TORQUE_HOME/mom_priv/jobs directory, but when I place a file there with my user as the owner, I cannot execute the script directly. It only runs if I invoke sh or bash and pass the file as an argument:

$USER@mybox:~$ /var/spool/torque/mom_priv/jobs/myscript.sh
bash: /var/spool/torque/mom_priv/jobs/myscript.sh: Permission denied
$USER@mybox:~$ sh /var/spool/torque/mom_priv/jobs/myscript.sh
hello friend
$USER@mybox:~$ ls -l /var/spool/torque/mom_priv/jobs/
ls: cannot open directory '/var/spool/torque/mom_priv/jobs/': Permission denied
$USER@mybox:~$ sudo !!
sudo ls -l /var/spool/torque/mom_priv/jobs/
total 1
-rwxr-xr-x 1 $USER root 37 Nov 11 16:38 myscript.sh
$USER@mybox:~$ ls -l /var/spool/torque/mom_priv/
ls: cannot open directory '/var/spool/torque/mom_priv/': Permission denied
$USER@mybox:~$ sudo !!
sudo ls -l /var/spool/torque/mom_priv/
total 2
-rw-r--r-- 1 root root 25 Nov 11 16:14 config
-rw-r--r-- 1 root root 23 Nov 11 16:13 config~
drwxr-x--x 2 root root  3 Nov 11 16:45 jobs
-rw-r--r-- 1 root root  7 Nov 11 16:14 mom.lock
$USER@mybox:~$     

Am I making some obvious Linux mistake??
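
The two invocations above differ in what the kernel is asked to execute. Running the path directly makes the kernel exec the script file itself, which needs execute permission on the file and a filesystem that allows execution. Running `sh file` execs `sh` and only reads the script. A minimal sketch of that difference follows. It uses a missing execute bit as a stand-in (an assumption made so the demo runs without root; it is not the configuration shown in this thread), and the temporary path is hypothetical.

```shell
#!/bin/sh
# Sketch: show that direct execution and "sh file" have different requirements.
# Assumption: a missing execute bit is used as the stand-in for "the kernel
# refuses to exec this file"; reads of the file still succeed.
demo_dir=$(mktemp -d)
script="$demo_dir/hello.sh"
printf '%s\n' "echo 'hello world'" > "$script"
chmod 0644 "$script"            # readable, but not executable

# Direct execution: the kernel has to exec the script file itself.
direct_status=0
"$script" 2>/dev/null || direct_status=$?

# Interpreter execution: sh is the program being exec'd; the script is only read.
via_sh_output=$(sh "$script")

echo "direct execution exit status: $direct_status"
echo "output via sh: $via_sh_output"
rm -rf "$demo_dir"
```

If the direct run fails while the `sh` run succeeds even though the file mode looks right, the restriction is probably coming from somewhere other than the file's permission bits.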

It looks like your permissions are okay for the jobs directory. When I run a job, the permissions on the job script are:

-rwx------ 1

What are the permissions on your job script when you get the error?

Hey, so this is a stat of the temp file Torque creates when the job is submitted. Is that the job script you are referring to?

root@mybox:~# stat /var/spool/torque/mom_priv/jobs/11.mybox.SC
File: '/var/spool/torque/mom_priv/jobs/11.mybox.SC'
Size: 25            Blocks: 1          IO Block: 512    regular file
Device: 2eh/46d Inode: 516         Links: 1 
Access: (0700/-rwx------)  Uid: ( 1000/$USER)   Gid: ( 1000/$USER)
Access: 2016-11-14 12:55:05.699956187 -0600 
Modify: 2016-11-14 12:55:05.699956187 -0600
Change: 2016-11-14 12:55:05.703956019 -0600
Birth: -

It's the file with the .SC extension that you need to look at.

I messed up the filename consistency while anonymizing the data. I've updated my previous message to reflect the correct filename.

Ok, those permissions look correct. I'm really not sure what can be happening on your system, but it has to be something around the permissions the job needs. I cannot reproduce this bug.

I would check /proc/mounts and make sure the mount flags affecting
/var/spool/torque don't include "noexec" or similar that would inhibit
direct execution (e.g., /path/to/foo) but permit directory traversal and
file reading (e.g., sh /path/to/foo).

Michael

On Nov 15, 2016 11:38 AM, "David Beer" notifications@github.com wrote:

Ok, those permissions look correct. I'm really not sure what can be
happening on your system, but it has to be something around the permissions
the job needs. I cannot reproduce this bug.


@mej that was the issue: I'm running a ZFS filesystem and the noexec flag was set on the pool.
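
For anyone hitting the same thing on ZFS, here is a small sketch of how the affected datasets might be found. It assumes the flag comes from the ZFS `exec` property being `off` (a plain `mount -o noexec` is another possibility and would not show up here). The helper name `datasets_without_exec` and the dataset name `tank/spool` are hypothetical.

```shell
#!/bin/sh
# datasets_without_exec
# Reads "name<TAB>value" lines on stdin, in the form printed by
#   zfs get -H -o name,value exec
# and prints the names of datasets whose exec property is "off".
datasets_without_exec() {
    awk -F '\t' '$2 == "off" { print $1 }'
}

# Possible usage on the affected host (not run here; needs ZFS installed):
#   zfs get -H -o name,value exec | datasets_without_exec
#
# Re-enabling execution on one dataset (hypothetical name, run as root):
#   zfs set exec=on tank/spool
# Afterwards, confirm in /proc/mounts that noexec is gone for that mount.
```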

Thanks for the help @dbeer @mej