huggingface / datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Sbatch arguments treated as filepath

Anacheron51 opened this issue

It's not clear why, but this breaks any attempt to execute the Common Crawl example script:

```
2024-02-08 03:42:32.169 | INFO     | datatrove.executor.slurm:launch_job:249 - Launching Slurm job cc_datatrove-test (8000 tasks) with launch script "s3://datatrove-test/base_processing//logs/base_processing/datatrove-test/launch_script.slurm"
Traceback (most recent call last):
  File "/home/ubuntu/dt1.py", line 55, in <module>
    executor.run()
  File "/home/ubuntu/datatrove/src/datatrove/executor/slurm.py", line 169, in run
    self.launch_job()
  File "/home/ubuntu/datatrove/src/datatrove/executor/slurm.py", line 262, in launch_job
    self.job_id = launch_slurm_job(launch_file_contents, *args)
  File "/home/ubuntu/datatrove/src/datatrove/executor/slurm.py", line 349, in launch_slurm_job
    return subprocess.check_output(["sbatch", *args, f.name]).decode("utf-8").split()[-1]
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 503, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.10/subprocess.py", line 1863, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'sbatch'
```

sbatch only appears in slurm.py and is not supplied by the Common Crawl example script. It appears to be using the default empty dictionary.

Hi,
sbatch is a Slurm command used to submit jobs. Are you running the example on a Slurm cluster? Is sbatch on your PATH?
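
You can check this quickly from Python; this is a minimal sketch, independent of datatrove:

```python
import shutil

# shutil.which resolves a command the same way the shell does:
# it prints the full path to sbatch, or None if Slurm is not installed
print(shutil.which("sbatch"))
```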

The SlurmPipelineExecutor creates an sbatch script and then calls `sbatch <filepath>` on it to launch the job. This is expected. From your error, the problem seems to be with the sbatch command itself: it cannot be found on your machine.
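
For reference, here is a simplified sketch of what launch_slurm_job does, reconstructed from the traceback above (not the exact implementation):

```python
import subprocess
import tempfile

def launch_slurm_job(launch_file_contents: str, *args: str) -> str:
    """Write the generated launch script to a temporary file and submit
    it with sbatch, returning the job id printed by sbatch."""
    with tempfile.NamedTemporaryFile("w", suffix=".slurm") as f:
        f.write(launch_file_contents)
        f.flush()
        # this is the call that raises FileNotFoundError when the sbatch
        # binary does not exist on the machine running the script
        return subprocess.check_output(["sbatch", *args, f.name]).decode("utf-8").split()[-1]
```

So the FileNotFoundError refers to the `sbatch` executable itself, not to any of the file paths being passed to it.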