Usage with AWS S3 and Ray
0x2b3bfa0 opened this issue
- Inspired by rom1504/img2dataset#272
- Depends on #58
- Depends on #60
Usage
Cluster creation
ray up --yes cluster.yml
ray dashboard cluster.yml
Job submission
git clone https://github.com/mlfoundations/datacomp
ray job submit \
--address=http://localhost:8265 \
--working-dir=datacomp \
--runtime-env-json="$(
jq --null-input '
{
conda: "datacomp/environment.yml",
env_vars: {
AWS_ACCESS_KEY_ID: env.AWS_ACCESS_KEY_ID,
AWS_SECRET_ACCESS_KEY: env.AWS_SECRET_ACCESS_KEY,
AWS_SESSION_TOKEN: env.AWS_SESSION_TOKEN
}
}
'
)" \
-- \
python download_upstream.py \
--subjob_size=11520 \
--thread_count=128 \
--processes_count=1 \
--distributor=ray \
--metadata_dir=/tmp/metadata \
--data_dir=s3://datacomp-small \
--scale=small
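The `jq` invocation above only builds the runtime-environment JSON from your shell environment; an equivalent sketch in Python (the helper name `build_runtime_env` is mine, not part of the project) may make the shape clearer:

```python
import json
import os

def build_runtime_env(environ=os.environ):
    """Mirror the jq expression: conda env file plus forwarded AWS credentials."""
    return {
        "conda": "datacomp/environment.yml",
        "env_vars": {
            # Forward the caller's AWS credentials to the Ray workers;
            # unset variables become null, same as jq's `env.FOO`.
            "AWS_ACCESS_KEY_ID": environ.get("AWS_ACCESS_KEY_ID"),
            "AWS_SECRET_ACCESS_KEY": environ.get("AWS_SECRET_ACCESS_KEY"),
            "AWS_SESSION_TOKEN": environ.get("AWS_SESSION_TOKEN"),
        },
    }

print(json.dumps(build_runtime_env(), indent=2))
```

Piping this through `python -c` instead of `jq` yields the same `--runtime-env-json` payload.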
Note
Image shards are saved to the datacomp-small AWS S3 bucket, specified with the --data_dir option.
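As a rough sanity check of the flags above: assuming the small scale contains about 12.8M samples (a figure from the DataComp paper, not from this issue), and noting that --subjob_size is a multiple of --thread_count (11520 = 90 × 128), the download splits into roughly a thousand Ray subjobs:

```python
import math

pool_size = 12_800_000   # assumed size of the DataComp "small" pool
subjob_size = 11_520     # from the --subjob_size flag above
thread_count = 128       # from the --thread_count flag above

assert subjob_size % thread_count == 0  # 90 download batches per thread
print(math.ceil(pool_size / subjob_size))  # ~1112 subjobs
```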
Cluster deletion
ray down --yes cluster.yml
Configuration
Sample cluster.yml
cluster_name: datacomp-downloader
min_workers: 0
max_workers: 10
upscaling_speed: 1.0
docker:
run_options: [--dns=127.0.0.1]
image: rayproject/ray:2.6.1-py310
container_name: ray
provider:
type: aws
region: us-east-1
cache_stopped_nodes: false
available_node_types:
ray.head.default:
resources: {}
node_config:
InstanceType: m5.12xlarge
ImageId: ami-068d304eca3399469
BlockDeviceMappings:
- DeviceName: /dev/sda1
Ebs:
DeleteOnTermination: true
VolumeSize: 200
VolumeType: gp2
ray.worker.default:
resources: {}
node_config:
InstanceType: m5.12xlarge
ImageId: ami-068d304eca3399469
BlockDeviceMappings:
- DeviceName: /dev/sda1
Ebs:
DeleteOnTermination: true
VolumeSize: 200
VolumeType: gp2
initialization_commands:
- wget https://secure.nic.cz/files/knot-resolver/knot-resolver-release.deb
- sudo dpkg --install knot-resolver-release.deb
- sudo apt-get update
- sudo apt-get install --yes knot-resolver
- echo $(hostname --all-ip-addresses) $(hostname) | sudo tee --append /etc/hosts
- sudo systemctl start kresd@{1..48}.service
- echo nameserver 127.0.0.1 | sudo tee /etc/resolv.conf
- sudo systemctl stop systemd-resolved
setup_commands:
- sudo apt-get update
- sudo apt-get install --yes build-essential ffmpeg
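The "Obscure details" below reference a pip install in the setup_commands section that is missing from this sample; a hedged guess at what it looks like (the exact package set is my assumption — the S3 filesystem libraries that fsspec-based output paths typically need):

```yaml
setup_commands:
  - sudo apt-get update
  - sudo apt-get install --yes build-essential ffmpeg
  - pip install s3fs  # assumption: S3 backend for fsspec, needed for s3:// output
```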
Obscure details
- When --data_dir points to cloud storage like S3, a local --metadata_dir must also be specified, because the downloader script doesn't support saving metadata to cloud storage.
- The last pip install in the setup_commands section is needed for compatibility with AWS S3, because the required libraries aren't included in the conda environment file.
- In theory, there is no need to provide additional AWS credentials when the destination bucket is on the same account as the cluster, because the cluster already has full S3 access through a default instance profile. In practice, this doesn't seem to work as intended (probably due to rate limiting of the IMDS endpoint), and I ended up passing my local AWS credentials as environment variables.
- The Python version in environment.yml must match the Python version of the Ray cluster; make sure that docker.image in cluster.yml has exactly the same version as the environment.yml from this project.
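The version-matching note above can be checked mechanically. A small sketch (the parsing is mine, assuming the usual rayproject/ray:&lt;ray&gt;-py&lt;XY&gt; image tag format and a python=X.Y.Z pin in environment.yml):

```python
import re

def image_python_version(image: str) -> str:
    """Extract 'X.Y' from a Docker tag like 'rayproject/ray:2.6.1-py310'."""
    tag = re.search(r"-py(\d)(\d+)$", image)
    return f"{tag.group(1)}.{tag.group(2)}"

def env_python_version(environment_yml: str) -> str:
    """Extract 'X.Y' from a 'python=X.Y.Z' dependency line."""
    pin = re.search(r"python=(\d+\.\d+)", environment_yml)
    return pin.group(1)

# Sample values mirroring the configs in this issue:
image = "rayproject/ray:2.6.1-py310"
env_file = "dependencies:\n  - python=3.10.8\n"
assert image_python_version(image) == env_python_version(env_file)  # both "3.10"
```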
Hey, why did you close it?
I think it's a good improvement and people will review the PRs soon
Hello! I closed the issue because it wasn't quite actionable, but rather a “note to my future self” that could eventually become documentation. 🙈 I'll reopen it if you wish, though.
Alternative version, without containers.
cluster_name: datacomp-downloader
min_workers: 0
max_workers: 10
upscaling_speed: 1.0
provider:
type: aws
region: us-east-1
cache_stopped_nodes: false
available_node_types:
ray.head.default:
resources: {}
node_config:
InstanceType: m5.12xlarge
ImageId: ami-068d304eca3399469
BlockDeviceMappings:
- DeviceName: /dev/sda1
Ebs:
DeleteOnTermination: true
VolumeSize: 200
VolumeType: gp2
ray.worker.default:
resources: {}
node_config:
InstanceType: m5.12xlarge
ImageId: ami-068d304eca3399469
BlockDeviceMappings:
- DeviceName: /dev/sda1
Ebs:
DeleteOnTermination: true
VolumeSize: 200
VolumeType: gp2
initialization_commands:
# Knot Resolver
- wget https://secure.nic.cz/files/knot-resolver/knot-resolver-release.deb
- sudo dpkg --install knot-resolver-release.deb
- rm knot-resolver-release.deb
- sudo apt-get update
- sudo apt-get install --yes knot-resolver
- echo $(hostname --all-ip-addresses) $(hostname) | sudo tee --append /etc/hosts
- sudo systemctl start kresd@{1..48}.service
- echo nameserver 127.0.0.1 | sudo tee /etc/resolv.conf
- sudo systemctl stop systemd-resolved
# Anaconda
- sudo mkdir /opt/miniconda3 && sudo chown $USER /opt/miniconda3
- wget https://repo.anaconda.com/miniconda/Miniconda3-py39_22.11.1-1-Linux-x86_64.sh
- bash Miniconda3-py39_22.11.1-1-Linux-x86_64.sh -f -b -p /opt/miniconda3
- rm Miniconda3-py39_22.11.1-1-Linux-x86_64.sh
- /opt/miniconda3/bin/conda init bash
# Ray
- conda create --yes --name=ray python=3.10.8
- echo conda activate ray >> ~/.bashrc
- pip install 'ray[all]==2.7.0'
setup_commands:
- sudo apt-get update
- sudo apt-get install --yes build-essential ffmpeg
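One caveat in this container-less variant: `conda init bash` and the `conda activate ray` line appended to ~/.bashrc only affect later interactive shells, not the provisioning step itself, so the pip install above likely targets the wrong interpreter. A hedged fix (assuming the env path that `conda create --name=ray` produces under this Miniconda prefix) is to call the env's pip explicitly:

```yaml
initialization_commands:
  # ... Knot Resolver and Miniconda steps as above ...
  - conda create --yes --name=ray python=3.10.8
  - echo conda activate ray >> ~/.bashrc
  # Target the new env's pip directly instead of relying on activation.
  - /opt/miniconda3/envs/ray/bin/pip install 'ray[all]==2.7.0'
```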