mlfoundations / open_lm

A repository for research on medium-sized language models.

Speed up loading remote checkpoints

achalddave opened this issue

def pt_load(file_path, map_location=None):

In my experience, fsspec is really slow at loading large files, and right now every process reads the checkpoint from S3. This is quite slow at large model sizes. It would be nice to speed this up, probably by running subprocess.run("aws s3 cp ...") on local_rank=0 and then loading the local copy from each worker.
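
A rough sketch of what this could look like, not the actual open_lm implementation. The names `fetch_once_per_node`, `local_cache_path`, and `aws_s3_cp`, and the `cache_dir`, `barrier`, and `copy_fn` parameters are all hypothetical. It assumes the `aws` CLI is installed and configured, that `cache_dir` is on node-local disk, and that the caller passes in some synchronization primitive as `barrier`.

```python
import hashlib
import os
import subprocess


def local_cache_path(remote_path, cache_dir):
    """Map a remote URL to a stable local file name inside cache_dir (hypothetical helper)."""
    digest = hashlib.sha256(remote_path.encode()).hexdigest()[:16]
    return os.path.join(cache_dir, f"{digest}-{os.path.basename(remote_path)}")


def aws_s3_cp(remote_path, local_path):
    """Copy one object with the AWS CLI. Assumes `aws` is on PATH with valid credentials."""
    subprocess.run(["aws", "s3", "cp", remote_path, local_path], check=True)


def fetch_once_per_node(remote_path, cache_dir, local_rank, barrier, copy_fn=aws_s3_cp):
    """Download on local_rank 0 only, then return the local path on every worker."""
    if not remote_path.startswith("s3://"):
        return remote_path  # already a local path, nothing to download

    local_path = local_cache_path(remote_path, cache_dir)
    if local_rank == 0 and not os.path.exists(local_path):
        os.makedirs(cache_dir, exist_ok=True)
        tmp_path = local_path + ".tmp"
        copy_fn(remote_path, tmp_path)
        # Rename only after the copy completes, so no reader sees a partial file.
        os.replace(tmp_path, local_path)

    barrier()  # the other local ranks wait here until the copy has finished
    return local_path
```

Inside `pt_load`, the returned path could then be handed to `torch.load(path, map_location=map_location)`, with `torch.distributed.barrier` as one possible choice for `barrier`. If the first download fails on rank 0, the other ranks will block at the barrier, so a real version would probably need error handling around the copy.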