Some issues when combining `deep_architect` and `ray.tune`
iacolippo opened this issue
Hi, first of all, I'd like to thank you for building and releasing `deep_architect`.

I am opening this issue because I'd like to use `deep_architect` together with `ray.tune` to get the best of both worlds, but I encountered some issues. Feel free to close this if you think it is out of the scope of the project.
My goal is to use the sampling capabilities of `deep_architect` and the multiprocessing and logging tools of `ray` and `ray.tune`. Therefore I'm using `tune.run` and `tune.Trainable` with the searchers, helpers, and modules of `deep_architect`.
If I write my code with the call to the sampling function inside the `_setup` method of a `tune.Trainable` (https://gist.github.com/iacolippo/1262c8afbfd9f5e491add5fbae105afa, line 124), then I have an issue with ray (tensorboard) logging. I'd say this is not an issue of `deep_architect`, and it shouldn't be too hard to fix in the source code of ray if need be.
If I write my code as `ray` wants it (`config["model"]` is the model object, in this case a `PytorchModel` from `deep_architect`), then I get a different error:

```
RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment
```

https://gist.github.com/iacolippo/3f815fa90c254f7a065bdc446406233a (note that the `()` disappeared at line 124)

This might be an issue with `deep_architect` and multiprocessing, or with PyTorch itself; I don't know, I didn't dig into it too much for lack of time. Here is the traceback.
I am using:

```
-e git+git@github.com:negrinho/deep_architect.git@3427c5d45b0cbdc9c2fe1f4e5213f6961ef41749#egg=deep_architect
ray==0.8.4
torch==1.5.0
torchvision==0.6.0a0+82fd1c8
```
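As an aside, the suspected failure mode can be sketched without `ray` or `torch` at all. This is only an illustrative reproduction under an assumption: `ray` deep-copies trial configs, so a config value that refuses `copy.deepcopy` (as PyTorch non-leaf tensors do) breaks at trial setup. `FakeModel` and the config keys below are made-up names, not `deep_architect` or `ray` API.

```python
import copy

class FakeModel:
    """Stand-in for a model object; NOT the real deep_architect PytorchModel."""

    def __init__(self, in_features, num_classes):
        self.in_features = in_features
        self.num_classes = num_classes

    def __deepcopy__(self, memo):
        # Mimic PyTorch's failure mode for non-leaf tensors.
        raise RuntimeError(
            "Only Tensors created explicitly by the user (graph leaves) "
            "support the deepcopy protocol at the moment")

# Passing the built model inside the config fails as soon as it is copied:
bad_config = {"model": FakeModel(784, 10)}
try:
    copy.deepcopy(bad_config)
    copy_failed = False
except RuntimeError:
    copy_failed = True

# Possible workaround: pass a zero-argument factory instead. Plain
# functions are treated as atomic by deepcopy, so the copy succeeds and
# each worker can build its own fresh model inside _setup.
good_config = {"model_fn": lambda: FakeModel(784, 10)}
fresh_model = copy.deepcopy(good_config)["model_fn"]()
```

If this reading is right, it would explain why constructing the model inside `_setup` (rather than putting the instance into `config`) avoids the error.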
Stay safe!
Hey there - this seems like a problem with Ray's documentation being unclear.
What if you just did:
```python
class SimpleClassifierTrainable(tune.Trainable):
    def _setup(self, config):
        use_cuda = torch.cuda.is_available()
        self.device = torch.device("cuda" if use_cuda else "cpu")
        self.batch_size = config["batch_size"]
        self.learning_rate = config.get("lr", 0.01)
        self.train_loader, self.val_loader = get_dataloaders(self.batch_size)
        ##############################
        # CREATE MODEL HERE
        model = sample_model(in_features=784, num_classes=10)
        self.model = model.to(self.device)
        ###############################
        self.criterion = nn.CrossEntropyLoss()
        self.optimizer = optim.Adam(self.model.parameters(),
                                    lr=self.learning_rate)
```
Hi Iacopo. Apologies for the delay. Unfortunately, I haven't been able to dedicate much time to DeepArchitect lately, but I'm looking to resume soon. I'm curious how far you got with DeepArchitect in your work. I'm not familiar with Ray, but I'm happy to integrate some functionality, as it seems widely adopted now. I don't see any inherent problems in using DeepArchitect with Ray, provided that Ray does not need too much information about the workload it is running (e.g., the exact architecture).
Hi @negrinho
No need to apologize :-) I didn't have much time to work on this either.
The idea would be to be able to use DeepArchitect functions as a sampler for a PyTorch (or TF) model in the `tune.run` parameter `config` (see here: https://gist.github.com/iacolippo/3f815fa90c254f7a065bdc446406233a#file-ray_deep_architect_ex2-py-L201). This would make it really easy to scale an architecture search from a single machine to a cluster.
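The sampler-in-config idea can be sketched in plain Python (no `ray` or `deep_architect` imports; `sample_architecture`, `run_trial`, and the `sample_fn` key are illustrative names I made up, not real API):

```python
import random

def sample_architecture(rng):
    # Toy stand-in for drawing one architecture from a search space.
    return {"num_layers": rng.choice([1, 2, 3]),
            "hidden_units": rng.choice([64, 128, 256])}

def run_trial(config, seed):
    # Each trial calls the sampler it finds in its config, so scaling
    # from one machine to a cluster only means handing out more seeds.
    rng = random.Random(seed)
    arch = config["sample_fn"](rng)
    # ... build and train the model described by `arch` here ...
    return arch

config = {"sample_fn": sample_architecture}
architectures = [run_trial(config, seed) for seed in range(4)]
```

Since the config carries only a callable, it also sidesteps the deepcopy problem described above.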
I will have a person working on a closely related project starting in October, so I will hopefully be able to give more detailed information soon.