Autopilot Run Time

Question

benofben opened this issue 2 years ago

The Autopilot job takes about an hour to run. We reduced the number of jobs from 5 to 3. We also tried setting the time-based stopping parameters, but that seems to prevent some outputs from being printed, so that isn't going to work.
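For reference, here is a minimal sketch of how the candidate cap and the time-based stopping parameters could be expressed. It only builds the `CompletionCriteria` block that the SageMaker `CreateAutoMLJob` API accepts under `AutoMLJobConfig`; it does not call AWS. The helper name `build_completion_criteria` is hypothetical, and mapping "jobs from 5 to 3" onto `MaxCandidates` is an assumption, not something the lab notebooks are confirmed to do.

```python
from typing import Optional


def build_completion_criteria(
    max_candidates: Optional[int] = None,
    max_runtime_per_training_job_s: Optional[int] = None,
    max_automl_job_runtime_s: Optional[int] = None,
) -> dict:
    """Build a CompletionCriteria dict, leaving out any limit that is not set.

    Hypothetical helper for illustration; the keys are the CompletionCriteria
    field names from the CreateAutoMLJob API.
    """
    fields = {
        "MaxCandidates": max_candidates,
        "MaxRuntimePerTrainingJobInSeconds": max_runtime_per_training_job_s,
        "MaxAutoMLJobRuntimeInSeconds": max_automl_job_runtime_s,
    }
    for name, value in fields.items():
        if value is not None and value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value}")
    # Only send the limits that were actually requested.
    return {name: value for name, value in fields.items() if value is not None}


# Candidate cap only (assumed to correspond to "jobs from 5 to 3").
criteria = build_completion_criteria(max_candidates=3)

# This block would go into the AutoMLJobConfig argument of create_auto_ml_job.
auto_ml_job_config = {"CompletionCriteria": criteria}
```

Leaving the two time limits unset keeps the request to a candidate cap only, which matches the observation above that the time-based parameters caused problems.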

Rumi suggested two other things:
(1) Size up the machine
(2) Cut the dataset down

I'll try those and see how quick we can get it.

Ben Lackey · Answer 1 · Sun Jun 26 2022 10:17:50 GMT+0800 (China Standard Time)

I truncated the data down to 10k rows and the runtime remained unchanged. Per a conversation with Rumi, it seems the runtime is dominated by deploying machines, not by the actual processing.
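A rough sketch of the kind of truncation described above, using only the standard library. It assumes the dataset is a CSV file with a single header row, which this thread does not confirm; `truncate_csv` and the paths passed to it are hypothetical.

```python
import csv
from itertools import islice


def truncate_csv(src_path: str, dst_path: str, max_rows: int = 10_000) -> int:
    """Copy the header plus at most max_rows data rows; return rows written.

    Assumes a CSV with one header row (an assumption, not confirmed here).
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0  # empty input: nothing to copy
        writer.writerow(header)
        written = 0
        # islice stops reading after max_rows, so a large file is never fully loaded.
        for row in islice(reader, max_rows):
            writer.writerow(row)
            written += 1
        return written
```

Since the runtime did not change after truncation, this only helps rule out data volume as the cause.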

I'm not sure how to scale the Autopilot infrastructure up. It seems serverless. Open question...

The only option that remains, per the conversation with Rumi, is to switch to another algorithm. I'd rather not, as Autopilot is the key SageMaker feature. I'm going to work with the SageMaker PM and request a lower-runtime Autopilot invocation for validating infrastructure. I think something analogous to terraform plan versus terraform apply would be useful.

Ben Lackey · Answer 2 · Sun Jun 26 2022 11:23:17 GMT+0800 (China Standard Time)

I put some notes about leaving the job running into the notebooks.

Ben Lackey · Answer 3 · Mon Jun 27 2022 05:56:19 GMT+0800 (China Standard Time)

It looks like the notebook may not continue running if the browser is closed.

Ben Lackey · Answer 4 · Fri Jul 22 2022 02:37:52 GMT+0800 (China Standard Time)

Resummarizing the issue ---

There are many use cases that require quick run times for building a machine learning model:

Prototyping -- Often it's nice to be able to run a model and verify that the inputs and outputs are as expected before investing hours in training a high-quality model.
Demos -- In both internal (selling to management) and external (selling SageMaker to an end customer) cases, it can be very useful to demonstrate the product running interactively in real time.
Labs -- The lab material in this repo is intended to be run in three hours. Taking an hour to run an Autopilot job is untenable because the lab attendees have nothing to do for that time.

To this end, SageMaker Autopilot should offer the ability to train a low-quality model in ~5 minutes. This would enable all of these use cases.

Ben Lackey · Answer 5 · Wed Nov 01 2023 04:53:17 GMT+0800 (China Standard Time)

We're stripping Autopilot out of the new version of the lab, so this is now irrelevant.