neo4j-partners / hands-on-lab-neo4j-and-bedrock

Hands on lab for Neo4j and Amazon Bedrock

Autopilot Run Time

benofben opened this issue

The Autopilot job is taking about an hour to run. We reduced the number of jobs from 5 to 3. We also tried setting the time-based stopping parameters, but that seems to cause some outputs not to be printed, so that isn't going to work.
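
For anyone following along, here is a minimal sketch of the knobs discussed above, assuming the job is launched through the boto3 `create_auto_ml_job` call and that "jobs" refers to `MaxCandidates`, with the time-based stopping parameters being the two `...RuntimeInSeconds` fields of `CompletionCriteria`. The helper name `completion_criteria` is hypothetical and not something from the lab notebooks.

```python
from typing import Optional


def completion_criteria(
    max_candidates: Optional[int] = None,
    max_job_runtime_s: Optional[int] = None,
    max_runtime_per_training_job_s: Optional[int] = None,
) -> dict:
    """Build an AutoMLJobConfig dict containing only the limits that were set."""
    criteria = {}
    if max_candidates is not None:
        criteria["MaxCandidates"] = max_candidates
    if max_job_runtime_s is not None:
        criteria["MaxAutoMLJobRuntimeInSeconds"] = max_job_runtime_s
    if max_runtime_per_training_job_s is not None:
        criteria["MaxRuntimePerTrainingJobInSeconds"] = max_runtime_per_training_job_s
    return {"CompletionCriteria": criteria}


# Candidate cap only (assumed to correspond to the reduction from 5 to 3).
job_config = completion_criteria(max_candidates=3)
print(job_config)

# Hypothetical usage; needs AWS credentials, so it is left commented out:
# import boto3
# boto3.client("sagemaker").create_auto_ml_job(
#     AutoMLJobName="...", InputDataConfig=[...], OutputDataConfig={...},
#     RoleArn="...", AutoMLJobConfig=job_config)
```

Leaving the runtime limits unset keeps the config to the candidate cap alone, which is the combination that did not lose any outputs for us.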

Rumi suggested two other things:
(1) Size up the machine
(2) Cut the dataset down

I'll try those and see how quickly we can get it to run.

I truncated the data down to 10k rows and the runtime remained unchanged. Per a conversation with Rumi, it seems the runtime is dominated by the deployment of machines, not by actual processing.
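
The truncation itself can be sketched roughly as follows, assuming the training data is a CSV file with a header row. The function name and the file paths in the usage comment are hypothetical; this is an illustration, not the exact code used for the lab.

```python
import csv
import itertools


def truncate_csv(src_path: str, dst_path: str, max_rows: int = 10_000) -> int:
    """Copy the header plus at most max_rows data rows; return data rows written."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            # Empty input file: nothing to copy.
            return 0
        writer.writerow(header)
        written = 0
        # islice stops after max_rows without reading the rest of the file.
        for row in itertools.islice(reader, max_rows):
            writer.writerow(row)
            written += 1
        return written


# Hypothetical usage:
# truncate_csv("train.csv", "train_10k.csv", max_rows=10_000)
```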

I'm not sure how to scale the Autopilot infrastructure up. It seems serverless. Open question...

The only option that remains, per the conversation with Rumi, is to switch to another algorithm. I'd rather not, as Autopilot is the key SageMaker feature. I'm going to work with the SageMaker PM and request a lower-runtime Autopilot invocation for validating infra. I think something analogous to terraform plan versus terraform apply would be useful.

I put some notes about leaving the job running into the notebooks.

It looks like maybe the notebook doesn't continue running if the browser is closed.

Resummarizing the issue ---

There are many use cases that require quick run times for building a machine learning model:

  • Prototyping -- often it's nice to be able to run a model and verify that the inputs and outputs are as expected before investing hours in training a high-quality model.
  • Demos -- In both internal (selling to management) and external (selling SageMaker to end customers) cases, it can be very useful to demonstrate the product running interactively in real time.
  • Labs -- The lab material in this repo is intended to be run in three hours. Taking an hour to run an Autopilot job is untenable because the lab attendees have nothing to do for that time.

To this end, SageMaker Autopilot should offer the ability to train a low-quality model in ~5 minutes. This would enable all of these use cases.

We're stripping Autopilot out of the new version of the lab, so this is now irrelevant.