The Neuron Compiler optimizes ML models to run on Neuron devices. It accepts machine learning models in various formats (TensorFlow, MXNet, PyTorch, XLA HLO). The compiler is invoked from within the ML framework, where the Neuron framework plugin sends models to the compiler. The resulting compiler artifact is called a NEFF file (Neuron Executable File Format), which the Neuron runtime in turn loads onto the Neuron device.
The Neuron Runtime is responsible for executing ML models on Neuron devices. It determines which NeuronCore will execute which model and how to execute it. The Neuron Runtime is configured through environment variables at the process level. It consists of a kernel driver and C/C++ libraries that provide APIs to access Inferentia and Trainium Neuron devices.
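Because the runtime is configured per process through environment variables, one way to pin a workload to specific NeuronCores is to set the variable before the process starts. The following is a minimal sketch, assuming the `NEURON_RT_VISIBLE_CORES` variable from the Neuron runtime configuration documentation; the helper name `neuron_runtime_env` and the script name in the usage note are hypothetical.

```python
import os
from typing import Dict, Optional


def neuron_runtime_env(visible_cores: str,
                       base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of an environment with a Neuron runtime setting added.

    visible_cores is a single core index or a range such as "0-1".
    The base environment is copied, never modified in place.
    """
    env = dict(os.environ if base is None else base)
    # Assumption: NEURON_RT_VISIBLE_CORES is read by the runtime at process start.
    env["NEURON_RT_VISIBLE_CORES"] = visible_cores
    return env


if __name__ == "__main__":
    env = neuron_runtime_env("0-1")
    print("NEURON_RT_VISIBLE_CORES =", env["NEURON_RT_VISIBLE_CORES"])
```

The returned dictionary can be passed to a child process, for example `subprocess.run(["python", "infer.py"], env=env)`, so that the setting applies only to that process.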
The Neuron ML framework plugins for TensorFlow, PyTorch, and Apache MXNet use the Neuron runtime to load and run models on the NeuronCores. The Neuron runtime loads compiled deep learning models, also referred to as Neuron Executable File Format (NEFF) files, onto the Neuron devices and is optimized for high throughput and low latency.
The Neuron Runtime Library consists of libnrt.so and its header files. These artifacts are version controlled and installed via the aws-neuronx-runtime-lib package. After the package is installed, the binary (libnrt.so) is found in /opt/aws/neuron/lib.
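A quick way to confirm that the package installed the library is to look for it in that directory. The sketch below assumes only the install path given above; the helper name `find_libnrt` is hypothetical, and matching on the `libnrt.so` prefix (to also catch versioned file names) is an assumption.

```python
from pathlib import Path
from typing import List

# Install location stated in the notes above.
DEFAULT_LIB_DIR = Path("/opt/aws/neuron/lib")


def find_libnrt(lib_dir: Path = DEFAULT_LIB_DIR) -> List[Path]:
    """Return files in lib_dir whose name starts with 'libnrt.so'.

    Returns an empty list if the directory does not exist.
    """
    if not lib_dir.is_dir():
        return []
    return sorted(p for p in lib_dir.iterdir() if p.name.startswith("libnrt.so"))


if __name__ == "__main__":
    matches = find_libnrt()
    if matches:
        for path in matches:
            print("found:", path)
    else:
        print("libnrt.so not found in", DEFAULT_LIB_DIR)
```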
Ref:
Neuron devices are exposed to containers using the --device option of the docker run command. The Docker runtime (runc) does not yet support an ALL option for exposing all Neuron devices to a container. To do that, the environment variable AWS_NEURON_VISIBLE_DEVICES=ALL can be used instead.
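The two ways of exposing devices described above can be sketched as a small helper that builds the docker run argument list. This is an illustration only: the function name `docker_run_command` and the image name are hypothetical, the `/dev/neuronN` device paths are an assumption about how the devices appear on the host, and the host is assumed to be set up so that the container runtime honors AWS_NEURON_VISIBLE_DEVICES.

```python
import shlex
from typing import List, Sequence


def docker_run_command(image: str,
                       device_indices: Sequence[int] = (),
                       expose_all: bool = False) -> List[str]:
    """Build a 'docker run' argument list that exposes Neuron devices.

    With expose_all=True the environment variable is used; otherwise each
    requested device is passed with its own --device option.
    """
    cmd = ["docker", "run", "--rm"]
    if expose_all:
        cmd += ["-e", "AWS_NEURON_VISIBLE_DEVICES=ALL"]
    else:
        for index in device_indices:
            # Assumption: devices appear on the host as /dev/neuron0, /dev/neuron1, ...
            cmd += ["--device", f"/dev/neuron{index}"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # "my-neuron-image:latest" is a placeholder image name.
    print(shlex.join(docker_run_command("my-neuron-image:latest", [0, 1])))
    print(shlex.join(docker_run_command("my-neuron-image:latest", expose_all=True)))
```

Building the command as a list rather than a string keeps it safe to pass to `subprocess.run` without shell quoting issues.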
Context:
- The hooks enable containers to be aware of events in their management lifecycle and run code implemented in a handler when the corresponding lifecycle hook is executed. Kubernetes exposes two hooks to containers, PostStart and PreStop; prestart is a hook defined by the OCI runtime specification:
  - prestart
  - PostStart
  - PreStop
Downsides
- Multiple container applications running on the same host can share the devices, but the cores cannot be shared. This is similar to running multiple applications directly on the host.
- In a Kubernetes environment, the devices cannot be shared by multiple containers in a pod.
Ref:
Neuron Collectives refers to a set of libraries used to support collective compute operations within the Neuron SDK.
https://huggingface.co/blog/pytorch-xla
Available images: https://github.com/aws/deep-learning-containers/blob/master/available_images.md
Fix to an error with AMI in the guide https://aws.amazon.com/releasenotes/aws-deep-learning-ami-neuron-pytorch-1-11-amazon-linux-2/ ref: aws-neuron/aws-neuron-sdk#632