
Merlin: HugeCTR

v3.0

HugeCTR is a GPU-accelerated recommender framework designed to distribute training across multiple GPUs and nodes and estimate Click-Through Rates (CTRs). HugeCTR supports model-parallel embedding tables and data-parallel neural networks and their variants such as Wide and Deep Learning (WDL), Deep Cross Network (DCN), DeepFM, and Deep Learning Recommendation Model (DLRM). HugeCTR is a component of NVIDIA Merlin Open Beta, which is used to build large-scale deep learning recommender systems. For additional information, see HugeCTR User Guide.

Design Goals:

  • Fast: HugeCTR is a speed-of-light CTR model framework that can outperform general-purpose deep learning frameworks such as TensorFlow (TF) on recommender workloads.
  • Efficient: HugeCTR provides the essentials so that you can efficiently train your CTR model.
  • Easy: Regardless of whether you are a data scientist or machine learning practitioner, we've made it easy for anybody to use HugeCTR.

Core Features

HugeCTR supports a wide variety of features for large-scale recommender training. To learn about our latest enhancements, see our release notes.

Getting Started

If you'd like to quickly train a model using the Python interface, follow these steps:

  1. Start an NGC container with your local host directory (/your/host/dir) mounted by running the following command:

    docker run --runtime=nvidia --rm -v /your/host/dir:/your/container/dir -w /your/container/dir -it -u $(id -u):$(id -g) nvcr.io/nvidia/hugectr:v3.0
    

    NOTE: Files placed in /your/host/dir on the host are visible inside the container at /your/container/dir, which is also your working directory when the container starts.

  2. Inside the container, copy the DCN configuration file (dcn.json) to the mounted directory (/your/container/dir).

    This configuration file specifies the DCN model architecture and its optimizer. When the Python interface is used, the solver clause within the configuration file is ignored; solver settings are supplied through hugectr.solver_parser_helper instead (see step 4).

  3. Generate a synthetic dataset based on the configuration file by running the following command:

    data_generator ./dcn.json ./dataset_dir 434428 1
    

    The following files are created: ./file_list.txt, ./file_list_test.txt, and ./dataset_dir/*.

  4. Write a simple Python program using the hugectr module, as shown here:

    # train.py
    import sys
    import hugectr
    from mpi4py import MPI  # initializes MPI, which HugeCTR uses for multi-node communication
    
    def train(json_config_file):
      # Solver settings (batch sizes and GPU mapping) are supplied here rather
      # than through the solver clause of the JSON configuration file.
      solver_config = hugectr.solver_parser_helper(batchsize = 16384,
                                                   batchsize_eval = 16384,
                                                   vvgpu = [[0,1,2,3,4,5,6,7]],
                                                   repeat_dataset = True)
      # Create a training session from the solver settings and the model
      # architecture defined in the JSON file, then start the data reader.
      sess = hugectr.Session(solver_config, json_config_file)
      sess.start_data_reading()
      for i in range(10000):
        sess.train()                        # one training iteration
        if i % 100 == 0:
          loss = sess.get_current_loss()    # report the loss every 100 iterations
          print("[HUGECTR][INFO] iter: {}; loss: {}".format(i, loss))
    
    if __name__ == "__main__":
      json_config_file = sys.argv[1]        # path to the DCN JSON configuration
      train(json_config_file)
    
    

    NOTE: Update the vvgpu (the active GPUs), batchsize, and batchsize_eval parameters according to your GPU system; a single-GPU variant is sketched after these steps.

  5. Train the model by running the following command:

    python train.py dcn.json
    

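If your machine has fewer GPUs than the eight used in step 4, only the solver settings need to change. Below is a minimal sketch of train.py adapted for a single GPU; the batch sizes (2048) and the single-entry vvgpu list are illustrative assumptions rather than tuned values, and everything else follows the API calls already shown above.

    # train_single_gpu.py -- hypothetical single-GPU variant of train.py
    import sys
    import hugectr
    from mpi4py import MPI
    
    def train(json_config_file):
      # Assumed settings for one GPU (device 0) on one node; batch sizes are
      # illustrative and should be tuned for your hardware.
      solver_config = hugectr.solver_parser_helper(batchsize = 2048,
                                                   batchsize_eval = 2048,
                                                   vvgpu = [[0]],
                                                   repeat_dataset = True)
      sess = hugectr.Session(solver_config, json_config_file)
      sess.start_data_reading()
      for i in range(10000):
        sess.train()
        if i % 100 == 0:
          print("[HUGECTR][INFO] iter: {}; loss: {}".format(i, sess.get_current_loss()))
    
    if __name__ == "__main__":
      train(sys.argv[1])

Run it the same way as in step 5, for example python train_single_gpu.py dcn.json.
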
For additional information, see the HugeCTR User Guide.

Support and Feedback

If you encounter any issues and/or have questions, please file an issue here so that we can provide you with the necessary resolutions and answers. To further advance the Merlin/HugeCTR Roadmap, we encourage you to share all the details regarding your recommender system pipeline using this survey.

About

HugeCTR is a high-efficiency GPU framework designed for Click-Through-Rate (CTR) estimation training.

License: Apache License 2.0


Languages

  • C++: 43.7%
  • Cuda: 23.8%
  • Jupyter Notebook: 22.4%
  • Python: 8.1%
  • CMake: 1.3%
  • Shell: 0.4%
  • Dockerfile: 0.2%
  • Perl: 0.1%