
Merlin: HugeCTR

v3.0

HugeCTR is a GPU-accelerated recommender framework designed to distribute training across multiple GPUs and nodes and estimate Click-Through Rates (CTRs). HugeCTR supports model-parallel embedding tables and data-parallel neural networks and their variants such as Wide and Deep Learning (WDL), Deep Cross Network (DCN), DeepFM, and Deep Learning Recommendation Model (DLRM). HugeCTR is a component of NVIDIA Merlin Open Beta, which is used to build large-scale deep learning recommender systems. For additional information, see HugeCTR User Guide.

Design Goals:

  • Fast: HugeCTR is a speed-of-light CTR model framework that can outperform general-purpose deep learning frameworks such as TensorFlow (TF) on recommender workloads.
  • Efficient: HugeCTR provides the essentials so that you can efficiently train your CTR model.
  • Easy: Regardless of whether you are a data scientist or machine learning practitioner, we've made it easy for anybody to use HugeCTR.

Core Features

HugeCTR supports a wide variety of features for large-scale recommender training. To learn about our latest enhancements, see our release notes.

Getting Started

If you'd like to quickly train a model using the Python interface, follow these steps:

  1. Start an NGC container with your local host directory (/your/host/dir) mounted by running the following command:

    docker run --runtime=nvidia --rm -v /your/host/dir:/your/container/dir -w /your/container/dir -it -u $(id -u):$(id -g) nvcr.io/nvidia/hugectr:v3.0
    

    NOTE: Files placed in /your/host/dir on the host are visible inside the container at /your/container/dir, which is also your working directory when the container starts.

  2. Inside the container, copy the DCN configuration file (dcn.json) to the mounted directory (/your/container/dir).

    This configuration file specifies the DCN model architecture and its optimizer. When the Python interface is used, the solver clause within the configuration file is ignored; solver settings are supplied through hugectr.solver_parser_helper instead (see step 4).

  3. Generate a synthetic dataset based on the configuration file by running the following command:

    data_generator ./dcn.json ./dataset_dir 434428 1
    

    The following files are created: ./file_list.txt, ./file_list_test.txt, and ./dataset_dir/*.

  4. Write a simple Python program using the hugectr module, as shown here:

    # train.py
    import sys
    import hugectr
    from mpi4py import MPI  # initializes MPI, which HugeCTR uses for multi-node communication
    
    def train(json_config_file):
      # Solver settings (batch sizes and GPU mapping) are supplied here rather
      # than through the solver clause of the JSON configuration file.
      solver_config = hugectr.solver_parser_helper(batchsize = 16384,
                                                   batchsize_eval = 16384,
                                                   vvgpu = [[0,1,2,3,4,5,6,7]],
                                                   repeat_dataset = True)
      # Create a training session from the solver settings and the model
      # architecture defined in the JSON file, then start the data reader.
      sess = hugectr.Session(solver_config, json_config_file)
      sess.start_data_reading()
      for i in range(10000):
        sess.train()                        # one training iteration
        if i % 100 == 0:
          loss = sess.get_current_loss()    # report the loss every 100 iterations
          print("[HUGECTR][INFO] iter: {}; loss: {}".format(i, loss))
    
    if __name__ == "__main__":
      json_config_file = sys.argv[1]        # path to the DCN JSON configuration
      train(json_config_file)
    
    

    NOTE: Update the vvgpu (the active GPUs), batchsize, and batchsize_eval parameters according to your GPU system; a single-GPU variant is sketched after these steps.

  5. Train the model by running the following command:

    python train.py dcn.json
    

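If your machine has fewer GPUs than the eight used in step 4, only the solver settings need to change. Below is a minimal sketch of train.py adapted for a single GPU; the batch sizes (2048) and the single-entry vvgpu list are illustrative assumptions rather than tuned values, and everything else follows the API calls already shown above.

    # train_single_gpu.py -- hypothetical single-GPU variant of train.py
    import sys
    import hugectr
    from mpi4py import MPI
    
    def train(json_config_file):
      # Assumed settings for one GPU (device 0) on one node; batch sizes are
      # illustrative and should be tuned for your hardware.
      solver_config = hugectr.solver_parser_helper(batchsize = 2048,
                                                   batchsize_eval = 2048,
                                                   vvgpu = [[0]],
                                                   repeat_dataset = True)
      sess = hugectr.Session(solver_config, json_config_file)
      sess.start_data_reading()
      for i in range(10000):
        sess.train()
        if i % 100 == 0:
          print("[HUGECTR][INFO] iter: {}; loss: {}".format(i, sess.get_current_loss()))
    
    if __name__ == "__main__":
      train(sys.argv[1])

Run it the same way as in step 5, for example python train_single_gpu.py dcn.json.
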
For additional information, see the HugeCTR User Guide.

Support and Feedback

If you encounter any issues and/or have questions, please file an issue here so that we can provide you with the necessary resolutions and answers. To further advance the Merlin/HugeCTR Roadmap, we encourage you to share all the details regarding your recommender system pipeline using this survey.

About

HugeCTR is a high-efficiency GPU framework designed for Click-Through-Rate (CTR) estimation training.

License: Apache License 2.0


Languages

  • C++: 43.7%
  • Cuda: 23.8%
  • Jupyter Notebook: 22.4%
  • Python: 8.1%
  • CMake: 1.3%
  • Shell: 0.4%
  • Dockerfile: 0.2%
  • Perl: 0.1%