JainTwinkle / CRAC-early-development

This is a public repository for the early development of CRAC (Checkpoint-Restart Architecture for CUDA Streams and UVM). A free and open-source production version is planned for the future.


CRAC: Checkpoint-Restart Architecture for CUDA Streams and UVM


Introduction

This is a new DMTCP (https://github.com/dmtcp/dmtcp.git) plugin that checkpoints and restarts CUDA applications using a novel split-process architecture. The plugin code is in the contrib/split-cuda directory. CRAC consists of this plugin running on top of DMTCP.

TODO

We are in the process of porting our code, which was developed for one cluster's specific environment, to make it more general.

DMTCP: Distributed MultiThreaded CheckPointing


DMTCP is a tool to transparently checkpoint the state of multiple simultaneous applications, including multi-threaded and distributed applications. It operates directly on the user binary executable, without any Linux kernel modules or other kernel modifications.
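As a rough illustration of what operating directly on the user binary looks like in practice, the sketch below walks through one checkpoint/restart cycle using DMTCP's command-line tools. `APP`, `PORT`, `DRY_RUN`, and the helpers `run` and `checkpoint_cycle` are hypothetical names introduced only for this example. The option spellings are those of DMTCP 2.x as we understand them, so check each tool's `--help` output on your installation. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Sketch of one DMTCP checkpoint/restart cycle (not the CRAC plugin itself).
# APP and PORT are hypothetical placeholders; 7779 is DMTCP's default port.
APP="${APP:-./my_app}"
PORT="${PORT:-7779}"
DRY_RUN="${DRY_RUN:-1}"   # 1: only print the commands; 0: also execute them

# Print a command line, and execute it unless this is a dry run.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

checkpoint_cycle() {
    # 1. Start a coordinator that detaches into the background.
    run dmtcp_coordinator --daemon -p "$PORT"

    # 2. Launch the unmodified binary under DMTCP control.
    run dmtcp_launch -p "$PORT" "$APP" &
    if [ "$DRY_RUN" = "0" ]; then sleep 5; fi   # give the application time to start

    # 3. Checkpoint every attached process and block until the images are written.
    run dmtcp_command -p "$PORT" --bcheckpoint

    # 4. Kill the running processes, then resume them from the checkpoint images.
    run dmtcp_command -p "$PORT" --kill
    run dmtcp_restart -p "$PORT" ckpt_*.dmtcp
    wait
}

checkpoint_cycle
```

The application is never recompiled or relinked here: `dmtcp_launch` wraps the existing executable, and `dmtcp_restart` resumes it from the `ckpt_*.dmtcp` images written to the current directory. The fixed `sleep 5` is an arbitrary choice for the sketch; a real script would wait on whatever signals that the application is ready.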

Among the applications supported by DMTCP are MPI (various implementations), OpenMP, MATLAB, Python, Perl, R, and many programming languages and shell scripting languages. DMTCP also supports GNU screen sessions, including vim/cscope and emacs. With the use of TightVNC, it can also checkpoint and restart X Window applications. For multilib support (a mixture of 32- and 64-bit processes), see "./configure --enable-multilib".

DMTCP supports the commonly used OFED API for InfiniBand, as well as its integration with various implementations of MPI, and resource managers (e.g., SLURM).

To install DMTCP, see INSTALL.md.

For an overview of DMTCP, see QUICK-START.md.

For the license, see COPYING.

For more information on DMTCP, see: http://dmtcp.sourceforge.net.

For the latest version of DMTCP (both official release and git), see: http://dmtcp.sourceforge.net/downloads.html.
