juniorprincewang / virtio-cuda-module

cuda-supported qemu front-driver and test case


virtio-cuda-module

This is the para-virtualized front-end driver of CUDA-supported QEMU, together with its test cases.

The wrapped user-space runtime library in the VM (guest OS) provides CUDA runtime access and interfaces for memory allocation and CUDA commands, and passes these commands to the driver.

The front-end driver is responsible for memory management and data transfer: it parses the ioctl commands issued by the custom library and passes them on through the control channel.

Installation

Prerequisites

Our experiment environment is as follows:

Host

  • Ubuntu 16.04.5 LTS (kernel v4.15.0-29-generic x86_64)
  • cuda-9.1
  • PATH
echo 'export PATH=$PATH:/usr/local/cuda/bin' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib:/usr/local/cuda/lib64' >> ~/.bashrc
source ~/.bashrc

sudo bash -c "echo /usr/local/cuda/lib64/ > /etc/ld.so.conf.d/cuda.conf"
sudo ldconfig
  • Install required packages
sudo apt-get install -y pkg-config bridge-utils uml-utilities zlib1g-dev libglib2.0-dev autoconf \
    automake libtool libsdl1.2-dev libsasl2-dev libcurl4-openssl-dev libaio-dev libvde-dev libspice-server-dev

Guest

  • Ubuntu 16.04 x86_64 image (guest OS)
  • cuda-9.1 toolkit

How to install

Host

Guest

  1. Clone this repo.
    [to do]

A CUDA sample in guest OS

In the guest OS, nvcc compiles sources containing host/device code and standard CUDA runtime API calls. Unlike on a native machine, CUDA programs in the guest VM must be compiled with the nvcc flag "--cudart=shared", so that the CUDA runtime is linked dynamically as a shared library.
This allows the wrapped library to intercept dynamic memory allocations in the CPU code as well as CUDA runtime API calls.
After installing qCUdriver and qCUlibrary in the guest OS, modify the internal flags in the Makefile as below:

# internal flags
NVCCFLAGS   := -m${TARGET_SIZE} --cudart=shared      

Finally, run make and execute the binary without changing any source code, either by using LD_PRELOAD or by changing LD_LIBRARY_PATH.

LD_PRELOAD=/path/to/libvcuda.so ./vectorAdd
  • Benchmarking vectorAdd

A command-line benchmarking tool, hyperfine, is recommended.
To run a benchmark, you can simply call hyperfine '<command>'. For example:

hyperfine 'LD_PRELOAD=/path/to/libvcuda.so ./vectorAdd'

By default, hyperfine performs at least 10 benchmarking runs. To change this, use the -m/--min-runs or -M/--max-runs options.

Supported APIs

CUDA Runtime API

The current version implements the necessary CUDA runtime APIs, listed below:

| Classification | Supported CUDA runtime API |
| --- | --- |
| Memory Management | cudaMalloc, cudaMemset, cudaMemcpy, cudaMemcpyAsync, cudaFree, cudaMemGetInfo, cudaMemcpyToSymbol, cudaMemcpyFromSymbol |
| Device Management | cudaGetDevice, cudaGetDeviceCount, cudaSetDevice, cudaSetDeviceFlags, cudaGetDeviceProperties, cudaDeviceSynchronize, cudaDeviceReset |
| Stream Management | cudaStreamCreate, cudaStreamCreateWithFlags, cudaStreamDestroy, cudaStreamSynchronize, cudaStreamWaitEvent |
| Event Management | cudaEventCreate, cudaEventCreateWithFlags, cudaEventRecord, cudaEventSynchronize, cudaEventElapsedTime, cudaEventDestroy, cudaEventQuery |
| Error Handling | cudaGetLastError, cudaGetErrorString |
| Zero-copy | cudaHostRegister, ~~cudaHostGetDevicePointer~~, cudaHostUnregister, cudaHostAlloc, cudaMallocHost, cudaFreeHost, cudaSetDeviceFlags |
| Thread Management | cudaThreadSynchronize |
| Module & Execution Control | __cudaRegisterFatBinary, __cudaUnregisterFatBinary, __cudaRegisterFunction, __cudaRegisterVar, cudaConfigureCall, cudaSetupArgument, cudaLaunch |

CUBLAS API & CURAND API

To support Caffe, we also implement CUBLAS and CURAND APIs in libcudart.so.

| Classification | Supported API |
| --- | --- |
| CUBLAS API | cublasCreate, cublasDestroy, cublasSetVector, cublasGetVector, cublasSetStream, cublasGetStream, cublasSasum, cublasDasum, cublasScopy, cublasDcopy, cublasSdot, cublasDdot, cublasSaxpy, cublasDaxpy, cublasSscal, cublasDscal, cublasSgemv, cublasDgemv, cublasSgemm, cublasDgemm, cublasSetMatrix, cublasGetMatrix |
| CURAND API | curandCreateGenerator, curandCreateGeneratorHost, curandGenerate, curandGenerateUniform, curandGenerateUniformDouble, curandGenerateNormal, curandGenerateNormalDouble, curandDestroyGenerator, curandSetGeneratorOffset, curandSetPseudoRandomGeneratorSeed |

Acknowledgements

Last but not least, thanks to qcuda for the inspiration.
The message channels are built on [chan](https://github.com/tylertreat/chan), a pure C implementation of Go channels.
