Stanford CS149

NOTICE: If you are doing the homework, you should not refer to the code in this repo; doing so might violate ACADEMIC INTEGRITY.

krr's implementation of CS149. Thanks to the original author for open-sourcing this project, which gives us access to such wonderful learning material.

asst1/: Performance Analysis
asst2/: Task Execution Library
asst3/: A Simple CUDA Renderer
asst4/: Big Graph Processing in OpenMP

Hints

asst1/: Refer to asst1/report.ipynb
asst2/: Coding problem. My implementation uses a single task queue protected by a lock, which is not the optimal solution. In part_b, I do the DAG calculation offline when sync() is called, which only just meets the requirements of the abstraction.
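For readers unfamiliar with the single-queue-with-a-lock design, here is a minimal sketch of it. It assumes a simplified interface: the class `SingleQueuePool` and its `run(fn, num_total_tasks)` method are hypothetical names, not the assignment's actual interface and not code from this repo. Every worker pulls task ids from one shared queue guarded by one mutex, and `run()` blocks until all tasks of the current launch have finished.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical worker pool: one shared queue of task ids guarded by a single mutex.
class SingleQueuePool {
public:
    explicit SingleQueuePool(int num_threads) {
        for (int t = 0; t < num_threads; ++t)
            workers_.emplace_back([this] { workerLoop(); });
    }

    ~SingleQueuePool() {
        {
            std::lock_guard<std::mutex> lk(mu_);
            stop_ = true;
        }
        work_cv_.notify_all();
        for (auto& w : workers_) w.join();
    }

    // Runs fn(task_id, num_total_tasks) for every task id in [0, num_total_tasks)
    // and blocks until all of them have finished. Not safe to call from several threads at once.
    void run(const std::function<void(int, int)>& fn, int num_total_tasks) {
        assert(num_total_tasks >= 0);
        {
            std::lock_guard<std::mutex> lk(mu_);
            fn_ = fn;
            total_ = num_total_tasks;
            remaining_ = num_total_tasks;
            for (int i = 0; i < num_total_tasks; ++i) queue_.push(i);
        }
        work_cv_.notify_all();
        std::unique_lock<std::mutex> lk(mu_);
        done_cv_.wait(lk, [this] { return remaining_ == 0; });
    }

private:
    void workerLoop() {
        std::unique_lock<std::mutex> lk(mu_);
        while (true) {
            // Sleep until there is work or the pool is shutting down.
            work_cv_.wait(lk, [this] { return stop_ || !queue_.empty(); });
            if (stop_ && queue_.empty()) return;
            int task = queue_.front();
            queue_.pop();
            int total = total_;
            lk.unlock();          // execute the task without holding the lock
            fn_(task, total);
            lk.lock();
            if (--remaining_ == 0) done_cv_.notify_one();  // last task wakes run()
        }
    }

    std::mutex mu_;
    std::condition_variable work_cv_;  // "queue non-empty or stopping"
    std::condition_variable done_cv_;  // "all tasks of the current run finished"
    std::queue<int> queue_;            // pending task ids
    std::function<void(int, int)> fn_;
    int total_ = 0;
    int remaining_ = 0;
    bool stop_ = false;
    std::vector<std::thread> workers_;
};
```

Every dequeue contends for the same mutex, so with many very short tasks the lock itself can become the bottleneck; per-thread queues or work stealing are common alternatives.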
asst3/: Consider collecting the indices by using the prefix sum as the output index: output[prefix_sum[i]] = i for each i that should be kept. The renderer uses the same idea but is harder to implement (the shared-memory version is easier, since handling 3D arrays in CUDA is a pain).
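The prefix-sum indexing trick can be illustrated with a sequential sketch. It assumes an illustrative predicate (keep index i when a[i] == a[i + 1]), and `collectRepeatIndices` is a hypothetical helper, not code from this repo. On a GPU, each of the three loops would become a parallel step, with the middle one replaced by a parallel exclusive-scan kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential sketch of prefix-sum-based index collection (stream compaction).
std::vector<int> collectRepeatIndices(const std::vector<int>& a) {
    const std::size_t n = a.size();

    // Step 1: flags[i] = 1 if index i should be kept (illustrative predicate).
    std::vector<int> flags(n, 0);
    for (std::size_t i = 0; i + 1 < n; ++i)
        flags[i] = (a[i] == a[i + 1]) ? 1 : 0;

    // Step 2: exclusive scan, prefix_sum[i] = flags[0] + ... + flags[i - 1].
    std::vector<int> prefix_sum(n, 0);
    int running = 0;
    for (std::size_t i = 0; i < n; ++i) {
        prefix_sum[i] = running;
        running += flags[i];
    }
    assert(running <= static_cast<int>(n));  // total kept never exceeds input size

    // Step 3: scatter, output[prefix_sum[i]] = i for every kept i.
    std::vector<int> output(running);
    for (std::size_t i = 0; i < n; ++i)
        if (flags[i]) output[prefix_sum[i]] = static_cast<int>(i);
    return output;
}
```

For example, with a = {1, 2, 2, 2, 3, 1, 1} the kept indices are {1, 2, 5}. Because the scan is exclusive, each kept index lands in a distinct slot with no gaps, so the scatter needs no atomics.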

TODO

gemm_extra_credit

About

krr's own implementation of Stanford CS149.
