CNugteren / CLBlast

Tuned OpenCL BLAS

GFLOPs of cuBLAS Subprograms Being Computed as "inf", Causing Plotting Errors When Graphing Benchmark Results

tedliosu opened this issue · comments

(base) tedliosu@victus-ted:~/Documents/all_git/CLBlast_vs_CUDA/build$ python3 ../scripts/benchmark/benchmark.py --comparisons CPU-BLAS cuBLAS --platform 0 --device 0 --precision 32 --benchmark gemm
...
[benchmark] Saving benchmark results to '/home/tedliosu/Documents/all_git/CLBlast_vs_CUDA/build/sgemm_benchmarks.json'
[plot] Plotting subplot 0
[plot] Plotting subplot 1
[plot] Plotting subplot 2
[plot] Plotting subplot 3
[plot] Plotting subplot 4
Traceback (most recent call last):
  File "/home/tedliosu/Documents/all_git/CLBlast_vs_CUDA/build/../scripts/benchmark/benchmark.py", line 183, in <module>
    benchmark_single(**parsed_arguments)
  File "/home/tedliosu/Documents/all_git/CLBlast_vs_CUDA/build/../scripts/benchmark/benchmark.py", line 174, in benchmark_single
    plot.plot_graphs(results["benchmarks"], pdf_file_name, results["num_rows"], results["num_cols"],
  File "/home/tedliosu/Documents/all_git/CLBlast_vs_CUDA/scripts/benchmark/plot.py", line 94, in plot_graphs
    y_max = [max(y) if len(y) else 1 for y in y_list]
  File "/home/tedliosu/Documents/all_git/CLBlast_vs_CUDA/scripts/benchmark/plot.py", line 94, in <listcomp>
    y_max = [max(y) if len(y) else 1 for y in y_list]
TypeError: '>' not supported between instances of 'float' and 'str'
(base) tedliosu@victus-ted:~/Documents/all_git/CLBlast_vs_CUDA/build$ python3 ../scripts/benchmark/benchmark.py --comparisons CPU-BLAS cuBLAS --platform 0 --device 0 --precision 64 --benchmark gemm
...
[benchmark] Saving benchmark results to '/home/tedliosu/Documents/all_git/CLBlast_vs_CUDA/build/dgemm_benchmarks.json'
[plot] Plotting subplot 0
[plot] Plotting subplot 1
[plot] Plotting subplot 2
[plot] Plotting subplot 3
[plot] Plotting subplot 4
Traceback (most recent call last):
  File "/home/tedliosu/Documents/all_git/CLBlast_vs_CUDA/build/../scripts/benchmark/benchmark.py", line 183, in <module>
    benchmark_single(**parsed_arguments)
  File "/home/tedliosu/Documents/all_git/CLBlast_vs_CUDA/build/../scripts/benchmark/benchmark.py", line 174, in benchmark_single
    plot.plot_graphs(results["benchmarks"], pdf_file_name, results["num_rows"], results["num_cols"],
  File "/home/tedliosu/Documents/all_git/CLBlast_vs_CUDA/scripts/benchmark/plot.py", line 94, in plot_graphs
    y_max = [max(y) if len(y) else 1 for y in y_list]
  File "/home/tedliosu/Documents/all_git/CLBlast_vs_CUDA/scripts/benchmark/plot.py", line 94, in <listcomp>
    y_max = [max(y) if len(y) else 1 for y in y_list]
TypeError: '>' not supported between instances of 'float' and 'str'
(base) tedliosu@victus-ted:~/Documents/all_git/CLBlast_vs_CUDA/build$ python3 ../scripts/benchmark/benchmark.py --comparisons CPU-BLAS cuBLAS --platform 0 --device 0 --precision 3232 --benchmark gemm
...
[benchmark] Saving benchmark results to '/home/tedliosu/Documents/all_git/CLBlast_vs_CUDA/build/cgemm_benchmarks.json'
[plot] Plotting subplot 0
[plot] Plotting subplot 1
[plot] Plotting subplot 2
[plot] Plotting subplot 3
[plot] Plotting subplot 4
Traceback (most recent call last):
  File "/home/tedliosu/Documents/all_git/CLBlast_vs_CUDA/build/../scripts/benchmark/benchmark.py", line 183, in <module>
    benchmark_single(**parsed_arguments)
  File "/home/tedliosu/Documents/all_git/CLBlast_vs_CUDA/build/../scripts/benchmark/benchmark.py", line 174, in benchmark_single
    plot.plot_graphs(results["benchmarks"], pdf_file_name, results["num_rows"], results["num_cols"],
  File "/home/tedliosu/Documents/all_git/CLBlast_vs_CUDA/scripts/benchmark/plot.py", line 94, in plot_graphs
    y_max = [max(y) if len(y) else 1 for y in y_list]
  File "/home/tedliosu/Documents/all_git/CLBlast_vs_CUDA/scripts/benchmark/plot.py", line 94, in <listcomp>
    y_max = [max(y) if len(y) else 1 for y in y_list]
TypeError: '>' not supported between instances of 'float' and 'str'

As you can see in the JSON files produced by the benchmark runs above (attached here: offending_files2.zip), some of the cuBLAS subprograms are reported as performing at an "infinite" number of GFLOPS, which causes the errors above when the Python script tries to plot the benchmark results. I'm not sure where the "inf" results are coming from, but it would be great if the script could at least be fixed so that results can still be plotted when a BLAS subprogram's performance is computed as "inf" GFLOPS 😄
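For illustration, here is a minimal sketch of how the failing `max(y)` call could be made tolerant of such entries. It assumes the offending values reach the plotting code as strings such as "inf" (which is what the `TypeError` suggests); the helper names `to_plottable` and `safe_y_max` are hypothetical and are not part of `plot.py`, and this is not necessarily how the fix in #447 works.

```python
import math


def to_plottable(value):
    """Return a finite float for plotting, or 0.0 if the entry is unusable."""
    try:
        number = float(value)  # float("inf") parses to math.inf
    except (TypeError, ValueError):
        return 0.0  # not a number at all
    return number if math.isfinite(number) else 0.0


def safe_y_max(y_list):
    """Same shape as the failing list comprehension, on cleaned data."""
    cleaned = [[to_plottable(v) for v in y] for y in y_list]
    return [max(y) if len(y) else 1 for y in cleaned]


# Hypothetical data mixing floats and "inf" strings, as in the traceback.
raw = [[1200.5, "inf", 980.0], [], ["inf", "inf"]]
print(safe_y_max(raw))
```

Mapping unusable entries to 0 keeps the axis limits finite, at the cost of hiding the fact that the measurement was invalid.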

Closed by #447

By the way, sorry @CNugteren that I didn't ask about this earlier, but isn't there a fundamental issue with the way the GFLOPS of each BLAS execution is calculated if some of the results come out as "infinity"? Instead of only handling "inf" in the plotting script, shouldn't the code that computes the GFLOPS be rewritten so that it doesn't produce "inf", especially for the cuBLAS benchmark results? 👀

Hope you understand and don't mind 🙂

Yes, you are right. However, this is just a small plotting utility for benchmarking, so it doesn't have to be too sophisticated in my opinion. The likely cause here is that your cuBLAS benchmark came back with an execution time of 0 ms for whatever reason. That got translated into a GFLOPS result of 'inf', which caused the whole benchmarking script to fail. Now if you look at the resulting plot, you'll see that for some of the cuBLAS results the GFLOPS is just set to 0. But feel free to make a PR to make this more sophisticated if you think it is useful for others as well.
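As a rough sketch of the mechanism described here: GFLOPS is the operation count divided by the elapsed time, so a reported time of 0 ms has no finite result. The function below is hypothetical (it is not the CLBlast client code) and assumes the common 2·m·n·k operation count for a real-valued GEMM; it returns `None` for an unusable timing instead of letting an infinity propagate.

```python
def gemm_gflops(m, n, k, time_ms):
    """GFLOPS for an m x n x k GEMM, or None when the timing is unusable.

    Assumes 2*m*n*k floating-point operations (real-valued GEMM).
    """
    if time_ms is None or time_ms <= 0.0:
        # A 0 ms timing most likely means the routine did not really run.
        # (Plain Python would raise ZeroDivisionError here; C-style float
        # division would give inf.)
        return None
    flops = 2.0 * m * n * k
    return flops / (time_ms * 1.0e6)  # ms -> s is 1e-3, FLOPS -> GFLOPS is 1e-9


print(gemm_gflops(1024, 1024, 1024, 2.0))  # valid timing
print(gemm_gflops(1024, 1024, 1024, 0.0))  # invalid timing -> None
```

Returning `None` rather than 0 lets a caller distinguish "not measured" from "measured as very slow", which a plotting script could then skip or mark separately.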

@CNugteren When I looked at the SGEMM graphs from my 3050 Mobile, I realized that the shape of the graph for the layouts/transpose "sub-benchmark" looks extremely similar to the layouts/transpose "sub-benchmark" graphs for the 750 Ti and Titan X Pascal that you've posted on your website, as you can see in the three images below:
[Image: sgemm_plot_3050m_ted]
[Image: sgemm_plot_cnugteren]
[Image: sgemm_plot_cnugteren_750ti]
Funnily enough, I'm pretty sure that in every case in the first graph (i.e. the results for my 3050 Mobile) where the GFLOPS is plotted as "0", the value recorded in the raw benchmark results was "inf". I've circled those data points in red, along with the corresponding data points in the two other graphs. So is it likely that cuBLAS simply doesn't support the layout/transpose combinations that correspond to the first 4 data points of each of those layouts/transpose "sub-benchmark" graphs, or has there been a bug in calling the corresponding cuBLAS functions this entire time?

Btw here's the raw benchmark results that generated the first graph shown above for your reference:
sgemm_nv_benchmarks-json.txt

Good observation. I had a quick look at the cuBLAS docs and it seems only column-major layout is supported, so this is probably not supported by cuBLAS indeed. And that also seems to be what the CLBlast test infrastructure assumes.
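As general background (this is not a statement about what the CLBlast benchmark client does or should do): a row-major buffer has the same memory layout as the column-major buffer of the transposed matrix, so a row-major product C = A·B can be obtained from a column-major-only routine by computing Cᵀ = Bᵀ·Aᵀ, which amounts to swapping the operands and dimensions. The pure-Python sketch below illustrates this identity; both function names are hypothetical stand-ins, not cuBLAS or CLBlast calls.

```python
def gemm_col_major(m, n, k, a, b):
    """C = A * B for flat column-major lists: A is m x k, B is k x n."""
    c = [0.0] * (m * n)
    for j in range(n):
        for i in range(m):
            c[i + j * m] = sum(a[i + p * m] * b[p + j * k] for p in range(k))
    return c


def gemm_row_major(m, n, k, a, b):
    """Row-major C = A * B using only the column-major routine.

    The row-major buffers of A and B are read as column-major A^T and B^T,
    so the call computes C^T = B^T * A^T, whose column-major buffer is
    exactly the row-major buffer of C.
    """
    return gemm_col_major(n, m, k, b, a)


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]     # A = [[1, 2, 3], [4, 5, 6]], row-major
b = [7.0, 8.0, 9.0, 10.0, 11.0, 12.0]  # B = [[7, 8], [9, 10], [11, 12]], row-major
print(gemm_row_major(2, 2, 3, a, b))   # row-major C = [[58, 64], [139, 154]]
```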

@CNugteren Thank you so much for confirming my observations. It seems, then, that not much can be done about the "inf"s other than treating them as zeros when graphing the benchmark results, at least with the current state of things.

Since this issue has already been closed, I'll leave things here for now, but I may comment here again in the future in case cuBLAS changes its infrastructure (which is probably unlikely, but we'll see).