NVIDIA / stdexec

`std::execution`, the proposed C++ framework for asynchronous and parallel programming.

[BUG]: segfault with HPC SDK 23.7

gonzalobg opened this issue

Reproducer:

#if 0
  set -ex
  nvc++ -std=c++20 --experimental-stdpar -stdpar -o bug $0
  ./bug
  exit 0
#endif
#include <algorithm>
#include <execution>
#include <iostream>
#include <vector>
#include <stdexec/execution.hpp>
#include <exec/static_thread_pool.hpp>
#include <nvexec/stream_context.cuh>
#include <exec/inline_scheduler.hpp>
int main() {
    auto gpu_ctx = nvexec::stream_context{};
    auto cpu_ctx = exec::static_thread_pool{1};  // exec::inline_scheduler{};
    auto sg = gpu_ctx.get_scheduler();
    auto sc = cpu_ctx.get_scheduler();
    std::vector<int> v;
    v.push_back(0);
    // Bounce between the GPU and CPU schedulers: one store, then five increments.
    auto t = stdexec::schedule(sg)
      | stdexec::bulk(1, [v = v.data()](int i) { v[0] = 1; })
      | stdexec::bulk(1, [v = v.data()](int i) { v[0] += 1; })
      | stdexec::transfer(sc) | stdexec::then([v = v.data()] { v[0] += 1;})
      | stdexec::transfer(sg) | stdexec::bulk(1, [v = v.data()](int i) { v[0] += 1; })
      | stdexec::transfer(sc) | stdexec::then([v = v.data()] { v[0] += 1;})
      | stdexec::transfer(sg) | stdexec::bulk(1, [v = v.data()](int i) { v[0] += 1; })
    ;
    stdexec::sync_wait(std::move(t));
    // 1 (initial store) + 5 increments = 6 when every stage has run.
    if (v[0] == 6) {
        std::cerr << "Success!" << std::endl;
        return 0;
    }
    std::cerr << "Failed: " << v[0] << std::endl;
    return 1;
}

I can reproduce the crash with the HPC SDK 23.7.

The --experimental-stdpar flag makes the compiler use the version of stdexec that shipped with the SDK. If, however, I remove that flag and instead point the build at the latest version of stdexec from main, the program runs successfully for me.
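
For reference, a build along those lines might look like the sketch below. The checkout location in STDEXEC_DIR and the source file name bug.cpp are placeholders, not taken from the report; include/ is the directory where the stdexec repository keeps its headers.

```shell
#!/bin/sh
# Assumption: stdexec from main is checked out here; adjust to your own path.
STDEXEC_DIR="${STDEXEC_DIR:-$HOME/stdexec}"

# Keep -stdpar, drop --experimental-stdpar, and add the checkout's headers with -I.
# bug.cpp is a placeholder name for the reproducer source file.
BUILD_CMD="nvc++ -std=c++20 -stdpar -I${STDEXEC_DIR}/include -o bug bug.cpp"
echo "$BUILD_CMD"

# Build and run only where the compiler is actually installed.
if command -v nvc++ >/dev/null 2>&1; then
  $BUILD_CMD && ./bug
fi
```

Without --experimental-stdpar, the headers come from the path given with -I rather than from the copy bundled with the SDK.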

@gonzalobg can you confirm? Closing this for now.