eth-cscs / SpFFT

Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Unable to get correct result using openMPI

Ondraol opened this issue · comments

Hi,

I try to use SpFFT library with openMPI. When i run following code using only single process i get correct result. However when i use 2 processes on same input, the result of Sparse FFT is incorrect. I tried different orientation (indexing of points in input domain) but unsuccessfully.

I attache simplified code I am using to make computation. This code is inspired by code in benchmark.cpp and example.cpp file.

#include <vector>
#include <omp.h>

#include "spfft/spfft.hpp"

std::vector<std::pair<int, int>> createAllXYIndexPairs(int dimX, int dimY) {
    std::vector<std::pair<int, int>> indices;
    indices.reserve(dimX * dimY);
    for (int x = 0; x < dimY; x++) {
        for (int y = 0; y < dimX; y++) {
            indices.emplace_back(x, y);
        }
    }
    return indices;
}

std::vector<int>
createTripleIndices(int dimZ, int numLocalZSticks, int offset, std::vector<std::pair<int, int>> xyIndicesGlobal) {
    std::vector<int> indices;
    indices.reserve(numLocalZSticks);
    for (int i = offset; i < offset + numLocalZSticks; i++) {
        for (int z = 0; z < dimZ; z++) {
            indices.push_back(xyIndicesGlobal[i].first);
            indices.push_back(xyIndicesGlobal[i].second);
            indices.push_back(z);
        }
    }
    return indices;
}


int main(int argc, char **argv) {
    int dimX = 2;
    int dimY = 2;
    int dimZ = 2;

    int num;
    MPI_Init_thread(nullptr, nullptr, MPI_THREAD_MULTIPLE, &num);

    int commRank = 0;
    int commSize = 1;

    MPI_Comm_size(MPI_COMM_WORLD, &commSize);
    MPI_Comm_rank(MPI_COMM_WORLD, &commRank);

    // Use default OpenMP value
    const int numThreads = omp_get_max_threads();

    // Input signal
    std::vector<double> signal = {2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};

    // Create all global x-y index pairs
    std::vector<std::pair<int, int>> xyIndicesGlobal = createAllXYIndexPairs(dimX, dimY);

    // Distribute z
    int numLocalZSticks =
            (xyIndicesGlobal.size()) / commSize + (commRank < (xyIndicesGlobal.size()) % commSize ? 1 : 0);
    const int offset =
            ((xyIndicesGlobal.size()) / commSize) * commRank +
            std::min(commRank, static_cast<int>(xyIndicesGlobal.size()) % commSize);

    // Assemble vector of x y z indices
    std::vector<int> xyzIndices = createTripleIndices(dimZ, numLocalZSticks, offset, xyIndicesGlobal);

    int maxLocalZLength = (dimZ / commSize) + (commRank < dimZ % commSize ? 1 : 0);

    spfft::Transform transform(numThreads, MPI_COMM_WORLD, SPFFT_EXCH_DEFAULT, SPFFT_PU_HOST, SPFFT_TRANS_C2C, dimX,
                               dimY, dimZ, maxLocalZLength, xyzIndices.size() / 3, SPFFT_INDEX_TRIPLETS,
                               xyzIndices.data());

    MPI_Barrier(MPI_COMM_WORLD);

    transform.backward(signal.data(), SPFFT_PU_HOST);

    double *spaceDomainPtr = transform.space_domain_data(SPFFT_PU_HOST);

    // Print result
    for (int i = 0; i < transform.local_slice_size(); ++i) {
        std::cout << "Backward transform point rank " << commRank << ": "  << spaceDomainPtr[2 * i] << ", " << spaceDomainPtr[2 * i + 1] << std::endl;
    }

    MPI_Finalize();
    return 0;
}

Same issue occurs when using grid instead of only Transform as follow:

    spfft::Grid grid(dimX, dimY, dimZ, numLocalZSticks, maxLocalZLength,
                     SPFFT_PU_HOST, numThreads, MPI_COMM_WORLD, SPFFT_EXCH_DEFAULT);

    MPI_Barrier(MPI_COMM_WORLD);

    auto transform = grid.create_transform(SPFFT_PU_HOST, SPFFT_TRANS_C2C, dimX, dimY, dimZ,
                                           maxLocalZLength, xyzIndices.size() / 3, SPFFT_INDEX_TRIPLETS,
                                           xyzIndices.data());

Thank you very much for any advice that allow me to solve this issue. I really appreciate it.

I believe the issue is, that the input pointer when executing the transforms needs to point to the local data.
So changing
transform.backward(signal.data(), SPFFT_PU_HOST);
to
transform.backward(signal.data() + 2 * (xyIndicesGlobal.size() / commSize) * dimZ, SPFFT_PU_HOST);
should give you the correct result in this example.

Hello,

Thank you for reply. You are right. The problem is in backward function input data. I focused too much on MPI, indexing atc. so I missed input data distribution.
I am just not sure about your computation of pointer shift . In this example for 2 * (xyIndicesGlobal.size() / commSize) * dimZ gives 8 for each rank. I think that correct computation in data shift for rank is for this case offset * xyIndicesGlobal.size() * 2.

Thank you very much for your help. I really appreciate it.

I'm glad I could help.
You are right, my suggested change is not entirely correct, since the MPI rank is missing and I assumed a different distribution of excess elements. I think the correct shift for your example might actually be offset * dimZ * 2, because the Z dimension is not split for the input signal. Only the output is split in Z, so essentially the output is transposed.

I'll close this issue for now, but feel free to reopen or create a new issue if you have further questions.