Maratyszcza / NNPACK

Acceleration package for neural networks on multi-core CPUs

nnp_convolution_inference returns output with all zeros

srikanthparupati opened this issue · comments

I wrote a small snippet of code to check NNPACK convolution inference speed. For some reason, nnp_convolution_inference returns an output of all zeros. I am not able to figure out the issue with the snippet below. Can you please help me with it?


nnp_status status = nnp_initialize();
nnp_convolution_transform_strategy transform_strategy = nnp_convolution_transform_strategy_precompute;
const nnp_convolution_algorithm algorithm = nnp_convolution_algorithm_auto; //nnp_convolution_algorithm_implicit_gemm;

size_t input_channels = 1;
size_t output_channels = 3; 
const nnp_size input_size = {5, 5}; 
const nnp_size output_size = {3, 3}; 
const nnp_padding input_padding = { 0, 0, 0, 0 };
const nnp_size kernel_size = {3, 3}; 
const nnp_size stride = { 1, 1 };

std::vector<float> input(input_size.width * input_size.height * input_channels);
std::vector<float> kernel(input_channels * kernel_size.width * kernel_size.height * output_channels);
std::vector<float> bias(output_channels);
std::vector<float> output(output_channels * output_size.width * output_size.height);

kernel = {1.0576, -0.0638, -0.3667, 0.2912,  0.9600, -0.2763, 0.4745,  0.0218, -0.4153, -0.2512,  2.2507,  0.3270,
      -0.5482, -0.0241, -0.3120, 0.5434, -2.8615,  0.9707, 1.5259, -0.8924, -0.4584, -0.3262,  1.2160, -0.5744, 1.2048, -1.1605,  0.7418};

input = {1.0163, -1.7396, -0.1464, -1.2687, -2.7988,
           0.1436, -0.0367,  0.0719, -1.0046,  0.7306,
           -0.5130, -1.0900, -0.8827,  0.5993,  0.8043,
           0.6443, -1.7176,  0.5912,  0.2367,  0.5063,
           -1.0304,  1.2539, -1.4350, -2.2669, -0.2690};
bias = {0.0, 0.0, 0.0}; // one bias per output channel

//std::vector<uint8_t, AlignedAllocator<uint8_t, 32>> transformedKernel, workspace_buffer;
std::vector<uint8_t> workspace_buffer; // workspace_size is reported in bytes

pthreadpool_t threadpool = pthreadpool_create(2);

size_t workspace_size = 0;
status = nnp_convolution_inference(
		algorithm, nnp_convolution_transform_strategy_precompute,
		input_channels, output_channels,
		input_size, input_padding, kernel_size, stride,
		NULL, NULL, NULL, NULL, NULL, &workspace_size,
		nnp_activation_identity, NULL,
		NULL, NULL);

if (status != nnp_status_success) {
    std::cout << "nnp failure status  " << status << std::endl;
    return -1; 
} 

std::cout << "Workspace buffer size  " << workspace_size << std::endl;

workspace_buffer.resize(workspace_size);

auto begin = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::steady_clock::now().time_since_epoch()).count();


status = nnp_convolution_inference(
    algorithm,
    transform_strategy,
    input_channels,
    output_channels,
    input_size,
    input_padding,
    kernel_size,
    stride,
    input.data(),
    kernel.data(),
    bias.data(),
    output.data(),
    nullptr, //static_cast<void*>(workspace_buffer.data()),
    &workspace_size,
    nnp_activation_identity,
    NULL,
    NULL,
    NULL);
std::cout << status << std::endl;


auto end = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::steady_clock::now().time_since_epoch()).count();
std::cout << "Use time " << (end - begin) << " ms\n";

If you want the output to have the same size as the input, you should set input_padding = { 1, 1, 1, 1 }; (for a 3x3 kernel)

@Maratyszcza In the above sample, I set the input padding to 0 since I set the output_size to {3, 3}.

I tried the above snippet with padding set to 1 and the output size the same as the input for the 3x3 kernel. I am still seeing incorrect convolution output (the output array is all zeros). Do you see any issue with my code?

Do you check the status of the second nnp_convolution_inference call?

Also, strategy must be nnp_convolution_transform_strategy_compute
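Putting both points together, a sketch of what the second call could look like, with the compute strategy and the queried workspace actually passed in. This is untested here; the names match the snippet above, and the exact parameter list should be checked against the nnpack.h in your checkout:

```cpp
// Sketch: allocate the workspace queried earlier (workspace_size is in
// bytes) and pass it, together with the compute transform strategy.
std::vector<uint8_t> workspace(workspace_size);

status = nnp_convolution_inference(
    algorithm,
    nnp_convolution_transform_strategy_compute,  // not _precompute
    input_channels, output_channels,
    input_size, input_padding, kernel_size, stride,
    input.data(), kernel.data(), bias.data(), output.data(),
    workspace.data(),   // was nullptr in the original snippet
    &workspace_size,
    nnp_activation_identity,
    NULL,               // activation parameters
    threadpool,         // the pool created earlier, or NULL
    NULL);              // profile
```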

@Maratyszcza thanks a lot for the response.

I am seeing nnp_status_success for the second inference call with both the compute and precompute strategies.

If you want the output to have the same size as the input, you should set input_padding = { 1, 1, 1, 1 }; (for a 3x3 kernel)

Also, strategy must be nnp_convolution_transform_strategy_compute

Can you tell me where nnp_convolution_inference is defined, please?
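For reference: in the NNPACK repository the public API, including nnp_convolution_inference, is declared in include/nnpack.h, and the implementation lives under src/ (src/convolution-inference.c, to the best of my knowledge). From a checkout of the repo you can confirm with:

```shell
# Locate the declaration and the implementation in an NNPACK checkout.
grep -rn "nnp_convolution_inference" include/ src/
```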