NNPACK randomly crashes in specific case
wnagchenghku opened this issue · comments
Dear NNPACK author,
We are trying to deploy NNPACK in our production environment, but it segfaults in the simple case below. Interestingly, the crash is random: the same binary sometimes segfaults and sometimes runs without any error.
#include <nnpack.h>
#include <string.h>
#include <stdlib.h>
#include <malloc.h>
#include <stdio.h>
#include <assert.h>
#include <sys/time.h>
void conv_1(float *T0, float *T1, char *workspaceBuffer, size_t *workspaceSize) {
    float *bias = (float[64]){0};
    float *kernelData = (float[9408]){0}; /* 64 * 3 * 7 * 7 weights */
    enum nnp_activation activation = nnp_activation_identity;
    enum nnp_convolution_algorithm algorithm = nnp_convolution_algorithm_auto;
    enum nnp_convolution_transform_strategy strategy = nnp_convolution_transform_strategy_compute;
    enum nnp_status status = nnp_convolution_inference(
        algorithm, strategy,
        3, 64, /* input channels, output channels */
        (struct nnp_size){.width = 224, .height = 224}, /* input size */
        (struct nnp_padding){.top = 3, .right = 3, .bottom = 3, .left = 3},
        (struct nnp_size){.width = 7, .height = 7}, /* kernel size */
        (struct nnp_size){.width = 2, .height = 2}, /* output subsampling (stride) */
        T0, kernelData, bias, T1, workspaceBuffer, workspaceSize,
        activation, NULL, NULL, NULL);
}
void forward(float *T0) {
    int sizes[4] = {1, 3, 224, 224};
    char workspaceBuffer[39096320];
    size_t workspaceSize = 39096320;
    float T1[802816];
    conv_1(T0, T1, workspaceBuffer, &workspaceSize);
    float T4[1 * 64 * 56 * 56];
    return;
}
int main(int argc, char const *argv[]) {
    enum nnp_status status = nnp_initialize();
    assert(status == nnp_status_success);
    float T0[1 * 3 * 224 * 224];
    for (int i = 0; i < 1 * 3 * 224 * 224; i++)
        T0[i] = 1;
    forward(T0);
    status = nnp_deinitialize();
    assert(status == nnp_status_success);
    return 0;
}
Some information about our environment:
- X86_64 Linux with 3.5.0-17-generic kernel (carried with Ubuntu 12.10)
- Build NNPACK with cmake using gcc-4.7.2.
- Intel Xeon CPU E3-1231 with AVX2 and FMA3.
We tried debugging it with gdb: it segfaults in nnp_sgemm_only_4x24__fma3(). Since the PeachPy-generated object files contain no debug symbols, we were not able to go further.
To run this simple example, we set ulimit -s unlimited to remove the stack size limit, because the buffers are allocated on the stack.
Please inform us if any more information is needed to fix it.
Many thanks for your time.
Update:
I have tested this code with NNPACK on Debian 9 (gcc 6.3, kernel 4.9) and still get the same error. Is anything wrong with this code?
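I think I have found the problem: the allocated workspace buffer is not aligned. It is a plain char array on the stack, so its address changes from run to run, which would explain why the crash is random.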
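For anyone who hits the same crash, here is a minimal sketch of allocating an aligned workspace instead of using a stack array. It assumes the 64-byte alignment requirement that, as far as I can tell, nnpack.h documents for workspace_buffer; please check the header of your NNPACK version. The names WORKSPACE_ALIGNMENT, round_up, is_aligned and alloc_workspace are my own hypothetical helpers, not part of NNPACK.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: NNPACK expects the workspace buffer on a 64-byte boundary. */
#define WORKSPACE_ALIGNMENT 64

/* Hypothetical helper: round size up to the next multiple of alignment. */
static size_t round_up(size_t size, size_t alignment) {
    return (size + alignment - 1) / alignment * alignment;
}

/* Hypothetical helper: check that a pointer sits on the given boundary. */
static int is_aligned(const void *ptr, size_t alignment) {
    return ((uintptr_t)ptr % alignment) == 0;
}

/* Hypothetical helper: heap-allocate an aligned workspace.
 * C11 aligned_alloc requires the size to be a multiple of the alignment,
 * hence the rounding. Returns NULL on failure; release with free(). */
static void *alloc_workspace(size_t size) {
    void *buffer = aligned_alloc(WORKSPACE_ALIGNMENT,
                                 round_up(size, WORKSPACE_ALIGNMENT));
    assert(buffer == NULL || is_aligned(buffer, WORKSPACE_ALIGNMENT));
    return buffer;
}
```

In the example above, forward() would then call alloc_workspace(workspaceSize), pass the returned pointer as workspaceBuffer, and free() it after conv_1() returns. This also removes the need for ulimit -s unlimited for that buffer. If I read the header correctly, passing NULL for both the workspace buffer and the workspace size lets NNPACK allocate the scratch memory itself, which may be the simpler option.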