Maratyszcza / NNPACK

Acceleration package for neural networks on multi-core CPUs

/usr/bin/ld: error: src/x86_64-fma/2d-fourier-16x16.py.o: unaligned data

yurivict opened this issue

It fails on FreeBSD 12 with this error:

[43/43] : && /usr/bin/cc -fPIC -O2 -pipe  -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing -O2 -pipe  -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing  -fstack-protector-strong -L/usr/local/lib -shared -Wl,-soname,libnnpack.so -o libnnpack.so src/x86_64-fma/2d-fourier-8x8.py.o src/x86_64-fma/2d-fourier-16x16.py.o src/x86_64-fma/2d-winograd-8x8-3x3.py.o src/x86_64-fma/blas/s8gemm.py.o src/x86_64-fma/blas/c8gemm.py.o src/x86_64-fma/blas/s4c6gemm.py.o src/x86_64-fma/blas/conv1x1.py.o src/x86_64-fma/blas/sgemm.py.o src/x86_64-fma/max-pooling.py.o src/x86_64-fma/relu.py.o src/x86_64-fma/softmax.py.o src/x86_64-fma/blas/sdotxf.py.o src/x86_64-fma/blas/shdotxf.py.o CMakeFiles/nnpack.dir/src/init.c.o CMakeFiles/nnpack.dir/src/convolution-inference.c.o CMakeFiles/nnpack.dir/src/fully-connected-inference.c.o CMakeFiles/nnpack.dir/src/pooling-output.c.o CMakeFiles/nnpack.dir/src/relu-output.c.o CMakeFiles/nnpack.dir/src/softmax-output.c.o CMakeFiles/nnpack.dir/src/fully-connected-output.c.o CMakeFiles/nnpack.dir/src/relu-input-gradient.c.o CMakeFiles/nnpack.dir/src/convolution-input-gradient.c.o CMakeFiles/nnpack.dir/src/convolution-kernel-gradient.c.o CMakeFiles/nnpack.dir/src/convolution-output.c.o CMakeFiles/nnpack.dir/src/x86_64-fma/softmax.c.o  -Wl,-rpath,/wrkdirs/usr/ports/science/nnpack/work/.build/deps/cpuinfo: deps/cpuinfo/libcpuinfo.so -lpthreadpool && :
FAILED: libnnpack.so
: && /usr/bin/cc -fPIC -O2 -pipe  -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing -O2 -pipe  -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing  -fstack-protector-strong -L/usr/local/lib -shared -Wl,-soname,libnnpack.so -o libnnpack.so src/x86_64-fma/2d-fourier-8x8.py.o src/x86_64-fma/2d-fourier-16x16.py.o src/x86_64-fma/2d-winograd-8x8-3x3.py.o src/x86_64-fma/blas/s8gemm.py.o src/x86_64-fma/blas/c8gemm.py.o src/x86_64-fma/blas/s4c6gemm.py.o src/x86_64-fma/blas/conv1x1.py.o src/x86_64-fma/blas/sgemm.py.o src/x86_64-fma/max-pooling.py.o src/x86_64-fma/relu.py.o src/x86_64-fma/softmax.py.o src/x86_64-fma/blas/sdotxf.py.o src/x86_64-fma/blas/shdotxf.py.o CMakeFiles/nnpack.dir/src/init.c.o CMakeFiles/nnpack.dir/src/convolution-inference.c.o CMakeFiles/nnpack.dir/src/fully-connected-inference.c.o CMakeFiles/nnpack.dir/src/pooling-output.c.o CMakeFiles/nnpack.dir/src/relu-output.c.o CMakeFiles/nnpack.dir/src/softmax-output.c.o CMakeFiles/nnpack.dir/src/fully-connected-output.c.o CMakeFiles/nnpack.dir/src/relu-input-gradient.c.o CMakeFiles/nnpack.dir/src/convolution-input-gradient.c.o CMakeFiles/nnpack.dir/src/convolution-kernel-gradient.c.o CMakeFiles/nnpack.dir/src/convolution-output.c.o CMakeFiles/nnpack.dir/src/x86_64-fma/softmax.c.o  -Wl,-rpath,/wrkdirs/usr/ports/science/nnpack/work/.build/deps/cpuinfo: deps/cpuinfo/libcpuinfo.so -lpthreadpool && :
/usr/bin/ld: error: src/x86_64-fma/2d-fourier-16x16.py.o: unaligned data
cc: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.

Looking into it, compiling the Python code into binaries seems to be a very bad idea. People who write in Python usually believe that Python is "good enough" for almost everything. If that isn't true here, then this code should be rewritten in a more performant language such as C++ (or C or Rust). Writing it in Python first, then discovering that this was a bad idea and trying to accelerate it with a custom-written compiler, isn't a good approach IMO, because it leads to errors like this.

I agree. I like NNPACK, but the AVX2 part should have been written with psimd.h, as is done for the SSE4.1 instruction set. Now there's a dependency on PeachPy. When porting NNPACK to Windows, src/x86_64-fma/2d-fourier-16x16.py is the only kernel script where things go wrong when generating object code for the Windows platform.
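
For anyone trying to narrow down the `unaligned data` error, here is a rough diagnostic sketch, not a fix. The thread does not say which check `ld` actually performs, so this only lists each section's file offset next to its declared alignment in a little-endian ELF64 object, to make any mismatch visible. The function name `section_alignments` and the `offset_ok` flag are made up for this illustration; the struct layouts follow the ELF64 header and section header definitions.

```python
import struct

# ELF64 little-endian file header and section header layouts (64 bytes each).
ELF_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
SECTION_HEADER = struct.Struct("<IIQQQQIIQQ")
SHT_NOBITS = 8  # sections such as .bss occupy no file space


def section_alignments(data):
    """Return (name, file_offset, size, addralign, offset_ok) for each section.

    offset_ok is an illustrative check (an assumption, not necessarily what
    the linker tests): the file offset is a multiple of sh_addralign.
    """
    header = ELF_HEADER.unpack_from(data, 0)
    ident = header[0]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    shoff, shentsize, shnum, shstrndx = header[6], header[11], header[12], header[13]

    sections = [
        SECTION_HEADER.unpack_from(data, shoff + i * shentsize)
        for i in range(shnum)
    ]
    strtab_off = sections[shstrndx][4]  # sh_offset of the section name table

    rows = []
    for name_idx, sh_type, _flags, _addr, offset, size, _link, _info, align, _entsize in sections:
        start = strtab_off + name_idx
        end = data.index(b"\0", start)  # names are NUL-terminated
        name = data[start:end].decode("ascii", "replace")
        offset_ok = sh_type == SHT_NOBITS or align <= 1 or offset % align == 0
        rows.append((name, offset, size, align, offset_ok))
    return rows
```

Typical use would be something like `section_alignments(open("src/x86_64-fma/2d-fourier-16x16.py.o", "rb").read())` in the build directory, then comparing the rows against those of an object file that links cleanly, such as `2d-fourier-8x8.py.o`.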