Luca96 / dlib-for-android

Compile and embed Dlib in your Android projects with ease.

Performance optimisation

lemberh opened this issue · comments

Do you support the NEON instruction set optimizations from this thread: davisking/dlib#276?

Hi @lemberh, unfortunately I support only the basic architectures.

Did you try to use those optimizations?
I have changed the cmake command to:

    ${AndroidCmake} -DBUILD_SHARED_LIBS=1 \
        -DANDROID_NDK=${NDK} \
        -DCMAKE_SYSTEM_NAME=Android \
        -DCMAKE_TOOLCHAIN_FILE=${TOOLCHAIN} \
        -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_CXX_FLAGS="-std=c++11 -frtti -fexceptions -march=armv7-a -mfpu=neon" \
        -DANDROID_ARM_NEON=TRUE \
        -DCMAKE_C_FLAGS=-O3 \
        -DANDROID_ABI=${abi} \
        -DANDROID_PLATFORM=${MIN_SDK} \
        -DANDROID_TOOLCHAIN=clang \
        -DANDROID_STL=c++_shared \
        -DANDROID_CPP_FEATURES="rtti exceptions" \
        -DCMAKE_PREFIX_PATH=../../ \
        ../../

But it doesn't seem to have any impact on performance.
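
A quick sanity check (generic, not something from this thread): clang predefines __ARM_NEON when it actually compiles for a NEON-capable target, so a small native log call can confirm whether the flags took effect. Remember to link against liblog (-llog).

    // Logs whether NEON was enabled in this native build: clang defines
    // __ARM_NEON (or the older __ARM_NEON__) when targeting NEON.
    #include <android/log.h>

    void log_neon_status()
    {
    #if defined(__ARM_NEON) || defined(__ARM_NEON__)
        __android_log_print(ANDROID_LOG_INFO, "dlib", "NEON is enabled");
    #else
        __android_log_print(ANDROID_LOG_INFO, "dlib", "NEON is NOT enabled");
    #endif
    }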

Have you tried adding the ABI "armeabi-v7a with NEON" to the script?

You should edit line 17 of setup.sh (I guess you're using Linux) so that it looks like this:

ABI=('armeabi-v7a' 'armeabi-v7a with NEON' 'arm64-v8a' 'x86' 'x86_64')

Let me know if it works now.

For more information you can read this.

I have tried this.
Unfortunately, I don't see any difference.
The Android CMake documentation says:

armeabi-v7a with NEON | Same as -DANDROID_ABI=armeabi-v7a -DANDROID_ARM_NEON=ON.

https://developer.android.com/ndk/guides/cmake#android_abi

I'm using face recognition based on this example: http://dlib.net/dnn_face_recognition_ex.cpp.html, slightly modified to skip the face-detection step.
On arm64-v8a devices it takes around 700 ms to compute a face descriptor, but on armeabi-v7a devices it takes from 2 up to 5 seconds.
I'm wondering if that can be improved with NEON instructions.
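
For reference, a minimal sketch of that setup, assuming the detection step is simply dropped and the network is fed an already-aligned 150x150 face chip (anet_type is copied from dnn_face_recognition_ex.cpp; the file names here are illustrative):

    // dnn_face_recognition_ex.cpp with the face-detection step removed:
    // the input is assumed to be an already-aligned 150x150 RGB face chip.
    #include <dlib/dnn.h>
    #include <dlib/image_io.h>

    using namespace dlib;

    // ---- network definition, copied from dnn_face_recognition_ex.cpp ----
    template <template <int,template<typename>class,int,typename> class block, int N, template<typename>class BN, typename SUBNET>
    using residual = add_prev1<block<N,BN,1,tag1<SUBNET>>>;

    template <template <int,template<typename>class,int,typename> class block, int N, template<typename>class BN, typename SUBNET>
    using residual_down = add_prev2<avg_pool<2,2,2,2,skip1<tag2<block<N,BN,2,tag1<SUBNET>>>>>>;

    template <int N, template <typename> class BN, int stride, typename SUBNET>
    using block = BN<con<N,3,3,1,1,relu<BN<con<N,3,3,stride,stride,SUBNET>>>>>;

    template <int N, typename SUBNET> using ares      = relu<residual<block,N,affine,SUBNET>>;
    template <int N, typename SUBNET> using ares_down = relu<residual_down<block,N,affine,SUBNET>>;

    template <typename SUBNET> using alevel0 = ares_down<256,SUBNET>;
    template <typename SUBNET> using alevel1 = ares<256,ares<256,ares_down<256,SUBNET>>>;
    template <typename SUBNET> using alevel2 = ares<128,ares<128,ares_down<128,SUBNET>>>;
    template <typename SUBNET> using alevel3 = ares<64,ares<64,ares<64,ares_down<64,SUBNET>>>>;
    template <typename SUBNET> using alevel4 = ares<32,ares<32,ares<32,SUBNET>>>;

    using anet_type = loss_metric<fc_no_bias<128,avg_pool_everything<
                                alevel0<alevel1<alevel2<alevel3<alevel4<
                                max_pool<3,3,2,2,relu<affine<con<32,7,7,2,2,
                                input_rgb_image_sized<150>
                                >>>>>>>>>>>>;

    int main()
    {
        anet_type net;
        deserialize("dlib_face_recognition_resnet_model_v1.dat") >> net;

        matrix<rgb_pixel> chip;
        load_image(chip, "face_chip_150x150.jpg");  // pre-cropped, no detector pass

        matrix<float,0,1> descriptor = net(chip);   // the 128D face vector
        return 0;
    }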

To gain some performance you can try to:

  • Process grayscale images instead of RGB (see the sketch after this list).
  • Downscale the input images, as well as the network input.
  • Reduce the neural network size, i.e. fewer layers and fewer filters.
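
A minimal sketch of the first two suggestions using dlib's image utilities. Note that the stock face-recognition net expects 150x150 RGB chips, so feeding it grayscale or smaller inputs would also mean changing the input layer (e.g. to input<matrix<unsigned char>>) and retraining:

    // Grayscale conversion and downscaling with dlib's image utilities.
    #include <dlib/matrix.h>
    #include <dlib/image_transforms.h>

    using namespace dlib;

    matrix<unsigned char> preprocess(const matrix<rgb_pixel>& img)
    {
        // 1) Grayscale: one channel to process instead of three.
        matrix<unsigned char> gray;
        assign_image(gray, img);

        // 2) Downscale: halving each side cuts the pixel count to a quarter.
        matrix<unsigned char> smaller(img.nr()/2, img.nc()/2);
        resize_image(gray, smaller);   // bilinear interpolation by default
        return smaller;
    }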

Thanks for the suggestions, I will try them!

Building dlib linked against OpenBLAS improves performance greatly. Instructions can be found here: davisking/dlib#1238 (comment)

For example, on a Redmi 7A, face descriptor calculation took ~3.5 s with the default prebuilt dlib .so files; after rebuilding dlib with OpenBLAS it takes ~350 ms, about 10 times faster.
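
For what it's worth, a small timing harness makes before/after comparisons like this easy to reproduce; a sketch, reusing anet_type and the face chip from the earlier sketch:

    // Times one forward pass of the face-descriptor network (anet_type and
    // the 150x150 face chip as in the earlier sketch).
    #include <chrono>

    long long descriptor_millis(anet_type& net, const matrix<rgb_pixel>& chip)
    {
        const auto t0 = std::chrono::steady_clock::now();
        matrix<float,0,1> descriptor = net(chip);
        const auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
    }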