jolibrain / deepdetect

Deep Learning API and Server in C++14, with support for Caffe, PyTorch, TensorRT, Dlib, NCNN, Tensorflow, XGBoost and TSNE

Home Page: https://www.deepdetect.com/


getting error while training, .solverstate

mostafa8026 opened this issue · comments


Configuration

  • Version of DeepDetect: locally compiled on Ubuntu 18.04 LTS

Your question / the problem you're facing:

When I try to train a simple classification model (the dogs_cats example), I get the following error:

resuming a model requires a .solverstate file in model repository

Error message (if any) / steps to reproduce the problem:

  • list of API calls:
    as shown in the sample.

  • Server log output:

4c01f52c6171_cpu_deepdetect_1 | [2021-09-26 13:15:41.201] [dogs_cats] [info] selected solver: SGD
4c01f52c6171_cpu_deepdetect_1 | [2021-09-26 13:15:41.201] [dogs_cats] [info] solver flavor : rectified
4c01f52c6171_cpu_deepdetect_1 | [2021-09-26 13:15:41.201] [dogs_cats] [info] detected network type is classification
4c01f52c6171_cpu_deepdetect_1 | [2021-09-26 13:15:41.203] [dogs_cats] [error] resuming a model requires a .solverstate file in model repository
4c01f52c6171_cpu_deepdetect_1 | [2021-09-26 13:15:41.219] [dogs_cats] [error] training status call failed: Dynamic exception type: dd::MLLibBadParamException
4c01f52c6171_cpu_deepdetect_1 | std::exception::what: resuming a model requires a .solverstate file in model repository
4c01f52c6171_cpu_deepdetect_1 |
4c01f52c6171_cpu_deepdetect_1 | [2021-09-26 13:15:41.220] [api] [error] {"code":400,"msg":"BadRequest","dd_code":1006,"dd_msg":"Service Bad Request Error: resuming a model requires a .solverstate file in model repository"}

Hi, make sure resume is set to False

Thanks, I didn't know that. Maybe it would be a good idea to mention it in the docs.
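For reference, a sketch of what a /train body with `resume` explicitly set to false could look like. The service name and solver settings here are just placeholders taken from this thread, not a definitive recipe:

```python
import json

def make_train_payload(service="dogs_cats", iterations=1000):
    """Build a /train request body with resume explicitly set to False,
    so the server does not look for a .solverstate file in the repository."""
    return {
        "service": service,
        "async": True,
        "parameters": {
            "mllib": {
                "resume": False,  # the fix: do not resume from a .solverstate
                "solver": {"iterations": iterations},
            },
        },
    }

print(json.dumps(make_train_payload(), indent=1))
```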

@beniz the previous error was fixed, but now I get the following error when I try to use the trained service:

request:

curl -X POST 'http://172.29.229.69:1913/api/deepdetect/predict' -d '{
 "service": "dogs_cats",
 "parameters": {
  "input": {},
  "output": {
   "confidence_threshold": 0.3,
   "bbox": true
  },
  "mllib": {
   "gpu": true
  }
 },
 "data": [
  "/opt/platform/data/10021/01.jpg"
 ]
}'

response:

{
 "status": {
  "code": 400,
  "msg": "BadRequest",
  "dd_code": 1006,
  "dd_msg": "Service Bad Request Error: no deploy file in /opt/platform/models/private/dogs_cats for initializing the net"
 }
}

How could I create a deploy file?

Hi,
in order to do inference (predict), you need to have the network definition (in the case of Caffe inference, the deploy file) and the network weights in the model repository.
The deploy file is created by DeepDetect when you use a template definition and train your own net, or it comes with a pre-trained network definition.
What are you trying to do exactly?

@fantes the last problem has been resolved, and now I hit another one: the main API engine gets killed whenever a large number of pictures is submitted for indexing. I'm implementing similarity search following this link

request:

curl -X POST 'http://172.29.229.69:1913/api/deepdetect/predict' -d '{
 "service": "simsearch",
 "parameters": {
  "input": {"height": 224, "width": 224},
  "output": {
   "index": true,
   "index_gpu": true,
   "index_gpuid": 0,
   "index_type": "IVF20,SQ8",
   "train_samples": 500,
   "ondisk": true,
   "nprobe": 10
  },
  "mllib": {"extract_layer": "pool5/7x7_s1"}
 },
 "data": ["/opt/platform/data/large-number-of-files/"]
}'

response:

<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.21.1</center>
</body>
</html>

logs:

4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.098] [torchlib] [info] Ignoring source layer conv4_3_prob_reshape
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.099] [torchlib] [info] Ignoring source layer conv4_4_1x1_increase/bn_conv4_4_1x1_increase/bn_0_split
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.099] [torchlib] [info] Ignoring source layer conv4_4_prob_reshape
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.100] [torchlib] [info] Ignoring source layer conv4_5_1x1_increase/bn_conv4_5_1x1_increase/bn_0_split
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.100] [torchlib] [info] Ignoring source layer conv4_5_prob_reshape
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.101] [torchlib] [info] Ignoring source layer conv4_6_1x1_increase/bn_conv4_6_1x1_increase/bn_0_split
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.101] [torchlib] [info] Ignoring source layer conv4_6_prob_reshape
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.103] [torchlib] [info] Ignoring source layer conv5_1_1x1_increase/bn_conv5_1_1x1_increase/bn_0_split
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.104] [torchlib] [info] Ignoring source layer conv5_1_prob_reshape
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.109] [torchlib] [info] Ignoring source layer conv5_2_1x1_increase/bn_conv5_2_1x1_increase/bn_0_split
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.109] [torchlib] [info] Ignoring source layer conv5_2_prob_reshape
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.112] [torchlib] [info] Ignoring source layer conv5_3_1x1_increase/bn_conv5_3_1x1_increase/bn_0_split
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.112] [torchlib] [info] Ignoring source layer conv5_3_prob_reshape
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.114] [torchlib] [info] Ignoring source layer classifier_classifier_0_split
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.197] [simsearch] [info] Net total flops=3860541312 / total params=28070976
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.197] [simsearch] [info] detected network type is classification
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.208] [simsearch] [info] imginputfileconn: list subdirs size=0
4c01f52c6171_cpu_deepdetect_1 | tcmalloc: large alloc 1355153408 bytes == 0x5609fb00e000 @
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:12.194] [api] [info] HTTP/1.1 "GET /info" <n/a> 200 0ms
platform_ui_1    | 172.29.229.1 - - [27/Sep/2021:18:21:12 +0000] "GET /api/deepdetect/info? HTTP/1.1" 200 433 "http://172.29.229.69:1913/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36"
4c01f52c6171_cpu_deepdetect_1 | tcmalloc: large alloc 1355153408 bytes == 0x560a4bc6e000 @
4c01f52c6171_cpu_deepdetect_1 | Killed

Hi, you need to iterate your image list and send batches. Your machine does not have enough RAM to hold them all at once.
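A minimal batching sketch in plain Python (the batch size of 5 is arbitrary; replace the comment with your actual predict call):

```python
def batches(items, size):
    """Yield successive fixed-size chunks of a list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Example: send a few images per /predict call instead of a whole
# directory at once, so the server never has to load everything in RAM.
images = [f"/opt/platform/data/img_{i:03d}.jpg" for i in range(12)]
for batch in batches(images, 5):
    # dd.post_predict(service_name, batch, ...)  # one API call per batch
    print(len(batch))
```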

Thanks, I can index and build the index, but after searching the pictures are not displayed, because they are referenced through /opt/platform/data. How can I fix this? Where should the correct picture paths point?

(screenshot)

They should be referenced from /data/, but they are referenced from /opt/platform/data.

(screenshot)

/opt/platform/data should automatically link to your $DD_PLATFORM/data directory on the host.

In your case, make sure to link your /data directory to $DD_PLATFORM/data.
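If you script the setup, the link can be created once at startup. A small sketch, assuming illustrative paths (the helper and both path arguments are examples, not DeepDetect requirements):

```python
import os

def ensure_data_link(host_data, platform_data):
    """Create platform_data as a symlink to host_data if it does not
    already exist, so paths stored by the server resolve on the host.
    Both paths are illustrative; adjust them to your own layout."""
    if not os.path.lexists(platform_data):
        os.symlink(host_data, platform_data)
    return os.path.realpath(platform_data)

# e.g. ensure_data_link("/data", "/opt/platform/data")  # hypothetical paths
```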

Another question:
how can I persist service data? Whenever the server gets killed, all the services vanish (I mean the services that were added and used for prediction).

after restarting the dd engine, I get this error after adding new service:

[error] service creation mllib bad param: using template while model prototxt and network weights exist, remove 'template' from 'mllib' or remove prototxt files instead

I create the service with the following command:


curl -X PUT 'http://172.29.229.69:1913/api/deepdetect/services/simsearch' -d '{
  "mllib": "caffe",
  "description": "similarity search service",
  "type": "unsupervised",
  "parameters": {
    "input": {
      "connector": "image",
      "height": 224,
      "width": 224
    },
    "output": {
      "store_config": true
    },
    "mllib": {
      "nclasses": 1000,
      "template": "se_resnet_50"
    }
  },
  "model": {
    "repository": "/opt/platform/models/private/simsearch/",
    "templates": "/opt/deepdetect/build/templates/caffe/"
  }
}'

I have to delete these two files every time. Is there any way to avoid that?

$ rm models/private/simsearch/se_resnet_50.prototxt
$ rm models/private/simsearch/se_resnet_50_solver.prototxt

after restarting the dd engine, I get this error after adding new service:

[error] service creation mllib bad param: using template while model prototxt and network weights exist, remove 'template' from 'mllib' or remove prototxt files instead

The error says it all, remove the template argument from your API call, since the model already exists. This error prevents sending calls with neural net templates that do not match the existing model.
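For scripted setups, the choice can be automated: only pass `template` when the repository has no prototxt files yet. A minimal sketch (the `service_payload` helper is hypothetical; the values are the ones from this thread):

```python
import glob
import os

def service_payload(repository, template="se_resnet_50"):
    """Build a service-creation body, omitting 'template' when the
    repository already holds prototxt files from a previous run."""
    mllib = {"nclasses": 1000}
    if not glob.glob(os.path.join(repository, "*.prototxt")):
        mllib["template"] = template  # first creation only
    return {
        "mllib": "caffe",
        "type": "unsupervised",
        "parameters": {
            "input": {"connector": "image", "width": 224, "height": 224},
            "mllib": mllib,
            "output": {"store_config": True},
        },
        "model": {"repository": repository},
    }
```

On first creation the template generates the prototxt files; on every later creation the payload omits it, avoiding the error above.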

I get a new error:

request:

curl -X POST 'http://172.29.229.69:1913/api/deepdetect/predict' -d '{
 "service": "simsearch",
 "parameters": {
  "input": {},
  "output": {
   "confidence_threshold": 0.3,
   "search_nn": 10,
   "search": true
  },
  "mllib": {
   "gpu": true,
   "extract_layer": "pool5/7x7_s1"
  }
 },
 "data": [
  "/opt/platform/data/0dir/09591_thumb - Copy.jpg"
 ]
}'
deepdetect_1     | [2021-09-28 12:50:11.236] [simsearch] [error] unsupervised output needs mllib.extract_layer param
deepdetect_1     | [2021-09-28 12:50:11.237] [api] [info] HTTP/1.1 "POST /predict" simsearch 200 1042ms
platform_ui_1    | 172.29.229.1 - - [28/Sep/2021:12:50:11 +0000] "POST /api/deepdetect/predict HTTP/1.1" 200 125 "http://172.29.229.69:1913/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36"
deepdetect_1     | [2021-09-28 12:50:15.377] [api] [info] HTTP/1.1 "GET /info" <n/a> 200 0ms
platform_ui_1    | 172.29.229.1 - - [28/Sep/2021:12:50:15 +0000] "GET /api/deepdetect/info? HTTP/1.1" 200 433 "http://172.29.229.69:1913/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36"
deepdetect_1     | [2021-09-28 12:50:17.751] [api] [info] HTTP/1.1 "POST /predict" simsearch 200 664ms
platform_ui_1    | 172.29.229.1 - - [28/Sep/2021:12:50:17 +0000] "POST /api/deepdetect/predict HTTP/1.1" 200 11545 "http://172.29.229.69:1913/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36"
deepdetect_1     | open existing index db
deepdetect_1     | [2021-09-28 12:50:19.829] [torchlib] [info] Opened lmdb /opt/platform/models/private/simsearch//names.bin
platform_ui_1    | 2021/09/28 12:50:19 [error] 25#25: *1449 upstream prematurely closed connection while reading response header from upstream, client: 172.29.229.1, server: , request: "POST /api/deepdetect/predict HTTP/1.1", upstream: "http://172.23.0.3:8080/predict", host: "172.29.229.69:1913", referrer: "http://172.29.229.69:1913/"
platform_ui_1    | 172.29.229.1 - - [28/Sep/2021:12:50:19 +0000] "POST /api/deepdetect/predict HTTP/1.1" 502 559 "http://172.29.229.69:1913/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36"
deepdetect_1     | Segmentation fault

and the process is killed

Hard to tell... what sizes are the db and names.bin files? Also, you seem to have gotten the system to work earlier, as shown in the UI?

It works OK the first time, but if I restart Docker, that error shows up after calling the search API.

names.bin:

$ ls models/private/simsearch/names.bin/ -lh
total 80K
72K Sep 28 16:13 data.mdb
8.0K Sep 28 16:34 lock.mdb

indexes:

$ ls models/private/simsearch/index* -lh
179K Sep 28 16:13 models/private/simsearch/index.faiss
4.0M Sep 28 16:12 models/private/simsearch/index_mmap.faiss

By first time you mean it works right after indexing has completed, but not afterwards?

Yes. After indexing and building have finished, everything is OK and I can search as much as I want. But when I restart Docker and create the service again, it can't read the db correctly and a segmentation fault occurs.

Could you post the exact list of API calls, from index creation to prediction after restart, typically for a single image please? At a minimum, this would help reproduce without Docker.

create the service:

$ mkdir models/private/simsearch
$ cp SE-ResNet-50.caffemodel models/private/simsearch/
$ curl -X PUT 'http://172.29.229.69:1913/api/deepdetect/services/simsearch' -d '{
  "mllib": "caffe",
  "description": "similarity search service",
  "type": "unsupervised",
  "parameters": {
    "input": {
      "connector": "image",
      "width": 224,
      "height": 224
    },
    "mllib": {
      "nclasses":1000,
      "template": "se_resnet_50"
    },
    "output": {
      "store_config": true
    }
  },
  "model": {
    "templates": "../templates/caffe/",
    "repository": "/opt/platform/models/private/simsearch/",
    "create_repository": true
  }
}'
{"status":{"code":201,"msg":"Created"}}

indexing:

$ curl -X POST 'http://172.29.229.69:1913/api/deepdetect/predict' -d '{
 "service": "simsearch",
 "parameters": {
  "input": {"height": 224, "width": 224},
  "output": {
        "confidence_threshold": 0.3,
        "index": true,
        "index_type": "IVF20,SQ8",
        "train_samples": 500,
        "ondisk": true,
        "nprobe": 10
    },
  "mllib": {"extract_layer": "pool5/7x7_s1"}
 },
 "data": [
  "/opt/platform/data/0dir/09591_thumb - Copy.jpg"
 ]
}'

{"status":{"code":200,"msg":"OK"},"head":{"method":"/predict","service":"simsearch","time":523.0},"body":{"predictions":[{"indexed":true,"last":true,"vals":[0.0,1.1430059671401978,0.0,0.0,0.0,0.0,0.13838012516498567,0.02941223978996277,16.24222183227539,0.0,0.0,3.5232527256011965,0.0,0.0,0.25434765219688418,0.0660916417837143,0.0,0.0,0.0,0.044068124145269397,0.0,0.38675710558891299,0.0,0.0706....

build:

$ curl -X POST 'http://172.29.229.69:1913/api/deepdetect/predict' -d '{
 "service": "simsearch",
 "parameters": {
  "input": {"height": 224, "width": 224},
  "output": {
        "index": false,
        "build_index": true
    },
  "mllib": {"extract_layer": "pool5/7x7_s1"}
 },
 "data": [
  "/opt/platform/data/0dir/09591_thumb - Copy.jpg"
 ]
}'

{"status":{"code":200,"msg":"OK"},"head":{"method":"/predict","service":"simsearch","time":534.0},"body":{"predictions":[{"last":true,"vals":[0.0,1.1430059671401978,0.0,0.0,0.0,0.0,0.13838012516498567,0.02941223978996277,16.24222183227539,0.0,0.0,3.5232527256011965,0.0,0.0,0.25434765219688418,0.0660916417837143,0.0,0.0,0.0,0.044068124145269397,0.0,0.38675710558891299,0.0,0.0706....

search (it also doesn't return a result):

$ curl -X POST 'http://172.29.229.69:1913/api/deepdetect/predict' -d '{
 "service": "simsearch",
 "parameters": {
  "input": {"height": 224, "width": 224},
  "output": {
   "search_nn": 10,
   "search": true
  },
  "mllib": {
   "extract_layer": "pool5/7x7_s1"
  }
 },
 "data": [
  "/opt/platform/data/0dir/09591_thumb - Copy.jpg"
 ]
}'

{"status":{"code":200,"msg":"OK"},"head":{"method":"/predict","service":"simsearch","time":584.0},"body":{"predictions":[{"last":true,"vals":[0.0,1.1430059671401978,0.0,0.0,0.0,0.0,0.13838012516498567,0.02941223978996277,16.24222183227539,0.0,0.0,3.5232527256011965,0.0,0.0,0.25434765219688418,0.0660916417837143,0.0,0.0,0.0,0.044068124145269397,0.0,0.38675710558891299,0.0,0.0706...

restart the docker:

$ docker-compose restart
Restarting cpu_platform_ui_1   ... done
Restarting cpu_deepdetect_1    ... done
Restarting cpu_filebrowser_1   ... done
Restarting cpu_jupyter_1       ... done
Restarting cpu_platform_data_1 ... done
Restarting cpu_dozzle_1        ... done

create the service again (without template):

$ curl -X PUT 'http://172.29.229.69:1913/api/deepdetect/services/simsearch' -d '{
  "mllib": "caffe",
  "description": "similarity search service",
  "type": "unsupervised",
  "parameters": {
    "input": {
      "connector": "image",
      "width": 224,
      "height": 224
    },
    "mllib": {
      "nclasses":1000
    },
    "output": {
      "store_config": true
    }
  },
  "model": {
    "templates": "../templates/caffe/",
    "repository": "/opt/platform/models/private/simsearch/",
    "create_repository": true
  }
}'

{"status":{"code":201,"msg":"Created"}}

In this case, where I index only one picture, search works but returns nothing, because:

could not train, maybe not enough data to train with selected index type. index likely to be empty
Error in void faiss::Clustering::train_encoded(faiss::Clustering::idx_t, const uint8_t*, const faiss::Index*, faiss::Index&, const float*) at /opt/deepdetect/build/faiss/src/faiss/faiss/Clustering.cpp:276: Error: 'nx >= k' failed: Number of training points (0) should be at least as large as number of clusters (20)

here is the request:

$ curl -X POST 'http://172.29.229.69:1913/api/deepdetect/predict' -d '{
 "service": "simsearch",
 "parameters": {
  "input": {"height": 224, "width": 224},
  "output": {
   "search_nn": 10,
   "search": true
  },
  "mllib": {
   "extract_layer": "pool5/7x7_s1"
  }
 },
 "data": [
  "/opt/platform/data/0dir/09591_thumb - Copy.jpg"
 ]
}'

{"status":{"code":200,"msg":"OK"},"head":{"method":"/predict","service":"simsearch","time":876.0},"body":{"predictions":[{"last":true,"vals":[0.0,1.1430059671401978,0.0,0.0,0.0,0.0,0.13838012516498567,0.02941223978996277,16.24222183227539,0.0,0.0,3.5232527256011965,0.0,0.0,0.25434765219688418,0.0660916417837143,0.0,0.0,0.0,0.044068124145269397,0.0,0.38675710558891299,0.0,0.07067309319972992,0.0,0.007104216609150171,0.0,0.5068966746330261,0.0,0.5041558146476746,0.0,0.0,0.0,0.0,0.199878320097923.....

but if I do just the same steps as described above (in batches of 5) for about 400 pictures, I get this error:

$ curl -X POST 'http://172.29.229.69:1913/api/deepdetect/predict' -d '{
 "service": "simsearch",
 "parameters": {
  "input": {"height": 224, "width": 224},
  "output": {
   "search_nn": 10,
   "search": true
  },
  "mllib": {
   "extract_layer": "pool5/7x7_s1"
  }
 },
 "data": [
  "/opt/platform/data/0dir/09591_thumb - Copy.jpg"
 ]
}'

<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.21.1</center>
</body>
</html>

deepdetect_1 | [2021-09-28 13:50:03.005] [torchlib] [info] Ignoring source layer classifier_classifier_0_split
deepdetect_1 | [2021-09-28 13:50:03.087] [simsearch] [info] Net total flops=3860541312 / total params=28070976
deepdetect_1 | [2021-09-28 13:50:03.087] [simsearch] [info] detected network type is classification
platform_ui_1 | 172.29.229.69 - - [28/Sep/2021:13:50:03 +0000] "PUT /api/deepdetect/services/simsearch HTTP/1.1" 201 39 "-" "curl/7.71.1"
deepdetect_1 | [2021-09-28 13:50:03.088] [api] [info] HTTP/1.1 "PUT /services/simsearch" <n/a> 201 524ms
deepdetect_1 | [2021-09-28 13:50:07.362] [api] [info] HTTP/1.1 "GET /info" <n/a> 200 0ms
platform_ui_1 | 172.29.229.1 - - [28/Sep/2021:13:50:07 +0000] "GET /api/deepdetect/info? HTTP/1.1" 200 433 "http://172.29.229.69:1913/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36"
deepdetect_1 | [2021-09-28 13:50:07.437] [api] [info] HTTP/1.1 "GET /services/simsearch" <n/a> 200 0ms
platform_ui_1 | 172.29.229.1 - - [28/Sep/2021:13:50:07 +0000] "GET /api/deepdetect/services/simsearch HTTP/1.1" 200 420 "http://172.29.229.69:1913/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36"
deepdetect_1 | open existing index db
deepdetect_1 | [2021-09-28 13:50:14.307] [torchlib] [info] Opened lmdb /opt/platform/models/private/simsearch//names.bin
platform_ui_1 | 2021/09/28 13:50:14 [error] 24#24: *9 upstream prematurely closed connection while reading response header from upstream, client: 172.29.229.69, server: , request: "POST /api/deepdetect/predict HTTP/1.1", upstream: "http://172.23.0.3:8080/predict", host: "172.29.229.69:1913"
platform_ui_1 | 172.29.229.69 - - [28/Sep/2021:13:50:14 +0000] "POST /api/deepdetect/predict HTTP/1.1" 502 157 "-" "curl/7.71.1"
deepdetect_1 | Segmentation fault

here is my python code for indexing:

# Download dd_client.py from:
# https://github.com/jolibrain/deepdetect/blob/master/clients/python/dd_client.py
import glob
import sys

from dd_client import DD

host = '172.29.229.69'
port = 1913
path = '/api/deepdetect'
dd = DD(host, port, 0, path=path)
dd.set_return_format(dd.RETURN_PYTHON)

parameters_input = {"height": 224, "width": 224}
parameters_mllib = {"extract_layer": "pool5/7x7_s1"}
images = glob.glob(sys.argv[1])
service_name = 'simsearch'
data = []
all_data = []


def classify_build(data_in):
    print(data_in)
    parameters_output = {
        "confidence_threshold": 0.3,
        "index": True,
        "index_type": "IVF20,SQ8",
        "train_samples": 500,
        "ondisk": True,
        "nprobe": 10
    }
    dd.post_predict(service_name, data_in, parameters_input, parameters_mllib, parameters_output)
    parameters_output = {
        "index": False,
        "build_index": True
    }
    dd.post_predict(service_name, data_in, parameters_input, parameters_mllib, parameters_output)


try:
    for image_index, image in enumerate(images):
        print("parsing", image_index, "/", len(images), image, "\n")
        image = image.replace('../img-dataset/images', '/data')
        data.append(image)
        all_data.append(image)
        if image_index % 5 == 0:
            classify_build(data)
            data = []
    if len(data) > 0:
        classify_build(data)
        data = []
except Exception as e:
    print('could not process image index', image_index, 'image', image)
    print(str(e))

OK, thanks, I'll investigate. However, have you tried the default settings, i.e. not modifying index_type, for instance?

Also, I believe you may want to try to build the index once after your 400 images have passed. Aside from the current issue, for better results.
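Restructured that way, the loop would index every batch and build the index exactly once at the end. A self-contained sketch, with `post_predict` stubbed out so the flow is clear (in real use it would be dd_client's `post_predict` with the input and mllib parameters from the script above):

```python
calls = []

def post_predict(service, data, parameters_output):
    """Stand-in for dd_client's post_predict; records each call."""
    calls.append((dict(parameters_output), list(data)))

def index_all(images, batch_size=5):
    """Index images in batches, then build the FAISS index exactly once
    at the end, instead of rebuilding it after every batch."""
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        post_predict("simsearch", batch,
                     {"index": True, "ondisk": True})
    # single build pass over the whole index
    post_predict("simsearch", images[-1:],
                 {"index": False, "build_index": True})

index_all([f"/data/img_{i}.jpg" for i in range(12)])
```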

OK, thanks, I'll investigate. However, have you tried the default settings, i.e. not modifying index_type for instance ?

It was OK with the default parameters:

{
    "index": true,
    "ondisk": true
}