regel / loudml

Loud ML is the first open-source AI solution for ICT and IoT automation

[ERROR] docker image 1.6.0 : "pkg_resources.DistributionNotFound: The 'pycrypto>=2.6.1' distribution was not found and is required by loudml"

toni-moreno opened this issue

Hello @regel

After creating a model and running it, this error appeared in the Docker log output, and no data was written to the output database. Any idea what to do?

loudml_1   | 172.20.0.3 - - [2020-07-28 07:13:32] "GET /models/linux_metrics_cpu_mean_usage_system_host_myhost_time_5m HTTP/1.1" 200 880 0.002119
loudml_1   | INFO:schedule:Running job Every 60.0 seconds do daemon_exec_scheduled_job('_eval(linux_metrics_cpu_mean_usage_system_host_myhost_time_5m)') (last run: 2020-07-28 07:12:37, next run: 2020-07-28 07:13:37)
loudml_1   | 
loudml_1   | Traceback (most recent call last):
loudml_1   |   File "/opt/vendor/lib/python3.5/site-packages/loudml/server.py", line 105, in wrapper
loudml_1   |     return job_func(*args, **kwargs)
loudml_1   |   File "/opt/vendor/lib/python3.5/site-packages/loudml/server.py", line 178, in daemon_exec_scheduled_job
loudml_1   |     'loudmld {}'.format(pkg_resources.require("loudml")[0].version)
loudml_1   |   File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 963, in require
loudml_1   |     needed = self.resolve(parse_requirements(requirements))
loudml_1   |   File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 849, in resolve
loudml_1   |     raise DistributionNotFound(req, requirers)
loudml_1   | pkg_resources.DistributionNotFound: The 'pycrypto>=2.6.1' distribution was not found and is required by loudml
loudml_1   | 
loudml_1   | [the same traceback repeats 11 more times]
loudml_1   | 172.20.0.3 - - [2020-07-28 07:13:49] "GET /models/linux_metrics_cpu_mean_usage_system_host_myhost_time_5m HTTP/1.1" 200 880 0.002227
loudml_1   | INFO:schedule:Running job Every 1 minute do daemon_clear_jobs() (last run: 2020-07-28 07:13:04, next run: 2020-07-28 07:14:04)
loudml_1   | 172.20.0.3 - - [2020-07-28 07:14:05] "GET /models/linux_metrics_cpu_mean_usage_system_host_myhost_time_5m HTTP/1.1" 200 880 0.002722
loudml_1   | 172.20.0.3 - - [2020-07-28 07:14:19] "GET /models/linux_metrics_cpu_mean_usage_system_host_myhost_time_5m HTTP/1.1" 200 880 0.002560
loudml_1   | 172.20.0.3 - - [2020-07-28 07:14:33] "GET /models/linux_metrics_cpu_mean_usage_system_host_myhost_time_5m HTTP/1.1" 200 880 0.001884
loudml_1   | INFO:schedule:Running job Every 60.0 seconds do daemon_exec_scheduled_job('_eval(linux_metrics_cpu_mean_usage_system_host_myhost_time_5m)') (last run: 2020-07-28 07:13:38, next run: 2020-07-28 07:14:38)
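
A scheduler log like this one repeats the same traceback on every retry, which makes it hard to see whether more than one dependency is missing. Below is a minimal sketch for summarising such output. It assumes the exact `DistributionNotFound` message wording shown in this log, and `summarize_missing` is a hypothetical helper name, not part of Loud ML.

```python
import re
from collections import Counter

# Matches the message format seen in the log above, e.g.
# "pkg_resources.DistributionNotFound: The 'pycrypto>=2.6.1' distribution
#  was not found and is required by loudml"
PATTERN = re.compile(
    r"DistributionNotFound: The '(?P<req>[^']+)' distribution was not found "
    r"and is required by (?P<by>\S+)"
)


def summarize_missing(log_lines):
    """Count each (missing requirement, requiring package) pair in the log."""
    counts = Counter()
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group("req"), match.group("by"))] += 1
    return counts


sample = [
    "loudml_1   | Traceback (most recent call last):",
    "loudml_1   | pkg_resources.DistributionNotFound: The 'pycrypto>=2.6.1' "
    "distribution was not found and is required by loudml",
]
for (req, by), n in summarize_missing(sample).items():
    print("{} (required by {}): seen {} time(s)".format(req, by, n))
```

Any iterable of lines works as input, for example a file object opened on saved `docker logs` output.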

This is the model info.

> version
1.6.0
> list-models
linux_metrics_cpu_mean_usage_system_host_myhost_time_5m
> show-model linux_metrics_cpu_mean_usage_system_host_myhost_time_5m
- settings:
    bucket_interval: 5m
    default_bucket: myhost_linux
    features:
    - default: 0
      field: usage_system
      io: io
      match_all:
      - tag: host
        value: myhost
      measurement: cpu
      metric: mean
      name: mean_usage_system
    grace_period: 0
    interval: 60s
    max_evals: 10
    max_threshold: 0
    min_threshold: 0
    name: linux_metrics_cpu_mean_usage_system_host_myhost_time_5m
    offset: 10s
    run:
      flag_abnormal_data: true
      output_bucket: myhost_loudml
      save_output_data: true
    seasonality:
      daytime: false
      weekday: false
    span: 100
    type: donut
  training:
    job_id: fdb8d872-865d-4cdf-912a-1625a214fc54
    progress:
      eval: 10
      max_evals: 10
    state: done
> list-buckets 
myhost_linux
myhost_loudml
> show-bucket myhost_loudml
- addr: X.X.X.X:8086
  annotation_db: loudml_annotations
  create_database: false
  database: loudml_metrics
  dbuser: loudml_user
  measurement: loudml
  name: myhost_loudml
  retention_policy: autogen
  type: influxdb
  use_ssl: true
  verify_ssl: false

Hello @regel, I've tested again on a new server with the loudml:1.6.0 image and also with today's loudml:nightly image; the error persists in both.

To help, I've found a workaround (no need to change the image): installing some basic Python packages as root directly inside the running container.

$ docker exec -it -u 0 7e011d7c0881  bash
root@7e011d7c0881:/#  apt-get update && apt-get install -y python3-pip python3-setuptools python3-dev && apt-get install -y --no-install-recommends build-essential gcc git && apt-get purge -y

No restart needed! The errors immediately disappeared from the log and loudml began to write to the output database.

Oops. Very good catch. Thanks Toni. Something is odd in the build. I'm patching the Dockerfile.
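
Separately from the Dockerfile fix, the first traceback shows the scheduled job failing only while it formats a version string with `pkg_resources.require("loudml")`, a call that also resolves every declared dependency of loudml. A hypothetical, more tolerant lookup could look like the sketch below. This is not the project's actual code, `installed_version` is an invented name, and the claim that `get_distribution` skips dependency resolution for an installed package reflects my understanding of setuptools rather than anything confirmed in this thread.

```python
def installed_version(dist_name, default="unknown"):
    """Best-effort version lookup that never raises."""
    try:
        # Imported lazily so that a missing setuptools is tolerated too.
        import pkg_resources

        # Assumption: unlike require(), this returns a distribution already
        # present in the working set without resolving its dependencies.
        return pkg_resources.get_distribution(dist_name).version
    except Exception:
        # Covers ImportError, DistributionNotFound and similar failures.
        return default


# Mirrors the string built at server.py line 178 in the traceback.
user_agent = "loudmld {}".format(installed_version("loudml"))
print(user_agent)
```

With this shape, a packaging problem would degrade the version string instead of aborting every scheduled evaluation, although the missing dependency would still need fixing in the image.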

Solved. Toni, see the above patches and the new Dockerfile in the develop branch if you need to build a local image.

I will tag a new 1.6 release at the end of the month.

Hello @regel, thanks a lot for this fix.
I've built a new image and pushed it here if you want to test it: tonimoreno/loudml:1.6.0

But when I restarted the service with the new image, this error appeared. Can you help me understand what I did wrong?

Attaching to loudml-poc_loudml_1
loudml_1    | /opt/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
loudml_1    |   _np_qint8 = np.dtype([("qint8", np.int8, 1)])
loudml_1    | /opt/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
loudml_1    |   _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
loudml_1    | /opt/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
loudml_1    |   _np_qint16 = np.dtype([("qint16", np.int16, 1)])
loudml_1    | /opt/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
loudml_1    |   _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
loudml_1    | /opt/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
loudml_1    |   _np_qint32 = np.dtype([("qint32", np.int32, 1)])
loudml_1    | /opt/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
loudml_1    |   np_resource = np.dtype([("resource", np.ubyte, 1)])
loudml_1    | INFO:root:restarting job for model 'swarm@cpu@10percentile@usage_active@host_worker2_cpu_cpu-total@time@5m'
loudml_1    | INFO:root:restarting job for model 'swarm@cpu@90percentile@usage_active@host_worker2_cpu_cpu-total@time@5m'
loudml_1    | INFO:root:restarting job for model 'swarm@cpu@95percentile@usage_active@host_worker2_cpu_cpu-total@time@5m'
loudml_1    | INFO:root:restarting job for model 'swarm@cpu@mean@usage_active@host_worker2_cpu_cpu-total@time@10m'
loudml_1    | INFO:root:restarting job for model 'swarm@cpu@mean@usage_active@host_worker2_cpu_cpu-total@time@1m'
loudml_1    | INFO:root:restarting job for model 'swarm@cpu@mean@usage_active@host_worker2_cpu_cpu-total@time@30m'
loudml_1    | INFO:root:restarting job for model 'swarm@cpu@mean@usage_active@host_worker2_cpu_cpu-total@time@5m'
loudml_1    | INFO:root:starting Loud ML server on 0.0.0.0:8077
loudml_1    | 192.168.48.3 - - [2020-08-05 05:17:55] "GET /models/linux_metrics_cpu_mean_usage_system_host_telegraf_time_5m HTTP/1.1" 404 193 0.001249
loudml_1    | 192.168.48.3 - - [2020-08-05 05:18:10] "GET /models/linux_metrics_cpu_mean_usage_system_host_telegraf_time_5m HTTP/1.1" 404 193 0.000694
loudml_1    | 192.168.48.3 - - [2020-08-05 05:18:25] "GET /models/linux_metrics_cpu_mean_usage_system_host_telegraf_time_5m HTTP/1.1" 404 193 0.000983
loudml_1    | 192.168.48.3 - - [2020-08-05 05:18:40] "GET /models/linux_metrics_cpu_mean_usage_system_host_telegraf_time_5m HTTP/1.1" 404 193 0.000804
loudml_1    | INFO:schedule:Running job Every 1 minute do daemon_clear_jobs() (last run: [never], next run: 2020-08-05 05:18:53)
loudml_1    | INFO:schedule:Running job Every 60.0 seconds do daemon_exec_scheduled_job('_eval(swarm@cpu@10percentile@usage_active@host_worker2_cpu_cpu-total@time@5m)') (last run: [never], next run: 2020-08-05 05:18:53)
loudml_1    | INFO:root:job[0be19343-c409-4db5-af7f-540f4475efee] starting, nice=0
loudml_1    | INFO:root:predict(swarm@cpu@10percentile@usage_active@host_worker2_cpu_cpu-total@time@5m) range=2020-08-05T05:15:00.000Z-2020-08-05T05:20:00.000Z
loudml_1    | XXX lineno: 115, opcode: 0
loudml_1    | ERROR:root:unknown opcode
loudml_1    | Traceback (most recent call last):
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/worker.py", line 53, in run
loudml_1    |     res = getattr(self, func_name)(*args, **kwargs)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/worker.py", line 243, in predict
loudml_1    |     **kwargs
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 1594, in predict2
loudml_1    |     num_gpus=num_gpus,
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 1208, in predict
loudml_1    |     self.load(num_cpus, num_gpus)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 1147, in load
loudml_1    |     self._keras_model = _load_keras_model(self._state.get('h5py'))
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 247, in _load_keras_model
loudml_1    |     keras_model = load_model(path, compile=False)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/saving.py", line 234, in load_model
loudml_1    |     model = model_from_config(model_config, custom_objects=custom_objects)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/saving.py", line 324, in model_from_config
loudml_1    |     return deserialize(config, custom_objects=custom_objects)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/layers/serialization.py", line 74, in deserialize
loudml_1    |     printable_module_name='layer')
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/utils/generic_utils.py", line 192, in deserialize_keras_object
loudml_1    |     list(custom_objects.items())))
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 1273, in from_config
loudml_1    |     process_node(layer, node_data)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 1233, in process_node
loudml_1    |     layer(input_tensors, **kwargs)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in __call__
loudml_1    |     outputs = self.call(inputs, *args, **kwargs)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/layers/core.py", line 743, in call
loudml_1    |     return self.function(inputs, **arguments)
loudml_1    |   File "/opt/vendor/lib/python3.5/site-packages/loudml/donut.py", line 115, in sampling
loudml_1    |     z_mean, z_log_var = args
loudml_1    | SystemError: unknown opcode
loudml_1    | ERROR:root:job[0be19343-c409-4db5-af7f-540f4475efee] failed: unknown opcode
loudml_1    | [2020-08-05 05:18:54,323] ERROR in app: Exception on /models/swarm@cpu@10percentile@usage_active@host_worker2_cpu_cpu-total@time@5m/_eval [POST]
loudml_1    | pebble.common.RemoteTraceback: Traceback (most recent call last):
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/pebble/common.py", line 174, in process_execute
loudml_1    |     return function(*args, **kwargs)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/worker.py", line 351, in run
loudml_1    |     return g_worker.run(job_id, nice, func_name, *args, **kwargs)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/worker.py", line 58, in run
loudml_1    |     raise exn
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/worker.py", line 53, in run
loudml_1    |     res = getattr(self, func_name)(*args, **kwargs)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/worker.py", line 243, in predict
loudml_1    |     **kwargs
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 1594, in predict2
loudml_1    |     num_gpus=num_gpus,
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 1208, in predict
loudml_1    |     self.load(num_cpus, num_gpus)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 1147, in load
loudml_1    |     self._keras_model = _load_keras_model(self._state.get('h5py'))
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 247, in _load_keras_model
loudml_1    |     keras_model = load_model(path, compile=False)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/saving.py", line 234, in load_model
loudml_1    |     model = model_from_config(model_config, custom_objects=custom_objects)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/saving.py", line 324, in model_from_config
loudml_1    |     return deserialize(config, custom_objects=custom_objects)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/layers/serialization.py", line 74, in deserialize
loudml_1    |     printable_module_name='layer')
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/utils/generic_utils.py", line 192, in deserialize_keras_object
loudml_1    |     list(custom_objects.items())))
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 1273, in from_config
loudml_1    |     process_node(layer, node_data)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 1233, in process_node
loudml_1    |     layer(input_tensors, **kwargs)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in __call__
loudml_1    |     outputs = self.call(inputs, *args, **kwargs)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/layers/core.py", line 743, in call
loudml_1    |     return self.function(inputs, **arguments)
loudml_1    |   File "/opt/vendor/lib/python3.5/site-packages/loudml/donut.py", line 115, in sampling
loudml_1    |     z_mean, z_log_var = args
loudml_1    | SystemError: unknown opcode
loudml_1    | 
loudml_1    | 
loudml_1    | The above exception was the direct cause of the following exception:
loudml_1    | 
loudml_1    | Traceback (most recent call last):
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/flask/app.py", line 2446, in wsgi_app
loudml_1    |     response = self.full_dispatch_request()
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/flask/app.py", line 1951, in full_dispatch_request
loudml_1    |     rv = self.handle_user_exception(e)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/flask_restful/__init__.py", line 269, in error_router
loudml_1    |     return original_handler(e)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/flask/app.py", line 1820, in handle_user_exception
loudml_1    |     reraise(exc_type, exc_value, tb)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
loudml_1    |     raise value
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
loudml_1    |     rv = self.dispatch_request()
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/flask/app.py", line 1935, in dispatch_request
loudml_1    |     return self.view_functions[rule.endpoint](**req.view_args)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/server.py", line 1602, in model_eval
loudml_1    |     return jsonify(job.result())
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/server.py", line 393, in result
loudml_1    |     return self._future.result()
loudml_1    |   File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 435, in result
loudml_1    |     return self.__get_result()
loudml_1    |   File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
loudml_1    |     raise self._exception
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/server.py", line 372, in _done_cb
loudml_1    |     self._result = self._future.result()
loudml_1    |   File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
loudml_1    |     return self.__get_result()
loudml_1    |   File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
loudml_1    |     raise self._exception
loudml_1    | SystemError: unknown opcode
loudml_1    | ERROR:loudml.server:Exception on /models/swarm@cpu@10percentile@usage_active@host_worker2_cpu_cpu-total@time@5m/_eval [POST]
loudml_1    | [same pebble.common.RemoteTraceback and Flask traceback as logged just above, ending in SystemError: unknown opcode]
loudml_1    | 127.0.0.1 - - [2020-08-05 05:18:54] "POST /models/swarm@cpu@10percentile@usage_active@host_worker2_cpu_cpu-total@time@5m/_eval?output_bucket=test-loudml&flag_abnormal_data=True&save_output_data=True&from=1596604664&to=1596604724 HTTP/1.1" 500 156 0.260177
loudml_1    | ERROR:root:error executing scheduled job '_eval(swarm@cpu@10percentile@usage_active@host_worker2_cpu_cpu-total@time@5m)':INTERNAL SERVER ERROR
loudml_1    | INFO:schedule:Running job Every 60.0 seconds do daemon_exec_scheduled_job('_eval(swarm@cpu@90percentile@usage_active@host_worker2_cpu_cpu-total@time@5m)') (last run: [never], next run: 2020-08-05 05:18:53)
loudml_1    | INFO:root:job[4cb5ec63-e3f0-475d-8075-bbdc3bb38264] starting, nice=0
loudml_1    | INFO:root:predict(swarm@cpu@90percentile@usage_active@host_worker2_cpu_cpu-total@time@5m) range=2020-08-05T05:15:00.000Z-2020-08-05T05:20:00.000Z
loudml_1    | XXX lineno: 115, opcode: 0
loudml_1    | ERROR:root:unknown opcode

Hi Toni. Interesting finding. I upgraded the Python version to 3.7. The serialisation format of the saved model is probably different in this version, causing 'load_model' to fail.

What if you delete the model state and re-train the model? Does that solve the issue?
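
The `SystemError: unknown opcode` is raised inside `sampling`, whose file path in the traceback still points at `/opt/vendor/lib/python3.5/...` although everything else runs from the Python 3.7 venv. That is consistent with the saved model embedding the function's compiled bytecode: as far as I know, Keras stores `Lambda` layer functions by marshalling their code objects, and bytecode is specific to a Python version. The sketch below illustrates the general idea with the standard library only. `dump_function` and `load_function` are hypothetical helpers, not Loud ML or Keras APIs, and the body of `sampling` is a stand-in.

```python
import marshal
import sys
import types


def sampling(args):
    # Stand-in for a Lambda-layer function; only the unpacking line
    # mirrors the one shown in the traceback.
    z_mean, z_log_var = args
    return z_mean + z_log_var


def dump_function(func):
    """Serialise a function's bytecode with the Python version that built it."""
    return {
        "python": tuple(sys.version_info[:2]),
        "code": marshal.dumps(func.__code__),
    }


def load_function(payload):
    """Rebuild the function, refusing bytecode from another Python version."""
    saved = tuple(payload["python"])
    current = tuple(sys.version_info[:2])
    if saved != current:
        raise RuntimeError(
            "bytecode was saved under Python {}.{} but this is Python {}.{}; "
            "re-train or re-export the model".format(*(saved + current))
        )
    code = marshal.loads(payload["code"])
    return types.FunctionType(code, globals())


payload = dump_function(sampling)
restored = load_function(payload)
print(restored((1.0, 2.0)))
```

Recording the interpreter version next to the bytecode turns an obscure `unknown opcode` failure into an explicit message, which is the same outcome the suggestion to delete the state and re-train aims for.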

Same here!

Using @toni's fix seems to work 👍

$ docker exec -it -u 0 7e011d7c0881 bash
root@7e011d7c0881:/# apt-get update && apt-get install -y python3-pip python3-setuptools python3-dev && apt-get install -y --no-install-recommends build-essential gcc git && apt-get purge -y

Hi, I was wondering if this ever got resolved and included in the final release? If I use "FROM loudml/loudml:1.6.0" in my Dockerfile, I still get this error.

I had to create my own Docker image like this:

Dockerfile

FROM loudml/loudml:latest

# SHELL ["/bin/bash", "-o", "pipefail", "-c"]
# Switch to root: the base image's default user cannot install packages.
USER 0

# Workaround for https://github.com/regel/loudml/issues/370
RUN apt-get update && \
    apt-get install -y \
    python3-pip python3-setuptools \
    python3-dev && \
    apt-get install -y --no-install-recommends \
    build-essential gcc git && \
    apt-get purge -y

ENTRYPOINT ["loudmld"]

Note: USER 0 is needed because, for some reason, the base image runs as uid 1001, which doesn't have permission to install dependencies.

Then in docker-compose:

    # image: loudml/loudml:1.6.0
    build: .
    container_name: loudml