shinken-monitoring / mod-retention-mongodb

Shinken module for saving retention data from schedulers to a mongodb cluster

Exception in module

mohierf opened this issue

When restarting the scheduler:

2014-03-14 08:58:09,462 [1394783889] Error : The instance MongodbRetention raise an exception [Errno 4] Interrupted system call. I disable it and set it to restart it later
2014-03-14 08:58:09,463 [1394783889] Error : Exception trace follows: Traceback (most recent call last):
 File "/usr/local/lib/python2.7/dist-packages/shinken/scheduler.py", line 368, in hook_point
 f(self)
 File "/var/lib/shinken/modules/retention-mongodb/module.py", line 138, in hook_save_retention
 self.services_fs.delete(key)
 File "/usr/lib/python2.7/dist-packages/gridfs/__init__.py", line 232, in delete
 self.__files.remove({"_id": file_id}, safe=True)
 File "/usr/lib/python2.7/dist-packages/pymongo/collection.py", line 481, in remove
 safe, kwargs, self.__uuid_subtype), safe)
 File "/usr/lib/python2.7/dist-packages/pymongo/connection.py", line 748, in _send_message
 raise AutoReconnect(str(e))
AutoReconnect: [Errno 4] Interrupted system call

Are there any logs on the MongoDB server side?

Nothing, except connections ending:

Fri Mar 14 09:53:40 [conn38] end connection 127.0.0.1:50728
Fri Mar 14 09:53:50 [conn40] end connection 127.0.0.1:50745
Fri Mar 14 09:53:53 [conn39] end connection 127.0.0.1:50735

All the other Mongo modules (logstore, webui) work without any problems.

Is there anything specific to install in order to use GridFS with MongoDB?

Nope, but I have already seen this lately with some MongoDB setups. I don't
currently know how to catch and handle it; maybe there is a parameter to
enable automatic reconnection.
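
On catching and handling this: pymongo documents that `AutoReconnect` must be handled by the caller, and that the operation which raised it has not necessarily succeeded. Below is a minimal sketch of a retry wrapper in Python 3. The names `call_with_retry`, `ConnectionLost` and `flaky_delete` are hypothetical and not part of the module; `ConnectionLost` stands in for `pymongo.errors.AutoReconnect` so that the sketch runs without pymongo installed.

```python
import time


class ConnectionLost(Exception):
    """Stand-in for pymongo.errors.AutoReconnect (assumption, for the demo only)."""


def call_with_retry(func, retry_on, attempts=3, delay=0.5):
    """Call func(); retry when it raises one of the `retry_on` exceptions."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)  # short pause before the next try
    # Every attempt failed: surface the last error to the caller.
    raise last_error


# Demo: a fake delete that fails twice before succeeding.
calls = {"count": 0}


def flaky_delete():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionLost("[Errno 4] Interrupted system call")
    return "deleted"


result = call_with_retry(flaky_delete, retry_on=ConnectionLost, delay=0)
print(result, calls["count"])
```

In the module, the failing call could be wrapped along the lines of `call_with_retry(lambda: self.services_fs.delete(key), retry_on=AutoReconnect)`. Retrying looks reasonable here because deleting by key is idempotent, but whether it actually cures the error in this issue is untested.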

I googled this problem, and I suppose it happens because the scheduler is killed before it finishes storing its data in the DB. What do you think?

For the moment, I am trying to set up Pickle retention instead. My aim is to make my downtimes persistent ;-)

It only receives a QUIT signal; it is then up to the scheduler to save and quit.

Do you have save=false or fsync=false in your MongoDB URI?

My configuration is the default one from shinken.io:

    module_name     MongodbRetention
    module_type     mongodb_retention
    uri             mongodb://localhost/?safe=true
    database        shinken

Put save=false :)

That will avoid the fsync, which can take a long time (and is useless here).

You mean safe=false, not save...

I tested it and it does not change anything :(

arg :(

Can you try restarting your MongoDB server?

No better!

Hmm... I don't know :(

Can you tail the log in debug mode and watch it crash live, to see the
timing and the signal that is sent?

I restarted the scheduler in debug mode and then stopped it. This is what I get:

2014-03-14 18:14:42,711 [1394817282] Debug : Nb Broks send: 1
2014-03-14 18:14:42,711 [1394817282] Debug : Check average = 2 checks/s
2014-03-14 18:14:43,283 [1394817283] Debug : HTTP: calling lock for get_checks
2014-03-14 18:14:43,284 [1394817283] Debug : HTTP: calling lock for get_broks
2014-03-14 18:14:43,324 [1394817283] Debug : HTTP: calling lock for put_results
2014-03-14 18:14:43,618 [1394817283] Debug : HTTP: calling lock for get_checks
2014-03-14 18:14:43,656 [1394817283] Debug : HTTP: calling lock for put_results
2014-03-14 18:14:43,712 [1394817283] Debug : Load: (sleep) 1.00 (average: 1.00) -> 0%
2014-03-14 18:14:43,713 [1394817283] Debug : hook_point: MongodbRetention: False get_new_actions
2014-03-14 18:14:43,713 [1394817283] Debug : Time to send 0 broks (after 0 secs)
2014-03-14 18:14:43,713 [1394817283] Debug : Checks: total 29, scheduled 29, inpoller 0, zombies 0, notifications 13
2014-03-14 18:14:43,714 [1394817283] Debug : Latency (avg/min/max): 1.10/0.00/2.38
2014-03-14 18:14:43,714 [1394817283] Debug : Nb Broks send: 1
2014-03-14 18:14:43,714 [1394817283] Debug : Check average = 2 checks/s
2014-03-14 18:14:44,327 [1394817284] Debug : HTTP: calling lock for get_checks
2014-03-14 18:14:44,328 [1394817284] Debug : HTTP: calling lock for get_broks
2014-03-14 18:14:44,369 [1394817284] Debug : HTTP: calling lock for put_results
2014-03-14 18:14:44,678 [1394817284] Debug : HTTP: calling lock for get_checks
2014-03-14 18:14:44,714 [1394817284] Warning : Received a SIGNAL 15
2014-03-14 18:14:44,714 [1394817284] Debug : I'm process 5015 and I received signal 15
2014-03-14 18:14:44,714 [1394817284] Debug : Load: (sleep) 1.00 (average: 1.00) -> 0%
2014-03-14 18:14:44,714 [1394817284] Debug : hook_point: MongodbRetention: False get_new_actions
2014-03-14 18:14:44,714 [1394817284] Debug : Time to send 0 broks (after 0 secs)
2014-03-14 18:14:44,714 [1394817284] Debug : Checks: total 29, scheduled 29, inpoller 0, zombies 0, notifications 13
2014-03-14 18:14:44,715 [1394817284] Debug : Latency (avg/min/max): 1.10/0.00/2.38
2014-03-14 18:14:44,715 [1394817284] Debug : Check average = 2 checks/s
2014-03-14 18:14:44,715 [1394817284] Debug : hook_point: MongodbRetention: True save_retention
2014-03-14 18:14:44,715 [1394817284] Debug : [MongodbRetention] asking me to update the retention objects
2014-03-14 18:14:44,716 [1394817284] Debug : HTTP: calling lock for put_results
2014-03-14 18:14:45,372 [1394817285] Debug : HTTP: calling lock for get_broks
2014-03-14 18:14:45,373 [1394817285] Debug : HTTP: calling lock for get_checks
2014-03-14 18:14:48,295 [1394817288] Warning : Received a SIGNAL 15
2014-03-14 18:14:48,295 [1394817288] Debug : I'm process 5015 and I received signal 15
2014-03-14 18:14:48,295 [1394817288] Error : The instance MongodbRetention raise an exception [Errno 4] Interrupted system call. I disable it and set it to restart it later
2014-03-14 18:14:48,296 [1394817288] Error : Exception trace follows: Traceback (most recent call last):
 File "/usr/local/lib/python2.7/dist-packages/shinken/scheduler.py", line 368, in hook_point
 f(self)
 File "/var/lib/shinken/modules/retention-mongodb/module.py", line 139, in hook_save_retention
 self.services_fs.delete(key)
 File "/usr/lib/python2.7/dist-packages/gridfs/__init__.py", line 232, in delete
 self.__files.remove({"_id": file_id}, safe=True)
 File "/usr/lib/python2.7/dist-packages/pymongo/collection.py", line 481, in remove
 safe, kwargs, self.__uuid_subtype), safe)
 File "/usr/lib/python2.7/dist-packages/pymongo/connection.py", line 748, in _send_message
 raise AutoReconnect(str(e))
AutoReconnect: [Errno 4] Interrupted system call
2014-03-14 18:14:48,296 [1394817288] Debug : Unlinking /var/run/shinken/schedulerd.pid
2014-03-14 18:14:48,296 [1394817288] Info : [scheduler] Stopping all network connections
2014-03-14 18:14:48,296 [1394817288] Warning : Received a SIGNAL 15
2014-03-14 18:14:48,297 [1394817288] Debug : I'm process 5018 and I received signal 15
2014-03-14 18:14:48,298 [1394817288] Debug : HTTP: calling lock for put_results
2014-03-14 18:14:48,706 [1394817288] Debug : HTTP: calling lock for get_checks
2014-03-14 18:14:48,744 [1394817288] Debug : HTTP: calling lock for put_results
2014-03-14 18:14:49,301 [1394817289] Debug : HTTP: calling lock for get_checks
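
In the log above, process 5015 receives a first signal 15 at 18:14:44 and starts `save_retention`; a second signal 15 arrives at 18:14:48, and the `[Errno 4] Interrupted system call` error is logged in the same millisecond. One possible reading, not confirmed anywhere in this thread, is that the second signal interrupted a blocking socket call made by pymongo; on Python 2.7 such a call fails with EINTR (errno 4) when a signal handler fires. A minimal sketch of one way to avoid that follows, assuming Python 3.3+ on Unix rather than the Python 2.7 used here. It holds the signal pending while a critical section runs. The name `defer_signals` is hypothetical, and the `os.kill` call only simulates the second stop request.

```python
import os
import signal
from contextlib import contextmanager


@contextmanager
def defer_signals(signums):
    """Hold the given signals pending while the body runs (Unix, main thread)."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, signums)
    try:
        yield
    finally:
        # Restoring the old mask lets any pending signal be delivered now.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


received = []
previous_handler = signal.signal(
    signal.SIGTERM, lambda signum, frame: received.append(signum)
)

with defer_signals({signal.SIGTERM}):
    os.kill(os.getpid(), signal.SIGTERM)  # simulate a second "stop" request
    seen_during_save = list(received)     # stand-in for the retention save

seen_after_save = list(received)

# Put the original SIGTERM handler back once the demo is over.
if previous_handler is not None:
    signal.signal(signal.SIGTERM, previous_handler)

print(seen_during_save, seen_after_save)
```

The handler does not run while the signal is blocked, so nothing can interrupt the stand-in save; it runs once the mask is restored. Whether this matches what actually happens inside the Shinken scheduler is an assumption.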

Hello @mohierf,
Do you still have this "old" issue?
Do you want me to investigate?