LinkedDataFragments / Client.js

[DEPRECATED] A JavaScript client for Triple Pattern Fragments interfaces.

Home Page: http://linkeddatafragments.org/

Socket hang up

migalkin opened this issue

Still doing experiments on Fedbench on top of LDF HDT.
The datasets are taken from the LDF website.

Now we observe sporadic crashes of the LDF client with a strange error 'Socket hang up'.
For example, Linkedmdb endpoint:

[Mon Dec 12 2016 19:25:47 GMT+0000 (UTC)] INFO HttpClient Requesting http://linkedmdb-ldh:3000/linkedmdb?subject=http%3A%2F%2Fdata.linkedmdb.org%2Fresource%2Ffilm%2F15996&predicate=http%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23sameAs
[Mon Dec 12 2016 19:25:48 GMT+0000 (UTC)] DEBUG TriplePatternIterator 1823276 155 {"?x":"http://data.linkedmdb.org/resource/film/22246","?genre":"http://data.linkedmdb.org/resource/film_genre/23"} ?x sameAs ?film. 0
[Mon Dec 12 2016 19:25:48 GMT+0000 (UTC)] DEBUG TriplePatternIterator 1823277 199 {"?x":"http://data.linkedmdb.org/resource/film/14817","?genre":"http://data.linkedmdb.org/resource/film_genre/9"} ?x sameAs ?film. 0
[Mon Dec 12 2016 19:25:48 GMT+0000 (UTC)] DEBUG TriplePatternIterator 1823278 165 {"?x":"http://data.linkedmdb.org/resource/film/20875","?genre":"http://data.linkedmdb.org/resource/film_genre/23"} ?x sameAs ?film. 2
[Mon Dec 12 2016 19:25:48 GMT+0000 (UTC)] DEBUG TriplePatternIterator 1823279 161 {"?x":"http://data.linkedmdb.org/resource/film/20927","?genre":"http://data.linkedmdb.org/resource/film_genre/4"} ?x sameAs ?film. 2
[Mon Dec 12 2016 19:25:48 GMT+0000 (UTC)] DEBUG TriplePatternIterator 1823280 161 {"?x":"http://data.linkedmdb.org/resource/film/20927","?genre":"http://data.linkedmdb.org/resource/film_genre/9"} ?x sameAs ?film. 2
events.js:160
      throw er; // Unhandled 'error' event
      ^

Error: socket hang up
    at createHangUpError (_http_client.js:253:15)
    at Socket.socketOnEnd (_http_client.js:345:23)
    at emitNone (events.js:91:20)
    at Socket.emit (events.js:185:7)
    at endReadableNT (_stream_readable.js:974:12)
    at _combinedTickCallback (internal/process/next_tick.js:74:11)
    at process._tickCallback (internal/process/next_tick.js:98:9)

Then, the unified all-in-one Fedbench:

[Tue Dec 13 2016 00:05:33 GMT+0000 (UTC)] DEBUG TriplePatternIterator 4116436 2650228 {"?drug1":"http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB00598","?o":"http://www4.wiwiss.fu-berlin.de/drugbank/resource/targets/556","?drug":"http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB01162","?gr":"http://www4.wiwiss.fu-berlin.de/drugbank/resource/references/8396931","?ps":"\">Alpha-1A adrenergic receptor\nMVFLSGNASDSSNCTQPPAPVNISKAILLGVILGGLILFGVLGNILVILSVACHRHLHSV\nTHYYIVNLAVADLLLTSTVLPFSAIFEVLGYWAFGRVFCNIWAAVDVLCCTASIMGLCII\nSIDRYIGVSYPLRYPTIVTQRRGLMALLCVWALSLVISIGPLFGWRQPAPEDETICQINE\nEPGYVLFSALGSFYLPLAIILVMYCRVYVVAKRESRGLKSGLKTDKSDSEQVTLRIHRKN\nAPAGGSGMASAKTKTHFSVRLLKFSREKKAAKTLGIVVGCFVLCWLPFFLVMPIGSFFPD\nFKPSETVFKIVFWLGYLNSCINPIIYPCSSQEFKKAFQNVLRIQCLCRKQSSKHALGYTL\nHPPSQAVEGQHKDMVRIPVGSRETFYRISKTDGVCEWKFFSSMPRGSARITVSKDQSSCT\nTARVRSKSFLQVCCCVGPSTPSLDKNHQVPTIKVHTISLSENGEEV\"","?sn":"\"ADA1A_HUMAN\"","?hp":"\"00081\"","?mw":"\"51487\"","?l":"\"8p21-p11.2\"","?g":"\"D25235\"","?drug5":"http://www4.wiwiss.fu-berlin.de/sider/resource/drugs/5401"} ?drug5 type Drug. 0
events.js:160
      throw er; // Unhandled 'error' event
      ^

Error: socket hang up
    at createHangUpError (_http_client.js:253:15)
    at Socket.socketCloseListener (_http_client.js:285:23)
    at emitOne (events.js:101:20)
    at Socket.emit (events.js:188:7)
    at TCP._handle.close [as _onclose] (net.js:501:12)

What might be the source of the error?

That happens when the server becomes overloaded. In order to avoid this, I recommend:

  • starting the server with sufficient threads (ldf-server config.json 3000 4 starts the server on port 3000 with 4 threads)
  • putting an HTTP cache in front of the server

TPF has been designed with caching in mind, so a caching server is a must for a good comparison.
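
Apart from the server load, the traces above also show why the whole client process exits: Node.js reports an unhandled 'error' event. The following is a minimal sketch of that mechanism, not the client's actual code. The emitter is a stand-in for an HTTP request object, and emitHangUp is a hypothetical helper name.

```javascript
const { EventEmitter } = require('events');

// Emit a "socket hang up" error on a bare emitter, with or without a listener.
// Returns 'handled' or 'thrown' so that both outcomes can be observed safely.
function emitHangUp(withListener) {
  const request = new EventEmitter(); // stand-in for an http.ClientRequest
  let outcome = 'not emitted';
  if (withListener) {
    request.on('error', () => { outcome = 'handled'; });
  }
  try {
    request.emit('error', new Error('socket hang up'));
  } catch (err) {
    // Without an 'error' listener, emit() throws the error itself.
    // Uncaught, that is what terminates the process in the traces above.
    outcome = 'thrown';
  }
  return outcome;
}

console.log(emitHangUp(false)); // thrown
console.log(emitHangUp(true));  // handled
```

An 'error' listener on the request would keep the process alive, but it would not fix the underlying cause of the dropped connections.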

I'm getting the same error

events.js:183
      throw er; // Unhandled 'error' event
      ^

Error: socket hang up
    at createHangUpError (_http_client.js:331:15)
    at Socket.socketCloseListener (_http_client.js:363:23)
    at emitOne (events.js:121:20)
    at Socket.emit (events.js:211:7)
    at TCP._handle.close [as _onclose] (net.js:554:12)

The server is running on port 3000 with 4 threads on a machine with 4 CPUs and 16 GB of memory. Caching with Apache is enabled (and working).

This always happens after a long client runtime of about 2 hours. I keep monitoring memory and CPU usage but cannot see any overload (I have only one machine, with the server, the client and other processes running in parallel). However, the logging output suggests that requests become slower and slower over time.

Do you see any evidence of a high number of open connections?

I guess

> netstat -s
[...]
Tcp:
    217278 active connections openings
    189717 passive connection openings
    649 failed connection attempts
    1114 connection resets received
    28 connections established
    273149138 segments received
    274075086 segments send out
    396620 segments retransmited
    0 bad segments received.
    5671 resets sent
[...]

The numbers of active and passive connection openings keep increasing while the client is working.

^ That's it. That right there is the main cause of this problem. What I want to figure out now is whether these connections are between Apache and the LDF server, or between the client and Apache. Any insight there?

How can I check this?

Is your client running on a different machine? If so, keep an eye on the connections of that machine.

Otherwise, I wonder whether Apache can give you stats about this. A full netstat view should also show the from and to of the connections.
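
One way to read such a view is to count connections per link and TCP state. Below is a rough sketch that assumes text in the column layout of netstat -tn (proto, recv-q, send-q, local address, foreign address, state). The function name summarizeConnections and the port roles (80 for Apache, 3000 for the LDF server, as used in this thread) are assumptions for illustration.

```javascript
// Count netstat lines per link and TCP state.
// "to X" = this socket is the client side of a connection to service X,
// "at X" = this socket is the server side, owned by service X.
function summarizeConnections(netstatText, roles = { 80: 'apache', 3000: 'ldf-server' }) {
  const counts = {};
  for (const line of netstatText.split('\n')) {
    const fields = line.trim().split(/\s+/);
    // Skip headers and anything that is not a TCP connection line.
    if (fields.length < 6 || !fields[0].startsWith('tcp')) continue;
    const localPort = fields[3].slice(fields[3].lastIndexOf(':') + 1);
    const foreignPort = fields[4].slice(fields[4].lastIndexOf(':') + 1);
    let link;
    if (roles[foreignPort]) link = 'to ' + roles[foreignPort];
    else if (roles[localPort]) link = 'at ' + roles[localPort];
    else continue; // unrelated connection, e.g. ssh
    const key = link + ' ' + fields[5]; // fields[5] is the state column
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

const sample = [
  'tcp   1 0 10.44.3.57:42800 10.44.3.57:3000    CLOSE_WAIT 28575/apache2',
  'tcp   0 0 10.44.3.57:80    10.44.3.57:55014   TIME_WAIT  -',
  'tcp6  0 0 10.44.3.57:3000  10.44.3.57:42806   FIN_WAIT2  -',
  'tcp   0 0 10.44.3.57:22    10.96.15.110:50120 ESTABLISHED 10438/sshd',
].join('\n');
console.log(summarizeConnections(sample));
```

When everything runs on one machine, both ends of each local connection show up, so a single Apache-to-server connection is counted once under "to ldf-server" and once under "at ldf-server".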

The client is running on the same machine. Is this helpful? (The output is from a German locale: VERBUNDEN = ESTABLISHED, aus = off, ein = on.)


Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name Timer
tcp        0      0 10.44.3.57:22           10.96.15.110:50120      VERBUNDEN   10438/sshd: prod [p keepalive (6194,05/0/0)
tcp        0      0 10.44.3.57:22           10.96.15.110:57748      VERBUNDEN   13048/sshd: prod [p keepalive (1000,32/0/0)
tcp        1      0 10.44.3.57:42800        10.44.3.57:3000         CLOSE_WAIT  28575/apache2    aus (0.00/0/0)
tcp        0      0 10.44.3.57:80           10.44.3.57:55014        TIME_WAIT   -                timewait (17,10/0/0)
tcp        1      0 10.44.3.57:42806        10.44.3.57:3000         CLOSE_WAIT  28572/apache2    aus (0.00/0/0)
tcp        0      0 10.44.3.57:42824        10.44.3.57:3000         VERBUNDEN   28567/apache2    aus (0.00/0/0)
tcp        0      0 10.44.3.57:80           10.44.3.57:55012        TIME_WAIT   -                timewait (12,81/0/0)
tcp        0     36 10.44.3.57:22           10.96.15.110:60343      VERBUNDEN   14345/sshd: prod [p ein (0,31/0/0)
tcp        0      0 10.44.3.57:22           10.96.15.110:51483      VERBUNDEN   10939/sshd: prod [p keepalive (2179,97/0/0)
tcp        0      0 10.44.3.57:42822        10.44.3.57:3000         VERBUNDEN   28581/apache2    aus (0.00/0/0)
tcp        0      0 10.44.3.57:80           10.44.3.57:55045        VERBUNDEN   28567/apache2    keepalive (7193,47/0/0)
tcp        0      0 10.44.3.57:80           10.44.3.57:55033        TIME_WAIT   -                timewait (39,21/0/0)
tcp        0      0 10.44.3.57:55039        10.44.3.57:80           VERBUNDEN   29662/node       keepalive (0,40/0/0)
tcp        0      0 10.44.3.57:80           10.44.3.57:55042        VERBUNDEN   29690/apache2    keepalive (7193,47/0/0)
tcp        0      0 10.44.3.57:22           10.96.15.110:60344      VERBUNDEN   14348/sshd: prod [p keepalive (246,66/0/0)
tcp        1      0 10.44.3.57:42811        10.44.3.57:3000         CLOSE_WAIT  28566/apache2    aus (0.00/0/0)
tcp        0      0 10.44.3.57:80           10.44.3.57:55023        TIME_WAIT   -                timewait (24,08/0/0)
tcp        0      0 10.44.3.57:22           10.96.15.110:51479      VERBUNDEN   10937/sshd: prod [p keepalive (2573,18/0/0)
tcp        0      0 10.44.3.57:22           10.96.15.110:60014      VERBUNDEN   14042/sshd: prod [p keepalive (6570,88/0/0)
tcp        1      0 10.44.3.57:42812        10.44.3.57:3000         CLOSE_WAIT  29693/apache2    aus (0.00/0/0)
tcp        0      0 10.44.3.57:80           10.44.3.57:55036        TIME_WAIT   -                timewait (44,60/0/0)
tcp        0      0 10.44.3.57:42818        10.44.3.57:3000         VERBUNDEN   28199/apache2    aus (0.00/0/0)
tcp        0      0 10.44.3.57:42814        10.44.3.57:3000         VERBUNDEN   29683/apache2    aus (0.00/0/0)
tcp        0      0 10.44.3.57:55045        10.44.3.57:80           VERBUNDEN   29662/node       keepalive (0,52/0/0)
tcp        0      0 10.44.3.57:80           10.44.3.57:55024        TIME_WAIT   -                timewait (24,25/0/0)
tcp        0      0 10.44.3.57:80           10.44.3.57:55020        TIME_WAIT   -                timewait (19,50/0/0)
tcp        0      0 10.44.3.57:80           10.44.3.57:55031        TIME_WAIT   -                timewait (38,73/0/0)
tcp        0      0 10.44.3.57:80           10.44.3.57:55035        TIME_WAIT   -                timewait (46,38/0/0)
tcp        0      0 10.44.3.57:22           10.96.15.110:57777      VERBUNDEN   13179/sshd: prod [p keepalive (1262,46/0/0)
tcp        0      0 10.44.3.57:55047        10.44.3.57:80           VERBUNDEN   29662/node       keepalive (0,53/0/0)
tcp        1      0 10.44.3.57:42801        10.44.3.57:3000         CLOSE_WAIT  29694/apache2    aus (0.00/0/0)
tcp        0      0 10.44.3.57:80           10.44.3.57:55041        VERBUNDEN   28199/apache2    keepalive (7193,47/0/0)
tcp        0      0 10.44.3.57:80           10.44.3.57:55047        VERBUNDEN   28581/apache2    keepalive (7193,47/0/0)
tcp        0      0 10.44.3.57:80           10.44.3.57:55039        VERBUNDEN   29683/apache2    keepalive (7193,47/0/0)
tcp        0      0 10.44.3.57:22           10.96.15.110:57750      VERBUNDEN   13050/sshd: prod [p keepalive (836,48/0/0)
tcp        0      0 10.44.3.57:22           10.96.15.110:60015      VERBUNDEN   14045/sshd: prod [p keepalive (6194,05/0/0)
tcp        1      0 10.44.3.57:42795        10.44.3.57:3000         CLOSE_WAIT  28408/apache2    aus (0.00/0/0)
tcp        0      0 10.44.3.57:55041        10.44.3.57:80           VERBUNDEN   29662/node       keepalive (0,53/0/0)
tcp        1      0 10.44.3.57:42803        10.44.3.57:3000         CLOSE_WAIT  29684/apache2    aus (0.00/0/0)
tcp        1      0 10.44.3.57:42808        10.44.3.57:3000         CLOSE_WAIT  28414/apache2    aus (0.00/0/0)
tcp        0      0 10.44.3.57:55042        10.44.3.57:80           VERBUNDEN   29662/node       keepalive (0,40/0/0)
tcp        0      0 10.44.3.57:42823        10.44.3.57:3000         VERBUNDEN   29690/apache2    aus (0.00/0/0)
tcp        0      0 10.44.3.57:22           10.96.15.110:57778      VERBUNDEN   13181/sshd: prod [p keepalive (803,71/0/0)
tcp        0      0 10.44.3.57:80           10.44.3.57:55028        TIME_WAIT   -                timewait (34,12/0/0)
tcp        0      0 10.44.3.57:22           10.96.15.110:50116      VERBUNDEN   10435/sshd: prod [p keepalive (6701,95/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42791        TIME_WAIT   -                timewait (40,27/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42806        FIN_WAIT2   -                timewait (43,35/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42808        FIN_WAIT2   -                timewait (43,70/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42801        FIN_WAIT2   -                timewait (27,60/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42790        TIME_WAIT   -                timewait (37,34/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42814        VERBUNDEN   19130/node       aus (0.00/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42822        VERBUNDEN   19130/node       aus (0.00/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42812        FIN_WAIT2   -                timewait (48,03/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42773        TIME_WAIT   -                timewait (24,77/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42811        FIN_WAIT2   -                timewait (48,71/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42784        TIME_WAIT   -                timewait (24,50/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42824        VERBUNDEN   19136/node       aus (0.00/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42800        FIN_WAIT2   -                timewait (28,83/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42796        TIME_WAIT   -                timewait (45,80/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42774        TIME_WAIT   -                timewait (17,44/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42770        TIME_WAIT   -                timewait (20,61/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42817        TIME_WAIT   -                timewait (57,22/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42795        FIN_WAIT2   -                timewait (23,20/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42820        TIME_WAIT   -                timewait (57,88/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42818        VERBUNDEN   19136/node       aus (0.00/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42803        FIN_WAIT2   -                timewait (38,42/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42799        TIME_WAIT   -                timewait (12,54/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42823        VERBUNDEN   19134/node       aus (0.00/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42792        TIME_WAIT   -                timewait (15,49/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42771        TIME_WAIT   -                timewait (14,75/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42789        TIME_WAIT   -                timewait (43,67/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42762        TIME_WAIT   -                timewait (12,85/0/0)
tcp6       0      0 10.44.3.57:3000         10.44.3.57:42804        TIME_WAIT   -                timewait (47,92/0/0)


Thanks! I see lots of connections between Apache and the LDF server. Apache should recycle connections, so I wonder why that's not happening, and whether the problem is on the Apache side or the LDF server side.

I wonder whether the client is to blame, i.e., whether the same thing would also occur if the identical series of requests was made through curl. We need to run some tests on this.

Looks like caching was my problem. After I disabled caching, the error no longer occurs. While the cache was being filled by hundreds of thousands of requests, htcacheclean wasn't able to keep memory/disk usage at a certain level (maybe because the cache expiration value was too high):

[Tue Mar 06 10:26:20.790165 2018] [cache_disk:warn] [pid 28879] (28)No space left on device: [client 10.44.3.57:40287] AH00721: could not create vary file /srv/zdb/cache/aptmpQy5pny

This could mean that I was running out of inodes, which I suppose has something to do with CacheDirLength and CacheDirLevels.
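
To get a feel for how CacheDirLength and CacheDirLevels relate to inode usage, here is a back-of-the-envelope sketch. It assumes that each character of a cache directory name comes from a 64-symbol alphabet. That alphabet size is an assumption and should be checked against the mod_cache_disk documentation. maxCacheDirectories is a hypothetical helper, not part of Apache.

```javascript
// Upper bound on the number of directories a disk cache tree could contain.
// dirLevels ~ CacheDirLevels, dirLength ~ CacheDirLength.
// alphabetSize = 64 is an assumption about how directory names are formed.
function maxCacheDirectories(dirLevels, dirLength, alphabetSize = 64) {
  const namesPerLevel = Math.pow(alphabetSize, dirLength);
  let total = 0;
  let atThisDepth = 1;
  for (let level = 1; level <= dirLevels; level++) {
    atThisDepth *= namesPerLevel; // directories possible at this depth
    total += atThisDepth;         // each directory costs one inode
  }
  return total;
}

console.log(maxCacheDirectories(2, 2)); // 16781312
console.log(maxCacheDirectories(2, 1)); // 4160
```

The cached files themselves need inodes on top of this, so comparing such a bound with the free inode count reported by df -i gives only a rough idea of whether the settings fit the file system.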

Interesting. Thanks for sharing.

This project has now been deprecated in favor of Comunica, where this should not be a problem anymore. If it is, feel free to open a new issue there.