jmcvetta / neoism

Neo4j client for Golang

Support for streaming in Neo4j 2.0?

bunkat opened this issue · comments

commented

Do you have plans to support the new streaming interface for the REST API?

http://docs.neo4j.org/chunked/snapshot/rest-api-streaming.html

No current plans, but it would be nice to have. Any thoughts on the best way to implement?

If you have a client able to fund development, I currently do have bandwidth to work on it. But it's not a personal priority at the moment.

commented

Looks like there isn't much magic to it.

req, err := http.NewRequest("POST", url, bytes.NewReader(params))
if err != nil {
    return err
}
req.Header.Add("Content-Type", "application/json")
req.Header.Add("Accept", "application/json")
req.Header.Add("X-Stream", "true") // this is the new header to add

resp, err := db.client.Do(req)
if err != nil {
    return err
}
defer resp.Body.Close()

body, err := ioutil.ReadAll(resp.Body)

Basically this sets the transfer encoding to chunked, which the Go HTTP client already handles. In my limited testing, CPU usage was much lower, and returning 75,000 nodes saw about a 25% speedup. Returning a small number of nodes is a tiny bit slower, though.
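
As a quick sanity check that the server really switches to chunked transfer encoding once X-Stream is set, a minimal helper like this (a sketch using only net/http, not part of neoism) would do:

// isChunked reports whether the response arrived with chunked transfer
// encoding, i.e. the body can be consumed as it streams in rather than
// after a Content-Length's worth of data has been buffered.
func isChunked(resp *http.Response) bool {
    for _, enc := range resp.TransferEncoding {
        if enc == "chunked" {
            return true
        }
    }
    return false
}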

Since I only use the Cypher interface, I wrote a basic client using net/http directly to test out the streaming stuff. I did a quick benchmark, and using Neoism is about 300% slower for the larger queries I was running (25% of which is due to not streaming) and about 35% slower on small queries. I didn't look into it too much, but I'm guessing it must be the extra JSON marshaling.

It would probably help if the JSON parser were streaming-capable as well, but I think the results would actually need to be in a different format to take real advantage of that: the ability to unmarshal each data record as it comes in and invoke a callback, or something along those lines.
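
For illustration, here is a rough sketch of that idea using encoding/json's streaming Decoder (Token and More, available in newer Go releases). It assumes the records arrive as a single top-level JSON array, which is not quite the shape of the current Cypher response envelope, and handleRow is a hypothetical callback:

// decodeStream reads a JSON array from r one element at a time and calls
// handleRow for each record, without buffering the whole body first.
// Sketch only; this is not how neoism currently works.
func decodeStream(r io.Reader, handleRow func(map[string]interface{})) error {
    dec := json.NewDecoder(r)
    if _, err := dec.Token(); err != nil { // consume the opening '['
        return err
    }
    for dec.More() {
        var row map[string]interface{}
        if err := dec.Decode(&row); err != nil {
            return err
        }
        handleRow(row) // handle each record as it arrives
    }
    _, err := dec.Token() // consume the closing ']'
    return err
}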

commented

Agreed. Obviously ioutil.ReadAll waits for the last chunk of data to be received before the body gets parsed. To get true end-to-end streaming, I think you would need to write a Neo4j extension that sends the data over the wire in a different format, such that each chunk represents a valid JSON document. Even with that limitation, though, chunking still definitely helps, since the data is not getting queued up on both the server and the client.
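
To make that concrete: if a hypothetical extension emitted each record as its own complete JSON document, back to back, the client side would be simple, since json.Decoder already consumes a stream of concatenated values (again just a sketch against an imagined wire format, not anything Neo4j ships today):

// readRecords decodes a stream of back-to-back JSON documents, one record
// each, until EOF, handing each one off as soon as it is decoded.
func readRecords(r io.Reader, handle func(map[string]interface{})) error {
    dec := json.NewDecoder(r)
    for {
        var rec map[string]interface{}
        err := dec.Decode(&rec)
        if err == io.EOF {
            return nil
        }
        if err != nil {
            return err
        }
        handle(rec)
    }
}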

Making a cypher extension that does that actually doesn't seem like a hard problem... hmm. brb.

Most likely it is in fact mostly the JSON parsing that is making neoism slower. The way neoism handles unmarshaling is not particularly efficient - definitely room for improvement. However, for insert queries you can skip the JSON handling entirely by leaving CypherQuery.Result nil.
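
For example, something along these lines (a sketch; the Cypher statement and values are made up) skips unmarshaling entirely for a write-only query:

cq := neoism.CypherQuery{
    Statement:  "CREATE (n {name: {name}})",
    Parameters: neoism.Props{"name": "Alice"},
    // Result is left nil, so the response body is never unmarshaled.
}
err := db.Cypher(&cq)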

Currently there is no way to tell restclient to use a default set of headers. So adding streaming support would require a bunch of not-too-pretty copypasta. Alternatively, I've been thinking about a partial rewrite of restclient (probably under a less generic name) to make it more closely resemble Python's requests library, which does allow setting default headers.

The new underlying HTTP client, napping, allows streaming to be enabled globally:

h := http.Header{}
h.Add("X-Stream", "true")
db, _ := neoism.Connect("http://localhost:7474/db/data")
db.Session.Header = &h

In Neo4j 2.0.0-M03 this causes some behavior changes in the transactional endpoint, inducing test failures. Not sure how this will affect 2.0.0-M05. Will investigate that once #26 is resolved.

M03 may have a streaming bug; there was one in 1.9.2 as well. Check whether it is fixed in M05 before troubleshooting, I think.

Also you should add:

h.Add("User-Agent", "neoism")

Added a User-Agent header by default in #29. So now we want to add the streaming header to the existing Session.Header rather than creating a new header:

db, _ := neoism.Connect("http://localhost:7474/db/data")
db.Session.Header.Add("X-Stream", "true")

Batched Cypher operations return a different data format when streaming is set: http://docs.neo4j.org/chunked/2.0.0-M05/rest-api-batch-ops.html#rest-api-execute-multiple-operations-in-batch-streaming

It is awkward to handle two different result formats. Is there a use case where it is desirable to execute batch operations without streaming?

I would say there's no use case for not streaming. I'd be curious to know how much faster streaming is.

commented

I did a few quick tests before and found that streaming is slightly (1-2% on my configuration) slower if you are returning a very small number of results. However, the speedup when returning larger sets and the overall reduced CPU usage are always worth it in my opinion, so I've turned it on for all of my queries.

Made some improvements to the node chain tx benchmark, so it returns more data than before. Not seeing a significant difference with streaming.

WITHOUT streaming:

$ time for i in {1..3}; do go test -run X -bench BenchmarkNodeChainTx; done
PASS
BenchmarkNodeChainTx10___         20      85166006 ns/op
BenchmarkNodeChainTx100__        100     132423557 ns/op
BenchmarkNodeChainTx1000_         20     283961639 ns/op
ok      github.com/jmcvetta/neoism  55.361s
PASS
BenchmarkNodeChainTx10___         20      67867174 ns/op
BenchmarkNodeChainTx100__         10     104888374 ns/op
BenchmarkNodeChainTx1000_         10     286983636 ns/op
ok      github.com/jmcvetta/neoism  40.136s
PASS
BenchmarkNodeChainTx10___         20      72218400 ns/op
BenchmarkNodeChainTx100__         10     111031788 ns/op
BenchmarkNodeChainTx1000_          5     292345714 ns/op
ok      github.com/jmcvetta/neoism  43.500s

real    2m21.665s
user    0m5.912s
sys 0m2.492s

With streaming:

$ time for i in {1..3}; do go test -run X -bench BenchmarkNodeChainTx; done
PASS
BenchmarkNodeChainTx10___         20      81881587 ns/op
BenchmarkNodeChainTx100__        100     137668196 ns/op
BenchmarkNodeChainTx1000_         20     278935367 ns/op
ok      github.com/jmcvetta/neoism  55.178s
PASS
BenchmarkNodeChainTx10___         20      66432519 ns/op
BenchmarkNodeChainTx100__         10     102221907 ns/op
BenchmarkNodeChainTx1000_         10     289877318 ns/op
ok      github.com/jmcvetta/neoism  39.023s
PASS
BenchmarkNodeChainTx10___         20      72935918 ns/op
BenchmarkNodeChainTx100__         10     129005775 ns/op
BenchmarkNodeChainTx1000_          5     287092883 ns/op
ok      github.com/jmcvetta/neoism  42.169s

real    2m19.102s
user    0m5.996s
sys 0m2.436s

Each benchmark was run with a freshly created Neo4j database.

Fixed a bug in the previous benchmark code. The new results are similarly inconclusive.

WITHOUT streaming:

$ time for i in {1..3}; do go test -run X -bench BenchmarkNodeChainTx; done
PASS
BenchmarkNodeChainTx10____        20      97192014 ns/op
BenchmarkNodeChainTx100___         5     291689561 ns/op
BenchmarkNodeChainTx1000__         1    1506736513 ns/op
BenchmarkNodeChainTx10000_         1    6027240627 ns/op
ok      github.com/jmcvetta/neoism  17.805s
PASS
BenchmarkNodeChainTx10____        20      77949985 ns/op
BenchmarkNodeChainTx100___        10     106571116 ns/op
BenchmarkNodeChainTx1000__         5     373490755 ns/op
BenchmarkNodeChainTx10000_         1    3125825039 ns/op
ok      github.com/jmcvetta/neoism  28.138s
PASS
BenchmarkNodeChainTx10____        20      85015881 ns/op
BenchmarkNodeChainTx100___        10     126663391 ns/op
BenchmarkNodeChainTx1000__         5     378486294 ns/op
BenchmarkNodeChainTx10000_         1    3221352095 ns/op
ok      github.com/jmcvetta/neoism  48.711s

real    1m37.415s
user    0m5.804s
sys 0m2.660s

With streaming:

$ time for i in {1..3}; do go test -run X -bench BenchmarkNodeChainTx; done
PASS
BenchmarkNodeChainTx10____        20     102316285 ns/op
BenchmarkNodeChainTx100___         5     261003375 ns/op
BenchmarkNodeChainTx1000__         1    1597275030 ns/op
BenchmarkNodeChainTx10000_         1    5835127311 ns/op
ok      github.com/jmcvetta/neoism  17.763s
PASS
BenchmarkNodeChainTx10____        10     124557823 ns/op
BenchmarkNodeChainTx100___        10     153990297 ns/op
BenchmarkNodeChainTx1000__         5     365047504 ns/op
BenchmarkNodeChainTx10000_         1    3149878160 ns/op
ok      github.com/jmcvetta/neoism  19.755s
PASS
BenchmarkNodeChainTx10____        20      76973225 ns/op
BenchmarkNodeChainTx100___        10     142672175 ns/op
BenchmarkNodeChainTx1000__         5     373416705 ns/op
BenchmarkNodeChainTx10000_         1    3221136142 ns/op
ok      github.com/jmcvetta/neoism  47.493s

real    1m27.820s
user    0m5.632s
sys 0m2.724s

Interesting gap in the middle run.

commented

Which benchmark is this? Are you testing neoism or just Neo4j directly?

This is Neoism. Running go test -run X -bench BenchmarkNodeChainTx from the neoism folder.
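
For anyone reading along, the general shape of such a benchmark is roughly the sketch below. This is hypothetical (the real BenchmarkNodeChainTx in the repo is the authoritative version, and the Cypher statement here is made up); it just issues n CREATE statements inside a single transaction, b.N times:

// benchmarkNodeChainTx is a hypothetical stand-in for the repo's benchmark:
// it runs n CREATE statements in one transaction, b.N times.
func benchmarkNodeChainTx(b *testing.B, n int) {
    db, err := neoism.Connect("http://localhost:7474/db/data")
    if err != nil {
        b.Fatal(err)
    }
    for i := 0; i < b.N; i++ {
        qs := make([]*neoism.CypherQuery, n)
        for j := range qs {
            qs[j] = &neoism.CypherQuery{
                Statement:  "CREATE (n {seq: {seq}})",
                Parameters: neoism.Props{"seq": j},
            }
        }
        tx, err := db.Begin(qs)
        if err != nil {
            b.Fatal(err)
        }
        if err := tx.Commit(); err != nil {
            b.Fatal(err)
        }
    }
}

func BenchmarkNodeChainTx100(b *testing.B) { benchmarkNodeChainTx(b, 100) }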