jsgilmore / gostorm

GoStorm is a Go library that implements the communications protocol required to write Storm spouts and bolts in Go that communicate with the Storm shells.

SpoutConn Emit() should flush and then always readTaskIds

cameront opened this issue · comments

I think the spout implementation is a bit broken right now.

The multilang protocol says that non-direct Emits always immediately receive back the task Ids they're sent to, but currently the needsTaskIds property is a property of the struct.

It seems like needsTaskIds should go away, and the connection should know that whenever a non-direct emit happens, you need to flush, read task ids, and then return those?

I'm happy to submit PRs if you're still interested in maintaining this library (thanks! btw), or can fork it if you're no longer interested. Let me know what you'd prefer.

Hi, yes, I'll be maintaining this package for a while longer, so I would prefer pull requests. I'll soon be adding a local test mode that you can use to test Go spouts and bolts in Go, without the need for Storm. It is meant only for testing, since it's single-threaded.

I added the needTaskIds field to mirror the needTaskIds field currently used in Storm. If you take a look at the shellSpout code, you'll see that it accepts the needTaskIds field. The Python multilang code also sends it across, even though it's not in the protocol specification.

But we can discuss more on the pull request

Thanks for the great feedback and the fixes. I'll review them asap.

Regards
John Gilmore

Gotcha. I'm a little confused about the details here. Not sure what needTaskIds field you're referring to (storm internals? multilang protocol? or this library's code?).

By "the Python multilang", do you mean the storm.py library (https://github.com/apache/incubator-storm/blob/master/storm-core/src/multilang/py/storm.py)?

Happy to discuss this on the PR though!

Have a look at the Storm shell code: https://github.com/nathanmarz/storm/blob/moved-to-apache/storm-core/src/jvm/backtype/storm/task/ShellBolt.java lines 233 and 234. Those lines check whether the message specified had a need_task_ids field. This is not part of the multilang protocol specification, but it is part of what Storm expects. When I made the pull request to Storm to make multilang protocols serialisable I was required to also add in that flag. We can talk to the Storm guys on whether that option can be removed.

I see. So that looks to me like a way to prevent task ids from being sent for non-direct emits, since it sends them if need_task_ids is null OR true. I wonder why that's not in the protocol?
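To make that reading concrete, here is a rough Go sketch of the decision as described in this thread: task ids are sent back for a non-direct emit unless need_task_ids is explicitly false. The `shellEmit` type and the `wantsTaskIds` function are hypothetical names for illustration. This is not Storm's actual code, and the treatment of direct emits (a `task` field set) is an assumption based on the protocol description.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// shellEmit is a hypothetical view of an emit message as the shell side sees it.
type shellEmit struct {
	Command     string        `json:"command"`
	Tuple       []interface{} `json:"tuple"`
	Task        *int          `json:"task,omitempty"`          // present only on direct emits
	NeedTaskIds *bool         `json:"need_task_ids,omitempty"` // nil when absent or null
}

// wantsTaskIds reports whether task ids should be sent back for the message.
func wantsTaskIds(raw []byte) (bool, error) {
	var m shellEmit
	if err := json.Unmarshal(raw, &m); err != nil {
		return false, err
	}
	if m.Task != nil {
		// Assumption: a direct emit already names its task, so nothing is returned.
		return false, nil
	}
	// Missing (nil) or true means "send them"; only an explicit false opts out.
	return m.NeedTaskIds == nil || *m.NeedTaskIds, nil
}

func main() {
	msgs := []string{
		`{"command":"emit","tuple":["a"]}`,
		`{"command":"emit","tuple":["a"],"need_task_ids":true}`,
		`{"command":"emit","tuple":["a"],"need_task_ids":false}`,
	}
	for _, raw := range msgs {
		ok, err := wantsTaskIds([]byte(raw))
		if err != nil {
			fmt.Println("bad message:", err)
			continue
		}
		fmt.Println(raw, "->", ok)
	}
}
```

Using a pointer for the flag is what lets the sketch tell "not sent" apart from "sent as false".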

Weird. I'll go back and take a look. Thanks for the info!

Exactly. There are many situations where you emit a tuple and don't really care to which bolt that tuple was sent. The task ID return path also decreases the multilang throughput, because double the number of messages have to be marshalled and unmarshalled. This slows down the already very slow multilang protocol. I'm going to request that they add mention of that flag to the multilang protocol "specification".