ovh / beamium

Prometheus to Warp10 metrics forwarder

Chunked POST request

alexissellier opened this issue

I use Prometheus' /federate output as a source to save metrics to Warp10.
As the data size is huge, the current implementation fails with

ERRO post fail: Broken pipe (os error 32), sink: warp10_prod

It would be great if beamium could handle chunked POST requests.
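
For context, a beamium source pointing at Prometheus' /federate endpoint and a Warp10 sink look roughly like the sketch below; the key names follow my reading of the beamium README, and the URL, match[] selector, period and token are placeholders.

    # Hypothetical beamium config excerpt; values are placeholders for this setup.
    sources:
      prometheus-federate:
        url: 'http://127.0.0.1:9090/federate?match[]={job="node"}'
        period: 10000                     # polling interval in milliseconds

    sinks:
      warp10_prod:
        url: 'https://warp10-ingress.example.net/api/v0/update'
        token: 'WRITE_TOKEN_PLACEHOLDER'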

To reproduce, could you give a hint on how huge your data is?

IMO posting a big dataset shouldn't be an issue, no matter its size. A chunked POST would be more interesting if you didn't know the original Content-Length, but here we have it.
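
To illustrate the distinction with a generic curl sketch (placeholder URL and file name, not beamium's own HTTP client): when the body comes from a file, its size is known and a Content-Length is sent; chunked transfer encoding only really matters when the size is not known up front.

    # Known size: curl computes Content-Length from the file and sends it.
    curl --data-binary @sink.metric 'http://example.net/api/v0/update'

    # Forcing chunked transfer encoding instead (normally only useful when the
    # body size is not known in advance):
    curl -H 'Transfer-Encoding: chunked' --data-binary @sink.metric \
         'http://example.net/api/v0/update'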

Still, a way to overcome this could be to add a setting on the scraper that would split a scraped source file into multiple files. This setting would limit the maximum number of lines written into one file.

We use Beamium in environments that forward massive amounts of data, and using big scraped files works pretty well. Don't you have a firewall/load balancer in between that could reset the connection?

Data size is about 30 M. I do have an HAProxy between beamium and a Warp10 ingress; I'm going to check whether I hit a size limit or a timeout.
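
In case it helps with that check: for a large POST, the usual HAProxy suspects are the inactivity timeouts rather than a hard body-size limit. A sketch of the directives worth reviewing (standard HAProxy keywords; the values are placeholders):

    # Standard HAProxy timeout directives; values below are placeholders.
    defaults
        timeout connect       5s
        timeout client        60s   # client-side inactivity, can cut off a slow upload
        timeout server        60s   # server-side inactivity, can cut off a slow ingress
        timeout http-request  10s   # time allowed to receive the full request headers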

I still have the same issue, though, when I bypass the HAProxy; it might be a Warp10 configuration issue.

What Warp10 revision are you using?

Do you see a TCP RST (or FIN) sent by the Warp10 ingress?

How many GTS do you have inside your 30 M dataset? Are they new or already cached by the ingress?

Don't you have any issues producing data and meta messages into Kafka from the ingress?

Chunked encoding won't help in this case, as it uses the same connection for all the chunks.
We can split datasets which exceed the batch size at scrape time (as @StevenLeRoux suggests); this should work around your issue, but it is not the root cause.

It looks like a network issue - we successfully perform 600 M POSTs on a regular basis ;)

I am using version 1.2.7 for the ingress. I have 151658 GTS in the file, probably new for the ingress, with no particular errors regarding Kafka.

If you have no trouble on your side with 600 M, I will close this issue and dig elsewhere.

Additional information: when I try to replay the POST with curl, the response is a 500 parse error.

Which file are you trying to POST?

a .metric file in the sink directory
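
For reference, replaying such a file by hand usually looks roughly like the sketch below; the host, token and file name are placeholders, while /api/v0/update and the X-Warp10-Token header are the standard Warp10 ingress update endpoint. A 500 parse error from such a replay usually points at a line the ingress cannot parse rather than at the transport.

    # Replay a beamium sink file against the Warp10 ingress by hand.
    # Host, token and file name are placeholders for this environment.
    curl -v \
         -H 'X-Warp10-Token: WRITE_TOKEN_PLACEHOLDER' \
         --data-binary @warp10_prod-sample.metric \
         'https://warp10-ingress.example.net/api/v0/update'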

This should not occur. Could you post the parse error?

I just pushed #31, which splits scraped data into files of batch_size.
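
With that change, the upper bound on a single file comes from the batch-size parameter in the configuration. A minimal sketch of the relevant section (the parameter name follows my reading of the beamium README; the value is a placeholder, and the exact unit and default should be checked there):

    parameters:
      batch-size: 200000   # upper bound used when splitting scraped data into sink files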

@alexissellier did you manage to try out this version?

Closed due to inactivity