masterzen / nginx-upload-progress-module

Nginx module implementing an upload progress system that monitors RFC1867 POST uploads as they are transmitted to upstream servers.

Home Page: http://wiki.codemongers.com/NginxHttpUploadProgressModule

Using the module in a cluster architecture (many upload nodes behind an LB)

seletskiy opened this issue

I'm using the upload-progress module in a cluster architecture, i.e. I have many upload servers behind a load balancer (operating at the TCP level). The problem is that when I make an upload request, I only know the load balancer's address, not the upload server's address, so there is no way to get the upload progress from the server that is handling the current upload.

Example: 1 LB, 3 upload servers (A, B, C). When I make an upload request, it is handled by, say, server A (chosen by the LB). But when I make an upload progress request, it is handled by server B, the next request by server C, then by A (round-robin). Obviously, I get two error messages (upload ID not found) and only one correct answer.

So I want a mechanism to get the upload progress reliably. There is a native way to do this in nginx: the proxy_next_upstream directive, which can pass a request to another server when the current server answers with a bad HTTP code.

Here is my suggestion: add a way for the upload-progress module to return an HTTP error code, so that the answer can be handled with the proxy_next_upstream directive.

Here are my patch and an example configuration: seletskiy/nginx-upload-progress-module@42b6b60d

What I usually recommend is to configure the LB to stick to a back-end based on the X-Progress-ID.
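A minimal sketch of that idea, assuming the balancer is an HTTP-level nginx (not the pure TCP balancer used in this thread), an nginx recent enough to have the upstream hash directive (1.7.2+), and clients that send the ID in the X-Progress-ID header; the node names are placeholders:

upstream upload_nodes {
    # pin every X-Progress-ID to the same back-end
    hash $http_x_progress_id consistent;
    server node_a;
    server node_b;
    server node_c;
}

server {
    listen 80;

    location / {
        proxy_pass http://upload_nodes;
    }
}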

I had a look at your configuration example (I'm not familiar with proxy_next_upstream) and I don't understand how it can work. Is the configuration the one on the LB (assuming it is nginx and not something else), or is it the configuration of one of your upstream servers?
If it's the config of the LB, I fail to see how it can try more than 2 upstreams.
Can you elaborate?
Have you tried this patch on a real infrastructure?

Thanks for the answer.

I couldn't configure the LB the way you propose, because it balances packets purely at the TCP layer, without any knowledge of the protocol.

The configuration example file I suggest is placed on all upload nodes behind the LB, and this configuration is the same for all nodes.

Each upload node has an upstream that contains all the other upload nodes except itself.

My scheme works this way:

  1. An upload progress request comes to the first upload server;
  2. The upload progress module checks the upload and, if it is not found, returns a 404 Not Found HTTP code;
  3. That error code is handled by the error_page directive, so the request is passed to the @fallback location;
  4. Next, in the @fallback location, the proxy_pass directive takes control: it passes the current request (the upload progress request) to the specified upstream;
  5. If a server in the upstream answers with a 'good' HTTP code (200 OK), the answer is returned to the client;
  6. If a server in the upstream answers with 404 Not Found, then proxy_next_upstream transfers the request to the next server in the upstream, and the answer is checked against rules 5-6 again.

If the proxy_next_upstream handler sees a configured HTTP code in the answer (http_404 in my example), it passes the request to the next server in the upstream in a round-robin manner.
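To make the scheme concrete, here is a minimal sketch of how the relevant part of a node's configuration could look (node A shown); the upstream name, peer addresses and /progress/ location are illustrative, and it assumes the patched module answers 404 when the upload ID is unknown locally:

upstream other_upload_nodes {
    # every upload node except this one (node A)
    server node_b;
    server node_c;
}

location /progress/ {
    report_uploads uploads-zone;
    # an unknown ID makes the patched module answer 404, which error_page
    # turns into an internal redirect to the @fallback location
    error_page 404 = @fallback;
}

location @fallback {
    # probe the other nodes; a peer that also answers 404 makes
    # proxy_next_upstream move on to the next server in the upstream
    proxy_pass http://other_upload_nodes;
    proxy_next_upstream error timeout http_404;
}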

I tested the patch on a test bench of several machines, not on a real, highly loaded project, but I intend to.

Any thoughts?

Your solution is interesting.
So if I understood correctly, every upload server has the other upload servers as potential upstreams, to which it can forward a probe request.
Let's imagine you have 10 upload servers; in the worst case, with very bad luck, a single probe will be proxied to every upload server. This can create artificial load and worsen the latency of the answer.
I don't really have any generic solution for this problem, but I have the feeling that trying to solve this at the upload progress level is not the right way.
I'll think about that and see what we can do.

Yep, you're absolutely right.

The proposed solution is not the right way to solve the problem. But I don't see an easy, graceful solution for now, so I'll try to use this one in production.

BTW, there are some more complicated ways:

  1. Add another server between the LB and the US (upload servers) that acts as central storage for X-Progress-IDs, so that all progress requests are proxied through this server to the correct node;
  2. Give each node knowledge of the uploads started on the other nodes, so that if a node has no information about the requested ID, it proxies the request to the upstream that is handling the corresponding upload.

I think I found a solution (a somewhat obvious one). I will implement it soon and send a patch for your review.

I think there should be a directive called, for example, upload_host_id $hostid. When specified, the module should return a field host_id = $hostid in the answer to the progress request.
So, the next time the client requests the upload status, it can provide the received $hostid in its request, and the load balancer can use this information to route the request to the specific upload server.

Example configuration:

Upload Node A:

location /progress/ {
    report_uploads uploads-zone;
    upload_host_id node_a;
}

Upload Node B:

location /progress/ {
    report_uploads uploads-zone;
    upload_host_id node_b;
}

Load Balancer:

location /progress/node_a/ {
    proxy_pass http://node_a;
}

location /progress/node_b/ {
    proxy_pass http://node_b;
}
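For completeness, a sketch of the pieces the example leaves implicit, assuming node_a and node_b are upstreams defined on the load balancer (the addresses are placeholders):

upstream node_a {
    server 10.0.0.1;    # upload node A
}

upstream node_b {
    server 10.0.0.2;    # upload node B
}

The client would read the host_id field from the progress answer and prefix its following progress requests with /progress/node_a/ (or /progress/node_b/), so the load balancer can route them to the node that actually holds the upload state.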

Did any idea finally solve the problem?

Closing this issue since it's awfully outdated.