tomas / needle

Nimble, streamable HTTP client for Node.js. With proxy, iconv, cookie, deflate & multipart support.

Home Page: https://www.npmjs.com/package/needle

Tunnelling doesn't work in v3.1.0

pkey opened this issue

Description

With the new version of needle (3.1.0), the current approach (here) to setting up tunnelling no longer works (version 3.0.0 works fine). Capturing the traffic with Wireshark shows that needle ends up looping CONNECT requests to the proxy. I suspect that needle tries to use the tunnel agent while also applying the proxy settings from the environment (HTTP_PROXY and HTTPS_PROXY), and thus ends up in this weird state.

We were previously using global-agent together with the HTTP_PROXY/HTTPS_PROXY environment variables to force needle to make CONNECT requests, but since the new version that doesn't work either (which might be a separate issue).

How to reproduce

  1. Set up a local mitmproxy (or any other proxy).
  2. Set up tunnelling using the tunnel agent as described in the documentation, pointing the proxy host and port to the mitmproxy.
  3. Set the HTTP_PROXY and HTTPS_PROXY environment variables to point to the same proxy.
  4. Make a network call using needle.

Expected behaviour

When HTTP_PROXY and HTTPS_PROXY are set and a tunnel is configured, needle should make a CONNECT request to the proxy and establish a tunnel.

Actual behaviour

needle attempts a CONNECT request but never manages to establish the tunnel.

Hi and thanks for the detailed bug report. Would it be possible to see a small code snippet so I can reproduce the error quickly?

Here's the code snippet:

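var needle = require('needle');
var tunnel = require('tunnel');

// Tunnel agent pointing at the local proxy (127.0.0.1:8080).
// Note: tunnel.httpsOverHttp is the variant meant for HTTPS destinations;
// httpOverHttp is kept here as in the original report.
var myAgent = tunnel.httpOverHttp({
  proxy: { host: '127.0.0.1', port: 8080 }
});

needle.get('https://github.com/status', { agent: myAgent }, function (error, response) {
  if (!error && response.statusCode == 200)
    console.log(response.body);
  else
    console.log(error || response.statusCode); // error, or the unexpected status
});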

Make sure to run npm install needle tunnel, then set the HTTP_PROXY and HTTPS_PROXY environment variables to point to the same proxy as the agent configuration above (in my case 127.0.0.1:8080).

When I run this with needle 3.1.0 installed, Wireshark shows attempts to CONNECT 127.0.0.1:8080 HTTP/1.1 answered with HTTP/1.1 502 Bad Gateway (text/html), whereas with version 3.0.0 I can see CONNECT github.com:443 HTTP/1.1 and HTTP/1.1 200 Connection established, which is what I would expect. Note that both versions end in "Server disconnected", but the 3.1.0 symptoms are the same ones we are experiencing in our own system, so I think this is a good example.

Let me know how it goes. I will also try to debug it, though I am not very familiar with needle's codebase.

Let me add a bit more context and details.

What we are trying to achieve is to use needle with an HTTP/S proxy. In a secure setup, HTTP clients are expected to send proxied requests for HTTPS resources through an HTTP CONNECT tunnel.

As far as I understand, needle doesn't support CONNECT requests. Therefore we are using https://github.com/gajus/global-agent to patch Node's http agent and provide CONNECT-capable proxy support. There are similar libraries, such as https://github.com/koichik/node-tunnel (deprecated) or https://github.com/TooTallNate/node-http-proxy-agent.
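The CONNECT handshake itself needs nothing beyond Node's built-in http module. A minimal sketch follows, assuming a hypothetical openTunnel helper and option names; it is not how needle, tunnel or global-agent implement it. The TCP connection goes to the proxy, while the CONNECT request target is the destination's host:port, which is what 3.0.0 sent on the wire (CONNECT github.com:443) and 3.1.0 did not.

```javascript
const http = require('node:http');

// Hypothetical helper (not part of needle or tunnel): open a CONNECT tunnel
// through an HTTP proxy and resolve with the raw socket once the proxy
// answers 2xx. For an HTTPS destination, wrap the socket afterwards with
// tls.connect({ socket, servername: targetHost }).
function openTunnel({ proxyHost, proxyPort, targetHost, targetPort }) {
  const authority = `${targetHost}:${targetPort}`;
  return new Promise((resolve, reject) => {
    const req = http.request({
      host: proxyHost,      // the TCP connection goes to the proxy...
      port: proxyPort,
      method: 'CONNECT',
      path: authority,      // ...but the request target is the destination
      headers: { host: authority },
    });
    req.once('connect', (res, socket) => {
      if (res.statusCode >= 200 && res.statusCode < 300) {
        resolve(socket);
      } else {
        socket.destroy();
        reject(new Error(`proxy refused CONNECT ${authority}: ${res.statusCode}`));
      }
    });
    // Defensive: also treat a reply delivered as an ordinary response as a refusal.
    req.once('response', (res) => {
      res.resume();
      reject(new Error(`proxy refused CONNECT ${authority}: ${res.statusCode}`));
    });
    req.once('error', reject);
    req.end();
  });
}
```

For github.com, the resolved socket would then be handed to tls.connect({ socket, servername: 'github.com' }) before the GET is written, so the proxy only ever sees the CONNECT line.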

Since #382, needle picks up HTTP_PROXY/HTTPS_PROXY from the environment, so it can no longer be used without a proxy whenever those environment variables are present.

We're looking for a way to opt-out of needle picking up proxy configuration from environment variables.
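The opt-out being asked for boils down to a precedence rule. A minimal sketch, assuming a hypothetical resolveProxy function that receives the already-looked-up environment proxy as envProxy; use_proxy_from_env_var is the option name that comes up later in this thread, and treating a custom agent as "already proxied" is an assumption, not needle's actual behaviour.

```javascript
// Hypothetical precedence rule (NOT needle's source).
// options: request options; envProxy: proxy URL read from the environment, or null.
function resolveProxy(options = {}, envProxy = null) {
  // 1. An explicit proxy option always wins.
  if (options.proxy) return options.proxy;
  // 2. The caller opted out of environment-based proxying.
  if (options.use_proxy_from_env_var === false) return null;
  // 3. Assumption: a custom agent (global-agent, proxy-agent, tunnel) handles
  //    proxying itself, so the environment proxy is not layered on top of it.
  if (options.agent) return null;
  // 4. Otherwise fall back to whatever the environment provides.
  return envProxy || null;
}

console.log(resolveProxy({ agent: {} }, 'http://127.0.0.1:8080')); // null
```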

@tomas, any chance you could have a look at this, especially PR #427, which suggests a way to disable this behaviour?

Yes, sorry. I'll make some time this week to take a look into this. :)

Thanks @tomas, no rush, just wanted to make sure you've seen it.
Let me know if there's anything you'd like me to change in the PR.

Hey @tomas, any chance you could have a look at #427, which makes it optional for needle to pick up the environment variables automatically?

During analysis and debugging, I noticed that needle makes no consistent distinction between http_proxy and https_proxy. Setting either one is enough for it to be used for all connections, and if both are set, http_proxy wins.
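For comparison, a per-scheme lookup keeps the two variables apart. A minimal sketch with a hypothetical helper name (proxyForUrl); it checks the lowercase variable before the uppercase one, which is a common convention rather than a standard, and it ignores NO_PROXY.

```javascript
// Hypothetical helper (not needle's API): choose the proxy by the scheme of
// the request URL, so https: URLs use https_proxy/HTTPS_PROXY and http: URLs
// use http_proxy/HTTP_PROXY. NO_PROXY handling is deliberately left out.
function proxyForUrl(requestUrl, env = process.env) {
  const scheme = new URL(requestUrl).protocol.slice(0, -1); // 'https:' -> 'https'
  if (scheme !== 'http' && scheme !== 'https') return null;
  const name = `${scheme}_proxy`;
  // Lowercase name first, then the uppercase one.
  return env[name] || env[name.toUpperCase()] || null;
}

const env = { HTTPS_PROXY: 'http://localhost:8888' };
console.log(proxyForUrl('https://github.com/status', env)); // http://localhost:8888
console.log(proxyForUrl('http://example.com/', env));       // null
```

With only HTTPS_PROXY exported, plain http: requests would then go direct instead of silently reusing the HTTPS proxy.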

My test snippet:

var needle = require('needle');
needle.get('https://github.com/status', function (error, response) {
        if (!error && response.statusCode == 200)
                console.log(response.body);
        else console.log(error);
});

The results (via proxy: export HTTPS_PROXY=http://localhost:8888):

squid tinyproxy
HTTPS websites ⚠️
HTTP sites

For HTTPS pages, the connection goes through tinyproxy, but the proxy then tries to reach the destination over plain HTTP. The same request made with curl through tinyproxy works without problems.

Some return values:

Error from squid:

CacheErrorInfo - ERR_READ_ERROR&body=CacheHost: d4e570ebcbe2
ErrPage: ERR_READ_ERROR
Err: [none]
TimeStamp: Fri, 29 Dec 2023

ClientIP: 10.10.x.x
ServerIP: github.com

HTTP Request:
GET /status HTTP/1.1
Accept: */*
User-Agent: Needle/3.3.0 (Node.js v18.17.1; linux x64)
Host: github.com
Connection: close

Some output from Node:

_header: 'GET https://github.com/status HTTP/1.1\r\n' +
  'accept: */*\r\n' +
  'user-agent: Needle/3.3.0 (Node.js v18.17.1; linux x64)\r\n' +
  'host: github.com\r\n' +
  'Connection: close\r\n' +
  '\r\n',
method: 'GET',
path: 'https://github.com/status',
host: 'localhost',
protocol: 'http:',
statusCode: 502,
statusMessage: 'Bad Gateway',

I cannot see any CONNECT request being sent.
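The Node output above shows the request written to the proxy with the full URL as its path (path: 'https://github.com/status'), which is how plain HTTP requests are forwarded through a proxy; curl instead sends CONNECT github.com:443 first. A minimal sketch of that distinction, with a hypothetical helper name (planProxyRequest) and assuming a plain HTTP proxy:

```javascript
// Hypothetical helper (not needle's API): what a client sends to an HTTP proxy.
// http: URLs are forwarded with the absolute URL as the request target;
// https: URLs first need CONNECT host:port, after which TLS runs end to end
// through the tunnel and the proxy never sees the request path.
function planProxyRequest(destination, method = 'GET') {
  const url = new URL(destination);
  if (url.protocol === 'http:') {
    return { method, path: url.href };
  }
  if (url.protocol === 'https:') {
    // URL drops a default port, so fall back to 443 explicitly.
    return { method: 'CONNECT', path: `${url.hostname}:${url.port || '443'}` };
  }
  throw new Error(`unsupported protocol: ${url.protocol}`);
}

console.log(planProxyRequest('https://github.com/status')); // { method: 'CONNECT', path: 'github.com:443' }
```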

curl example

curl https://github.com/status -v
* Uses proxy env variable HTTPS_PROXY == 'http://localhost:8888'
*   Trying 127.0.0.1:8888...
* Connected to (nil) (127.0.0.1) port 8888 (#0)
* allocate connect buffer!
* Establish HTTP proxy tunnel to github.com:443
> CONNECT github.com:443 HTTP/1.1
> Host: github.com:443
> User-Agent: curl/7.81.0
> Proxy-Connection: Keep-Alive

I have found a workaround that works for me:

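var { ProxyAgent } = require('proxy-agent');
var needle = require('needle');

// Let proxy-agent handle proxying (including CONNECT for HTTPS targets)
// and stop needle from reading the proxy environment variables itself.
needle.get('https://github.com/status', { agent: new ProxyAgent(), use_proxy_from_env_var: false }, function (error, response) {
  if (!error && response.statusCode == 200)
    console.log(response.body);
  else
    console.log(error || response); // error if there is one, otherwise the full response
});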

This is similar to how we use needle, with proxy-agent or global-agent. The earlier change that made needle pick up the environment variables broke this setup.

@tomas, now that use_proxy_from_env_var: false is implemented, I think we can close this issue.
@dklimpel, happy to leave this open if your case isn't fully covered yet.

IMHO this is still open. Needle claims to support:

  • HTTP Proxy forwarding, optionally with authentication

That is not the case: there is currently no support for https_proxy.