tomas / needle

Nimble, streamable HTTP client for Node.js. With proxy, iconv, cookie, deflate & multipart support.

Home Page:https://www.npmjs.com/package/needle

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

utf8 redirect breaks stuff

rlidwka opened this issue · comments

Part 1: error

There is a server on the internet that shows following headers:

$ curl -I http://59.gibdd.ru
HTTP/1.1 301 Moved Permanently
Date: Mon, 13 Sep 2021 15:51:29 GMT
Server: Apache
Location: https://гибдд.рф/r/59/
Vary: Accept-Encoding
Content-Type: text/html; charset=iso-8859-1

I'm trying to follow that redirect, and it fails:

require('needle')('get','http://59.gibdd.ru/',{follow:10}).
  then(()=>console.log('ok'),()=>console.log('error'))

It produces this:

> Uncaught TypeError [ERR_INVALID_URL]: Invalid URL: https://гибдд.ÑÑ
                                                                          /r/59/
    at onParseError (internal/url.js:259:9)
    at new URL (internal/url.js:335:5)
    at resolve_url (/tmp/node_modules/needle/lib/needle.js:165:12)
    at ClientRequest.<anonymous> (/tmp/node_modules/needle/lib/needle.js:582:28)
    at Object.onceWrapper (events.js:422:26)
    at ClientRequest.emit (events.js:315:20)
    at ClientRequest.EventEmitter.emit (domain.js:529:15)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:641:27)
    at HTTPParser.parserOnHeadersComplete (_http_common.js:126:17)
    at Socket.socketOnData (_http_client.js:509:22) {
  input: 'https://гибдд.Ñ\x80Ñ\x84/r/59/',
  code: 'ERR_INVALID_URL'
}

This error is not intercepted by needle and is thrown into the void (or rather node.js own unhandledException handler).

Promise is never resolved or rejected, and user has no way of intercepting it.

Part 2: header encoding

Sometimes someone upstream makes a decision that every single downstream developer has to fix on their own. Sad fact of life. So...

Turns out, node.js itself returns headers in binary encoding, as explained here: nodejs/node#7928

> require('needle')('get','http://59.gibdd.ru/').then(res=>{console.log(res.headers)})
Promise { <pending> }
> {
  date: 'Mon, 13 Sep 2021 15:55:57 GMT',
  server: 'Apache',
  location: 'https://гибдд.Ñ\x80Ñ\x84/r/59/',
  vary: 'Accept-Encoding',
  'content-length': '298',
  connection: 'close',
  'content-type': 'text/html; charset=iso-8859-1'
}

But if you try to parse that URL, you'll get either incorrect result (with url.parse) or an exception (with new URL).

Got resolved this by using Buffer.from(res.headers.location, 'binary').toString() instead of res.headers.location as shown here: sindresorhus/got#214

Needle fails here:

var redirect_url = resolve_url(headers.location, uri);

So instead of:

var redirect_url = resolve_url(headers.location, uri);

It should be:

var location = Buffer.from(headers.location, 'binary').toString();
var redirect_url = resolve_url(location, uri);

User should probably get fixed location header as well (shouldn't break api because it was never working in the first place):

out.emit('redirect', headers.location);

Part 3: intercepting an error

Even if we fix encoding on a legitimate redirect, someone can intentionally or accidentally throw in something that breaks new URL syntax.

> new url.URL('http://\xB8', 'http://google.com');
Uncaught TypeError [ERR_INVALID_URL]: Invalid URL: http://¸

So I believe the call to new URL should be wrapped in try..catch and error thrown whenever request errors would normally go.

@tomas , could you take a look? Unhandled error is a serious problem. If you have no time - we can do PR. We need new release with fix. Error handling - mandatory, redirect encoding - if possible.

@tomas can we help anyhow? Fix of this issue is important for us.

http headers are expected to be ISO-8859-1. the server sends the location header in utf-8, the http module interprets it as latin1 and we end up with mojibake. most browsers these days gracefully accept headers in utf-8 and don't break. i think it's worth doing the same.

@yetzt note, bad headers is not a big deal. Problem is in unhandled exception and loss of promise end. First post contains all details how to fix with minimal efforts.

if so, open a pull request. @puzrin