MarkusBernhardt / proxy-vole

Proxy Vole is a Java library to auto detect the platform network proxy settings.

proxy-vole interpreting IE settings differently

srett opened this issue · comments

I noticed that proxy-vole interprets the settings read from IE differently than IE itself does, which can leave Java without working internet access. While trying to debug the problem, I got quite confused about the whole issue. I'll try to give a simple example that I encountered:

The setup in IE was set to manual. In the advanced settings dialog there was one proxy entered in the "socks" line (say 1.2.3.4:5678), with all other fields left blank. That made sense to me: it seemed IE would tunnel all connections, no matter which type, through the socks proxy. However, proxy-vole ends up in createFixedProxySelector, logs "IE uses manual settings: socks=1.2.3.4:5678 with bypass list: null", and then goes ahead and adds it via addSelectorForProtocol("socks" ...).

And this is where my confusion starts. As far as I can tell, proxy-vole's understanding of protocol in this regard is the protocol type that the java application requests when opening a connection. But in that context, socks makes no sense, because you never want to talk to a server using the socks protocol. And indeed, the proxy selector that gets generated on this computer never matches anything, because the URIs that get passed to the selector are mostly of type "http", "https" or "socket".

The next problem is that the proxy, if it were returned by the selector, has type Proxy.Type.HTTP, which is also wrong, because that makes Java try to do an HTTP CONNECT after connecting to the proxy, although it is a SOCKS proxy. So basically, proxy-vole interprets this IE setup as "if the client requests a connection of type socks (whatever that is), use the proxy server 1.2.3.4:5678 and talk HTTP CONNECT to it. For all other protocols, don't use a proxy." IE, in contrast, interprets it as "for any outgoing TCP connection, use the proxy server 1.2.3.4:5678 and talk SOCKS to it."

So basically it comes down to the fact that we are talking about two different protocols: the protocol that the application wants to talk to the server it is trying to reach, and the protocol that Java should use when talking to the proxy server. These two things seem to be mixed up in proxy-vole, but also in the IE settings dialog, because I cannot make sense of the list you see there: You get HTTP, Secure, FTP, Socks. So is this referring to the protocol that the browser is trying to talk, or the protocol that the proxy is talking? For HTTP and FTP, both would make sense; so I enter an http URL into the address bar, and expect IE to use the proxy server I entered in the HTTP line. But what protocol is it using to talk to the proxy? Transparent HTTP? HTTP CONNECT? SOCKS? No idea. I'd just assume transparent HTTP. For FTP, things get a little bit confusing; to my knowledge, there is no "distinct FTP proxy", so it could either use HTTP CONNECT or SOCKS. But you don't even have a way to say which type it is, you can only enter Host and Port. Finally, when you get to the Socks line, things get even more confusing: This protocol makes no sense from the client's point of view, there is no socks:// URL type you could enter into your address bar. And we didn't even get to the point where you select "use the same proxy server for all protocols" -- what does that mean in that context, and which protocol is it using to talk to it? What is the difference to unchecking that option and just entering a socks proxy?

Sorry I cannot express this problem more clearly, I still hope it's possible to understand what I'm trying to get at.

I ran some tests. Setup as follows on Windows 10 in IE:
Manual proxy settings, advanced, using a different proxy for all of the four available entries:
"HTTP" goes to proxyip:2000
"Secure" -> proxyip:2001
"FTP" -> proxyip:2002
"Socks" -> proxyip:2003

When entering a plain http URL like http://foobar/, IE connects to port 2000 and sends a normal HTTP request with some additional Proxy headers. (GET http://foobar/ HTTP/1.1)
When entering an https URL (https://foobar/), IE connects to port 2001 and sends "CONNECT foobar:443 HTTP/1.0" (expects HTTP proxy supporting CONNECT)
When entering an ftp URL like ftp://foobar/, IE connects to port 2002 and sends a GET request (GET ftp://foobar/ HTTP/1.1), expecting an HTTP proxy which can talk FTP to the outside world.
The Socks proxy is never used in case all the other three proxies are set. As soon as you leave one of the first three lines empty, the Socks proxy is used as a fallback. It is never used as a fallback in case the proxy set for the specific protocol doesn't work, which I found interesting.

OK, so far that has cleared things up a bit for me: The HTTP, Secure and FTP fields in the advanced config dialog in IE refer to the protocol the client wants to talk, and expect the proxy server to be a traditional HTTP proxy with support for CONNECT. The Socks line is somewhat of an exception, as it doesn't refer to the protocol the client wants to talk but to the type of proxy server, and it is only ever used for protocols that don't have a specific proxy server set. This makes sense, but the layout of the dialog makes it a bit confusing.

Now to the Java side. I installed a fixed HTTP type proxy that is always returned by the ProxySelector.
Here's what happens:
new URL("http://foobar/").openStream() calls the select() method with URI "http://foobar/", which results in a "GET http://foobar/ HTTP/1.1" to the proxy.
new URL("https://foobar/").openStream() calls the select() method with URI "https://foobar/", which results in a "CONNECT foobar:443 HTTP/1.1" to the proxy.
new URL("ftp://foobar/").openStream() calls the select() method with URI "ftp://foobar/", which results in a "GET ftp://foobar/ HTTP/1.1" to the proxy.
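A selector for this kind of test could look roughly like the sketch below. It is not the code used in the test above and not part of proxy-vole; the class name `LoggingFixedSelector` and the `seen` list are made up for illustration. It always answers with one fixed proxy of type HTTP and records every URI passed to select().

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical test helper, not part of proxy-vole.
class LoggingFixedSelector extends ProxySelector {
    // Every URI handed to select(), in call order.
    final List<URI> seen = Collections.synchronizedList(new ArrayList<>());
    private final List<Proxy> fixed;

    LoggingFixedSelector(String host, int port) {
        // createUnresolved avoids a DNS lookup while building the selector.
        Proxy proxy = new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(host, port));
        this.fixed = Collections.singletonList(proxy);
    }

    @Override
    public List<Proxy> select(URI uri) {
        if (uri == null) {
            throw new IllegalArgumentException("uri must not be null");
        }
        seen.add(uri);
        System.out.println("select(" + uri + ")");
        return fixed; // same HTTP proxy for every request
    }

    @Override
    public void connectFailed(URI uri, SocketAddress address, IOException failure) {
        System.out.println("connectFailed(" + uri + ", " + address + "): " + failure);
    }
}
```

Installing it with `ProxySelector.setDefault(new LoggingFixedSelector("proxyip", 2000))` before calling `new URL(...).openStream()` prints each URI the JDK asks about; what actually goes over the wire still has to be observed on the proxy side.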

So far, this is pretty much identical to what IE does. So it seems that only the Socks part needs to be fixed in proxy-vole, as it is treated as some kind of fallback by IE for protocols that have no proxy set. It is also never used when the "use same proxy for all protocols" checkbox is checked.
I'll see if I can get a pull request going in the next weeks that will make sure the behavior is identical to IE.
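To make the target behavior concrete, here is a minimal sketch of a selector that mirrors what IE did in the tests above: a per-protocol HTTP proxy where one is configured, and the SOCKS proxy as fallback for every scheme without an entry. It uses only java.net; the class name `IeStyleSelector` and its constructor are hypothetical and not how proxy-vole is structured.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the IE behavior, not proxy-vole code.
class IeStyleSelector extends ProxySelector {
    private final Map<String, Proxy> perScheme = new HashMap<>();
    private final Proxy socksFallback; // null if no SOCKS proxy is configured

    // httpProxies: URI scheme ("http", "https", "ftp") -> HTTP proxy address.
    IeStyleSelector(Map<String, InetSocketAddress> httpProxies, InetSocketAddress socks) {
        for (Map.Entry<String, InetSocketAddress> e : httpProxies.entrySet()) {
            perScheme.put(e.getKey().toLowerCase(Locale.ROOT),
                    new Proxy(Proxy.Type.HTTP, e.getValue()));
        }
        this.socksFallback = socks == null ? null : new Proxy(Proxy.Type.SOCKS, socks);
    }

    @Override
    public List<Proxy> select(URI uri) {
        if (uri == null) {
            throw new IllegalArgumentException("uri must not be null");
        }
        String scheme = uri.getScheme();
        Proxy proxy = scheme == null ? null : perScheme.get(scheme.toLowerCase(Locale.ROOT));
        if (proxy == null) {
            // No proxy for this scheme: fall back to SOCKS, as IE was seen to do.
            proxy = socksFallback;
        }
        return Collections.singletonList(proxy == null ? Proxy.NO_PROXY : proxy);
    }

    @Override
    public void connectFailed(URI uri, SocketAddress address, IOException failure) {
        // IE did not retry via SOCKS when a per-protocol proxy failed, so do nothing.
    }
}
```

With only "http" mapped to proxyip:2000 and SOCKS set to proxyip:2003, an http:// URI gets the HTTP proxy while ftp:// and socket:// URIs get the SOCKS proxy. The "use same proxy for all protocols" checkbox is not modeled here.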

Interesting side-note that has nothing to do with proxy-vole: I also tried the apache httpclient library during my tests; it re-implements HTTP, so if you only install a proxy selector by calling ProxySelector.setDefault(), this leads to calls to ProxySelector.select() with socket:// URIs, and even when replying to those with the above mentioned fixed proxy list containing exactly one proxy of Type.HTTP, the java socket implementation will ignore it and try a DIRECT connection. It won't even try to do an HTTP CONNECT via the proxy.
When using apache httpclient, one has to explicitly call HttpClient.setRoutePlanner() and pass along the wrapped ProxySelector. This however somewhat collides with setting a default ProxySelector, as then you get two calls to the select() method of your ProxySelector:
First one for http://foobar/ and then a second one for socket://proxyhost:proxyport
Some careful studying of its documentation seems to be in order if its usage is desired. ;)

Here's some explanation at the general level.

The Javadoc for ProxySelector clearly states that it may be called with a look-alike URL like this:

socket://host:port

    ....for tcp client sockets connections. 

You are right that this is not a real URL, but at least the behaviour is documented by Oracle.

The reason why it is designed like this is because there's such a thing as SOCKS proxies. This is a beast which will proxy connections at the level of the TCP socket (unlike an HTTP proxy server, which proxies at the level of the HTTP protocol).

For a given http/https operation Oracle's HttpURLConnection will actually call the ProxySelector twice, as you have also discovered. The first time is to figure out which proxy to use for http/https. The answer to this question can by definition only be proxies of type DIRECT (meaning no proxy) or of type HTTP.

Once this has been established it will also need to know if it should use a SOCKS proxy for the raw TCP socket connection. Thus it calls the ProxySelector once again, but this time with a fake protocol name of 'socket'. The only correct answer to this second question is to return proxies of type DIRECT and SOCKS only. SOCKS proxies are very rare in my experience, so the answer to this second question is almost always a list with only one element of type DIRECT.

Btw: There's technically nothing to prevent you from writing a ProxySelector class that would return a reply to the select() which is inconsistent relative to the protocol of the input parameter, e.g. returning a HTTP proxy when the protocol of the URI input is 'socket'. I'm sure this would really mess up things. :-)

I hope this aids your understanding.

Yes, that whole concept became clearer after the digging I did before my second post to this issue ;-)

However, the problem is that if you configure a SOCKS proxy in IE, proxy-vole ends up creating a FixedProxySelector containing a Proxy instance of type HTTP and adds it to the ProtocolDispatchSelector with scheme "socks". So of course, if you pass an http/https or other high level URI to the select() method, it will return nothing (DIRECT), which is expected. But if you pass it a URI of type socket://, one would expect it to return the SOCKS proxy, but obviously it will not find a matching entry in the selector map, as it only contains an entry for key "socks" (and fallbackSelector is still set to the initial value of NoProxySelector.getInstance(), as the IE search strategy will only set the fallback selector if you check the "Use same proxy for all connections" checkbox in IE's connection settings, which prevents you from configuring a SOCKS proxy altogether).
So the quickfix here is to add some code to IEProxySearchStrategy.addSelectorForProtocol() that will replace "socks" by "socket" when calling ps.setSelector(protocol, protocolSelector) (but not when calling settings.getProperty(protocol)). But wait there's more! While ProtocolDispatchSelector now properly returns the FixedProxySelector that contains the SOCKS proxy address and port, it still erroneously has its type set to HTTP, so it will still not work, since it will now return an HTTP proxy when queried for a socket URI (so this does exactly what you described in your last paragraph ;)).
So the second quickfix is to hack IEProxySearchStrategy.addSelectorForProtocol() to actually create a FixedSocksSelector (instead of FixedProxySelector indirectly through ProxyUtil.parseProxySettings()) for the socks entry.
However, these are really just quick hacks that successfully fixed connectivity in my test setup; the same problem might very well occur with other search strategies (Firefox for sure after a quick look), so some slightly more intelligent refactoring seems appropriate instead of copy & pasting the same special case into multiple class files. Unfortunately I'm not that familiar with the code base so it would take me some time to get around to actually create a "pretty" fix for this.
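For illustration, the two quick fixes boil down to something like the following sketch. It deliberately avoids proxy-vole's own classes and uses only java.net, so `SettingsToSelectorSketch`, `schemeFor`, `typeFor` and `build` are made-up names, and the settings map stands in for whatever the search strategy parsed from IE.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the actual proxy-vole patch.
class SettingsToSelectorSketch {

    // Fix 1: register the "socks" setting under the URI scheme Java really asks for.
    static String schemeFor(String settingsKey) {
        return "socks".equals(settingsKey) ? "socket" : settingsKey;
    }

    // Fix 2: the "socks" setting describes a SOCKS proxy, everything else an HTTP proxy.
    static Proxy.Type typeFor(String settingsKey) {
        return "socks".equals(settingsKey) ? Proxy.Type.SOCKS : Proxy.Type.HTTP;
    }

    // settings: IE-style key -> "host:port", e.g. "socks" -> "1.2.3.4:5678".
    static Map<String, Proxy> build(Map<String, String> settings) {
        Map<String, Proxy> byScheme = new HashMap<>();
        for (Map.Entry<String, String> e : settings.entrySet()) {
            String value = e.getValue();
            int colon = value.lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("expected host:port but got " + value);
            }
            String host = value.substring(0, colon);
            int port = Integer.parseInt(value.substring(colon + 1));
            // The lookup in the settings still uses the original key ("socks");
            // only the key used for dispatching on the URI scheme changes.
            byScheme.put(schemeFor(e.getKey()),
                    new Proxy(typeFor(e.getKey()), InetSocketAddress.createUnresolved(host, port)));
        }
        return byScheme;
    }
}
```

Given `socks=1.2.3.4:5678`, the resulting map has an entry for "socket" holding a proxy of type SOCKS and no entry for "socks", which is what a lookup by the scheme of a socket:// URI needs.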

TLDR: IE connection settings set to manual with just a SOCKS proxy leads to no connectivity in Java with proxy-vole, while IE itself works just fine. (Assuming you really only can connect to the internet via a SOCKS proxy somewhere on your reachable LAN).

Understood. By looking at the code I can verify all you say. There seems to be inconsistent use of protocol names vis-a-vis URI scheme names in the library. (as you point out 'socket' <> 'socks'). And as you say: there are other problems too. :-(

This was fixed in #10, or at least I think so. Please retest with release 1.0.6 and reopen the issue if the problem still exists. Thanks for your help and time.