OWASP / owasp-java-encoder

The OWASP Java Encoder is a Java 1.5+ simple-to-use drop-in high-performance encoder class with no dependencies and little baggage. This project will help Java web developers defend against Cross Site Scripting!

Home Page: https://owasp.org/www-project-java-encoder/

Create encode for URL function

ropnop opened this issue

Hello! Wanted to start a discussion around adding a function that makes it safe to append strings to URLs in Java. I haven't found a great way to do this, and if we can agree on a good approach I'd be happy to submit a PR.

Imagine the following code:

URL u = new URL("http://example.com/api/v2/" + untrustedInput + "/data");
HttpURLConnection con = (HttpURLConnection) u.openConnection();
...etc...

If untrustedInput comes in as foo../../../bar, what method can make it safe and prevent path traversal? Encode.forUriComponent does not work here because Java URL will just decode the value.

The only good approach I've found is to just strip out all the dots and slashes, as seen here: https://github.com/linkedin/URL-Detector/blob/173c6dbb5d3453d8327b59f254905cdf0338a7de/url-detector/src/main/java/com/linkedin/urls/PathNormalizer.java#L40

Does this sound like something that should belong in owasp-java-encoder? Any other ideas for a sanitization/encoding function that would work?

Hey Jim! Thanks for the quick response. I agree - encoded parameters are the best way to go, but in the case I'm thinking of, the untrusted input has to be part of the URL itself, not a parameter. I don't think URL encoding can be relied on since servers will just decode it and canonicalize the path. Here's a quick test:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import org.owasp.encoder.Encode;

public class TraversalTest {
    public static void main(String[] args) throws Exception {
        String bad = "okay.txt/../../../../key.txt";
        // forUriComponent percent-encodes the slashes in the untrusted value
        URL badURL = new URL("http://127.0.0.1:8000/a/b/c/" + Encode.forUriComponent(bad));
        System.out.println("Bad URL: " + badURL.toString());
        HttpURLConnection con = (HttpURLConnection) badURL.openConnection();
        con.setRequestMethod("GET");
        int responseCode = con.getResponseCode();
        System.out.println("GET Response Code :: " + responseCode);
        if (responseCode == HttpURLConnection.HTTP_OK) { // success
            StringBuilder response = new StringBuilder();
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(con.getInputStream()))) {
                String inputLine;
                while ((inputLine = in.readLine()) != null) {
                    response.append(inputLine);
                }
            }
            // print result
            System.out.println(response.toString());
        } else {
            System.out.println("GET request failed");
        }
    }
}

Against a Python server, the file called "key.txt" in the root directory is accessed instead of a file in /a/b/c as intended:

Bad URL: http://127.0.0.1:8000/a/b/c/okay.txt%2F..%2F..%2F..%2F..%2Fkey.txt
GET Response Code :: 200
sensitive

Is there a different encoder that I should be using?

It seems to me that, in cases where it's possible, validation rather than output encoding is the more appropriate defense here. That might not be feasible if the URL that you are creating a path for is not your site (i.e., that of the ORIGIN server), but even then you can probably at least use a block-list and deny "..". If it's your site, then canonicalize the path and check whether it corresponds to a legitimate resource. And for JavaEE, put your configuration data like "key.txt" either outside of Document Root or, if you want it bundled in your .war or .ear file, under WEB-INF/ so the JavaEE container itself will prevent direct browsing attempts to those resources.
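A minimal sketch of the canonicalize-then-check idea for the case where you control the server, assuming the resources live under a known base directory on disk. The class name PathContainment, the method resolveInside, and the base directory /var/www/a/b/c are hypothetical and only for illustration:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class PathContainment {
    // Hypothetical helper: resolves an untrusted relative path under baseDir
    // and rejects anything that ends up outside baseDir after normalization.
    static Path resolveInside(Path baseDir, String untrusted) {
        Path base = baseDir.toAbsolutePath().normalize();
        // resolve() returns the argument itself if it is absolute,
        // so absolute inputs also fail the startsWith check below
        Path candidate = base.resolve(untrusted).normalize();
        if (!candidate.startsWith(base)) {
            throw new IllegalArgumentException("Path escapes base directory: " + untrusted);
        }
        return candidate;
    }

    public static void main(String[] args) {
        Path base = Paths.get("/var/www/a/b/c"); // assumed document location
        System.out.println(resolveInside(base, "okay.txt"));
        try {
            resolveInside(base, "okay.txt/../../../../key.txt");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Path.startsWith compares whole path components, so a sibling directory such as /var/www/a/b/cc is not treated as being inside the base. This sketch only normalizes the path lexically and does not resolve symbolic links; for files that exist, Path.toRealPath() may be a better basis for the comparison.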

Thanks @jmanico - I don't have a .NET environment but I will check out the other resource.

The more I think about it, I'm inclined to agree with @kwwall - I'm just not sure if encoding will ever be enough in this case (without breaking things). Whatever custom encoding scheme you choose, you'd have to assume the server will be able to decode it, which would be impossible when you're constructing requests to a different site you don't control.

It's crazy how flexible URL paths can be. All of these curl requests are equivalent and retrieve the file key from the root of my directory:

curl --path-as-is --url "http://localhost:8000/a/b/c/../../../key"
curl --path-as-is --url "http://localhost:8000/a/b/c/okay%2F..%2F..%2F..%2F..%2Fkey"
curl --path-as-is --url "http://localhost:8000/a/b/c/%6f%6b%61%79%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%6b%65%79"
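To illustrate why these three requests land on the same file, here is a rough sketch that mimics what a server like this presumably does: percent-decode the path, then collapse the ".." segments. The class and method names are hypothetical, and a real server's decoding and normalization rules may differ:

```java
import java.net.URI;
import java.net.URISyntaxException;

final class EquivalentPaths {
    // Decodes the percent-escapes in the URL's path, then removes "." and ".." segments.
    static String effectivePath(String url) {
        // URI.getPath() returns the path with escaped octets decoded
        String decodedPath = URI.create(url).getPath();
        try {
            // Rebuild a path-only URI so normalize() sees the decoded slashes and dots
            return new URI(null, null, decodedPath, null).normalize().getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot normalize path: " + decodedPath, e);
        }
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://localhost:8000/a/b/c/../../../key",
            "http://localhost:8000/a/b/c/okay%2F..%2F..%2F..%2F..%2Fkey",
            "http://localhost:8000/a/b/c/%6f%6b%61%79%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%6b%65%79"
        };
        for (String url : urls) {
            System.out.println(effectivePath(url)); // each line prints /key
        }
    }
}
```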

As for sanitization, I think two good approaches are:

  • just strip all .. and / from the string
  • construct a path from the input and only return the filename portion

I've implemented both as an example:

Bad Input: okay/../../../../key
Just filename: key
StripDotsAndSlashes: okaykey
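The strip-everything approach could look roughly like the sketch below. This is an illustrative guess at such a function, not the code that produced the output above; the class and method names are hypothetical. It repeats the replacement until nothing changes, because removing a slash can create a new ".." (for example "./." becomes ".."):

```java
final class StripDotsAndSlashes {
    // Hypothetical helper: removes every ".." sequence and every "/" from the input.
    static String stripDotsAndSlashes(String input) {
        String previous;
        String current = input;
        do {
            previous = current;
            current = current.replace("..", "").replace("/", "");
        } while (!current.equals(previous)); // loop until the string is stable
        return current;
    }

    public static void main(String[] args) {
        System.out.println(stripDotsAndSlashes("okay/../../../../key")); // prints okaykey
    }
}
```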

Even with sanitization it's tricky, since the user could supply input that is already URL-encoded. In this case the sanitization function needs to properly URL-decode any input before stripping dots and slashes or extracting just a filename.
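One way to handle pre-encoded input is to decode repeatedly until the value stops changing, so that double-encoded sequences such as %252F are also unwrapped before any stripping happens. This is only a sketch under that assumption; the names FullDecode and decodeFully and the round limit are hypothetical, and URLDecoder.decode(String, Charset) requires Java 10 or later:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class FullDecode {
    // Hypothetical helper: URL-decodes the input until a further round changes nothing.
    // Gives up after maxRounds so deeply nested encodings are rejected, not processed forever.
    static String decodeFully(String input, int maxRounds) {
        String current = input;
        for (int i = 0; i < maxRounds; i++) {
            // Throws IllegalArgumentException on malformed escapes such as "%zz"
            String decoded = URLDecoder.decode(current, StandardCharsets.UTF_8);
            if (decoded.equals(current)) {
                return decoded;
            }
            current = decoded;
        }
        throw new IllegalArgumentException("Input still encoded after " + maxRounds + " rounds");
    }

    public static void main(String[] args) {
        System.out.println(decodeFully("okay%252F..%252Fkey", 5)); // prints okay/../key
    }
}
```

Note that URLDecoder follows form-encoding rules and turns "+" into a space, which may or may not be what you want for path segments.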

Does it make sense to add any sanitization functions to this library for these cases?

Here's what I was thinking to "sanitize" the inputs. It depends whether you want to honor paths (slashes) or not:

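import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;

public static String normalizePath(String path) {
    // decode with a fixed charset so the result does not depend on the platform default
    String decoded = URLDecoder.decode(path, StandardCharsets.UTF_8);

    // make it an absolute path, then normalize to collapse "." and ".." segments
    String normalized = Paths.get("/" + decoded).normalize().toString();

    // keep the leading slash if the decoded input had one, otherwise strip it
    // (startsWith also handles empty input, where charAt(0) would throw)
    return decoded.startsWith("/") ? normalized : normalized.substring(1);
}

public static String stripPath(String path) {
    String decodedPath = URLDecoder.decode(path, StandardCharsets.UTF_8);
    // getFileName() returns null for a root path such as "/"
    Path fileName = Paths.get(decodedPath).getFileName();
    return fileName == null ? "" : fileName.toString();
}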

I suggest a new project for specialized URL validation. Since we are just an encoding library I am closing this out.