cisagov / pshtt

Scan domains and return data based on HTTPS best practices

'Valid HTTPS' key-value inconsistent across platforms

refayathaque opened this issue · comments

We are using the pshtt module to determine M-15-13 compliance for certain websites. We run pshtt from a Python script that invokes the 'inspect_domains' method to get all relevant results. As part of our testing we have been running the same method in multiple places, namely our local machine and our cloud instances (the pshtt versions are the same on both). We are also running tests by calling 'pshtt' directly from bash. In all three cases, we are seeing different results for a couple of specific key-value pairs. Below is one example of the issue we are facing.

www.worklife4you.com - for this domain we are seeing three different values for 'Valid HTTPS'.

  • The 'pshtt.inspect_domains' method in a Python script running locally returns 'None' for 'Valid HTTPS'.
  • Running pshtt directly from the bash CLI returns 'False' for 'Valid HTTPS'.
  • Running the scan from our cloud instance returns 'True' for 'Valid HTTPS'.
    • What's really strange is that this is the same 'pshtt.inspect_domains' method we are running locally; in this application it is just wrapped in an EC2 instance. The pshtt version in the cloud (v.0.3.0) is up to date and is the same version as on our local machine (v.0.3.0).

Thank you so much for helping us out with this.

Running from the CLI (pshtt --version says 0.3.0) with either worklife4you.com or www.worklife4you.com gives me a null value in the resulting JSON:

pshtt worklife4you.com -d -j
[
  {
    "Base Domain": "worklife4you.com",
    "Base Domain HSTS Preloaded": false,
    ...
    "Valid HTTPS": null,

Though when I use the CLI and have it output in CSV mode, I get False for the Valid HTTPS column:

pshtt worklife4you.com -d
# ...
cat results.csv
Domain,Base Domain,Canonical URL,Live,Redirect,Redirect To,Valid HTTPS,Defaults to HTTPS,Downgrades HTTPS,Strictly Forces HTTPS,HTTPS Bad Chain,HTTPS Bad Hostname,HTTPS Expired Cert,HTTPS Self Signed Cert,HSTS,HSTS Header,HSTS Max Age,HSTS Entire Domain,HSTS Preload Ready,HSTS Preload Pending,HSTS Preloaded,Base Domain HSTS Preloaded,Domain Supports HTTPS,Domain Enforces HTTPS,Domain Uses Strong HSTS,Unknown Error
worklife4you.com,worklife4you.com,https://worklife4you.com,True,False,,False,True,False,True,False,True,False,False,False,,,False,False,False,False,False,False,False,False,False

When running this from the Python API in ipython (where pshtt.__version__ says 0.3.0), I get a value of None in the resulting dict:

In [13]: pshtt.inspect_domains(["worklife4you.com"], {})

Out[13]: 
[{'Base Domain': 'worklife4you.com',
  'Base Domain HSTS Preloaded': False,
  'Canonical URL': 'https://worklife4you.com',
   ...
  'Valid HTTPS': None,

In the latest git-versioned pshtt, None values are supposed to get converted to False for all but a few non-boolean fields:

https://github.com/dhs-ncats/pshtt/blob/develop/pshtt/pshtt.py#L139-L148

    for header in HEADERS:
        if header in ("HSTS Header", "HSTS Max Age", "Redirect To"):
            continue

        if result[header] is None:
            result[header] = False

But previously, in 0.3.0, the behavior was to apply this change only to CSV output. The commit that changed this was a44ab68, on October 21, 2017, but it wasn't merged (in #125) until October 24th, the day after 0.3.0 was published.
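For anyone staying on 0.3.0, the same conversion can be applied to the dicts returned by the Python API. What follows is a minimal sketch, assuming each result is a plain dict keyed by the column names shown above. `normalize_result` and `NON_BOOLEAN_FIELDS` are hypothetical names for this illustration, not part of pshtt.

```python
# Fields that legitimately hold non-boolean values, per the snippet above.
NON_BOOLEAN_FIELDS = ("HSTS Header", "HSTS Max Age", "Redirect To")


def normalize_result(result):
    """Return a copy of a result dict with None replaced by False.

    Hypothetical helper: it mirrors the loop quoted from the develop branch,
    but iterates over the keys actually present rather than pshtt's HEADERS.
    """
    return {
        key: (False if value is None and key not in NON_BOOLEAN_FIELDS else value)
        for key, value in result.items()
    }


# Illustrative subset of the 0.3.0 output quoted in this thread.
sample = {"Valid HTTPS": None, "HSTS Header": None, "Live": True}
normalized = normalize_result(sample)
print(normalized)
```

The helper returns a new dict rather than mutating its input, so the raw scan output stays available for comparison.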

@refayathaque Given this, I think you're seeing two issues:

  • The None/False distinction is because in 0.3.0, None only gets turned into False right before CSV serialization. This is fixed in the repository version. It's likely a good time for @h-m-f-t to publish an update to PyPI, but you can also fix this locally by pulling from the git repository (which I do).

  • Valid HTTPS is false because, in your local (and my local) environment, the canonical URL is being detected as https://worklife4you.com, in part because http://worklife4you.com redirects there. And https://worklife4you.com doesn't have a valid cert (it's only valid for the www subdomain, not the root hostname). I suspect that your cloud vantage point (which you say shows you Valid HTTPS as True) is actually seeing different server behavior for some reason, potentially in the redirects you're being served, possibly based on IP/firewall rules affecting the server or the cloud provider you're scanning from.

If you can share a full JSON output of the scan results (pshtt worklife4you.com -d -j) from the cloud provider with a result of Valid HTTPS as true, and one from your local environment running the same command and showing different output as Valid HTTPS being null or false, we can take a look at what might be different between the two to show that result. There should be some difference in one of the fields shown in the JSON output, since they contain all of the data points used to calculate the eventual answers.

@refayathaque You are probably already aware of this, but you can install from the GitHub repo via pip like this:

pip install git+https://github.com/dhs-ncats/pshtt.git@develop

Thanks to @konklone for investigating this issue!

@refayathaque, are you still seeing this issue with the latest code from develop?

Hi @jsf9k apologies but I wasn't notified when you and @konklone began to respond to my inquiry. I was only made aware of this over the weekend by a colleague. Thank you so much for your help, let me run the tests you two have recommended, and then I'll get back to you. @jsf9k I actually wasn't aware that you can do pip installs directly off of github, that's quite neat, I'll definitely need to try that out as well. However, in the past, we have encountered innumerable difficulties running the pshtt module in AWS Lambda. AWS Lambda, being essentially run in an Amazon Linux AMI, requires these very specific .so files for the pshtt, and all its supporting modules, to run. Getting these .so files is a nightmare and requires us to 'build from source', something my junior developer repertoire lacks.

@refayathaque, no worries.

Regarding running in AWS Lambda, if you want to run pshtt via 18F/domain-scan then you can leverage the Lambda work that @konklone has already done. You may also find dhs-ncats/lambda_functions useful if you need to build fresher Lambda zip files than what is committed to 18F/domain-scan.

@konklone getting back to you with the JSON objects you asked for.

The first is from our Lambda function running the pshtt scan (FYI we are NOT running pshtt www.worklife4you.com -d -j but we are running pshtt_results = pshtt.inspect_domains([url], {})[0] where url would be www.worklife4you.com)

"Pshtt": { "Base Domain": "worklife4you.com", "Base Domain HSTS Preloaded": "False", "Canonical URL": "https://www.worklife4you.com", "Defaults to HTTPS": "True", "Domain": "www.worklife4you.com", "Domain Enforces HTTPS": "False", "Domain Supports HTTPS": "False", "Domain Uses Strong HSTS": "True", "Downgrades HTTPS": "True", "HSTS": "True", "HSTS Entire Domain": "True", "HSTS Header": "max-age=31536000; includeSubDomains", "HSTS Max Age": "31536000", "HSTS Preload Pending": "False", "HSTS Preload Ready": "None", "HSTS Preloaded": "False", "HTTPS Bad Chain": "None", "HTTPS Bad Hostname": "None", "HTTPS Expired Cert": "None", "HTTPS Self Signed Cert": "None", "Live": "True", "Redirect": "False", "Redirect To": "None", "Strictly Forces HTTPS": "True", "Unknown Error": "False", "Valid HTTPS": "True" }

And here is what is being returned in my terminal after running pshtt www.worklife4you.com -d -j:

{ "Base Domain": "worklife4you.com", "Base Domain HSTS Preloaded": false, "Canonical URL": "https://worklife4you.com", "Defaults to HTTPS": true, "Domain": "worklife4you.com", "Domain Enforces HTTPS": false, "Domain Supports HTTPS": false, "Domain Uses Strong HSTS": null, "Downgrades HTTPS": false, "HSTS": false, "HSTS Entire Domain": null, "HSTS Header": null, "HSTS Max Age": null, "HSTS Preload Pending": false, "HSTS Preload Ready": false, "HSTS Preloaded": false, "HTTPS Bad Chain": false, "HTTPS Bad Hostname": true, "HTTPS Expired Cert": false, "HTTPS Self Signed Cert": false, "Live": true, "Redirect": false, "Redirect To": null, "Strictly Forces HTTPS": true, "Unknown Error": false, "Valid HTTPS": null }
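The two objects above are easier to compare field by field. A minimal sketch follows, using a subset of the values quoted above. `coerce` and `diff_results` are hypothetical helpers, and the string-to-value mapping assumes the Lambda wrapper stringified every value before returning it, which is why the first object shows "True" rather than true.

```python
# Assumed mapping for values that were stringified somewhere in the pipeline.
_STRING_FORMS = {"True": True, "False": False, "None": None}


def coerce(value):
    """Map the strings "True", "False" and "None" to Python values."""
    if isinstance(value, str):
        return _STRING_FORMS.get(value, value)
    return value


def diff_results(left, right):
    """Return {field: (left_value, right_value)} for fields that differ."""
    differences = {}
    for field in sorted(set(left) | set(right)):
        a, b = coerce(left.get(field)), coerce(right.get(field))
        if a != b:
            differences[field] = (a, b)
    return differences


# Subset of the Lambda output quoted above (values arrive as strings).
lambda_result = {
    "Canonical URL": "https://www.worklife4you.com",
    "Downgrades HTTPS": "True",
    "HTTPS Bad Hostname": "None",
    "Live": "True",
    "Valid HTTPS": "True",
}
# Same fields from the local CLI output quoted above.
local_result = {
    "Canonical URL": "https://worklife4you.com",
    "Downgrades HTTPS": False,
    "HTTPS Bad Hostname": True,
    "Live": True,
    "Valid HTTPS": None,
}

differences = diff_results(lambda_result, local_result)
for field, (cloud, local) in differences.items():
    print(f"{field}: lambda={cloud!r} local={local!r}")
```

On this subset the canonical URL is among the fields that differ, which fits the earlier suggestion that the two vantage points are being served different redirects.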

You're absolutely correct about the CSV serialization. So if I run just pshtt www.worklife4you.com and check out the results.csv, I see that Valid HTTPS is False.

@konklone I also just ran worklife4you.com (without the www.) in our Lambda function and what results is Valid HTTPS is None 😕

@refayathaque, are you using the lambda zip in the domain-scan repo? I don't think that zip has been updated in a while. You can use dhs-ncats/lambda_functions to build a new zip for pshtt.

When I run in lambda using a zip I recently built, I get these (admittedly difficult to read - apologies for that) results:

$ ./scan --scan=pshtt --lambda worklife4you.com
[pshtt] Downloading third party data...
[worklife4you.com][pshtt] Running scan...
        Executing Lambda scan...
Results written to CSV.
$ less results/pshtt.csv 
Domain,Base Domain,Canonical URL,Live,Redirect,Redirect To,Valid HTTPS,Defaults to HTTPS,Downgrades HTTPS,Strictly Forces HTTPS,HTTPS Bad Chain,HTTPS Bad Hostname,HTTPS Expired Cert,HTTPS Self Signed Cert,HSTS,HSTS Header,HSTS Max Age,HSTS Entire Domain,HSTS Preload Ready,HSTS Preload Pending,HSTS Preloaded,Base Domain HSTS Preloaded,Domain Supports HTTPS,Domain Enforces HTTPS,Domain Uses Strong HSTS,Unknown Error
worklife4you.com,worklife4you.com,https://worklife4you.com,True,False,,False,True,False,True,False,True,False,False,False,,,False,False,False,False,False,False,False,False,False

Note that Valid HTTPS is False, not None.
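The CSV is easier to read when each value is paired with its column name. Below is a minimal sketch using the standard library's csv.DictReader, with the header and row copied from the output above. `CSV_TEXT` and `read_rows` are names made up for this illustration.

```python
import csv
import io

# Header and data row copied from the results/pshtt.csv output above.
CSV_TEXT = (
    "Domain,Base Domain,Canonical URL,Live,Redirect,Redirect To,Valid HTTPS,"
    "Defaults to HTTPS,Downgrades HTTPS,Strictly Forces HTTPS,HTTPS Bad Chain,"
    "HTTPS Bad Hostname,HTTPS Expired Cert,HTTPS Self Signed Cert,HSTS,"
    "HSTS Header,HSTS Max Age,HSTS Entire Domain,HSTS Preload Ready,"
    "HSTS Preload Pending,HSTS Preloaded,Base Domain HSTS Preloaded,"
    "Domain Supports HTTPS,Domain Enforces HTTPS,Domain Uses Strong HSTS,"
    "Unknown Error\n"
    "worklife4you.com,worklife4you.com,https://worklife4you.com,True,False,,False,"
    "True,False,True,False,"
    "True,False,False,False,"
    ",,False,False,"
    "False,False,False,"
    "False,False,False,"
    "False\n"
)


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


row = read_rows(CSV_TEXT)[0]
for field in ("Canonical URL", "Valid HTTPS", "HTTPS Bad Hostname",
              "Domain Supports HTTPS"):
    print(f"{field}: {row[field]!r}")
```

Values read back from CSV are strings, so "False" is truthy in Python. Compare against the string "True" instead of testing the value directly.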

@refayathaque ah, nevermind, it looks like you built your own zip. I should read more carefully. :)

@jsf9k thanks for getting back! Yes, we built our own zip file and pushed the deployment package up to Lambda. I am now experimenting with the latest code from the pshtt repo (did pip install git+https://github.com/dhs-ncats/pshtt.git@develop), and I created a local package (which I hope to push up to Lambda and test later), but our pshtt.inspect_domains([url], {})[0] invocation from before isn't working. We get the error TypeError: 'generator' object has no attribute '__getitem__'. Not sure what could be happening here. Do you think they changed the method for invoking pshtt scans from within a .py file?

pshtt.inspect_domains([url], {})[0] - Has this changed?

@refayathaque you need to add a line like this to trigger the work. This changed about four months ago, and pshtt.inspect_domains([url], {}) is now a generator.

@jsf9k thanks for getting back. We will test this once we get a chance, but before we do, a couple of questions.

results = list(results)
^
Where is list defined? Are we importing this from pshtt as well?

return results[0]
^
Is it compulsory for us to return results[0]? In that case, we will need to take this out of our handler and create a separate scan function like what you have. results[0] I'm assuming is basically the return object with all relevant scan data? In essence what we've been receiving as the return dictionary?

Thank you so much for all your help!

@refayathaque list is a built-in Python function. It forces a Python iterator (which is what results is when it's returned from pshtt) to be fully evaluated and converts it into a full list of items.

@refayathaque Once you do list(results) you will have a Python list of results like you were expecting from the old code. You can return the entire thing, take the first one, or do whatever you want with it.
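The generator behaviour can be reproduced without pshtt installed. In the sketch below, `fake_inspect_domains` is a hypothetical stand-in that imitates only the one property discussed here, namely that results are yielded lazily instead of returned as a list. It does not perform any scan.

```python
def fake_inspect_domains(domains, options):
    """Stand-in for pshtt.inspect_domains (hypothetical, no network access).

    Yields one dict per domain, the way a generator-based API would.
    """
    for domain in domains:
        yield {"Domain": domain}


results = fake_inspect_domains(["www.worklife4you.com"], {})

try:
    results[0]  # generators cannot be indexed
except TypeError as error:
    print("indexing failed:", error)

results = list(results)  # list is a built-in; nothing needs to be imported
first = results[0]
print(first)
```

The error text quoted earlier in the thread is Python 2's wording. Python 3 reports the same mistake as "'generator' object is not subscriptable".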

Hi @konklone and @jsf9k, thank you once again for guiding us on how to use the most recent version of the module, we pip installed directly off the repo and used the new scan function invocation. We are now running our scans off the repo, and we seem to be getting the same results as before, at least for three test cases, and we are a little perplexed by the results. Allow me to elaborate.

  1. www.worklife4you.com - Defaults_to_HTTPS : True, Strictly_Forces_HTTPS : True, BUT Supports_HTTPS : False - this doesn't make sense to us: if Defaults_to_HTTPS and Strictly_Forces_HTTPS are both True, then surely Supports_HTTPS should be True as well.

    1. worklife4you.com - Defaults_to_HTTPS : False, Strictly_Forces_HTTPS : False, Supports_HTTPS : False - the data here is consistent, but because the certificate is bad (the SSLyze part of pshtt returns an 'error validating certificate' message), can the scan result not be trusted?

  2. www.buprenorphine.samhsa.gov AND buprenorphine.samhsa.gov - Defaults_to_HTTPS : False, Strictly_Forces_HTTPS : False, Supports_HTTPS : False - the data here is consistent with expectations, showing that pshtt works well for some websites. (No certificate errors for either the URL or the domain.)

  3. www.aoa.acl.gov - Defaults_to_HTTPS : False, Strictly_Forces_HTTPS : True, Supports_HTTPS : False - this also doesn't make sense to us: how can both Defaults_to_HTTPS and Supports_HTTPS be False when Strictly_Forces_HTTPS is True? We would be remiss if we didn't mention that this scan also resulted in an 'error validating certificate'. As a result, can the result not be trusted?

    1. aoa.acl.gov, curiously, results in a slightly different scan outcome - Defaults_to_HTTPS : True, Strictly_Forces_HTTPS : True, Supports_HTTPS : False - again, this makes no sense: it defaults to HTTPS but does not support and force HTTPS? Are we getting these results because this scan also resulted in an 'error validating certificate'?

Thank you!

@refayathaque -

  1. For worklife4you.com, you should get (and I do get) the same results whether you use www or not. pshtt treats those inputs as identical. And for that host, I get False for all of the relevant fields. One key issue is that https://www.worklife4you.com redirects immediately to http://www.worklife4you.com/index.html, which is a downgrade and causes the domain to be flagged as not supporting HTTPS.

  2. Seems like this is working fine.

  3. The results for aoa.acl.gov look True across the board, in pshtt and on Pulse. Let us know if you see anything amiss.

Are you maybe using an old version of pshtt, before we started properly harmonizing inputs with or without www?
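The downgrade described in item 1 comes down to a redirect that leaves HTTPS for plain HTTP. The sketch below illustrates only that idea. `is_downgrade` is a hypothetical helper and is not how pshtt itself decides the "Downgrades HTTPS" field, which takes more endpoint data into account.

```python
from urllib.parse import urlsplit


def is_downgrade(source_url, target_url):
    """Return True when a redirect goes from an https URL to an http URL."""
    return (
        urlsplit(source_url).scheme.lower() == "https"
        and urlsplit(target_url).scheme.lower() == "http"
    )


# The redirect reported above for www.worklife4you.com.
downgraded = is_downgrade(
    "https://www.worklife4you.com",
    "http://www.worklife4you.com/index.html",
)
print(downgraded)
```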