'Valid HTTPS' key-value inconsistent across platforms
refayathaque opened this issue · comments
We are using the pshtt module to determine M-15-13 compliance for certain websites. We run pshtt from a Python script that invokes the 'inspect_domains' method to get all relevant results. As part of our testing we have been running the same method in multiple places, namely our local machine and our cloud instances (the pshtt versions are the same on both). We are also running tests by calling 'pshtt' directly from bash. In all three cases, we are seeing different results for a couple of specific 'key-value' pairs. Provided below is one example of the issues we are facing.
www.worklife4you.com - for this domain we are seeing three different Boolean values for 'Valid HTTPS'.
- 'pshtt.inspect_domains' method in a python script running locally returns 'None' for 'Valid HTTPS'.
- running pshtt directly off the bash CLI returns 'False' for 'Valid HTTPS'.
- running the scan from our cloud instance returns 'True' for 'Valid HTTPS'.
- What's really strange about this is that it's the same 'pshtt.inspect_domains' method we are running locally, in this application, it's just wrapped in an EC2 instance. The pshtt version is also up-to-date in the cloud (v.0.3.0) and is the same version as in our local machine (v.0.3.0)
Thank you so much for helping us out with this.
Running off of the CLI (pshtt --version says 0.3.0) with either worklife4you.com or www.worklife4you.com gives me a null value in the resulting JSON:
pshtt worklife4you.com -d -j
[
  {
    "Base Domain": "worklife4you.com",
    "Base Domain HSTS Preloaded": false,
    ...
    "Valid HTTPS": null,
Though when I use the CLI and have it output in CSV mode, I get False for the Valid HTTPS column:
pshtt worklife4you.com -d
# ...
cat results.csv
Domain,Base Domain,Canonical URL,Live,Redirect,Redirect To,Valid HTTPS,Defaults to HTTPS,Downgrades HTTPS,Strictly Forces HTTPS,HTTPS Bad Chain,HTTPS Bad Hostname,HTTPS Expired Cert,HTTPS Self Signed Cert,HSTS,HSTS Header,HSTS Max Age,HSTS Entire Domain,HSTS Preload Ready,HSTS Preload Pending,HSTS Preloaded,Base Domain HSTS Preloaded,Domain Supports HTTPS,Domain Enforces HTTPS,Domain Uses Strong HSTS,Unknown Error
worklife4you.com,worklife4you.com,https://worklife4you.com,True,False,,False,True,False,True,False,True,False,False,False,,,False,False,False,False,False,False,False,False,False
When running this from the Python API in ipython (where pshtt.__version__ says 0.3.0), I get a value of None in the resulting dict:
In [13]: pshtt.inspect_domains(["worklife4you.com"], {})
Out[13]:
[{'Base Domain': 'worklife4you.com',
  'Base Domain HSTS Preloaded': False,
  'Canonical URL': 'https://worklife4you.com',
  ...
  'Valid HTTPS': None,
In the latest git-versioned pshtt, None values are supposed to get converted to False for all but a few non-boolean fields:
https://github.com/dhs-ncats/pshtt/blob/develop/pshtt/pshtt.py#L139-L148
for header in HEADERS:
    if header in ("HSTS Header", "HSTS Max Age", "Redirect To"):
        continue
    if result[header] is None:
        result[header] = False
But previously, in 0.3.0, this conversion was applied only to CSV output. The commit that changed this was a44ab68, made on October 21, 2017, but it wasn't merged (in #125) until October 24th, the day after 0.3.0 was published.
@refayathaque Given this, I think you're seeing two issues:
- The None/False distinction is because in 0.3.0, None only gets turned into False right before CSV serialization. This is fixed in the repository version. It's likely a good time for @h-m-f-t to publish an update to PyPI, but you can also fix this locally by pulling from the git repository (which I do).
- Valid HTTPS is false because, in your local (and my local) environment, the canonical URL is being detected as https://worklife4you.com, in part because http://worklife4you.com redirects there. And https://worklife4you.com doesn't have a valid cert (it's only valid for the www subdomain, not the root hostname). I suspect that your cloud vantage point (which you say shows you Valid HTTPS as True) is actually seeing different server behavior for some reason, potentially in the redirects you're being served, possibly based on IP/firewall rules affecting the server or cloud provider you're scanning from.
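For anyone who has to stay on 0.3.0 for now, the same None-to-False conversion can be applied to the dicts that come back from the Python API. This is only a minimal sketch, assuming each result is a plain dict keyed by the header names shown above. The names normalize_result and NON_BOOLEAN_FIELDS are hypothetical and not part of pshtt.

```python
# Fields that legitimately hold non-boolean values, so None is kept for them.
NON_BOOLEAN_FIELDS = ("HSTS Header", "HSTS Max Age", "Redirect To")


def normalize_result(result):
    """Return a copy of a result dict with None turned into False.

    Hypothetical helper: it mirrors the loop quoted from pshtt.py above,
    but works on a copy instead of mutating the input.
    """
    normalized = {}
    for header, value in result.items():
        if header not in NON_BOOLEAN_FIELDS and value is None:
            value = False
        normalized[header] = value
    return normalized


# Abbreviated example input; a real result has many more fields.
sample = {"Valid HTTPS": None, "HSTS Header": None, "Live": True}
print(normalize_result(sample))
```

Working on a copy keeps the raw scan output available in case the None/False distinction matters later.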
If you can share a full JSON output of the scan results (pshtt worklife4you.com -d -j) from the cloud provider with a result of Valid HTTPS as true, and one from your local environment running the same command and showing different output as Valid HTTPS being null or false, we can take a look at what might be different between the two to show that result. There should be some difference in one of the fields shown in the JSON output, since they contain all of the data points used to calculate the eventual answers.
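One way to spot the differing fields is to load both JSON outputs and compare them key by key. The sketch below assumes each output has been reduced to a single JSON object per domain. The diff_results helper is hypothetical, and the two inputs are abbreviated placeholders rather than real scan output.

```python
import json


def diff_results(first, second):
    """Map each field whose value differs to its (first, second) values."""
    fields = sorted(set(first) | set(second))
    return {
        field: (first.get(field), second.get(field))
        for field in fields
        if first.get(field) != second.get(field)
    }


# Placeholder inputs for illustration only; real scans have many more fields.
cloud = json.loads('{"Canonical URL": "https://www.example.com", "Valid HTTPS": true, "Live": true}')
local = json.loads('{"Canonical URL": "https://example.com", "Valid HTTPS": null, "Live": true}')

for field, (cloud_value, local_value) in diff_results(cloud, local).items():
    print(f"{field}: cloud={cloud_value!r} local={local_value!r}")
```

Fields that match in both environments are left out, so whatever is printed is the short list worth investigating.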
@refayathaque You are probably already aware of this, but you can install from the GitHub repo via pip like this:
pip install git+https://github.com/dhs-ncats/pshtt.git@develop
Thanks to @konklone for investigating this issue!
@refayathaque, are you still seeing this issue with the latest code from develop?
Hi @jsf9k, apologies, but I wasn't notified when you and @konklone began to respond to my inquiry; I was only made aware of this over the weekend by a colleague. Thank you so much for your help. Let me run the tests you two have recommended, and then I'll get back to you. @jsf9k I actually wasn't aware that you can do pip installs directly off of GitHub; that's quite neat, and I'll definitely need to try that out as well. However, in the past, we have encountered innumerable difficulties running the pshtt module in AWS Lambda. Because Lambda essentially runs on an Amazon Linux AMI, pshtt and all its supporting modules need very specific .so files to run. Getting these .so files is a nightmare and requires us to 'build from source', something my junior developer repertoire lacks.
@refayathaque, no worries.
Regarding running in AWS Lambda, if you want to run pshtt via 18F/domain-scan then you can leverage the Lambda work that @konklone has already done. You may also find dhs-ncats/lambda_functions useful if you need to build fresher Lambda zip files than what is committed to 18F/domain-scan.
@konklone getting back to you with the JSON objects you asked for.
The first is from our Lambda function running the pshtt scan (FYI we are NOT running pshtt www.worklife4you.com -d -j but we are running pshtt_results = pshtt.inspect_domains([url], {})[0] where url would be www.worklife4you.com)
"Pshtt": { "Base Domain": "worklife4you.com", "Base Domain HSTS Preloaded": "False", "Canonical URL": "https://www.worklife4you.com", "Defaults to HTTPS": "True", "Domain": "www.worklife4you.com", "Domain Enforces HTTPS": "False", "Domain Supports HTTPS": "False", "Domain Uses Strong HSTS": "True", "Downgrades HTTPS": "True", "HSTS": "True", "HSTS Entire Domain": "True", "HSTS Header": "max-age=31536000; includeSubDomains", "HSTS Max Age": "31536000", "HSTS Preload Pending": "False", "HSTS Preload Ready": "None", "HSTS Preloaded": "False", "HTTPS Bad Chain": "None", "HTTPS Bad Hostname": "None", "HTTPS Expired Cert": "None", "HTTPS Self Signed Cert": "None", "Live": "True", "Redirect": "False", "Redirect To": "None", "Strictly Forces HTTPS": "True", "Unknown Error": "False",
"Valid HTTPS": "True" }
And here is what is being returned in my terminal after running pshtt www.worklife4you.com -d -j
{ "Base Domain": "worklife4you.com", "Base Domain HSTS Preloaded": false, "Canonical URL": "https://worklife4you.com", "Defaults to HTTPS": true, "Domain": "worklife4you.com", "Domain Enforces HTTPS": false, "Domain Supports HTTPS": false, "Domain Uses Strong HSTS": null, "Downgrades HTTPS": false, "HSTS": false, "HSTS Entire Domain": null, "HSTS Header": null, "HSTS Max Age": null, "HSTS Preload Pending": false, "HSTS Preload Ready": false, "HSTS Preloaded": false, "HTTPS Bad Chain": false, "HTTPS Bad Hostname": true, "HTTPS Expired Cert": false, "HTTPS Self Signed Cert": false, "Live": true, "Redirect": false, "Redirect To": null, "Strictly Forces HTTPS": true, "Unknown Error": false,
"Valid HTTPS": null }
You're absolutely correct about the CSV serialization. So if I run just pshtt www.worklife4you.com and check out the results.csv, I see that Valid HTTPS is False.
@refayathaque, are you using the lambda zip in the domain-scan repo? I don't think that zip has been updated in a while. You can use dhs-ncats/lambda_functions to build a new zip for pshtt.
When I run in lambda using a zip I recently built, I get these (admittedly difficult to read - apologies for that) results:
$ ./scan --scan=pshtt --lambda worklife4you.com
[pshtt] Downloading third party data...
[worklife4you.com][pshtt] Running scan...
Executing Lambda scan...
Results written to CSV.
$ less results/pshtt.csv
Domain,Base Domain,Canonical URL,Live,Redirect,Redirect To,Valid HTTPS,Defaults to HTTPS,Downgrades HTTPS,Strictly Forces HTTPS,HTTPS Bad Chain,HTTPS Bad Hostname,HTTPS Expired Cert,HTTPS Self Signed Cert,HSTS,HSTS Header,HSTS Max Age,HSTS Entire Domain,HSTS Preload Ready,HSTS Preload Pending,HSTS Preloaded,Base Domain HSTS Preloaded,Domain Supports HTTPS,Domain Enforces HTTPS,Domain Uses Strong HSTS,Unknown Error
worklife4you.com,worklife4you.com,https://worklife4you.com,True,False,,False,True,False,True,False,True,False,False,False,,,False,False,False,False,False,False,False,False,False
Note that Valid HTTPS is False, not None.
@refayathaque ah, nevermind, it looks like you built your own zip. I should read more carefully. :)
@jsf9k thanks for getting back! Yes, we built our own zip file and pushed the deployment package up to Lambda. I am now experimenting with the latest code from the pshtt repo (did pip install git+https://github.com/dhs-ncats/pshtt.git@develop), and I created a local package (which I hope to push up to Lambda and test later), but our pshtt.inspect_domains([url], {})[0] invocation from before isn't working. We get the error TypeError: 'generator' object has no attribute '__getitem__'. Not sure what could be happening here. Do you think they changed the method for invoking pshtt scans from within a .py file?
pshtt.inspect_domains([url], {})[0] - Has this changed?
@refayathaque you need to add a line like this to trigger the work. This changed about four months ago, and pshtt.inspect_domains([url], {}) is now a generator.
@jsf9k thanks for getting back. We will test this once we get a chance, but before we do, a couple of questions.
results = list(results)
^
Where is list defined? Are we importing this from pshtt as well?
return results[0]
^
Is it compulsory for us to return results[0]? In that case, we will need to take this out of our handler and create a separate scan function like what you have. I'm assuming results[0] is basically the return object with all relevant scan data? In essence, what we've been receiving as the return dictionary?
Thank you so much for all your help!
@refayathaque list is a built-in Python function. It forces a Python iterator (which is what results is when it's returned from pshtt) to be evaluated in full and converts it into a complete list of items.
@refayathaque Once you do list(results) you will have a Python list of results like you were expecting from the old code. You can return the entire thing, take the first one, or do whatever you want with it.
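To make the generator behavior concrete without depending on pshtt itself, here is a small sketch. The function fake_inspect_domains is a hypothetical stand-in that yields placeholder dicts; it is not the real pshtt API and its values are not scan results.

```python
def fake_inspect_domains(domains):
    """Hypothetical stand-in for a generator-based scan function."""
    for domain in domains:
        # Placeholder result; a real scan returns many more fields.
        yield {"Domain": domain, "Valid HTTPS": False}


results = fake_inspect_domains(["worklife4you.com"])

# Indexing a generator directly raises TypeError, as in the error above.
try:
    results[0]
except TypeError as error:
    print("cannot index a generator:", error)

# list() is a built-in: it consumes the generator and returns a real list.
results = list(results)
print(results[0]["Domain"])
```

If only the first result is needed, next(results) on the untouched generator would also work, at the cost of leaving the remaining items unevaluated.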
Hi @konklone and @jsf9k, thank you once again for guiding us on how to use the most recent version of the module, we pip installed directly off the repo and used the new scan function invocation. We are now running our scans off the repo, and we seem to be getting the same results as before, at least for three test cases, and we are a little perplexed by the results. Allow me to elaborate.
- www.worklife4you.com - Defaults_to_HTTPS : True, Strictly_Forces_HTTPS : True, BUT Supports_HTTPS : False - this isn't making sense to us, if Defaults_to_HTTPS and Strictly_Forces_HTTPS are both True, then surely Supports_HTTPS should be True as well.
  - worklife4you.com - Defaults_to_HTTPS : False, Strictly_Forces_HTTPS : False, Supports_HTTPS : False - the data here is consistent but because the certificate is bad (SSLyze part of pshtt returning an 'error validating certificate' message) can the scan result not be trusted?
- www.buprenorphine.samhsa.gov AND buprenorphine.samhsa.gov - Defaults_to_HTTPS : False, Strictly_Forces_HTTPS : False, Supports_HTTPS : False - data here is consistent with expectations, exhibiting that pshtt works well for some websites. (No certificate errors for both url and domain)
- www.aoa.acl.gov - Defaults_to_HTTPS : False, Strictly_Forces_HTTPS : True, Supports_HTTPS : False - this also doesn't make sense to us, how can both Defaults_to_HTTPS and Supports_HTTPS be False when Strictly_Forces_HTTPS is True? We would be remiss if we didn't mention that this scan also resulted in an 'error validating certificate', and as a result of this can the result not be trusted?
  - aoa.acl.gov curiously, results in a slightly different scan outcome - Defaults_to_HTTPS : True, Strictly_Forces_HTTPS : True, Supports_HTTPS : False - again, this makes no sense, it defaults to HTTPS but does not support and force HTTPS? Are we getting these results because this scan also resulted in an 'error validating certificate'?
Thank you!
- For worklife4you.com, you should get (and I do get) the same results whether you use www or not. pshtt treats those inputs as identical. And for that host, I get False for all of the relevant fields. One key issue is that https://www.worklife4you.com redirects immediately to http://www.worklife4you.com/index.html, which is a downgrade and causes the domain to be flagged as not supporting HTTPS.
- Seems like this is working fine.
- The results for aoa.acl.gov look True across the board, in pshtt and on Pulse. Let us know if you see anything amiss.
Are you maybe using an old version of pshtt, before we started properly harmonizing inputs with or without www?