liip / TheA11yMachine

The A11y Machine is an automated accessibility testing tool which crawls and tests pages of any web application to produce detailed reports.

Home Page: https://www.liip.ch/


Issues on Windows

krtek4 opened this issue · comments

Hey there,

I have multiple issues on Windows.

First, a non-Windows-specific issue: git is required, as some packages are linked to GitHub repos.

Then, a11ym behaves strangely on Windows. When I try to crawl a URL, I get one of three possible outcomes:

  1. a11ym fetches the first URL and stops immediately afterward. No report is produced.
  2. a11ym fetches the first URL, runs it, then fetches a bunch of URLs and stops. No report is produced.
  3. a11ym fetches a bunch of URLs, runs the first X (X being the number of workers), and then keeps fetching other URLs until stopped manually. No report is produced.

In each case, I get no debug output whatsoever.

I tried multiple combinations of global and non-global packages:

  • Global PhantomJS and global A11ym
  • Global PhantomJS and local A11ym
  • Local PhantomJS and global A11ym
  • Local PhantomJS and local A11ym

The result is the same in every case. The computer is a fresh Windows 10 install with the latest 64-bit Node.js.

> node.exe --version
v5.10.1
> npm --version
3.8.3
> [System.Environment]::OSVersion.Version

Major  Minor  Build  Revision
-----  -----  -----  --------
10     0      10586  0
> phantomjs.cmd -v
2.1.1

I'm trying to run a11ym on Linux.

I encounter the same issue as described in this ticket: "a11ym fetches the first URL and stops immediately afterward. No report is produced."

I only get:

$ ./node_modules/.bin/a11ym https://todomvc.com                                    
Initializing with https://todomvc.com.
$

And that's all.

Versions:

$ node -v
v4.2.2
$ npm -v
2.14.7
$ npm list
/src/test
└─┬ the-a11y-machine@0.8.1
  ├── async@1.5.2
  ├─┬ chalk@1.1.3
  │ ├── ansi-styles@2.2.1
  │ ├── escape-string-regexp@1.0.5
  │ ├─┬ has-ansi@2.0.0
  │ │ └── ansi-regex@2.0.0
  │ ├─┬ strip-ansi@3.0.1
  │ │ └── ansi-regex@2.0.0
  │ └── supports-color@2.0.0
  ├─┬ commander@2.9.0
  │ └── graceful-readlink@1.0.1
  ├── crypto@0.0.3
  ├─┬ glob@6.0.4
  │ ├─┬ inflight@1.0.4
  │ │ └── wrappy@1.0.1
  │ ├── inherits@2.0.1
  │ ├─┬ minimatch@3.0.0
  │ │ └─┬ brace-expansion@1.1.4
  │ │   ├── balanced-match@0.4.1
  │ │   └── concat-map@0.0.1
  │ ├─┬ once@1.3.3
  │ │ └── wrappy@1.0.1
  │ └── path-is-absolute@1.0.0
  ├── HTML_CodeSniffer@2.0.1 (git+https://github.com/liip-forks/HTML_CodeSniffer.git#5cee16fe68f76ffd96caee41c6b2754fc00d4f47)
  ├─┬ mkdirp@0.5.1
  │ └── minimist@0.0.8
  ├─┬ pa11y@3.2.1 (git+https://github.com/liip-forks/pa11y.git#a4ab830d30bbee4064d1794a32e457d85be90f24)
  │ ├── async@1.4.2
  │ ├─┬ bfj@1.2.2
  │ │ └── check-types@3.2.0
  │ ├─┬ commander@2.8.1
  │ │ └── graceful-readlink@1.0.1
  │ ├── lower-case@1.1.3
  │ ├─┬ node.extend@1.1.5
  │ │ └── is@3.1.0
  │ ├─┬ once@1.3.3
  │ │ └── wrappy@1.0.1
  │ └─┬ truffler@2.1.1
  │   ├── freeport@1.0.5
  │   ├─┬ hasbin@1.1.3
  │   │ └── async@1.5.2
  │   └── node-phantom-simple@2.0.6
  ├── process@0.11.3
  ├─┬ simplecrawler@0.7.0 (git+https://github.com/cgiffard/node-simplecrawler.git#bdafeb7acb55cb38655ce44d522ce06873db621e)
  │ ├── iconv-lite@0.4.13
  │ └── urijs@1.18.0
  └── underscore@1.8.3

@callmemagnus Sorry for the late reply… There is a TLS issue with https://todomvc.com. Maybe this is why the crawler does not scan it. Did you try with the --http-tls-disable option?

@callmemagnus There is a certificate issue with https://todomvc.com. See the following command:

$ curl -D - -o /dev/null https://todomvc.com -s
curl: (60) SSL certificate problem: Invalid certificate chain
More details here: https://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.

With the --insecure flag, we are able to GET the page:

$ curl -D - -o /dev/null --insecure https://todomvc.com -s
HTTP/1.1 200 OK
Server: GitHub.com
Date: Wed, 21 Dec 2016 15:43:43 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 254821
Last-Modified: Tue, 15 Nov 2016 19:32:33 GMT
Access-Control-Allow-Origin: *
Expires: Wed, 21 Dec 2016 15:53:43 GMT
Cache-Control: max-age=600
Accept-Ranges: bytes
X-GitHub-Request-Id: B2D3F57C:C44D:3D11CD8:585AA32F

So basically, you have to use a11ym with --http-tls-disable:

$ ./a11ym -m 3 --http-tls-disable https://todomvc.com
Initializing with https://todomvc.com.
Fetch complete for https://todomvc.com/.
Waiting to run https://todomvc.com/.
 1/3  Run: https://todomvc.com/.
Fetching https://todomvc.com/site-assets/favicon.ico.
Fetching https://todomvc.com/bower_components/webcomponentsjs/webcomponents-lite.min.js.
Fetching https://todomvc.com/examples/backbone.
Fetching https://todomvc.com/url,baseUri.
Fetching https://todomvc.com/assetPath,module.ownerDocument.baseURI.
Fetching https://todomvc.com/url,root.
Fetching http://todomvc.com/.
Fetch complete for https://todomvc.com/site-assets/favicon.ico; skipped, not text/html.
Fetching https://todomvc.com/site-assets/main.min.css.
Fetch complete for https://todomvc.com/bower_components/webcomponentsjs/webcomponents-lite.min.js; skipped, not text/html.
Fetching https://todomvc.com/bower_components/paper-icon-button/%5B%5Bsrc%5D%5D.
Fetch complete for https://todomvc.com/examples/backbone, and redirect to http://todomvc.com/examples/backbone/.
Fetching https://todomvc.com/examples/angularjs.
https://todomvc.com/url,baseUri responds with a 404.
https://todomvc.com/assetPath,module.ownerDocument.baseURI responds with a 404.
https://todomvc.com/url,root responds with a 404.
https://todomvc.com/bower_components/paper-icon-button/%5B%5Bsrc%5D%5D responds with a 404.
Fetching https://todomvc.com/bower_components/webcomponentsjs/b%22,%22http:/a.
Fetch complete for http://todomvc.com/.
Waiting to run http://todomvc.com/.
 2/3  Run: http://todomvc.com/.
Fetching http://todomvc.com/url,baseUri.
Fetching http://todomvc.com/assetPath,module.ownerDocument.baseURI.
Fetching http://todomvc.com/url,root.
etc.

I have this report:

[screenshot: a11ym report, 2016-12-21 at 16:47]

@krtek4 No direct dependencies use Git anymore. Only HTML_CodeSniffer is using https://github.com/….

About the unexpected stop: #78 is probably a similar issue, and it has been fixed. Could you confirm, please? If the problem is still present, please give me the URL you are trying to crawl. I am not sure this issue is related to Windows.

I don't have a Windows machine available anymore, and honestly I don't remember the website I had issues with.

Since there have been multiple changes since I opened the bug and no other reports of the same issue, I think it is fair to say the bug is fixed. For me, you can close it :)

Thanks!

Thank you! Feel free to reopen if needed.

xoxo

Hi there, I'm actually using Windows Server 2012. I tried installing Cygwin on Windows to run bash commands.
I've noticed that pa11y-crawl gives the following error when attempting to crawl a URL with a subdomain:

. is not an html document, skipping

Any advice on this would be helpful. Thanks in advance. Alternatively, is there any other tool that can crawl a site on a Windows machine?

@syndy1989 Can you give me the command line you run please?

@Hywan Please find the command-line error below:

$ pa11y-crawl nature.com
fatal: Not a git repository (or any of the parent directories): .git

using wget to mirror site
<<< found 1 files in 1 directories
beginning the analysis
|---------------------------------------
C:\Users\AppData\Roaming\npm/node_modules/pa11y-crawl/pa11y-crawl.sh: line 58: python: command not found
|| is not an html document, skipping
C:\Users\AppData\Roaming\npm/node_modules/pa11y-crawl/pa11y-crawl.sh: line 58: python: command not found
|| is not an html document, skipping
C:\Users\AppData\Roaming\npm/node_modules/pa11y-crawl/pa11y-crawl.sh: line 58: python: command not found
|---------------------------------------
|-> analyzing
jq: error: Could not open file /home/results.json: No such file or directory
jq: error: Could not open file /home/pa11y-crawl/pa11y.json: No such file or directory
jq: error (at /home/pa11y-crawl/pa11y.json:0): Cannot use null (null) as object key
parse error: Invalid numeric literal at line 2, column 8
parse error: Invalid numeric literal at line 2, column 8
parse error: Invalid numeric literal at line 2, column 8
<<< pa11y says: error: | warning: | notice:
cleaning up
rm: cannot remove '/home/pa11y-crawl': Device or resource busy

@syndy1989 You are running pa11y-crawl, not a11ym.