sjdirect / abot

Cross-platform C# web crawler framework built for speed and flexibility. Please star this project! +1.


Abot gets "Request aborted" errors on responses that should be standard HTTP 404s

sjdirect opened this issue

See the attached Fiddler .saz file (remove the .zip extension, then open it in Fiddler) for the raw requests and responses from the server.
404s.saz.zip

Noticed that most of the affected domains are served by nginx. This appears to be an issue with how the HttpWebRequest object interacts with some as-yet-unidentified nginx configuration. System.Net.Http.HttpClient does NOT have a problem with these requests, but I would hate to force users of this lib to pull in an extra dependency.
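To tell a genuine 404 apart from a request that never produced a response, it helps to look at how HttpWebRequest reports each one. The sketch below is illustrative only: `ResponseProbe`, `Describe` and `Probe` are hypothetical names, and it is an assumption that the "Request aborted" failures show up with a status such as `WebExceptionStatus.RequestCanceled` rather than `ProtocolError`. What is standard behavior is that `GetResponse()` throws a `WebException` with `Status == ProtocolError` for a 404, with the server's response still attached.

```csharp
using System;
using System.Net;

// Hypothetical diagnostic helper: separates a real HTTP error status
// from a request that was aborted before any response was read.
public static class ResponseProbe
{
    // Classifies a WebException thrown by HttpWebRequest.GetResponse().
    public static string Describe(WebException ex)
    {
        // A real 404 arrives as ProtocolError with the server's response attached.
        if (ex.Status == WebExceptionStatus.ProtocolError && ex.Response is HttpWebResponse http)
            return "HTTP " + (int)http.StatusCode;

        // Anything else (assumed here: RequestCanceled, ConnectionClosed, ...) means
        // no usable response was read from the server.
        return "Failed: " + ex.Status;
    }

    // Requests the URL once with HttpWebRequest's defaults (HTTP/1.1) and reports the outcome.
    public static string Probe(Uri uri)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
                return "HTTP " + (int)response.StatusCode;
        }
        catch (WebException ex)
        {
            using (ex.Response) // may be null; a using statement tolerates null
                return Describe(ex);
        }
    }
}
```

Against a well-behaved server, `Probe` on a missing page should report `HTTP 404`; against the problematic nginx hosts it would instead report a failure status, which is the symptom described in this issue.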

Found a blog post that suggested tweaking a few properties on the HttpWebRequest object. Turns out one simple setting solved this PITA of a problem: nginx apparently didn't like communicating over HTTP 1.1 (the HttpWebRequest default). When the protocol version is explicitly set to 1.0, nginx behaves.

request.ProtocolVersion = HttpVersion.Version10;

Commit 7356d06 appears to fix this issue. The fix is available in NuGet package 1.6.0.3.

Rolled back this change: defaulting to HTTP 1.0 caused other problems. Will see if I can find other workarounds.

A workaround for one client was to create a CustomPageRequester : IPageRequester that retries the request over HTTP 1.0 whenever it fails with a "request aborted" WebException. See attachment:
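The retry idea above can be sketched without depending on Abot's interfaces. This is not the attached CustomPageRequester and not part of Abot's API: `Http10Fallback`, `IsAbortedRequest`, `SendWithFallback` and `Send` are hypothetical names, and treating `RequestCanceled` and `ConnectionClosed` as the "request aborted" statuses is an assumption. A custom IPageRequester could route its HttpWebRequest calls through something like this.

```csharp
using System;
using System.Net;

// Hypothetical helper, NOT part of Abot's API: retries a single request over
// HTTP/1.0 only when the HTTP/1.1 attempt is aborted.
public static class Http10Fallback
{
    // Assumption: the "Request aborted" failures surface with one of these statuses.
    public static bool IsAbortedRequest(WebException ex)
    {
        return ex.Status == WebExceptionStatus.RequestCanceled
            || ex.Status == WebExceptionStatus.ConnectionClosed;
    }

    // Runs 'send' with HTTP/1.1 (the HttpWebRequest default) and, only if that attempt
    // is aborted, runs it once more with HTTP/1.0. Other exceptions, including the
    // ProtocolError that HttpWebRequest throws for a 404, propagate unchanged.
    public static T SendWithFallback<T>(Func<Version, T> send)
    {
        try
        {
            return send(HttpVersion.Version11);
        }
        catch (WebException ex) when (IsAbortedRequest(ex))
        {
            return send(HttpVersion.Version10);
        }
    }

    // One possible 'send': a single HttpWebRequest using the given protocol version.
    // The caller owns (and must dispose) the returned response.
    public static HttpWebResponse Send(Uri uri, Version version)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.ProtocolVersion = version;
        return (HttpWebResponse)request.GetResponse();
    }
}
```

A caller would wrap each request as `using (var response = Http10Fallback.SendWithFallback(v => Http10Fallback.Send(uri, v))) { ... }`. Unlike setting HTTP 1.0 as the global default, which caused other problems, this keeps HTTP 1.1 for every request that succeeds and only drops to 1.0 for the ones that are aborted.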
CustomPageRequester.cs.txt

This workaround is the best option if you can't upgrade to Abot v2.0 or later, which targets .NET Standard 2.0 and uses the newer HttpClient (which does not have this issue).