ruippeixotog / scala-scraper

A Scala library for scraping content from HTML pages

Per connection proxy?

polymorpher opened this issue

I noticed that the README and the comments on existing issues suggest setting up a global JVM proxy for scraping behind a proxy. However, in many distributed use cases it is necessary to switch proxy servers at high frequency, which a single JVM-wide setting does not handle well.

As I was searching for answers, I noticed Jsoup supports per-connection proxy since 1.9 (https://stackoverflow.com/questions/13288471/jsoup-over-vpn-proxy). Similar support also exists in HtmlUnit: https://stackoverflow.com/questions/36398670/using-htmlunit-behind-proxy .

Based on these, it would seem straightforward to add per-connection proxy functionality to scala-scraper. Before I work on it and open a pull request, I want to confirm that this is a missing feature and is not already planned for the next release.
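To make the distinction concrete, here is a minimal sketch of what rotating proxies per request could look like, as opposed to setting the JVM-wide `http.proxyHost` / `http.proxyPort` system properties once. It uses only `java.net.Proxy` from the standard library. The `ProxyRotation` object, the host names and the ports are placeholders I made up for illustration; they are not part of scala-scraper or jsoup.

```scala
import java.net.{InetSocketAddress, Proxy}

// Hypothetical proxy pool: hosts and ports are placeholders.
object ProxyRotation {
  val proxies: Vector[Proxy] = Vector(
    ("proxy-a.example.com", 8080),
    ("proxy-b.example.com", 3128)
  ).map { case (host, port) =>
    // createUnresolved avoids a DNS lookup when the Proxy object is built
    new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port))
  }

  // Round-robin choice: request i uses proxy (i mod pool size)
  def proxyFor(requestIndex: Int): Proxy =
    proxies(Math.floorMod(requestIndex, proxies.size))
}
```

With jsoup 1.9.1 or later, a `Proxy` chosen this way should be usable for a single request via `Jsoup.connect(url).proxy(ProxyRotation.proxyFor(i)).get()`, leaving the global system properties untouched.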

Hi @polymorpher, jsoup supporting per-connection proxy settings is great news! I only relied on those system properties because I didn't have any other choice at the time.

Support for setting a proxy Browser-wide, probably via a setter method, would be great. If you're willing to take this on and submit a pull request, I would appreciate it!

Pull request merged