Includes both a supervised and a non-supervised crawler; the target site is https://arxiv.org
- Inside ./web_crawler
- Ubuntu:
- Add Erlang Solutions repo:
wget https://packages.erlang-solutions.com/erlang-solutions_1.0_all.deb && sudo dpkg -i erlang-solutions_1.0_all.deb
- Run:
sudo apt-get update
- Install the Erlang/OTP platform and all of its applications:
sudo apt-get install esl-erlang
- Install Elixir:
sudo apt-get install elixir
- Get dependency packages
mix deps.get
- Run
iex -S mix
- Run
WebCrawlerNoSupervisor.get_links
to crawl without a supervisor, or run WebCrawlerSupervisor
to crawl with a supervisor.
- Enter the author you want to search for, e.g. Ian Goodfellow.
- That's it! Wait for the crawl to finish. P.S. Don't search for a common name like "Paul": too many results mean too much crawling, and you will probably get banned by arxiv.org.
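The difference between the two crawling modes can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the module `CrawlSketch`, its functions, and the stand-in `fetch/1` (which makes no HTTP request) are all hypothetical. It assumes the supervised variant runs each fetch as a task under a `Task.Supervisor`, and it uses `max_concurrency` to cap parallel requests, which is one way to lower the risk of being banned.

```elixir
# Hypothetical sketch: names below are illustrative and not taken from this project.
defmodule CrawlSketch do
  # Stand-in for fetching a page; a real crawler would issue an HTTP request here.
  def fetch(url) do
    if String.contains?(url, "bad"), do: raise("fetch failed: #{url}")
    {:ok, url}
  end

  # Without a supervisor: an exception in any fetch crashes the caller.
  def crawl_unsupervised(urls) do
    Enum.map(urls, &fetch/1)
  end

  # With a supervisor: each fetch runs in its own task, not linked to the caller,
  # so a failing fetch is reported as {:exit, reason} instead of crashing the crawl.
  def crawl_supervised(urls) do
    {:ok, sup} = Task.Supervisor.start_link()

    results =
      sup
      |> Task.Supervisor.async_stream_nolink(urls, &fetch/1,
        max_concurrency: 2,
        timeout: 5_000
      )
      |> Enum.to_list()

    Supervisor.stop(sup)
    results
  end
end
```

In this sketch, `CrawlSketch.crawl_supervised/1` returns one entry per URL, in input order, so failed pages can be retried or skipped while the rest of the crawl continues.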
If available in Hex, the package can be installed by adding web_crawler to your list of dependencies in mix.exs:
def deps do
  [
    {:web_crawler, "~> 0.1.0"}
  ]
end
Documentation can be generated with ExDoc and published on HexDocs. Once published, the docs can be found at https://hexdocs.pm/web_crawler.