elixir-crawly / crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

Home page: https://hexdocs.pm/crawly

Remove URL fragment before storing in `Crawly.Middlewares.UniqueRequest`

tanguilp opened this issue.

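Usually a fragment leads to the same page, so URLs that differ only in their fragment (e.g. `/faqs#one` and `/faqs#two`) should be treated as duplicates.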

Yes, it will improve the situation! I will add it to the scope.

@Ziinc Is this issue still open? I'm looking for something to work on. If it is, could you describe what needs to be done?

I think the change belongs in the file `/lib/crawly/middlewares/unique_request.ex`.

The fragment could be removed with something like:

```elixir
"http://example.com/faqs#one"
|> URI.parse()
|> Map.put(:fragment, nil)
|> URI.to_string()
# => "http://example.com/faqs"
```
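To show how the fragment stripping might fit into a deduplicating middleware, here is a minimal standalone sketch. It is not Crawly's actual `UniqueRequest` code: `UniqueRequestSketch`, `strip_fragment/1`, the `:seen_urls` state key, and modelling requests as plain maps with a `:url` key are all hypothetical. It assumes the pipeline convention of returning `{request, state}` to keep a request and `{false, state}` to drop it.

```elixir
defmodule UniqueRequestSketch do
  # Hypothetical module for illustration only; not Crawly's implementation.
  # Requests are modelled as plain maps with a :url key.

  # Drops the fragment (the part after "#") from a URL string.
  def strip_fragment(url) when is_binary(url) do
    url
    |> URI.parse()
    |> Map.put(:fragment, nil)
    |> URI.to_string()
  end

  # Keys the seen-set on the fragment-free URL. Returns {request, new_state}
  # the first time a URL is seen and {false, state} for a duplicate.
  def run(%{url: url} = request, state) do
    key = strip_fragment(url)
    seen = Map.get(state, :seen_urls, MapSet.new())

    if MapSet.member?(seen, key) do
      {false, state}
    else
      {request, Map.put(state, :seen_urls, MapSet.put(seen, key))}
    end
  end
end
```

Usage, where the second request is dropped because it differs from the first only in its fragment:

```elixir
{_request, state} = UniqueRequestSketch.run(%{url: "http://example.com/faqs#one"}, %{})
{false, _state} = UniqueRequestSketch.run(%{url: "http://example.com/faqs#two"}, state)
```

In this sketch only the deduplication key is normalized, so the request passes through with its original URL. Rewriting `request.url` itself would be an alternative design.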

I could submit a pull request for this small change.

Regards
Matteo