brianleect / etherscan-labels

Full label data dump of top EVM chains in JSON/CSV.


[Feat] [Bug] Etherscan Cloudflare bypass

brianleect opened this issue · comments

commented

It seems that Etherscan may have added another layer of scraping protection. When I tried to scrape today, I was blocked by a Cloudflare page even though I was logged in. This might be a major problem.

I will need to research further to see whether it occurs often or was a one-off case.
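To tell a one-off block from a recurring one, it may help to log every time a scraped page looks like a Cloudflare interstitial rather than a label page. Below is a minimal sketch of such a check. The function name and the marker strings are assumptions chosen for illustration (they reflect commonly seen Cloudflare page titles and markup, not an official list), so treat it as a heuristic only.

```python
# Heuristic markers (assumptions, not an official Cloudflare list).
BLOCK_TITLE_MARKERS = ("just a moment", "attention required")
BLOCK_SOURCE_MARKERS = ("cf-chl", "checking your browser")


def looks_like_cloudflare_block(title: str, page_source: str) -> bool:
    """Return True if the page resembles a Cloudflare challenge or block page.

    Hypothetical helper: it only does case-insensitive substring matching
    on the page title and HTML, so false positives/negatives are possible.
    """
    title_l = title.lower()
    source_l = page_source.lower()
    if any(marker in title_l for marker in BLOCK_TITLE_MARKERS):
        return True
    return any(marker in source_l for marker in BLOCK_SOURCE_MARKERS)
```

With Selenium this could be called as `looks_like_cloudflare_block(driver.title, driver.page_source)` after each `driver.get(...)`, counting hits per chain to see how often the block occurs.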

I see you are using plain Selenium, which is easily detected as a bot. Have you tried undetected-chromedriver or selenium-wire? They are very good at getting past anti-bot checks.
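For reference, swapping plain Selenium for undetected-chromedriver can be sketched roughly as follows. This is not the repo's actual code: `default_driver_factory` and `fetch_page_source` are hypothetical names, the window-size argument is an arbitrary choice, and whether this gets past Etherscan's protection is untested here. The driver is created through a factory so the scraping logic can also be exercised with a stand-in driver.

```python
from typing import Any, Callable, Optional


def default_driver_factory() -> Any:
    """Create an undetected-chromedriver Chrome instance (needs Chrome installed)."""
    # Lazy import so this module still loads when the package is missing.
    import undetected_chromedriver as uc  # pip install undetected-chromedriver

    options = uc.ChromeOptions()
    options.add_argument("--window-size=1280,900")  # arbitrary, illustrative
    return uc.Chrome(options=options)


def fetch_page_source(
    url: str, driver_factory: Optional[Callable[[], Any]] = None
) -> str:
    """Open `url` in a fresh driver and return the rendered HTML.

    Hypothetical helper: any object with get(), page_source and quit()
    works as the driver, so a plain Selenium driver can be passed as well.
    """
    factory = driver_factory or default_driver_factory
    driver = factory()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process
```

Passing `driver_factory` explicitly keeps the choice of driver in one place, which makes it easy to compare plain Selenium against undetected-chromedriver on the same URL.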

commented

Thanks for the suggestion, I'll take a look at it! Do feel free to open a PR with this integrated if you happen to be more familiar with utilizing the libraries.

Currently it seems that plain Selenium works for BscScan and PolygonScan but not Etherscan.

commented

I might have scraped too heavily; I'm getting the issue on BscScan now as well.
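If heavy scraping is what triggers the block, spacing out requests and backing off after a block may reduce how often it happens. A rough sketch under that assumption follows. `Blocked`, `backoff_delays` and `fetch_with_backoff` are hypothetical names, and the default delays are guesses rather than values known to be safe for any of these sites.

```python
import random
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


class Blocked(Exception):
    """Raised by the caller's fetch function when a block page is detected."""


def backoff_delays(
    base: float = 5.0, factor: float = 2.0, cap: float = 300.0, attempts: int = 5
) -> Iterator[float]:
    """Yield capped exponential delays: base, base*factor, ... up to `cap`."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def fetch_with_backoff(
    fetch: Callable[[], T],
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[float, float], float] = random.uniform,
    **delay_kwargs: float,
) -> T:
    """Call `fetch` until it succeeds, sleeping longer after each block."""
    for delay in backoff_delays(**delay_kwargs):
        try:
            return fetch()
        except Blocked:
            # Add up to 10% random jitter so retries are not perfectly regular.
            sleep(delay + jitter(0, delay * 0.1))
    raise Blocked("still blocked after all retries")
```

The `sleep` and `jitter` parameters are injectable only so the retry schedule can be checked without actually waiting.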

commented

Currently the problem occurs consistently when scraping Ethereum (an older scrape is already done) and Optimism (TBD).

I happen to be doing a lot of web scraping these days, so I will try implementing selenium-wire or undetected-chromedriver. Hopefully that will resolve the issue. Can you assign this issue to me?

Hey guys. I tried using selenium-wire with residential proxies (a workaround that worked a few weeks ago). I haven't tried undetected-chromedriver yet. Try this: https://fingerprintjs.github.io/BotD/main/. It could be a good start for checking whether it detects your browser correctly (this helped me last time).
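For anyone who wants to reproduce the selenium-wire plus proxy setup, a minimal sketch might look like the following. The helper names, host, port and credentials are placeholders, and using the `http://` scheme for both entries is an assumption that depends on the proxy provider.

```python
from typing import Any, Dict
from urllib.parse import quote


def build_proxy_options(host: str, port: int, user: str, password: str) -> Dict[str, Any]:
    """Build the `seleniumwire_options` dict for an authenticated proxy.

    Hypothetical helper; credentials are URL-encoded so special
    characters do not break the proxy URL.
    """
    proxy_url = f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"
    return {
        "proxy": {
            "http": proxy_url,
            "https": proxy_url,  # assumption: provider accepts http:// here too
            "no_proxy": "localhost,127.0.0.1",
        }
    }


def make_proxied_driver(options: Dict[str, Any]) -> Any:
    """Start Chrome through selenium-wire with the given proxy options."""
    # Lazy import so this module still loads when the package is missing.
    from seleniumwire import webdriver  # pip install selenium-wire

    return webdriver.Chrome(seleniumwire_options=options)
```

After `driver = make_proxied_driver(build_proxy_options(...))`, calling `driver.get("https://fingerprintjs.github.io/BotD/main/")` shows whether the browser is still flagged as a bot before pointing it at the explorer sites.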

@dante11235, that's a great website, thanks for sharing it.