DedSecInside / TorBot

Dark Web OSINT Tool

Only get the original link when crawling onion sites

0xEnders opened this issue

Hi guys,

I was following the guide step by step. However, when I try crawling a particular link, I only get that link returned, even though manually navigating in Tor shows that there are multiple other links. I have tried a few different websites but still have the same issue. I am unsure whether it's because of my settings or a bug.

Please advise.

What's the link, so that I can try to reproduce it? Also, can you provide more information, such as:

  • Operating System
  • Which version of TorBot are you using?
  • How are you executing the application?
  • Tor configuration

Thanks for the quick reply!

I am trying these links:

http://alphvmmm27o3abo3r2mlmjrpdmzle3rykajqc5xsj7j7ejksbpsa36ad.onion/
http://noescapemsqxvizdxyl7f7rmg5cdjwp33pg2wpmiaaibilb4btwzttad.onion/

Operating System: Ubuntu 22
Which version of TorBot are you using?: the current dev version (I git cloned it)

How are you executing the application?
python3 torbot -u http://website.onion --depth 2

Tor configuration: default config
sudo apt install tor
sudo service tor start
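
(For what it's worth, that default setup leaves Tor's SOCKS proxy listening on 127.0.0.1:9050. A quick way to confirm the proxy is up, using check.torproject.org simply as a convenient test endpoint:

curl --socks5-hostname 127.0.0.1:9050 https://check.torproject.org/
)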

Also, is there a way to crawl based on a text file of email addresses?

You're welcome, and thanks for providing the information; I'll look into it later today or sometime this week. There is no feature to crawl email addresses. The current program operates on HTML retrieved from sites, so I don't know how that would be possible with email addresses, but if you have a suggestion for a new feature, feel free to submit a ticket and it'll be looked into. If you already know how the feature should be implemented, you can take a crack at it and submit a pull request to the repo.

Correction: a text file of websites, not email addresses. And thanks for looking into it; I'll go mess around with the settings and see what happens. Two other things:

  1. Is it recommended to amend the torrc config file? I didn't touch that at all.
  2. Can I get a link to the Slack channel? The link on the main page has expired.

Thanks once again!

  1. It's your choice. I've created CLI flags to dynamically define the SOCKS5 proxy when instantiating the HTTPS client (see the note after this list).
  2. The link should still work, but the Slack channel is not highly used. If you have suggestions, thoughts, or problems, you'll likely get the quickest response by posting here.
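
For reference, the default torrc already exposes the SOCKS proxy that TorBot needs, so there's usually nothing to amend. The relevant directive (often shipped commented out, since 9050 is the built-in default) is:

SocksPort 9050

The exact names of TorBot's proxy flags should be listed by python3 torbot --help.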

There's no way for us to crawl multiple websites at once, right?

Not currently; it'd probably be a fairly straightforward feature to implement, but no one has requested it. If you want to know what's possible, check the README. If you have ideas or suggestions, create a new ticket.

Or build it out yourself and submit it if you're capable.
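
In the meantime, a simple workaround is to drive the CLI from a shell loop. This sketch assumes a file named sites.txt with one URL per line:

while read -r url; do python3 torbot -u "$url" --depth 2; done < sites.txt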

I checked the URLs, and the reason it's only returning the host domain is that all of the links on those pages are paths within the same domain, not links to different sites. The scraper only collects fully qualified URIs with unique host domains, so relative paths get skipped.

I'll look into modifying the feature to identify paths.
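
Roughly, the change amounts to resolving relative hrefs against the URL of the page they were found on before collecting them. A minimal sketch of the idea with urllib.parse (the names here are illustrative, not TorBot's actual internals):

from urllib.parse import urljoin, urlparse

def resolve_links(page_url, hrefs):
    # Turn every href (relative path or absolute URI) into a fully
    # qualified URI by resolving it against the page it was found on.
    return {urljoin(page_url, href) for href in hrefs}

def same_host(page_url, link):
    # True when a resolved link stays on the same host,
    # i.e. it is a path within the crawled site.
    return urlparse(link).netloc == urlparse(page_url).netloc

A relative link like /posts/1 then resolves to a full URI on the same onion host and can be queued for crawling alongside links to other domains.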