yasserg/crawler4j
Open Source Web Crawler for Java
Stargazers: 4503
Watchers: 306
Issues: 287
Forks: 1921
yasserg/crawler4j Issues
Not working with JDK 17 (Closed 2 years ago, 9 comments)
Question Regarding the Crawler Logic (Updated a year ago, 3 comments)
Closing connections idle longer than 30 SECONDS (Updated a year ago, 6 comments)
configurable slf4j logger (Closed 4 years ago, 3 comments)
Authentication OpenID Connect not working with Crawler4j (Updated a year ago)
when will release crawl4j new version? (Updated a year ago, 1 comment)
Can't fetch content of random pages (Updated 2 years ago, 1 comment)
[Performance Bug Report] Specific input cause endless loop in UrlResolver.java (Updated 2 years ago, 1 comment)
Could not find artifact Sleepycat (Updated 2 years ago, 1 comment)
Cannot fetch content of some website but python can. (Updated 2 years ago, 1 comment)
Fix Maven dependencies in build.gradle (Updated 2 years ago)
Q: any plan on releasing a new version with updated dependencies.. (Updated 3 years ago, 4 comments)
examples or code scraps of how to use crawler4j when pages contain js generated links? ... (Updated 3 years ago)
what is the purpose of threadShutdownDelaySeconds and cleanupDelaySeconds (Updated 3 years ago)
Units for maxDownloadSize are not defined in docs/comments (Updated 3 years ago)
Handle escaped * and $ characters in RobotsServer (Updated 3 years ago)
Enable Dependabot on the repo to keep dependencies up to date (Updated 3 years ago, 1 comment)
Canonicalization Error (Updated 4 years ago, 2 comments)
Wrong comment (Updated 4 years ago)
Typo (Closed 4 years ago)
config.setResumableCrawling(true) make it too slow (Updated 4 years ago)
Even if seeded with different domains, crawler4j crawls one domain at a time (Updated 4 years ago)
Backslashes in path processed incorrect (Updated 4 years ago)
When I set depth to one level, I can see that only one thread is working (Updated 4 years ago)
Performance Issues (Updated 4 years ago, 5 comments)
Future - Sustainability of project (Updated 4 years ago, 2 comments)
Performance problem with String.replace() in JAVA 8. (Updated 4 years ago, 7 comments)
EMBED tag is not parsed causes we missed embedded URL (Updated 4 years ago)
URLcanonicalizer returns invalid URL as a valid one (Updated 4 years ago)
Shutting Down a specific crawler of 3 working crawlers ? (Updated 4 years ago, 4 comments)
How to crawl dynamic javascript page. (Updated 4 years ago, 1 comment)
Crawling https://developress.netsons.org returns 0 pages (Closed 4 years ago, 4 comments)
URL's with spaces are not getting considered. (Updated 4 years ago, 1 comment)
Strange conditions to follow redirection (Updated 4 years ago)
Exponential backtracking in regex blocks Thread (Updated 4 years ago, 1 comment)
use useSystemProperties() for HttpClient builder (Updated 4 years ago)
Not working on Android (Updated 5 years ago)
Change in Documentation (Updated 5 years ago, 1 comment)
DocIDServer contains more than configured maxPagesToFetch Url count (Updated 5 years ago)
Add how to contribute documentation (Updated 5 years ago)
Cant close the connection and crawler thread hanged (Updated 5 years ago)
getCanonicalURL removes duplicated paramaters (Updated 5 years ago)
Duplicate visit with anchor urls (Updated 5 years ago, 3 comments)
Android Support (Updated 5 years ago)
Method and attribute visibility (Updated 5 years ago, 1 comment)
Why PageFetcher apply Politeness on all fetched pages? (Updated 5 years ago)
[Feature] On-the-fly checksum calculation (Updated 5 years ago, 1 comment)
www.dvsh.co.uk (Closed 5 years ago)
Not able to crawling with Login credential (Closed 5 years ago, 2 comments)
Crawler not crawling website (Closed 5 years ago)