web-spider web-scraping reactjs reactjs-ui web-crawler react-crawler reactjs-crawler crawler-ui crawler-java spring-boot-crawler spring-boot-web-spider ibm-carbon ibm-components ibm-react-components custom-crawler spider-web web-scraper react-spider heroku-react

🕷FYCrawler UI🕸

Access on: https://fycrawler.herokuapp.com/

( Backend Side is private Repository 😉 ) , If you want to use backend as a docker image, you will be able to run as a container with this image.

Backend Full API Documentation Swagger Access on: https://spiderfy.herokuapp.com/swagger-ui.html

All frameworks and libraries are %100 open source ❤️.

Feature Count	Methodology	Description
#1	🕸️Crawl	To retrieve all links from given website.
#2	🕸️Crawl	To retrieve all images from given website.
#3	💬NLP	To analyze all words frequencies (NLP) from given website url.
#4	💬NLP	To analyze specific word frequencies (NLP) from given website url.
#5	🕸️Crawl	To retrieve all metatags from given website.
#6	🔖Sitemap - 🕸️Crawl	To retrieve all sitemap nodes from given website.
#7	🔖Sitemap - 🕸️Crawl	To retrieve all links from given sitemap url.
#8	🖼OCR	To convert base64 to text (OCR function) from given snapshoot.
#9	🗳RSS - 🕸️Crawl	To retrieve all RSS Feeds from given RSS url.
#10	📊Analysis - 📄PDF	To summary all javascript files usage (Content length, file size) from given website.
#11	🗳RSS - 🕸️Crawl	To retrieve Turkey RSS News Feeds from RSS list.
#12	📑Static File	To show 3500 user agent from static list.
#13	🖱 HTTP Request Manipulation	To insert randomly selected user agent on http request according to user agents static list
#14	🕸️Crawl	To obtain only text without any html tags from given website.
#15	🕸️Crawl	To obtain only all html source code from given website.
#16	🌐Selenium	To retrieve all links/link snapshoot on base64 format.
#17	📊Analysis - 🕸️Crawl	To obtain top 50 sites in Turkey from Alexa.
#18	📊Analysis - 🕸️Crawl	To obtain top 50 sites in Turkey from SimilarWeb.
#19	📊Analysis - 🕸️Crawl	To show rank site from Alexa.
#20	📊Analysis - 📄PDF	To generate pdf output from given website url.
#21	📊Analysis - 📄PDF	To generate pdf output from given any html.
#22	📊Analysis	To calculate page load time from given any url.
#23	🕸️Crawl	To retrieve internal backlinks from given any url.
#24	🕸️Crawl	To retrieve outgoing backlinks from given any url.
#25	🕸️Crawl	To generate summary data [Consist of: #1 ,#2, #5, #14, #15, #23, #24] from given website.
#26	📊Analysis -🕸️Crawl	To generate summary usage of html tags given in website.

⚠️ No data is stored.

About

🕷 Custom Web Spider | Frontend: ⚛️ React.js - Backend: ☕️ Java / NodeJS

https://fy-crawler.herokuapp.com/

web-spider web-scraping reactjs reactjs-ui web-crawler react-crawler reactjs-crawler crawler-ui crawler-java spring-boot-crawler spring-boot-web-spider ibm-carbon ibm-components ibm-react-components custom-crawler spider-web web-scraper react-spider heroku-react

Apache License 2.0

Languages

Language:JavaScript 93.7%Language:HTML 4.0%Language:CSS 2.3%