EstebanMqz/Web-scraper

data-mining html beautifulsoup metadata http-get front-end web-scrapper

Formatted / indexed web scraper

A web-scraping tool that extracts page metadata using HTTP GET requests.
It is intended for complete, fine-grained inspection of a page's attribute structure.
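As a rough illustration of what metadata extraction from fetched HTML can look like, the sketch below lists the `<meta>` tags in a saved page. The function name `extract_meta` is hypothetical and not part of this repository, and the approach assumes each tag sits on a single line; `grep -o` is widely available but not strictly POSIX.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not taken from the repository.
# Prints every <meta ...> tag found in a saved HTML file, one per line.
# Assumption: each tag fits on one line of the file.
extract_meta() {
  grep -io '<meta[^>]*>' "$1"
}
```

For example, `extract_meta Resume.html` would list the meta tags of a previously saved page.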

The technique differs from those provided by browser web-development tools:

View-source (Ctrl+U) → HTML under the view-source: prefix.
Raw code inspection for search engines.
Inspect Element (Ctrl+Shift+I) → Attribute inspection.

Traditional web-development methods (View-source and Save as: Complete HTML, Single HTML, HTML only) generally provide unreliable or incomplete information from websites, particularly those that rely on dynamic, client-side scripts.

Usage:

.sh

Terminal

$ ./html-extractor.sh
Enter a URL: https://estebanmqz.github.io/EstebanMqz/html/Resume.html
Do you want to extract the raw code to a temporary file? (Y/N): Y
Enter a filename to save the raw code: Resume
Raw code extracted to Resume.html opening..
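The session above could be produced by a script along the following lines. This is a minimal sketch and not the repository's actual html-extractor.sh: the function names `output_name`, `fetch_raw`, and `main`, the `--interactive` guard, and the use of `curl` are all assumptions made for illustration. The prompts are copied from the transcript.

```shell
#!/bin/sh
# Illustrative sketch only; the real html-extractor.sh may work differently.

# Append .html to a name unless it already ends with it
# (the transcript shows "Resume" being saved as "Resume.html").
output_name() {
  case "$1" in
    *.html) printf '%s\n' "$1" ;;
    *)      printf '%s.html\n' "$1" ;;
  esac
}

# Fetch a URL with an HTTP GET (following redirects) and write the body to a file.
fetch_raw() {
  curl --silent --show-error --fail --location --output "$2" "$1"
}

# Interactive flow mirroring the prompts shown in the transcript.
main() {
  printf 'Enter a URL: '
  read -r url
  printf 'Do you want to extract the raw code to a temporary file? (Y/N): '
  read -r answer
  case "$answer" in
    [Yy]*)
      printf 'Enter a filename to save the raw code: '
      read -r name
      out=$(output_name "$name")
      fetch_raw "$url" "$out" && printf 'Raw code extracted to %s\n' "$out"
      ;;
    *)
      # Assumed behavior for "N": print the raw code to the terminal instead.
      curl --silent --show-error --fail --location "$url"
      ;;
  esac
}

# Hypothetical guard: prompt only when asked to, so the functions can be reused.
if [ "${1:-}" = "--interactive" ]; then
  main
fi
```

Saved as, say, sketch.sh, it could be started with `sh sketch.sh --interactive`. Keeping the fetch in its own function makes it possible to try it against a local `file://` URL without network access.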

About

HTTP GET requests for scraping the HTML of dynamic, client-side-scripted pages, as an alternative to traditional static caching protocols.

Creative Commons Zero v1.0 Universal

Languages

HTML 100.0%, Shell 0.0%
