url-crawler webpage-crawler page-crawler python-crawler python-webcrawler

Overview

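url_crawler is a Python library that extracts details about a URL, such as its domain, WHOIS registration data, and lexical features (length, dot and digit counts, entropy).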

Package Installer

pip install url-crawler==1.0.0

Usage

from url_crawler import url_crawler

# url -> string URL to crawl for information (example value; replace with your own).
url = "https://example.com"
url_details = url_crawler(url)

print(url_details.url)
print(url_details.domain)
print(url_details.check_https)
print(url_details.dot_count)
print(url_details.digit_count)
print(url_details.url_length)
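
When several attributes are needed at once, it can be handy to gather them into a dictionary for logging or export. The helper below is a minimal sketch, not part of url_crawler: `summarize` is a hypothetical name, and a `SimpleNamespace` stand-in replaces the real `url_crawler(url)` result so the snippet runs without the package or network access.

```python
from types import SimpleNamespace

# Attribute names taken from the usage example above.
DEFAULT_NAMES = ("url", "domain", "check_https",
                 "dot_count", "digit_count", "url_length")


def summarize(details, names=DEFAULT_NAMES):
    """Collect the named attributes of `details` into a dict.

    Attributes that are missing on the object are reported as None.
    """
    return {name: getattr(details, name, None) for name in names}


# Stand-in for the object returned by url_crawler(url) (assumption:
# the real object exposes the same attribute names).
stand_in = SimpleNamespace(
    url="https://example.com",
    domain="example.com",
    check_https=True,
    dot_count=1,
    digit_count=0,
    url_length=19,
)

summary = summarize(stand_in)
print(summary)
```

In real use, `stand_in` would be replaced by the object returned from `url_crawler(url)`.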

Utilities

Name	Output	Description
url	str	Returns the string url.
domain	str	Returns the domain of the url.
registrar	str	Returns the registrar for the given URL.
registered_country	str	Returns the registered domain country of the given URL.
whois	dict	Returns the whois information of the given URL.
registration_date	int	Returns the number of days since registration of the given URL.
expiry_date	int	Returns the number of days to expiration of the given URL.
intended_lifespan	int	Returns the number of days of intended lifespan of the given URL.
dot_count	int	Returns the dot(.) count in the given URL.
digit_count	int	Returns the digit count in the given URL.
url_length	int	Returns the length of the given URL.
fragments_count	int	Returns the fragment count in the given URL.
entropy	int	Returns the entropy of the given URL.
check_http	bool	Checks for http headers in the given URL.
check_https	bool	Checks for https headers in the given URL.
url_response	bool	Checks for the URL response.
check_encoding	bool	Checks for encoding in the given URL.
check_client	bool	Checks for client keyword in the given URL.
check_admin	bool	Checks for admin keyword in the given URL.
check_server	bool	Checks for server keyword in the given URL.
check_login	bool	Checks for login keyword in the given URL.
check_ports	bool	Checks for any ports in the given URL.
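
To make a few of the lexical utilities above concrete, here is a minimal standard-library sketch of how such values could be computed. It is an illustration under assumptions, not the package's actual implementation: `shannon_entropy` and `lexical_features` are hypothetical names, entropy is taken to mean character-level Shannon entropy, and the keyword and port checks are simple guesses at what the table describes.

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy of `text`, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def lexical_features(url: str) -> dict:
    """Compute a handful of URL features using only the standard library."""
    parts = urlsplit(url)
    lowered = url.lower()
    return {
        "domain": parts.hostname,
        "url_length": len(url),
        "dot_count": url.count("."),
        "digit_count": sum(ch.isdigit() for ch in url),
        "entropy": shannon_entropy(url),
        # Assumption: "check_https" means the URL uses the https scheme.
        "check_https": parts.scheme == "https",
        # Assumption: keyword checks are case-insensitive substring tests.
        "check_login": "login" in lowered,
        # Assumption: "check_ports" means an explicit port is present.
        "check_ports": parts.port is not None,
    }


features = lexical_features("https://example.com:8080/login?id=42")
for name, value in features.items():
    print(f"{name}: {value}")
```

The WHOIS-based utilities (registrar, registration_date, and so on) need a network lookup and are deliberately left out of this sketch.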

Requirements

The requirements.txt file lists the Python libraries this package depends on. Install them with:

pip install -r requirements.txt

Organization

├── src
│   └── url_crawler
│       ├── __init__         <- package initializer
│       └── url_crawler      <- package source code for URL crawler
├── setup.py             <- setup file
├── LICENSE              <- LICENSE
├── README.md            <- README
├── CONTRIBUTING.md      <- contribution guidelines
├── test.py              <- test cases for unit testing
└── requirements.txt     <- requirements file for reproducing the code package

License

MIT

Contributions

For steps on contributing code, please see CONTRIBUTING.md.

About

A Python library to crawl the details of a URL.

https://pypi.org/project/url-crawler/1.0.0/
