Bishwas-py / regxon

RegXon is a powerful validator, sanitizer, and content parser that you have been seeking for decades.

Home Page: https://pypi.org/project/regxon/

RegXon

Installation

pip install regxon

Usage

from regxon.common import Regxon
regxon = Regxon()

General Validation

General validation covers email addresses, domains, URLs, and IP addresses (IPv4 and IPv6).

Validate Email

from regxon.common import Regxon

regxon = Regxon()
regxon.is_email('xyz@.com')  # None
regxon.is_email('xyz@cpx.com')  # returns a proper Match object; you can grab the match with `.string`
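
Because the validators return either a Match object or None, the result can be used directly in a condition. The sketch below shows that pattern with Python's standard `re` module only. `EMAIL_PATTERN` and `check_email` are hypothetical stand-ins for illustration; they are not RegXon's actual pattern or API.

```python
import re

# Hypothetical, deliberately simplified pattern (an assumption, not RegXon's regex).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def check_email(value):
    """Return a re.Match when the whole string looks like an email, else None."""
    return EMAIL_PATTERN.fullmatch(value)


for candidate in ("xyz@.com", "xyz@cpx.com"):
    match = check_email(candidate)
    if match is None:
        print(f"{candidate!r}: rejected")
    else:
        # .string is the original input; .group() is the text the pattern matched.
        print(f"{candidate!r}: accepted, matched {match.group()!r}")
```

On a standard `re.Match`, `.string` is the string that was passed in, while `.group()` is the matched text. The two are identical when the pattern must match the whole input, as in this sketch.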

Validate Domain

from regxon.common import Regxon

regxon = Regxon()
regxon.is_domain('xyzcom')  # None
regxon.is_domain('xyz.com')  # returns a proper Match object; you can grab the match with `.string`

Validate URL

from regxon.common import Regxon

regxon = Regxon()
regxon.is_url('xyz.com')  # None
regxon.is_url('https://xyz.com')  # returns a proper Match object

Validate HTTP URL

from regxon.common import Regxon

regxon = Regxon()
regxon.is_http_url('xyz.com')  # None; the URL has no http(s) scheme
regxon.is_http_url('ftp://xyz.com')  # None; ftp is not an http(s) scheme
regxon.is_http_url('http://django.c')  # None; `.c` is not a valid top-level domain
regxon.is_http_url('https://xyz.com')  # returns a proper Match object; you can grab the match with `.string`
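
For readers curious how such a check can work, here is a minimal sketch using only the standard library. It is not RegXon's implementation: `looks_like_http_url` and `HOST_PATTERN` are hypothetical names, the host rule is an assumption, and the helper returns a bool rather than a Match object.

```python
import re
from urllib.parse import urlsplit

# Assumption for illustration: dot-separated labels ending in a top-level
# domain of at least two letters. RegXon's real rule may differ.
HOST_PATTERN = re.compile(r"(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")


def looks_like_http_url(value):
    """Hypothetical helper: True only for http(s) URLs with a plausible host."""
    parts = urlsplit(value)
    if parts.scheme not in ("http", "https"):
        return False  # covers 'xyz.com' (no scheme) and 'ftp://xyz.com'
    return HOST_PATTERN.fullmatch(parts.hostname or "") is not None


for url in ("xyz.com", "ftp://xyz.com", "http://django.c", "https://xyz.com"):
    print(url, looks_like_http_url(url))
```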

Validate IP

from regxon.common import Regxon

regxon = Regxon()

# Calls 1 and 2 are equivalent and both return a proper Match, because the default schema is ""
regxon.is_ipv4('127.0.0.1')                 # 1
regxon.is_ipv4('127.0.0.1', schema='')      # 2; matches because 127.0.0.1 has no schema

regxon.is_ipv4('http://127.0.0.1')  # returns None as schema is not matched; "http" != ""
regxon.is_ipv4('http://127.0.0.1', schema='')  # returns None as schema is not matched; "http" != ""

regxon.is_ipv4('http://127.0.0.1', schema='http://')  # returns a proper Match
regxon.is_ipv4('https://127.0.0.1', schema='http://')  # returns None as schema is not matched; "https" != "http"

regxon.is_ipv6('2001:db8:3333:4444:5555:6666:7777:8888')  # validates the IPv6 address
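
The schema behaviour above can be pictured as "strip the expected prefix, then validate what is left". The following sketch reproduces that logic with the standard `ipaddress` module. `is_ipv4_with_schema` is a hypothetical helper written for illustration; it returns a bool and is not how RegXon is known to be implemented.

```python
import ipaddress


def is_ipv4_with_schema(value, schema=""):
    """Hypothetical helper: the value must start with `schema`, and the
    remainder must be a valid dotted-quad IPv4 address."""
    if not value.startswith(schema):
        return False
    host = value[len(schema):]
    try:
        ipaddress.IPv4Address(host)
    except ipaddress.AddressValueError:
        return False
    return True


print(is_ipv4_with_schema("127.0.0.1"))                             # no schema expected, none given
print(is_ipv4_with_schema("http://127.0.0.1"))                      # unexpected "http://" prefix
print(is_ipv4_with_schema("http://127.0.0.1", schema="http://"))    # prefix matches
print(is_ipv4_with_schema("https://127.0.0.1", schema="http://"))   # prefix differs
```

With an empty schema, every string passes the prefix test, so `http://127.0.0.1` is rejected at the address-parsing step instead.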

Validate Phone Number

from regxon.common import Regxon

regxon = Regxon()
regxon.is_phone('+91 1234567890')  # returns a proper Match object; you can grab the match with `.string`

HTML Sanitization and Validation

RegXon provides a powerful HTML sanitizer and validator that you have been seeking for decades. It combines html5lib and beautifulsoup4.

Your "how to remove an attribute from an HTML tag" problem is now solved, and so is "how to remove a tag from HTML".

from regxon.html import RegxonHTML

regxon_html = RegxonHTML()
html_content = """
<img onload="alert(1)" onerror="hey" src="http://example.com" />
<script>alert(1)</script>
"""
html = regxon_html.get_sanitized_content(html_content)

print(html)

The above code will print the following output:

<img onerror="hey"/>
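
To illustrate the general idea of dropping blocked tags and per-tag attributes, here is a small sketch built on the standard library's `html.parser`. It is not RegXon's html5lib/beautifulsoup4 pipeline. `SketchSanitizer`, `sanitize`, `BLOCKED_TAGS` and `BLOCKED_ATTRIBUTES` are hypothetical names, and the blocked sets are assumptions chosen only so that the sketch reproduces the output shown above.

```python
from html import escape
from html.parser import HTMLParser

# Assumptions for illustration only; these are not RegXon's defaults.
BLOCKED_TAGS = {"script"}
BLOCKED_ATTRIBUTES = {"img": {"onload", "src"}}


class SketchSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.blocked_depth = 0  # greater than 0 while inside a blocked tag

    def _render(self, tag, attrs, closer):
        # Keep only the attributes that are not blocked for this tag.
        blocked = BLOCKED_ATTRIBUTES.get(tag, set())
        kept = []
        for name, value in attrs:
            if name not in blocked:
                safe_value = escape(value or "")
                kept.append(f' {name}="{safe_value}"')
        return f"<{tag}{''.join(kept)}{closer}"

    def handle_starttag(self, tag, attrs):
        if tag in BLOCKED_TAGS:
            self.blocked_depth += 1
        elif self.blocked_depth == 0:
            self.parts.append(self._render(tag, attrs, ">"))

    def handle_startendtag(self, tag, attrs):
        if tag not in BLOCKED_TAGS and self.blocked_depth == 0:
            self.parts.append(self._render(tag, attrs, "/>"))

    def handle_endtag(self, tag):
        if tag in BLOCKED_TAGS:
            self.blocked_depth = max(0, self.blocked_depth - 1)
        elif self.blocked_depth == 0:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text inside a blocked tag (for example script bodies) is discarded.
        if self.blocked_depth == 0:
            self.parts.append(escape(data))


def sanitize(html_content):
    parser = SketchSanitizer()
    parser.feed(html_content)
    parser.close()
    return "".join(parser.parts).strip()


print(sanitize('<img onload="alert(1)" onerror="hey" src="http://example.com" />\n<script>alert(1)</script>\n'))
```

A hand-rolled filter like this is only a teaching aid. For real input, a parser-backed sanitizer such as the one RegXon wraps is the safer choice.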

Add custom excluded attributes for any tag you want

from regxon.html import RegxonHTML

regxon_html = RegxonHTML()
html_content = """
<img onload="alert(1)" onerror="hey" src="http://example.com" />
<script>alert(1)</script>
"""

# Extend the excluded attributes for the img tag
regxon_html.excluded_attributes.update({
    'img': regxon_html.excluded_attributes['img'] + ['onerror'],
})
html = regxon_html.get_sanitized_content(html_content)

print(html)

The above code will print the following output:

<img/>

Purpose of RegXon

  • Sanitize HTML; remove unwanted tags and attributes; XSS prevention
  • Validate IP, URL, Domain; SSRF prevention
  • Validate Email; Email spoofing prevention
  • Validate Phone Number; Phone number spoofing prevention

License

MIT

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Authors

Acknowledgements
