jorovalkov / url-regex

A Regular Expression for catching URLs and extracting fragments out of them

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

URL-RegEx

v.1.3

A Regular Expression for catching URLs and extracting fragments out of them.

RegEx = /\(?(?:(http|https|ftp):\/\/)?(?:((?:[^\W\s]|\.|-|[:]{1})+)@{1})?((?:www.)?(?:[^\W\s]|\.(?!\.)|-)+[\.](?!\.)[^\W\s]{2,4}|localhost(?=\/)|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})(?::(\d*))?([\/]?[^\s\?]*[\/]{1})*(?:\/?([^\s\n\?\[\]\{\}\#]*(?:(?=\.)){1}|[^\s\n\?\[\]\{\}\.\#]*)?([\.]{1}[^\s\?\#]*)?)?(?:\?{1}([^\s\n\#\[\]]*))?([\#][^\s\n]*)?\)?/;

Match this RegEx against a chunk of text, and catch any URL inside it. It should work on any programming language that supports Regular Expressions.

You can check JavaScript example of its use.

URL fragments

It captures 8 groups (plus the whole match that contains entire URL). If some of them don’t exist in URL, that group will return empty.

  • $& - Entire URL - url being parsed
  • $1 - Protocol - http, https, ftp
  • $2 - Userinfo - username:password
  • $3 - Domain - www.mydomain.com, mydomain.com, 127.0.0.1, localhost...
  • $4 - Port - 80
  • $5 - Path / Folders - /folder/dir/
  • $6 - Page / Filename - eg. index
  • $7 - File extension - .html, .php...
  • $8 - Query - item=value&item2=value2
  • $9 - Anchor - #home

About

A Regular Expression for catching URLs and extracting fragments out of them