DavidKorczynski / liburlparse

A C++ library that parses URLs out of text, based on linkedin/URL-Detector

liburlparse

Description

This library extracts URLs from string text. It is a C++ rewrite of linkedin/URL-Detector.

It can find and detect many kinds of URLs in text.

It can also identify the parts of each detected URL. For example, for the URL http://user@linkedin.com:39000/hello?boo=ff#frag it reports:

  • Scheme - "http"
  • Username - "user"
  • Password - null
  • Host - "linkedin.com"
  • Port - 39000
  • Path - "/hello"
  • Query - "?boo=ff"
  • Fragment - "#frag"
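To make the breakdown above concrete, here is a minimal, standalone sketch of splitting a URL into the same parts. It does not use liburlparse and is not how the library is implemented: it relies on the generic regular expression from RFC 3986, Appendix B, and the names `UrlParts` and `splitUrl` are hypothetical. A missing part is reported as an empty string rather than null.

```cpp
#include <regex>
#include <string>

// Hypothetical holder for the parts listed above; not a liburlparse type.
struct UrlParts {
    std::string scheme, username, password, host, port, path, query, fragment;
};

// Split a URL using the generic regex from RFC 3986, Appendix B,
// then break the authority into user info, host, and port.
UrlParts splitUrl(const std::string& url) {
    static const std::regex re(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    UrlParts p;
    std::smatch m;
    if (!std::regex_match(url, m, re)) {
        return p;  // all parts stay empty
    }
    p.scheme = m[2].str();
    std::string authority = m[4].str();
    p.path = m[5].str();
    p.query = m[6].str();     // keeps the leading '?'
    p.fragment = m[8].str();  // keeps the leading '#'

    // Everything before the last '@' is user info: "user" or "user:password".
    const std::size_t at = authority.rfind('@');
    if (at != std::string::npos) {
        const std::string userinfo = authority.substr(0, at);
        authority = authority.substr(at + 1);
        const std::size_t sep = userinfo.find(':');
        p.username = userinfo.substr(0, sep);
        if (sep != std::string::npos) {
            p.password = userinfo.substr(sep + 1);
        }
    }

    // A trailing ":port" is split off unless the colon sits inside
    // a bracketed IPv6 literal such as [::1].
    const std::size_t colon = authority.rfind(':');
    if (colon != std::string::npos &&
        authority.find(']', colon) == std::string::npos) {
        p.port = authority.substr(colon + 1);
        authority = authority.substr(0, colon);
    }
    p.host = authority;
    return p;
}
```

Calling `splitUrl("http://user@linkedin.com:39000/hello?boo=ff#frag")` fills `scheme` with `http`, `username` with `user`, `host` with `linkedin.com`, `port` with `39000`, `path` with `/hello`, `query` with `?boo=ff`, and `fragment` with `#frag`. Unlike a detector, this sketch assumes the input is already a single well-formed URL; it does not search surrounding text.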

Features

  • The library is implemented in C++11 only
  • The library supports several detection modes
    • Default: plain text, like this is a url test@qq.com
    • Html: like <html><body>xxx@xxx.com</body></html>
    • Json: like {abc:123,ccc:"xhou@urlparse.com"}
    • Xml: like <a><b>Test</b></a>
    • JavaScript: like <script>var location="www.baidu.com";</script>
  • The lib of UrlParse is

Author

Contact xhou

How to build

  • make && make main

Example

  • Parse URLs from text
UrlDetectorOptions_T T(Default);
std::string str = "https://user:name@www.baidu.com:80/part.html?query=c+#part1";
UrlDetector detect(str, T);
std::list<Url> urls = detect.detect();
for (Url url : urls)
{
    std::cout << "url:" << url.getOriginalUrl() << std::endl;    // full original URL; usually this value is all you need
    std::cout << "scheme:" << url.getScheme() << std::endl;      // scheme (protocol)
    std::cout << "username:" << url.getUsername() << std::endl;  // username
    std::cout << "password:" << url.getPassword() << std::endl;  // password
    std::cout << "host:" << url.getHost() << std::endl;          // host address
    std::cout << "port:" << url.getPort() << std::endl;          // port number
    std::cout << "path:" << url.getPath() << std::endl;          // path
    std::cout << "query:" << url.getQuery() << std::endl;        // query parameters
    std::cout << "fragment:" << url.getFragment() << std::endl;  // fragment
}

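Generating unit test coverage

    1. Run make with the flags -fprofile-arcs -ftest-coverage added to generate the .gcno files
    2. Run the resulting binary to generate the .gcda files
    3. lcov -d . -t 'unitmain' -o 'unitmain.info' -b . -c to generate the unitmain.info file
    4. genhtml -o result unitmain.info to write the HTML report into the result directory
    5. python3 -m http.server 8080 to serve the report over HTTP on port 8080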
