cedricblondeau / webpage2html-java

Generates a single HTML file for a given URL by transforming external assets (CSS, JS, images, fonts) into inline base64 strings. Java 1.7 and Android compatible.

webpage2html-java

Generates a single HTML file for a given URL by transforming external assets (CSS, JS, images, fonts) into inline content, base64-encoding them where necessary.
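To illustrate what "inline content encoded with base64" means, here is a minimal sketch of turning an asset's bytes into a `data:` URI. This is not the library's actual code: the class `DataUriSketch` and the method `toDataUri` are hypothetical names, and the sketch assumes Java 8+ for `java.util.Base64` (the library itself targets Java 1.7 and may use a different encoder).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, for illustration only (not part of webpage2html-java).
public class DataUriSketch {

    // Builds a data URI such as "data:image/png;base64,...." from raw bytes.
    // Assumes the caller already knows the asset's MIME type.
    static String toDataUri(byte[] content, String mimeType) {
        String encoded = Base64.getEncoder().encodeToString(content);
        return "data:" + mimeType + ";base64," + encoded;
    }

    public static void main(String[] args) {
        // A tiny stylesheet standing in for a downloaded external asset.
        byte[] css = "body { margin: 0; }".getBytes(StandardCharsets.UTF_8);
        // The resulting string could replace the href of a <link> or the src of an <img>.
        System.out.println(toDataUri(css, "text/css"));
    }
}
```

A reference such as `<img src="logo.png">` would then become `<img src="data:image/png;base64,...">`, which is what lets the output HTML stand alone without further network requests.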

Initially a Java port of zTrix/webpage2html.

Java 1.7 and Android compatible.

Known limitations

Dependencies

Usage

// Build a WebPage2Html object from a java.net.URL object
URL url = new URL("http://rtw.cedricblondeau.com"); // Input URL, throws MalformedURLException
WebPage2Html webPage2Html = new WebPage2Html(url);

// Optionally: Pass a custom configuration object
Configuration configuration = new Configuration();
configuration.setUserAgent("Android"); // Custom user-agent
webPage2Html.setConfiguration(configuration);

// execute() method returns a WebPage2HtmlResult object
WebPage2HtmlResult webPage2HtmlResult = webPage2Html.execute(); // throws IOException
webPage2HtmlResult.getUrl();    // Actual URL, could be different from input URL (e.g. redirection)
webPage2HtmlResult.getTitle();  // HTML document title
webPage2HtmlResult.getHtml();   // Transformed HTML content
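The string returned by `getHtml()` would typically be saved to disk. The following is a minimal sketch of that step, assuming the HTML is already held in a `String`; `HtmlFileWriter` and `save` are hypothetical names and are not part of the library. It relies only on `java.nio.file`, which is available from Java 1.7 (availability on Android depends on the API level).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, for illustration only (not part of webpage2html-java).
public class HtmlFileWriter {

    // Writes the given HTML to the target path as UTF-8,
    // creating the file or overwriting an existing one.
    static void save(String html, Path target) throws IOException {
        Files.write(target, html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // In real use this string would come from webPage2HtmlResult.getHtml().
        String html = "<html><head><title>Demo</title></head><body></body></html>";
        save(html, Paths.get("out.html"));
    }
}
```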

CLI usage using Gradle

./gradlew run -Dexec.args="http://rtw.cedricblondeau.com out.html"

License: MIT License


Languages

Language:Java 100.0%