vtempest / bypasscors

Bypass CORS (Cross-Origin Resource Sharing) get HTML from external domains and make your own API

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

NPM

Bypass CORS restrictions on external domains from Node.js server, scraping any webpage data's as a HTML DOM to make your own APIs. Relative paths for resources will still load using target page's domain.

On the server, setup a route to which the client can pass the URL of the page to retrive:

app.get('/geturl', function(req,res){
    require('bypasscors')(req.query.url, function(html){
	    return res.send(html);
    });
});

On the frontend, you can use jQuery to parse the HTML as DOM :

$.get('/geturl', {url: "http://google.com"}, function(html){
	$(html).find("div")
})

Example: Live demo: http://hkrnews.com/

Local demo:

npm i bypasscors express
node node_modules/bypasscors/example

Virtual DOM and JS

This approach only returns the html and text returned at that URL, not the HTML DOM and text inserted after page load by AJAX requests or by single-page interface frameworks like React.js. To overcome this you can create a virtual DOM and JS execution environment by creating an invisible iframe then loading into its source the URL to your local-host-proxied scraper end point, then you can access the iframe DOMs contents (chrome treats both the iframe and your domain as same origin). If you need a JS DOM execution environment on the server-side you can use Ghost Driver which implements Selenium WebDriver methods executed in the environment of the PhantomJS Webkit engine.

<iframe id="dom-iframe" style="width:0;height:0;border:0; border:none;"></iframe>

document.getElementById('dom-iframe').src = '/get?url=' + url;

document.getElementById('dom-iframe').contentWindow.document.body.innerHTML;

About

Bypass CORS (Cross-Origin Resource Sharing) get HTML from external domains and make your own API


Languages

Language:JavaScript 100.0%