A puppeteer-like Node.js library for interacting with headless browsers in production scenarios.
Why
Although puppeteer may seem sufficient on its own, there is a set of use cases that make sense to build on top of it and that are necessary to support robust production scenarios, such as:
- Sensible defaults, aborting unnecessary requests based on what you are doing (e.g., aborting image requests if you just want to get .html content).
- Privacy by default, blocking tracker requests.
- Easily create a pool of instances (via @browserless/pool).
- Built-in AdBlocker (soon).
Install
browserless is built on top of puppeteer, so you need to install it as well.
$ npm install puppeteer browserless --save
You can use browserless together with puppeteer, puppeteer-core or puppeteer-firefox.
Internally, the library is divided into different packages based on functionality.
Usage
The browserless API is like puppeteer's, but it does more things under the hood (not too much, I promise).
For example, if you want to take a screenshot, just do:
const browserless = require('browserless')()

browserless
  .screenshot('http://example.com', { device: 'iPhone 6' })
  .then(buffer => {
    console.log(`your screenshot is here!`)
  })
You can see more common recipes at @browserless/examples.
API
All methods follow the same interface:
- url (required): The target URL.
- options (optional): Specific settings for the method.
Each method returns a Promise, or invokes a Node.js-style callback if you pass a function as the last parameter.
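The dual Promise/callback interface can be pictured with a small sketch. The helper name promiseOrCallback and the stub html method below are hypothetical; they only illustrate the pattern and are not taken from the browserless source.

```javascript
// Hypothetical helper: wraps an async function so it can be consumed either
// as a Promise or with a Node.js-style callback passed as the last argument.
const promiseOrCallback = fn => (...args) => {
  const last = args[args.length - 1]
  if (typeof last !== 'function') return fn(...args)
  const cb = args.pop()
  fn(...args).then(result => cb(null, result), error => cb(error))
}

// Stub standing in for a real method such as `.html(url, options)`.
const html = promiseOrCallback(async (url, options = {}) => `<html><!-- ${url} --></html>`)

// Promise style
html('https://example.com').then(content => console.log(content))

// Callback style
html('https://example.com', (error, content) => {
  if (error) throw error
  console.log(content)
})
```

With this shape, existing callback-based code and async/await code can share the same method without a separate API surface.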
.constructor(options)
It creates the browser instance, using the puppeteer.launch method.
// Creating a simple instance
const browserless = require('browserless')()
or passing specific launcher options:
// Creating an instance for running it at AWS Lambda
const browserless = require('browserless')({
  ignoreHTTPSErrors: true,
  args: [
    '--disable-gpu',
    '--single-process',
    '--no-zygote',
    '--no-sandbox',
    '--hide-scrollbars'
  ]
})
options
By default the library passes a well-known list of flags, so you probably don't need any additional setup.
timeout
type: number
default: 30000
This setting will change the default maximum navigation time.
puppeteer
type: Puppeteer
default: puppeteer | puppeteer-core | puppeteer-firefox
It's automatically detected based on your dependencies; puppeteer, puppeteer-core and puppeteer-firefox are supported.
Alternatively, you can pass it explicitly.
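Dependency-based detection of this kind is often done by probing each candidate package in turn. The following is a minimal sketch under that assumption; detectPuppeteer and its injectable resolve parameter are hypothetical names, and the actual detection logic in browserless may differ.

```javascript
// Candidate package names, in the order named in the text above.
const CANDIDATES = ['puppeteer', 'puppeteer-core', 'puppeteer-firefox']

// `resolve` is injectable so the lookup can be exercised without
// having any of the packages installed.
const detectPuppeteer = (candidates = CANDIDATES, resolve = require.resolve) => {
  for (const name of candidates) {
    try {
      resolve(name)
      return name
    } catch (error) {
      // Not installed; try the next candidate.
    }
  }
  return undefined
}

const detected = detectPuppeteer()
console.log(detected ? `using ${detected}` : 'no puppeteer flavor installed')
```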
incognito
type: boolean
default: false
Every time a new page is created, it will be an incognito page.
An incognito page will not share cookies/cache with other browser pages.
.html(url, options)
It returns the full HTML content from the target url.
const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const html = await browserless.html(url)
  console.log(html)
})()
options
See page.goto.
Additionally, you can set up:
waitFor
type: string | function | number
default: 0
Wait for a quantity of time, a selector or a function, using page.waitFor.
waitUntil
type: array
default: ['networkidle0']
Specify the list of events to wait for before considering the navigation successful, using page.waitForNavigation.
userAgent
It sets a custom user agent, using the page.setUserAgent method.
viewport
It sets a custom viewport, using the page.setViewport method.
abortTypes
type: array
default: ['image', 'media', 'stylesheet', 'font', 'xhr']
A list of resourceType requests that can be aborted in order to make the process faster.
abortTrackers
type: boolean
default: true
It aborts requests coming from tracking domains.
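To make the abortTypes and abortTrackers settings more concrete, here is a minimal sketch of a request handler that could be attached with puppeteer's page.on('request', handler) after enabling page.setRequestInterception(true). The tracker list, isTracker and createRequestHandler are hypothetical; the list browserless actually uses is not described here. Fake request objects are used so the sketch runs without a browser.

```javascript
// Hypothetical tracker list, for illustration only.
const TRACKER_DOMAINS = ['tracker.example', 'analytics.example']

// True when the URL's hostname is a listed domain or one of its subdomains.
const isTracker = url => {
  const { hostname } = new URL(url)
  return TRACKER_DOMAINS.some(domain => hostname === domain || hostname.endsWith(`.${domain}`))
}

// Returns a handler for puppeteer's `page.on('request', handler)`;
// request interception must be enabled with `page.setRequestInterception(true)`.
const createRequestHandler = ({ abortTypes = [], abortTrackers = true } = {}) => request => {
  if (abortTypes.includes(request.resourceType())) return request.abort()
  if (abortTrackers && isTracker(request.url())) return request.abort()
  return request.continue()
}

// Fake request exposing only the methods used by the handler.
const fakeRequest = (type, url) => ({
  resourceType: () => type,
  url: () => url,
  abort: () => 'aborted',
  continue: () => 'continued'
})

const handler = createRequestHandler({
  abortTypes: ['image', 'media', 'stylesheet', 'font', 'xhr']
})
console.log(handler(fakeRequest('image', 'https://example.com/logo.png')))
console.log(handler(fakeRequest('script', 'https://cdn.tracker.example/t.js')))
console.log(handler(fakeRequest('document', 'https://example.com/')))
```

Checking the resource type first keeps the cheap test ahead of the URL parsing needed for the tracker lookup.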
.text(url, options)
It returns the full text content from the target url.
const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const text = await browserless.text(url)
  console.log(text)
})()
options
They are the same as for the .html method.
.pdf(url, options)
It generates the PDF version of the website behind a url.
const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const buffer = await browserless.pdf(url)
  console.log(`PDF generated!`)
})()
options
See page.pdf.
Additionally, you can set up:
media
Changes the CSS media type of the page using page.emulateMedia.
device
It generates the PDF using the device descriptor name settings, like userAgent and viewport.
userAgent
It sets a custom user agent, using the page.setUserAgent method.
viewport
It sets a custom viewport, using the page.setViewport method.
.screenshot(url, options)
It takes a screenshot of the target url.
const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const buffer = await browserless.screenshot(url)
  console.log(`Screenshot taken!`)
})()
options
See page.screenshot.
Additionally, you can set up:
device
It takes the screenshot using the device descriptor name settings, like userAgent and viewport.
userAgent
It sets a custom user agent, using the page.setUserAgent method.
viewport
It sets a custom viewport, using the page.setViewport method.
.devices
List of all available devices preconfigured with deviceName, viewport and userAgent settings.
These devices are used for emulation purposes.
.getDevice(deviceName)
Get the settings of a specific device descriptor by its name.
const browserless = require('browserless')()

browserless.getDevice('Macbook Pro 15')
// {
// name: 'Macbook Pro 15',
// userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X …',
// viewport: {
// width: 1440,
// height: 900,
// deviceScaleFactor: 1,
// isMobile: false,
// hasTouch: false,
// isLandscape: false
// }
// }
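As a rough illustration of how a device name could be turned into userAgent and viewport settings, here is a sketch over a hypothetical, shortened device list. The user agent strings are placeholders, resolveEmulation is an invented name, and the precedence rule (explicit options win over the device's) is an assumption rather than documented behavior.

```javascript
// Hypothetical, shortened device list with placeholder user agents.
const devices = [
  {
    name: 'Macbook Pro 15',
    userAgent: 'example-desktop-agent',
    viewport: { width: 1440, height: 900, deviceScaleFactor: 1, isMobile: false, hasTouch: false, isLandscape: false }
  },
  {
    name: 'Example Phone',
    userAgent: 'example-mobile-agent',
    viewport: { width: 375, height: 667, deviceScaleFactor: 2, isMobile: true, hasTouch: true, isLandscape: false }
  }
]

// Look up a descriptor by name; returns undefined when the name is unknown.
const getDevice = deviceName => devices.find(({ name }) => name === deviceName)

// Assumed precedence: explicit `userAgent` / `viewport` win over the device's.
const resolveEmulation = ({ device, userAgent, viewport } = {}) => {
  const descriptor = getDevice(device) || {}
  return {
    userAgent: userAgent || descriptor.userAgent,
    viewport: viewport || descriptor.viewport
  }
}

console.log(resolveEmulation({ device: 'Example Phone' }))
console.log(resolveEmulation({ device: 'Macbook Pro 15', userAgent: 'custom-agent' }))
```

The resolved values are what would then be handed to page.setUserAgent and page.setViewport.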
Advanced
The following methods are exposed for scenarios where you need more granular control and less magic.
.browser
It returns the internal browser instance used as singleton.
const browserless = require('browserless')()

;(async () => {
  const browserInstance = await browserless.browser
})()
.evaluate(page, response)
It exposes an interface for creating your own evaluate function, passing you the page and response.
const browserless = require('browserless')()

const getUrlInfo = browserless.evaluate((page, response) => ({
  statusCode: response.status(),
  url: response.url(),
  redirectUrls: response.request().redirectChain()
}))

;(async () => {
  const url = 'https://example.com'
  const info = await getUrlInfo(url)
  console.log(info)
  // {
  //   "statusCode": 200,
  //   "url": "https://example.com/",
  //   "redirectUrls": []
  // }
})()
Note that you don't need to close the page; it will be done under the hood.
Internally the method performs a .goto.
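The open, navigate, evaluate, close flow described above can be sketched as follows. createEvaluate and the newPage and goto stubs are hypothetical stand-ins so the flow runs without a browser; this is not the library's actual implementation.

```javascript
// Hypothetical factory: `newPage` and `goto` stand in for the real
// `.page()` and `.goto()` methods.
const createEvaluate = ({ newPage, goto }) => fn => async (url, options = {}) => {
  const page = await newPage()
  try {
    const response = await goto(page, { url, ...options })
    return await fn(page, response)
  } finally {
    // The page is closed whether `fn` resolves or throws.
    await page.close()
  }
}

// Stubs that mimic only the methods used below.
const closedPages = []
const newPage = async () => {
  const page = { close: async () => closedPages.push(page) }
  return page
}
const goto = async (page, { url }) => ({ status: () => 200, url: () => url })

const evaluate = createEvaluate({ newPage, goto })
const getStatus = evaluate((page, response) => ({
  statusCode: response.status(),
  url: response.url()
}))

const pending = getStatus('https://example.com/')
pending.then(info => console.log(info, `pages closed: ${closedPages.length}`))
```

Putting the close call in a finally block is what lets the caller skip page cleanup even when the evaluate function throws.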
.goto(page, options)
It performs a smart page.goto, blocking ad and tracker requests as well as other requests based on resourceType.
const browserless = require('browserless')()

;(async () => {
  const page = await browserless.page()
  await browserless.goto(page, {
    url: 'http://savevideo.me',
    abortTypes: ['image', 'media', 'stylesheet', 'font']
  })
})()
options
url
type: string
The target URL
abortTypes
type: array
default: []
A list of req.resourceType() values to be blocked.
abortTrackers
type: boolean
default: true
It aborts requests coming from tracking domains.
waitFor
type: string | function | number
default: 0
Wait for a quantity of time, a selector or a function, using page.waitFor.
waitUntil
type: array
default: ['networkidle2', 'load', 'domcontentloaded']
Specify the list of events to wait for before considering the navigation successful, using page.waitForNavigation.
userAgent
It sets a custom user agent, using the page.setUserAgent method.
viewport
It sets a custom viewport, using the page.setViewport method.
args
type: object
The settings to be passed to page.goto.
.page()
It returns a new standalone browser page.
const browserless = require('browserless')()

;(async () => {
  const page = await browserless.page()
})()
Pool of Instances
browserless internally uses a singleton browser instance.
You can use a pool of instances with the @browserless/pool package.
const createBrowserless = require('@browserless/pool')

const browserlessPool = createBrowserless({
  poolOpts: {
    max: 15,
    min: 2
  }
})
The API is the same as browserless. The only difference is that the constructor accepts an extra option called poolOpts.
This setting is used for initializing the pool properly. You can see what you can specify there at node-pool#opts.
Also, you can interact with a standalone browserless instance of your pool.
const createBrowserless = require('browserless')
const browserlessPool = createBrowserless.pool()
const url = 'https://example.com'

// get a browserless instance from the pool
browserlessPool(async browserless => {
  // get a page from the browser instance
  const page = await browserless.page()
  await browserless.goto(page, { url })
  const html = await page.content()
  console.log(html)
  process.exit()
})
You don't need to think about the acquire/release step: it's done automagically ✨.
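To show what the hidden acquire/release step amounts to, here is a minimal, self-contained pool sketch. createPool and its max option are hypothetical and far simpler than node-pool; plain objects stand in for browserless instances.

```javascript
// Hypothetical minimal pool: creates up to `max` resources and queues
// callers when all of them are busy.
const createPool = (create, { max = 2 } = {}) => {
  const idle = []
  const waiting = []
  let size = 0

  const acquire = async () => {
    if (idle.length > 0) return idle.pop()
    if (size < max) {
      size += 1
      return create()
    }
    // All resources are busy: wait until one is released.
    return new Promise(resolve => waiting.push(resolve))
  }

  const release = resource => {
    const next = waiting.shift()
    if (next) next(resource)
    else idle.push(resource)
  }

  // The caller only provides the work; acquire/release wrap it.
  return async fn => {
    const resource = await acquire()
    try {
      return await fn(resource)
    } finally {
      release(resource)
    }
  }
}

let created = 0
const pool = createPool(() => ({ id: ++created }), { max: 2 })

// Three jobs share two instances; the third waits for a release.
const results = Promise.all(
  [1, 2, 3].map(() =>
    pool(async instance => {
      await new Promise(resolve => setTimeout(resolve, 10))
      return instance.id
    })
  )
)
results.then(ids => console.log(ids, `instances created: ${created}`))
```

Releasing inside finally means an instance goes back to the pool even when the callback throws, which is why callers never handle it themselves.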
Packages
browserless is internally divided into multiple packages, so that you use just the minimum quantity of code necessary for your use case.
- browserless
- @browserless/pool
- @browserless/devices
- @browserless/goto
- @browserless/benchmark
- @browserless/examples
Benchmark
For testing different approaches, we included a tiny benchmark tool called @browserless/benchmark.
FAQ
Q: Why use browserless over Puppeteer?
browserless does not replace puppeteer; it complements it. It's just a syntactic sugar layer over official Headless Chrome, oriented toward production scenarios.
Q: Why do you block ads scripts by default?
Headless navigation is expensive compared with just fetching the content from a website.
To speed up the process, we block ad scripts by default because they are so bloated.
Q: My output is different from the expected
Probably browserless was too smart and blocked a request that you need.
You can activate debug mode using the DEBUG=browserless environment variable in order to see what is happening behind the scenes:
DEBUG=browserless node index.js
Consider opening an issue with the debug trace.
Q: Can I use browserless with my AWS Lambda like project?
Yes, check chrome-aws-lambda to set up AWS Lambda with a compatible binary.
License
browserless © Kiko Beats, Released under the MIT License.
Authored and maintained by Kiko Beats with help from contributors.
logo designed by xinh studio.
kikobeats.com · GitHub Kiko Beats · Twitter @kikobeats