tonibergholm / crudivore

Small service for rendering JS heavy sites for search engines

Crudivore

Crudivore is a small service for rendering JS heavy sites for search engines.

Heavily influenced by Prerender, Crudivore uses PhantomJS to render JavaScript applications into plain HTML for search engines, Facebook previews, and similar consumers.

Usage

Install dependencies:

npm install

Start the server:

./index.js

REST API

/render/<URL> Render the requested page and return the resulting HTML

Example:
curl http://127.0.0.1:5000/render/https://www.google.com

/info/ Get a simple JSON object to monitor the status of the Crudivore service

Example:
curl http://127.0.0.1:5000/info/
                                                                                                                    
      +-------+                         +------------------+                    +---------+           +---------+
      |Crawler|                         |Application server|                    |Crudivore|           |PhantomJS|
      +---+---+                         +--------+---------+                    +---+-----+           +---------+
          |                                      |                                  |                      |
          | GET                                  |                                  |                      |
          | app.com/?_escaped_fragment_=!/stuff  |                                  |                      |
          +------------------------------------> |                                  |                      |
          |                                      |                                  |                      |
          |                                      | GET                              |                      |
          |                                      | /render/http://app.com/#!/stuff  |                      |
          |                                      +--------------------------------> |                      |
          |                                      |                                  |                      |
          |                                      |                                  | GET                  |
          |                                      |                                  | app.com/#!/stuff     |
          |                                      |                                  +--------------------> |
          |                                      |                                  |                      |
          |                                      |                                  |              SUCCESS |
          |                                      |                                  | <--------------------+
          |                                      |                                  |                      |
          |                                      |                                  | Is the page ready?   |
          |                                      |                                  +--------------------> |
          |                                      |                                  |                      |
          |                                      |                                  |                Ready |
          |                                      |                                  | <--------------------+
          |                                      |                                  |                      |
          |                                      |                                  | Page content, please |
          |                                      |                                  +--------------------> |
          |                                      |                                  |                      |
          |                                      |                                  |            Page HTML |
          |                                      |                                  | <--------------------+
          |                                      |                                  |
          |                                      |                        Page HTML |
          |                                      | <--------------------------------+
          |                                      |
          |                            Page HTML |
          |  <-----------------------------------+
          +
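
The flow in the diagram starts with the application server recognising a crawler request and forwarding it to Crudivore. A minimal Node.js sketch of that mapping follows. The function name toRenderUrl and the base address are assumptions for illustration, and the fragment is appended after # exactly as the diagram draws it. Most HTTP clients drop everything after # when sending a request, so the fragment will probably need percent-encoding on the wire; how Crudivore expects it to be escaped is not stated here.

```javascript
// Hypothetical helper: map a crawler request URL to a Crudivore render URL,
// following the diagram above. Returns null for ordinary (non-crawler) requests.
function toRenderUrl(requestUrl, crudivoreBase) {
    const url = new URL(requestUrl);
    const fragment = url.searchParams.get('_escaped_fragment_');
    if (fragment === null) {
        return null; // no _escaped_fragment_ parameter: serve the app as usual
    }

    // Keep any other query parameters, drop the crawler-specific one.
    url.searchParams.delete('_escaped_fragment_');
    const query = url.searchParams.toString();

    const target = url.origin + url.pathname +
        (query ? '?' + query : '') + '#' + fragment;
    // Strip trailing slashes from the base so the path is not doubled.
    return crudivoreBase.replace(/\/+$/, '') + '/render/' + target;
}

console.log(toRenderUrl('http://app.com/?_escaped_fragment_=!/stuff', 'http://127.0.0.1:5000'));
console.log(toRenderUrl('http://app.com/about', 'http://127.0.0.1:5000'));
```

A request without the _escaped_fragment_ parameter yields null, which lets the application server fall through to its normal handling.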

Communicating from the page being rendered to Crudivore

Crudivore works on existing pages without any modifications. A page that needs to pass information to Crudivore can do so through the global variable window.crudivore:

window.crudivore = {
    pageReady: <boolean>,
    status: <int>,
    headers: {
        "<headername>": "<headercontent>"
    }
}

window.crudivore.pageReady tells Crudivore that the page has completed loading and rendering. This is not mandatory: if this parameter is never set, Crudivore waits until the timeout.
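
The waiting behaviour described above can be pictured as a poll-until-ready loop bounded by a timeout. The sketch below is a generic version of that idea, not Crudivore's actual implementation; waitUntilReady and its isReady callback are hypothetical names, and in the real service the check would run against the page loaded in PhantomJS.

```javascript
// Generic poll-until-ready loop (illustrative, not Crudivore's own code).
// Resolves true as soon as isReady() returns true, or false on timeout.
function waitUntilReady(isReady, pollIntervalMs, timeoutMs) {
    return new Promise(function (resolve) {
        const deadline = Date.now() + timeoutMs;
        function check() {
            if (isReady()) {
                resolve(true);
            } else if (Date.now() >= deadline) {
                resolve(false); // give up: the page never signalled readiness
            } else {
                setTimeout(check, pollIntervalMs);
            }
        }
        check();
    });
}

// Usage: a stand-in for a page that becomes ready after roughly 120 ms.
const startedAt = Date.now();
waitUntilReady(() => Date.now() - startedAt >= 120, 50, 10000)
    .then((ready) => console.log('ready:', ready));
```

A page that sets pageReady early lets the loop finish on the next poll, whereas a page that never sets it keeps the request open for the full timeout.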

window.crudivore.status sets the HTTP status code of the Crudivore response. The default response code is 200. For example, many single page apps contain a catch-all route that displays a soft 404. Adding window.crudivore.status = 404 to that route causes Crudivore to turn it into a hard 404, which is better for search engines.
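
As a sketch of the soft 404 case, a catch-all route could call a small helper like the one below. The name reportNotFound is hypothetical, and the win parameter stands in for the browser's window so the snippet can also run outside a browser.

```javascript
// Hypothetical page-side helper for a catch-all route: ask Crudivore to
// answer with a hard 404 and mark the page as finished rendering.
// Anything already stored on window.crudivore is preserved.
function reportNotFound(win) {
    win.crudivore = Object.assign({}, win.crudivore, {
        pageReady: true,
        status: 404
    });
    return win.crudivore;
}

// In a browser this would be called as reportNotFound(window).
const fakeWindow = {};
console.log(reportNotFound(fakeWindow));
```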

window.crudivore.headers allows setting custom headers from frontend code. An example would be a redirect:

window.crudivore = {
    pageReady: true,
    status: 302,
    headers: {
        "Location": "http://mysite.com/newurl"
    }
}
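
To make the defaults concrete, the sketch below shows one way the values of window.crudivore could be turned into an HTTP response. It is an illustration only, not Crudivore's actual code; the function name toResponse and the shape of the returned object are assumptions.

```javascript
// Illustrative only: derive response fields from the window.crudivore object
// read out of the rendered page. `state` may be undefined for unmodified pages.
function toResponse(state, html) {
    const crudivore = state || {};
    return {
        // The default response code is 200 when the page sets no status.
        status: Number.isInteger(crudivore.status) ? crudivore.status : 200,
        // Copy custom headers so the caller cannot mutate the page's object.
        headers: Object.assign({}, crudivore.headers),
        body: html
    };
}

console.log(toResponse(undefined, '<html></html>'));
console.log(toResponse({
    pageReady: true,
    status: 302,
    headers: { 'Location': 'http://mysite.com/newurl' }
}, ''));
```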

Configuration

Crudivore can be configured with environment variables:

Example:

CRUDIVORE_TIMEOUT=5000 ./index.js

CRUDIVORE_TIMEOUT defines the timeout for each request. Defaults to 10 s; as the example above suggests, the value is given in milliseconds.

CRUDIVORE_POLL_INTERVAL defines the interval, in milliseconds, at which PhantomJS checks whether the page is fully rendered. Default value 50.

CRUDIVORE_PHANTOM_PORT_START defines the first port of the port range used by PhantomJS instances. Default value 10000.

CRUDIVORE_PHANTOM_PORT_END defines the last port of the port range used by PhantomJS instances. Default value 10100.

CRUDIVORE_INITIAL_THREAD_COUNT defines the number of PhantomJS threads spawned and warmed up when the service is started. Default value 1.
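
The variables and defaults listed above can be summarised as a small loader. This is a sketch rather than Crudivore's own configuration code: the function name loadConfig and the property names are hypothetical, and the timeout default of 10000 assumes the value is expressed in milliseconds.

```javascript
// Illustrative sketch: read the documented environment variables,
// falling back to the documented defaults. Property names are hypothetical.
function loadConfig(env) {
    // Parse a base-10 integer, using the fallback when unset or not a number.
    function intFrom(name, fallback) {
        const parsed = parseInt(env[name], 10);
        return Number.isNaN(parsed) ? fallback : parsed;
    }
    return {
        timeout: intFrom('CRUDIVORE_TIMEOUT', 10000), // assumes milliseconds (10 s)
        pollInterval: intFrom('CRUDIVORE_POLL_INTERVAL', 50),
        phantomPortStart: intFrom('CRUDIVORE_PHANTOM_PORT_START', 10000),
        phantomPortEnd: intFrom('CRUDIVORE_PHANTOM_PORT_END', 10100),
        initialThreadCount: intFrom('CRUDIVORE_INITIAL_THREAD_COUNT', 1)
    };
}

// Usage: read the real environment, or pass a plain object for testing.
console.log(loadConfig(process.env));
console.log(loadConfig({ CRUDIVORE_TIMEOUT: '5000' }).timeout);
```

Passing the environment in as an argument, instead of reading process.env inside the function, keeps the loader easy to exercise with fixed values.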

Test

Run all tests:

./runtests.sh

License: MIT License
