PyroFilmsFX / cdp-modify-response-example

Examples to modify response headers in Playwright or Puppeteer using Chrome Devtools Protocol (CDP)

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Modifying Response Headers to Force File Downloads in Puppeteer & Playwright

What these examples do:

  1. 🔧 Creates a new Chrome-Devtools-Protocol (CDP) session in Puppeteer or Playwright.
  2. 🔨 Enable Fetch domain to let us substitute browser's network layer with our own code.
  3. 👀 Pause every request and check content-type header to match pdf and xml types.
  4. ⏩ If the content-type is not what we are looking for, resume the request without any change.
  5. 🎯 If the content-type is what we're looking for (pdf or xml), add a content-disposition: attachment response header to make the browser download the file instead of opening it in Chromium's built-in viewers.



A visual overview

cdp-modify-response-header

But why?

Response interception support in Puppeteer and Playwright is missing. There may be multiple scenarios where you need to modify either the response body or response headers for crawling or testing. As an example, you may want Chromium to download PDF and XML content-type responses instead of opening them in the built-in viewers in headful mode (in headless mode, the default behavior is to download the PDF file).

Default Behaviors

chromium-pdf-viewer
PDF response (content-type: application/pdf) is opened in Chromium



chromium-xml-viewer
XML response (content-type: text/xml) is also rendered in Chromium

What We Want

download
Make Chromium download the files. This can be done by adding a content-disposition: attachment header to the response.

Test Site

I've setup a test site with links to both a PDF file and an XML file.
https://pdf-xml-download-test.vercel.app/

pdf-xml-download-site

Usage

Install Dependencies

Using npm

$ npm install

Using yarn

$ yarn install

Run

Depending on whether you want to use Puppeteer or Playwright, run one of the following commands.

$ node puppeteer-example.js
$ node playwright-example.js

Notes

  • Codes for Puppeteer and Playwright are almost identical. They have subtle differences in creating a new CDP session, but all other code are pretty much the same.
  • Chromium is the only browser that will work with this example. Using Firefox or Webkit browsers will throw errors since they don't support CDP.
  • You can specify more specific patterns when enabling requestPaused events with Fetch.enable. For simplicity's sake, this example captures all requests at Response stage.
  • There may be cases where the response already has a content-disposition header. This example does not handle those cases. An easy way to handle those cases would be to simply replace the existing content-disposition: yariyada header with our new content-disposition: attachment header.

What does the intercepted object look like in Fetch.requestPaused?

In case you're confused what the passed object in requestPaused looks like, a log is attached below. The object contains both request and response information. The response body should be retrieved separately using Fetch.getResponseBody.

Code

 await client.on('Fetch.requestPaused', async (reqEvent) => { console.log(reqEvent); }

Console Output

{
  requestId: 'interception-job-17.0',
  request: {
    url: 'https://pdf-xml-download-test.vercel.app/api/file/pdf',
    method: 'GET',
    headers: {
      'sec-ch-ua': '"Chromium";v="85", "\\\\Not;A\\"Brand";v="99"',
      'sec-ch-ua-mobile': '?0',
      'Upgrade-Insecure-Requests': '1',
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4182.0 Safari/537.36',
      Accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',        
      Referer: 'https://pdf-xml-download-test.vercel.app/'
    },
    initialPriority: 'VeryHigh',
    referrerPolicy: 'strict-origin-when-cross-origin'
  },
  frameId: 'CC3923414718B2309C50E21BCFB3DDF0',
  resourceType: 'Document',
  responseStatusCode: 200,
  responseHeaders: [
    { name: 'status', value: '200' },
    { name: 'content-type', value: 'application/pdf' },
    { name: 'x-nextjs-page', value: '/api/file/pdf' },
    { name: 'date', value: 'Mon, 03 Aug 2020 11:47:24 GMT' },
    {
      name: 'cache-control',
      value: 'public, max-age=0, must-revalidate'
    },
    { name: 'content-length', value: '516719' },
    { name: 'x-vercel-cache', value: 'MISS' },
    { name: 'age', value: '0' },
    { name: 'server', value: 'Vercel' },
    {
      name: 'x-vercel-id',
      value: 'cle1::sfo1::flf5q-1596455244871-be3d3dcbd2ec'
    },
    {
      name: 'strict-transport-security',
      value: 'max-age=63072000; includeSubDomains; preload'
    }
  ],
  networkId: 'CC3631EE0BC63C579EDF277C2CDEE85D'
}

About

Examples to modify response headers in Playwright or Puppeteer using Chrome Devtools Protocol (CDP)


Languages

Language:JavaScript 100.0%