net::ERR_ABORTED during headless testing

Question

maximkoshelenko opened this issue 6 years ago · comments

Hello team. I need help with an issue that only reproduces in headless mode. I have a test that checks for a 200 or 206 response status after clicking the footer links of the site. After a PDF file was added as a footer link, the test started crashing in headless mode (in debug mode it works). Please help me solve this problem :).

Steps to reproduce

Tell us about your environment:

Puppeteer version:
Platform / OS version: Windows
URLs (if applicable): http://marketing.advanstar.info/mediakits/TC_MK_2016.pdf
Node.js version:
npm -v : 5.6.0
What steps will reproduce the problem?

Please include code that reproduces the issue.

  await page.goto('http://www.cancernetwork.com/', { waitUntil: 'domcontentloaded' });
  await page.waitForSelector('.expanded .menu');
  // Collect the absolute URLs of all footer links.
  const footerLinks = await page.evaluate(
    () => Array.from(document.body.querySelectorAll('.expanded a[href]'), ({ href }) => href)
  );
  for (const link of footerLinks) {
    // page.goto() already waits for the navigation and resolves with the main
    // response, so a separate page.waitForNavigation() is not needed.
    const response = await page.goto(link, { waitUntil: 'load' });

    // Accept either 200 (OK) or 206 (Partial Content), using the public
    // status() accessor instead of the private _status field.
    expect([200, 206]).toContain(response.status());

    console.log('Footer link: ' + link + ' was checked successfully');
  }

What is the expected result?
Navigating to http://marketing.advanstar.info/mediakits/TC_MK_2016.pdf succeeds with a 200 or 206 status.
What happens instead?
The code works without headless mode, but in headless mode there is an error:

net::ERR_ABORTED at http://marketing.advanstar.info/mediakits/TC_MK_2016.pdf
at navigate (node_modules/puppeteer/lib/Page.js:592:37)

I also tried running the code on the https://try-puppeteer.appspot.com/ site.
Code:
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('http://marketing.advanstar.info/mediakits/TC_MK_2016.pdf');
await browser.close();

Result:
Error running your code. Error: net::ERR_ABORTED at http://marketing.advanstar.info/mediakits/TC_MK_2016.pdf

Andrey Lushnikov · Answer 1 · Wed Jun 27 2018 09:21:52 GMT+0800 (China Standard Time)

Hi @maximkoshelenko,

Contrary to Headful Chrome, Chrome Headless doesn't know how to navigate to PDFs. I think it'll issue a download instead, but downloads are not supported yet: #299

maximkoshelenko · Answer 2 · Wed Jun 27 2018 13:39:57 GMT+0800 (China Standard Time)

Thanks. Hope downloads will be supported ASAP :)

Souf G · Answer 3 · Thu Feb 21 2019 19:09:41 GMT+0800 (China Standard Time)

@aslushnikov is there any way to know that the net::ERR_ABORTED was caused by a download when we catch the error?

We could use request interception and look at the content type, but ideally I want to be able to handle it in the catch.

matej2 · Answer 4 · Tue Dec 17 2019 05:02:31 GMT+0800 (China Standard Time)

Why is this closed? My issue occurs when redirecting...

Robert Riemann · Answer 5 · Fri Jan 17 2020 22:03:48 GMT+0800 (China Standard Time)

How can this error caused by a PDF file be distinguished from other errors? I also hit this problem and could try-catch it, but I am afraid I will overlook other errors unrelated to PDFs.

Dean Gurvitz · Answer 6 · Thu Aug 06 2020 16:09:09 GMT+0800 (China Standard Time)

@gsouf Can you expand on how to implement that? The content type isn't known until there is a response, and the resourceType() for pdf pages is simply "document"

Souf G · Answer 7 · Thu Aug 06 2020 16:18:46 GMT+0800 (China Standard Time)

@deansg I'm doing something like this. I cannot share more because the rest is part of a more complex thing but this is the gist of it:

const page = openSomePuppeteerPage();

async function pageOnResponseRequest(response) {
  if (response.frame() === page.mainFrame() && response.request().isNavigationRequest()) {
    const statusCode = response.status();
    const headers = response.headers();

    // At this point you have access to status code and headers which you can use to detect that it's an html document, an image, a downloadable document, etc...

  }
}

page.on('response', pageOnResponseRequest);
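
A possible way to combine this listener with a try/catch, as asked earlier in the thread, is sketched below. It is only a sketch under stated assumptions: `looksLikeDownload` and `gotoOrDetectDownload` are hypothetical helper names, treating only HTML as renderable is an assumption, and the code assumes the 'response' event for the navigation is delivered before page.goto() rejects, which is not confirmed in this thread.

```javascript
// Hypothetical helper: guess from the response headers whether the browser
// would download the resource instead of rendering it.
// Puppeteer's response.headers() returns lower-case header names.
function looksLikeDownload(headers) {
  const type = (headers['content-type'] || '').toLowerCase();
  const disposition = (headers['content-disposition'] || '').toLowerCase();
  if (disposition.startsWith('attachment')) return true;
  // Assumption: only HTML counts as renderable; extend this list as needed.
  return !(type.startsWith('text/html') || type.startsWith('application/xhtml+xml'));
}

// Hypothetical helper: navigate, and if page.goto() rejects with
// net::ERR_ABORTED, report a download only when the main-frame navigation
// response looked like one. Any other failure is rethrown.
async function gotoOrDetectDownload(page, url) {
  let navHeaders = null;
  const onResponse = (response) => {
    if (response.frame() === page.mainFrame() && response.request().isNavigationRequest()) {
      navHeaders = response.headers(); // the last navigation response wins
    }
  };
  page.on('response', onResponse);
  try {
    const response = await page.goto(url, { waitUntil: 'load' });
    return { kind: 'page', response };
  } catch (err) {
    const aborted = String(err && err.message).includes('net::ERR_ABORTED');
    if (aborted && navHeaders && looksLikeDownload(navHeaders)) {
      return { kind: 'download', headers: navHeaders };
    }
    throw err; // unrelated error: do not hide it
  } finally {
    page.off('response', onResponse);
  }
}
```

A caller could then branch on `result.kind` and keep a plain `catch` for everything else, so that aborts unrelated to downloads still surface as errors.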

Dean Gurvitz · Answer 8 · Tue Aug 18 2020 22:27:37 GMT+0800 (China Standard Time)

@aslushnikov Do you know which content types experience this behaviour? So that I can use gsouf's idea to filter them out

Souf G · Answer 9 · Tue Aug 18 2020 22:30:33 GMT+0800 (China Standard Time)

@deansg everything that is a download. I would say everything that is not displayable text (HTML, maybe XML and JSON?) or an image. Maybe video and audio too? I haven't tested all of those.

Dean Gurvitz · Answer 10 · Tue Aug 18 2020 22:39:33 GMT+0800 (China Standard Time)

@gsouf My problem is that I have come across websites that return a JavaScript content type and then continue loading until proper HTML is loaded. If I intercept the first response and immediately decide that the website isn't relevant because that content type isn't HTML-based, then I lose valuable information (I only need to extract content from websites that return HTML). I'm not sure whether it's better to build a blacklist of content types or a whitelist.

Souf G · Answer 11 · Tue Aug 18 2020 22:42:14 GMT+0800 (China Standard Time)

@deansg the solution I proposed filters only navigation requests for the main frame. What you want to do is to process the request only if content type (from response headers) is html

Souf G · Answer 12 · Tue Aug 18 2020 22:44:25 GMT+0800 (China Standard Time)

@deansg I have never seen a website return a JavaScript content type and run properly. The browser won't process the JavaScript. It will just display it on screen as plain text.

Dean Gurvitz · Answer 13 · Tue Aug 18 2020 22:46:05 GMT+0800 (China Standard Time)

@gsouf when I try to navigate to the following website:
https://atelierhaussmann.de/en/
The 'response' event is called several times, and in several of the cases the content type is application/javascript

Souf G · Answer 14 · Tue Aug 18 2020 22:50:44 GMT+0800 (China Standard Time)

@deansg you'll have to figure out what's wrong because it does not occur for me. Even with your website, I confirm it has content-type text/html

Pepsiamir · Answer 15 · Sat Dec 19 2020 23:17:47 GMT+0800 (China Standard Time)

I get the same error when I want to get the redirect chain of a URL. Is there any solution to exit the process after fetching the data?

Error: net::ERR_ABORTED at https://www.example.com/vip-dl/?filename=23309907.rar
at navigate (C:\Users\noora\AppData\Roaming\npm\node_modules\puppeteer\lib\FrameManager.js:120:37)
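
One way the redirect chain might still be collected when the final URL is a download is sketched below. This is an untested sketch: `collectRedirectChain` is a hypothetical helper name, it relies on Puppeteer's `request.redirectChain()`, and it assumes at least one main-frame navigation response is observed before page.goto() rejects.

```javascript
// Hypothetical helper: return the URLs of the main-frame navigation,
// including redirects, even if page.goto() rejects with net::ERR_ABORTED.
async function collectRedirectChain(page, url) {
  let chain = null;
  const onResponse = (response) => {
    const request = response.request();
    if (response.frame() === page.mainFrame() && request.isNavigationRequest()) {
      const urls = request.redirectChain().map((r) => r.url());
      // The chain array is shared between all requests of one chain, so it may
      // already end with this request; append it only when it is missing.
      if (urls[urls.length - 1] !== request.url()) urls.push(request.url());
      chain = urls;
    }
  };
  page.on('response', onResponse);
  try {
    await page.goto(url);
  } catch (err) {
    // Swallow the abort only if a navigation response was actually seen.
    const aborted = String(err && err.message).includes('net::ERR_ABORTED');
    if (!aborted || chain === null) throw err;
  } finally {
    page.off('response', onResponse);
  }
  return chain;
}
```

After the chain has been returned, the caller can close the browser with `await browser.close()` so that the Node.js process is free to exit.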

Ravindra Vairagi · Answer 16 · Wed Apr 21 2021 18:20:09 GMT+0800 (China Standard Time)

I am also facing a similar error. Any help is appreciated.

Arnas Pecelis · Answer 17 · Wed Apr 28 2021 18:04:11 GMT+0800 (China Standard Time)

I have been running into the same issue for the past few days. It was working fine before, and nothing has changed...

chinmaey · Answer 18 · Mon Aug 16 2021 17:37:53 GMT+0800 (China Standard Time)

I need to test the download speed of a video download. I face the same issue.

Rbrandao7 · Answer 19 · Wed Dec 01 2021 23:59:38 GMT+0800 (China Standard Time)

Is this issue related to a bad connection? I am facing the same issue: sometimes the error appears and sometimes it doesn't.

John Haugeland · Answer 20 · Thu Jun 30 2022 09:27:31 GMT+0800 (China Standard Time)

In my case, the reason for this was that I was trying to run several jobs in parallel with promises, and I hadn't noticed that they were all using the same page object.
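
A minimal sketch of avoiding that situation is to give every parallel job its own page. `runInParallel` and the `job` callback are hypothetical names used for illustration; the presumed cause, that starting a second navigation on a shared page aborts the one already in flight, is an assumption rather than something confirmed in this thread.

```javascript
// Hypothetical helper: run one job per URL in parallel, each on its own page,
// so that no job can interrupt another job's navigation.
async function runInParallel(browser, urls, job) {
  return Promise.all(
    urls.map(async (url) => {
      const page = await browser.newPage(); // never shared between jobs
      try {
        return await job(page, url);
      } finally {
        await page.close(); // release the page even if the job throws
      }
    })
  );
}
```

For example, `await runInParallel(browser, urls, (page, url) => page.goto(url))` opens one page per URL instead of calling `page.goto()` several times on the same page object.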

Mohammed Shah · Answer 21 · Mon Dec 19 2022 16:30:06 GMT+0800 (China Standard Time)

Another reason you may be experiencing this error consistently is that your system is telling chrome to use jemalloc.

Turn off system-wide jemalloc to get rid of this error.

See: #8246 (comment)

白一梓 · Answer 22 · Wed May 15 2024 19:16:55 GMT+0800 (China Standard Time)

I use Windows 10. When I call setCookie first, I get the same error.
When I remove the cookie code, it runs OK.
I captured the packets with Wireshark when the request was aborted:

The IP address starting with 10.254 is my Windows 10 PC.