puppeteer / puppeteer

Node.js API for Chrome

Home Page:https://pptr.dev

net::ERR_ABORTED during headless testing

maximkoshelenko opened this issue · comments

Hello team. I need help with an issue that only reproduces in headless mode. I have a test that checks for a 200 or 206 status after navigating to each footer link of a site. After a PDF file was added as a footer link, the test started crashing in headless mode (in debug mode, i.e. with headless disabled, it works). Please help me solve this problem :).

Steps to reproduce

Tell us about your environment:

Please include code that reproduces the issue.

  await page.goto('http://www.cancernetwork.com/', { waitUntil: 'domcontentloaded' });
  await page.waitForSelector('.expanded .menu');
  // Collect the absolute URLs of all footer links
  const footerLinks = await page.evaluate(
    () => Array.from(document.body.querySelectorAll('.expanded a[href]'), ({ href }) => href)
  );
  for (const link of footerLinks) {
    // page.goto() already waits for the navigation and resolves with the
    // main response, so a separate page.waitForNavigation() is not needed
    const response = await page.goto(link, { waitUntil: 'load' });

    // Accept 200 (OK) or 206 (Partial Content); status() is the public accessor
    expect([200, 206]).toContain(response.status());

    console.log('Footer link: ' + link + ' was checked successfully');
  }

What is the expected result?
Navigating to http://marketing.advanstar.info/mediakits/TC_MK_2016.pdf succeeds with a 200 or 206 status.
What happens instead?
With headless mode disabled the code works, but in headless mode it fails with this error:

net::ERR_ABORTED at http://marketing.advanstar.info/mediakits/TC_MK_2016.pdf
at navigate (node_modules/puppeteer/lib/Page.js:592:37)

I also tried running the code at https://try-puppeteer.appspot.com/
Code:
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('http://marketing.advanstar.info/mediakits/TC_MK_2016.pdf');
await browser.close();

Result:
Error running your code. Error: net::ERR_ABORTED at http://marketing.advanstar.info/mediakits/TC_MK_2016.pdf

Hi @maximkoshelenko,

Unlike headful Chrome, headless Chrome doesn't know how to navigate to PDFs. I think it'll issue a download instead, but downloads are not supported yet: #299

Thanks. Hope downloads will be supported ASAP :)

@aslushnikov is there any way to know that the net::ERR_ABORTED was caused by a download when we catch the error?

We could use request interception and look at the content type, but ideally I want to be able to handle it in the catch.

Why is this closed? My issue occurs when redirecting...

How can this error caused by a PDF file be distinguished from other errors? I also hit this problem and could try-catch it, but I am afraid I would overlook other errors unrelated to PDFs.

@gsouf Can you expand on how to implement that? The content type isn't known until there is a response, and the resourceType() for pdf pages is simply "document"

@deansg I'm doing something like this. I cannot share more because the rest is part of a more complex thing but this is the gist of it:

const page = openSomePuppeteerPage();

async function pageOnResponseRequest(response) {
  if (response.frame() === page.mainFrame() && response.request().isNavigationRequest()) {
    const statusCode = response.status();
    const headers = response.headers();

    // At this point you have access to status code and headers which you can use to detect that it's an html document, an image, a downloadable document, etc...

  }
}

page.on('response', pageOnResponseRequest);
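
To handle this in the catch, as asked earlier in the thread, the listener above could be combined with a try/catch around page.goto(). This is only a sketch: `isAbortedNavigation` and `gotoOrDetectDownload` are hypothetical names, and it assumes the 'response' event fires for the aborted document before page.goto() rejects, which may not hold for every kind of download.

```javascript
// Hypothetical helper: true when page.goto() rejected with net::ERR_ABORTED.
function isAbortedNavigation(error) {
  return Boolean(error) && String(error.message).includes('net::ERR_ABORTED');
}

// Hypothetical helper: navigate, and if the navigation is aborted after a
// main-frame navigation response was seen, report it as a likely download.
async function gotoOrDetectDownload(page, url) {
  let navigationResponse = null;
  const onResponse = (response) => {
    // Remember the latest navigation response of the main frame.
    if (response.frame() === page.mainFrame() && response.request().isNavigationRequest()) {
      navigationResponse = response;
    }
  };
  page.on('response', onResponse);
  try {
    const response = await page.goto(url, { waitUntil: 'load' });
    return { kind: 'page', status: response ? response.status() : null };
  } catch (error) {
    // Assumption: an abort that follows a received response is a download.
    if (isAbortedNavigation(error) && navigationResponse) {
      return {
        kind: 'download',
        status: navigationResponse.status(),
        headers: navigationResponse.headers(),
      };
    }
    throw error; // any other failure is rethrown unchanged
  } finally {
    page.off('response', onResponse);
  }
}
```

With something like this, a link checker can still assert on the status for `kind: 'download'` results, while unrelated errors such as timeouts or DNS failures keep propagating.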

@aslushnikov Do you know which content types experience this behaviour? So that I can use gsouf's idea to filter them out

@deansg everything that is a download. I would say everything that is not displayable text (HTML, maybe XML and JSON?) or an image. Maybe video and audio too? I haven't tested all of those.

@gsouf My problem is that I have come across websites that return a JavaScript content type and then continue loading until proper HTML is loaded. If I intercept the first response and immediately decide that the website isn't relevant because its content type isn't HTML-based, I lose valuable information (I only need to extract content from websites that return HTML). I'm not sure whether it's better to build a blacklist of content types or a whitelist.

@deansg the solution I proposed filters only navigation requests for the main frame. What you want to do is to process the request only if content type (from response headers) is html
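
A whitelist along those lines might look like the sketch below. `isHtmlContentType` and `watchMainDocument` are hypothetical names, and treating only text/html and application/xhtml+xml as HTML is an assumption you may want to widen.

```javascript
// Hypothetical helper: true when the Content-Type header denotes an HTML document.
// response.headers() in Puppeteer returns lower-cased header names.
function isHtmlContentType(headers) {
  const contentType = (headers['content-type'] || '').toLowerCase();
  const mimeType = contentType.split(';')[0].trim(); // drop "; charset=..."
  return mimeType === 'text/html' || mimeType === 'application/xhtml+xml';
}

// Hypothetical helper: call onHtml or onOther for the main document only.
function watchMainDocument(page, onHtml, onOther) {
  page.on('response', (response) => {
    const isMainNavigation =
      response.frame() === page.mainFrame() &&
      response.request().isNavigationRequest();
    if (!isMainNavigation) return; // scripts, images and iframes are ignored

    const status = response.status();
    if (status >= 300 && status < 400) return; // redirect hop, wait for the final response

    if (isHtmlContentType(response.headers())) {
      onHtml(response);
    } else {
      onOther(response);
    }
  });
}
```

Because only main-frame navigation responses are inspected, script responses with an application/javascript content type never reach the whitelist check.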

@deansg I have never seen a website return a JavaScript content type for its main document and still run properly. The browser won't execute the JavaScript; it will just display it on screen as plain text.

@gsouf when I try to navigate to the following website:
https://atelierhaussmann.de/en/
The 'response' event is called several times, and in several of the cases the content type is application/javascript

@deansg you'll have to figure out what's wrong because it does not occur for me. Even with your website, I confirm it has content-type text/html

I get the same error when I try to get the redirect chain of a URL. Is there any way to exit the process cleanly after fetching the data?

Error: net::ERR_ABORTED at https://www.example.com/vip-dl/?filename=23309907.rar
at navigate (C:\Users\noora\AppData\Roaming\npm\node_modules\puppeteer\lib\FrameManager.js:120:37)
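
One possible approach for the redirect case is to record the redirect hops from 'response' events, so the data is already collected when page.goto() rejects at the final download. This is a sketch only: `isRedirectStatus` and `collectRedirects` are hypothetical names, and swallowing net::ERR_ABORTED assumes the abort really comes from the download at the end of the chain.

```javascript
// Hypothetical helper: HTTP status codes that carry a Location header.
function isRedirectStatus(status) {
  return [301, 302, 303, 307, 308].includes(status);
}

// Hypothetical helper: navigate and return the redirect hops that were seen,
// even if the final navigation is aborted.
async function collectRedirects(page, url) {
  const hops = [];
  const onResponse = (response) => {
    const isMainNavigation =
      response.frame() === page.mainFrame() &&
      response.request().isNavigationRequest();
    if (isMainNavigation && isRedirectStatus(response.status())) {
      hops.push({
        from: response.url(),
        status: response.status(),
        to: response.headers()['location'] || null,
      });
    }
  };
  page.on('response', onResponse);
  try {
    await page.goto(url);
  } catch (error) {
    // Assumption: ERR_ABORTED here means the chain ended in a download.
    if (!String(error.message).includes('net::ERR_ABORTED')) throw error;
  } finally {
    page.off('response', onResponse);
  }
  return hops;
}
```

After `collectRedirects` resolves, calling `await browser.close()` should let the Node process exit normally.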

I am facing a similar error. Any help is appreciated.

I have been running into the same issue for the past few days. It was working fine before, and nothing has changed...

I need to test the download speed of a video download, and I face the same issue.

Is this issue related to a bad connection? I am facing the same issue: sometimes the error appears and sometimes it doesn't.

In my case, the reason for this was that I was trying to run several jobs in parallel with promises, and I hadn't noticed that they were all using the same page object.
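
A minimal sketch of avoiding that, assuming each job only needs a page for its own lifetime, is to open a separate page per job. `withOwnPage` and `fetchStatuses` are hypothetical names used for illustration.

```javascript
// Hypothetical helper: run a job in a page of its own and always close it.
async function withOwnPage(browser, job) {
  const page = await browser.newPage();
  try {
    return await job(page);
  } finally {
    await page.close();
  }
}

// Hypothetical example job: fetch the status of several URLs in parallel.
// Each navigation happens in its own page, so no goto() can interrupt
// a navigation started by another job.
function fetchStatuses(browser, urls) {
  return Promise.all(
    urls.map((url) =>
      withOwnPage(browser, async (page) => {
        const response = await page.goto(url);
        return response ? response.status() : null;
      })
    )
  );
}
```

If many URLs are involved, limiting how many pages are open at once is probably worth adding on top of this.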

Another reason you may be experiencing this error consistently is that your system is telling chrome to use jemalloc.

Turn off system-wide jemalloc to get rid of this error.

See: #8246 (comment)

I use Windows 10. When I call setCookie first, I get the same error; when I remove the cookie code, it runs fine.
I captured the packets with Wireshark when the request was aborted:
[Screenshot: 2024-05-16 145414]
The IP address starting with 10.254 is my Windows 10 PC.