Cuadrix / puppeteer-page-proxy

Additional module to use with 'puppeteer' for setting proxies on a per-page basis.
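A minimal usage sketch, based only on the calls shown later in this thread (the proxy URL and target site below are placeholders):

const puppeteer = require('puppeteer');
const useProxy = require('puppeteer-page-proxy');

(async () => {
    // Placeholder proxy; replace with a real proxy URL.
    const proxy = 'http://host:port';

    const browser = await puppeteer.launch({ headless: true });
    const page = await browser.newPage();

    // Route every request made by this page through the proxy.
    await useProxy(page, proxy);

    // lookup() reports which IP the page's requests appear to come from.
    const data = await useProxy.lookup(page);
    console.log(data);

    await page.goto('https://example.com');
    await browser.close();
})();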

puppeteer-page-proxy doesn't work at all

jpgklassen opened this issue

I followed the example code and it doesn't work. Requests are sent from my own IP address, not the IP address of the proxy.

My code:

const puppeteer = require('puppeteer')
const useProxy = require('puppeteer-page-proxy')

let proxy = 'http://host:port'
let url = 'https://example.com'

global.browser = await puppeteer.launch({headless: true})
global.page = await global.browser.newPage()
await Promise.all([
  global.page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36'),
  global.page.setViewport({ width: 1366, height: 768 }),
  global.page.setRequestInterception(true)
])
global.page.on('request', req => {
  if (req.resourceType() === 'image' || req.resourceType() === 'stylesheet' || req.resourceType() === 'font') {
    req.abort()
  } else {
    req.continue()
  }
})
await useProxy(global.page, proxy)
let data = await useProxy.lookup(global.page)
console.log(data) // always prints out my IP address, not the IP address of the proxy
await global.page.goto(url) // my server logs show the request is coming from my IP address, not the IP address of the proxy

Any idea why it isn't working?

Sorry for the late response. useProxy also intercepts requests internally in order to change the proxy of a request. The reason it isn't working in this case is that you are continuing the request before useProxy even gets to handle it. But even if it were able to handle it, you would just be greeted with a 'Request already handled' error, because requests can only be handled once.

I released a new version which implements proxies per request, so now you can just use the function inside the callback:

global.page.on('request', req => {
  if (req.resourceType() === 'image' || req.resourceType() === 'stylesheet' || req.resourceType() === 'font') {
    req.abort()
  } else {
    useProxy(req, proxy)
  }
})

This way you can still abort those nasty media files, and also tunnel the requests through a proxy.

Thanks @Cuadrix, I tried what you recommended and it still didn't work. Now when I call await page.goto(url), it just times out. Any idea why?

If it times out, it might be because of the proxy itself.
This works perfectly fine when I try it:

const puppeteer = require('puppeteer');
const useProxy = require('puppeteer-page-proxy');

(async () => {
    let proxy = 'http://51.158.119.88:8811'
    let url = 'https://ipleak.net'

    global.browser = await puppeteer.launch({headless: true})
    global.page = await global.browser.newPage()
    await Promise.all([
        global.page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36'),
        global.page.setViewport({ width: 1366, height: 768 }),
        global.page.setRequestInterception(true)
    ])
    global.page.on('request', req => {
        if (req.resourceType() === 'image' || req.resourceType() === 'stylesheet' || req.resourceType() === 'font') {
            req.abort()
        } else {
            useProxy(req, proxy)
        }
    })
    let data = await useProxy.lookup(global.page)
    console.log(data)
    await global.page.goto(url)
})();

It makes the requests from the proxy's IP as expected.

Thank you @Cuadrix, there was indeed a problem with my proxy configuration. I fixed it and tested the proxy with my web browser, and it works fine. But when I run your code (substituting my proxy), I get the following error:

(node:21540) UnhandledPromiseRejectionWarning: Error: net::ERR_FAILED at https://ipleak.net
    at navigate (C:\Users\jpgkl\workspace\med crt\gofundme campaign scraper\node_modules\puppeteer\lib\FrameManager.js:121:37)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at async FrameManager.navigateFrame (C:\Users\jpgkl\workspace\med crt\gofundme campaign scraper\node_modules\puppeteer\lib\FrameManager.js:95:17)
    at async Frame.goto (C:\Users\jpgkl\workspace\med crt\gofundme campaign scraper\node_modules\puppeteer\lib\FrameManager.js:407:12)
    at async Page.goto (C:\Users\jpgkl\workspace\med crt\gofundme campaign scraper\node_modules\puppeteer\lib\Page.js:629:12)
    at async C:\Users\jpgkl\workspace\med crt\gofundme campaign scraper\index.js:26:2
  -- ASYNC --
    at Frame.<anonymous> (C:\Users\jpgkl\workspace\med crt\gofundme campaign scraper\node_modules\puppeteer\lib\helper.js:110:27)
    at Page.goto (C:\Users\jpgkl\workspace\med crt\gofundme campaign scraper\node_modules\puppeteer\lib\Page.js:629:49)
    at Page.<anonymous> (C:\Users\jpgkl\workspace\med crt\gofundme campaign scraper\node_modules\puppeteer\lib\helper.js:111:23)
    at C:\Users\jpgkl\workspace\med crt\gofundme campaign scraper\index.js:26:20
    at processTicksAndRejections (internal/process/task_queues.js:93:5)

I get the same error when I try running my own code. :(

That happens when multiple requests fail for some reason. In this case it usually means that the proxy is extremely slow. If the proxy worked in your own web browser, then there shouldn't really be anything wrong.
The above proxy points to a French location, so it might also have something to do with where you are connecting from.

@Cuadrix my proxy is in central Canada, and I am in western Canada... the distance isn't terribly far. I've tried increasing the CPU and RAM on the proxy server but it doesn't make any difference.

After some troubleshooting I've noticed that I only get this problem when I try to access an https:// address using puppeteer-page-proxy... http:// addresses work fine. When I don't use puppeteer-page-proxy, both https:// and http:// addresses work. So for example, if I set the proxy at the browser level like this:

global.browser = await puppeteer.launch({
  headless: true,
  args: [ '--proxy-server=http://35.183.34.164:3128' ]
})

...everything works fine. This appears to be an issue with puppeteer-page-proxy, not the proxies themselves.
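For reference, a hedged sketch of that browser-level workaround: --proxy-server applies to the whole browser rather than per page, and if the proxy requires credentials, Puppeteer's page.authenticate() can answer the proxy's authentication challenge (the credentials below are placeholders):

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({
        headless: true,
        // Browser-wide proxy; every page in this browser goes through it.
        args: ['--proxy-server=http://35.183.34.164:3128']
    });
    const page = await browser.newPage();

    // Only needed if the proxy asks for credentials (HTTP 407).
    await page.authenticate({ username: 'user', password: 'pass' });

    await page.goto('https://example.com');
    await browser.close();
})();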

Are you saying that you are unable to connect to https addresses with an http proxy?
How many proxies have you tried and what is the NodeJS version you are using?

Most of the http proxies I have tested connect fine to https websites on Node 12.15.0 LTS, though the connection will show as "not secure", unlike when passing the --proxy-server argument at launch.

Yes, that's correct... I'm unable to connect to https addresses with http proxies when using puppeteer-page-proxy. I get ERR_FAILED when I try to do so. Can this be fixed? Is there anything I can do to help troubleshoot?

Have you tried other http proxies to see if they get through?

Besides that, you can at least tell me your NodeJS version, Puppeteer version, and Windows version, and if you are not using the Chromium that comes with Puppeteer, also tell me which browser you are trying to automate.

I'll be looking more into it.

Yes, I have tried multiple different proxies and it doesn't seem to make any difference. I am using NodeJS 12.14.0, Puppeteer 1.17.0, Windows 10 build 18363, and the Chromium that comes with Puppeteer. Sorry, I'm not sure what you mean by "the browser you are trying to automate"? I am trying to automate the Chromium bundled with Puppeteer to scrape web pages, if that answers your question.

I appreciate your support, thank you!

Hi @Cuadrix, have you had a chance to look into this yet? I'm using your plugin in a project for a client and it's somewhat time-sensitive. I don't intend to rush you at all, it's just that if I can't use this plugin, I should get started on coding a different solution pretty soon. Please let me know :)

If it is time-sensitive, you should probably try something different, since I will have hardly any free time until next week, due to work, to properly look into the issue.

Okay, thanks for letting me know. My client says he's fine with waiting a couple more weeks while we try to get this plugin to work, as it would be a really ideal solution for us. If it doesn't work or you can't find the time outside your work, please don't stress about it, just let me know and I will start coding my own work-around. :)

Hi! Any updates? I get a 407 error with login:pass@proxy:port, and when I try to use the lookup method it freezes.
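A hedged sketch of passing an authenticated proxy, assuming the library accepts credentials embedded in the proxy URL in the login:pass@host:port format mentioned above (host, port, and credentials below are placeholders); if that still returns 407, the browser-level --proxy-server plus page.authenticate() approach shown earlier may serve as a fallback:

const puppeteer = require('puppeteer');
const useProxy = require('puppeteer-page-proxy');

(async () => {
    // Placeholder values; credentials embedded in the URL are an assumed format.
    const proxy = 'http://login:pass@127.0.0.1:8080';

    const browser = await puppeteer.launch({ headless: true });
    const page = await browser.newPage();

    await useProxy(page, proxy);
    await page.goto('https://example.com');
    await browser.close();
})();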