reacherhq / check-if-email-exists

Check if an email address exists without sending any email, written in Rust. Comes with a ⚙️ HTTP backend.

Home Page: https://reacher.email

HaveIBeenPawned?

amaury1093 opened this issue

Add a field misc.have_i_been_pawned: true/false, populated by an API call to https://haveibeenpwned.com/

There is a small problem: haveibeenpwned's API costs $3.50/month.
Maybe consider scraping or a similar free API?

Ah, I wasn't aware it was paid. So maybe not, I don't think it's super high priority (and people can always make a separate API call for that).

I recall the author was open-sourcing it. Will it still be paid after?

On the API key page, they link to a blog post which says:

Clearly not everyone will be happy with this so let me spend a bit of time here explaining the rationale. This fee is first and foremost to stop abuse of the API.

So I don't think we should expect the API to become free anytime soon.

Hello, I made my own API. It's free forever! And it works the same as haveibeenpwned.com. I'll try to make a PR soon.
Edit: I am not a Rust dev 😅

@DigitalGreyHat Can you give some/more information about your API?

Hello, I am currently working on this.

Here are my thoughts:

The problem with bypassing Cloudflare is that we have to rely on a stealth browser; otherwise Cloudflare's protection is triggered. https://github.com/ultrafunkamsterdam/undetected-chromedriver seems to be the one with the biggest community. I did a PoC and the results are not reliable: it works ~70% of the time (the other 30% are crashes or no response). Another problem with a stealth browser is that it brings in a lot of new dependencies, each of which has to be maintained.

To my mind, implementing the paid API is the way to go. Otherwise we can find another reliable and free API.

Let's go with the paid API. @sylvain-reynaud would you like to create a PR?

I think the way to go is:

  • add an env variable RCH_HIBP_API_KEY; if it's set to a non-empty value, make an API call
  • put the result in misc.have_i_been_pawned: Option<bool>
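
The two bullets above could be sketched roughly as follows. This is only an illustration, not the code from the eventual PR: `hibp_api_key_from`, `hibp_api_key` and the `lookup` closure are hypothetical names, and the closure stands in for the real HTTP call so that the snippet needs nothing beyond the standard library.

```rust
use std::env;

/// Name of the environment variable proposed in the comment above.
const HIBP_API_KEY_VAR: &str = "RCH_HIBP_API_KEY";

/// Hypothetical helper: keeps the key only if it is set and non-empty.
fn hibp_api_key_from(raw: Option<String>) -> Option<String> {
    raw.filter(|value| !value.is_empty())
}

/// Hypothetical helper: reads the key from the process environment.
fn hibp_api_key() -> Option<String> {
    hibp_api_key_from(env::var(HIBP_API_KEY_VAR).ok())
}

/// Computes the value for `misc.have_i_been_pawned`.
///
/// `lookup` is a placeholder for the real API call; it receives the API key
/// and the email address. Without a key, no call is made and the result is
/// `None`, meaning "unknown".
fn have_i_been_pawned<F>(api_key: Option<String>, email: &str, lookup: F) -> Option<bool>
where
    F: FnOnce(&str, &str) -> Option<bool>,
{
    let key = api_key?;
    lookup(&key, email)
}
```

A caller would pass `hibp_api_key()` as the first argument. Keeping the field an `Option<bool>` lets `None` cover both "no key configured" and "the lookup failed", so `false` is only reported when the lookup actually said so.
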

Otherwise we can find another reliable and free API.

Do people know of other free APIs? Ideally open-source. We can always add misc.<other_api> = true/false, and make those extra API calls configurable.

According to https://github.com/khast3x/h8mail#apis there are 3 free(mium) APIs:

For information, there is Fingerprint Suite, which works with Playwright.
It's OK with Antibot. I haven't tested it with Cloudflare.

const { chromium } = require('playwright');
const { FingerprintGenerator } = require('fingerprint-generator');
const { FingerprintInjector }  = require('fingerprint-injector');

(async () => {
	const fingerprintGenerator = new FingerprintGenerator();

	const browserFingerprintWithHeaders = fingerprintGenerator.getFingerprint({
		devices: ['desktop'],
		browsers: ['chrome'],
	});

	const fingerprintInjector = new FingerprintInjector();
	const { fingerprint } = browserFingerprintWithHeaders;

	const browser = await chromium.launch({ headless: false });

	// With certain properties, we need to inject the props into the context initialization
	const context = await browser.newContext({
		userAgent: fingerprint.userAgent,
		locale: fingerprint.navigator.language,
		viewport: fingerprint.screen,
	});

	// Attach the rest of the fingerprint
	await fingerprintInjector.attachFingerprintToPlaywright(context, browserFingerprintWithHeaders);

	const page = await context.newPage();

	await page.goto('https://haveibeenpwned.com/unifiedsearch/user@example.org');

	// wait for the page to load
	await page.waitForTimeout(20000);
	// log the page content
	console.log(await page.content());
	// screenshot the page
	await page.screenshot({ path: 'proof.png' });
	// close the browser so the script can exit
	await browser.close();
})();

In headless mode the request is blocked; with a visible browser window it is not. You can check this with the code above.

I'll implement the paid API first.

It seems OK in Firefox headless mode with this:

import path from 'path';
import { fileURLToPath } from 'url';

import { firefox } from 'playwright';
import { FingerprintGenerator } from 'fingerprint-generator';
import { FingerprintInjector } from 'fingerprint-injector';

(async () => {
    const fingerprintGenerator = new FingerprintGenerator();

    const browserFingerprintWithHeaders = fingerprintGenerator.getFingerprint({
        devices: ['desktop'],
        browsers: ['firefox'],
    });

    const fingerprintInjector = new FingerprintInjector();
    const { fingerprint } = browserFingerprintWithHeaders;

    const browser = await firefox.launch({
        headless: true
    });

    // With certain properties, we need to inject the props into the context initialization
    const context = await browser.newContext({
        userAgent: fingerprint.userAgent,
        locale: fingerprint.navigator.language,
        viewport: fingerprint.screen,
    });

    // Attach the rest of the fingerprint
    await fingerprintInjector.attachFingerprintToPlaywright(context, browserFingerprintWithHeaders);

    const page = await context.newPage();

    await page.goto('https://haveibeenpwned.com/unifiedsearch/user@example.org');

    await page.screenshot({ path: path.join(path.dirname(fileURLToPath(import.meta.url)), 'playwright_test_headless.png') });

    await browser.close();
})();

Yep! It's OK with got-scraping.
The got-scraping library usually has better success than other libraries thanks to its header generation, HTTP/2 support and browser-like ciphers.

import { gotScraping } from 'got-scraping';

(async () => {
    const response = await gotScraping({
        url: 'https://haveibeenpwned.com/unifiedsearch/user@example.org',
        headerGeneratorOptions:{
            browsers: ['firefox'],
            devices: ['desktop'],
        }
    });
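    // raw response body
    console.log(response.body);
    // parse the body as JSON; this throws if the body is not valid JSON
    const result = JSON.parse(response.body);
    console.log(result);
    console.log(`Response headers: ${JSON.stringify(response.headers)}`);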
})();

@LeMoussel wow, I didn't know about this package, thanks 💯

So I'm working on adding the feature by calling this URL: https://haveibeenpwned.com/unifiedsearch/user@example.org
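
For illustration only, here is one way the lookup URL could be assembled in Rust using just the standard library. The function name `unified_search_url` is hypothetical, and the escaping rule is my own assumption: unreserved characters and `@` are left as-is so that the result matches the URL above, and every other byte is percent-encoded.

```rust
/// Hypothetical helper: builds the lookup URL for one email address.
fn unified_search_url(email: &str) -> String {
    let mut url = String::from("https://haveibeenpwned.com/unifiedsearch/");
    for byte in email.bytes() {
        match byte {
            // Characters kept verbatim: RFC 3986 unreserved characters plus '@'.
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' | b'@' => {
                url.push(byte as char)
            }
            // Everything else is percent-encoded byte by byte.
            other => url.push_str(&format!("%{:02X}", other)),
        }
    }
    url
}
```

Escaping the address keeps characters such as `/` or `?` in an unusual local part from changing the path of the request.
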

Hello, my PR is ready to be reviewed :)

I fixed the formatting and removed code that might break if a field is added to the API response.
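
One way to stay independent of the response fields, sketched here as an assumption rather than as what the PR actually does, is to derive the boolean from the HTTP status code alone. The mapping below (200 means the address was found in a breach, 404 means it was not) follows the documented behavior of the official v3 API; whether the front-end endpoint behaves the same way is not confirmed in this thread. `pwned_from_status` is a hypothetical name.

```rust
/// Hypothetical helper: maps an HTTP status code to `misc.have_i_been_pawned`.
fn pwned_from_status(status: u16) -> Option<bool> {
    match status {
        // The address appears in at least one breach.
        200 => Some(true),
        // The address was not found.
        404 => Some(false),
        // Rate limiting, auth errors, blocks and so on: report "unknown".
        _ => None,
    }
}
```

With this approach, new fields in the JSON body cannot break the check, at the cost of discarding the breach details.
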

@beshoo it uses the haveibeenpwned API. The endpoint is the one used by the haveibeenpwned.com front end.

The Node.js libraries are probably more battle-tested, but I would like to keep this repo as pure Rust.

Also, I'm reluctant to use a headless browser for HIBP. It seems there's a risk that it'll become flaky/blocked one day, and the maintenance burden will likely fall on me. I propose to start with the paid API, as described in #289 (comment). I'll gladly purchase the paid API and make it available on https://reacher.email 's SaaS plan.