cyberpunka / passdb-backend

Password Breach Data Normalizer & DB Seeder

passdb

Password dump database normalizer and seeder

super alpha

DB Setup

Depending on the data drive, add one of the conf files from the db directory to Postgres' conf.d dir.

cp db/16gb_4cpu_ssd.conf /etc/postgresql/10/main/conf.d/dump.conf
systemctl restart postgresql@10-main.service

Currently averaging around 350K inserts/minute with these settings and the table configuration in db/migrate/.

Usage

Seeding

A test tarball is included in the tests directory.

Dump entries should be in the format:

email@domain.com:password

The parse logic in seeder takes a best-effort approach to pulling the domain, username, and password from each line. Some dumps use ; instead of : as the separator, and seeder looks for that too. If it can't find all three datapoints in a line, that line isn't added to the database.
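
As a rough illustration of the rules above, here is a minimal Go sketch of how one line might be split. It is not the code in seeder/main.go: the Entry type, the parseLine name, the choice to split on the first separator, and the lowercasing of the domain are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Entry holds the three datapoints pulled from one dump line.
// Hypothetical type, not taken from the seeder source.
type Entry struct {
	Username string
	Domain   string
	Password string
}

// parseLine splits "user@domain:password" (or the ';' variant) into an Entry.
// ok is false when any of the three fields is missing.
func parseLine(line string) (e Entry, ok bool) {
	line = strings.TrimRight(line, "\r\n")

	// Split on the first ':' or ';' so the password may contain either character.
	sep := strings.IndexAny(line, ":;")
	if sep < 0 {
		return Entry{}, false
	}
	email, password := line[:sep], line[sep+1:]

	// Split the email part on its last '@'.
	at := strings.LastIndex(email, "@")
	if at < 0 {
		return Entry{}, false
	}

	e = Entry{
		Username: email[:at],
		Domain:   strings.ToLower(email[at+1:]), // assumption: domains are case-insensitive
		Password: password,
	}
	if e.Username == "" || e.Domain == "" || e.Password == "" {
		return Entry{}, false
	}
	return e, true
}

func main() {
	lines := []string{
		"eric1990@yahoo.com:P@ssw0rd!",
		"eric1990@yahoo.com;hunter2",
		"no-separator-here",
	}
	for _, l := range lines {
		e, ok := parseLine(l)
		fmt.Printf("%q -> ok=%v entry=%+v\n", l, ok, e)
	}
}
```

Splitting on the first separator is a design choice for this sketch: it keeps passwords that contain : or ; intact, but would mis-parse a username that contains one of them.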

# for psql
export RACK_ENV=production

# for sqlite
export RACK_ENV=development

bundle install
bundle exec rake db:reset

# build the Go seeder
cd seeder && go build -o seeder main.go

# pushover.net token for mobile progress alerts
export PO_USR=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
export PO_API=yyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
export PG_CONN='postgres://passdb_user:passdb_pass@localhost/passdb'

# Postgres on macOS wants this string instead
export PG_CONN='postgres://passdb_user:passdb_pass@localhost/passdb?sslmode=disable'

# seed the database
./seeder test_data.tar.gz

Querying

Associations are set in the ORM such that pivoting on any of username, password, or domain is possible.

# to start the query interface
bundle exec rake


# start with a domain
yahoo = Domain.find_by(domain: "yahoo.com")

# find all passwords used by yahoo mail users
yahoo.passwords



# find all yahoo mail users
yahoo.usernames

# find all passwords of a particular yahoo mail user
yahoo.usernames.first.passwords



# start with a user
eric = Username.find_by(name: "eric1990")

# see all passwords belonging to eric
eric.passwords

# see all email accounts for eric
eric.domains



# starting with a password
pass = Password.find_by(password: "P@ssw0rd!")

# see the users that share this password
pass.usernames
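
The pivots above work because each stored entry links one username, one domain, and one password. The Go sketch below models that idea in memory to show why any of the three can serve as a starting point. It is only an illustration: the real schema lives in db/migrate/ and may be laid out differently, and Record, Index, NewIndex, and distinct are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Record is a hypothetical join row linking one username, domain, and password.
type Record struct {
	Username, Domain, Password string
}

// Index holds one lookup map per pivot column.
type Index struct {
	ByUsername, ByDomain, ByPassword map[string][]Record
}

// NewIndex files every record under each of its three values.
func NewIndex(records []Record) *Index {
	idx := &Index{
		ByUsername: map[string][]Record{},
		ByDomain:   map[string][]Record{},
		ByPassword: map[string][]Record{},
	}
	for _, r := range records {
		idx.ByUsername[r.Username] = append(idx.ByUsername[r.Username], r)
		idx.ByDomain[r.Domain] = append(idx.ByDomain[r.Domain], r)
		idx.ByPassword[r.Password] = append(idx.ByPassword[r.Password], r)
	}
	return idx
}

// distinct returns the sorted, de-duplicated values of one field.
func distinct(records []Record, field func(Record) string) []string {
	seen := map[string]bool{}
	var out []string
	for _, r := range records {
		if v := field(r); !seen[v] {
			seen[v] = true
			out = append(out, v)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	idx := NewIndex([]Record{
		{"eric1990", "yahoo.com", "P@ssw0rd!"},
		{"bob", "gmail.com", "P@ssw0rd!"},
		{"eric1990", "gmail.com", "hunter2"},
	})

	// Roughly what pass.usernames returns in the query interface.
	fmt.Println(distinct(idx.ByPassword["P@ssw0rd!"], func(r Record) string { return r.Username }))

	// Roughly what yahoo.passwords returns.
	fmt.Println(distinct(idx.ByDomain["yahoo.com"], func(r Record) string { return r.Password }))
}
```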

Stats

Run rake -T to see all tasks.

At the time of writing, you can pull table sizes and current connection pool utilization.

Stats below taken at 8 million entries:

Seeder benchmarks with bundle exec rake bench:insert
