twoscoops / two-scoops-of-django-1.11

The issue tracker, changelog, and code repository for Two Scoops of Django 1.11

Home Page: https://www.twoscoopspress.com/products/two-scoops-of-django-1-11

[enhancement] never display sequential primary keys: postgresql solution

netvigator opened this issue

Location within the Book: pages 365-366

  • Chapter or Appendix: Chapter 26
  • Section: 26.27
  • Subsection: to be implemented

You advise never displaying sequential primary keys. There is a solution that might be worth mentioning: in tables where Django, by default, displays the key in the URL used to access a record, use non-sequential integer primary keys.

Databases in general have an easier time with integer keys, so non-sequential integer keys might be a better option than slugs or UUIDs. Making the integer keys themselves non-sequential also avoids the overhead of the extra field and index that a separate slug or UUID column would need.
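To put rough numbers on the storage side of this argument, the sketch below compares raw key widths: PostgreSQL stores `integer` in 4 bytes, `bigint` in 8 and `uuid` in 16. It counts key bytes only and ignores tuple headers, index page overhead and alignment, so treat the figures as a lower bound; the 10-million-row table is a hypothetical size chosen for illustration.

```python
import uuid

# Raw on-disk widths of candidate PostgreSQL key types, in bytes.
KEY_BYTES = {
    "integer": 4,                     # int4, what serial uses
    "bigint": 8,                      # int8, what bigserial uses
    "uuid": len(uuid.uuid4().bytes),  # 16
}


def raw_key_megabytes(rows: int, key_type: str) -> float:
    """Key bytes alone for `rows` index entries, ignoring all per-row overhead."""
    return rows * KEY_BYTES[key_type] / 1_000_000


if __name__ == "__main__":
    rows = 10_000_000  # hypothetical table size
    for key_type in KEY_BYTES:
        print(f"{key_type:>8}: {raw_key_megabytes(rows, key_type):.0f} MB of raw keys")
```

If a slug or UUID column is added next to an integer PK rather than replacing it, its index comes on top of the integer one, which is the overhead the non-sequential integer approach avoids.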

The idea has come up before, and there are solutions out there. Here are the links I found:

Pseudo_encrypt
Pseudo_encrypt_constrained_to_an_arbitrary_range

Here is the code I used to make integer keys at least 7 digits in length:

CREATE OR REPLACE FUNCTION pseudo_encrypt(VALUE int) returns int AS $$
DECLARE
l1 int;
l2 int;
r1 int;
r2 int;
i int:=0;
BEGIN
 l1:= (VALUE >> 16) & 65535;
 r1:= VALUE & 65535;
 WHILE i < 3 LOOP
   l2 := r1;
   r2 := l1 # ((((1366 * r1 + 150889) % 714025) / 714025.0) * 32767)::int;
   l1 := l2;
   r1 := r2;
   i := i + 1;
 END LOOP;
 RETURN ((r1 << 16) + l1);
END;
$$ LANGUAGE plpgsql strict immutable;

CREATE OR REPLACE FUNCTION randomized(VALUE int) returns int AS $$
BEGIN
  LOOP
    VALUE := pseudo_encrypt(VALUE);
    EXIT WHEN VALUE >= 1000000;
  END LOOP;
  RETURN VALUE;
END
$$ LANGUAGE plpgsql strict immutable;
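Because these functions are deterministic, it can be handy to reproduce them outside the database, for example to check which id a given sequence value will become. Below is a hypothetical Python port of the two functions above; the names mirror the SQL, and the `floor` parameter is an illustrative addition (the SQL hardcodes 1000000). It emulates PostgreSQL's silent 32-bit wrap-around of `<<` and rounds to the nearest integer as the `::int` cast does. The body is a three-round Feistel network over the two 16-bit halves, which is why every 32-bit input maps to a distinct output.

```python
MASK16 = 0xFFFF


def _round_fn(r: int) -> int:
    # Same as ((((1366 * r + 150889) % 714025) / 714025.0) * 32767)::int,
    # done in exact integer arithmetic: floor(x * 32767 / 714025 + 0.5).
    x = (1366 * r + 150889) % 714025
    return (2 * x * 32767 + 714025) // (2 * 714025)


def _to_int32(u: int) -> int:
    # Reinterpret the low 32 bits as a signed PostgreSQL int.
    u &= 0xFFFFFFFF
    return u - (1 << 32) if u >= (1 << 31) else u


def pseudo_encrypt(value: int) -> int:
    l1 = (value >> 16) & MASK16  # high 16 bits
    r1 = value & MASK16          # low 16 bits
    for _ in range(3):           # three Feistel rounds
        l1, r1 = r1, l1 ^ _round_fn(r1)
    return _to_int32((r1 << 16) + l1)


def randomized(value: int, floor: int = 1_000_000) -> int:
    # Re-encrypt until the result is at least `floor` (7+ digits by default).
    while True:
        value = pseudo_encrypt(value)
        if value >= floor:
            return value


if __name__ == "__main__":
    for seq in (4, 5, 6):
        print(seq, randomized(seq))
```

For sequence values 4, 5 and 6 it should print the same ids as the psql session that follows: 483424269, 1905133426 and 971249312.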

Make your own "secret sauce"! Tweak the numbers:
(((1366 * r1 + 150889) % 714025) / 714025.0)
As explained here:
sql-keys-in-depth

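When tweaking the numbers, note that the last two should be the "same": the last one (714025.0) is the decimal version of the preceding integer (714025). Keeping them equal keeps the scaled value between 0 and 32767, so each half still fits in 16 bits.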
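Why can the constants be tweaked freely? In a Feistel network the round function never has to be invertible: each round can be undone by recomputing it, so any choice of multiplier, increment and modulus still yields a one-to-one mapping. The hypothetical sketch below turns the three constants into parameters and adds a matching `pseudo_decrypt`, which is not part of the original code, to demonstrate the round trip. The tweaked values are arbitrary examples, not recommendations.

```python
MASK16 = 0xFFFF


def make_round_fn(a: int, c: int, m: int):
    # f(r) = round(((a * r + c) % m) / m * 32767); m is the number that appears
    # twice in the SQL, once as the integer modulus and once as m.0.
    def f(r: int) -> int:
        x = (a * r + c) % m
        return (2 * x * 32767 + m) // (2 * m)
    return f


def _to_int32(u: int) -> int:
    u &= 0xFFFFFFFF
    return u - (1 << 32) if u >= (1 << 31) else u


def pseudo_encrypt(value: int, f) -> int:
    l1, r1 = (value >> 16) & MASK16, value & MASK16
    for _ in range(3):
        l1, r1 = r1, l1 ^ f(r1)
    return _to_int32((r1 << 16) + l1)


def pseudo_decrypt(value: int, f) -> int:
    # Run the rounds backwards: the previous r1 is the current l1, and the
    # previous l1 is recovered by XORing f(current l1) back out of r1.
    r1, l1 = (value >> 16) & MASK16, value & MASK16
    for _ in range(3):
        l1, r1 = r1 ^ f(l1), l1
    return _to_int32((l1 << 16) + r1)


if __name__ == "__main__":
    original = make_round_fn(1366, 150889, 714025)
    tweaked = make_round_fn(1291, 4621, 700001)  # arbitrary example values
    for f in (original, tweaked):
        for v in (1, 4, 1_000_000, -7):
            assert pseudo_decrypt(pseudo_encrypt(v, f), f) == v
    print("round trips OK")
```

A decrypt function like this is also a way to map a public id back to its sequence value, which is worth keeping in mind when deciding where the constants live.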

In my own implementation I tweaked the numbers, but the following example uses the original, untweaked values:

-- using secret sauce values from https://wiki.postgresql.org/wiki/Pseudo_encrypt
-- r2 := l1 # ((((1366 * r1 + 150889) % 714025) / 714025.0) * 32767)::int;
create table test ( id serial, text character ) ;
insert into test (text) values ('a');
insert into test (text) values ('b');
insert into test (text) values ('c');
ALTER TABLE test ALTER COLUMN id SET DEFAULT randomized(nextval('test_id_seq')::int);
insert into test (text) values ('d');
insert into test (text) values ('e');
insert into test (text) values ('f');
select * from test ;
     id     | text 
------------+------
          1 | a
          2 | b
          3 | c
  483424269 | d
 1905133426 | e
  971249312 | f

This solution can be retrofitted onto existing tables: just make the minimum non-sequential key bigger than the biggest sequential key already in the table by changing this line:

EXIT WHEN VALUE >= 1000000;
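Picking the new floor can be scripted. The helpers below are hypothetical (names are illustrative): one rounds the current maximum key up to the next power of ten, so every new key has more digits than any existing one; the other estimates how many times the loop calls `pseudo_encrypt` on average. The estimate assumes the outputs behave like uniformly random 32-bit ints, of which only the non-negative values at or above the floor are accepted, so it is a rough guide rather than a guarantee.

```python
INT4_MAX = 2**31 - 1


def floor_above(max_existing_id: int) -> int:
    # Smallest power of ten strictly greater than every existing key.
    return 10 ** len(str(max_existing_id))


def expected_iterations(floor: int) -> float:
    # Above INT4_MAX no int value qualifies and the loop would never exit.
    if not 0 <= floor <= INT4_MAX:
        raise ValueError("floor must be between 0 and 2**31 - 1")
    accepted = INT4_MAX - floor + 1  # int values satisfying VALUE >= floor
    return 2**32 / accepted          # about 1 / (accepted fraction) calls


if __name__ == "__main__":
    for max_id in (3, 48_213, 999_999):
        floor = floor_above(max_id)
        print(f"max id {max_id} -> EXIT WHEN VALUE >= {floor}; "
              f"~{expected_iterations(floor):.2f} iterations per id")
```

With the floor at 1000000 the loop averages about two calls, mostly because roughly half of all `pseudo_encrypt` outputs are negative; the closer the floor gets to 2**31 - 1, the more calls each insert needs.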

Also note that the secret sauce values are stored in the database, so keeping them out of version control is feasible.

I would not recommend this for the user table, as there is utility in having a recognizable, human-readable username. This option can be a good fit for any other table.

Real-world example: eBay item numbers

Part of the reason not to display sequential primary keys comes down to user experience: numeric identifiers convey no meaning to a user and are virtually impossible to remember as a reference point for finding something later. If you're going to display something in the URL of a public-facing page, you should really be using a slug.

The other half of not displaying sequential keys is defense in depth. Your users should not be able to just iterate through a range of numbers to figure out the shape of your dataset. Random number sets are a step in the right direction, but I don't feel they go far enough. UUIDv4 has 122 random bits, which makes any kind of experimentation to discover the shape of your dataset practically infeasible; i.e., the cost and time of attempting it aren't generally worth the payoff.
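The gap in enumeration cost can be put in rough numbers. Assuming an attacker guesses ids uniformly at random, the sketch below compares the expected guesses per hit for randomized int keys (at least 1000000, as in the function above) against UUIDv4's 122 random bits. The row count is a hypothetical figure, and the model ignores any structure an attacker could exploit, for instance if the secret constants leaked and the permutation could simply be recomputed.

```python
import uuid

INT4_MAX = 2**31 - 1
UUID4_RANDOM_BITS = 122  # 128 bits minus 4 version bits and 2 variant bits


def guesses_per_hit(rows: int, keyspace: int) -> float:
    # Expected uniform random guesses needed to hit any one of `rows` ids.
    return keyspace / rows


if __name__ == "__main__":
    rows = 1_000_000                      # hypothetical table size
    int_space = INT4_MAX - 1_000_000 + 1  # randomized() keys lie in [1000000, INT4_MAX]
    uuid_space = 2**UUID4_RANDOM_BITS
    print(f"randomized int: ~{guesses_per_hit(rows, int_space):,.0f} guesses per hit")
    print(f"UUIDv4:         ~{guesses_per_hit(rows, uuid_space):.1e} guesses per hit")
    sample = uuid.uuid4()
    print("sample UUIDv4:", sample, "version", sample.version)
```

With a million rows, roughly one guess in two thousand lands on a real randomized int key, while a UUIDv4 guess succeeds about once in 10**30 tries.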

For the above reasons, I'd strongly advise simply using UUIDv4 as the PK and providing a user-facing slug whenever possible. That being said, there are a couple of Django-specific corner cases where switching the PK to a value generated in Python isn't really feasible; e.g., code that checks whether an object has already been saved by testing whether its pk is set. A database-generated pk is only assigned once the model instance has been saved, whereas a Python-side default such as uuid.uuid4 is assigned as soon as the instance is created, so that check breaks. I personally feel this is a bit of an anti-pattern, but it certainly exists in the wild. If you're dealing with a situation like this that isn't particularly feasible to fix, randomizing the PK pool as above is probably a decent compromise, since the database still assigns the key on insert.