UK Postcode Data Structure

Overview

Type-safe representation: only strings that are valid postcodes can be represented
Stack-allocated 4-byte struct instead of a .NET string (a 26-30 byte heap allocation plus 4 bytes for the reference)
Fast postcode validation and parsing that ignores spaces and treats letters case-insensitively
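
To make the size claim concrete, here is a minimal sketch of a struct that wraps a single 32-bit integer. `PackedPostcode`, its field, and `SizeDemo` are hypothetical names chosen for illustration; the library's actual type may be laid out differently.

```csharp
using System.Runtime.InteropServices;

// Hypothetical stand-in for the library's type: a value type holding one 32-bit integer.
public readonly struct PackedPostcode
{
    private readonly uint _index; // assumed field: the packed postcode index

    public PackedPostcode(uint index) => _index = index;

    public uint Index => _index;
}

public static class SizeDemo
{
    // Marshal.SizeOf reports the marshalled size, which for a struct
    // containing a single uint is the 4 bytes of that field.
    public static int SizeInBytes() => Marshal.SizeOf<PackedPostcode>();
}
```

Because the struct is a value type, a local variable or array element holds the 4 bytes directly rather than a reference to a separate heap object.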

Background

Every UK address is associated with a postcode. This consists of between 5 and 7 letters and digits in one of these formats:

Area	District	Sector	Unit	Example
A	1	1	AA	M1 1AE
A	11	1	AA	B33 8TH
A	1A	1	AA	W1A 0AX
AA	1	1	AA	CR2 6XH
AA	11	1	AA	DN55 1PT
AA	1A	1	AA	EC1A 1BB

Here A represents an uppercase letter A-Z and 1 represents a digit 0-9. A single space is placed between the district and the sector, for a total of between 6 and 8 characters.
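
The six formats in the table collapse into one pattern: one or two letters, a digit, an optional letter or digit, a space, a digit, and two letters. The following is a minimal sketch using .NET's `System.Text.RegularExpressions`. `PostcodeFormat` and `MatchesCanonicalForm` are hypothetical names, and this is not the library's own validation: it only accepts the canonical upper-case, single-space form shown in the examples.

```csharp
using System.Text.RegularExpressions;

public static class PostcodeFormat
{
    // Area: 1-2 letters. District: a digit plus an optional letter or digit.
    // Then a single space. Sector: one digit. Unit: two letters.
    // \A and \z anchor the match to the whole string; [0-9] avoids non-ASCII digits.
    private static readonly Regex Pattern =
        new Regex(@"\A[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][A-Z]{2}\z", RegexOptions.CultureInvariant);

    public static bool MatchesCanonicalForm(string s) =>
        s != null && Pattern.IsMatch(s);
}
```

With this sketch, `MatchesCanonicalForm("EC1A 1BB")` is true, while `"ec1a 1bb"` and `"EC1A1BB"` are rejected because the pattern does no case folding or space handling.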

Description of Representation

1st character of the Area is a letter (26 possibilities)
2nd character of the Area is a letter or missing (26 + 1 = 27 possibilities)
1st character of the District is a digit (10 possibilities)
2nd character of the District is a letter, a digit or missing (26 + 10 + 1 = 37 possibilities)
1st character of the Sector is a digit (10 possibilities)
1st character of the Unit is a letter (26 possibilities)
2nd character of the Unit is a letter (26 possibilities)

The number of possible postcodes is 26 · 27 · 10 · 37 · 10 · 26 · 26 = 1,755,842,400, which is less than 2³² = 4,294,967,296. It is therefore possible to represent a postcode in a 4-byte (32-bit) word by numbering the symbols at each position using this scheme:

Value	0	1	2	3	…	9	10	11	12	13	…	25	26	…	36
Digit	0	1	2	3	…	9
Letter	A	B	C	D	…	J	K	L	M	N	…	Z
Letter or missing	(missing)	A	B	C	…	I	J	K	L	M	…	Y	Z
Letter, digit or missing	(missing)	0	1	2	…	8	9	A	B	C	…	O	P	…	Z
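
The count quoted above can be checked mechanically. This is a small sketch in which `PostcodeCapacity`, `Radices`, `Count` and `BitsNeeded` are hypothetical names introduced for illustration; it multiplies the number of possibilities at each position and works out how many bits the largest index needs.

```csharp
using System.Linq;

public static class PostcodeCapacity
{
    // Possibilities per position, in the order listed above:
    // area 1, area 2, district 1, district 2, sector, unit 1, unit 2.
    public static readonly int[] Radices = { 26, 27, 10, 37, 10, 26, 26 };

    // Product of all radices = number of distinct encodable postcodes.
    public static ulong Count() =>
        Radices.Aggregate(1UL, (product, radix) => product * (ulong)radix);

    // Smallest number of bits able to hold every index in [0, Count).
    public static int BitsNeeded()
    {
        ulong maxIndex = Count() - 1;
        int bits = 0;
        while (maxIndex > 0)
        {
            bits++;
            maxIndex >>= 1;
        }
        return bits;
    }
}
```

`Count()` returns 1,755,842,400 and `BitsNeeded()` returns 31, so every index fits in a 32-bit unsigned integer with one bit to spare.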

Parsing Algorithm

public static bool TryParse(string s, out Postcode postcode)

The algorithm can be understood as a special case of multi-dimensional array indexing:

Imagine each valid postcode placed in a 7-dimensional array, with the index along each dimension given by the scheme above
This array could be laid out in memory as a single-dimensional array
The index of a given postcode in this single-dimensional array is an integer that uniquely identifies that postcode
The hypothetical array never needs to be stored, since the postcode at a given index can be recovered by running the calculation in reverse
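
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the library's source: `PostcodeSketch` is a hypothetical type, the symbol values follow the scheme in the previous section with 0 meaning "missing", the dimensions are flattened in row-major order, and `Span<char>` with `stackalloc` assumes .NET Core 2.1 or later.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for the library's Postcode type; names and layout are assumptions.
public readonly struct PostcodeSketch
{
    private readonly uint _index; // position in the flattened 7-dimensional array

    private PostcodeSketch(uint index) => _index = index;

    public uint Index => _index;

    private static bool IsLetter(char c) => c >= 'A' && c <= 'Z';
    private static bool IsDigit(char c) => c >= '0' && c <= '9';

    public static bool TryParse(string s, out PostcodeSketch postcode)
    {
        postcode = default;
        if (s == null) return false;

        // Collect the non-space characters, folding ASCII lower case to upper case.
        Span<char> c = stackalloc char[7];
        int n = 0;
        foreach (char raw in s)
        {
            if (raw == ' ') continue;
            if (n == 7) return false; // more than 7 letters and digits
            c[n++] = raw >= 'a' && raw <= 'z' ? (char)(raw - 32) : raw;
        }
        if (n < 5) return false; // fewer than 5 letters and digits

        // The last three characters are sector + unit; the rest is area + district.
        int outward = n - 3;
        int i = 0;

        if (!IsLetter(c[i])) return false;
        uint area1 = (uint)(c[i++] - 'A');

        uint area2 = 0; // 0 means "missing"
        if (IsLetter(c[i])) area2 = (uint)(c[i++] - 'A' + 1);

        if (i >= outward || !IsDigit(c[i])) return false;
        uint district1 = (uint)(c[i++] - '0');

        uint district2 = 0; // 0 means "missing"
        if (i < outward)
        {
            if (IsDigit(c[i])) district2 = (uint)(c[i] - '0' + 1);
            else if (IsLetter(c[i])) district2 = (uint)(c[i] - 'A' + 11);
            else return false;
            i++;
        }
        if (i != outward) return false;

        if (!IsDigit(c[i]) || !IsLetter(c[i + 1]) || !IsLetter(c[i + 2])) return false;
        uint sector = (uint)(c[i] - '0');
        uint unit1 = (uint)(c[i + 1] - 'A');
        uint unit2 = (uint)(c[i + 2] - 'A');

        // Row-major index over dimensions of size 26, 27, 10, 37, 10, 26, 26.
        uint index = area1;
        index = index * 27 + area2;
        index = index * 10 + district1;
        index = index * 37 + district2;
        index = index * 10 + sector;
        index = index * 26 + unit1;
        index = index * 26 + unit2;

        postcode = new PostcodeSketch(index);
        return true;
    }

    // Reverse calculation: peel the dimensions off from last to first.
    public override string ToString()
    {
        uint v = _index;
        uint unit2 = v % 26; v /= 26;
        uint unit1 = v % 26; v /= 26;
        uint sector = v % 10; v /= 10;
        uint district2 = v % 37; v /= 37;
        uint district1 = v % 10; v /= 10;
        uint area2 = v % 27; v /= 27;
        uint area1 = v;

        var sb = new StringBuilder(8);
        sb.Append((char)('A' + area1));
        if (area2 != 0) sb.Append((char)('A' + area2 - 1));
        sb.Append((char)('0' + district1));
        if (district2 != 0)
            sb.Append(district2 <= 10 ? (char)('0' + district2 - 1) : (char)('A' + district2 - 11));
        sb.Append(' ');
        sb.Append((char)('0' + sector));
        sb.Append((char)('A' + unit1));
        sb.Append((char)('A' + unit2));
        return sb.ToString();
    }
}
```

For example, `PostcodeSketch.TryParse("ec1a1bb", out var p)` succeeds and `p.ToString()` gives `EC1A 1BB`. In this sketch `A0 0AA` maps to index 0 and `ZZ9Z 9ZZ` to the largest index, 1,755,842,399.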

Package available on NuGet

vurso / Postcodes
