[FEATURE] - Find personnummer & check vs skatteverkets list
peter-c-larsson opened this issue · comments
peter-c-larsson commented
Description
I had help with this script here. To scan our code base for personnummer, I used a regex that looks like this:
REG_EX="(18|19|20)?[0-9]{2}(0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])-?[0-9]{4}"
# -E enables grouping and alternation; \d is not supported there, hence [0-9].
grep -rE -e "$REG_EX" .
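To see what the pattern picks up, it can be run against a few sample strings. This is a small sketch, assuming GNU grep with `-E` and `-o`; the pattern is written with `[0-9]` because `-E` does not understand `\d`, and the sample lines and the helper name `find_candidates` are made up for illustration.

```shell
#!/bin/bash
# The pattern from above, written for grep -E (extended regex).
REG_EX='(18|19|20)?[0-9]{2}(0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])-?[0-9]{4}'

# Hypothetical helper: print only the matching parts of stdin, one per line.
find_candidates() {
  grep -oE -e "$REG_EX"
}

# Made-up sample input: two personnummer-shaped strings and one that is too short.
printf '%s\n' 'user_id = "811218-9876"' 'ts = 20231345' 'pnr: 198112189876' | find_candidates
# prints:
#   811218-9876
#   198112189876
```

The pattern only checks the shape (a plausible month and day followed by four digits), so long numeric IDs or timestamps can match as well. That is why a checksum step is needed afterwards.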
The code I used to download all of Skatteverket's test personnummer:
#!/bin/bash
# Based on the API here: https://skatteverket.entryscape.net/rowstore/dataset/b4de7df7-63c0-4e7e-bb59-1f156a591763/html
RESULTFILE=personnummer-skatteverket.txt
LIMIT=500
# Start from an empty result file.
: > "$RESULTFILE"
for ((i=0; ; i+=LIMIT)); do
  echo "Working with offset $i"
  # Fetch one page; stop on HTTP or network errors instead of looping forever.
  contents=$(curl -sf -H 'Content-Type: application/json' "https://skatteverket.entryscape.net/rowstore/dataset/b4de7df7-63c0-4e7e-bb59-1f156a591763?_offset=$i&_limit=$LIMIT") || { echo "Request failed at offset $i" >&2; exit 1; }
  # An empty (or unparsable) result set means we are past the last page.
  if ! jq -e '.results | length > 0' <<< "$contents" >/dev/null; then
    break
  fi
  jq --raw-output '.results[] | .testpersonnummer' <<< "$contents" >> "$RESULTFILE"
done
echo "Entries collected: $(wc -l < "$RESULTFILE")"
With these two components and the valid script here, I could easily check all our code for personnummer that are valid and see whether they match Skatteverket's list.
My suggestion is to add these features to this repository. I can create a PR if there is interest.
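To make the idea concrete, here is a rough sketch of how the pieces could fit together: each regex hit is normalized, checked for validity, and then looked up in the downloaded list. All function names below are hypothetical. `luhn_valid` is only a stand-in for this repository's own validation and checks the Luhn checksum alone, not the date. It is also an assumption that entries in the list may carry a century prefix, so the comparison is done on the 10-digit form.

```shell
#!/bin/bash
# Reduce a match to its 10-digit form: drop the hyphen and any century prefix.
normalize_pnr() {
  local digits=${1//[^0-9]/}
  printf '%s\n' "${digits: -10}"
}

# Luhn checksum over the 10 digits (weights 2,1,2,1,...); returns 0 if valid.
luhn_valid() {
  local digits=$1 sum=0 i d
  [[ $digits =~ ^[0-9]{10}$ ]] || return 1
  for ((i = 0; i < 10; i++)); do
    d=${digits:i:1}
    if ((i % 2 == 0)); then
      d=$((d * 2))
      if ((d > 9)); then d=$((d - 9)); fi
    fi
    sum=$((sum + d))
  done
  ((sum % 10 == 0))
}

# Assumption: list entries may carry a century prefix, so compare 10-digit forms.
in_list() {
  local pnr=$1 listfile=$2 entry
  while IFS= read -r entry; do
    if [[ $(normalize_pnr "$entry") == "$pnr" ]]; then return 0; fi
  done < "$listfile"
  return 1
}

# Classify one candidate as "invalid", "test" (on the list) or "unlisted".
classify_pnr() {
  local pnr
  pnr=$(normalize_pnr "$1")
  if ! luhn_valid "$pnr"; then echo invalid
  elif in_list "$pnr" "$2"; then echo test
  else echo unlisted
  fi
}
```

In a scan, the hits from `grep -rhoE` could be piped through `sort -u` and each one passed to `classify_pnr "$match" personnummer-skatteverket.txt`. Anything reported as `unlisted` is a valid-looking personnummer that is not on Skatteverket's list and deserves a closer look.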
Breaking changes
Nope