data61 / clkhash

CLK hash: hash PII for entity matching



Support whitespace in column identifiers

hardbyte opened this issue

Oh. I presume this is because the header cells don’t match the identifiers in the schema.

This is a tough one because I don’t think it’s a good idea to strip the whitespace from all CSV cells (what if the whitespace was supposed to be there?). It might be easier to make sure that our example conforms to the schema as it is (without whitespace stripping).
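One way to follow this approach is to compare header cells with the schema identifiers exactly, and point out cells that would only match after stripping, instead of fixing them silently. The sketch below is illustrative only: `check_header` is a hypothetical helper (not clkhash's actual validation code), and the column identifiers are made up.

```python
import csv
import io


def check_header(csv_text: str, expected: list[str]) -> list[str]:
    """Compare a CSV header row with schema identifiers exactly (no stripping).

    Returns a list of human-readable problems; an empty list means the header
    matches. Cells that differ only by surrounding whitespace get a hint.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    problems = []
    if len(header) != len(expected):
        problems.append(f"expected {len(expected)} columns, found {len(header)}")
    for position, (cell, ident) in enumerate(zip(header, expected)):
        if cell == ident:
            continue
        if cell.strip() == ident:
            # Whitespace is preserved, so report it rather than strip it.
            problems.append(
                f"column {position}: {cell!r} matches {ident!r} "
                "only after stripping whitespace"
            )
        else:
            problems.append(f"column {position}: found {cell!r}, expected {ident!r}")
    return problems


# Hypothetical example: a header written with spaces after the commas.
for problem in check_header("id, name, dob\n1,Alice,1990-01-01\n", ["id", "name", "dob"]):
    print(problem)
```

Here columns 1 and 2 are reported, because the default reader keeps the leading spaces in `' name'` and `' dob'`.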

From RFC 4180, section 2, rule 4:

Spaces are considered part of a field and should not be ignored.
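For comparison, Python's standard `csv` module follows this rule by default and only drops leading spaces when `skipinitialspace=True` is passed. The header line below is just an example:

```python
import csv

line = "id, name"

# Default dialect: the space after the comma stays part of the field.
strict = next(csv.reader([line]))
# skipinitialspace=True ignores spaces immediately following each delimiter.
lenient = next(csv.reader([line], skipinitialspace=True))

print(strict)   # ['id', ' name']
print(lenient)  # ['id', 'name']
```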
But it seems to be commonly accepted that the line

a, b, "c"

is equivalent to

"a"," b","c"

Note the spaces: there is one before b on both lines, as expected, but none before c in the second line, even though the first line has one before "c". Because c is delimited by quotes, the space before the opening quote can be stripped.
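The lenient reading described above can be sketched as a small splitter for a single record. `split_lenient` is a hypothetical helper, not part of clkhash or the standard library, and it assumes records contain no embedded newlines and that `"` is the quote character: spaces before an opening quote are dropped, while spaces inside unquoted fields are kept.

```python
def split_lenient(line: str) -> list[str]:
    """Split one CSV record, dropping spaces before an opening quote only."""
    fields = []
    i, n = 0, len(line)
    while True:
        # Look past leading spaces to see whether a quoted field follows.
        j = i
        while j < n and line[j] == " ":
            j += 1
        if j < n and line[j] == '"':
            # Quoted field: the leading spaces are discarded.
            j += 1
            buf = []
            while j < n:
                if line[j] == '"':
                    if j + 1 < n and line[j + 1] == '"':
                        buf.append('"')  # "" is an escaped quote
                        j += 2
                        continue
                    j += 1  # closing quote
                    break
                buf.append(line[j])
                j += 1
            fields.append("".join(buf))
            # Skip anything between the closing quote and the next comma.
            while j < n and line[j] != ",":
                j += 1
        else:
            # Unquoted field: keep everything up to the next comma, spaces included.
            k = line.find(",", i)
            if k == -1:
                k = n
            fields.append(line[i:k])
            j = k
        if j >= n:
            return fields
        i = j + 1  # move past the comma


print(split_lenient('a, b, "c"'))     # ['a', ' b', 'c']
print(split_lenient('"a"," b","c"'))  # ['a', ' b', 'c']
```

This differs from Python's `csv.reader(..., skipinitialspace=True)`, which strips the space before every field and so returns `'b'` rather than `' b'` for the first line.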

I agree. I'll make a quick edit to remove the extra spaces from the header line in our example.