agentm / project-m36

Project: M36 Relational Algebra Engine

parse error in importcsv

YuMingLiao opened this issue

test.csv

attr
(

It's fine.

test.csv

attr
)
TutorialD (master/main): n :: {attr Text}
TutorialD (master/main): :importcsv "test.csv" n
ERR: ParseError "AttributeMappingError (ParseError \"endOfInput\")"

The CSV parser in Project:M36 only parses quoted Text fields, so set your CSV exporter to quote all text fields. Project:M36 exports text fields with quotes unconditionally so that it can round-trip the data. Sorry if that's not clear.

CSV is not a technical standard, so without this requirement, certain strings become ambiguous:

  • test"test
  • “test (Unicode smart quote!)
  • "test
  • etc.

In addition, Project:M36 generates CSV files with Haskell ADTs like HairColor "Blond", so the text "Blond" is a TextAtom within an algebraic data type in the CSV file.

This could be improved, but no one solution would be able to parse all CSV files consistently, so I punted on it entirely and required TextAtoms to be quoted unconditionally to be unambiguous. I have added a note to the documentation to clarify this.

Actually, I can import a TextAtom without quotes.
The problem seems to be with the right paren only.

"city"
"("

This one is ok.

"city"
")"

This one is not.

TutorialD (master/main): x :: {city Text}
TutorialD (master/main): :importcsv "nutrition/one_column.csv" x
ERR: ParseError "AttributeMappingError (ParseError "endOfInput")"

parseAtom attrName aType textIn =
  case APT.parseOnly (parseCSVAtomP attrName tConsMap aType <* APT.endOfInput) textIn of
    Left err -> Left (ParseError (T.pack err))

My guess is that a right paren is treated as the end of input when parsing a TextAtom.

-- read data for Text.Read parser but be wary of end of interval blocks
takeToEndOfData :: APT.Parser T.Text
takeToEndOfData = APT.takeWhile (APT.notInClass ",)]")

I see. The right paren is treated as the end of an interval block, so I can't have a TextAtom containing a ) character.
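
For anyone following along, here is a minimal model of why ")" fails while "(" succeeds. It is only a sketch: it uses plain String functions from base rather than attoparsec, and the name parseUnquotedText is hypothetical. It also assumes the atom parser sees the bare field contents, which this thread does not confirm.

```haskell
-- Base-only model of the behaviour discussed above; this is NOT the
-- Project:M36 code and does not use attoparsec.

-- Mirrors APT.takeWhile (APT.notInClass ",)]"): consume characters up to,
-- but not including, the first ',', ')' or ']'.
-- Returns (consumed text, remaining input).
takeToEndOfData :: String -> (String, String)
takeToEndOfData = break (`elem` ",)]")

-- Mirrors `parser <* endOfInput` (hypothetical name): succeed only if the
-- whole input was consumed, otherwise report the failing endOfInput check.
parseUnquotedText :: String -> Either String String
parseUnquotedText input =
  case takeToEndOfData input of
    (consumed, "") -> Right consumed
    (_, _leftover) -> Left "endOfInput"
```

With this model, parseUnquotedText "(" returns Right "(" because "(" is not a stop character, while parseUnquotedText ")" stops before consuming anything and leaves ")" as unconsumed input, so the endOfInput check fails.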

Yea, the current behavior is arbitrary and unintentional. I'll fix the parser to error out on unquoted strings.
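
As a rough illustration of what "error out on unquoted strings" could look like, here is a hedged sketch in plain Haskell. It is not the actual fix: parseQuotedText is a hypothetical name, it works on the raw field text (assumed to still carry its quotes), and it deliberately ignores escape sequences inside the quotes.

```haskell
-- Hypothetical sketch, NOT the actual Project:M36 fix.
-- Accepts only text wrapped in double quotes and rejects everything else.
-- Escaped quotes inside the string are not handled here.
parseQuotedText :: String -> Either String String
parseQuotedText ('"' : rest) = go rest
  where
    -- Ran out of input before seeing the closing quote.
    go ""        = Left "unterminated quoted string"
    -- The closing quote is the last character: done.
    go "\""      = Right ""
    -- A quote followed by more input: trailing characters are an error.
    go ('"' : _) = Left "unexpected characters after closing quote"
    -- Any other character, including ')' or ',', belongs to the text.
    go (c : cs)  = (c :) <$> go cs
parseQuotedText _ = Left "expected a quoted string"
```

Under these assumptions, parseQuotedText "\")\"" yields Right ")", while an unquoted ")" is rejected with an explicit error instead of failing at endOfInput.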