railslove / cmxl

your friendly MT940 SWIFT file parser for bank statements

Home Page:http://railslove.com

processing problem

krzcho opened this issue · comments

I recently had problems with MT940, so I upgraded to 1.1.0, and since then I have been getting issues (but it may also be the bank changing the format).
MT940 piece:

:61:180627D79,NMSCXXXX3550//MA-20-00084395
28/06/1812:15 PIZZA HUT MA3550
:86:XXXX3550        /TYPE/631/PAYM CARTE

result:

#<Cmxl::Fields::Transaction:0x0000000007ad8c78 @tag="61", @modifier=nil, @source="180627D79,NMSCXXXX3550//MA-20-00084395\n28/06/1812:15 PIZZA HUT MA3550", @data={"date"=>"180627", "entry_date"=>nil, "storno_flag"=>"", "funds_code"=>"D", "currency_letter"=>nil, "amount"=>"79,", "swift_code"=>"NMSC", "reference"=>"XXXX3550//MA-20-", "bank_reference"=>nil, "supplementary"=>"00084395"}, @match=#<MatchData "180627D79,NMSCXXXX3550//MA-20-00084395" date:"180627" entry_date:nil storno_flag:"" funds_code:"D" currency_letter:nil amount:"79," swift_code:"NMSC" reference:"XXXX3550//MA-20-" bank_reference:nil supplementary:"00084395">, @details=#<Cmxl::Fields::StatementDetails:0x0000000007ad8570 @tag="86", @modifier=nil, @source="XXXX3550        /TYPE/631/PAYM CARTE", @data={"transaction_code"=>"XXX", "details"=>"X3550        /TYPE/631/PAYM CARTE", "seperator"=>"X"}, @match=#<MatchData "XXXX3550        /TYPE/631/PAYM CARTE" transaction_code:"XXX" details:"X3550        /TYPE/631/PAYM CARTE" seperator:"X">>>

Why do I get this reference and bank_reference? Is this the proper result of processing?
I would expect a reference of "XXXX3550", a bank_reference of "MA-20-00084395" and a supplementary of "28/06/1812:15 PIZZA HUT MA3550".

I use Cmxl.config[:statement_separator] = /\r?\n-\r?\n(?:[^:]*\r?\n)+/m

Hi, thanks for your report.
I only had a quick look, but it seems that the // separator for the bank reference is not properly recognized.

if you want to help debug this:
parsing is done by this regex: https://github.com/railslove/cmxl/blob/master/lib/cmxl/fields/transaction.rb#L5
and there was this change lately: dafcc43

Also @Uepsilon do you have an idea?

Hi, thanks for checking. What is given as input to this regexp? Is it the ":61:180627D79,NMSCXXXX3550//MA-20-00084395\n28/06/1812:15 PIZZA HUT MA3550" part?

Disregard that - the initial post actually shows the @source

I propose to change reference matching from
(?<reference>NONREF|.{0,16})
into
(?<reference>NONREF|[^\/]{0,16})
which fixes my case
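
A minimal sketch of why this change affects the reported line. The two patterns below are simplified, hypothetical fragments that cover only the reference and bank reference part; they are not the full regex used in the library, and the constant names are made up for illustration.

```ruby
# Simplified, hypothetical fragments: only the reference / bank reference
# part of a :61: line, not the full pattern used by the library.
GREEDY   = /\A(?<reference>NONREF|.{0,16})(?:\/\/(?<bank_reference>.*))?/
NO_SLASH = /\A(?<reference>NONREF|[^\/]{0,16})(?:\/\/(?<bank_reference>.*))?/

# The part of the reported line that follows the swift code "NMSC".
TAIL = "XXXX3550//MA-20-00084395"

[GREEDY, NO_SLASH].each do |pattern|
  match = pattern.match(TAIL)
  puts "reference=#{match[:reference].inspect} " \
       "bank_reference=#{match[:bank_reference].inspect}"
end
# reference="XXXX3550//MA-20-" bank_reference=nil
# reference="XXXX3550" bank_reference="MA-20-00084395"
```

With `.{0,16}` the reference greedily consumes 16 characters, including the `//`, so the optional bank reference group has nothing left to match. Excluding `/` stops the reference in front of the separator.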

actually I think only the new line in the :61: field is the problem.
maybe we should strip \n in the line. the parser then should actually work.

afaik it should not be a problem to strip all new lines in the fields, @Uepsilon ?

so it seems there are two solutions: strip newlines or exclude the slash from reference matching; both work fine
http://rubular.com/r/NZ7HnqWF7p
http://rubular.com/r/zRB5mTM7DO
restoring the regexps from those links fails :( I have reported it as an issue...

@bumi i actually just added that here: #19
newlines are allowed to attach supplementary information.
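
As a rough illustration of that point, the content of a :61: field can be separated at the first line break instead of having all newlines stripped. `split_supplementary` is a hypothetical helper written for this sketch, not something the gem provides.

```ruby
# Hypothetical helper, not part of cmxl: separate the first line of a :61:
# field from the optional supplementary line that follows it.
def split_supplementary(source)
  # Limit 2 keeps everything after the first line break in one piece.
  first_line, supplementary = source.split(/\r?\n/, 2)
  [first_line, supplementary]
end

SOURCE = "180627D79,NMSCXXXX3550//MA-20-00084395\n28/06/1812:15 PIZZA HUT MA3550"

first_line, supplementary = split_supplementary(SOURCE)
puts first_line     # 180627D79,NMSCXXXX3550//MA-20-00084395
puts supplementary  # 28/06/1812:15 PIZZA HUT MA3550
```

When the field has no second line, `supplementary` is simply `nil`.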

@krzcho also, excluding / does not sound fun, as they are among the allowed chars. we just have to focus on the double slash. that's fun with regex
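
One possible way to focus on the double slash only, shown as a sketch: a negative lookahead lets the reference contain single slashes but stops it in front of `//`. This is an assumption for illustration and not necessarily what the hotfix does; `DOUBLE_SLASH` and `split_reference` are hypothetical names, and the pattern covers only the reference part of the line.

```ruby
# Hypothetical fragment: the reference may contain "/" but never runs into "//".
DOUBLE_SLASH = /\A(?<reference>NONREF|(?:(?!\/\/).){0,16})(?:\/\/(?<bank_reference>.*))?/

# Returns [reference, bank_reference] for the text after the swift code.
def split_reference(tail)
  match = DOUBLE_SLASH.match(tail)
  [match[:reference], match[:bank_reference]]
end

p split_reference("XXXX3550//MA-20-00084395")  # ["XXXX3550", "MA-20-00084395"]
p split_reference("AB/CD123//BANKREF")         # ["AB/CD123", "BANKREF"]
p split_reference("NONREF")                    # ["NONREF", nil]
```

The second call is the case where `[^\/]{0,16}` would fall short: it would cut the reference at the first single slash.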

but i'm this close to kick the regex shit in the dirt (should have done that a while ago) and write a grammar. will do when i find the time.. meanwhile, take that dirty hotfix

@Uepsilon ahh ok. saw your PR - I think we should add an additional test case and then merge that one.

also when I started CMXL I tried to use some ragel and grammar stuff... but failed. it was too hard for me and the regex was more flexible and faster - so be aware :D