bank2ynab / bank2ynab

Easily convert and import your bank's statements into YNAB. This project consolidates other conversion efforts into one universal tool.

Output file not showing Swedish letters å, ä and ö

alanlahoni opened this issue · comments

Describe the bug
Although the script is opening the input file using the correct encoding utf-8, the output file does not include the Swedish letters å, ä and ö.

What did you EXPECT to happen?
The letters å, ä and ö are expected to be present in the output file.

What ACTUALLY happened?
The letters are not present in the output file. Either they are not included at all or replaced by a blank space.

What did you DO? (steps to reproduce)
Steps to reproduce the behavior:

  1. Put the input file, which includes the letters å, ä and ö, in my Downloads folder
  2. Open a terminal, navigate to the script directory, and run the command python3 ./bank2ynab
  3. Check the fixed_output file, which now does not include the letters å, ä or ö.

What's your software environment?

  • Script language: Python 3.11.1
  • Operating system: macOS
  • OS version: 10.15.7

Can you provide other helpful information?
Input file:
Privatkonto senaste transaktioner 2022-12-28.csv

Resulting output file:
fixed_Privatkonto senaste transaktioner 2022-12-28.csv

Note that I have changed the account number, text and amounts for privacy reasons.

Attachments
Screenshot of the terminal when the script has run that shows that the encoding is in fact correctly identified as utf-8

The output .csv results in us-ascii encoding although the source is utf-8. Any ideas?

If I add åäöÅÄÖ to the character class on line 429 of dataframe_handler.py, as shown below, the characters are at least no longer replaced with blanks, and the output file is also encoded as utf-8:

modified_string_series = modified_string_series.replace(
    "[^a-zA-Z0-9åäöÅÄÖ ]", " ", regex=True
)

But å, ä, ö, Å, Ä and Ö are instead shown as follows in the output file:

å is shown as √•
ä is shown as √§ or aÃà
ö is shown as √∂ or oÃà

Å is shown as √Ö
Ä is shown as √Ñ
Ö is shown as √ñ

@nocalla What do you think? Could the characters be getting encoded incorrectly at the point where they are written to the output file?

Okay, now I see. The characters not being shown properly was an Excel issue. If I open the output file in another text editor, it is all fine. But this still requires that the letters are not replaced by a blank in dataframe_handler.py, which means modifying the regular expression so that å, ä, ö, Å, Ä and Ö are excluded from the replacement.
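The garbled sequences listed above are consistent with UTF-8 bytes being read as Mac OS Roman. That would explain the Excel display, although the thread does not confirm which encoding Excel actually assumed. The sketch below reproduces the effect; the helper name as_mac_roman is hypothetical and not part of bank2ynab.

```python
def as_mac_roman(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as Mac OS Roman.

    Assumption: this mimics a viewer that opens a UTF-8 file as Mac OS Roman.
    """
    return text.encode("utf-8").decode("mac_roman")


# Precomposed letters (one code point each), as in the list above.
for letter in "åäöÅÄÖ":
    print(letter, "is shown as", as_mac_roman(letter))

# Decomposed form: plain "a" followed by a combining diaeresis (U+0308).
print("a\u0308", "is shown as", as_mac_roman("a\u0308"))
```

If this reading is right, the two variants reported for ä and ö correspond to the precomposed and decomposed Unicode forms of the same letter, which suggests the input file contains both.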

Thanks for working out the root cause. This should be a straightforward enough fix. Rather than specifically include the Swedish characters you want, I think it's better to include all accented characters. I believe there is a regex handler for this.

Although I must check YNAB's API docs to see if there are any accented characters that it won't accept. We implemented this string cleaning function in the first place to provide sanitised input for the API.
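As a rough illustration of the "include all accented characters" idea, here is a minimal sketch using Python's Unicode-aware \w class. It is not the project's actual fix. The function name clean_text is hypothetical, and whether YNAB's API accepts every character this keeps is unverified. The text is normalised to NFC first because \w does not match a bare combining mark, so a decomposed "a" plus U+0308 would otherwise lose its diaeresis.

```python
import re
import unicodedata


def clean_text(value: str) -> str:
    """Replace everything except letters, digits and spaces with a space.

    Hypothetical helper: in Python 3, \\w on str patterns matches Unicode
    letters and digits, so accented characters such as å, ä and ö survive.
    """
    # Compose "a" + combining diaeresis into a single "ä" code point.
    composed = unicodedata.normalize("NFC", value)
    # \w also matches "_", which the original pattern replaced, so it is
    # listed explicitly to keep that behaviour.
    return re.sub(r"[^\w ]|_", " ", composed)


print(clean_text("Räksmörgås & Öl!"))
```

The same pattern should also work with pandas' Series.replace(..., regex=True), since pandas hands the pattern to Python's re module, but that is untested here.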

Yes, this fixes the issue. Nicely solved! Thank you.

Great, thanks again for the detective work.