Output file not showing Swedish letters å, ä and ö

Question

alanlahoni opened this issue 2 years ago · comments

Describe the bug
Although the script opens the input file with the correct encoding (utf-8), the output file does not include the Swedish letters å, ä and ö.

What did you EXPECT to happen?
The letters å, ä and ö are expected to be present in the output file.

What ACTUALLY happened?
The letters are not present in the output file. They are either omitted entirely or replaced by a blank space.

What did you DO? (steps to reproduce)
Steps to reproduce the behavior:

Put the input file, which includes the letters å, ä and ö, in my Downloads folder
Open a terminal, navigate to the script directory, and run the command python3 ./bank2ynab
Check the fixed_output file, which now does not include the letters å, ä or ö.

What's your software environment?

Script language: Python 3.11.1
Operating system: macOS
OS version: 10.15.7

Can you provide other helpful information?
Input file:
Privatkonto senaste transaktioner 2022-12-28.csv

Resulting output file:
fixed_Privatkonto senaste transaktioner 2022-12-28.csv

Note that I have changed the account number, text and amounts for privacy reasons.

Attachments
Screenshot of the terminal after the script has run, showing that the encoding is in fact correctly identified as utf-8

alanlahoni · Answer 1 · Sat Feb 04 2023 00:44:07 GMT+0800 (China Standard Time)

The output .csv results in us-ascii encoding although the source is utf-8. Any ideas?

alanlahoni · Answer 2 · Sat Feb 04 2023 22:02:20 GMT+0800 (China Standard Time)

If I add åäöÅÄÖ to the regular expression on line 429 of dataframe_handler.py, as below, the characters are at least no longer replaced with blanks, and the output file is encoded as utf-8:

modified_string_series = modified_string_series.replace(
    "[^a-zA-Z0-9åäöÅÄÖ ]", " ", regex=True
)
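
For illustration, here is a minimal standard-library sketch of what the two character classes do to a sample string. It uses `re.sub` on a plain string rather than the pandas Series in dataframe_handler.py, and the names `ORIGINAL`, `WIDENED`, `clean` and the sample text are made up for this example. It also suggests why the output was reported as us-ascii: once every non-ASCII letter has been replaced by a space, the remaining text is pure ASCII.

```python
import re

# Character class as it appears to have been before the change (assumption
# based on the snippet above), and the widened version from this comment.
ORIGINAL = r"[^a-zA-Z0-9 ]"
WIDENED = r"[^a-zA-Z0-9åäöÅÄÖ ]"


def clean(text, pattern):
    """Replace every character matched by the pattern with a single space."""
    return re.sub(pattern, " ", text)


sample = "Överföring från Åsa"  # hypothetical payee text

stripped = clean(sample, ORIGINAL)
kept = clean(sample, WIDENED)

print(repr(stripped))       # Swedish letters replaced by spaces
print(stripped.isascii())   # nothing outside ASCII is left
print(repr(kept))           # Swedish letters survive
```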

However, å, ä, ö, Å, Ä and Ö now appear as follows in the output file:

å is shown as √•
ä is shown as √§ or aÃà
ö is shown as √∂ or oÃà

Å is shown as √Ö
Ä is shown as √Ñ
Ö is shown as √ñ

@nocalla What do you think? Could the characters be getting encoded incorrectly when they are written to the output file?

alanlahoni · Answer 3 · Sat Feb 04 2023 22:39:02 GMT+0800 (China Standard Time)

Okay, now I see. The characters not being shown properly was an Excel issue. If I open the output file in a text editor instead, it is all fine. But this still requires that the letters are not replaced by a blank in dataframe_handler.py, which means modifying the regular expression so that it no longer matches å, ä, ö, Å, Ä and Ö.
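
The garbled forms listed earlier are consistent with correct UTF-8 bytes being decoded as Mac OS Roman, which would explain why the file itself is fine. The following is a small sketch of that interpretation using only the standard library; that Excel falls back to Mac OS Roman here is an assumption, and `misread_as_mac_roman` is a name invented for this example.

```python
import unicodedata


def misread_as_mac_roman(text):
    """Encode text as UTF-8, then decode those bytes as Mac OS Roman."""
    return text.encode("utf-8").decode("mac_roman")


# Precomposed letters: each becomes two unrelated Mac OS Roman characters.
for ch in "åäöÅÄÖ":
    print(ch, "is shown as", misread_as_mac_roman(ch))

# Decomposed form (base letter + combining diaeresis), which would account
# for the second variant reported for ä and ö.
decomposed = unicodedata.normalize("NFD", "ä")
print("decomposed ä is shown as", misread_as_mac_roman(decomposed))
```

If the file has to open cleanly in Excel, writing it with a byte-order mark (Python's `utf-8-sig` codec) is a commonly used workaround; whether that is appropriate for bank2ynab's output is a separate question.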

Niall O'Callaghan · Answer 4 · Sat Feb 04 2023 23:15:12 GMT+0800 (China Standard Time)

Thanks for working out the root cause. This should be a straightforward enough fix. Rather than specifically include the Swedish characters you want, I think it's better to include all accented characters. I believe there is a regex handler for this.

I'll need to check YNAB's API docs first, though, to see whether there are any accented characters that it won't accept. We implemented this string cleaning function in the first place to provide sanitised input for the API.
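
One way to keep all accented letters without listing them individually, sketched with the standard library `re` module: in Python 3, `\w` in a `str` pattern matches Unicode letters and digits by default. This is only an illustration and not necessarily what the fix branch does; `NON_WORD` and `clean_keep_letters` are hypothetical names, and whether YNAB's API accepts everything this lets through is not checked here.

```python
import re
import unicodedata

# Match anything that is not a Unicode letter, digit or space. `\w` also
# matches the underscore, so it is listed separately to keep replacing it,
# as the ASCII-only class did.
NON_WORD = re.compile(r"[^\w ]|_")


def clean_keep_letters(text):
    """Replace punctuation and symbols with spaces, keeping accented letters."""
    # Compose "a" + combining diaeresis into a single "ä" first; a bare
    # combining mark is not matched by `\w` and would otherwise be replaced.
    composed = unicodedata.normalize("NFC", text)
    return NON_WORD.sub(" ", composed)


print(repr(clean_keep_letters("Överföring, från_Åsa!")))
```

Note that `\w` is broader than "accented Latin letters": it also keeps letters and digits from other scripts, so a narrower class may be preferable if the API turns out to be restrictive.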

Niall O'Callaghan · Answer 5 · Sun Feb 05 2023 05:12:47 GMT+0800 (China Standard Time)

Can you test the branch over at https://github.com/bank2ynab/bank2ynab/tree/431-output-file-not-showing-swedish-letters-%C3%A5-%C3%A4-and-%C3%B6 and see if this fixes your issue, @alanlahoni?

alanlahoni · Answer 6 · Sun Feb 05 2023 06:29:18 GMT+0800 (China Standard Time)

Yes, this fixes the issue. Nicely solved! Thank you.

Niall O'Callaghan · Answer 7 · Sun Feb 05 2023 06:29:55 GMT+0800 (China Standard Time)

Great, thanks again for the detective work.