tenox7 / ttyplot

a realtime plotting utility for terminal/console with data input from stdin

Multi-byte characters not supported?

MIvanchev opened this issue · comments

I'm using ttyplot and I LOVE it, but whenever I set -c to one of the characters from https://en.wikipedia.org/wiki/Block_Elements, the display becomes kaleidoscopic. So I suspect that multi-byte characters are not properly supported?

What OS and terminal is this on?

Hey, it's on Void Linux, Fish shell.

What terminal app, font and locale setting (LANG=) are you using?

thanks

Alacritty, LiterationMono Nerd Font Mono from https://www.nerdfonts.com/, en_US.UTF-8

Hi! I am experiencing the same issue here, with Ubuntu 22.04, gnome-terminal, LANG=en_US.UTF-8, and LC_CTYPE unset. The command:

$ { ./torture | head -75; sleep 1m; } | ./ttyplot -c '╎'

displays this:

screenshot

The displayed image is the same whatever non-ASCII character I pass to -c. I tried a bunch of them: ², Œ, à, ß, £, µ, ≤, ÷, →, ©, ™.

Looking at the source code, it is clear that non-ASCII characters cannot possibly work. The option -c is parsed like this:

    case 'c':
        plotchar=optarg[0];
        break;

Here, optarg is an array of char holding the UTF-8 code units of the provided character, and only the first one is stored in plotchar. In the example above, I used the character “╎” (U+254E, box drawings light double dash vertical), which UTF-8 encodes as the sequence {0xe2, 0x95, 0x8e}, so plotchar was initialized with 0xe2 (sign-extended to 32 bits).
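To make the truncation concrete, here is a minimal sketch of what `plotchar=optarg[0]` ends up storing. It assumes that plotchar is an int and that plain char is signed (as on x86 Linux). The helper names `first_byte_as_int` and `show_truncation` are made up for this illustration and do not exist in ttyplot; the cast to signed char only makes the signedness assumption explicit.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: mimics `plotchar = optarg[0]` on a platform where
 * plain char is signed. The explicit cast makes that assumption visible. */
int first_byte_as_int(const char *arg)
{
    assert(arg != NULL);
    return (int)(signed char)arg[0];
}

/* Prints the value that would be stored for a given -c argument. */
void show_truncation(const char *arg)
{
    int plotchar = first_byte_as_int(arg);
    printf("stored: %d (0x%X)\n", plotchar, (unsigned int)plotchar);
}
```

With a 32-bit int, `show_truncation("\xe2\x95\x8e")` reports -30, i.e. 0xFFFFFFE2: the two trailing bytes 0x95 and 0x8e are dropped, and what remains is not a valid character on its own.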

@edgar-bonet nice analysis! 👍

Yes, excellent find! Sadly, a 100% correct solution will require something like iconv and a library dependency. A solution for what is probably the most common case, UTF-8, would require calling nl_langinfo and a UTF-8 parser like this. If the locale is not UTF-8, ttyplot could print a warning and use the default char.
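A rough sketch of that UTF-8-only idea, for illustration: one helper checks the codeset name, another decodes the first UTF-8 character of the -c argument. Both function names (`is_utf8_codeset`, `utf8_decode_first`) are hypothetical, and the set of accepted codeset spellings is an assumption (glibc reports "UTF-8", other systems may spell it differently). This is not the parser that was linked above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the codeset name looks like UTF-8.
 * Assumption: only the spellings "UTF-8" and "utf8" are recognized. */
int is_utf8_codeset(const char *codeset)
{
    assert(codeset != NULL);
    return strcmp(codeset, "UTF-8") == 0 || strcmp(codeset, "utf8") == 0;
}

/* Decodes the first UTF-8 character of s into *cp.
 * Returns its length in bytes (1 to 4), or -1 if s is empty or invalid. */
int utf8_decode_first(const char *s, unsigned long *cp)
{
    const unsigned char *p = (const unsigned char *)s;
    unsigned long c, min;
    int len, i;

    assert(s != NULL && cp != NULL);
    if (p[0] == 0)
        return -1;                      /* empty string */
    if (p[0] < 0x80) {
        *cp = p[0];                     /* plain ASCII */
        return 1;
    } else if ((p[0] & 0xE0) == 0xC0) {
        len = 2; c = p[0] & 0x1F; min = 0x80;
    } else if ((p[0] & 0xF0) == 0xE0) {
        len = 3; c = p[0] & 0x0F; min = 0x800;
    } else if ((p[0] & 0xF8) == 0xF0) {
        len = 4; c = p[0] & 0x07; min = 0x10000;
    } else {
        return -1;                      /* stray continuation or bad lead byte */
    }

    for (i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return -1;                  /* also stops at an early '\0' */
        c = (c << 6) | (p[i] & 0x3F);
    }
    /* Reject overlong forms, UTF-16 surrogates and out-of-range values. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return -1;
    *cp = c;
    return len;
}
```

In a real program the codeset name would come from `nl_langinfo(CODESET)` (declared in `<langinfo.h>`) after a call to `setlocale(LC_ALL, "")`. If either check fails, the caller can print a warning and keep the default plot character.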

@MIvanchev: I think we could use mbtowc() to convert the character to a wchar_t. This function does not add any dependency: it has been part of the standard C library since C89. Then use mvvline_set() instead of mvvline(). The difference between these two functions is that the former takes the character as a cchar_t (instead of a chtype), which holds wchar_t characters together with attributes.

Note that mbtowc() only works after calling setlocale(). I wrote this small code for testing mbtowc():

#include <stdio.h>
#include <stdlib.h>
#include <locale.h>

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "Usage: %s character\n", argv[0]);
        return EXIT_FAILURE;
    }
    setlocale(LC_ALL, "");
    wchar_t c;
    if (mbtowc(&c, argv[1], MB_CUR_MAX) < 1) {
        fprintf(stderr, "Could not convert %s\n", argv[1]);
        return EXIT_FAILURE;
    }
    printf("U+%04X\n", (unsigned int)c);  /* %X expects an unsigned int */
    return EXIT_SUCCESS;
}

It successfully converts non-ASCII characters:

$ ./read-wchar ┆
U+2506
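Building on that test program, the -c option handling could fall back to a default character when the conversion fails. The following is only a sketch under assumptions: `parse_plotchar` is a hypothetical helper name, not a function from ttyplot, and the caller is expected to have called `setlocale(LC_ALL, "")` beforehand so that multi-byte input is understood.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: converts the first multibyte character of arg to a
 * wchar_t, or returns fallback if arg is empty or not valid in the current
 * locale. Call setlocale(LC_ALL, "") first for non-ASCII input to work. */
wchar_t parse_plotchar(const char *arg, wchar_t fallback)
{
    wchar_t wc;
    int len;

    assert(arg != NULL);
    mbtowc(NULL, NULL, 0);               /* reset any shift state */
    len = mbtowc(&wc, arg, MB_CUR_MAX);  /* -1: invalid, 0: empty string */
    return (len < 1) ? fallback : wc;
}
```

For example, `parse_plotchar(optarg, L'#')` would yield the requested character when it is valid in the current locale, and `L'#'` otherwise.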

Yeah @edgar-bonet that sounds like the way to go! Let's draw straws to see who has to write the patch. I'll go first.

This is my straw: ~=====~

@MIvanchev: Here is mine ====. OK, I'll write the patch...

You're too fast! I just had another idea that doesn't use wide strings and thus needs no change in dependencies. We could use mblen() to find out how many bytes the first character takes, copy them to a new buffer plus \0, and pass this to ncurses instead. So in essence:

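/* Needs <assert.h>, <stdlib.h> and <string.h>. */
int char_len = mblen(argv[1], strlen(argv[1]));
assert(char_len > 0);  /* -1: invalid sequence, 0: empty string */
char *buf = malloc(char_len + 1);
assert(buf != NULL);
memcpy(buf, argv[1], char_len);  /* copy only the first character */
buf[char_len] = '\0';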

/* Use buf with ncurses. */

How would you draw vertical lines with this char * then?

ttyplot@master uses mvvline() for this. All the functions from the vline() family take either a chtype (an integer holding an ASCII character, with attributes in the upper bits) or a const cchar_t * (which is based on wchar_t). None of them takes a multibyte sequence as a char *.

Maybe replacing mvvline() with a loop of mvaddstr()? It may be worth a try. If this can work with a non-wchar version of ncurses, this could be a win! Would you write that patch?

Let me see if I remember how programming works...

Scrap my idea. I tested extensively over the weekend and it doesn't really work: ncurses doesn't seem to support multi-byte chars through char.

@MIvanchev: In the meantime, multi-byte characters work fine on the development branch.

Yeah, I know, they have worked for a long time thanks to your effort :D I was just curious whether ncursesw is really necessary, but it seems it really is ¯\_(ツ)_/¯

Closing as fixed by #99