Feature Request: sixel graphics support

Question

Feature Request: sixel graphics support

migueldeicaza opened this issue 5 years ago · comments

Would like to see Sixel support in the Terminal, this is the standard used to show graphics in the console.

Sixel is part of the original DEC specification for doing graphics in terminals and has been re-popularized in recent years for doing graphics on the command line, in particular by Pythonistas doing data science.

The libsixel library provides an encoder but is also a great introduction to the subject (better than the Wikipedia page):

https://github.com/saitoha/libsixel

Chad Brockman commented 5 years ago

Need.

Hao commented 5 years ago

OMG

yetanotherpillow commented 4 years ago

Please.

Deleted user commented 3 years ago

bump

Philippe Majerus · Answer 1 · Tue May 07 2019 11:49:29 GMT+0800 (China Standard Time)

While implementing Sixel, it is important to test with images that contain transparency.
Transparency can be achieved by drawing pixels of different colors but not drawing some pixels in any of the Sixel colors, leaving the background color as it.
I believe this is the only way to properly draw non-rectangular Sixels, and would be especially nice with the background acrylic transparency in the new Windows Terminal.

Testing using WSL with Ubuntu for example, in mlterm such images are properly rendered as having a transparency mask and the background color is kept, while in xterm -ti vt340, untouched pixels are drawn black, even though the background is white, which seems to imply they render sixels on a memory bitmap initialized as black without transparency mask or alpha before blitting them into the terminal window.

Garrett Serack · Answer 2 · Wed May 08 2019 01:08:06 GMT+0800 (China Standard Time)

OOh. Sixel is very cool stuff.

I've decided that I need that. NEED.

Mike Griese · Answer 3 · Wed May 08 2019 01:29:32 GMT+0800 (China Standard Time)

I'll happily review a PR :)

therealkenc · Answer 4 · Thu May 09 2019 07:31:13 GMT+0800 (China Standard Time)

Caught the Build 2019 interview today that mentioned this request. I still maintain that Xorg on sixel is just wrong. So very very wrong.

The ffmpeg-sixel "Steve Ballmer Sells CS50" demo never gets tired tho. Gotta say, it is a little disappointing the video lacks sound (sound really makes the video). Consoles already have sound, naturally. They totally beep. Precedent set. What we really need is a new CSI sequence for the opus clips interleaved with the frames, amirite?

Dustin L. Howett (MSFT) · Answer 5 · Thu May 09 2019 08:27:53 GMT+0800 (China Standard Time)

Ken, I truly deserve this for mentioning Sixels ;)

…

Dustin L. Howett (MSFT) · Answer 6 · Thu May 09 2019 12:33:43 GMT+0800 (China Standard Time)

Related: #120

Mike Griese · Answer 7 · Thu May 09 2019 21:14:38 GMT+0800 (China Standard Time)

LOL I was watching the stream and I just thought to myself "here's my boss assigning me work live in front of a studio audience".

WSLUser · Answer 8 · Fri May 10 2019 13:05:56 GMT+0800 (China Standard Time)

Please make this a priority for v1.0!

WSLUser · Answer 9 · Fri May 10 2019 13:06:50 GMT+0800 (China Standard Time)

3d animations can be v1.5 😛

Matěj Štágl · Answer 10 · Fri Jun 07 2019 22:18:41 GMT+0800 (China Standard Time)

Upvoting this request, Sixels would be such an amazing thing to have in the Terminal.

Deleted user · Answer 11 · Tue Aug 06 2019 03:47:12 GMT+0800 (China Standard Time)

This weekend I finished implementing sixel read support for my MIT-licensed Java-based TUI library, and it was surprisingly straightforward. The code to convert a string of sixel data to a bitmap image is here, and the client code for the Sixel class is here.

I have done very little for performance on the decoder. But when using the Swing backend, performance is still OK, as seen here. (The snake image looks bad only because byzanz used a poor palette creating the demo gif.) I was a bit taken aback how quickly it came together. It's very fair to say that the "decode sixel into bitmap" part is the easy bit, the hard bit is the "stick image data into a text cell, and when that is present blit the image to screen rather than the character".

Just want to mention it to other folks interested in terminal support for sixel, and hoping it could help you out.

Keith Farmer (MSFT) · Answer 12 · Tue Aug 06 2019 13:15:40 GMT+0800 (China Standard Time)

I'll upvote if someone else writes a Jupyter notebook client ;)

WSLUser · Answer 13 · Tue Aug 06 2019 21:30:18 GMT+0800 (China Standard Time)

We already have an example of Sixel support in mintty which is written in C (vice java). Only thing needed is a refactor to C++ (at least for initial support). Still always good to see how it's been implemented in other projects.

Deleted user · Answer 14 · Tue Aug 06 2019 23:42:09 GMT+0800 (China Standard Time)

We already have an example of Sixel support in mintty which is written in C (vice java). Only thing needed is a refactor to C++ (at least for initial support). Still always good to see how it's been implemented in other projects.

Any issues with mintty's license (GPLv3 or later)?

WSLUser · Answer 15 · Wed Aug 07 2019 00:42:53 GMT+0800 (China Standard Time)

https://github.com/mintty/mintty/blob/master/LICENSE

Deleted user · Answer 16 · Wed Aug 07 2019 01:53:06 GMT+0800 (China Standard Time)

https://github.com/mintty/mintty/blob/master/LICENSE

From that link:

Sixel code (sixel.c) is relicensed under GPL like mintty with the
permission of its author (kmiya@culti)

If you transliterate that exact code to C++, the derivative work would need to be licensed GPLv3 or later, as per its terms, or not distributed at all. (One could also ask kmiya@culti if they are willing to offer sixel.c under a different license, or if it was once available under something else find a copy from that source.)

I don't know what is acceptable or not for inclusion in Windows Terminal -- my quick glance at Windows Terminal says it is MIT licensed, so depending on how it is linked/loaded using a direct descendant of mintty's GPLv3+ sixel.c could lead to a license issue.

Anyway, sorry to be bugging someone else's project here, heading back to the cave now...

İsmail Yılmaz · Answer 17 · Tue Aug 20 2019 08:16:40 GMT+0800 (China Standard Time)

There is a sixel capable, humble terminal emulator widget written in C/C++ for Windows/Linux, and it has a SixelRenderer class which you can use, (though it needs some optimization), and it has a BSD-3 license. Arguably its biggest downside is that it is written for a specific C++ framework. Still, IMO the SixelRenderer's code is translatable with little effort. (I know this because I am its author. :) )

https://github.com/ismail-yilmaz/upp-components/tree/master/CtrlLib/Terminal

Meadhbh Hamrick · Answer 18 · Sat Jan 18 2020 09:12:13 GMT+0800 (China Standard Time)

While implementing Sixel, it is important to test with images that contain transparency.
Transparency can be achieved by drawing pixels of different colors but not drawing some pixels in any of the Sixel colors, leaving the background color as it.
I believe this is the only way to properly draw non-rectangular Sixels, and would be especially nice with the background acrylic transparency in the new Windows Terminal.

Testing using WSL with Ubuntu for example, in mlterm such images are properly rendered as having a transparency mask and the background color is kept, while in xterm -ti vt340, untouched pixels are drawn black, even though the background is white, which seems to imply they render sixels on a memory bitmap initialized as black without transparency mask or alpha before blitting them into the terminal window.

hmm. the VT340 i'm in front of honors the P2 parameter in the DCS P1 ; P2 ; P3 ; q sequence that initiates the SIXEL sequence. Xterm, on the other hand, seems to ignore it. But if you use the raster attributes sequence ( " Pan ; Pad ; Ph ; Pv ) and give it a height and width, it will clear the background so you get a black pixel.

i was thinking about getting the free trial of the ttwin emulator and checking out how it's behviour differs from the VT340 and the Xterm acting as a VT340.

But... +1 on the idea of supporting SIXEL in general and +10 for the idea of coming up with compatibility tests.

Jesús Leganés-Combarro · Answer 19 · Sat Jan 18 2020 21:02:30 GMT+0800 (China Standard Time)

We could add support for iTerm2 Inline Images Protocol once we are there... At least it should be easier to implement, it just only need a path to the image and does everything on its own.

One doubt I have with both systems is, what happens with aligment? If images width or height are a multiple of chars width or height everything is ok, but if not, should a padding be added only in lower and right sides, or should image be centered adding padding to all sides?

Mike Griese · Answer 20 · Thu Jun 04 2020 21:45:51 GMT+0800 (China Standard Time)

Hey here are some relevant links for research:

"Basics for a Good Image Protocol" on terminal-wg (and a linked earlier discussion)
this massive thread about sixel support

Meadhbh Hamrick · Answer 21 · Fri Jun 05 2020 01:42:44 GMT+0800 (China Standard Time)

We could add support for iTerm2 Inline Images Protocol once we are there... At least it should be easier to implement, it just only need a path to the image and does everything on its own.

That probably should be a different task. Sixel and ReGIS are explicitly for in-band graphical or character data. I'm not saying it's a bad idea, I'm just saying it should be treated as a different feature.

One doubt I have with both systems is, what happens with aligment? If images width or height are a multiple of chars width or height everything is ok, but if not, should a padding be added only in lower and right sides, or should image be centered adding padding to all sides?

Alignment of Sixel and ReGIS graphical data is described (poorly) in various manuals. Sixel images are aligned on character cell boundaries. If you want a black border around an image, you have to add those black pixels yourself; there's no concept of anything like HTML's margin or padding. Each line of sixel data describes a stripe six pixels high. If you're trying to align sixel image data with text characters on a terminal emulator, this can be frustrating as the software generating the sixel data may not know how many pixels high each character glyph is. If you have an old-school xterm handy, you can see this by starting it up in vt340 mode, specifying different font sizes (to give you different character cell sizes) and then printing out some sixel data that tries to align image data with text data. (Here's a simple test file that looks correct when I tell the font server to use 96DPI and I specify a 15 point font. Modifying the font size causes images to increasingly come out of alignment with the text. https://gist.github.com/OhMeadhbh/3d63f8b8aa4080d4de40586ffff819de )

The original vt340s didn't have this problem because (of course) you didn't get to specify a font size when turning the terminal on.

The other thing you can see from that image, that isn't well described in the sixel documentation is that printing a line of sixel data establishes a "virtual left margin" for the image data. If you do the moral equivalent of a CR or CRLF using the '$' or '-' characters, the next line is printed relative to this virtual left margin, not the real left margin at the left side of the terminal.

Hope this helps.

Meadhbh Hamrick · Answer 22 · Fri Jun 05 2020 02:40:35 GMT+0800 (China Standard Time)

Finally scrolling back to read this. Sorry for the tardy reply.

Testing using WSL with Ubuntu for example, in mlterm such images are properly rendered as having a transparency mask and the background color is kept, while in xterm -ti vt340, untouched pixels are drawn black, even though the background is white, which seems to imply they render sixels on a memory bitmap initialized as black without transparency mask or alpha before blitting them into the terminal window.

It shouldn't be too hard to support transparency in xterm. I've been digging around in the code for other reasons. I fear that someone, somewhere is depending on this behaviour of Xterm so would recommend putting it behind a compatibility flag, which also should be straight-forward. But then there's the question of the default value. What should be the default? Black or transparent.

Do we know what the original VT240, 241, 330 and 340's did? Could I suggest trying to faithfully represent the experience of an actual VT as the default behaviour? You could test this by printing inverted space characters, then layering sixel graphics above them and seeing what color unspecified pixels render as.

I don't know that I care too much what the default is for the msft terminal as long as there's the capability of behaving like Xterm emulating a VT340. The code I've written to do loglines over ssh in the terminal sort of assumes the "unspecified pixels are black" behaviour described above. I'd have to rewrite that code if we make this change.

James Holderness · Answer 23 · Fri Jun 05 2020 03:27:37 GMT+0800 (China Standard Time)

If you're trying to align sixel image data with text characters on a terminal emulator, this can be frustrating as the software generating the sixel data may not know how many pixels high each character glyph is.

The original vt340s didn't have this problem because (of course) you didn't get to specify a font size when turning the terminal on.

Is there any reason why a terminal emulator couldn't just scale the image to exactly match the behaviour of the original DEC terminals? So if the line height on a VT340 was 20 pixels, then a image that is 200px in height should cover exactly 10 lines, regardless of the font size. That seems to me the only way you could remain reasonably compatible with legacy software, which is kind of the point of a terminal emulator.

I can understand wanting to extend that behaviour to render images at a higher resolution, but that should be an optional extension I think (or just use one of the existing proprietary formats). So ideally I'd like the default for Sixel to be as close as possible to what you would have gotten on an actual DEC terminal.

therealkenc · Answer 24 · Fri Jun 05 2020 16:15:18 GMT+0800 (China Standard Time)

Hey here are some relevant links for research:
"Basics for a Good Image Protocol" on terminal-wg

Sixel is broken because it cannot be supported by tmux with side-by-side panes.

therealkenc · Answer 25 · Fri Jun 05 2020 16:17:56 GMT+0800 (China Standard Time)

therealkenc commented 4 years ago

Deleted user · Answer 26 · Mon Jun 08 2020 06:55:39 GMT+0800 (China Standard Time)

It took some work (actually a lot of work), but with sixel one can perform nearly all of the "images in a terminal" tricks one can image:

Layered per-cell-masked images in a terminal: https://jexer.sourceforge.io/images/sixel_many_images.png
A floating (multiplexed) terminal window in a terminal that is using sixel for VT100-style double-width support: https://jexer.sourceforge.io/screenshots/jexer_sixel_in_sixel.png
"tmux-style" tiled terminals with images: https://gitlab.com/klamonte/jexer/-/wikis/uploads/7603381f82414ef9ae214bfcf759c064/example_tilingwm2_1.png
Multi-headed shared terminal session with differing text cell sizes showing the same plot: https://jexer.sourceforge.io/screenshots/multiscreen_2b.png
The use of sixel to render CJK and emoji that are not present in the main terminal's font: https://jexer.sourceforge.io/screenshots/xterm_sixel_cjk.png

I have included some other remarks at the referenced "good" protocol thread that might be of interest.

If nothing else, sixel is a good stepping stone to working out the terminal side infrastructure of mixed pictures-and-text. Speaking from direct experience, the terminal side (storing/displaying images) is about 1/4 as hard as the multiplexer/application side (tmux/mc et al).

Charlotte · Answer 27 · Fri Jun 26 2020 05:38:26 GMT+0800 (China Standard Time)

sixels are indeed the ideal solution for in-band graphics (for example over ssh): as they are supported by many existing tools, they are ready to use for practical purposes like plotting timestamp sync issues on the go.

As illustrated by therealkenc and further explained by klamonte in 640292222 everything can be handled with sixels, even side-by-side images, but it requires some work.

A while ago I was working with a few other people on a fallback mode for tmux, using advanced unicode graphics to represent sixel images in terminals that do not support sixel.

It is a bit like automated ANSII art, taking advantage of special block characters that are present in most fonts: this equivalent color unicode representation could be substituted for the sixels, then later overwritten by the actual sixel image (or not!). It would also solve the problem of keeping all the sixel pictures for scrolling back, by substituting them with low fidelity unicode placeholders (for ex to save memory), and having placeholders for sixel images when they can't be displayed for whatever reason.

The code was public domain. It could be usable immediately as a first step towards sixel support:

detect when sixels sequence are transmitted, then compute the unicode text replacement
diplay this unicode sequence, which is already supported by Windows Terminal
later, when sixels are implemented, render on top the sixel sequence.

Would you be interested?

BTW I recognize here my familiar gnuplot x^2 sin and 10 sin(x) plots I'm happy it provided some inspiration 😄

hackerb9 · Answer 28 · Fri Sep 11 2020 03:00:00 GMT+0800 (China Standard Time)

@DHowett Is acac350 a first step toward actually rendering sixel graphics? I'm getting requests for sixel support in Microsoft Terminal from folks using ssh and wanting to view directories of images using my lsix program.

Dustin L. Howett · Answer 29 · Fri Sep 11 2020 03:32:47 GMT+0800 (China Standard Time)

Sorta. We now have the ability to handle incoming DCS sequences. We haven't hooked up any handlers yet, but having the infrastructure to do so was pretty important. 😄

Chester Liu · Answer 30 · Fri Oct 09 2020 11:27:02 GMT+0800 (China Standard Time)

Here's some updates. I have a working branch here. An early screenshot looks like this:

Contrary to what I originally thought, the most difficult part of rendering sixel images is actually the conpty layer. Sixel images are supposed to be inline objects. The rendering of sixel images depends on the rendering size of a character. However due to the extra conpty layer we actually can not get the rendering size of a character when processing sixel sequences. This sounds very abstract and vague. Anyone who's interested in this can checkout my branch and see how it's done.

Overall, the conpty layer makes it very difficult to handle scrolling and resizing of sixel images. In my branch it works if you only need to display it. But both scrolling and resizing are completely broken.

WSLUser · Answer 31 · Fri Oct 09 2020 20:27:15 GMT+0800 (China Standard Time)

Didn't look yet but can you use pass-through mode to implement in Terminal itself? I would still add it in OpenConsole but sounds like sharing code isn't possible. Since Windows Terminal needs to be decoupled from OpenConsole at some point, you're best off simply duplicating the code for both. Also are you basing it on yours and j4james PRs for parameters? That would likely help as well.

Chester Liu · Answer 32 · Fri Oct 09 2020 22:10:58 GMT+0800 (China Standard Time)

@WSLUser Thanks for the attention. This screenshot is actually from about a month ago, when the fantastic parameters PR from j4james does not even exists. My work is entirely inside Windows Terminal, not conhost. I showed this PR to the Console team internally and made some progress since then. But I'm stuck because of the conpty problem.

WSLUser · Answer 33 · Fri Oct 09 2020 22:41:23 GMT+0800 (China Standard Time)

Yeah I'd rebase off of master and add #7578 and #7799. From there, maybe see what's missing in ConPTY for pass-through mode. I wonder Mintty is using pass-through for ConPTY mode.

Mike Griese · Answer 34 · Fri Oct 09 2020 22:50:05 GMT+0800 (China Standard Time)

I wonder Mintty is using pass-through for ConPTY mode.

Pretty sure mintty isn't using conpty at all 😜

The trick here with conpty is that the console (conpty) will need to know about the cells that are filled with sixel contents, as to not accidentally clear that content out from the connected Terminal. Maybe conpty could be enlightened to ignore painting cells with sizel graphics, and just assume that the connected Terminal will leave those cells alone.

That might mess up some of our optimizations (like we can't EraseLine rows that have sixel data), but it might be a good enough start

</showerthought>

James Holderness · Answer 35 · Sat Oct 10 2020 18:20:07 GMT+0800 (China Standard Time)

Maybe conpty could be enlightened to ignore painting cells with sizel graphics, and just assume that the connected Terminal will leave those cells alone.

This had been my original plan as well, and it may well be the best solution with the current conpty architecture, but there are a number of complications.

How would this interact with DCS streaming (which I don't think we've even got a solution for yet). I'm assuming we'd need some kind of split stream concept that passed the byte stream through to conpty at the same time as it's sent to the conhost buffer, but that seems like it would add a lot of unnecessary overhead to the process.
This would only work if you know the pixel cell size of the conpty terminal. I've mentioned before I think the best solution for Sixel is to match the cell size of the original VT terminals, and if we were doing that this wouldn't be an issue. However, as far as I'm aware, no other terminal emulators do that, so it wouldn't work with anyone else.

Chester Liu · Answer 36 · Sat Oct 10 2020 18:51:19 GMT+0800 (China Standard Time)

The second issue @j4james brought up becomes even more complicated with the consideration of different font, different font size and font resizing. So generally I think there's 3 aspects of the issue:

First conpty will need to know about the cells that are filled with sixel contents, Without this, the backing buffer in conpty and the drawing buffer in WT will be inevitably out of sync.
In order to do that, conpty will need to know pixel cell size in the drawing context, which is handled by the drawing layer in WT. There is a huge gap between conpty and the actual DXRenderer, which makes this a difficult task.
Besides, when the font or the font size changes, ideally the sixel image should change correspondingly.
And finally deal with other things like pane, alternative buffer, differential drawing, scrolling, etc.

James Holderness · Answer 37 · Sat Oct 10 2020 20:47:02 GMT+0800 (China Standard Time)

The second issue @j4james brought up becomes even more complicated with the consideration of different font, different font size and font resizing. So generally I think there's 3 aspects of the issue:

Just to be clear, my point was that none of that would be a problem if we exactly matched the behaviour of a VT340, so a 10x20 pixel image would occupy exactly one character cell, regardless of font size. It's only an issue if we want to match the behaviour of other terminal emulators, and that could always be an option that is left for later. There would still be complications with this approach, but I personally think they're less of a concern.

My bigger concern is that you seem to be ignoring the DCS streaming issue, which I expect could fundamentally change the architecture of the solution. The steps I would like to have seen are: 1. Resolve #7316; 2. Agree on a solution for cell pixel size; 3. Get something working in conhost; 4. Once all the complications are worked out in conhost, only then consider how we make it work over conpty.

Chester Liu · Answer 38 · Sat Oct 10 2020 20:53:18 GMT+0800 (China Standard Time)

Sorry for leaving the DCS streaming issue. In my current implementation I just store the entire string and pass it to the engine. This introduces performance issue when the sequence is larger. But at least it works. So my comments above are largely based on it. But you are right. The DCS streaming issue is actually the top priority if someone else want to get their hands dirty on this. 获取 Outlook for iOS<https://aka.ms/o0ukef>

Yatao Li · Answer 39 · Tue Oct 13 2020 17:23:50 GMT+0800 (China Standard Time)

Per discussion in #57, I thought conpty doesn't care about fonts at all?

wrt resizing I think the most natural way to do it is to "anchor down" the image into character cells once the image arrives, and re-calculate image size based on the anchor geometry. Anything else will cause inconsistency in image vs. character cells.

Chester Liu · Answer 40 · Tue Oct 13 2020 17:27:49 GMT+0800 (China Standard Time)

@yatli Yes. That's also what makes the issue tricky.

10x20 pixel image would occupy exactly one character cell

This is unfortunately wrong, at least for my current font setting.

Correct me if I'm wrong, but for pixel perfect image display, I think we do need to care about fonts.

Yatao Li · Answer 41 · Tue Oct 13 2020 17:28:39 GMT+0800 (China Standard Time)

@skyline75489 pls see my updated comment about the "anchor"

Yatao Li · Answer 42 · Tue Oct 13 2020 17:30:09 GMT+0800 (China Standard Time)

The cell data structure needs to be updated as char | sixel anchor

The sixel anchor should contain information about:

A pointer to the image object
The char cell region it occupies, in floating numbers (e.g. 5.2 lines x 7.8 cols)

Chester Liu · Answer 43 · Tue Oct 13 2020 17:37:12 GMT+0800 (China Standard Time)

It's a good idea but the implementation details were killing me, due to the extra translation in conpty layer. To avoid spamming people with email, feel free to reach me on Teams @yatli if you're interested.

James Holderness · Answer 44 · Tue Oct 13 2020 19:04:48 GMT+0800 (China Standard Time)

10x20 pixel image would occupy exactly one character cell

This is unfortunately wrong, at least for my current font setting.

What I'm suggesting is that you should make that the case. If you create a 10x20 pixel image and output it on a real DEC VT320 terminal, it's going to take exactly one character (at least in 80 column mode). So if we're trying to emulate that terminal, then we should be doing the same thing. If your current font happens to be 30x60, then you need to scale the image up. If your font is smaller, then you scale the image down.

This guarantees that you can output a Sixel image at any font size and always get the same layout. If you want it to cover a certain area of the screen, or you want to draw a border around it with text characters, you know exactly how much space the image will occupy.

Correct me if I'm wrong, but for pixel perfect image display, I think we do need to care about fonts.

It's true that you're not going to get "pixel perfect" images this way, but I don't think that should be the primary goal. Many modern computers have high dpi displays where it's routine for images to be scaled up, so it's not like this is a strange concept. And if we want to keep the layout consistent when the user changes their font size, we're going to have to scale the image at some point anyway, so you might as well do it from the start and get all the benefits of a predictable size.

And of course the other benefit of doing things this way is that it could feasibly be implemented over conpty. I don't see how you can make conpty work if the area occupied by the image is dependent on the font size, which you can't possibly know.

I'm not going to pretend this approach won't have any downsides, but I think the positives outweigh the negatives.

Petter Strandmark · Answer 45 · Tue Oct 13 2020 19:07:20 GMT+0800 (China Standard Time)

What if the font has a different aspect ratio than 10:20?

İsmail Yılmaz · Answer 46 · Tue Oct 13 2020 19:14:40 GMT+0800 (China Standard Time)

What if the font has a different aspect ratio than 10:20?

May I suggest reading this long - and somewhat "brutal"- discussion about the general problems regarding the inline images in terminal emulators.

It can give you the general idea.

Best regards

James Holderness · Answer 47 · Tue Oct 13 2020 22:10:10 GMT+0800 (China Standard Time)

What if the font has a different aspect ratio than 10:20?

The image may be a bit stretched or squished, but I don't think that's the end of the world.

Let me demonstrate with a real world example. Imagine I'm a Bond villain, and I've got an old security system using a VT340 as the frontend. Now because of the coronavirus, I'm in lockdown and working from home, so I'm logging into the system remotely with Windows Terminal. If we exactly match the VT340 this is no problem - the terminal looks like this:

But maybe I prefer fonts with a weird aspect ratio. So let's see what it would look like with Miriam Fixed, which is wider than most. The image of Bond now looks a bit squished, but he is still easily recognisable.

The alternative would be to go with a pixel perfect image (not currently feasible with conpty, but let's pretend for a second). Bond no longer looks squished, but now the image is only a fraction of the size it was expected to be. And the higher the resolution of your monitor, the worse this is going to look.

Maybe this is a matter of personal preference, but I know I'd definitely choose option 1 over option 2.

Also note that there is no reason we couldn't have options to tweak the exact behaviour when the font aspect ratio isn't 1:2. One option could be to center the image within the cells it was expected to occupy. Or we could expand the image so it covers the full area, but clip the edges that overflow the boundaries. Any of these choices would be better than an exact pixel rendering in my opinion.

Jesús Leganés-Combarro · Answer 48 · Tue Oct 13 2020 23:06:27 GMT+0800 (China Standard Time)

Maybe this is a matter of personal preference, but I know I'd definitely choose option 1 over option 2.

Me too, just only it would be better to know the font has a different aspect ratio, so image can adjust itself and keep the correct one.

One option could be to center the image within the cells it was expected to occupy. Or we could expand the image so it covers the full area, but clip the edges that overflow the boundaries

I think it's better to center them.

hackerb9 · Answer 49 · Wed Oct 14 2020 05:29:39 GMT+0800 (China Standard Time)

Maybe I'm misreading this thread. Are we actually talking about the terminal faking 10:20 characters for sixel image? I think that will cause many problems like the Bond distortion. Doing it the right way may be more difficult, but, in my humble opinion, a modern terminal should be font agnostic and leave it up to application programmers to deal with sixels and character cells.

Using escape sequences a user run program can determine the character cell size in pixels and decide how to intelligently deal with distortion for that application. The image viewing program I use works exactly like that. As I change font family or size, the displayed thumbnail updates to always be precisely five text lines high. The width is scaled proportionally for the image, unless it would be larger than a certain (in this case, rather large) maximum. By basing the image size on the character cell, it works automatically on high-DPI screens.

While the VT340 is a noble goal to emulate, fixing character cell resolution at 10:20 (and thus limiting resolution for the entire screen) is a mistake. The VT340 was only one of several sixel implementations, so its font size isn't necessarily more correct.

Forcing 10:20 will also lead to ugly kludges. (E.g., how to respond to a request for the size of the terminal window in pixels. Tell the truth, presuming they'll be positioning windows on the screen? Or, always return 800x480, presuming the user is scaling images for sixel output?)

James Holderness · Answer 50 · Wed Oct 14 2020 06:17:18 GMT+0800 (China Standard Time)

Are we actually talking about the terminal faking 10:20 characters for sixel image?

Yes.

a modern terminal should be font agnostic

This proposal is font agnostic. The application doesn't need to know anything about the font. That's the whole point.

Using escape sequences a user run program can determine the character cell size in pixels and decide how to intelligently deal with distortion for that application.

I'm not exactly sure what method you're using, but the way I've seen this done before is with a proprietary XTerm query to get the window pixel size, and another query to get the window cell size, and then using that data to calculate the actual cell pixel size. The downsides of such an approach are:

It's proprietary, so wouldn't work on a real terminal, or any terminal emulator that exactly matched a real terminal.
If the user changes their font size while your application is running, then your calculations will no longer be correct, and images will be rendered at the wrong size (unless you're continuously recalculating the font size which seems impractical).
If the user has a high resolution display, and/or large font size, you're forced to send through a massive image to try and match that resolution. Considering how inefficient Sixel is to start with, that can amount to a lot of bandwidth.

That said, I understand that this is a mode that some people may wish to use, and I think we should at least have an option to support it one day (for reasons discussed above, this just isn't possible at the moment). But in my opinion, this is not the best approach for Sixel.

Meadhbh Hamrick · Answer 51 · Wed Oct 14 2020 11:08:56 GMT+0800 (China Standard Time)

I have 300+ VT340's in nuclear power plants that I would like to eventually replace. There are commercial terminal emulation packages we could use, but I think all but one have been EoL'd. We have replaced some of them with Linux PCs running XTerm (or less frequently, Win10 + Hummingbird + WSL running XTerm), because it has a half-way decent open source sixel implementation and a sort of bad, but open sourced ReGIS implementation. The likelihood that we will be writing new software for the part of this system that generates the sixel octet stream is NIL. If your objective is to send graphics over an inline octet stream, there are other options. But if you want to support sixel graphics, you should support sixel graphics in a way that is halfway similar to previous implementations. This, unfortunately, means you should emulate the behaviour of exemplar systems (i.e. VT240, VT241, VT330 and VT340 terminals) even when it comes to integrating graphics with text. This is a mock-up of the kind of thing I'm talking about. It would be very nice if any new Sixel implementation maintains compatibility with existing implementations so images do not run off the edge of the screen or only fill half the screen. https://vimeo.com/user32814426/review/467991744/ac5892fa7e

hackerb9 · Answer 52 · Wed Oct 14 2020 14:19:01 GMT+0800 (China Standard Time)

a modern terminal should be font agnostic

This proposal is font agnostic. The application doesn't need to know anything about the font. That's the whole point.

I meant the terminal should be font agnostic instead of imposing 10:20 on every font. The application should be able to know the actual font size, if it wishes, since it's the application that knows the domain of what it is trying to show and can figure out the best way to present text and graphics together.

Using escape sequences a user run program can determine the character cell size in pixels and decide how to intelligently deal with distortion for that application.

I'm not exactly sure what method you're using, but the way I've seen this done before is with a proprietary XTerm query to get the window pixel size, and another query to get the window cell size, and then using that data to calculate the actual cell pixel size.

Yup, that's about right. There's also a query to directly get the character cell size, but I don't think that's as widely supported as just getting the screen size and dividing by ROWS and COLUMNS.

The downsides of such an approach are:

1. It's proprietary, so wouldn't work on a real terminal, or any terminal emulator that exactly matched a real terminal.

That's not a downside. It only means the program has to fall back on doing what it would have done anyway: presume $TERM=="VT340" means character cells are 10:20, "VT240" means 10:10, "mskermit" means 8:8, and so on.

Also, it's not an xterm proprietary sequence. Getting the screen size is called a "dtterm" escape sequence, but it was actually first implemented in SunView (SunOS, 1986). I believe it was later documented in the PHIGS Programming Manual (1992). Try sending "\e[14t" to a few terminal emulators and you'll see it is widely implemented.

2. If the user changes their font size while your application is running, then your calculations will no longer be correct, and images will be rendered at the wrong size (unless you're continuously recalculating the font size which seems impractical).

This is not a problem. The program simply traps SIGWINCH and only recalculates if the window has actually changed.

3. If the user has a high resolution display, and/or large font size, you're forced to send through a massive image to try and match that resolution. Considering how inefficient Sixel is to start with, that can amount to a lot of bandwidth.

Yes, sixel is extremely inefficient. But on modern computers, sending full screen images is quite usable, even over ssh. Does the Microsoft Terminal have some sort of baudrate limitation?

By the way, I believe sixel does have a "high DPI" mode where every dot is doubled in width and height. I've never used it and I don't think xterm even implements it, but perhaps that would alleviate concerns about bandwidth.

That said, I understand that this is a mode that some people may wish to use, and I think we should at least have an option to support it one day (for reasons discussed above, this just isn't possible at the moment).

This "mode" is simply having characters and graphics aligned just like the various historical sixel terminals did and current emulators do. I admit, I don't understand why it is not possible to do the same in Microsoft Terminal. If you say this 10:20 kludge is the best that can be done, I will trust that you are correct and thank you for doing it. A distorted picture is much better than nothing.

Jesús Leganés-Combarro · Answer 53 · Wed Oct 14 2020 14:47:51 GMT+0800 (China Standard Time)

Using escape sequences a user run program can determine the character cell size in pixels and decide how to intelligently deal with distortion for that application.

@hackerb9, what's the actual escape sequence to get the font dimensions?

Hans Petter Jansson · Answer 54 · Wed Oct 14 2020 22:17:01 GMT+0800 (China Standard Time)

The relevant XTerm sequences can be found here: https://invisible-island.net/xterm/ctlseqs/ctlseqs.html -- look for XTWINOPS.

Additionally, on Unix you can typically get the terminal's internal pixel size along with the cell size using the TIOCGWINSZ ioctl. With openssh this works remotely too.

Just as a data point, the sixel branch for libvte is taking the cell size-agnostic route @hackerb9 is talking about. It treats incoming sixel data as "pixel perfect" and rescales previously received images across zoom levels and font sizes to cover a consistent cell extent. When merged, this implementation will be available to a large share of Linux terminal emulators, including GNOME Terminal, the XFCE Terminal, Terminator, etc. Superficially this seems to be interoperable with at least XTerm and mlterm.

Since libvte records a per-image virtual cell size, it'd be trivial to make this work with a fixed virtual 10x20 cell size too for interoperation. However, we'd need a way for programs to communicate their expected pixel:cell ratios to the terminal (e.g. by extending the DCS parameters). That could be very useful in general, since it'd also provide a form of pixel density control in bandwidth-constrained environments, as you touched on above.

Jesús Leganés-Combarro · Answer 55 · Wed Oct 14 2020 22:29:14 GMT+0800 (China Standard Time)

Additionally, on Unix you can typically get the terminal's internal pixel size along with the cell size using the TIOCGWINSZ ioctl. With openssh this works remotely too.

Linux console returns always 0... they should fix that, though, but seems are not willing too :-/

Charlotte · Answer 56 · Mon Dec 14 2020 13:04:45 GMT+0800 (China Standard Time)

What if the font has a different aspect ratio than 10:20?

The image may be a bit stretched or squished,
(...)
The alternative would be to go with a pixel perfect image (not currently feasible with conpty, but let's pretend for a second). Bond no longer looks squished, but now the image is only a fraction of the size it was expected to be. And the higher the resolution of your monitor, the worse this is going to look.

Also note that there is no reason we couldn't have options to tweak the exact behaviour when the font aspect ratio isn't 1:2.

Actually, there are 2 reasons: Windows Terminal supports neither TIOCGWINSZ nor OSC 14

One option could be to center the image within the cells it was expected to occupy. Or we could expand the image so it covers the full area, but clip the edges that overflow the boundaries. Any of these choices would be better than an exact pixel rendering in my opinion.

This should not be left solely under the control of Windows Terminal: applications have ways to introspect the terminal properties, and adapt their behaviours. If they can't do that, the terminal implementation is broken.

Currently, any software outputting sixels can't introspect the size on Windows Terminal, so it can't adapt the size of the sixel images it sends to the font being used by the terminal. However, with the number of rows, columns, and the x,y size of the terminal in pixel, this is easy to do - and I would be surprised if the power plant monitoring software used as an example in #448 (comment) by @OhMeadhbh didn't already do that.

Problem is Windows Terminal doesn't return the correct values through TIOCGWINSZ, and doesn't support either the OSC14t query, so there's no way to make t work.

I opened a separate issue and referenced #448, as properly returning the windows size would help a lot.

As @hackerb9 pointed out in #448 (comment) :

Try sending "\e[14t" to a few terminal emulators and you'll see it is widely implemented.

Indeed, XTWINOPS is extremely basic and expected to be working.

As @hpjansson mentionned in #448 (comment)

Additionally, on Unix you can typically get the terminal's internal pixel size along with the cell size using the TIOCGWINSZ ioctl. With openssh this works remotely too.

I know, for I use that in production, and Windows Terminal not supporting either method breaks things.

Charlotte · Answer 57 · Mon Dec 14 2020 14:05:30 GMT+0800 (China Standard Time)

I have 300+ VT340's in nuclear power plants that I would like to eventually replace. There are commercial terminal emulation packages we could use, but I think all but one have been EoL'd.

I don't know if it's the right place to say that, but I offer these kind of services.

It would be very nice if any new Sixel implementation maintains compatibility with existing implementations so images do not run off the edge of the screen or only fill half the screen. https://vimeo.com/user32814426/review/467991744/ac5892fa7e

I have specifically written software that does just that: making sure whatever sixel sequence is sent will be displayed correctly, resizing it if needed to make the sixels fit properly.

If your budget is limited, I will soon release a new version of tmux-sixel that includes some of these features.

For example, it supports sixels in multiple panes (of course):

But it also supports scrolling, with textmode renditions to make it fast. Here, I displayed several images on the left pane, and I'm scrolling in the history (current position 10 of 129 lines)

These renditions also let tmux-sixel provide a fallback mode. It allows terminals that are not sixel aware to still "see" what the sixels represent, like current Windows Terminal and the various libvte terminals, instead of having some blank space:

As you may have noticed, I scrolled up a bit further to show you how everything that has been received is kept and displayable. Notice how the total is 128 lines now and how the math formula on the right handside is aligned differently: this is because of text reflow, to match the different terminals using different fonts and different resolutions.

If needed, the panes can be resized. In case you are not familiar with tmux, it is multi user capable: the users of the sixel-aware terminals and those of the "regular" terminals can all be attached to the same shared session, like a text-mode RDP / VNC remote session : they see the same thing, only in different quality depending on their terminal. And any one of them can type, so they can all work cooperatively.

Of course, text mode can't perfectly replace images, so it's not perfect, but still it can go quite far: this is what just 161 columns and 41 rows can give you - check the result of TIOCGWINSZ below

If you are uncertain it may fit your needs, and first want to test the performance on your images, that part is open source, so you can evaluate it using https://github.com/csdvrx/derasterize

Let me know if you would like to get in touch!

Christian Parpart · Answer 58 · Mon Dec 14 2020 16:49:44 GMT+0800 (China Standard Time)

@csdvrx oh dear, that looks awesome. You have done some really great work! I'd like to avoid some too early advertisement here, but how can I get in touch with you on very similar matters? You can email me christian@parpart.family with a very short "Hi" or so, so I can click on Reply. Sorry for the interference, but Github doesn't have private messaging yet (does it?) :-)

Jesús Leganés-Combarro · Answer 59 · Mon Dec 14 2020 17:19:22 GMT+0800 (China Standard Time)

I think It has, but if not, you can always look for the other user email on their profile page :-)

Yatao Li · Answer 60 · Mon Dec 14 2020 17:40:39 GMT+0800 (China Standard Time)

@csdvrx looks like you've made major breakthrough on this -- congratulations!
Haven't revisited my branch since 2017 but I think now is the time!

Charlotte · Answer 61 · Tue Dec 15 2020 02:10:22 GMT+0800 (China Standard Time)

@csdvrx looks like you've made major breakthrough on this -- congratulations!

Thanks! For the fallback mode, the next step is adding back fine details that have been stripped.

For example, you can infer the presence of the lemur hair under its chin, but you can't see them nicely. This is due to the loss of what are low frequencies in the spectral domain (FFT), as can be seen more easily if you run derasterize on https://github.com/csdvrx/derasterize/blob/master/samples/wave.png which I use for tests: it's simply a simulated collimator interference pattern, but it saves you the trouble of comparing spectrograms of inputs and outputs images.

Then it's obvious that the loss in more present at some frequency bands and following some vectors, and of course more extreme in lower resolution (Shannon-Nyquist, duh!)

My goal is to improve the result mostly in the diagonal and vertical, as the loss there is due to the set of characters chosen.

But you can't just try every Unicode glyph while looking for a better fit, as the problem is combinatorial explosion: if you want to keep some applications (ex: playing videos), you can't simply multiply the combining characters and the test character set, then test everything for every frame. Even with serious optimization, it's too slow.

The 2 best approaches seems to be 1) a band pass filter to concurrently select the character and the combining character 2) another pass sequentially, testing fitness improvement of just adding combining characters.

The difference in thinness between combining characters and regular characters make that perfect for a band pass approach, but it will require some calibration, so I'm partial to (2) as this would guarantee there's no regression: either the combining character improves the fitness score and it's added, or it doesn't and the current results remain.

Anyway, I will also release new versions of derasterize soon, with a new set of glyphs and a better optimized color picker. Ascii Art is becoming my new favorite Christmas activity :-)

Haven't revisited my branch since 2017 but I think now is the time!

Thanks a lot for your great work on tmux, which inspired me a lot!

But before revisiting your branch, could you please wait a bit to allow me to be done with the code review on mine?

This way, you can fork it!

Another thing someone may want to do is to release another https://github.com/saitoha/xserver-sixel : when sixel support comes to the Windows Terminal, it will be possible to run X inside, on vanilla Windows 10. This would be a major advancement for WSL.

Yatao Li · Answer 62 · Tue Dec 15 2020 03:38:12 GMT+0800 (China Standard Time)

But you can't just try every Unicode glyph while looking for a better fit, as the problem is combinatorial explosion: if you want to keep some applications (ex: playing videos), you can't simply multiply the combining characters and the test character set, then test everything for every frame. Even with serious optimization, it's too slow.

Sounds like a small scale convolution neural network would do...
Get some ground truth pics. Chop it into character-sized tiles. For each tile, search for the best representation (slow is ok for training). Throw a tile into the NN and force it to generate the good representation. Done (
(Just my random thoughts at 3AM)

But before revisiting your branch, could you please wait a bit to allow me to be done with the code review on mine?

Sure please go ahead!

Jesús Leganés-Combarro · Answer 63 · Tue Dec 15 2020 03:39:15 GMT+0800 (China Standard Time)

It's a bit crazy that you are trying to get high resolution images on the terminal using unicode characters, and in https://piranna.github.io/tty.css/ the most difficult thing I'm facing off is to get images to look low-res like they are shown in a terminal with block characters 😅

Charlotte · Answer 64 · Tue Dec 15 2020 06:11:12 GMT+0800 (China Standard Time)

Sounds like a small scale convolution neural network would do...
Get some ground truth pics. Chop it into character-sized tiles. For each tile, search for the best representation (slow is ok for training). Throw a tile into the NN and force it to generate the good representation. Done

It's am interesting idea: basically, selecting the most advantageous 'base blocks' using the equivalent of a LUT/rainbow table - except it will be embodied in code.

However, it would require a separate training step, which would have to be redone whenever new glyphs are added. Also, it may limit future evolutions by being a black box and not playing nicely with other refinements. More on that below.

(Just my random thoughts at 3AM)

These are good thoughts!

The core problem of encoding images is not new. What's new is the specific limitations that are imposed by going to a text format:

the set of glyphs is limited (depending on the terminal limits: unicode>ascii),
the resolution of each axis is asymmetric (as most characters are taller than wider, unless we go to CJK double width, but this piggybacks on the previous limitation)
each glyph can only take 2 colors, selected in a colorspace also dependent on the terminal limitations (24bpp > 256 color > 16 color > no color)

In a way, the problem reminds me of the history of TV image encoding: since some information is more important (luminance), it historically constrained the design: black and white TV was first, just like ASCII art was black and white first. In a way YPbPr is analogous to ANSI colors: you "spice" the most important signal with some extra eye candy, that can be safely ignored by limited decoders, so that at least you can get the basic signal, the most important part.

I've thought about different approaches, they all have their drawbacks:

black box approaches may limit future evolutions,
look up tables/rainbow dictionary are big,
some algorithm while nice on paper are not suited to the problem (more on that below)

On top of that, quality and speed optimize in opposite directions, that may limit some usecase (ex: playing a youtube video in unicode in real time). Simple solutions have their places, but they may divert us away from the bigger picture. The ideal approach should be versatile or at least flexible.

A memoization like approach is currently being introduced, but in a different way: the public version of derasterize uses a 4x8 block. Unicode characters were selected based on this limit. The version that will soon be released uses a 128 bit for the 8x16 block decomposition canvas, to introduce more glyphs.

Obviously, extending to 8x16 increases the runtime needlessly for glyphs that are equivalent in 4x8 and 8x16: a typical example is the half-blocks, like U+2584 lower half block and U+258C left half block.

The approach selected is not really a memoization: it works the other way around! The input image is downsampled to 2 different resolutions, and the algorithm uses both downsampled version for computing the fitness of the individual blocks as it iterates through the glyphs: when it hits those that are amenable to a simpler computation (like the halfblocks), it uses the downsampled version for computing the fitness score on 32 bit, as it will be faster than doing that on 128 bits!

In a way, it's like a binary decision diagram, except it's at the algorithm level.

I've been considering introducing other refinements. Fine lines using combining characters are not just for lemur hairs (!!), but for a big use case: graphs. This is because you don't want the thin lines to become blurry and be lost in the details of block decomposition.

But there are many other low hanging fruits!

Here's another simple example: circles may be worth preserving, because they are extremely important in human perception. And it can be fast too, if you use the right tools from historical computer vision: a few Hough transforms could quickly isolate the circles.

And I don't just want to only add the combining characters on the premade blocks in a second step (on top of my head U+0307, U+0308, U+030A, U+0323, U+0324, U+0325, U+0359, U+035A, U+0360 cf https://en.wikipedia.org/wiki/Combining_character) like for the fine details of the lemur hair. I mean introducing the set of round glyphs: ⴰ ⸰ ° ᴼ o ᴑ ○ O O ◯ ⚪

The NN approach could certainly dealt with that, except if circles fall on the border between blocks during the decomposition step... then, they would be ignored.

However, maybe it's worth tolerating a few pixels of difference in the circle center? And that's even if that would totally ruin the fitness score when it's computed at the pixel level. Maybe that's true even if the circle is put on the wrong 8x16 block.

It's not clear cut, so some advanced rule could be made to reserve that to the most important features (ex: preserve circles when they stand out in luminance) - and unfortunately all this would be lost in a NN approach.

If you find such problems interesting, would you be interesting in working on derasterize?

It's the testing ground for tmux-sixels, by working on standalone images first.

But before revisiting your branch, could you please wait a bit to allow me to be done with the code review on mine?

Sure please go ahead!

Thanks!

I will try to get the important feature upstreamed in tmux, but I fear it may be refused, as the author doesn't see the usecase for sixels and expressed his opposition in the past.

Yet sixels are so practical once you get a taste of them that I found it worth forking tmux. So did you, and a few others.

BTW I see you've cloned wemux. Apparently, you are inconvenienced by tmux limitations. If the upstreaming fails, what about joining our effort?

We could maintain our fork tmux-sixel, and add all that's missing in regular tmux!

Yatao Li · Answer 65 · Thu Dec 17 2020 01:17:09 GMT+0800 (China Standard Time)

@csdvrx good idea. I'm interested in co-op these topics. Let's move our discussions to tmux/derasterize issues and give room for windows terminal discussion here 😅😅

Dustin L. Howett · Answer 66 · Fri Apr 23 2021 02:14:50 GMT+0800 (China Standard Time)

@vulpinefoxxo Was it necessary for you to e-mail the 250 people subscribed to this issue so that you could request a status on a bug we've explicitly stated is not in scope for the near future?

You can register your agreement with the +1 button, and you can watch this thread with the subscribe button. Rest assured: any updates made to this thread will end up in your inbox. 😄

James Holderness · Answer 67 · Mon May 31 2021 21:21:05 GMT+0800 (China Standard Time)

Is there anyone here that has access to an actual VT340 terminal that is willing to run some compatibility tests? You'd just need to be able to output a text file to the device and photograph the result. I have about a dozen such tests I'd like to run, but depending on the results there may be some follow-ups. No need to commit to anything, though - any help would be appreciated.

And if nobody has a VT340, a VT330, or even a VT382 would be OK.

Dennis E. Hamilton · Answer 68 · Mon May 31 2021 23:48:17 GMT+0800 (China Standard Time)

And if nobody has a VT340, a VT330, or even a VT382 would be OK.

I have a question about ECMA-48 here. That specification (and its ISO counterpart) was last updated in 1991. Is there a provision in ECMA-48 that accomplishes this device-specific feature?

I'm referring to the Windows Console and Terminal Ecosystem Roadmap, which cites ECMA-48, and the Console Virtual Terminal Sequences defined so far.

Kalle Olavi Niemitalo · Answer 69 · Tue Jun 01 2021 13:53:45 GMT+0800 (China Standard Time)

@j4james, would a VT420 be suitable?

Jesús Leganés-Combarro · Answer 70 · Tue Jun 01 2021 14:20:20 GMT+0800 (China Standard Time)

@j4james, would a VT420 be suitable?

https://en.wikipedia.org/wiki/VT420

There were no color or graphics-capable 400 series terminals; the VT340 remained in production for those requiring ReGIS and Sixel graphics and color support.

So... no, seems a VT420 is not suitable, but thanks anyway :-)

Kalle Olavi Niemitalo · Answer 71 · Tue Jun 01 2021 15:03:07 GMT+0800 (China Standard Time)

@orcmid, the only ECMA-48 reference I can find in the console documentation is in Classic Console APIs versus Virtual Terminal Sequences:

These sequences are rooted in an ECMA Standard and series of extensions by many vendors tracing back to Digital Equipment Corporation and Tektronix terminals, through to more modern and common software terminals, like xterm.

AFAICT, ECMA-48 does not define sixel graphics or any similar feature. However, it defines the opening and terminating delimiters of the device control string that DEC uses for sixel graphics, so a terminal that does not support sixels can at least recognise the delimiters and avoid displaying the bytes between them as text. The data formats defined in ECMA-48 leave a lot of space for vendor-specific extensions.

Deleted user · Answer 72 · Tue Jun 01 2021 20:12:03 GMT+0800 (China Standard Time)

Is there anyone here that has access to an actual VT340 terminal that is willing to run some compatibility tests? You'd just need to be able to output a text file to the device and photograph the result. I have about a dozen such tests I'd like to run, but depending on the results there may be some follow-ups. No need to commit to anything, though - any help would be appreciated.

And if nobody has a VT340, a VT330, or even a VT382 would be OK.

Instructions for emulating VT-series terminals via MAME are here. .

A dump of some of the VT340 ROMs is here, along with a pointer to the discussion around getting those.

I have done the VT102 MAME, it worked alright. I haven't tried to get a VT340 going, and do not know if MAME got that capability or not. But perhaps there is enough out there to now to do it.

James Holderness · Answer 73 · Tue Jun 01 2021 20:53:19 GMT+0800 (China Standard Time)

@j4james, would a VT420 be suitable?

@KalleOlaviNiemitalo Unfortunately that won't do. But thanks for the offer. That would definitely be useful once I get around to working on some of the VT420 functionality.

I haven't tried to get a VT340 going, and do not know if MAME got that capability or not.

@klamonte Unfortunately the MAME VT3xx driver is just a skeleton - it's not functional. Their VT240 implementation is OK, and I've been testing with that as much as I can, but it doesn't have all the features of the VT330/340.

Dennis E. Hamilton · Answer 74 · Tue Jun 01 2021 22:16:27 GMT+0800 (China Standard Time)

. The data formats defined in ECMA-48 leave a lot of space for vendor-specific extensions.

It appears that DEC used the extension provisions as required, the usual agreement among implementations proviso entering into it.

It strikes me that Windows Terminal is not ncurses and I don't see any provision for device-specific selections that embrace extensions..

Another concern, however, is regarding accessibility. That has come up in issue #7766. I have no idea how there are requirements with respect to that for WT. I had not thought about that wrinkle and it impacts my ideas about demonstrating CUA display of the MS-DOS variety. I can see how extending outside of text, even raw Unicode text, might ECMA-48 already. Must look.

Deleted user · Answer 75 · Wed Jun 02 2021 05:44:54 GMT+0800 (China Standard Time)

@orcmid

I had not thought about that wrinkle and it impacts my ideas about demonstrating CUA display of the MS-DOS variety.

Do you mean what is commonly called TUI today, i.e. mouse-driven textmode windowing? If so, you may find this to be of interest. I do not have a Windows system with Terminal to test it on, but someone else told me that mouse was working as of version 1.4.3243.0 on Windows 10.0.19041.1 (using a Debian WSL instance).

WSLUser · Answer 76 · Thu Jun 03 2021 04:40:20 GMT+0800 (China Standard Time)

@j4james , do you have a build you can publish on your fork that provides Sixel? I can think of a few things to use for testing/debugging such as the sample projects using notcurses. My biggest hope for this feature in fact is to use notcurses applications in WSL2 (you will find the library is available in many distro repos).

James Holderness · Answer 77 · Thu Jun 03 2021 06:28:48 GMT+0800 (China Standard Time)

@WSLUser I'm afraid I don't have any plans to publish my build in the short term. It was really just an experimental framework for me to test different implementation strategies with, and to investigate whether it would be feasible for us to support both standard VT340 applications (which is my primary use case) as well as more modern sixel derivations (which require extended functionality).

Unfortunately what I've found so far is that modern apps often rely (usually unnecessarily) on broken behaviour in XTerm, and unless we replicate that behaviour (which then makes us unusable as a VT340 emulator) we won't be able to run those apps. So before releasing anything, I thought it might be best to try and get some of those apps fixed first, but I want to be absolutely certain of my facts when reporting bugs, which is one of the reasons I'm looking for a VT340 to test with.

I should also mention that I've been talking to some other terminal devs to see if they'd be willing to agree on a standard for apps to negotiate extended sixel functionality, again in the interests of supporting both VT340 and more modern apps. Unfortunately that discussion doesn't look like it's going anywhere, and I'm about ready to give up at this point.

Charlotte · Answer 78 · Mon Sep 27 2021 17:12:22 GMT+0800 (China Standard Time)

In case it might benefit anyone who has subscribed to this thread but doesn't want to wait for sixel support, I have decided to release the version of sixel-tmux that had been demonstrated here last year.

Compared to the previous version that only respected sixel sequences and therefore required a compatible terminal, this new version provides an immediate way to display sixel content inside most terminals (and therefore Windows Terminal) as it features an integrated derasterize.

This provides a way of converting sixels into something that can be displayed even by terminals that can't handle sixels natively, as long as they have enough colors for the content (fortunately, WT already support truecolor)

The source code is on https://github.com/csdvrx/sixel-tmux and the binary for msys2 on https://github.com/csdvrx/sixel-tmux/blob/main/tmux.exe

It works nicely in Windows Terminal to offer sixel-like features without being much slower than a standalone derasterize: see for example how it display the usual test images, after which derasterize is being used to show the lemur example:

Here's another one with Windows Terminal next to mintty, then being fed the content currently displayed in mintty when it connects to the shared session:

To launch it inside WT, please use script due to https://cygwin.com/pipermail/cygwin/2020-May/244878.html for example /usr/bin/script -c '/usr/bin/tmux c' /dev/null to create a new session or /usr/bin/script -c '/usr/bin/tmux a' /dev/null to attach to an existing session started in another terminal.

@yatli let me know if you find this sixel-tmux interesting, I would enjoy working with you on practical improvements as explained before

@hpjansson feel free to add other supports besides sixels, or to integrate a chafa backend if you can make a license exemption for tmux (as it's BSD)

Hans Petter Jansson · Answer 79 · Mon Sep 27 2021 22:39:37 GMT+0800 (China Standard Time)

@csdvrx Amazing work! Since you already integrated Derasterize, you probably don't need Chafa. But if you want a unified API to simultaneously support Kitty, iTerm, sixels and full-Unicode pseudographics at some point, you should be able to link with it from BSD source as the LGPL permits that. If there are license or other issues holding you back, I'm happy to help.

Maybe we should consider continuing the general sixel/terminal graphics talk in a discussion thread... Feel free to create one in the Chafa repo; as a project it touches on more or less every aspect of terminal graphics. E.g. we just started a thread about improving font support for pseudographics.

Charlotte · Answer 80 · Tue Sep 28 2021 10:49:33 GMT+0800 (China Standard Time)

@hpjansson thanks, I'm also a big fan of your work and I love how you implemented in chafa some of the unicode ideas I was pondering with :)

We can totally talk on another place if you prefer, I just wanted to make the first announcement here to reach out to people who may be waiting for a sixel support in Windows Terminal (maybe @migueldeicaza who started the request?)

Now, even if it's imperfect, they have a working stable solution: along with a few other persons, I have been using sixel-tmux for a year, it's stable, no weird issues except sometimes not being able to recognize and intercept very large sixels image (I suppose some tmux "optimization" is eating text again)

There's one issue with having terminal of different geometries synchronizing on one size for the derasterized output when in fallback mode, but there can't be a solution for that during shared session unless the sixel source is kept and separately derasterized on each client, which would be wasteful (but could be done when sixels sequences will be preserved)

The best I could come with is to let the geometry of whoever had the input dictate the size to the others, so that at least the person who requested the image got it of the right size. Others can tweak their font size to match the geometry (that's what the dots are for, giving feedback!)

As for formats, the idea is not to stop there, but to input "anything" and output "anything", so the sixels -> {sixel | derasterize } pipeline is just a beginning.

I don't have much experience with other formats like Kitty, but I see tmux as a simple place to put all this plumbing: this way, formats will cease to matter, and console users will be able to mix and match the graphical tools regardless of which precise format (say sixel or iTerm) their software require: as long as their terminal supports at least one known format, sixel-tmux could do the conversion.

A consequence of this idea is the desire to have everything under a BSD license ("universal donor") to facilitate code diffusion and adaptation - say in Terminal emulators.

Play a bit with it sixel-tmux if you can, see if you like the concept; a collaborative work would be in the best interest of the people who like graphics in terminals because there are not that many of us, yet much work to be done!

hackerb9 · Answer 81 · Tue Sep 28 2021 11:32:16 GMT+0800 (China Standard Time)

That looks fun! How does sixel-tmux do when tested with sixvid -b nyantocat.gif?

Hans Petter Jansson · Answer 82 · Wed Sep 29 2021 05:29:57 GMT+0800 (China Standard Time)

@csdvrx Regarding moving: I'm just concerned that we're hijacking an issue where the poor MS Terminal developers are trying to track their work :) But if they don't mind, it's obviously a non-issue.

Mike Griese · Answer 83 · Wed Sep 29 2021 05:35:38 GMT+0800 (China Standard Time)

At the moment, I don't care 😋 We know that sixel is something we need to work on, we've got some steps towards getting it to work done already. I'm fairly confident that @j4james is continuing to experiment with it. When we need to wrest control of this thread for our own feature tracking, I'll come back through and mark it all as off topic. Till then, go for it.

Jeremiah Johnson · Answer 84 · Wed Oct 06 2021 10:38:21 GMT+0800 (China Standard Time)

rasterize terminal text characters to image/texture.
overlay any graphics (sixel or otherwise) on image/texture created in 1.
blit image to screen (the compositor, really)

once 1 is working, which requires 3 to display, injecting 2 should be of relatively little concern. for game developers, I bet all three of these could be done in a week. they live and breathe rasterization and blitting.

keep your eyes open for a performance-delivering update to Terminal in the coming year. I bet you that sixel support will not be far behind. I have no inside information, and this is all educated guesswork.

my understanding is that 1 and 3 are being worked on, now, for performance reasons. number 2 won't be far behind.

James Holderness · Answer 85 · Thu Oct 07 2021 20:37:45 GMT+0800 (China Standard Time)

James Holderness commented 3 years ago

Mike Griese · Answer 86 · Thu Oct 07 2021 20:42:15 GMT+0800 (China Standard Time)

Mike Griese commented 3 years ago

Christian Parpart · Answer 87 · Fri Oct 08 2021 01:54:43 GMT+0800 (China Standard Time)

Haha. Github comment of the day! Source code or it didn't happen. :-P

James Holderness · Answer 88 · Fri Oct 08 2021 08:09:10 GMT+0800 (China Standard Time)

Source code or it didn't happen. :-P

Here you go:
https://gist.githubusercontent.com/j4james/9c2e67686306e2c37aa07e71fe1d2504/raw/7e0a1d7c6a0206e801241b4f23324c5a6a2d0997/owl.txt

But note that it requires a terminal that can emulate a 10x20 cell size for the owl to be positioned and sized correctly. I've also simplified the original code a little, and cut the image palette down to 15 colors, so it should theoretically work on a real VT340 now.

hackerb9 · Answer 89 · Fri Oct 08 2021 12:53:45 GMT+0800 (China Standard Time)

Success! Tested on a real VT340 and it worked perfectly.

For evidence, here is the output from the VT340 after I told it to send a MediaCopy to the host (essentially a screenshot) of the VT340 screen in sixel format:

https://gist.github.com/hackerb9/fb5eb56391e51de23af6dd5cedb12464/raw/ac9032df3afd62ea6e5f6f0b6a5621923c1a1630/vt340mediacopy.six

And here is that MediaCopy file converted to a PNG:

Note the occasional glitches where a byte has gotten its eighth bit set high is an artifact of my mediacopy.sh script. The owl looks perfect on the VT340's screen.

Deleted user · Answer 90 · Wed Jan 12 2022 02:53:36 GMT+0800 (China Standard Time)

@naikrovek cough cough Scroll all the way to the bottom for some notes on mixing images and text in a single cell.

Has anyone requested SGR-Pixels (mouse mode 1016) yet? I see it mentioned here as not supported. Decode sixel, add SGR-Mouse, and you've got a viable new gaming medium. Sure would be nice to have some roguelikes that could put real images in when they needed them.

Or if we need to be all business-y to justify it, how about a nice little MSPaint app that works over ssh?

@PhMajerus

While implementing Sixel, it is important to test with images that contain transparency.

Transparent sixels are available here from @hackerb9 and here from me, and can be generated by @hpjansson 's chafa and the git head version of this.

James Holderness · Answer 91 · Wed Jan 12 2022 03:05:30 GMT+0800 (China Standard Time)

I wasn't considering 1016, but I do have a POC of the DEC locator mode. There's no point in doing either of them until we have sixel, though, because the pixel coordinates will need to be tightly coupled to the sixel resolution.

Mike Griese · Answer 92 · Wed May 04 2022 03:05:46 GMT+0800 (China Standard Time)

Moving a discussion comment from #13024

In theory, we could work on DRCS and Sixel support without having to figure out how they transit ConPTY up front, which will let us parallelize that work

Could we now just flush the frame with TriggerFlush(false) to end any current buffered conpty content, and then pass that string through? I think we're getting pretty close here to getting the strings to the Terminal, albeit not rendered yet

James Holderness · Answer 93 · Wed May 04 2022 04:42:47 GMT+0800 (China Standard Time)

Could we now just flush the frame with TriggerFlush(false) to end any current buffered conpty content, and then pass that string through?

I don't think so, no. I suppose if all you want to do is "cat" an image from the command line with something like img2sixel, that might suffice, but anything complicated will probably break. We either need the full passthrough mode working, or for the conpty renderer to be capable of regenerating the sixel on the fly so it can repaint areas of the screen that have been invalidated. Of the two, passthrough mode seems more feasible.

Ofek Lev · Answer 94 · Sun May 14 2023 23:42:29 GMT+0800 (China Standard Time)

Has there been any update on this?

Mike Griese · Answer 95 · Mon May 15 2023 00:25:06 GMT+0800 (China Standard Time)

Nope. We'll make sure to update this thread when there is. In the meantime, might I recommend the Subscribe button?

That way you'll be notified of any updates to this thread, without needlessly pinging everyone on this thread ☺️

Murad "Gness Erquint" Beybalaev · Answer 96 · Mon Jan 15 2024 00:00:55 GMT+0800 (China Standard Time)

Happy New Year without Sixel, fellow dreamers!

Speaking of subscribing and sitting tight… GitHub seems to have basically hid this subscription from my account even despite re-subscribing. Having to find it through search to check back on the lack of progress. Guess 5 years is just too old to be tracked.