alacritty / alacritty

A cross-platform, OpenGL terminal emulator.

Home Page:https://alacritty.org

Add support for libsixel

HalosGhost opened this issue · comments

libsixel is an ANSI-compatible library for SIXEL/DEC graphics painting in a terminal. This allows for richer interface design and better integration between the text-based and graphical environments.

There is at least one person who has already started work on wrapping libsixel for use in Rust, though a different wrapper may be preferable.

xterm, whose terminfo alacritty defaults to, already supports sixel (through a compile-time option).

Sixel support would be really nice to have. By the way, it would not require any binding for libsixel; the format is pretty simple to implement directly.

sixel doesn't support 24-bit colors yet, but it is still good.

Sixel does support 24-bit, it is just palette based. It is just not very efficient if you have too many distinct colors.

Extra motivation: it seems lsix relies on sixel.

Hey. I would love sixel support in alacritty as well.
Have there been any attempts yet?
Sounds like something I could try to implement, but unfortunately I'm pretty busy.
Might get around to it in a few weeks?

Have there been any attempts yet?

There have not.

Sounds like something I could try to implement, but unfortunately I'm pretty busy.
Might get around to it in a few weeks?

If you want to give it a shot, feel free and let me know if you need any help.

I'd be interested in working on this.

@Aaron1011 If you have any specific questions, please let me know either here or on the Alacritty IRC on freenode.

If anything, implementing both is a good idea. Sixel has quite a few nice uses.
It is not just displaying images. Plots for example: Sixel backend for Python's Matplotlib.

Several scientific plotting packages support sixels (example), which is nice because you don't need a browser or additional windows to see the plots. However, very few terminals, and no modern terminals, support it. It'd be nice to have a modern, fast terminal that supports this.

@leotaku You're right that most of those programs generate a PNG and then translate it to sixels, and that the kitty protocol is likely to be a better, more modern solution. To me as a user, the important feature is to get images on the terminal, in a way that is as standard, well-documented and widely supported as possible.

Sixel is a standardized and accepted protocol. Kitty's own protocol is not.

You'd better create a new issue about the kitty protocol.

@leotaku, I line up with @chrisduerr. Sixel has been a standard for an incredibly long time; more than that, it is wholly backwards-compatible. Every other format I've seen for images-on-terminal (including kitty, iTerm2, terminology, and others) is proprietary, is not interoperable and/or is not backwards compatible.

Sixel is the only reasonable option. It also has the benefit of having several high-quality implementations out there already.

Is sixel going to support 24bit color space? The current color space is 8bit.

@crocket, I believe you were a part of this thread: saitoha/libsixel#44

tl;dr: most terminal implementations of true-color are actually also not backwards-compatible, and so, in principle, sixel cannot reasonably be compatible with them and maintain its backwards-compatibility. However, there is the --high-color option for more colors.

@HalosGhost I definitely see the advantage in using a well-established protocol. However, IMO this is somewhat of a special case:

  • Sixel in its "original form" was not at all intended for the applications it is used for today (source: libsixel README)
  • Sixel does not properly support truecolor
  • There is a limited number of (actively developed) terminals that support Sixel
  • There is a limited number of (actively developed) TUI applications that support Sixel (most of the projects on the libsixel README haven't been updated in years)
  • Some modern terminals have already chosen to not support Sixel and go with their own implementation, sometimes because of the other reasons listed here

Also, supposedly:

  • Implementing Sixel efficiently is hard (source: Kitty issue tracker)

The combination of these factors leads me to believe that other options should at least be explored.

Essentially I believe that we are currently in a state where Sixel is outdated and badly supported enough that it makes more sense to go with a new standard than to hack Sixel to support the features we want (while also losing the backwards compatibility we initially chose Sixel for).

However that's just my opinion and I might be mistaken.

It's true that Sixel is palette-based and that makes it not the best solution if you need truecolor. But is that something that should be the first priority? If it's not 100% required I don't really see why we should even try moving to another system.

Kitty, iTerm2 and others do their own thing completely and they could make breaking changes in the future. Should alacritty be tied to their changes? Or if they make changes, should we stick with whatever version of the protocol alacritty is on and not be able to use apps depending on their new protocol version?

Sixel is the only thing that can be considered a standard at the moment for graphics in the terminal. Sure, it can be improved on. But in my opinion it's better to stick to Sixel and if anyone wants to improve the Sixel protocol that's a different discussion but we would be in a better position to continue without breaking anything

@jmriego I think the ideal course of action to prevent the sort of thing you are talking about would be asking if any of the existing projects would be willing to treat their (currently proprietary) protocols as a standard and not make backwards-incompatible changes.

But is that something that should be the first priority? If it's not 100% required I don't really see why even try moving to another system.

Maybe, but I just don't see the appeal of choosing a solution that is suboptimal for the core thing it is intended to achieve...

Sure, it can be improved on.

Can it realistically be improved on in such a way that it is efficient and able to display images with high color depth?
If that's possible, sure I'd say go with Sixel, but from what I've seen that's not really possible.

From what I've seen the kitty authors want their protocol to be a standard and certainly document it as such:
https://sw.kovidgoyal.net/kitty/graphics-protocol.html

Is there anyone willing to implement any terminal graphics protocol in alacritty?
I'd welcome any protocol at this point.

Looks like the new version of iTerm2 includes sixel support.

I just finished implementing sixel read support for my TUI library, and it was surprisingly straightforward. The code to convert a string of sixel data to a bitmap image is here, and the client code for the Sixel class is here.

I have done very little for performance on the decoder. But when using the Swing backend, performance is still OK, as seen here. The snake image looks bad only because byzanz used a poor palette creating the demo gif.

I was a bit taken aback how quickly it came together. It's very fair to say that the "decode sixel into bitmap" part is the easy bit, the hard bit is the "stick image data into a text cell, and when that is present blit the image to screen rather than the character".

Lacking support for graphics is the reason I switched over to kitty... I think ultimately the question of which API to support isn't really all that important. At least in the sense that there's technically no reason to only support a single one, since they're only trivially different for the most part. The way I see it there are 3 kinds of APIs/formats/protocols:

  • SIXEL: The tried and true standard, supported by multiple terminals and a "wide" range of software. It's basically its own image format with in-band transmission to the terminal. A bit slow, and inefficient, but with transparent network support.
  • urxvt/iterm/etc: Basically just an escape code with a path to an image file and some metadata like dimensions and where to draw to. Some variations with in-band base64 encoded transmission or some such exist, but for the most part they're similar enough that they're almost interchangeable by a string substitution. Limited network support, depending on the exact implementation.
  • kitty: This one is the most complex with multiple ways to transmit images, both in and out-of-band, among them a very fast shared memory path, which allows for things like really nice and fast video playback with very little overhead. Supports both (A)RGB raw image and png, via path, in-band transmission and shm, with optional zlib compression.

Ultimately just adding SIXEL support requires decoding it and uploading it as a texture to draw it, but at that point decoding png/jpg/whatever and hooking up another escape code isn't really going to add much more work on top of that. The only thing that really changes is the front-end. So adding urxvt/iterm/etc. support would be almost free, with the advantage of being faster/more efficient and possibly higher quality, since it eliminates the redundant encoding/decoding step in between. The hard part is adding any kind of graphics support at all in the first place, I think.

kitty's protocol is the most complex with multiple ways to transmit image data, redrawing images multiple times at different positions without retransmission, drawing over and under text, with alpha blending, scaling and so on, but it's also the most complete and efficient way to draw graphics on a terminal. It's a bit overkill IMO, but not bad per se. This is the only one requiring substantially more work to add.

Sooo... Since this has been open for quite a while and no one has dropped any code yet, I've been thinking about looking into it myself during the upcoming spring holiday season, starting with SIXEL or whatever is easiest to get up and running at all and working my way up from there. No promises though ;)

The notion that once one protocol is implemented we could just tack on the other formats in a few minutes of work is entirely false. The different formats might share some similarities, since after all they all have the same purpose, however supporting multiple of them would lead to significant additional code that would likely go unused for many, which would however still have to be maintained.

Most of these protocols also have some serious flaws, making it pointless to just rush into it and try to implement some crappy protocol that isn't going to get any actual use out of it. Which is why most of the existing protocols are used very, very rarely, with people even preferring the hacky barely working w3m solution over them.

A few minutes might be a stretch, but given that images work at all, adding another protocol within a few hours doesn't seem implausible. I don't know alacritty's codebase well enough to say for sure, but I'd imagine adding any kind of image protocol to take much longer, relatively speaking. Not saying that supporting too many is a good thing necessarily...

But as far as the amount of code goes there's nothing in urxvt's protocol you wouldn't also need for kitty's protocol, for example. Same for iterm2. Everything it does would also be required for what kitty's protocol does, except for the literal escape code (and the download stuff...). Not sure what else there is, vector graphics is a thing, too, I guess. I was just saying that because you seemed fine with adding kitty's protocol which is kind of a superset of the others as far as I can tell.

If you only want a single protocol in alacritty that's understandable, too. I might also be overlooking details that invalidate some of my assumptions...

Getting the image buffer to display once you have it shouldn't be that much work. The biggest trouble is taking the escape sequence and somehow getting an RGB(A) buffer out of it.

I guess that nobody is working on this since there has been no activity in the last month. I would like to start with it.

I wrote some notes about how to implement it. Any feedback will be very helpful.

References

The source code of Xterm contains some links for reference:

The domain ftp.cs.utk.edu does not exist anymore, but I found a few mirrors:

For sixel_graphics_news.txt
For all_about_sixels.txt

all_about_sixels.txt, written in 1990, is the best explanation about sixel that I have found.

Implementations

To test some sixel implementations I used this software:

  • Xterm 344.

    In X11 resources:

    XTerm.*.decTerminalID: vt340
    XTerm.*.numColorRegisters: 256
    
  • ImageMagick 6.

    Example:

    $ convert \
        -size 400x400 xc:white         \
        -tile plasma:                  \
        -draw 'circle 200,200 200,400' \
        -colors 16                     \
        sixel:-
  • Gnuplot 5.2, installed with the gnuplot-nox package in Debian Buster.

    Example:

    $ gnuplot -e "set term sixel; plot sin(x)"
  • The img2sixel command from the libsixel 1.8 package.

  • lsix.

Configuration

In Xterm we can enable or disable the support for sixel using the terminal ID (-ti option, or decTerminalID resource). I guess that some users will not want to include the support for images, so it can be useful to provide a way to enable or disable them.

Some options:

  • A new section in the configuration file. Something like this:

    graphics:
      sixel:
        enabled: true
        colors: 256
        max_memory: 64M
  • A graphics compile-time feature.

CSI Sequences

We need to add or extend some CSI sequences to support software like lsix:

  1. Extend CSI c (Primary Device Attributes).

    The terminal has to include 4 in the response. Currently, Alacritty only returns 6:

    $ read -p $'\e[c' -sd c ; printf %q $REPLY
    $'\E[?6'

    The response is fixed in the identify_terminal function. If the support for sixel is configurable (using a feature and/or a configuration item), the 4 should not be included in this response if graphics are not available.

    Xterm returns a few more:

    $ read -p $'\e[c' -sd c ; printf %q $REPLY
    $'\E[?63;1;2;4;6;9;15;22'
  2. Add CSI ? Pi; Pa; Pv S: Graphics Attributes.

    This seems to be a specific sequence for Xterm. It is used by lsix to change how many colors are available (source). It is documented in the ctlseqs.txt file of Xterm:

    CSI ? Pi ; Pa ; Pv S
          Set or request graphics attribute, xterm.  If configured to
          support either Sixel Graphics or ReGIS Graphics, xterm accepts
          a three-parameter control sequence, where Pi, Pa and Pv are
          the item, action and value:
    
            Pi = 1  -> item is number of color registers.
            Pi = 2  -> item is Sixel graphics geometry (in pixels).
            Pi = 3  -> item is ReGIS graphics geometry (in pixels).
    
            Pa = 1  -> read attribute.
            Pa = 2  -> reset to default.
            Pa = 3  -> set to value in Pv.
            Pa = 4  -> read the maximum allowed value.
    
            Pv can be omitted except when setting (Pa == 3 ).
            Pv = n <- A single integer is used for color registers.
            Pv = width ; height <- Two integers for graphics geometry.
    
          xterm replies with a control sequence of the same form:
    
               CSI ? Pi ; Ps ; Pv S
    
          where Ps is the status:
            Ps = 0  <- success.
            Ps = 1  <- error in Pi.
            Ps = 2  <- error in Pa.
            Ps = 3  <- failure.
    
          On success, Pv represents the value read or set.
    
          Notes:
          o   The current implementation allows reading the graphics
              sizes, but disallows modifying those sizes because that is
              done once, using resource-values.
          o   Graphics geometry is not necessarily the same as "window
              size" (see the dtterm window manipulation extensions).
              For example, xterm limits the maximum graphics geometry at
              compile time (1000x1000 as of version 328) although the
              window size can be larger.
          o   While resizing a window will always change the current
              graphics geometry, the reverse is not true.  Setting
              graphics geometry does not affect the window size.
    

    Examples:

    $ read -p $'\e[?1;1;0S' -sd S ; printf %q $REPLY
    $'\E[?1;0;256'
    
    $ read -p $'\e[?2;1;0S' -sd S ; printf %q $REPLY
    $'\E[?2;0;1000;810'
    
    $ read -p $'\e[?1;3;512S' -sd S ; printf %q $REPLY
    $'\E[?1;0;512'
    
    $ read -p $'\e[?1;3;9999S' -sd S ; printf %q $REPLY
    $'\E[?1;3;0'

    This sequence is parsed in the csi_dispatch function.

  3. Add CSI ? 8 0 h/l: Sixel Scrolling Mode.

    Sixel display mode (DECSDM) control function is used to enable or disable the sixel scrolling mode. The value 80 has to be added to Mode::from_primitive.
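As a rough illustration of item 2, the reply to `CSI ? Pi ; Pa ; Pv S` could be built as below. This is only a sketch: the function name, the `MAX_COLOR_REGISTERS` limit and the `;0` value appended to error replies are assumptions (the latter modeled on the Xterm output shown above), and ReGIS (`Pi = 3`) is treated as unsupported.

```rust
/// Hypothetical limits; a real terminal would take these from its own state.
const DEFAULT_COLOR_REGISTERS: u32 = 256;
const MAX_COLOR_REGISTERS: u32 = 1024;

/// Build the reply to `CSI ? Pi ; Pa ; Pv S`.
///
/// `color_registers` is the current number of color registers and may be
/// updated; `geometry` is the graphics area in pixels as (width, height).
/// Status codes follow ctlseqs: 0 success, 1 error in Pi, 2 error in Pa,
/// 3 failure.
pub fn graphics_attribute_reply(
    pi: u32,
    pa: u32,
    pv: Option<u32>,
    color_registers: &mut u32,
    geometry: (u32, u32),
) -> String {
    match (pi, pa) {
        // Number of color registers.
        (1, 1) => format!("\x1b[?1;0;{}S", color_registers),
        (1, 2) => {
            *color_registers = DEFAULT_COLOR_REGISTERS;
            format!("\x1b[?1;0;{}S", color_registers)
        }
        (1, 3) => match pv {
            Some(n) if (1..=MAX_COLOR_REGISTERS).contains(&n) => {
                *color_registers = n;
                format!("\x1b[?1;0;{}S", n)
            }
            _ => "\x1b[?1;3;0S".to_string(),
        },
        (1, 4) => format!("\x1b[?1;0;{}S", MAX_COLOR_REGISTERS),
        // Sixel geometry: readable, but not modifiable in this sketch.
        (2, 1) | (2, 4) => format!("\x1b[?2;0;{};{}S", geometry.0, geometry.1),
        (2, 2) | (2, 3) => "\x1b[?2;3;0S".to_string(),
        // Known item, unknown action.
        (1, _) | (2, _) => format!("\x1b[?{};2;0S", pi),
        // Unknown or unsupported item (including ReGIS).
        _ => format!("\x1b[?{};1;0S", pi),
    }
}
```

With 256 registers and a 1000x810 graphics area, the four queries from the examples above would produce `?1;0;256`, `?2;0;1000;810`, `?1;0;512` and `?1;3;0` respectively.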

Regular text and images

This script can be used to mix text and images:

clear

# Grid with the ▢ character
ruby -e '7.times { puts "▢"*15 }'

# Draw a random image
tput cup 1 1
convert -size 130x120 -colors 4 plasma: sixel:- 

# Replace part of the image with dots
tput cup 2 3 
echo "...."

# Draw a red line
tput cup 2 0
convert -size 200x15 xc:red sixel:-

tput cup 8 0

Result:

Text and images

At the bottom-right, we can see that the image overlaps the text, instead of replacing it:

Image overlaps

When the text is selected, the original content is still there:

▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢
▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢
▢▢▢....▢▢▢▢▢▢▢▢▢▢▢▢▢
▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢
▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢
▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢
▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢▢

mintty uses the U+FFFC ᴏʙᴊᴇᴄᴛ ʀᴇᴘʟᴀᴄᴇᴍᴇɴᴛ character as a placeholder to indicate that there is an image in the cell (source), so it can't mix text and images.

I don't know if this behaviour is intended, or just an implementation detail of Xterm. But I think that we should support the ability to put text on top of images, and draw images over other images.

Parser

Running Alacritty with -vvv we can see that sixel data is received in the hook, put, and unhook functions.

[DEBUG] [unhandled hook] params=[0, 0, 0], ints: [], ignore: false
[DEBUG] [unhandled put] byte=34
[DEBUG] [unhandled put] byte=49
[DEBUG] [unhandled put] byte=59
[DEBUG] [unhandled put] byte=49
[DEBUG] [unhandled put] byte=59
[...]
[DEBUG] [unhandled unhook]

A possible approach to parse sixel data is to change the current ProcessorState to something like:

enum ProcessorState {
    Empty,
    PrecedingChar(char),
    SixelData(Box<SixelParser>),
}

And use it in the functions invoked for sixel data, like:

#[inline]
fn hook(&mut self, params: &[i64], intermediates: &[u8], ignore: bool, c: char) {
    if c == 'q' {
        self.state = ProcessorState::SixelData(Box::new(SixelParser::new(params)));
    } else {
        debug!(
            "[unhandled hook] params={:?}, ints: {:?}, ignore: {:?}",
            params, intermediates, ignore
        );
    }
}

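#[inline]
fn put(&mut self, byte: u8) {
    // Borrow the state mutably; matching on `self.state` by value would move it out of `self`.
    match &mut self.state {
        ProcessorState::SixelData(parser) => parser.put(byte),
        _ => debug!("[unhandled put] byte={:?}", byte),
    }
}

#[inline]
fn unhook(&mut self) {
    // Take ownership of the finished parser, leaving `Empty` behind.
    match std::mem::replace(&mut self.state, ProcessorState::Empty) {
        ProcessorState::SixelData(parser) => self.handler.add_graphics(parser.result()),
        other => {
            // Not sixel data: restore the previous state.
            self.state = other;
            debug!("[unhandled unhook]");
        }
    }
}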

The return value of SixelParser::result will be the final image, in RGB format.
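To give an idea of what the decoding step involves, here is a minimal sketch of a sixel decoder. It is not the proposed `SixelParser`: it works on a complete buffer instead of byte-by-byte `put` calls, and the names `decode_sixel`, `Image` and `number` are made up for the example. It handles only RGB color definitions (`#Pc;2;R;G;B`), repeats (`!`), carriage return (`$`) and newline (`-`); raster attributes are skipped, undefined color registers are assumed to be black, and no size limits are enforced.

```rust
use std::collections::HashMap;

/// Decoded image: tightly packed RGB, 3 bytes per pixel.
pub struct Image {
    pub width: usize,
    pub height: usize,
    pub rgb: Vec<u8>,
}

/// Read a decimal number starting at `*i`, advancing past its digits.
fn number(data: &[u8], i: &mut usize) -> usize {
    let mut n = 0usize;
    while *i < data.len() && data[*i].is_ascii_digit() {
        n = n.saturating_mul(10).saturating_add((data[*i] - b'0') as usize);
        *i += 1;
    }
    n
}

/// Decode the body of a sixel sequence (the bytes after `DCS ... q`).
pub fn decode_sixel(data: &[u8]) -> Image {
    let mut palette: HashMap<usize, [u8; 3]> = HashMap::new();
    let mut pixels: HashMap<(usize, usize), [u8; 3]> = HashMap::new();
    let (mut x, mut y, mut width, mut height) = (0usize, 0usize, 0usize, 0usize);
    let mut color = [255u8; 3]; // Assumed color before any `#` selection.
    let mut repeat = 1usize;
    let mut i = 0;

    while i < data.len() {
        let byte = data[i];
        i += 1;
        match byte {
            b'#' => {
                let register = number(data, &mut i);
                let mut args = Vec::new();
                while i < data.len() && data[i] == b';' {
                    i += 1;
                    args.push(number(data, &mut i));
                }
                // `#Pc;2;R;G;B` defines an RGB color, components in 0..=100.
                if let &[2, r, g, b] = args.as_slice() {
                    let scale = |v: usize| (v.min(100) * 255 / 100) as u8;
                    palette.insert(register, [scale(r), scale(g), scale(b)]);
                }
                color = palette.get(&register).copied().unwrap_or([0, 0, 0]);
            }
            b'!' => repeat = number(data, &mut i).max(1),
            b'$' => x = 0,
            b'-' => {
                x = 0;
                y += 6;
            }
            b'"' => {
                // Raster attributes: parsed and ignored in this sketch.
                number(data, &mut i);
                while i < data.len() && data[i] == b';' {
                    i += 1;
                    number(data, &mut i);
                }
            }
            b'?'..=b'~' => {
                // Each data byte encodes 6 vertical pixels, lowest bit on top.
                let bits = byte - b'?';
                for _ in 0..repeat {
                    for row in 0..6usize {
                        if bits & (1 << row) != 0 {
                            pixels.insert((x, y + row), color);
                            width = width.max(x + 1);
                            height = height.max(y + row + 1);
                        }
                    }
                    x += 1;
                }
                repeat = 1;
            }
            _ => {}
        }
    }

    let mut rgb = vec![0u8; width * height * 3];
    for ((px, py), c) in pixels {
        let at = (py * width + px) * 3;
        rgb[at..at + 3].copy_from_slice(&c);
    }
    Image { width, height, rgb }
}
```

For example, `decode_sixel(b"#1;2;100;0;0!2~")` defines register 1 as pure red and repeats a fully set sixel twice, which gives a 2x6 red image.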

Storage

I'm not sure how to store the images in order to achieve the best balance between performance and memory usage.

If I understand the current implementation correctly, this is how Alacritty adds new content:

  • Input from the PTY (like sixel data) is sent to the impl Handler for Term<T> (source).
  • Term stores the content in grid: Grid<Cell> and alt_grid: Grid<Cell> (source).
  • Grid<T> uses raw: Storage<T> (source).
  • Storage<T> uses inner: Vec<Row<T>> (source).
  • Finally, Row<T> uses inner: Vec<T> (source).

Cell contains the character, the color, and the flags for each cell.

A possible approach to store images is something like this:

  1. Add a flag to indicate that the cell contains an image.

    bitflags! {
        pub struct Flags: u16 {
            // ...
    
            #[cfg(feature = "graphics")]
            const GRAPHICS = 0b0001_0000_0000_0000;
        }
    }

    When an image is added to the grid, the flag is set for every cell where the image is present. If some text replaces part of the image (after moving the cursor over the image), the renderer knows that some parts of the image are hidden.

    This flag is unrelated to the existing ones, but it is the only way to avoid increasing the size of the struct Cell.

  2. Add a list of images in struct Row<T>.

    pub struct Row<T> {
        // ...
    
        #[cfg(feature = "graphics")]
        graphics: Vec<GraphicsRow>,
    }
    
    struct GraphicsRow {
        raw: Arc<graphics::Graphics>,
    
        /// First column where the image is rendered
        start_column: Column,
    
        /// Offset from the top of the image
        offset_y: u16,
    }
    
    
    // graphics.rs
    struct Graphics {
        /// Unique identifier, from AtomicUsize
        id: usize,
    
        /// Pixels (in GL_RGB format)
        rgb: Vec<u8>,
    
        // Graphics size
        height: u16,
        width: u16,
    
        /// Height of the cells when the graphics was inserted
        cell_height: u16,
    }

    Using a reference counter we can release the memory of the image as soon as there are no more rows with it. I'm using Arc instead of Rc because Send is required by event_loop.rs:303.

    Row::reset() will clear the graphics list.

    cell_height is used only if the image needs to be resized (for instance, when the user changes the font size). Using the original height we can determine the new size of the image.

    We also need a way to invoke glDeleteTextures when the graphics is dropped. For example, we can have some global queue of dropped graphics identifiers:

    struct Graphics {
        // ...
        cleanup_queue: Arc<Queue<usize>>,
    }
    
    impl Drop for Graphics {
        fn drop(&mut self) {
            self.cleanup_queue.send(self.id);
        }
    }
    
    // In the renderer
    while let Some(id) = cleanup_queue.recv() {
        if let Some(gl_tex) = graphics_textures_cache.remove(&id) {
            unsafe { gl::DeleteTextures(1, [ gl_tex ].as_ptr() as *const _) }
        }
    }
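The cleanup queue could, for instance, be a standard `std::sync::mpsc` channel. The sketch below makes that assumption and leaves out the actual OpenGL call: `drain_dropped` (a hypothetical name) only returns the texture names that the renderer would then pass to `glDeleteTextures`. The `Graphics` struct is reduced to the fields needed here.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{Receiver, Sender};

/// Reduced version of the `Graphics` struct, for illustration only.
pub struct Graphics {
    pub id: usize,
    /// Sending half of the cleanup queue (assumed to be an mpsc channel).
    cleanup_queue: Sender<usize>,
}

impl Graphics {
    pub fn new(id: usize, cleanup_queue: Sender<usize>) -> Self {
        Graphics { id, cleanup_queue }
    }
}

impl Drop for Graphics {
    fn drop(&mut self) {
        // Ignore the error: it only means the renderer side is already gone.
        let _ = self.cleanup_queue.send(self.id);
    }
}

/// Renderer side: remove the cache entries of every dropped graphics and
/// return the GL texture names that should be deleted.
pub fn drain_dropped(
    queue: &Receiver<usize>,
    textures: &mut HashMap<usize, u32>,
) -> Vec<u32> {
    queue
        .try_iter() // Non-blocking: yields only the ids already queued.
        .filter_map(|id| textures.remove(&id))
        .collect()
}
```

Because `Drop` runs only when the last `Arc<Graphics>` goes away, the id reaches the queue exactly once, after the last row holding the image has been cleared.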

Since Row needs to modify the cells to add the GRAPHICS flag, the add_graphics (or equivalent) function needs an impl bound.

trait GraphicsCell {
    fn set_graphics_flag(&mut self);
}

impl<T: GraphicsCell> Row<T> {
    pub fn add_graphics(&mut self, graphics: GraphicsRow) { ... }
}

What do you think about this approach? Also, I'm not sure about the name for the new types.

Rendering

The way to implement the render depends on the final solution for storage, but I think that the overall idea is something like this:

  1. In Display::draw, after rendering the cells, iterate over all lines containing graphics.

  2. For each graphics, render the fragment corresponding to the cell, if the cell has the GRAPHICS flag.

    If the graphics has no texture associated, create a new one (glGenTextures) and load the image (glTexImage2D). At this point, maybe we can release the memory used by the field rgb: Vec<u8> in struct Graphics.

The shader to render the graphics over the cells should be very similar to the shader used to render glyphs (text.f.glsl, text.v.glsl).
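Step 2 needs to know which part of the texture belongs to a given cell. A possible way to compute it, as a sketch only: `cell_fragment` and `TexRect` are hypothetical names, plain integers stand in for Alacritty's `Column` type, and the result is expressed in normalized texture coordinates.

```rust
/// Region of a texture in normalized coordinates (0.0..=1.0).
#[derive(Debug, PartialEq)]
pub struct TexRect {
    pub u: f32,
    pub v: f32,
    pub w: f32,
    pub h: f32,
}

/// Compute the fragment of an image that falls inside one cell.
///
/// `start_column` and `offset_y` mirror the fields of `GraphicsRow`;
/// sizes are in pixels as (width, height). Returns `None` when the cell
/// lies outside the image.
pub fn cell_fragment(
    image_size: (u16, u16),
    cell_size: (u16, u16),
    start_column: usize,
    offset_y: u16,
    column: usize,
) -> Option<TexRect> {
    let (image_w, image_h) = (f32::from(image_size.0), f32::from(image_size.1));
    let (cell_w, cell_h) = (f32::from(cell_size.0), f32::from(cell_size.1));

    // Horizontal offset of this cell inside the image, in pixels.
    let x = column.checked_sub(start_column)? as f32 * cell_w;
    let y = f32::from(offset_y);
    if x >= image_w || y >= image_h {
        return None;
    }

    // The last column and row of cells may cover only part of a cell.
    let w = cell_w.min(image_w - x);
    let h = cell_h.min(image_h - y);

    Some(TexRect { u: x / image_w, v: y / image_h, w: w / image_w, h: h / image_h })
}
```

For a 20x30 image drawn from column 2 with 10x15 cells, the cell at column 3 on the second row (`offset_y = 15`) maps to the bottom-right quarter of the texture.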

I guess that nobody is working on this since there has been no activity in the last month.

That's to a large degree because none of the existing solutions are any good. I'm uncertain if it would be smart to add sixel support at this point.

In Xterm we can enable or disable the support for sixel using the terminal ID (-ti option, or decTerminalID resource). I guess that some users will not want to include the support for images, so it can be useful to provide a way to enable or disable them.

I don't think that's necessary. A sufficiently well working implementation shouldn't have any significant drawbacks for users.

I'd rather not add configuration options for this unless absolutely necessary.

A graphics compile-time feature.

I see no reason for this at all. If there's a runtime advantage to disabling this at compile-time, the implementation is not sufficient to land on Alacritty's master.

The response is fixed in the identify_terminal function. If the support for sixel is configurable (using a feature and/or a configuration item), the 4 should not be included in this response if graphics are not available.

Alacritty currently reports as VT102, which does not have support for the sixel graphics parameter. So we'd have to add support for VT220 first.

Add CSI ? Pi; Pa; Pv S: Graphics Attributes.

This seems to be a specific sequence for Xterm. It is used by lsix to change how many colors are available (source).

Sounds like it wouldn't really be required? At least for an initial implementation. But with good defaults this could potentially be irrelevant.

Regular text and images

This entire section seems like the sixel protocol is unnecessarily complex when it comes to interaction between text and images. I don't see any reason why you'd ever want to partially render text above an image for example.

It might be interesting to explore how other graphics rendering escape sequences handle this and if it would allow for a simpler Alacritty implementation.

I'm not sure how to store the images in order to achieve the best balance between performance and memory usage.

We should focus on performance really. I'd like to make it clear that this should not touch performance of text rendering at all. Though I suspect it wouldn't be that difficult to find an implementation that is both optimal wrt performance and memory usage.

Add a flag to indicate that the cell contains an image.

This entire thing wouldn't be necessary if text couldn't half-overlap images for example.

Add a list of images in struct Row.

Why would you add this to Row, instead of storing the images separately? Is it necessary to be able to partially clear images?

Row::reset() will clear the graphics list.

This is extremely performance sensitive. Doing any kind of significant work will likely significantly tank performance. Resetting the entire row instead of just dirty cells for example is impossible because of performance limitations, which itself is a fairly simple operation. So caution is required.

cell_height is used only if the image needs to be resized (for instance, when the user changes the font size). Using the original height we can determine the new size of the image.

Do we really need to resize images when the font size is changed? That just seems very complicated. Do other terminals do this or are they just clearing the image?

I'm uncertain if it would be smart to add sixel support at this point.

We can decide about it when it can be tested. I think it's worth a try.

[...]

The response is fixed in the identify_terminal function. If the support for sixel is configurable (using a feature and/or a configuration item), the 4 should not be included in this response if graphics are not available.

Alacritty currently reports as VT102, which does not have support for the sixel graphics parameter. So we'd have to add support for VT220 first.

What changes are necessary to support VT220?

I see that some features of VT220 are already present in Alacritty, like DECTCEM or ECH.

[...]

Add CSI ? Pi; Pa; Pv S: Graphics Attributes.
This seems to be a specific sequence for Xterm. It is used by lsix to change how many colors are available (source).

Sounds like it wouldn't really be required? At least for an initial implementation. But with good defaults this could potentially be irrelevant.

It is useful to get the size of the window (CSI ? 2 ; 1 ; 0 S), so applications can scale their images to fit the visible area of the terminal. I don't know if there is another way to provide that information.

We can ignore the commands to change the number of color registers, and provide only the getters.

[...]

Regular text and images

This entire section seems like the sixel protocol is unnecessarily complex when it comes to interaction between text and images. I don't see any reason why you'd ever want to partially render text above an image for example.

The problem will exist with any image protocol, not only sixel. We have to decide what happens if an application moves the cursor to a cell where an image was added, and then writes something.

An option is just to ignore the new text, since most applications will not use that feature, but we still need a way to remove images.

If an application like hunter uses sixel to preview a picture, how can it remove the image when another file is selected?

[...]

Add a list of images in struct Row.

Why would you add this to Row, instead of storing the images separately? Is it necessary to be able to partially clear images?

I added it to Row because I suspect that this method has better performance, but I have not done any measurement.

An alternative is to put the image list in Grid, so we have something like this:

struct Grid<T> {
    // ..

    images: Vec<Image>,
}

struct Image {
    id: usize,

    // Position in the storage
    top_line: Line,
    bottom_line: Line,
    column: Column,

    // Size
    width: usize,
    height: usize,

    // Data, RGB
    pixels: Vec<u8>
}

With this approach:

  • For the top_line and bottom_line fields, we have two options:
    • Add a counter to Row that is always increasing, so it can be used as a unique identifier.
    • For every new row added to the grid, update the fields in every image of the grid.
  • Every time a row is deleted we have to check if any image is out of the storage (storage.inner.first.id > image.bottom_line), so it can be removed.
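A sketch of the first option (an always-increasing row counter) combined with the pruning rule from the last bullet. The type and method names are hypothetical, the pixel data is omitted, and row identifiers are plain `u64` values rather than Alacritty's `Line` type.

```rust
/// Position of an image, expressed with always-increasing row identifiers.
pub struct Image {
    pub id: usize,
    pub top_row: u64,
    pub bottom_row: u64,
    pub column: usize,
}

/// Image list that would live in `Grid`.
#[derive(Default)]
pub struct ImageList {
    images: Vec<Image>,
}

impl ImageList {
    pub fn add(&mut self, image: Image) {
        self.images.push(image);
    }

    pub fn len(&self) -> usize {
        self.images.len()
    }

    /// Remove every image that is completely out of the storage, given the
    /// identifier of the oldest row still stored. Returns the removed ids.
    pub fn prune(&mut self, oldest_row: u64) -> Vec<usize> {
        let mut removed = Vec::new();
        self.images.retain(|image| {
            // Same rule as `storage.inner.first.id > image.bottom_line`.
            let keep = image.bottom_row >= oldest_row;
            if !keep {
                removed.push(image.id);
            }
            keep
        });
        removed
    }
}
```

With this layout the rows themselves stay untouched, and the cost of deleting a row is proportional to the number of images in the grid, not to the number of cells.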

Is this a better solution?

[...]

Row::reset() will clear the graphics list.

This is extremely performance sensitive. Doing any kind of significant work will likely significantly tank performance. Resetting the entire row instead of just dirty cells for example is impossible because of performance limitations, which itself is a fairly simple operation. So caution is required.

By «clear the graphics list» I mean just Vec::clear(&mut self.graphics). Most rows will have no images, so clear is almost a no-op. Rows with one image will have to decrease the counter of Arc and, sometimes, invoke drop.
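The effect on the reference counter can be illustrated with reduced versions of the structs (a sketch; only the fields relevant to `reset` are kept):

```rust
use std::sync::Arc;

/// Reduced versions of the proposed types, for illustration only.
pub struct Graphics {
    pub rgb: Vec<u8>,
}

pub struct GraphicsRow {
    pub raw: Arc<Graphics>,
    pub offset_y: u16,
}

#[derive(Default)]
pub struct Row {
    pub graphics: Vec<GraphicsRow>,
}

impl Row {
    /// Only the part of `reset` that concerns graphics.
    pub fn reset(&mut self) {
        // For a row without images this is a length check and nothing else.
        // Otherwise each `Arc` is dropped, which decrements its counter and
        // frees the pixel data once the last row lets go of the image.
        self.graphics.clear();
    }
}
```

If an image spans two rows, resetting the first one lowers `Arc::strong_count` by one, and the `Vec<u8>` is released only when the second row is reset as well.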

[...]

cell_height is used only if the image needs to be resized (for instance, when the user changes the font size). Using the original height we can determine the new size of the image.

Do we really need to resize images when the font size is changed? That just seems very complicated. Do other terminals do this or are they just clearing the image?

Xterm always keeps the original size. The main issue is that the text after the image will be in a different position.

For example, if we have this:

Before

And then reduce the size of the font, the shell prompt will be moved on top of the image:

After

I guess that changing the font size via configuration is not very common, so we can ignore the issue. It is more important when the font size is modified dynamically (IncreaseFontSize and DecreaseFontSize), but in that case we can scale the image with font_size / config.font.size.
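The scaling with `font_size / config.font.size` amounts to something like the following sketch. The function name and the rounding and clamping choices are assumptions, not something taken from Alacritty.

```rust
/// Scale an image by the ratio between the current font size and the one
/// from the configuration. Dimensions are in pixels.
pub fn scaled_size(
    width: u16,
    height: u16,
    config_font_size: f32,
    current_font_size: f32,
) -> (u16, u16) {
    // Without a usable ratio, keep the original size.
    if config_font_size <= 0.0 || current_font_size <= 0.0 {
        return (width, height);
    }

    let ratio = current_font_size / config_font_size;
    // Round to the nearest pixel and never collapse a dimension to zero.
    let scale = |v: u16| (f32::from(v) * ratio).round().max(1.0) as u16;
    (scale(width), scale(height))
}
```

For example, a 100x50 image inserted at font size 12 would be drawn at 150x75 after the font size is increased to 18.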

An option is just to ignore the new text, since most applications will not use that feature, but we still need a way to remove images.

Jexer is an example of an application/library that can write on top of images.

We can decide about it when it can be tested. I think it's worth a try.

Oh, feel free to give it a try. I just don't want to waste anyone's time. Especially because the different protocols might have significant enough differences to be incompatible.

What changes are necessary to support VT220?

I couldn't tell you without looking it up myself. We're working on a list with all escape sequences here, but as you can see it's pretty empty.

I see that some features of VT220 are already present in Alacritty, like DECTCEM or ECH.

That is true, but afaik we cannot report to be a VT220 terminal unless all essential VT220 escapes are implemented. If a single feature is queried using terminfo for example, that requirement can obviously be circumvented, but that doesn't exist for this CSI escape.

It is useful to get the size of the window (CSI ? 2 ; 1 ; 0 S), so applications can scale their images to fit the visible area of the terminal. I don't know if there is another way to provide that information.

There are different escapes to query window, cell and padding size. But we do not yet support all of them (padding, for example, is not supported).

The problem will exist with any image protocol, not only sixel. We have to decide what happens if an application moves the cursor to a cell where an image was added, and then write something.

That is true, but there are far simpler solutions than what has been suggested here. Just clearing the image for example would be far, far less complex and prevent you from having three layers of rendering. Writing below the image is another easier option that is less complicated. Writing partially over the image is about the most complicated solution there could be to this problem.

but we still need a way to remove images.

Of course, but that should be done by clearing the screen I'd assume? Not overwriting the image with characters one by one. That just seems extremely inefficient.

I added it to Row because I suspect that this method has better performance, but I have not done any measurement.

An alternative is to put the image list in Grid, so we have something like this:

It probably doesn't make a ton of sense to speculate unless it is tried out, I'm open to creative solutions as long as the code is clean and performance is good.

As you've already noticed tracking the terminal lines is not a trivial task, but we already do have to do that for selection so that shouldn't be a problem. Neither of your solutions would work, but we do have facilities in place that should make this fairly simple, at least once #3589 is merged.

By «clear the graphics list» I mean just Vec::clear(&mut self.graphics). Most rows will have no images, so clear is almost a no-op.

Doing it at the speed of yes output (which is a bit fast) might still be problematic. So I'd just be very careful about what I put in that method.

Xterm always keeps the original size. The main issue is that the text after the image will be in a different position.

Having these kinds of behavior possible and seemingly unspecified (or at least nobody seems to care about it?), is probably one of the reasons why people don't seem to care about sixel. Without even having tested it out, it already seems very disappointing.

I'm curious how applications are expected to behave on reflow. What happens when the window is shrunk below the width of the image? I'd imagine XTerm just truncates the image?

Jexer is an example of an application/library that can write on top of images.

I have zero interest in supporting these kinds of applications in the terminal. However there might be some better arguments with slightly overlapped images, I can definitely see how that would be useful. In theory though that could all be handled by the application itself, but Alacritty would probably be able to do it faster.

If it doesn't add significant complexity, I'd probably go for an approach that supports it.

That is true, but afaik we cannot report to be a VT220 terminal unless all essential VT220 escapes are implemented. If a single feature is queried using terminfo for example, that requirement can obviously be circumvented, but that doesn't exist for this CSI escape.

If I understand correctly, the issue is adding 4 to the Primary Device Attributes command. According to its documentation, 4 only indicates that sixel is available. Do applications assume that other VT220 escape sequences are available if both 4 and 6 are found?

[...]

It is useful to get the size of the window (CSI ? 2 ; 1 ; 0 S), so applications can scale their images to fit the visible area of the terminal. I don't know if there is another way to provide that information.

There are different escapes to query window, cell and padding size. But we do not yet support all of them (padding, for example, is not supported).

Do you mean ioctl(TIOCGWINSZ)?

I searched for other escape sequences to get the window size, but I could not find any. Also, it seems that the on_resize implementation does not copy the new size.

I guess that ioctl(TIOCGWINSZ) is enough for most applications, but others (like lsix) will not be able to get the actual window size.

Also, does ioctl(TIOCGWINSZ) work on Windows?

[...]

I'm curious how applications are expected to behave on reflow. What happens when the window is shrunk below the width of the image? I'd imagine XTerm just truncates the image?

Yes. Since XTerm does not support text reflow, everything keeps its position. Also, when the window restores its width, the image is untouched, but the text is removed.

[Screenshot: resize]

If I understand correctly, the issue is adding 4 to the Primary Device Attributes command. According to its documentation, 4 only indicates that sixel is available. Do applications assume that other VT220 escape sequences are available if both 4 and 6 are found?

According to xterm's documentation, these parameters do not mean anything for VT100, but only for VT220+.
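
For reference, this is roughly what a client-side check for attribute 4 could look like. It is only a sketch: the function names are hypothetical, and it says nothing about what else an application may assume once it sees the attribute.

```rust
/// Extract the numeric attributes from a Primary Device Attributes reply
/// such as "\x1b[?62;4;6c". Returns `None` if the reply is not well formed.
pub fn parse_primary_da(reply: &str) -> Option<Vec<u16>> {
    let params = reply.strip_prefix("\x1b[?")?.strip_suffix('c')?;
    params.split(';').map(|p| p.parse().ok()).collect()
}

/// Attribute 4 is the one documented as "sixel graphics".
pub fn advertises_sixel(reply: &str) -> bool {
    parse_primary_da(reply).map_or(false, |attrs| attrs.iter().any(|&a| a == 4))
}
```

With this check, a reply of `ESC [ ? 6 c` is treated as "no sixel", while any reply that lists 4 is treated as "sixel available".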

Do you mean ioctl(TIOCGWINSZ)?

No, I mean CSI 13 ; 2 t and CSI 14 ; 2 t.

Yes. Since XTerm does not support text reflow, everything keeps its position. Also, when the window restores its width, the image is untouched, but the text is removed.

I suppose that makes some sense, it should be possible to copy that behavior for images where it is just truncated.

Do you mean ioctl(TIOCGWINSZ)?

No, I mean CSI 13 ; 2 t and CSI 14 ; 2 t.

Oh, sorry. I thought it was something already implemented.

I think that it is reasonable to provide the window size using the window control sequences added by Sun's shelltool program. Both are non-standard, but CSI Ps t is more likely to be available in more terminals.

According to the Xterm docs, codes from 11 to 19 (except 12 and 17) are used to get info about the terminal window. mlterm (which implements sixel) supports all of them:

$ env | grep -i mlterm
TERM=mlterm
MLTERM=3.8.6

$ for C in {11..19}; do echo -en "\\e[${C}t"; read -t 1 -sd t && printf "CSI $C t = %qt\\n" $REPLY ; done
CSI 11 t = $'\E[1't
CSI 13 t = $'\E[3;0;0't
CSI 14 t = $'\E[4;1881;1836't
CSI 15 t = $'\E[5;2160;3840't
CSI 16 t = $'\E[6;33;17't
CSI 18 t = $'\E[8;57;108't
CSI 19 t = $'\E[9;65;225't

$
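
As a sketch of how an application could use these replies: the helpers below (hypothetical names) parse the `CSI 14 t` and `CSI 18 t` reports and derive the cell size from them. Xterm's documentation gives both reports as height first, then width. With the mlterm output above, 1881/57 and 1836/108 give 33x17, which matches the `CSI 16 t` reply.

```rust
/// Parse a window report of the form `ESC [ <code> ; <a> ; <b> t`, returning
/// `(a, b)` when the report code matches `expected_code`.
pub fn parse_window_report(reply: &str, expected_code: u32) -> Option<(u32, u32)> {
    let body = reply.strip_prefix("\x1b[")?.strip_suffix('t')?;
    let mut fields = body.split(';').map(|f| f.parse::<u32>().ok());
    let code = fields.next()??;
    let first = fields.next()??;
    let second = fields.next()??;
    if code == expected_code && fields.next().is_none() {
        Some((first, second))
    } else {
        None
    }
}

/// Derive the cell size in pixels from the text-area size in pixels
/// (reply to `CSI 14 t`) and in characters (reply to `CSI 18 t`).
/// Both tuples are `(height, width)`.
pub fn cell_size(pixels: (u32, u32), chars: (u32, u32)) -> Option<(u32, u32)> {
    if chars.0 == 0 || chars.1 == 0 {
        return None;
    }
    Some((pixels.0 / chars.0, pixels.1 / chars.1))
}
```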

In Xterm, by default, everything is blocked except CSI 15 t (report screen size in pixels), which looks like a bug in their configuration:

$ echo $XTERM_VERSION
XTerm(344)

$ for C in {11..19}; do echo -en "\\e[${C}t"; read -t 1 -sd t && printf "CSI $C t = %qt\\n" $REPLY ; done
CSI 15 t = $'\E[5;2160;3840't

$

An issue that we should consider is that, since Alacritty can set TERM to xterm-256color, some applications using $TERM =~ xterm to guess the terminal capabilities will expect CSI ? ... S to be available. At least, we should return an error (CSI ? Pi ; 3 ; 0 S), so those applications can use another method without relying on timeouts.
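
To illustrate what an application would do with such a reply, here is a sketch of a parser for `CSI ? Pi ; Ps ; Pv S`, following the layout in xterm's control sequences documentation (status 0 means success, 3 means failure). The type and function names are made up for the example.

```rust
/// Outcome of a `CSI ? Pi ; Pa ; Pv S` query, as seen by the application.
#[derive(Debug, PartialEq)]
pub enum GraphicsReply {
    /// Status 0: the remaining parameters carry the requested value(s).
    Success { item: u32, values: Vec<u32> },
    /// Any non-zero status (1 = bad item, 2 = bad action, 3 = failure).
    Failure { item: u32, status: u32 },
}

/// Parse a reply such as "\x1b[?2;3;0S" or "\x1b[?2;0;1836;1881S".
pub fn parse_graphics_reply(reply: &str) -> Option<GraphicsReply> {
    let body = reply.strip_prefix("\x1b[?")?.strip_suffix('S')?;
    let fields = body
        .split(';')
        .map(|f| f.trim().parse().ok())
        .collect::<Option<Vec<u32>>>()?;
    match fields.as_slice() {
        [item, 0, values @ ..] => Some(GraphicsReply::Success {
            item: *item,
            values: values.to_vec(),
        }),
        [item, status, ..] => Some(GraphicsReply::Failure {
            item: *item,
            status: *status,
        }),
        _ => None,
    }
}
```

An application that gets `Failure` back can fall back to another method immediately instead of waiting for a timeout.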


By the way, do you prefer a big pull-request with everything, or smaller pull-requests with partial implementations of the feature?

In Xterm, by default, everything is blocked except CSI 15 t (report screen size in pixels), which looks like a bug in their configuration:

XTerm intentionally prevents interactions with the window by default. Unless you're trying to say that CSI 15 t working is the bug.

By the way, do you prefer a big pull-request with everything, or smaller pull-requests with partial implementations of the feature?

It depends. If it makes sense to split it up into smaller PRs, since those make sense independently, then that's good. But splitting it up just for the sake of reducing diff size is pointless. We shouldn't introduce a bunch of changes that have no positive effect on Alacritty in a PR for something that might never actually land after that PR has been merged.

XTerm intentionally prevents interactions with the window by default. Unless you're trying to say that CSI 15 t working is the bug.

Yes, I think that someone forgot to add 15 to the default value of disallowedWindowOps.

By the way, do you prefer a big pull-request with everything, or smaller pull-requests with partial implementations of the feature?

It depends. If it makes sense to split it up into smaller PRs, since those make sense independently, then that's good. But splitting it up just for the sake of reducing diff size is pointless.

Maybe we can split the feature in two patches:

  • The first one with the implementation for CSI 14 t and CSI 18 t, as described in dtterm(5).
  • The other one with everything else.

Do you agree to add CSI ? Pi ; .. S to return a failure (CSI ? Pi ; 3 ; 0 S)?

Maybe we can split the feature in two patches:

  • The first one with the implementation for CSI 14 t and CSI 18 t, as described in dtterm(5).
  • The other one with everything else.

That would certainly be a reasonable distinction.

Do you agree to add CSI ? Pi ; .. S to return a failure (CSI ? Pi ; 3 ; 0 S)?

I don't really see a point. If applications check for term name to determine features and users use a terminfo other than ours, that's their problem not ours.

It seems like an error should be returned when the operation couldn't be executed successfully, not always.

Sorry about just disappearing, the last few months were unexpectedly stressful. I lost my laptop with over two months' worth of work on an unpushed branch and only got a new machine like two weeks ago. I'm still interested in working on this, but until very recently I just wasn't able to. You can see that I only very recently picked up work on hunter, too, to fix the most itchy bugs I had already fixed in that branch I lost with my laptop. Oh well. I feel mostly comfortable to leave it as it is for now, so yeah...

If an application like hunter use sixel to preview a picture, how can it remove the image when another file is selected?

It's just overwriting the image with spaces. I'm not so sure this is strictly correct from reading this thread, but it works in xterm. Otherwise clearing the whole screen, or parts of it, using the usual CSI escape sequences also works. I don't think there's any explicit sixel clearing machinery. Anyhow, in xterm writing a character to a cell with image data replaces it with that character. For hunter this is just fine and in fact I prefer the simple semantics over kitty's complicated image data management where images stay where they are until explicitly deleted.
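
A sketch of the "overwrite with spaces" approach, for anyone who wants to try it against a terminal. This is not hunter's actual code. It only uses the standard cursor position sequence (`CSI row ; col H`, 1-based) followed by spaces, and the function name is hypothetical.

```rust
/// Build the bytes that overwrite a rectangle of cells with spaces.
/// `top` and `left` are 1-based, as in the CUP sequence (`CSI row ; col H`).
pub fn blank_region(top: u16, left: u16, rows: u16, cols: u16) -> String {
    let spaces = " ".repeat(usize::from(cols));
    let mut out = String::new();
    for row in top..top.saturating_add(rows) {
        // Move the cursor to the start of this row of the rectangle...
        out.push_str(&format!("\x1b[{};{}H", row, left));
        // ...and replace whatever is in those cells with spaces.
        out.push_str(&spaces);
    }
    out
}
```

Writing the returned string to the terminal should remove the image from those cells in xterm. How other terminals behave depends on how they handle text written over image cells.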

@ayosec
I wouldn't mind to start working on this right now, so I'm wondering if you have written any code yet (other than what you posted here) and/or how wild you are on actually implementing this yourself. Not sure how to proceed since you seem to have taken over and if you're in the middle of it, it doesn't make sense for me to start fresh. If no code exists yet, I wouldn't mind doing a POC using your notes as a starting point and then worry about the details and cleanup for a merge once I see something on the screen. So unless you tell me to stop for some reason I'll start pushing code in the next few days after taking a deeper look at alacritty's code.

EDIT:
If we end up working on this together, maybe you could join alacritty's irc channel, so we can coordinate our efforts better?

ftp://ftp.cs.utk.edu/pub/shuford/terminal/sixel_graphics_news.txt
ftp://ftp.cs.utk.edu/pub/shuford/terminal/all_about_sixels.txt

Do you maybe have a local copy of those files? The server hasn't been reachable all day and I'm not sure when it's going to be available again.

EDIT2:
https://www.digiater.nl/openvms/decus/vax90b1/krypton-nasa/all-about-sixels.text
The second one is available here, too.

We should focus on performance really.

Agreed!

@ayosec
I wouldn't mind to start working on this right now, so I'm wondering if you have written any code yet (other than what you posted here) and/or how wild you are on actually implementing this yourself. Not sure how to proceed since you seem to have taken over and if you're in the middle of it, it doesn't make sense for me to start fresh. If no code exists yet, I wouldn't mind doing a POC using your notes as a starting point and then worry about the details and cleanup for a merge once I see something on the screen. So unless you tell me to stop for some reason I'll start pushing code in the next few days after taking a deeper look at alacritty's code.

My first step was adding support for a sequence to get the window size (#3635), which is still unfinished.

It will be awesome if you can implement the feature.

ftp://ftp.cs.utk.edu/pub/shuford/terminal/sixel_graphics_news.txt
ftp://ftp.cs.utk.edu/pub/shuford/terminal/all_about_sixels.txt

Do you maybe have a local copy of those files?

https://github.com/dse/vt/blob/master/sixel_graphics_news.txt and https://github.com/dse/vt/blob/master/all_about_sixels.txt. There are other mirrors in #910 (comment).

Also, check out SIXEL GRAPHICS EXTENSION in the DEC reference manual. It explains everything about sixel.

Also, check out SIXEL GRAPHICS EXTENSION in the DEC reference manual. It explains everything about sixel.

Great resource, thanks! I started writing the SIXEL parser yesterday after reading through the alacritty code for a while and there are a few things I'm not exactly sure about, like if colors can be defined anywhere within the SIXEL data, or just in the beginning. Hopefully that reference clears it up.

Anyway, if things go well there should be a working prototype in 2 weeks or so. I've got some ideas about how the plumbing of SIXEL data can be done in a more clean way (although @ayosec's write-up was certainly a useful starting point!), but I was wondering if parsing shouldn't actually be part of vte? It also seems possible to directly add the SIXEL escape codes to esc_dispatch() instead of relying on hook() and unhook().

That's as far as I got yesterday. Well, I'm off to finishing the parser now.

My notes here may be of interest: https://gitlab.freedesktop.org/terminal-wg/specifications/-/issues/26#note_524831 , specifically:

I think terminal authors would benefit from working through sixel support as a step towards supporting a good image protocol, for the following reasons:

  • The internal infrastructure to support sixel will not be much different than what is needed to support a good image protocol.
  • A library of sixel tools exists that can stress-test their terminal under high image load. img2sixel against animations and Jexer (my library) can be useful to challenge garbage collection / reference-counting schemes. This testing is important: regardless of the merits of what an application is doing, the terminal should not crash from it.

For a terminal, it is useful to do sixel as a stepping stone to robustness. I have added several session captures at https://jexer.sourceforge.io/sixel.html for those who wish to stress-test their sixel implementation.

I have highlighted a key bit regarding reliability. Most people have no interest in supporting a Jexer-like application, and that's fine. (And actually, I think that terminals are digging into features that are not actually needed by the market.) But, if a terminal chooses to implement an image standard, then the door is wide open for every kind of mixed text-and-images to be sent to that terminal.

Right now anyone can crash multiple terminals just by 'cat'ing a file. In fact, I can point to only three terminals that survived their first exposure to Jexer-style sixel output: xterm of course, yaft, and RLogin. Every other terminal on my list with image support, including my own, has had to fix a crash bug along the way. This is why I chime in on these threads. I don't care if a terminal wants to run Jexer-type applications, I just want everyone to have more tools to ensure stability.

One final note: Jexer does not display images, and then overwrite the invisible parts with text as I believe was claimed earlier in this thread. Its strategy is outlined here . It draws images within text cell boundaries, and then draws text everywhere the images are not, so terminals are free to implement whatever they want in terms of image management. I suspect most terminals crash because they were not designed to handle up to { cols X rows } distinct images that are being overwritten all the time with new text and images.

https://github.com/rabite0/hunter file manager supports sixel and kitty image protocols.

https://github.com/rabite0/hunter file manager supports sixel and kitty image protocols.

Yes and https://github.com/rabite0/ who made hunter is the one who has started implementing sixel for alacritty 😀

Is this still being worked on?

@rabite0 Are you still working on this? Anything I can help out with or pick up?

I've set aside some time over the coming weeks, hoping to move the ball a little bit further forward.

@twitchyliquid64 @rabite0 I'm not a rust dev, but am happy to help however I can with this (testing, code review, etc).

The lack of any image support is what's been keeping me (and a few engineers I work with) from switching to alacritty, so I'd love to help move the ball forward on this, too.

After staawwrrking @rabite0's profile I get the impression they unfortunately got hit by life again in one way or another, let's hope they're doing alright.

The diff of rabit's POC branch is about 500 LOC. I'm leaving that here as a pointer because I'm not familiar enough with either alacritty's codebase or Rust to be of any help in pushing this forward. I might go and try to figure out what those changes do though, and how that lines up with what's been discussed in this issue, and if I don't immediately feel stupid I might tinker around a little.

Some of the prerequisite work has been done by @ayosec and merged in #3635.

@twitchyliquid64 you might want to have a look at the info above.

Happy Holiday season, everyone.

I have more time now, so I'm going to try to implement this feature.

I have more time now, so I'm going to try to implement this feature.

Let me know if there's anything I can do to help!

I have more time now, so I'm going to try to implement this feature.

Let me know if there's anything I can do to help!

Ditto. As said, I'm not really in a place to contribute anything meaningful implementation wise, but if you need somebody with a particularly stupid and broken system (currently running Ubuntu 20.10 with pretty much no "Ubuntu" in it apart from aptitude) to test your changes I'm game ;D

I'll be watching this place. Excited!

I have more time now, so I'm going to try to implement this feature.

That'd be awesome, and I'd love to help testing this. At notcurses we are eagerly waiting for this feature in order to be able to finish our sixel backend.

I've spent some time implementing sixel as per @ayosec's writeup and using @rabite0's sixel parser from the repo linked above. I've got img2sixel working, but the scrolling is backwards, and after displaying an image the text rendering is also affected, which I haven't been able to figure out. Here is a sample alacritty window displaying a 1280x720 image which goes downwards:
[Screenshot]

The code I've put together is here https://github.com/tantei3/alacritty/tree/sixel

There are few things left:

  1. The image scrolling issue.
  2. Normal text doesn't render if an image is displayed, I didn't understand why this would happen either.
  3. Deleting the images from the OpenGL texture. I didn't understand how to share data between the alacritty and alacritty_terminal crates.

If someone can provide me feedback on the code, I'd be happy to work on this further.

@tantei3 I'm sorry for the lack of updates on this issue after my last comment. I have been focused on the code and forgot about this.

My implementation is ready to be published as a draft. It can be found in master...ayosec:graphics. I'm doing some more tests with real-world applications, and writing a summary of the changes. I hope to have enough time this weekend to publish it.

Hi everyone!

I just published the draft to add support for graphics in the terminal. There are a couple of unresolved questions that need to be discussed, but the basic features should work with no issues.

I added support for both Sixel and iTerm2 protocols. You can see the details in #4763.

Examples

The following video shows:

  • Gnuplot with Sixel output.
  • Show a PNG file using imgcat from iTerm2.
  • Scrolling the grid.
  • Increasing and decreasing the font size to scale the graphics in the grid.

[Video: examples-with-actual-software.mp4]

The following video shows video frames from a video in YouTube using a script to extract frames with ffmpeg.

[Video: video-10s-b.mp4]

Help to test

I developed the feature on Linux/X11. I'd really appreciate if people can test the implementation in other environments (Wayland, Windows, macOS).

If you are interested, these are some examples to test:

Did you just reproduce 24-bit colors with sixel?

boxes.sh on Wayland:

[Screenshot]

Is the partial occlusion with the subsequent shell prompt intended?

(And did I get the right branch?)

[xxx@xxx]:~> cd /tmp/
[xxx@xxx]:/tmp> git clone https://github.com/ayosec/alacritty
Cloning into 'alacritty'...
remote: Enumerating objects: 31, done.
remote: Counting objects: 100% (31/31), done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 14945 (delta 12), reused 26 (delta 11), pack-reused 14914
Receiving objects: 100% (14945/14945), 10.36 MiB | 8.80 MiB/s, done.
Resolving deltas: 100% (10096/10096), done.
[xxx@xxx]:/tmp> cd alacritty/
[xxx@xxx]:/tmp/alacritty> git checkout graphics
Branch 'graphics' set up to track remote branch 'graphics' from 'origin'.
Switched to a new branch 'graphics'
[xxx@xxx]:/tmp/alacritty> cargo run
<snip>
     Running `target/debug/alacritty`

I tested on wayland (sway) and it seemed to work pretty well. One thing I did notice is that if I use the iterm2 protocol with a 640x480 jpeg it takes a few seconds to render, however displaying the same jpeg using convert image.jpg sixel:- renders much faster. (although it looks better with the iterm2 protocol than sixel due to truecolor support). I'm curious why the iterm2 protocol takes so much longer.

@crocket

Did you just reproduce 24-bit colors with sixel?

What do you mean?

In Sixel, the parameters that specify a color take values between 0 and 100, so you can have 101³ different colors (≈ 1M). For example, the command #0;2;100;100;50 sets register 0 to the color #FFFF7F. However, a single image can only use a limited number of colors (usually 256, but some implementations allow 1024).
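
As an illustration of that mapping, here is a small sketch that parses an RGB color definition and converts the percentages to 8-bit channels. The function name is hypothetical, HLS definitions (Pu = 1) are not handled, and truncating instead of rounding is an assumption chosen so that 50% maps to 0x7F as in the example above.

```rust
/// Parse a sixel colour definition in the RGB colour space, e.g.
/// "#0;2;100;100;50", into `(register, [r, g, b])` with 8-bit channels.
pub fn parse_rgb_register(command: &str) -> Option<(u16, [u8; 3])> {
    let fields = command
        .strip_prefix('#')?
        .split(';')
        .map(|f| f.parse().ok())
        .collect::<Option<Vec<u16>>>()?;
    match fields.as_slice() {
        // Pu = 2 selects RGB; each component is a percentage (0..=100).
        [register, 2, r, g, b] if [r, g, b].iter().all(|&&c| c <= 100) => {
            // Truncating conversion: 100% -> 255, 50% -> 127.
            let to_byte = |percent: u16| (u32::from(percent) * 255 / 100) as u8;
            Some((*register, [to_byte(*r), to_byte(*g), to_byte(*b)]))
        }
        _ => None,
    }
}
```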

@twitchyliquid64

Is the partial occlusion with the subsequent shell prompt intended?

Not intended, but it is expected.

boxes.sh does not check the dimensions of the font, and it just assumes that the elements will be rendered in the first 40 rows. If you modify the last line from tput cup 40 0 to tput cup 41 0 (or more), the cursor will be at the correct position.
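
The underlying arithmetic, as a sketch: the number of rows an image covers depends on the cell height, so a fixed cursor row only works for some fonts. The function name and the numbers in the note below are illustrative and not taken from boxes.sh.

```rust
/// Number of grid rows covered by an image `image_height` pixels tall when
/// each cell is `cell_height` pixels tall (rounded up).
pub fn rows_covered(image_height: u32, cell_height: u32) -> Option<u32> {
    if cell_height == 0 {
        return None;
    }
    Some((image_height + cell_height - 1) / cell_height)
}
```

For example, 720 pixels of graphics fit in 40 rows with 18-pixel cells, but need 43 rows with 17-pixel cells.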

(And did I get the right branch?)

Yes, the code is in the graphics branch. I just uploaded some changes.

You can compile with --release if you want to test the real performance of the implementation.

@tmccombs

One thing I did notice is that if I use the iterm2 protocol with a 640x480 jpeg it takes a few seconds to render, however displaying the same jpeg using convert image.jpg sixel:- renders much faster.

This is unexpected.

Did you compile with the --release flag?

Can you compare the time for these commands?:

time (printf '\e]1337;File=inline=1:' ; base64 -w0 image.jpeg ; printf '\a' ; read -p $'\e[c' -srdc)

time (convert image.jpeg sixel: ; read -p $'\e[c' -srdc)

I did not compile with the --release flag. I can try that tonight.

Are there viable alternatives that support 24-bit colors?

Doesn't the iterm2 protocol in @ayosec's PR support 24-bit colors?

@ayosec, yes with a release build the iterm2 method is much faster. And is in fact now faster than the sixel method.

Just compiled https://github.com/ayosec/alacritty/tree/graphics
with cargo build --release

My system:
Arch linux, sway latest, wlroots latest

Some feedback:
for #910 (comment) time for same image

time (printf '\e]1337;File=inline=1:' ; base64 -w0 17022100.jpg ; printf '\a' ; read -p $'\e[c' -srdc)
=> real  0m0.005s
time (convert 17022100.jpg sixel: ; read -p $'\e[c' -srdc)
=> real  0m0.121s

for #910 (comment)

  • imgcat - works as expected
  • boxes.sh - works same as #910 (comment)
  • video - works as expected

Some notices:
For each key pressed during video playback - Alacritty logs warning:
[WARN ] [graphics] Can't decode base64 data: Encoded text cannot have a 6-bit remainder.

I think this is OK, given the mechanism used to display the video.

testing @ayosec's code on macOS big sur:

crashes on startup unfortunately

% cargo run --release
    Finished release [optimized + debuginfo] target(s) in 0.32s
     Running `target/release/alacritty`
Created log file at "/var/folders/xj/x1yvjh0124jbm9663lt8y2bnp7m7vf/T/Alacritty-79498.log"
[2021-03-03 12:43:07.587250000] [ERROR] [alacritty] Alacritty encountered an unrecoverable error:

                                                    	There was an error initializing the shaders: Failed compiling shader: ERROR: 0:30: Use of undeclared identifier 'gl_FragColor'
                                                    ERROR: 0:30: Use of undeclared identifier 'gl_FragColor'
                                                    ERROR: 0:30: Use of undeclared identifier 'gl_FragColor'
                                                    ERROR: 0:30: Use of undeclared identifier 'gl_FragColor'
                                                    ERROR: 0:31: Use of undeclared identifier 'gl_FragColor'
                                                    ERROR: 0:31: Use of undeclared identifier 'gl_FragColor'
                                                    ERROR: 0:31: Use of undeclared identifier 'gl_FragColor'
                                                    ERROR: 0:31: Use of undeclared identifier 'gl_FragColor'
                                                    ERROR: 0:32: Use of undeclared identifier 'gl_FragColor'
                                                    ERROR: 0:32: Use of undeclared identifier 'gl_FragColor'
                                                    ERROR: 0:32: Use of undeclared identifier 'gl_FragColor'
                                                    ERROR: 0:32: Use of undeclared identifier 'gl_FragColor'
                                                    ERROR: 0:33: Use of undeclared identifier 'gl_FragColor'
                                                    ERROR: 0:33: Use of undeclared identifier 'gl_FragColor'
                                                    ERROR: 0:33: Use of undeclared identifier 'gl_FragColor'
                                                    ERROR: 0:33: Use of undeclared identifier 'gl_FragColor'

I've got no experience with OpenGL, but a quick search indicates that gl_FragColor is deprecated, and apparently not supported by macOS.

In alacritty/res/graphics.f.glsl, adding a definition:

layout(location = 0) out vec4 diffuseColor;

followed by changing the #define TEX(N) to use diffuseColor instead of gl_FragColor works!

Haven't tried all the test cases, but imgcat works, and the video script works for a few dozen frames and then freezes, but I tried it in iterm2 and it does the same thing there, so no worse. Heck of a lot higher frame rate than iterm2, as well.

edit: I think the video hanging was caused by playing a large video for some reason; I added scaling parameter and now it runs flawlessly

@wfraser Thanks for testing it!

I will change the use of gl_FragColor with the rest of the pending changes.

@ayosec I tested your branch against Jexer . It survived without crashing (woohoo!) or leaving artifacts. Performance for larger images (e.g. img2sixel small movies) was "OK", faster than my Java Swing backend but slower than xterm. Performance against lots of small images wasn't so great though. Overall, kudos!

Performance for larger images (e.g. img2sixel small movies) was "OK", faster than my Java Swing backend but slower than xterm. Performance against lots of small images wasn't so great though. Overall, kudos!

This is unexpected. I tried a little script to compare the performance in both Xterm and Alacritty, with multiple sizes, and Alacritty is always faster:

Size   Alacritty   Xterm
10     0.00s       0.80s
50     0.01s       0.70s
100    0.03s       0.65s
500    0.56s       3.02s

Script to test performance
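#!/bin/bash

# Print each command as it runs.
set -x

# Work in a scratch directory.
cd "$(mktemp -d)"

for SIZE in 10 50 100 500
do
  # Generate 100 plasma images of SIZE x SIZE pixels as Sixel files.
  seq 100 | xargs -IX -P$(nproc) \
    convert                      \
      -size ${SIZE}x${SIZE}      \
      -colors 128                \
      plasma:                    \
      sixel:X.sixel

  # Time how long it takes to send all of them. The DA request (\e[c) and
  # the read make the shell wait until the terminal has replied.
  command time -ao totaltime -f "$SIZE - %e" \
    bash -c 'cat *.sixel; printf "\\e[c"; read -rsdc'
done

# Show the collected timings.
cat totaltime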

Did you compile Alacritty in release mode?

Performance for larger images (e.g. img2sixel small movies) was "OK", faster than my Java Swing backend but slower than xterm. Performance against lots of small images wasn't so great though. Overall, kudos!

This is unexpected. I tried a little script to compare the performance in both Xterm and Alacritty, with multiple sizes, and Alacritty is always faster:

Hmm, when I look at it again it does look like alacritty is faster, a LOT faster. I wonder what it is I was doing, maybe something else was going on I didn't notice. Or...hmm...when I tested it the first time I was using an external monitor that behaves a bit odd sometimes, maybe it was some interaction with that and OpenGL.

But regardless, on my main laptop screen alacritty is winning these rounds, and sometimes by as much as 10x. So wow!

Did you compile Alacritty in release mode?

I used 'cargo build --release'.

Another question:

alacritty has support for iTerm2 image protocol. Did anyone ever settle on a means for the application to detect that without doing a terminal ID query and maintaining their own database, or do we just blindly send data and hope for the best? (EDIT: There are now 4 terminals with iTerm2 that I know of: alacritty, iTerm2, mintty, and wezterm. It's on its way to being the 24-bit standard.)

alacritty has support for iTerm2 image protocol. Did anyone ever settle on a means for the application to detect that without doing a terminal ID query and maintaining their own database, or do we just blindly send data and hope for the best? (EDIT: There are now 4 terminals with iTerm2 that I know of: alacritty, iTerm2, mintty, and wezterm. It's on its way to being the 24-bit standard.)

Adding iTerm's image protocol support is not a given, and in fact has been met with some resistance. The idea is that we only want to support a single image protocol.

As for ways to detect this, to my knowledge there hasn't been a concerted effort to standardize anything properly, as can be seen by the diverging protocols themselves. I wish I had something more helpful to say.

I was involved in the discussion circa 2019, and settled on this as a way to just get 24-bit image data across as simply as possible and force the application side to do all the heavy lifting -- which now that I've done it, see that it isn't so bad. The encoding side is a bit awful though.

With more terminals gaining robust sixel, perhaps a natural successor will become more apparent.

I'm seeing two potential problems:

  • When i use P2=1 for "transparent" pixels ("keep the color the same"), it's transparent all the way through the terminal, even if there was a cell printed there. Here's an example of what I'm seeing in your alacritty fork:

2021-04-14-014300_800x1417_scrot

note that the orca is a visible box, with the desktop shown behind it. On xterm, mlterm, and foot, i see the following:

2021-04-14-014608_884x1415_scrot

the stream being sent to alacritty is the same as that being sent to the other emulators -- the background glyphs are sent, and then the sixel is printed. i can break this down to a minimal example if you need.

  • From what I can tell, your branch and current alacritty return the same 6c as alacritty currently does. how do you propose users decide whether to query XTSMGRAPHICS? if queried on current alacritty, the read will hang, but there doesn't seem to be anything that indicates that XTSMGRAPHICS is safe to call on your branch. am i missing something? thanks!

it's transparent all the way through the terminal, even if there was a cell printed there

This is because cells are reset when a new graphic is added.

We can emulate the Xterm behaviour, but I remember that there were some cases where Xterm does something unexpected. My impression is that mixing characters and graphics in the same cell is a consequence of how it is implemented, and not an intended feature.

Do you have use cases for that feature?

From what I can tell, your branch and current alacritty return the same 6c as alacritty currently does. how do you propose users decide whether to query XTSMGRAPHICS? if queried on current alacritty, the read will hang, but there doesn't seem to be anything that indicates that XTSMGRAPHICS is safe to call on your branch. am i missing something? thanks!

IMO, the best approach is to send both XTSMGRAPHICS and DA (in that order), and wait until the DA is completed (i.e. you get the c of the DA response). If the terminal does not support XTSMGRAPHICS, you only receive the DA response.

I wrote a small program to test this approach.

In Alacritty, the response includes both sequences:

$ ./query_sixel
\x1B[?1;0;1024S
\x1B[?6c

$

In a terminal with no Sixel support (like RXVT) I get this:

$ ./query_sixel
\x1B[?1;2c

$
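A minimal sketch of how an application might interpret the two replies shown above. This is not the query_sixel program itself. The query strings are assumptions taken from xterm's control-sequence documentation (Pi=1 for colour registers, Pa=1 for "read"), and the names `parse_reply`, `XTSMGRAPHICS_QUERY`, and `DA1_QUERY` are hypothetical.

```python
import re

# Assumed query strings (per xterm's ctlseqs): read the number of colour
# registers via XTSMGRAPHICS, then ask for primary device attributes (DA1).
XTSMGRAPHICS_QUERY = "\x1b[?1;1;0S"
DA1_QUERY = "\x1b[c"

# XTSMGRAPHICS reply: CSI ? Pi ; Ps ; Pv S   (Ps == 0 means success)
_XTSM_RE = re.compile(r"\x1b\[\?(\d+);(\d+)(?:;(\d+))?S")
# DA1 reply: CSI ? attr ; attr ... c
_DA1_RE = re.compile(r"\x1b\[\?([\d;]*)c")


def parse_reply(reply: str) -> dict:
    """Classify everything read from the terminal up to the DA1 terminator."""
    result = {"da1_seen": False, "da1_attrs": [], "color_registers": None}

    da1 = _DA1_RE.search(reply)
    if da1:
        result["da1_seen"] = True
        result["da1_attrs"] = [int(p) for p in da1.group(1).split(";") if p]

    xtsm = _XTSM_RE.search(reply)
    # Only trust the value when the item is colour registers (Pi == 1)
    # and the status field reports success (Ps == 0).
    if xtsm and xtsm.group(1) == "1" and xtsm.group(2) == "0" and xtsm.group(3):
        result["color_registers"] = int(xtsm.group(3))

    return result


if __name__ == "__main__":
    print(parse_reply("\x1b[?1;0;1024S\x1b[?6c"))  # reply with XTSMGRAPHICS
    print(parse_reply("\x1b[?1;2c"))               # DA1-only reply
```

Because the DA reply always arrives, the application can stop reading once it sees the final `c`, so the read never hangs on terminals that ignore XTSMGRAPHICS.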

This is a good feature to have in alacritty. Is there someone still working on it?

This entire section seems like the sixel protocol is unnecessarily complex when it comes to interaction between text and images. I don't see any reason why you'd ever want to partially render text above an image for example.

Here’s a screenshot of a little prototype I’ve been working on:

Screenshot from 2021-06-03 03-35-37 sixel prototype

The lines are all drawn with a single rectangular sixel image, with transparency. You can see how printing text over the image is useful when the text lines up with the image. Of course the transparency here is such that the text can be rendered below the image and the result is the same. (The messages are randomized, because prototype.)

https://www.youtube.com/watch?v=afNuDH7QpYA

This shows a real VT330 drawing some ReGIS graphics, with text lining up with the ticks on the axes.

It might be interesting to explore how other graphics rendering escape sequences handle this and if it would allow for a simpler Alacritty implementation.

Kitty allows you to specify a z-index for every image. All text is at z-index=0; negative z-indexes are below the text.

Given how both Alacritty and Kitty send text and images to the GPU, this is a simpler model to implement. The DEC terminals just modified the framebuffer, making the temporal order of things matter.
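A rough sketch of the z-index model described above, as opposed to the framebuffer model where draw order is simply the order of arrival. The `Layer` type and `draw_order` function are hypothetical, and treating images with z-index 0 or higher as drawn above the text is an assumption of this sketch.

```python
from dataclasses import dataclass

TEXT_Z = 0  # all text lives at z-index 0


@dataclass
class Layer:
    name: str
    z_index: int


def draw_order(images):
    """Return layer names back-to-front, independent of arrival order."""
    # Negative z-indexes go below the text, sorted from lowest to highest.
    below = sorted((i for i in images if i.z_index < TEXT_Z),
                   key=lambda layer: layer.z_index)
    # Assumption: everything else is drawn on top of the text.
    above = sorted((i for i in images if i.z_index >= TEXT_Z),
                   key=lambda layer: layer.z_index)
    return [l.name for l in below] + ["text"] + [l.name for l in above]


if __name__ == "__main__":
    images = [Layer("map", -1), Layer("overlay", 5), Layer("background", -10)]
    print(draw_order(images))
```

The point of the design is that the result depends only on the z-indexes, so the renderer can redraw a frame from scratch without remembering when each image or glyph was written.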

This is a good feature to have in alacritty. Is there someone still working on it?

There’s been some progress in the pull request.

The lines are all drawn with a single rectangular sixel image, with transparency. You can see how printing text over the image is useful when the text lines up with the image. Of course the transparency here is such that the text can be rendered below the image and the result is the same. (The messages are randomized, because prototype.)

Yeah, any image protocol that doesn't allow this kind of atrocity is clearly superior if you ask me.

Hmm. Can you explain why it is an atrocity? It’s fairly important that map labels line up with the map, for example.

Screenshot from 2021-06-03 03-35-37 sixel prototype

oh my gosh, that is beautiful

Thank you :)

As an aside, you might also be interested in the interactive behavior. I don’t think this player can reproduce sixels though!

Just curious. Is there a better image protocol than sixel? There are other image protocols such as kitty and iTerm.

Just curious. Is there a better image protocol than sixel? There are other image protocols such as kitty and iTerm.

There is an attempt to work on that. We have a common agreement on something called "Good Image Protocol", which arose in the terminal-wg forum and was later picked up by me; I am trying to formalize it into a draft spec with an initial implementation. I am currently busy with maintenance tasks in my own project, but soon I'll resume working on and finalizing the Good Image Protocol spec.

Good Image protocol sounds nice!!! I hope it won't require dbus or systemd. Freedesktop has ties to redhat which is known to push systemd and dbus wherever possible.

Just curious. Is there a better image protocol than sixel? There are other image protocols such as kitty and iTerm.

kitty's protocol has a lot of undefined behavior and Linux-isms (e.g. shared memory) that render it IMHO a dead end. iTerm2's protocol also has undefined behavior, doubles as a file transfer mechanism (which is annoyingly the default behavior), and at last check wezterm and mintty were the two most reliable implementations of it. Neither protocol advertises itself to the application via DA1 like sixel, so one has to either do terminal fingerprinting or just send the sequence and hope for the best.

I put together a very stripped-down "just get some 24-bit pixels across the wire" protocol here, but should anything emerge as the clear winner then I would move to that instead.

I put together a very stripped-down "just get some 24-bit pixels across the wire" protocol here, but should anything emerge as the clear winner then I would move to that instead.

i like the jexer protocol, though it has some shortcomings IMHO: the only way to indicate a transparent ("missing") pixel seems to be via supplying PNG data with an alpha channel, since the primary format is RGB with no way to denote a missing pixel. that's kinda annoying, because i rely on transparency to do graphics-over-text (i'd otherwise need to somehow render the text myself, a dead end). since it's sixel, i can just print glyphs over it to destroy the graphic at cell granularity; since it's sixel, i likewise can't do glyph-over-graphic, like i can in kitty. likewise, while kitty has a number of features i question (local file transfer being primary among them), i consider its ability to delete and move graphics via identifiers very useful.

i like the jexer protocol, though it has some shortcomings IMHO: the only way to indicate a transparent ("missing") pixel seems to be via supplying PNG data with an alpha channel, since the primary format is RGB with no way to denote a missing pixel. that's kinda annoying, because i rely on transparency to do graphics-over-text (i'd otherwise need to somehow render the text myself, a dead end). since it's sixel, i can just print glyphs over it to destroy the graphic at cell granularity; since it's sixel, i likewise can't do glyph-over-graphic, like i can in kitty. likewise, while kitty has a number of features i question (local file transfer being primary among them), i consider its ability to delete and move graphics via identifiers very useful.

i've started putting some thoughts down here, but it's in no way complete: STEGAP

Just curious. Is there a better image protocol than sixel? There are other image protocols such as kitty and iTerm.

kitty's protocol has a lot of undefined behavior and Linux-isms (e.g. shared memory) that render it IMHO a dead end. iTerm2's protocol also has undefined behavior, doubles as a file transfer mechanism (which is annoyingly the default behavior), and at last check wezterm and mintty were the two most reliable implementations of it. Neither protocol advertises itself to the application via DA1 like sixel, so one has to either do terminal fingerprinting or just send the sequence and hope for the best.

Name some undefined behavior the kitty protocol has, that yours does not. And shared memory is in the POSIX standard. It is not a "Linuxism". Not to mention it is optional to implement. As for the protocol not advertising itself, it not only does advertise itself, it does so in a comprehensive fashion that allows clients to not only detect the existence of the protocol but also detect the most efficient means of transmission for it, at the same time. https://sw.kovidgoyal.net/kitty/graphics-protocol.html#querying-support-and-available-transmission-mediums

If you wish to criticize something, at least take the time to understand it first.

And shared memory is in the POSIX standard. It is not a "Linuxism"

I think the point is it doesn't work on windows. Although windows does seem to have a shared memory mechanism so maybe it could be adapted? And if not, there are still other mechanisms of transferring the data.

As for the protocol not advertising itself, it not only does advertise itself, it does so in a comprehensive fashion that allows clients to not only detect the existence of the protocol but also detect the most efficient means of transmission for it, at the same time.

I think it is a bit of a stretch to call replying with a success message if you try to load an image "advertising itself". That said, there isn't really a great method of advertising terminal features. terminfo (and termcap) isn't as useful as one would hope, because so many terminals pretend to be xterm or a variant thereof. Environment variables risk polluting the environment of other programs. Some sort of "try it and see if it works" (the suggested detection method for kitty's graphics protocol) results in printing garbage to the screen on terminals that don't support it. I wish there was a standard escape code you could send that would cause the terminal to reply with the features it supports. But that is out of scope for this discussion.

And again that doesn't seem like a deal breaker for the kitty protocol, and could be readily remedied.

And shared memory is in the POSIX standard. It is not a "Linuxism"

I think the point is it doesn't work on windows. Although windows does seem to have a shared memory mechanism so maybe it could be adapted? And if not, there are still other mechanisms of transferring the data.

A terminal emulator running on windows will simply not support it. No client can use either the filesystem or shared memory without querying the terminal emulator, since neither of those is guaranteed to be available. So, if a TE doesn't want to support shared memory, it can simply return errors when the client queries it.
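For readers unfamiliar with the query being discussed, here is a hedged sketch based on my reading of the kitty graphics protocol documentation linked earlier in the thread. The escape sequence layout (a 1x1 direct-transmission query with `a=q`, followed by a DA1 request) is an assumption drawn from that page, and the function names `kitty_query` and `direct_transmission_ok` are hypothetical.

```python
import re


def kitty_query(image_id: int = 31) -> str:
    """Build a graphics query for a 1x1 RGB image sent inline (t=d),
    followed by a DA1 request that acts as a sentinel."""
    apc = f"\x1b_Gi={image_id},s=1,v=1,a=q,t=d,f=24;AAAA\x1b\\"
    return apc + "\x1b[c"


# Graphics reply: ESC _ G <key=value,...> ; <message> ESC \
_APC_RE = re.compile(r"\x1b_G([^;\x1b]*);([^\x1b]*)\x1b\\")


def direct_transmission_ok(reply: str, image_id: int = 31) -> bool:
    """True if the terminal answered our query id with OK before DA1."""
    for keys, message in _APC_RE.findall(reply):
        fields = dict(kv.split("=", 1) for kv in keys.split(",") if "=" in kv)
        if fields.get("i") == str(image_id):
            return message == "OK"
    # Only the DA1 reply (or nothing) came back: assume no support.
    return False


if __name__ == "__main__":
    print(repr(kitty_query()))
    print(direct_transmission_ok("\x1b_Gi=31;OK\x1b\\\x1b[?62;c"))
    print(direct_transmission_ok("\x1b[?1;2c"))
```

Under the same assumption, a client would repeat the query with a different transmission medium (for example shared memory) and treat an error message in place of `OK` as "medium not available" rather than "protocol not supported".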

If you wish to criticize something, at least take the time to understand it first.

I will refer people to this thread from 2019 beginning here, in which I ceased participating after you said "IMO terminal multiplexers are horrible hacks".

If you have changed your design to eliminate the ambiguities, then wonderful! More terminals may choose to adopt your protocol, and develop the infrastructure to support the new applications that can enable. I welcome ANY 24-bit protocol to emerge as the de facto standard; maybe that will be yours.

In the meantime, I personally find your approach to these discussions quite distasteful, so have chosen to block you on GitHub and any other forum where we might cross paths in the future. Goodbye.

Goodbye.

🤣 So no actual objections, just FUD as usual. Don't let the door hit you on the way out.

I agree that terminal multiplexers introduce complexities. Tmux doesn't try to be friendly to terminal image protocols. I am trying to replace tmux with a terminal session manager such as dtach or abduco. Window management can be done by sway or i3.

If tmux developers wanted tmux to be compatible with image protocols, things may change.

Goodbye.

🤣 So no actual objections, just FUD as usual.

This is sadly happening a lot on the internet and you are not free from that either (I was once complaining to you for that precise reason somewhere else). I think it is also hard to always object correctly, and behave in a positive way. You must agree, terminals and their features can be a heated discussion on everybody's end.

Don't let the door hit you on the way out.

Note that the tone and undertone of sentences like this one are why so many people dislike others.

I can't change other people's behavior, but at least I'd like to raise the awareness to respect each other. Treat others at least like you expect to be treated.

As I said before, I am happy to discuss good faith proposals to improve the graphics protocol. I will not however abide FUD. If you (and I mean the generic you here, not you individually) wish to point out a shortcoming in the protocol, it is incumbent upon you to at least read the full thing and see if your points have not already been addressed.

For the record, additions to the terminal protocol I have made to address genuine shortcomings pointed out by people that actually bother to do their homework:

  1. Suppressing cursor movement when placing images
    kovidgoyal/kitty#3411

  2. Adding ids for placements and allowing clients to request unused ids
    kovidgoyal/kitty#3133

  3. The ability to suppress responses from the terminal
    kovidgoyal/kitty#3163

  4. Animation (although this may not strictly be a protocol improvement, more like a new sub-protocol)
    kovidgoyal/kitty#3498

And for reference, here is how to use shared memory in Windows: https://docs.microsoft.com/en-us/windows/win32/memory/creating-named-shared-memory. From a quick read, it is sufficiently similar to POSIX shared memory to be usable unchanged in the kitty graphics protocol, but that would need to be verified by somebody who develops on Windows.

Even if kitty's protocol isn't suitable as-is, it seems like a good place to start since it already has a working implementation, rather than starting a whole new protocol from scratch.

I was in the process of muting this thread, but given the conversation above, I feel like I should probably say something instead.

This is not the first time I've seen someone get pushed out of a conversation by kovidgoyal. I don't want to speak for anyone else, though, so I'll just stick to my own experience with him.

Kovidgoyal's consistently antagonistic responses are actually the reason I recently started looking to get rid of Kitty in my own tool set (so thanks, I guess, for helping me find Alacritty). What I find a bit troubling now, though, is that he's seemingly being allowed to push people away from other open source projects, and not just his own.

I don't know if this is the right place to share, but this article was being passed around last year and I found it immensely helpful when considering what to do about similar situations in my own communities.

Anyway, I'm not very familiar with this community, so I don't know how my response will be received. I don't see a code of conduct in the repo, and I honestly have no idea what to expect. I do know, though, that it's a lot of time and effort to manage an open source project, and this kind of thing is the last thing anyone wants to deal with on top of PR review, backlog triage, and the million other tasks required to keep a successful project afloat. I know that it's exhausting and that it's stressful, and so I'm sorry to ask for more work on top of that. But I would really appreciate it if a maintainer could say something about whether or not they're okay with everything above, and what is considered acceptable behavior in the Alacritty community.

Yes, indeed, I would like to hear from the alacritty maintainers if they allow this kind of baseless personal attack in their community, and request that Jonathan Wren be banned from it, forthwith.