Currently:
The virtual text is added after the image, so nvim renders the image padding first and then the text. But we're assuming that the image should be below all of the extmarks on the same line, so the order should be the other way around. This is something that I missed in my last PR.