Hex Viewer

A custom editor extension for VS Code that opens files in a raw byte representation and also renders the result of decoding the bytes as text. Editing the document is not supported (yet).

Features

Select decoder to render textual data
Supports multibyte encodings
Custom decoder scripts (not supported in virtual workspaces)

Builtin decoders

The extension implements builtin decoders for standard text encodings. The following decoders are currently implemented:

ASCII (default)
ISO 8859-1
UTF-8
UTF-16
UTF-32

Decoders for multibyte encodings like UTF-8 will skip invalid byte sequences and render the skipped bytes with an error color.

Implementing a custom decoder

Custom decoder scripts are currently only supported in non-virtual workspaces. The decoder script should be a CommonJS script that exports a single function. The function should adhere to the Decoder type in the definitions below:

type DecodedValue = 
	| null // undecoded single byte, rendered as dot with weaker text color
	| string // decoded single byte
	| {
		// can be used for multibyte sequences
		// or for single byte values that should be rendered in a specific color
		text?: string; // treated like null above if not specified
		length?: number; // length of the byte sequence, defaults to 1
		style?: {
			color?: string; // valid CSS color string, can also be a CSS variable defined by VS Code for theming
		};
	};

type RenderControlCharacters = 'hex' | 'abbreviation' | 'escape' | 'caret' | 'picture';

interface DecoderConfig {
	fileUri: string;
	settings: {
		renderControlCharacters: 'off' | RenderControlCharacters | RenderControlCharacters[];
	};
}

type Decoder = (data: Buffer, config: DecoderConfig) => DecodedValue[];

When a custom decoder supports multibyte sequences, the result array will likely be shorter than the source data. The byte lengths of all entries (1 for each single-byte entry, length for each multibyte entry) shouldn't add up to more than the source data length. If they add up to less, the result is padded with null to match the source data length.

Custom scripts are not sandboxed, just as VS Code extensions aren't. The script's working directory is the workspace that contains it, or the script's own directory if the script lies outside the workspace (that is, if it is specified by an absolute path).
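The length rule above can be checked with a few lines of JavaScript. This is only a sketch: coveredLength is a hypothetical helper that is not part of the extension, and the sample data is made up for illustration.

```javascript
// Hypothetical helper, not part of the extension: counts how many source
// bytes a decoder result covers.
function coveredLength(values) {
	return values.reduce((sum, value) => {
		// null and plain strings always stand for exactly one byte
		if (value === null || typeof value === 'string') return sum + 1;
		// objects cover `length` bytes, defaulting to 1
		return sum + (value.length ?? 1);
	}, 0);
}

// "A", then "€" encoded as three UTF-8 bytes, then a NUL byte
const data = Buffer.from([0x41, 0xe2, 0x82, 0xac, 0x00]);
// A result that decodes only the first four bytes
const result = ['A', { text: '€', length: 3 }];

console.log(coveredLength(result)); // 4
console.log(data.length - coveredLength(result)); // 1
```

Here the result has two entries but covers four of the five source bytes, so one null entry would be padded onto the end.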

Below is a sample decoder that simply renders the numeric value of each byte in its decimal form:

module.exports = (data) => [...data].map((byte) => `${byte}`);
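The object form of DecodedValue can be sketched with a slightly larger decoder. The example below is an illustration, not one of the builtin decoders: it renders printable ASCII as is, collapses each CR LF pair into a single two-byte entry, and leaves everything else undecoded. The ↵ symbol and the CSS variable name are assumptions chosen for the example; any valid CSS color string should work in its place.

```javascript
// Sketch of a custom decoder: printable ASCII plus CR LF pairs.
const decode = (data) => {
	const result = [];
	for (let i = 0; i < data.length; ) {
		const byte = data[i];
		if (byte === 0x0d && data[i + 1] === 0x0a) {
			// One entry that covers two source bytes, rendered in a custom color.
			// The variable name is assumed to be defined by the active theme.
			result.push({
				text: '↵',
				length: 2,
				style: { color: 'var(--vscode-editorWarning-foreground)' },
			});
			i += 2;
		} else {
			// Printable ASCII is decoded, any other byte stays undecoded (null)
			result.push(byte >= 0x20 && byte <= 0x7e ? String.fromCharCode(byte) : null);
			i += 1;
		}
	}
	return result;
};

module.exports = decode;
```

For the five input bytes of "a\r\nb\0" this returns four entries, which together cover all five bytes.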

⚠️ Typically a decoded text unit is a single character, but it doesn't have to be. The extension can render strings of arbitrary length for a single byte; the CSS grid layout makes sure the columns align. That said, strings longer than two or three characters will likely distort the layout in unpleasant ways.

To use a custom decoder, it has to be registered in the settings. The configuration entry point is hexViewer.customDecoders, an object of key/value pairs where each key is the name of a decoder and each value is the path to its script file. Relative paths are resolved from the workspace root.
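As an illustration, a settings.json entry could look like the fragment below. The decoder names and both paths are hypothetical; the first is resolved from the workspace root and the second is absolute.

```json
{
	"hexViewer.customDecoders": {
		"Decimal": "./decoders/decimal.js",
		"Line endings": "/home/user/scripts/line-endings.js"
	}
}
```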

Configuration

`hexViewer.customDecoders`

Map of custom decoders. Keys are decoder names and values are paths to the JS files. Relative paths are resolved from the workspace root.

{
	"type": "object",
	"additionalProperties": {
		"type": "string"
	},
	"default": {}
}

`hexViewer.decode.renderControlCharacters`

Render control characters with a graphical representation. Several representations are available. If the setting is an array of options, the options are evaluated from left to right in descending priority, since not every option covers every control character. Possible representations can be seen on Wikipedia. The hex option is the only one that also covers C1 control codes.

{
	"oneOf": [
		{
			"type": "string",
			"enum": [
				"off",
				"hex",
				"abbreviation",
				"escape",
				"caret",
				"picture"
			]
		},
		{
			"type": "array",
			"items": {
				"type": "string",
				"enum": [
					"hex",
					"abbreviation",
					"escape",
					"caret",
					"picture"
				]
			},
			"minItems": 1,
			"uniqueItems": true
		}
	],
	"default": "off"
}
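A custom decoder receives this setting as config.settings.renderControlCharacters and may want to honor it. The sketch below shows one way the left-to-right priority could be applied. It is not the extension's own implementation: toList, renderers, and renderControl are hypothetical names, only the caret and hex options are filled in, and the two-digit uppercase hex output is an assumption.

```javascript
// Hypothetical helper: normalise the setting into an ordered list of options.
const toList = (setting) =>
	setting === 'off' ? [] : Array.isArray(setting) ? setting : [setting];

// Illustrative renderers only. Each returns undefined for bytes it doesn't cover.
const renderers = {
	// Caret notation covers the C0 controls and DEL: 0x09 -> "^I", 0x7F -> "^?"
	caret: (byte) =>
		byte < 0x20 || byte === 0x7f ? '^' + String.fromCharCode(byte ^ 0x40) : undefined,
	// Hex covers every byte, including C1 control codes
	hex: (byte) => byte.toString(16).toUpperCase().padStart(2, '0'),
};

// Try each configured option in order and return the first one that applies.
function renderControl(byte, setting) {
	for (const name of toList(setting)) {
		const text = renderers[name]?.(byte);
		if (text !== undefined) return text;
	}
	return null; // no option covers this byte, leave it undecoded
}

console.log(renderControl(0x09, ['caret', 'hex'])); // ^I
console.log(renderControl(0x85, ['caret', 'hex'])); // 85
console.log(renderControl(0x09, 'off')); // null
```

With ["caret", "hex"], a tab is rendered in caret notation, while the C1 code 0x85 falls through to hex because caret notation doesn't cover it.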

Possible future enhancements

These are just ideas that may or may not happen:

more builtin decoders for standard text encodings like Windows 1252
async decoders
make default decoder configurable for file patterns
go to offset
data inspection
display Unicode information for decoded text
find byte sequences / decoded text
make case of hexadecimal letters configurable
configuration for rendering whitespace and other invisible Unicode characters
configurable offset width
editing documents

tao-cumplido / vscode-hex-viewer