Brief may update

cmyr opened this issue 2 years ago · comments

status

The past month or so has been a bit slow, and a bit frustrating, and so I want to step back and outline my current thinking & progress.

I've ended up spending a bunch of time just developing a better understanding of how to think about table structure, packing, tracking offsets, etc. I've also spent a bunch of time trying to come up with an API/design that supports both from-scratch compilation as well as allowing reuse of parsed font data.

In brief, there is a fundamental tension between the design of a harfbuzz-style subsetter and the design of a general purpose font compiler. I am comfortable coming up with a design for each of these things independently, but I am finding it challenging to come up with something that does both. (That said, I believe we should be able to have components, such as a repacker, that can be reused in both cases, and this is not insignificant.)

While the subsetter cannot be reused for compilation, general purpose compilation types can be reused for subsetting. We can, if desired, use these types to implement a subsetter or an instancer, in addition to a full compiler.

So I would now like to focus explicitly on the general case of 'types for compilation'. This involves having higher-level representations of the various tables and records, that can be created and mutated and then compiled down to their binary representations. It will be possible to create these types directly, and it will also be possible to convert parsed font types into these higher-level types.
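A minimal sketch of what 'types for compilation' could look like, using a deliberately simplified table (a u16 count followed by big-endian u16 glyph ids). The names `GlyphListView`, `GlyphList`, and `compile` are hypothetical and chosen only for illustration; they are not an existing API.

```rust
/// Hypothetical zero-copy view over parsed font data: a u16 count followed by
/// that many big-endian u16 glyph ids.
pub struct GlyphListView<'a> {
    data: &'a [u8],
}

impl<'a> GlyphListView<'a> {
    pub fn new(data: &'a [u8]) -> Option<Self> {
        let count = u16::from_be_bytes([*data.first()?, *data.get(1)?]) as usize;
        // Reject data too short to hold the advertised number of glyphs.
        if data.len() >= 2 + count * 2 {
            Some(Self { data })
        } else {
            None
        }
    }

    pub fn glyphs(&self) -> impl Iterator<Item = u16> + '_ {
        let count = u16::from_be_bytes([self.data[0], self.data[1]]) as usize;
        self.data[2..2 + count * 2]
            .chunks_exact(2)
            .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
    }
}

/// Hypothetical owned counterpart: can be created directly, mutated freely,
/// and compiled down to bytes.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct GlyphList {
    pub glyphs: Vec<u16>,
}

/// Conversion from the parsed type into the higher-level type.
impl From<&GlyphListView<'_>> for GlyphList {
    fn from(view: &GlyphListView<'_>) -> Self {
        Self { glyphs: view.glyphs().collect() }
    }
}

impl GlyphList {
    /// Compile to the binary representation.
    /// Sketch only: assumes fewer than 65536 glyphs, so the count fits in a u16.
    pub fn compile(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(2 + self.glyphs.len() * 2);
        out.extend_from_slice(&(self.glyphs.len() as u16).to_be_bytes());
        for gid in &self.glyphs {
            out.extend_from_slice(&gid.to_be_bytes());
        }
        out
    }
}
```

The owned type carries no lifetime, so it can be stored and edited independently of the font data it was converted from; the cost is the copy made in `From`.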

If we find that the performance of this approach is unacceptable, we can then choose to write more specialized implementations. We can also do this piecemeal, specializing only specific tables.

next steps / tl;dr

I am taking the last two weeks of May off. I would like to spend the two weeks leading up to then really focusing on the general case of compiling font tables. I feel like I have been too preoccupied with abstract concerns about performance and flexibility, and that this has led me to expand the scope of the problem too drastically. Once I have a design that basically works, we will be able to determine its limitations, and decide if there are specific areas that need further attention.

Behdad Esfahbod · Answer 1 · Wed May 04 2022 02:53:22 GMT+0800 (China Standard Time)

> So I would now like to focus explicitly on the general case of 'types for compilation'. This involves having higher-level representations of the various tables and records, that can be created and mutated and then compiled down to their binary representations. It will be possible to create these types directly, and it will also be possible to convert parsed font types into these higher-level types.

If underneath your 'types for compilation' you have a serializer layer that takes iterators to compile the binary objects, then the harfbuzz-style subsetter can feed its own iterators to the same serializer.
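One way to picture this, as a rough sketch: a serializer function that is generic over any iterator of items, so a compiler can feed it from owned data while a subsetter feeds it a lazy, filtered iterator over the source bytes. The function names and the simplified count-prefixed array layout are assumptions made for illustration.

```rust
/// Hypothetical low-level serializer: appends a u16 count followed by the
/// big-endian items, pulling them from any iterator.
pub fn serialize_u16_array<I>(items: I, out: &mut Vec<u8>) -> Result<(), &'static str>
where
    I: IntoIterator<Item = u16>,
{
    let count_pos = out.len();
    out.extend_from_slice(&[0, 0]); // placeholder for the count, patched below
    let mut count: usize = 0;
    for item in items {
        out.extend_from_slice(&item.to_be_bytes());
        count += 1;
    }
    if count > u16::MAX as usize {
        out.truncate(count_pos); // leave the buffer as we found it
        return Err("too many items for a u16 count");
    }
    out[count_pos..count_pos + 2].copy_from_slice(&(count as u16).to_be_bytes());
    Ok(())
}

/// Compiler-style caller: items come from an owned, mutable collection.
pub fn compile_owned(glyphs: &[u16]) -> Result<Vec<u8>, &'static str> {
    let mut out = Vec::new();
    serialize_u16_array(glyphs.iter().copied(), &mut out)?;
    Ok(out)
}

/// Subsetter-style caller: items are decoded lazily from the source bytes and
/// filtered on the fly, without building an intermediate collection.
pub fn subset_from_bytes(
    src: &[u8],
    keep: impl Fn(u16) -> bool,
) -> Result<Vec<u8>, &'static str> {
    let items = src
        .chunks_exact(2)
        .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
        .filter(|gid| keep(*gid));
    let mut out = Vec::new();
    serialize_u16_array(items, &mut out)?;
    Ok(out)
}
```

Both callers share the same serializer; only the source of the iterator differs.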

Rod · Answer 2 · Wed May 04 2022 03:22:33 GMT+0800 (China Standard Time)

> there is a fundamental tension between the design of a harfbuzz-style subsetter

Can you make that more specific? - I don't disagree, I just think it can be spelled out a little more. IMHO it might be worth starting with examples of specific problems caused by this tension, and then generalizing the concern.

I think, as is perhaps suggested by the comment about repacker, my expected outcome would be that some (most?) things are shared by subsetter and compiler but not all. Hopefully most of the complicated bits :)

Colin Rofls · Answer 3 · Wed May 04 2022 23:23:26 GMT+0800 (China Standard Time)

> If underneath your 'types for compilation' you have a serializer layer that takes iterators to compile the binary objects, then the harfbuzz-style subsetter can feed its own iterators to the same serializer.

Yes, this is what I am picturing: the low level serializer API will have an interface where types write themselves out as bytes, and indicate the positions and 'object ids' of any offset tables.
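A rough sketch of an interface along these lines, assuming 16-bit offsets and a writer that serializes each child as its own object. `TableWriter`, `FontWrite`, `ObjectId`, `OffsetRecord`, and the two example tables are hypothetical names used for illustration only.

```rust
/// Hypothetical identifier for a serialized object.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ObjectId(pub usize);

/// Where an offset lives in the parent's bytes, and which object it targets.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OffsetRecord {
    pub pos: usize,
    pub target: ObjectId,
}

/// The bytes of one object plus its not-yet-resolved offsets.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct ObjectData {
    pub bytes: Vec<u8>,
    pub offsets: Vec<OffsetRecord>,
}

/// Types implement this to write themselves out as bytes.
pub trait FontWrite {
    fn write_into(&self, writer: &mut TableWriter);
}

#[derive(Debug, Default)]
pub struct TableWriter {
    objects: Vec<ObjectData>, // finished objects, indexed by ObjectId
    stack: Vec<ObjectData>,   // objects currently being written
}

impl TableWriter {
    pub fn write_u16(&mut self, value: u16) {
        let current = self.stack.last_mut().expect("no object in progress");
        current.bytes.extend_from_slice(&value.to_be_bytes());
    }

    /// Serialize `child` as its own object, then record a zeroed 16-bit
    /// placeholder in the current object for a later packing step to resolve.
    pub fn write_offset(&mut self, child: &dyn FontWrite) {
        let target = self.add_object(child);
        let current = self.stack.last_mut().expect("no object in progress");
        let pos = current.bytes.len();
        current.offsets.push(OffsetRecord { pos, target });
        current.bytes.extend_from_slice(&[0, 0]);
    }

    pub fn add_object(&mut self, obj: &dyn FontWrite) -> ObjectId {
        self.stack.push(ObjectData::default());
        obj.write_into(self);
        let data = self.stack.pop().expect("pushed above");
        self.objects.push(data);
        ObjectId(self.objects.len() - 1)
    }

    pub fn objects(&self) -> &[ObjectData] {
        &self.objects
    }
}

/// Simplified example tables (not real OpenType layouts).
pub struct ChildTable {
    pub glyphs: Vec<u16>,
}

impl FontWrite for ChildTable {
    fn write_into(&self, writer: &mut TableWriter) {
        writer.write_u16(self.glyphs.len() as u16);
        for gid in &self.glyphs {
            writer.write_u16(*gid);
        }
    }
}

pub struct ParentTable {
    pub format: u16,
    pub child: ChildTable,
}

impl FontWrite for ParentTable {
    fn write_into(&self, writer: &mut TableWriter) {
        writer.write_u16(self.format);
        writer.write_offset(&self.child);
    }
}
```

The writer never decides final positions; it only collects objects and offset records, which leaves ordering and overflow handling to a separate packing step.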

> there is a fundamental tension between the design of a harfbuzz-style subsetter

> Can you make that more specific? - I don't disagree, I just think it can be spelled out a little more. IMHO it might be worth starting with examples of specific problems caused by this tension, and then generalizing the concern.

Yes, fair, this is vague.

The fundamental tension is that a major component of the design of the harfbuzz subsetter is that it doesn't allocate, and it is a requirement of the general purpose compiler that it does. This means we end up with unrelated owned/zerocopy versions of various types, which leads to complicated type signatures if we want to be able to use these interchangeably.

This tension constrains the design space in weird ways. As an example, when using the compilation types, you might want to build up a typed representation of a given table, without serializing it yet. This becomes way trickier if we want to mix the compilation and zerocopy types, since we then need to track the lifetimes everywhere.
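To make the lifetime problem concrete, here is a small sketch, assuming a simplified coverage-like table of big-endian u16 glyph ids. All type names are hypothetical. Once a builder may hold either a borrowed view or owned data, the lifetime parameter spreads to every type that stores it.

```rust
/// Hypothetical zero-copy view: borrows big-endian u16 glyph ids from the font.
pub struct ParsedCoverage<'a> {
    data: &'a [u8],
}

impl<'a> ParsedCoverage<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Self { data }
    }

    pub fn glyphs(&self) -> impl Iterator<Item = u16> + '_ {
        self.data
            .chunks_exact(2)
            .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
    }
}

/// Hypothetical owned type used for compilation.
#[derive(Debug, Clone, PartialEq)]
pub struct OwnedCoverage {
    pub glyphs: Vec<u16>,
}

/// Mixing the two means the enum needs a lifetime parameter...
pub enum AnyCoverage<'a> {
    Parsed(ParsedCoverage<'a>),
    Owned(OwnedCoverage),
}

/// ...and so does every type that stores it, even when it holds owned data.
pub struct LookupBuilder<'a> {
    pub coverage: AnyCoverage<'a>,
}

/// The owned-only builder has no lifetime parameter and can outlive the font.
#[derive(Debug, Clone, PartialEq)]
pub struct OwnedLookupBuilder {
    pub coverage: OwnedCoverage,
}

impl<'a> LookupBuilder<'a> {
    /// Copy any borrowed data so the result no longer depends on `'a`.
    pub fn into_owned(self) -> OwnedLookupBuilder {
        let coverage = match self.coverage {
            AnyCoverage::Parsed(view) => OwnedCoverage {
                glyphs: view.glyphs().collect(),
            },
            AnyCoverage::Owned(owned) => owned,
        };
        OwnedLookupBuilder { coverage }
    }
}
```

`LookupBuilder<'a>` cannot be kept around after the font bytes are dropped, whereas `OwnedLookupBuilder` can, which is the flexibility the compilation types are after.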

To avoid this I spent a bunch of time working on other approaches that serialized everything immediately, but this also led to frustratingly bad UX. As one simple example: when serializing a parsed table, we need to track the bytes used to resolve child offsets, as well as the type information of the child tables. Trying to generate code to handle this was frustrating, since the requirement of passing down the bytes with which to resolve children complicates the API in the general case.

> I think, as is perhaps suggested by the comment about repacker, my expected outcome would be that some (most?) things are shared by subsetter and compiler but not all. Hopefully most of the complicated bits :)

I agree, and I think that designing a serializer/repacker API that is reusable is a much cleaner goal than what I was trying, which was something more like having table builders that could interchangeably use either compilation or zerocopy types.

Rod · Answer 4 · Fri May 06 2022 12:15:11 GMT+0800 (China Standard Time)

I think it might be wise to loop in @garretrieger. He's something of an expert on subsetting and repacking so I'm optimistic if you guys grabbed an hour or two to chat you might come up with some interesting ideas.

Colin Rofls · Answer 5 · Fri May 06 2022 22:20:57 GMT+0800 (China Standard Time)

For the time being I've got a dummy table packer that just fails on overflow, but I will definitely reach out to garret when it's time to get serious. :)