googlefonts / fontc

Wherein we pursue oxidizing (context: https://github.com/googlefonts/oxidize) fontmake.

GDEF glyph classes in IR

cmyr opened this issue

In order to properly generate mark attachment rules, we need to know the GDEF class assignments for glyphs. This information can be specified in various ways:

  • in glyphs.app, it is a property on the individual glyphs.
  • in UFO, it is an optional field in the root lib, public.openTypeCategories
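
For reference, a minimal sketch of what the UFO lib key might look like once loaded, and how its string values line up with the numeric GDEF glyph classes. The glyph names and the helper `gdef_classes_from_lib` are made up for illustration; the category strings and class numbers are the ones from the UFO and OpenType specs.

```python
# Numeric GlyphClassDef values as defined for the OpenType GDEF table.
GDEF_CLASS = {"unassigned": 0, "base": 1, "ligature": 2, "mark": 3, "component": 4}

# Hypothetical contents of a UFO's lib.plist, already parsed into a dict.
lib = {
    "public.openTypeCategories": {
        "A": "base",
        "f_i": "ligature",
        "acutecomb": "mark",
    }
}


def gdef_classes_from_lib(lib):
    """Map glyph name -> numeric GDEF class, skipping unassigned glyphs."""
    categories = lib.get("public.openTypeCategories", {})
    return {
        name: GDEF_CLASS[category]
        for name, category in categories.items()
        if category != "unassigned"
    }


classes = gdef_classes_from_lib(lib)
```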

In addition to these locations, the GDEF classes can also be specified via FEA syntax, and this is the main source of problems. Basically: we would really like to have a single source of truth here. There is also a chicken-egg problem, where for generated features we need to know these class rules.

So: I would like to decide on some standardized behaviour, and I would like to move these classdefs into IR.

  • for glyphs.app sources, the source of truth will always be per-glyph categories.
    • Q: are these always available, or only if explicitly set by the user? Otherwise I assume they are computed from GlyphData.xml?
  • for UFO/designspace sources, the source of truth will be public.openTypeCategories dictionary, if present.

Issues to consider (and my current sense of how to address them)

  • what if a UFO does not have the public.openTypeCategories, but does have explicit classes defined in FEA? we use the categories from the FEA.
  • what if a UFO has both, and they don't match? print a warning and use the lib version.
  • what if UFO doesn't have either? good question. Probably I would synthesize one from GlyphData.xml? Is there a reason this is bad?

the FEA spec says that if the GDEF classes are not stated explicitly then they should be generated based on what glyphs appear in mark/base/lig mark attachment rules, but since we are generating most of these lookups automatically, this really doesn't work for us. We could infer these based on the lookups we generate, but this runs into possible problems if there are non-base glyphs that include an anchor that is used to attach a component, but which are not expected to participate in mark-to-base? @anthrotype mentioned this as a possible situation, although I'm not sure how common it is.

TL;DR:

  • let's add GDEF glyph class defs to IR
  • let's get it "not from FEA" by default, and fall back to getting it from FEA
  • let's use GlyphData.xml to synthesize it if necessary (always for glyphs.app except for manual overrides; only for UFO if no lib key and no GDEF block in FEA)

for glyphs.app sources, the source of truth will always be per-glyph categories.
Q: are these always available, or only if explicitly set by the user? Otherwise I assume they are computed from GlyphData.xml?

the category/subCategory glyph properties are not always available in a .glyphs source file; they are present only if the user has explicitly set them (via the CMD+ALT+I glyph info panel). Otherwise it is assumed that the embedded default GlyphData.xml is consulted (looked up primarily by the glyph name, alternatively by the unicode codepoint associated with the glyph, IIRC).
One can also provide a modified GlyphData.xml as input to fontmake --glyph-data (see #766), which would override (and can be a subset of) the default data file.
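
A rough sketch of that lookup order, under assumptions: the `Glyph` class, the two dictionaries standing in for a parsed GlyphData.xml, and the function `resolve_category` are all hypothetical, and the exact precedence between a custom data file and the codepoint fallback is a guess rather than documented behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Glyph:
    name: str
    unicode: Optional[int] = None
    category: Optional[str] = None  # only set if the user overrode it in the source


# Hypothetical stand-ins for the embedded default GlyphData.xml.
DEFAULT_BY_NAME = {"acutecomb": "Mark", "A": "Letter"}
DEFAULT_BY_UNICODE = {0x0301: "Mark", 0x0041: "Letter"}


def resolve_category(glyph, custom_by_name=None):
    """Explicit override first, then custom data, then default data by name, then by codepoint."""
    if glyph.category is not None:
        return glyph.category
    if custom_by_name and glyph.name in custom_by_name:
        return custom_by_name[glyph.name]
    if glyph.name in DEFAULT_BY_NAME:
        return DEFAULT_BY_NAME[glyph.name]
    if glyph.unicode is not None and glyph.unicode in DEFAULT_BY_UNICODE:
        return DEFAULT_BY_UNICODE[glyph.unicode]
    return None  # unknown glyph: no category can be determined
```

Because the custom file is consulted before the defaults and may be a subset, glyphs it does not mention simply fall through to the default data.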

what if a UFO has both, and they don't match? print a warning and use the lib version.

Currently in ufo2ft, the FEA GlyphClassDefs would always take precedence over the public.openTypeCategories, since it is assumed to be kind of more specific/lower level; no warning is issued, they are checked in order and if FEA is present, the other isn't consulted.

https://github.com/googlefonts/ufo2ft/blob/4d9aca9d0976efdde1913ab0481f27ef33d5b15d/Lib/ufo2ft/featureWriters/baseFeatureWriter.py#L356-L376

Originally only the FEA GlyphClassDefs were available to be defined and some old fonts may use that approach, however I'd expect most fonts nowadays to only define the lib public.openTypeCategories, not the other (let alone both).

what if UFO doesn't have either? good question. Probably I would synthesize one from GlyphData.xml? Is there a reason this is bad?

no, that would not match fontmake and would surprise users of UFO+DS. The GlyphData.xml is very Glyphs.app specific and should only be used for .glyphs sources. The glyph names defined in there are not a standard.
If there are no categories defined, the mark writer should assume that any potential glyph that has attaching anchors is either a base, or a mark, or a ligature (if it has the _1, _2, etc. suffixed anchors).
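
A minimal sketch of that anchor-based fallback. It assumes the usual naming convention where mark attachment anchors start with an underscore (e.g. `_top`) and ligature anchors carry a numeric suffix (e.g. `top_1`); the function name `infer_category` is hypothetical and this is not the ufo2ft implementation.

```python
import re

# Ligature anchors are assumed to look like "top_1", "bottom_2", ...
LIGATURE_ANCHOR = re.compile(r".+_\d+$")


def infer_category(anchor_names):
    """Guess base/mark/ligature from anchor names alone; None if there are no anchors."""
    names = list(anchor_names)
    if not names:
        return None
    # Check marks first: a mark may also carry base-style anchors for mkmk.
    if any(name.startswith("_") for name in names):
        return "mark"
    if any(LIGATURE_ANCHOR.match(name) for name in names):
        return "ligature"
    return "base"
```

Under these assumptions a glyph with no anchors at all is left unclassified, which matches the idea that only glyphs with attaching anchors are candidates for the mark writer.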

this runs into possible problems if there are non-base glyphs that include an anchor that is used to attach a component, but which are not expected to participate in mark-to-base?

potentially there is this risk, but it's not really an issue (worst case we generate more mark rules than one may need), and the font developer has a documented way to ensure that only the selected group of bases, marks and ligatures are considered for mark/mkmk feature generation (by defining a public.openTypeCategories in the lib).
Also, I don't think it's common in a pure ufo+DS workflow (not converted from .glyphs) to have anchors that aren't meant to be used for the mark/mkmk features but only for automatic component alignment, since the latter is another Glyphs.app-only feature as far as I am aware.

let's add GDEF glyph class defs to IR

SGTM

let's get it "not from FEA" by default, and fall back to getting it from FEA

unfortunately, FEA currently would take precedence so we should match that.

let's use GlyphData.xml to synthesize it if necessary (always for glyphs.app except for manual overrides; only for UFO if no lib key and no GDEF block in FEA)

we should only use GlyphData.xml for .glyphs sources

update:

of course, this is much worse than I'd imagined.

The categories used by glyphs.app are not really analogous to the GDEF categories. They work for 'marks', but there really isn't any coherent equivalent to "bases" (as far as I can tell). There is a 'ligature' category, but no 'component' category, which makes sense I guess? This last category is confusing to me in any case, since I would expect most components to also be bases.

Also-also: it looks like glyphs has some heuristic to decide if something is a ligature, and basically anything_separated_by_underbars is considered a ligature? Which is the convention I suppose.

In any case I'm back to not really knowing how to think about this. We only really need to know if something is a mark, I think (we can infer ligature-ness from the anchors themselves) and so now I'm inclined to just adapt the GlyphData.xml data to tell me if something is a mark or not, although we should probably also let individual glyphs override this (as in, if the user has overridden the category via the cmd+alt+i binding)

There is a 'ligature' category, but no 'component' category, which makes sense I guess?

Right. A glyph becomes a component when it is consumed by a ligature substitution so it is purely contextual.

for glyphs.app sources, the source of truth will always be per-glyph categories.
Q: are these always available, or only if explicitly set by the user? Otherwise I assume they are computed from GlyphData.xml?

the category/subCategory glyph properties are not always available in a .glyphs source file; they are present only if the user has explicitly set them (via the CMD+ALT+I glyph info panel). Otherwise it is assumed that the embedded default GlyphData.xml is consulted (looked up primarily by the glyph name, alternatively by the unicode codepoint associated with the glyph, IIRC). One can also provide a modified GlyphData.xml as input to fontmake --glyph-data (see #766), which would override (and can be a subset of) the default data file.

Okay interesting, I hadn't seen that issue.

what if a UFO has both, and they don't match? print a warning and use the lib version.

Currently in ufo2ft, the FEA GlyphClassDefs would always take precedence over the public.openTypeCategories, since it is assumed to be kind of more specific/lower level; no warning is issued, they are checked in order and if FEA is present, the other isn't consulted.

https://github.com/googlefonts/ufo2ft/blob/4d9aca9d0976efdde1913ab0481f27ef33d5b15d/Lib/ufo2ft/featureWriters/baseFeatureWriter.py#L356-L376

Originally only the FEA GlyphClassDefs were available to be defined and some old fonts may use that approach, however I'd expect most fonts nowadays to only define the lib public.openTypeCategories, not the other (let alone both).

okay, good to know. It does seem to me that if they're both present it's probably a mistake of some kind? But this seems like a minor point.

what if UFO doesn't have either? good question. Probably I would synthesize one from GlyphData.xml? Is there a reason this is bad?

no, that would not match fontmake and would surprise users of UFO+DS. The GlyphData.xml is very Glyphs.app specific and should only be used for .glyphs sources. The glyph names defined in there are not a standard. If there are no categories defined, the mark writer should assume that any potential glyph that has attaching anchors is either a base, or a mark, or a ligature (if it has the _1, _2, etc. suffixed anchors).

👍

let's add GDEF glyph class defs to IR

SGTM

let's get it "not from FEA" by default, and fall back to getting it from FEA

unfortunately, FEA currently would take precedence so we should match that.

let's use GlyphData.xml to synthesize it if necessary (always for glyphs.app except for manual overrides; only for UFO if no lib key and no GDEF block in FEA)

we should only use GlyphData.xml for .glyphs sources

Okay so then this will be the situation as I understand it:

  • for ufo+ds sources, we will use whatever is defined in the FEA, falling back to the openTypeCategories lib key.
  • for glyphs.app, there are never explicitly defined GDEF classes, as far as I can tell? There are the category/subcategory fields on individual glyphs, and these influence... something? I imagine they influence the glyphs code that generates mark lookups, and then I imagine that the GDEF classes are defined automatically by whatever is compiling the glyphs FEA code, based on inferring the glyphs from what appears in various kinds of lookups? I could verify this experimentally but it would be nice to have this confirmed otherwise.
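
The ufo+ds precedence in the first bullet could be sketched roughly as follows. This assumes the FEA glyph class definitions have already been parsed into a plain dict by some other step; `resolve_ufo_categories` and its return shape are hypothetical, not an existing fontc or ufo2ft API.

```python
def resolve_ufo_categories(fea_classes, lib):
    """Return (categories, source) for a UFO, preferring FEA over the lib key.

    fea_classes: glyph name -> category parsed from an explicit GDEF block in
                 the features, or None/empty if the FEA defines none.
    lib:         the UFO's parsed lib.plist as a dict.
    """
    if fea_classes:
        # FEA wins outright; the lib key is not consulted and no warning is issued.
        return dict(fea_classes), "fea"
    lib_categories = lib.get("public.openTypeCategories")
    if lib_categories:
        return dict(lib_categories), "lib"
    # Neither is present: the caller has to infer categories from anchors.
    return {}, "inferred"
```

Returning the source alongside the categories is just a convenience for logging; it would also make it easy to add a mismatch warning later if both are present.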

for glyphs.app, there are never explicitly defined GDEF classes, as far as I can tell?

they may be in the features.fea (perhaps less likely than in the UFO case), and when compiled with fontmake at least I'd expect them to similarly take priority, just like they do when compiling from UFOs.
glyphsLib used to actually write out a table GDEF block in the exported UFO's features.fea, but nowadays it is only writing a public.openTypeCategories in the UFO's lib.plist.

interesting, and glyphsLib is just using the glyph category/subcategory to determine the openTypeCategories? Is it using anything else? Is it setting the 'components' group?

As an aside, there also seems to be some special logic in glyphs for inferring whether a glyph is a ligature based on the name. For example, if I create a new glyph with a name like v_e_r_y_u_n_c_o_m_m_o_n_l_i_g, glyphs assigns it to the 'ligature' subcategory, but it does not do so for very_uncommon_lig. It does do this for both a_Abreve and a_Zbreve; the latter suggests it isn't checking that each component is a known glyph name. Is this logic matched in glyphsLib somewhere?

glyphsLib is just using the glyph category/subcategory to determine the openTypeCategories?

https://github.com/googlefonts/glyphsLib/blob/e2ebf5b517d59bec0c9437da3a748c58f2999911/Lib/glyphsLib/builder/features.py#L205

setting the 'components' group?

those are unused/useless, ignore

inferring whether a glyph is a ligature based on the name

I think here https://github.com/googlefonts/glyphsLib/blob/e2ebf5b517d59bec0c9437da3a748c58f2999911/Lib/glyphsLib/glyphdata.py#L216-L242

Okay so per all of this I'm now feeling like moving GlyphData code into fontdrasil is unnecessary/incorrect, and all of that should stay entirely in glyphs2ir.

glyphsLib is just using the glyph category/subcategory to determine the openTypeCategories?

https://github.com/googlefonts/glyphsLib/blob/e2ebf5b517d59bec0c9437da3a748c58f2999911/Lib/glyphsLib/builder/features.py#L205

Interestingly, from that code:

Determining the categories requires anchor propagation or user care to work
as expected, as Glyphs.app also looks at anchors for classification

Which means that... we need to do this after anchor propagation, to be strictly correct, since propagation might add an anchor and so make some glyph into a 'Base'? This also complicates doing propagation in IR.
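
A toy illustration of why the ordering matters. The propagation here is deliberately simplistic (one level deep, composites inherit every non-underscore anchor from their components) and is not how Glyphs.app or glyphsLib actually propagate anchors; all names and data are hypothetical.

```python
def propagate_anchors(glyphs):
    """Toy propagation: composites inherit non-mark anchors from their components."""
    result = {}
    for name, glyph in glyphs.items():
        anchors = set(glyph["anchors"])
        for component in glyph["components"]:
            anchors |= {
                a for a in glyphs[component]["anchors"] if not a.startswith("_")
            }
        result[name] = {"anchors": anchors, "components": list(glyph["components"])}
    return result


def looks_like_base(glyph):
    """A glyph counts as a base here if it has anchors and none are mark anchors."""
    anchors = glyph["anchors"]
    return bool(anchors) and not any(a.startswith("_") for a in anchors)


glyphs = {
    "A": {"anchors": {"top", "bottom"}, "components": []},
    "acutecomb": {"anchors": {"_top", "top"}, "components": []},
    "Aacute": {"anchors": set(), "components": ["A", "acutecomb"]},
}

before = looks_like_base(glyphs["Aacute"])
after = looks_like_base(propagate_anchors(glyphs)["Aacute"])
```

In this toy setup `Aacute` has no anchors of its own, so it only looks like a base once propagation has run; classifying before propagation would miss it.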