Make some internal fields public and allow serialisation of the OwnedFace type

Question

wdanilo opened this issue 2 years ago

Wojciech Daniło commented 2 years ago

Hi! I'm using the OwnedFace type and I have several needs that sound like something more people could use:

I'd like to access the information on where the glyf::Table data starts and where it ends. Would it be possible to either add an accessor that returns a reference to glyf::Table::data or make the field public?
I'm using OwnedFace and I want to generate an SDF representation of glyphs based on the binary data it contains. I don't want to duplicate that data, but unfortunately SelfRefVecFace::data is private. Could we add an accessor for this field, please?
Would it be possible to have serde::Serialize and serde::Deserialize derives for OwnedFace? Of course, this could be put behind a compilation flag. If we have an owned type, it seems that serialising / deserialising it may be a popular use case.

Yevhenii Reizner · Answer 1 · Wed Jul 27 2022 18:30:38 GMT+0800 (China Standard Time)

OwnedFace and SelfRefVecFace are part of the owned_ttf_parser crate, not this crate.
Table data ranges in the original file are not preserved. The closest thing is RawFace::table_records, which is currently private. Why would you even want this? Maybe you're looking for Face::table_data()?
Do you want to (de)serde the whole font? That would not be possible, because ttf-parser doesn't parse anything except the core layout by default, so there is nothing to serialize. Also, a JSON representation of a font could easily reach hundreds of megabytes.

Wojciech Daniło · Answer 2 · Wed Jul 27 2022 18:56:25 GMT+0800 (China Standard Time)

@RazrFalcon thank you so much for the answers! As all of your questions are connected, let me reply as a whole rather than point by point, starting with a short description of my use case.

Use case

For legal reasons, we cannot ship the whole font representation in our app. So what I want to do is parse the font, generate an SDF representation of the chosen glyphs, remove the glyph info from the font's binary data, and serialise the modified binary font data and the SDF glyph atlas to disk (this will be our custom "font file" that we ship with the app). This way, we will have binary data containing helper info (such as the kerning table) plus glyph atlases, but without the exact glyph shapes (with zeroed values in glyf::Table::data).

OwnedFace issue (not really important here)

Right now, I'm converting the Vec<u8> to an OwnedFace with OwnedFace::from_vec, and unfortunately I can't access the data afterwards, as it's kept in a private field, so I need to keep a clone of the original Vec<u8> in order to modify it later. I understand that this is something I should report on the other repo.

This library issues

Regarding "Table data ranges in the original file are not preserved": this is surprising to me. I was sure that all the tables are just slices over the original &[u8], which means that I could use pointer arithmetic to work out where the glyf::Table starts and ends in the original Vec<u8>. I've written some code to remove this data and it seems to work:

    pub fn remove_glyph_data(&mut self) {
        mem::take(&mut self.msdf_font);
        let data_slice = &self.data[..]; // This is original `Vec<u8>`.
        if let Ok(font_face) = ttf::RawFace::from_slice(data_slice, FONT_FACE_NUMBER) {
            if let Some(glyph_table) = font_face.table(ttf::Tag::from_bytes(b"glyf")) {
                let glyph_table_ptr = glyph_table.as_ptr();
                let data_ptr = data_slice.as_ptr();
                // Safety: This is safe, as both pointers refer to the same slice.
                let start_index = unsafe { glyph_table_ptr.offset_from(data_ptr) };
                for offset in 0..glyph_table.len() {
                    self.data[start_index as usize + offset] = 0;
                }
            }
        }
        if let Ok(font_face) = ttf::OwnedFace::from_vec(self.data.clone(), FONT_FACE_NUMBER) {
            self.font_face = font_face;
        }
    }

What I don't like about the code above is the duplicated work. I already have self.font_face: OwnedFace, so if I could access its internal Vec<u8> and the glyf::Table information there, I would not need to construct a RawFace. Does that make sense to you?
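The offset computation in the snippet above can also be written without `unsafe` by comparing plain addresses. The following is only a minimal sketch using the standard library; `subslice_range` is a hypothetical helper name, not part of ttf-parser or owned_ttf_parser.

```rust
use std::ops::Range;

/// Returns the byte range that `inner` occupies inside `outer`,
/// or `None` if `inner` does not lie within `outer`.
/// (Hypothetical helper for illustration only.)
fn subslice_range(outer: &[u8], inner: &[u8]) -> Option<Range<usize>> {
    let outer_start = outer.as_ptr() as usize;
    let inner_start = inner.as_ptr() as usize;
    // Addresses are compared as integers, so no `unsafe` is required.
    let start = inner_start.checked_sub(outer_start)?;
    let end = start.checked_add(inner.len())?;
    if end <= outer.len() {
        Some(start..end)
    } else {
        None
    }
}
```

With such a range in hand, the table bytes could be cleared with `data[range].fill(0)` once the borrow of the table slice has ended.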

Yevhenii Reizner · Answer 3 · Wed Jul 27 2022 19:32:42 GMT+0800 (China Standard Time)

Ok, I understand the idea.

1. I don't think that zeroing the glyf table is enough to avoid licensing issues. I think the OpenType layout tables (GDEF, GSUB, GPOS) are more valuable. But IANAL.
2. Are you aware that glyf is not the only way to store outlines?
3. Are you aware that modern fonts do not use kerning?
4. You might be interested in using fonttools. You can use the ttx utility to convert a font into XML, edit it, and convert it back to ttf. Which is still illegal.
5. Yes, you can use pointer arithmetic to get offsets, but we do not provide any kind of API for that. It's a very rare use case.
6. Also, technically, by zeroing glyf you're making the font invalid, because you also have to update the table's CRC and checksumAdjustment in the head table. But many TrueType parsers, just like ttf-parser, simply ignore CRC.
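For reference, the per-table checksum mentioned in the last point is defined by the OpenType specification as the wrapping sum of the table's big-endian 32-bit words, with the table treated as zero-padded to a multiple of four bytes. A minimal sketch follows; `table_checksum` is a hypothetical name, and updating checksumAdjustment in the head table is a separate step that this sketch does not cover.

```rust
/// Sums `table` as big-endian u32 words with wrapping arithmetic.
/// A trailing partial word is treated as if it were padded with zeros.
/// (Hypothetical helper for illustration only.)
fn table_checksum(table: &[u8]) -> u32 {
    table.chunks(4).fold(0u32, |sum, chunk| {
        let mut word = [0u8; 4];
        word[..chunk.len()].copy_from_slice(chunk);
        sum.wrapping_add(u32::from_be_bytes(word))
    })
}
```

A fully zeroed table therefore has a checksum of 0, which is the value its table record would need to carry for the font to stay consistent.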

Wojciech Daniło · Answer 4 · Wed Jul 27 2022 19:44:14 GMT+0800 (China Standard Time)

@RazrFalcon thanks so much for the super fast answer!

1. Let me explain an important thing here: we are not trying to do anything illegal! We got official information from the font store that if we are able to display glyphs in our app in such a way that the original fonts are not extractable from the app, we can use more liberal licensing. The whole thing is super strange, as they provide a lot of licensing options, including licensing for cloud usage (which we are paying for), for phone usage, even for watch-like devices, but there are no licenses available for standard desktop app installations. You can buy licenses of this font for "developers" who use the font to build apps, but if the original font is extractable from the app, then every user is considered a "developer". If it's not, then we can simply use the cloud license plus a license for our developers. Anyway, what we are doing is not our "invention"; we are following what our font provider told us to do.
2. I was not aware of it! Thank you so much for mentioning it! Would zeroing the glyf, GDEF, GSUB, and GPOS tables then be enough to remove the info about glyph shapes?
3. I was not aware of that either. What is interesting is that so far we've been using DejavuSansMono in some parts of our app and it uses kerning. Taking the rare opportunity to ask someone who knows much more about fonts than I do, would you be so kind as to tell me if there are any other tables that we should consider when laying out glyphs on the screen, assuming that we want to support the English language only for now? (We have a custom WebGL font renderer.)
4. Thanks for mentioning it! Regarding legal issues, I hope I covered them in point 1; however, I'm still very thankful that you care about that and guide me away from doing something stupid. I really appreciate it. Regarding doing this with fonttools (via XML) versus the method that I described in my previous post: would that have any advantages? Zeroing (or removing) the data as in my code above seems like a good automated solution to me.
5. My only question here is whether it would be possible to have an API to get a reference to glyf::Table::data and to Face::raw_face. If these (immutable) references could be exposed, a lot of my code would become simpler, and I believe that it would not introduce any downsides for other users. What do you think of that?
6. Again, thank you so much for mentioning it. We will be parsing the updated binary format only with ttf-parser, so it will work in our case.

Yevhenii Reizner · Answer 5 · Wed Jul 27 2022 20:22:33 GMT+0800 (China Standard Time)

1. Good. I'm glad that this part is covered.
2. Well... where do I even begin. We have glyf, glyf + gvar (Apple variable fonts), CFF (OpenType/Adobe fonts) and CFF2 (OpenType variable fonts). Those tables contain font outlines. ttf-parser supports all of them. Not to mention that there are also bitmap and SVG font tables out there.
3. As for GDEF, GSUB, GPOS, they contain glyph layout metadata. Specifically, complex substitution and positioning info (think Arabic scripts). They are required to do proper text shaping. Not sure what your app does right now. You might be interested in rustybuzz. But for Latin-only scripts you should be fine without it.
4. Well, fonttools provides proper subsetting, meaning you can use it to remove unused glyphs from the font. Basically a font optimization pass.
5. I think it's a very specific use case that can easily be implemented on the caller side.

You can also try ttf-explorer. It would help you understand the font format layout better.
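To see which of the outline tables named above a given file actually carries, the sfnt table directory can be scanned directly: a 12-byte header with the table count at bytes 4..6, followed by 16-byte records (tag, checksum, offset, length). The sketch below assumes a single-font file (not a TTC collection); `OUTLINE_TAGS`, `OutlineRecord`, and `outline_records` are hypothetical names, and bitmap or SVG table tags would have to be added to the list separately.

```rust
/// Tags of the outline-related tables named in this thread.
const OUTLINE_TAGS: [&[u8; 4]; 4] = [b"glyf", b"gvar", b"CFF ", b"CFF2"];

/// Location of one outline-related table (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct OutlineRecord {
    tag: [u8; 4],
    offset: u32,
    length: u32,
}

/// Reads a big-endian u32 at `at`, or `None` if out of bounds.
fn read_u32(data: &[u8], at: usize) -> Option<u32> {
    let b = data.get(at..at + 4)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Scans the table directory of a single-font sfnt file and returns
/// the records whose tag is listed in `OUTLINE_TAGS`.
fn outline_records(font: &[u8]) -> Vec<OutlineRecord> {
    let mut found = Vec::new();
    let num_tables = match font.get(4..6) {
        Some(b) => u16::from_be_bytes([b[0], b[1]]) as usize,
        None => return found,
    };
    for i in 0..num_tables {
        let base = 12 + i * 16; // 12-byte header, 16 bytes per record
        let tag = match font.get(base..base + 4) {
            Some(t) => [t[0], t[1], t[2], t[3]],
            None => break, // truncated directory
        };
        let (offset, length) = match (read_u32(font, base + 8), read_u32(font, base + 12)) {
            (Some(o), Some(l)) => (o, l),
            _ => break,
        };
        if OUTLINE_TAGS.iter().any(|t| **t == tag) {
            found.push(OutlineRecord { tag, offset, length });
        }
    }
    found
}
```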

Wojciech Daniło · Answer 6 · Wed Jul 27 2022 20:28:26 GMT+0800 (China Standard Time)

@RazrFalcon Again, thanks for the fast reply!

1. :)
2. Thank you so much for this info!
3. Gotcha! :) We were considering rustybuzz, but for now it seems like overkill for our use case. Thanks for mentioning it!
4. Makes sense, thanks!
5. Without these changes I have to parse the data twice (constructing both OwnedFace and RawFace). It is not a huge overhead; however, it would just be nicer to be able to access these fields. If I understand your reply correctly, you are not willing to expose this information in the API, am I correct? If so, then yes, I will just keep parsing the font data twice. I was just hoping that exposing these accessors would not be a big issue, as it does not negatively affect other users and might actually enable them to implement some more exotic use cases (such as this one).

Thank you once again for all the replies and your help!

Yevhenii Reizner · Answer 7 · Wed Jul 27 2022 20:34:34 GMT+0800 (China Standard Time)

Glad to hear.

Ok, I will think about exposing Face::raw_face.

Wojciech Daniło · Answer 8 · Wed Jul 27 2022 20:35:09 GMT+0800 (China Standard Time)

I would be really thankful if you did!! ❤️

Yevhenii Reizner · Answer 9 · Sun Sep 18 2022 14:56:24 GMT+0800 (China Standard Time)

You can access this data using:

if let Some(record) = face.raw_face().table_records.into_iter().find(|r| r.tag == Tag::from_bytes(b"glyf")) {
    println!("{:?}", record); // TableRecord { tag: Tag(glyf), check_sum: 3391114546, offset: 28780, length: 105582 }
}
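
With the `offset` and `length` values shown in the `TableRecord` output above, the glyf bytes could be cleared directly in the original buffer instead of going through pointer arithmetic. This is only a minimal sketch: `zero_table` is a hypothetical helper, and it assumes the record's offset is measured from the start of the font data, as it is in the sfnt table directory.

```rust
/// Zeroes `length` bytes of `data` starting at `offset`.
/// Returns `false` and leaves `data` untouched if the range is out of bounds.
/// (Hypothetical helper for illustration only.)
fn zero_table(data: &mut [u8], offset: u32, length: u32) -> bool {
    let start = offset as usize;
    let end = match start.checked_add(length as usize) {
        Some(end) => end,
        None => return false,
    };
    match data.get_mut(start..end) {
        Some(bytes) => {
            bytes.fill(0);
            true
        }
        None => false,
    }
}
```

For the record printed above, the call would look like `zero_table(&mut data, record.offset, record.length)`, where `data` is the caller's own copy of the font bytes.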