mapbox / DEPRECATED-mapbox-gl

Issue-only repository for discussion of Mapbox GL (DEPRECATED)

Home page: https://www.mapbox.com/mapbox-gl/


Complex Text Rendering

mikemorris opened this issue

What is the state of text rendering in Mapbox GL?

We currently do not render scripts that require bidirectional support or complex text shaping correctly in mapbox-gl-native or mapbox-gl-js. This ticket will track adding proper support to both projects.

Broken Arabic labels example courtesy of @mushon

What's missing?

We need to add the following functionality for proper text rendering:

  • Unicode Bidi Algorithm to cut labels into logical segments and flip the display order of RTL (right-to-left) text.
    • Necessary for proper rendering of RTL scripts and mixed script labels containing RTL scripts interspersed with LTR text runs like numerals.
  • Complex text shaping
    • Necessary for proper display of scripts where adjacent glyphs should be transformed into new glyphs or rendered as combined glyphs or ligatures.

In terms of scripts affected by this, Hebrew requires bidi support, Indic scripts like Hindi can require complex text shaping, and Arabic requires both bidi and complex text shaping support. Additionally, implementing the Unicode line-breaking algorithm should improve support for cases like smarter line breaking in Chinese.
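UAX #14, the Unicode line-breaking algorithm, assigns a line-break class to every character. For Chinese, the rule that matters most is that a line may break between two ideographs even when there is no space. The C++ sketch below covers only that rule plus breaking after spaces. It assumes only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) and ignores punctuation rules such as "no line may start with a closing bracket". The function names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Basic CJK Unified Ideographs block only; a real implementation would use
// the full Line_Break property data from UAX #14.
bool isCjkIdeograph(char32_t c) {
    return c >= 0x4E00 && c <= 0x9FFF;
}

// Returns every index i at which a line may break *before* text[i].
// Simplified subset: break after a space, or between two ideographs.
std::vector<std::size_t> breakOpportunities(const std::u32string& text) {
    std::vector<std::size_t> breaks;
    for (std::size_t i = 1; i < text.size(); ++i) {
        const char32_t prev = text[i - 1];
        const char32_t cur = text[i];
        if (prev == U' ' && cur != U' ') {
            breaks.push_back(i);
        } else if (isCjkIdeograph(prev) && isCjkIdeograph(cur)) {
            breaks.push_back(i);
        }
    }
    return breaks;
}
```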

How do we currently handle fontstack fallbacks?

Currently, the Protobuf-encoded "glyph tiles" we create with node-fontnik are composited "fontstacks": glyphs missing from fonts higher in the stack are filled in with glyphs from fonts further down the stack. For example, the combined "Helvetica, Arial Unicode" fontstack produces rendered text with per-glyph fallbacks from Helvetica to Arial Unicode.

Fontstack                    Coverage
"Helvetica"                  Latin
"Arial Unicode"              Latin, Arabic
"Helvetica, Arial Unicode"   Latin (from Helvetica), Arabic (from Arial Unicode)

Why will this not work for complex text shaping?

Because shaping tables are specific to a font file, to apply shaping properly we will need to work exclusively with glyphs from a single font. Instead of using "fontstack" glyph tiles, we will need tiles which contain all the glyphs in a given range for a single font. This approach should also limit glyph atlas duplication for multiple fontstacks with a common fallback.
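The C++ sketch below shows the current per-glyph fallback in concrete terms. Font and compositeFallback are hypothetical names. The real composited glyph tiles are built by node-fontnik, not by this code. Each code point is resolved on its own, so adjacent characters of one word can come from different fonts:

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// A font reduced to what matters here: a name and the code points it covers.
struct Font {
    std::string name;
    std::set<char32_t> coverage;
};

// Per-glyph fallback, as a composited fontstack behaves: every code point is
// taken from the first font in the stack that has it, independently of its
// neighbours. Returns the chosen font index per code point, or -1 if none.
std::vector<int> compositeFallback(const std::vector<Font>& stack,
                                   const std::u32string& text) {
    std::vector<int> chosen;
    for (char32_t c : text) {
        int pick = -1;
        for (std::size_t f = 0; f < stack.size(); ++f) {
            if (stack[f].coverage.count(c)) {
                pick = static_cast<int>(f);
                break;
            }
        }
        chosen.push_back(pick);
    }
    return chosen;
}
```

That per-character independence is harmless for Latin text. It is exactly what prevents applying a single font's shaping tables to a whole word.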

How will we do this?

We will first need to segment each label into text runs with the Unicode bidi algorithm (for example, splitting words into individual segments, and splitting Arabic text segments from numerical segments). Then, for each segment, we will try each font in the fontstack in order. We shape the segment with that single font's shaping tables, then check, using a glyph coverage file, whether the font can render every character in the shaped result. If coverage is incomplete, we fall back to the next font in the stack, stopping at the first font that matches.

(It's possible we could check glyph coverage first, but the necessary glyphs may change after shaping, and the glyph coverage check would have to be repeated. We should test performance to determine whether a possibly inaccurate initial coverage check is faster than redundant shaping passes for fonts lacking glyph coverage.)
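The proposed per-run fallback could look like the C++ sketch below. Everything in it is an assumption for illustration. The `shape` member stands in for HarfBuzz driven by one font's GSUB/GPOS tables. The `coverage` member stands in for a coverage.json file. Shaped output is modelled as code points rather than glyph indices. The key difference from per-glyph fallback is that a whole run switches fonts together:

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-font data: a shaping step and a coverage set.
struct ShapingFont {
    std::string name;
    std::function<std::u32string(const std::u32string&)> shape;
    std::set<char32_t> coverage;
};

struct RunResult {
    std::size_t fontIndex;  // which font in the stack was used
    std::u32string shaped;  // that font's shaped output for the run
};

// Shape the run with each font in turn and keep the first font that can
// render every glyph of its *own* shaped output.
std::optional<RunResult> shapeRunWithFallback(const std::vector<ShapingFont>& stack,
                                              const std::u32string& run) {
    for (std::size_t f = 0; f < stack.size(); ++f) {
        std::u32string shaped = stack[f].shape(run);
        bool covered = true;
        for (char32_t g : shaped) {
            if (!stack[f].coverage.count(g)) {
                covered = false;
                break;
            }
        }
        if (covered) return RunResult{f, shaped};
    }
    return std::nullopt;  // no font in the stack covers this run
}
```

Suppose the shaper is the identity and Open Sans is treated as lacking é (a hypothetical). Then a stack of Open Sans and Arial Unicode resolves résumé to Arial Unicode, mirroring the example that follows.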

Example

For the fontstack "Open Sans, Arial Unicode", shaping résumé with Open Sans/gsub.sfnt changes no glyphs. Do all characters in résumé exist in Open Sans.coverage.json? If not (for example, if é is missing), reshape with Arial Unicode/gsub.sfnt, then check whether all characters in résumé exist in Arial Unicode.coverage.json.

Once a font with matching coverage has been determined, we can request glyph tiles from a single font containing the necessary glyphs, like Arial Unicode Regular/0-255.pbf.
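The example filename above uses 256-code-point ranges. Under that convention, working out which tiles to request from the chosen font is simple integer arithmetic. Here is a C++ sketch with hypothetical helper names:

```cpp
#include <set>
#include <string>

// Each glyph tile covers 256 consecutive code points, so a code point's tile
// starts at the nearest lower multiple of 256 (0-255, 256-511, ...).
std::string glyphRangeName(char32_t codePoint) {
    const unsigned long start = (static_cast<unsigned long>(codePoint) / 256) * 256;
    return std::to_string(start) + "-" + std::to_string(start + 255);
}

// Collect the distinct range tiles a shaped run needs from its chosen font,
// e.g. "Arial Unicode Regular/0-255.pbf".
std::set<std::string> requiredGlyphTiles(const std::string& font,
                                         const std::u32string& shaped) {
    std::set<std::string> tiles;
    for (char32_t c : shaped) {
        tiles.insert(font + "/" + glyphRangeName(c) + ".pbf");
    }
    return tiles;
}
```

Shaping can move text into other ranges. If shaping produces Unicode presentation forms, Arabic letters at U+0600 to U+06FF (range 1536-1791) become characters around U+FE70 to U+FEFF (range 65024-65279). In that case, ranges should be computed from the shaped output, not the input.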

How will we get/use these "shaping tables"?

Shaping tables are contained in font files as the GSUB (glyph substitution), GPOS (glyph positioning) and kern (kerning) tables, which can be read with the FreeType function FT_Load_Sfnt_Table. We will need to extract these tables from uploaded font files, then have the client request them through an API. We've started work on extracting shaping tables, but it isn't quite functional yet.

To use these shaping tables, we will need to pass them into HarfBuzz for mapbox-gl-native, or an emscripten port for mapbox-gl-js. I'm not sure if HarfBuzz currently has an interface for reading raw shaping tables (it generally works with full font files). If this interface doesn't currently exist, we'll need to add it.
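FreeType's FT_Load_Sfnt_Table performs the table lookup for an already-loaded face. The C++ sketch below shows what "extracting" a table involves without any library. It copies a table such as GSUB, GPOS or kern out of an in-memory font file by walking the sfnt table directory. The function name is hypothetical, and the sketch skips checksum verification and font collections. On the HarfBuzz side, hb_face_create_for_tables builds a face from a callback that returns individual tables by tag. That may be the raw-table interface wondered about above.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Read a big-endian unsigned integer of n bytes (sfnt data is big-endian).
static uint32_t readBE(const std::vector<uint8_t>& b, std::size_t pos, int n) {
    uint32_t v = 0;
    for (int i = 0; i < n; ++i) v = (v << 8) | b[pos + i];
    return v;
}

// The sfnt header is 12 bytes, with numTables at offset 4. It is followed by
// 16-byte table records: tag, checksum, offset, length.
std::optional<std::vector<uint8_t>> extractTable(const std::vector<uint8_t>& font,
                                                 const std::string& tag) {
    if (tag.size() != 4 || font.size() < 12) return std::nullopt;
    const uint32_t numTables = readBE(font, 4, 2);
    for (uint32_t i = 0; i < numTables; ++i) {
        const std::size_t rec = 12 + 16 * static_cast<std::size_t>(i);
        if (rec + 16 > font.size()) return std::nullopt;  // truncated directory
        if (std::string(font.begin() + rec, font.begin() + rec + 4) != tag) continue;
        const uint32_t offset = readBE(font, rec + 8, 4);
        const uint32_t length = readBE(font, rec + 12, 4);
        if (static_cast<uint64_t>(offset) + length > font.size()) return std::nullopt;
        return std::vector<uint8_t>(font.begin() + offset, font.begin() + offset + length);
    }
    return std::nullopt;  // tag not present
}
```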


/cc @mapbox/gl

Made some initial progress on integrating HarfBuzz in mapbox-gl-native in https://github.com/mapbox/mapbox-gl-native/compare/harfbuzz, but the biggest stumbling block I've hit so far has been the requirement to use glyph indices (as opposed to Unicode code points) for layout.

From the ICU docs (HarfBuzz follows the same pattern here):

Since many of the contextual forms, ligatures, and split characters needed to display complex text do not have Unicode code points, they can only be referred to by their glyph indices. Because of this, the LayoutEngine's output is a list of glyph indices. This means that the output must be displayed using an interface where the characters are specified by glyph indices rather than code points.
http://userguide.icu-project.org/layoutengine

This is complicated by our current SDF spec only tagging glyphs by char code, not glyph index, and will be another consideration in how a v2 SDF spec will need to be structured.
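The C++ sketch below shows one way a v2 format could key glyphs. It assumes each font gets a numeric ID and indexes the atlas by (font, glyph index) instead of by character code. GlyphKey, GlyphKeyHash, GlyphSDF and GlyphAtlas are hypothetical names, not part of any existing Mapbox spec.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// After shaping, a glyph is identified by its font and its glyph index in
// that font: ligatures and contextual forms may have no code point at all.
struct GlyphKey {
    uint32_t fontId;
    uint32_t glyphIndex;
    bool operator==(const GlyphKey& o) const {
        return fontId == o.fontId && glyphIndex == o.glyphIndex;
    }
};

struct GlyphKeyHash {
    std::size_t operator()(const GlyphKey& k) const {
        // Pack both 32-bit fields into one 64-bit value and hash that.
        const uint64_t packed = (static_cast<uint64_t>(k.fontId) << 32) | k.glyphIndex;
        return std::hash<uint64_t>{}(packed);
    }
};

// Placeholder for the SDF bitmap and metrics stored per glyph.
struct GlyphSDF {
    int advance;
};

using GlyphAtlas = std::unordered_map<GlyphKey, GlyphSDF, GlyphKeyHash>;
```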

As noted, Hebrew labels are rendered backward (flipped), for example in Tel Aviv: http://localhost:9966/#19/32.09430/34.78352
Someone who reads Hebrew looked at my mapbox-gl-js map and said "they're nonsense" until I pointed out that the labels are just backward. I added a few name:en values via OSM, which hides some of this for me. But for anyone using Mapbox in Israel, this issue makes matching the map to street signs (the actual wayfinding) an awkward exercise, assuming they notice the pattern in the first place. Is there a solution I can implement now?

@mikemorris is currently working on fixing this

@mikemorris do let us know if you need some help with testing as this is definitely a pressing issue for many of us. Thanks!

A little extra help would certainly be appreciated @mushon! I'll continue to post updates here as I get a better idea of how to break this project down into concrete chunks to build and test.

@jimmont My initial work in mapbox/mapbox-gl-js#1841 may be an option for you. It only handles bidirectional text (not complex shaping), but it sounds like that might be all you need currently?

We are anxiously waiting for any update on the status of this issue.

This issue is fatal for us (we want to use this with CJK).
Is there a temporary workaround?

@epsg3857 Can you explain how/which CJK scripts are affected by the lack of bidirectional text or complex shaping? Are you referring to the line-breaking issue originally reported in mapbox/mapbox-gl-native#1223, vertical label support or something else?

@epsg3857, mapbox/mapbox-gl-native#5077 was incorrectly linked to this issue. This ticket tracks complex font shaping and right-to-left text support, not CJK. The issue you’re running into is mapbox/mapbox-gl-native#1681, possibly exacerbated by mapbox/mapbox-gl-native#1444.

I see. Understood.

Any updates? It's been a while :-(

I second that.

Just as an FYI, this is not some "nice to have" feature, it is a very serious bug. Right now every Mapbox GL map, no matter what label language it uses, shows a lot of meaningless reversed text all around the Middle East and North Africa, because many OSM labels there don't have an English name.

And I must add, I haven't found myself having to beg a company for RTL support since the early days of Macromedia Flash. Somehow Mapbox always seemed to me like a company with a different image of the world, and a different vision of how different cultures and places should be represented on the web. It is quite frustrating and frankly insulting to see how an issue that should be a blocker for any beta release, and that affects hundreds of millions of users, is continuously disregarded. While reversing our exotic letters for our cities and streets on every GL map makes them equally meaningless to you, for us this is the image of technological colonialism. On a map.

Forgive my harsh words, but I hope this helps you finally see this long ignored blind spot and address this issue more urgently.

With otherwise utter admiration and respect,

Mushon Zer-Aviv
Mushon.com | Shual.com | @mushon


Hi Mushon,

We absolutely understand that this is an important issue, both in terms of parity with other technology and being able to represent all languages equally. As this ticket's intro lays out and the many referencing issues explain, though, this is a very difficult issue. Flash and desktop applications can take advantage of existing C++ text-shaping logic, as well as fonts on disk. Mapbox GL JS loads fonts incrementally (yay), which means that this sort of problem is incredibly, deeply, months-long-difficult hard (boo).

We understand that this is a big issue - many Mapboxers aren't native English speakers and we want our maps to be a tool for equality and understanding. Unfortunately, this is, simply, incredibly hard, and thus it's taken an extremely long time to even get a prototype off the ground. If you know any tricks or want to connect us with people who know a shorter way to a solution, contributions or connections would be incredibly appreciated. But please, for the time being, understand that this isn't disregard or colonialism or etc., it's just unvarnished "difficulty".

Tom

Now, can I fix this on Android? Is there a way?

@tmcw as someone part of a very large crowd interested in using multilingual GL maps for the 1 billion in India who don't use English, how can someone external to Mapbox put in some effort to try and find a solution? Is there a list of open tasks that folks can start working on toward finding a solution?

@mikemorris, @tmcw Is the mapbox-gl-js repo the appropriate place to solve this problem, and is the mapbox/mapbox-gl-js/pull/1841/ PR a good place to start? Does the contributing doc have all the criteria (it seems to cover setting up a development environment for new devs, less so the conventions in the codebase)? I assume the only thing missing from it (for a PR) is passing tests?

Is anyone actively working on solving this? I was going to start taking a look, but I'm not familiar with the details at all and don't want to waste time.

mapbox/mapbox-gl-native#6057 has a simple proof of concept of right-to-left mirroring with naïve bidirectional support. To be clear, it does not implement contextual forms or ligatures, but it does appear to be sufficient for rendering Hebrew correctly.
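The C++ sketch below makes "naïve bidirectional support" concrete with a logical-to-visual reordering for a single RTL line. It is not the Unicode Bidirectional Algorithm (UAX #9), and it has three known limits:

- The character classes are hard-coded block ranges.
- ASCII digits are treated as LTR letters. The full algorithm gives them weak types, so results differ in some cases, such as digits separated by spaces.
- There is no shaping, so it only helps scripts like Hebrew.

The names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

enum class Dir { L, R, Neutral };

// Rough classification: Hebrew and Arabic blocks are RTL, ASCII letters and
// digits are treated as LTR, and everything else is neutral.
Dir classify(char32_t c) {
    if ((c >= 0x0590 && c <= 0x05FF) || (c >= 0x0600 && c <= 0x06FF)) return Dir::R;
    if ((c >= U'0' && c <= U'9') || (c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z'))
        return Dir::L;
    return Dir::Neutral;
}

// Reorder one line with an RTL base direction from logical to visual order.
std::u32string naiveRtlToVisual(const std::u32string& text) {
    const std::size_t n = text.size();
    std::vector<Dir> dirs(n);
    for (std::size_t i = 0; i < n; ++i) dirs[i] = classify(text[i]);

    // Neutrals between two LTR characters become LTR; all others take the
    // RTL base direction (text edges count as RTL).
    std::vector<Dir> resolved = dirs;
    for (std::size_t i = 0; i < n; ++i) {
        if (dirs[i] != Dir::Neutral) continue;
        Dir before = Dir::R, after = Dir::R;
        for (std::size_t j = i; j-- > 0;) {
            if (dirs[j] != Dir::Neutral) { before = dirs[j]; break; }
        }
        for (std::size_t j = i + 1; j < n; ++j) {
            if (dirs[j] != Dir::Neutral) { after = dirs[j]; break; }
        }
        resolved[i] = (before == Dir::L && after == Dir::L) ? Dir::L : Dir::R;
    }

    // Split into runs of equal resolved direction.
    std::vector<std::u32string> runs;
    std::vector<Dir> runDirs;
    for (std::size_t i = 0; i < n; ++i) {
        if (i == 0 || resolved[i] != resolved[i - 1]) {
            runs.emplace_back();
            runDirs.push_back(resolved[i]);
        }
        runs.back().push_back(text[i]);
    }

    // Reverse the order of runs; reverse characters only inside RTL runs.
    std::u32string visual;
    for (std::size_t r = runs.size(); r-- > 0;) {
        std::u32string run = runs[r];
        if (runDirs[r] == Dir::R) std::reverse(run.begin(), run.end());
        visual += run;
    }
    return visual;
}
```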

Can a speaker of Arabic, Persian, Urdu, etc. comment on whether mirroring alone would at least improve readability, even if it remains painful to read this unshaped text? I’m hopeful we could at least land a change like this as a stopgap, along the same lines as mapbox/mapbox-gl-js#1841.

It won’t make much of a difference for Arabic, the text will be equally unreadable, and it makes absolutely no difference for Indic scripts as they are left-to-right scripts.

I agree with @khaledhosny. This fix might only address simpler RTL cases like Hebrew, which is not enough.
I am not in a position to help much with code or even to assess how complex fixing this might be, but I do know Mapbox is not the only company rolling out GL-based vector maps. Both big companies like Google and small ones like Mapzen provide GL vector maps that respect complex RTL scripts. While I don't think Google will be interested in sharing techniques, @mapzen, which shares the Open Data / Open Source creed, might.
Here's a comparison of how Google (👍), Mapzen (👍) and Mapbox (👎) render the label for Beirut:
[image: Beirut label comparison]

@dphiffer do you or anyone on your team at @mapzen care to share how you solved this issue in your Tangram rendering engine?

My colleague and I tried GNU FriBidi in the source code, in the function GlyphSet::getShaping() in "src/mbgl/text/glyph_set.cpp". We first convert the label string into correctly ordered bidi text and then use the corrected string from then on.
We added a function to convert raw label text to bidi-corrected text (credit goes to XBMC/Kodi):

#include <fribidi.h>
#include <cstdio>
#include <cstdlib>
#include <string>

bool logicalToVisualBiDi(const std::u32string& stringSrc, std::u32string& stringDst, FriBidiCharType base /*= FRIBIDI_TYPE_LTR*/, const bool failOnBadString /*= false*/)
  {
    stringDst.clear();

    const size_t srcLen = stringSrc.length();
    if (srcLen == 0)
      return true;

    stringDst.reserve(srcLen);
    size_t lineStart = 0;

    // libfribidi is not threadsafe, so make sure we make it so
    // CSingleLock lock(m_critSectionFriBiDi);
    do
    {
      size_t lineEnd = stringSrc.find('\n', lineStart);
      if (lineEnd >= srcLen) // equal to 'lineEnd == std::string::npos'
        lineEnd = srcLen;
      else
        lineEnd++; // include '\n'

      const size_t lineLen = lineEnd - lineStart;

      FriBidiChar* visual = (FriBidiChar*) malloc((lineLen + 1) * sizeof(FriBidiChar));
      if (visual == NULL)
      {
        // allocation failed, so there is nothing to free here
        printf("%s: can't allocate memory\n", __func__);
        return false;
      }

      bool bidiFailed = false;
      FriBidiCharType baseCopy = base; // preserve same value for all lines, required because fribidi_log2vis will modify parameter value
      if (fribidi_log2vis((const FriBidiChar*)(stringSrc.c_str() + lineStart), lineLen, &baseCopy, visual, NULL, NULL, NULL))
      {
        // Removes bidirectional marks
        const int newLen = fribidi_remove_bidi_marks(visual, lineLen, NULL, NULL, NULL);
        if (newLen > 0)
          stringDst.append((const char32_t*)visual, (size_t)newLen);
        else if (newLen < 0)
          bidiFailed = failOnBadString;
      }
      else
        bidiFailed = failOnBadString;

      free(visual);

      if (bidiFailed)
        return false;

      lineStart = lineEnd;
    } while (lineStart < srcLen);

    return !stringDst.empty();
  }

Testing the code below and printing the converted string to the console shows that FriBidi works correctly:

    std::u32string b;
    // the following string is considered a complex string (RTL and LTR texts in one string)
    std::u32string testStr = U"hi سلام";


    logicalToVisualBiDi(testStr, b, FRIBIDI_PAR_RTL, false);

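    // convert the u32string to a UTF-8 std::string in order to be printable
    // (requires <codecvt> and <locale>; std::wstring_convert is deprecated since C++17)
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    std::string b2 = conv.to_bytes(b);

    printf("%s\n", b2.c_str()); // works just fine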

However, using the code below:

const Shaping GlyphSet::getShaping(const std::u32string &string, const float maxWidth,
                                    const float lineHeight, const float horizontalAlign,
                                    const float verticalAlign, const float justify,
                                    const float spacing, const Point<float> &translate) const {
    Shaping shaping(translate.x * 24, translate.y * 24, string);

    // the y offset *should* be part of the font metadata
    const int32_t yOffset = -17;

    float x = 0;
    const float y = yOffset;

    // the bidi corrected string
    std::u32string bidiCorrected;
    logicalToVisualBiDi(string, bidiCorrected, FRIBIDI_PAR_RTL, false);

    // Loop through all characters of this label and shape.
    for (uint32_t chr : bidiCorrected) {
        auto it = sdfs.find(chr);
        if (it != sdfs.end()) {
            shaping.positionedGlyphs.emplace_back(chr, x, y);
            x += it->second.metrics.advance + spacing;
        }
    }

    if (shaping.positionedGlyphs.empty())
        return shaping;

    lineWrap(shaping, lineHeight, maxWidth, horizontalAlign, verticalAlign, justify, translate);

    return shaping;
}

This results in all labels on the map being empty; even Latin characters vanish!

I tried converting a simple three-character string to see whether the characters are actually being converted to something else (another character with a different numerical value):

std::u32string bidiCorrected;
std::u32string testStr = U"ساa";
logicalToVisualBiDi(testStr, bidiCorrected, FRIBIDI_PAR_RTL, false);


std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
std::string b2 = conv.to_bytes(bidiCorrected);
printf("%u, %u, %u, ---- %u, %u, %u\n",
       (unsigned)bidiCorrected[0], (unsigned)bidiCorrected[1], (unsigned)bidiCorrected[2],
       (unsigned)testStr[0], (unsigned)testStr[1], (unsigned)testStr[2]);

97, 65166, 65203, ---- 1587, 1575, 97
As expected, the Persian characters are converted into their contextual forms (65166 = U+FE8E and 65203 = U+FEB3, both in the Arabic Presentation Forms-B block), so the letters connect in the correct order and the correct form.

The characters in testStr, numbered in order:
[image]

The characters in the bidiCorrected string, numbered in order:
[image]

As you can see, FriBidi converts the complex text just fine, but when I use the converted string in the rest of the function, Persian characters in labels are not shown, while Latin characters are fine.

My guess is that somewhere, the needed glyphs are inaccessible and thus the Persian characters will not be shown.

If only we could fix the missing glyphs problem, I think it's done.

Note that FriBiDi does primitive shaping: it does not cover many languages using the Arabic script (e.g. Sindhi), does not handle glyph positioning (so vowel mark positioning will be poor to unreadable), fails with many fonts (e.g. Noto Nastaliq Urdu), and still does not handle any of the Indic scripts.

Proper text layout will involve mainly handling bidirectional text (e.g. using FriBiDi without its shaping part), and shaping (with HarfBuzz being essentially the only viable freely licensed library that does it correctly).

So what do you suggest? Where does HarfBuzz actually come into play?

The general process is usually:

  • Determine bidi run boundaries and order (using FriBiDi or another bidi implementation).
  • Determine script run boundaries.
  • Determine font, language or other style run boundaries (e.g. if you are doing styled text).
  • Split the text into runs of characters that have the same direction, script, language and font.
  • Pass the runs to HarfBuzz for shaping, and get back glyph indices and positions from HarfBuzz and use them to draw the text.
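The "split into runs" step above can be sketched in C++ as follows. It assumes the bidi level, script and font of every character have already been computed by the earlier steps. CharProps, Run and itemize are hypothetical names. Each resulting run would then be shaped on its own, for example by filling an hb_buffer_t and calling hb_shape in HarfBuzz.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Per-character properties produced by the earlier steps.
struct CharProps {
    int bidiLevel;       // odd = RTL, even = LTR
    std::string script;  // e.g. an ISO 15924 tag such as "Arab" or "Latn"
    int fontIndex;       // font chosen for this character
    bool operator==(const CharProps& o) const {
        return bidiLevel == o.bidiLevel && script == o.script && fontIndex == o.fontIndex;
    }
};

struct Run {
    std::size_t start;
    std::size_t length;
    CharProps props;
};

// Split the text into maximal runs whose characters share all properties.
std::vector<Run> itemize(const std::vector<CharProps>& props) {
    std::vector<Run> runs;
    for (std::size_t i = 0; i < props.size(); ++i) {
        if (runs.empty() || !(runs.back().props == props[i])) {
            runs.push_back(Run{i, 1, props[i]});
        } else {
            ++runs.back().length;
        }
    }
    return runs;
}
```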

From earlier replies, I think the problem here is that the APIs used for text drawing take only text strings, not glyph indices. This is really a deal breaker: no complex text layout can be done with such APIs (unless they do the above processing internally, and apparently they don't).

If someone is looking for a C library that does the above (either to use or for inspiration), please check out Raqm.

Brett from Mapzen here. As @mushon points out, Mapzen's Tangram renderer does support complex text shaping in both native/C++ (https://github.com/tangrams/tangram-es) and web/JS (https://github.com/tangrams/tangram) variants, though the implementation is quite different in each.

Tangram ES uses a combination of HarfBuzz, ICU, and FreeType for its shaping pipeline, with the final glyphs rendered using Signed Distance Fields (SDF) (the latter is not specific to complex text shaping but is a similarity with Mapbox GL). Tangram loads the font files and renders SDFs for glyphs at run-time, rather than doing a server-side preprocessing step (I believe Mapbox GL preprocesses into its own format). The shaping logic is consolidated in the Alfons (https://github.com/hjanetzek/alfons) library, and @hjanetzek and @karimnaaji have done the bulk of the work on this process -- I am sure they would be happy to answer any questions or point to more specifics.

Tangram JS uses a simpler method, relying on the browser's Canvas element to render text strings that are then used as WebGL textures. The benefit of this is that we get all the text shaping support already in the browser "for free", and can use any browser font rather than requiring a custom format. The downside is we have greater constraints on curved text and general quality (we're not using SDF though we could). Different set of trade-offs :) Having seen the work that went into our C++ code (where well-established libraries for much of this already exist), I definitely sympathize with the complexity of implementing the whole pipeline "from scratch" in JS (therefore we would also consider this approach but not in the near future).

Thanks @bcamper that's a very generous overview. I hope the team here finds it helpful. I wonder indeed how they see it. @tmcw @mikemorris

Thanks @bcamper! I'll leave the in-depth technical rundown to @mikemorris, from my perspective:

[screenshot: Tangram]

[screenshot: Mapbox]

  • The lack of curved text in Tangram is a cartographic bummer; in my opinion a significant one, but others might think differently.
  • I think Canvas rendering in-browser is a solid option - afaik browsers have pretty robust rendering support. I can see this working as:
    • If the characters in the text are entirely non-shaped, we use the current algorithm
    • If there are shaped characters, we change the text placement algorithm to be basically "~30 degree tolerance, and then straight line on the tangent"
  • Canvas can write from webfonts, but distributing webfonts is potentially different legally from distributing SDFs, which are derived rather than raw. Since Mapbox ingests and distributes user content, we'd have to get that lawyer-checked.
  • This'd mean "two separate systems" for rendering fonts, which, if the Canvas technique is fairly simple, would probably be fine.

I wonder indeed how they see it.

I certainly appreciate @bcamper's input, and want to turn it into a practical result! And hope you trust that I and others are doing so, and assume good intentions on mine and others parts. Turning intentions into results is tough, and rather thankless. Or, well, favless.

Agreed, the lack of curved text is a significant bummer :) But I think this can be improved to reasonable results with Canvas text by breaking the label into multiple segments (potentially per character, or small groups of characters), taking care of course not to break text shaping. One option is a simple Unicode scan that limits text containing characters outside a "simple Latin" whitelist to a simpler path, either with no curving or only breaking on word boundaries (if that is ok for all shaped text? I am not sure). Anyway, this is what we'll be working on soon for Tangram, so we'll see how well it works! And I'll be interested to see what you come up with if you adopt some Canvas rendering for Mapbox GL as well.
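The segmentation idea above can be sketched in C++ under two stated assumptions. First, the "simple Latin" whitelist is U+0000 to U+017F. Second, breaking shaped text at spaces does not damage shaping, which, as noted, is not certain for every script. The names are hypothetical, and bidi ordering of the segments is not handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Assumed whitelist: Basic Latin, Latin-1 Supplement, Latin Extended-A.
bool isSimple(char32_t c) { return c <= 0x017F; }

// Split a label into pieces that can be placed independently along a curved
// line: single characters when the whole label is simple, otherwise whole
// space-separated words so no shaping context is broken inside a word.
std::vector<std::u32string> curveSegments(const std::u32string& label) {
    bool simple = true;
    for (char32_t c : label) simple = simple && isSimple(c);

    std::vector<std::u32string> segments;
    if (simple) {
        for (char32_t c : label) segments.push_back(std::u32string(1, c));
        return segments;
    }
    std::u32string word;
    for (char32_t c : label) {
        if (c == U' ') {
            if (!word.empty()) segments.push_back(word);
            segments.push_back(U" ");
            word.clear();
        } else {
            word.push_back(c);
        }
    }
    if (!word.empty()) segments.push_back(word);
    return segments;
}
```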

It's critical for us to have properly scaling text as you zoom, so even if we use the Canvas hack for text shaping, we will have to dynamically generate SDFs from the rendered labels (unlike Tangram). This can get very tricky quickly, but worth considering.
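Generating an SDF from a Canvas-rendered label amounts to computing, for each pixel, the distance to the nearest edge. The C++ sketch below does this by brute force. It assumes the label has already been rasterized and thresholded into a boolean coverage mask. signedDistanceField is a hypothetical name. A production version would use a linear-time Euclidean distance transform instead of this quadratic scan.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Positive inside the glyph, negative outside, in pixels, measured to the
// nearest pixel of the opposite state. O((w*h)^2): illustration only.
std::vector<float> signedDistanceField(const std::vector<bool>& inside,
                                       int width, int height) {
    std::vector<float> sdf(inside.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const bool in = inside[y * width + x];
            float best = std::numeric_limits<float>::infinity();
            for (int v = 0; v < height; ++v) {
                for (int u = 0; u < width; ++u) {
                    if (inside[v * width + u] == in) continue;  // same state
                    const float d = std::hypot(float(u - x), float(v - y));
                    if (d < best) best = d;
                }
            }
            sdf[y * width + x] = in ? best : -best;
        }
    }
    return sdf;
}
```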

@tmcw


First of all, thank you for everything you're doing and for your responsiveness here. (not thankless anymore)
Second, I really 👍 your comment. (so technically, not favless anymore either)
But seriously, I definitely appreciate the work that is constantly coming out of @mapbox and I admire your professional and social ethics as a company. We all have blind spots; in this case yours was the RTL/complex scripts bug, which was put on the back burner for quite some time in favor of other (complex, impressive, important and admirable) features. My impression is that your responses here are 100% sincere and committed, and so are mine. It's undoubtedly a hairy problem, and it's heartwarming to see how caring and generous the responses here are, from you, from some of your users and even from people working on possibly competing technologies. So if anything, I think you should feel encouraged by the responses here.

I for one think this thread is inspiring. Keep it up.

@mushon for what it's worth, Mapbox GL JS used to have support for Complex Text Layout, including Arabic and Hebrew in a very early development version. Back then, it worked by parsing the style on the server, and requesting style-specific vector tiles that contained text shaping information processed on the server. However, this setup requires too many server-side components to work (and a massive increase in CDN load) and renders one of the advantages of the GL ecosystem (switching styles without redownloading data) moot, so we removed it.

Can't we have a simple text shaping functionality for now? Curved text and proper text scaling while zooming is not "that" critical, critical is to have simple text shaping so the labels will be readable.

I second @Arman92's comment. While research and development on proper solutions goes on, we need an interim solution that shows RTL text correctly now. Curved text may need more months of work. I also think more developers should pay attention to this issue, which means inviting others to this thread and asking them to help. I remember using Mapnik almost 8 years ago, when it had the same issues. I contacted Artem and he released a new version that showed RTL script correctly. As I remember, Artem did it in less than 1 or 2 weeks. I believe this is not a hard technological barrier; it is only a matter of focusing on a specific big bug.

I can't believe this issue is still not fixed after 1 YEAR!
I'll start a new map-based app in the coming week, and given that we can't even have labels on the map, I'm going to have to choose Google Maps over Mapbox despite my wish...


Unfortunately, it looks like RTL languages are not very important to the Mapbox team!

@RezaOruji It is not true that the Mapbox team does not care about RTL languages. It's just that implementing fully compatible text rendering in a WebGL context is so damn hard. Even in 2016, after over two decades of OpenGL, rendering text is not easy, since OpenGL can only draw triangles and lines. You can check out these posts:
https://www.mapbox.com/blog/text-signed-distance-fields/
https://www.mapbox.com/blog/placing-labels/

I'm rather optimistic that the Mapbox geniuses will get this problem solved. Just be patient and be kind to them.

@jingsam I know that solving this issue is really a challenge, but I wish someone from Mapbox would report the current progress and, if possible, give an estimated date for fixing this issue.

Hey all -- thanks for the continued feedback, and for your patience. We know that this is a critical feature for many people. I want to reiterate that the length of time it's taking is a reflection of the difficulty of the problem, not because we don't think it's an important feature.

I have some good news to report: we've hired an additional developer whose first and primary responsibility will be to improve support for complex text rendering across all the Mapbox GL SDKs. While we can't make concrete estimates on availability, you can expect to see development work commencing in November.

In other good news, I'm the new developer! I'm really excited to get started working with the excellent Mapbox team and also learning from all of you out there. @mikemorris's summary of this issue still describes our overall strategy. I'm starting by looking for a useful subset of features we can get out quickly, and I'm currently focused on these two:

  • Applying the Bidirectional Algorithm (or a subset of it) to labels. This at least gets glyphs in the right order, if not shaped correctly. We believe this helps a lot for languages like Hebrew.
  • Performing Arabic shaping using the Unicode "presentation forms" for the initial/medial/final variants of Arabic characters. This approach is attractive as a stop-gap because it doesn't require any significant changes to our glyph rendering, but it would only work for languages that use the Arabic script.
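The presentation-forms approach in the second bullet can be sketched in C++ for a three-letter subset: alef, beh and seen. Each letter takes its isolated, initial, medial or final form depending on whether its neighbours join to it. Right-joining letters such as alef never connect to the following letter. This is an illustration only, with these limits:

- The table is tiny.
- Transparent marks (harakat) and lam-alef ligatures are ignored.
- The input is in logical order; bidi reordering happens separately.

For "سا" it yields U+FEB3 U+FE8E, the same values (65203 and 65166) that FriBidi produced in the test earlier in this thread.

```cpp
#include <cstddef>
#include <map>
#include <string>

enum class Joining { None, Right, Dual };

// Presentation forms (Arabic Presentation Forms-B); 0 means "no such form".
struct Forms {
    Joining type;
    char32_t isolated, finalForm, initial, medial;
};

const std::map<char32_t, Forms>& formTable() {
    static const std::map<char32_t, Forms> table = {
        {0x0627, {Joining::Right, 0xFE8D, 0xFE8E, 0, 0}},           // alef
        {0x0628, {Joining::Dual, 0xFE8F, 0xFE90, 0xFE91, 0xFE92}},  // beh
        {0x0633, {Joining::Dual, 0xFEB1, 0xFEB2, 0xFEB3, 0xFEB4}},  // seen
    };
    return table;
}

Joining joiningOf(char32_t c) {
    auto it = formTable().find(c);
    return it == formTable().end() ? Joining::None : it->second.type;
}

// Replace each known letter (logical order) with its contextual form.
std::u32string toPresentationForms(const std::u32string& text) {
    std::u32string out = text;
    for (std::size_t i = 0; i < text.size(); ++i) {
        auto it = formTable().find(text[i]);
        if (it == formTable().end()) continue;  // leave unknown characters as-is
        const Forms& f = it->second;
        // Joins backward only if the previous letter can join forward.
        const bool joinsPrev = i > 0 && joiningOf(text[i - 1]) == Joining::Dual;
        // Joins forward only if this letter is dual-joining and a joining letter follows.
        const bool joinsNext = f.type == Joining::Dual && i + 1 < text.size() &&
                               joiningOf(text[i + 1]) != Joining::None;
        if (joinsPrev && joinsNext) out[i] = f.medial;
        else if (joinsPrev) out[i] = f.finalForm;
        else if (joinsNext) out[i] = f.initial;
        else out[i] = f.isolated;
    }
    return out;
}
```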

On the gl-native side, we are still looking at using ICU and HarfBuzz. Our primary concern is making sure they don't increase the SDK size too much and making sure they're performant.

On the gl-js side, we're still searching for reliable libraries we could use. We're also considering porting functionality from C/C++ libraries, or using Canvas to render labels. Any tips very welcome!

We've just merged our first PR for this issue on the gl-native side: mapbox/mapbox-gl-native#6984

Our first step was to use ICU for bidirectional text support and Arabic text shaping. Here's a screenshot of my wife's home town before:

Mashhad Before

... and after:

Mashhad After

One limitation of the current implementation is that it doesn't correctly handle line breaks in labels that combine LTR and RTL text (mapbox/mapbox-gl-native#7112). We plan to address that soon.

I'm currently working on porting these changes to mapbox-gl-js using Emscripten. So far, the results are promising -- it has not been too difficult to get ICU to run inside the browser, but I'm still working on slimming down the resulting javascript bundle.

After these changes are complete, I plan to turn to using HarfBuzz for the "full solution" (which will include support for Indic scripts, as well as better typography support for Latin scripts).

Please reach out to me with questions or comments!

  • Support bidirectional text and Arabic shaping in mapbox-gl-native
  • Support bidirectional text and Arabic shaping in mapbox-gl-js
  • Improved line breaking support for diglossic labels
  • Use HarfBuzz for complete complex text shaping support

@ChrisLoer we're very happy to see this.
Do you have an estimate of when we can expect this to become production-ready?

@mushon The current gl-native changes should go out in the Android SDK 4.3.0 and iOS SDK 3.5.0 releases. The timing of those releases depends on several other features, but the target is in the next few months. The gl-js changes aren't ready to merge yet, but I'm hopeful that within a few weeks I'll be able to merge them, at which point we'll know which release they'll go into.

As for the HarfBuzz support for Indic text, all I can say for sure is that I'll start working on it as soon as the current round of gl-js changes go in. So by the end of the year I should at least be able to provide a better estimate.

I've merged the gl-native fix for line-breaking diglossic text (mapbox/mapbox-gl-native#7112), and I'm continuing to work on porting the gl-native changes to gl-js.

mapbox/mapbox-gl-js#3758 contains an implementation of Arabic shaping and bidirectional layout for gl-js. We're not ready to merge it yet as we work through the performance implications, but if you're interested in seeing our progress or providing feedback, please take a look!


@ChrisLoer it's been more than a month since the last update on this issue. Can you provide an update on where it stands and what's remaining?

Also, relating to performance: is there a way to get some solution out there, then iterate and improve performance?
Perhaps this should be open for discussion, as quite a few people and projects are waiting for this fix (it's been more than a year since this issue was created). All projects that consider the Middle East important are pretty much prevented from relying on mapbox-gl. How much of a problem that is for Mapbox, I'm not sure, but I do think the priority of this issue should be re-evaluated.

@knigal For gl-js, we've settled on the idea of loading the support as a plugin (as a way to get the functionality out there as we still keep working to improve the performance). The changes and documentation are ready, we're just finalizing the relatively minor issue of how to name the plugin (mapbox-gl-arabic-text is the leading contender right now). This should go in very soon and be available in our February release of gl-js.

For gl-native, the changes are still waiting to go out as part of the Android SDK 4.3.0 and iOS SDK 3.5.0 releases, still targeted for the early months of this year.

If you’re interested in trying out this functionality ahead of time on a mobile or desktop platform, check out our instructions for building the SDKs yourself:

https://github.com/mapbox/mapbox-gl-native/tree/master/platform/android#contributing-to-the-sdk
https://github.com/mapbox/mapbox-gl-native/blob/master/platform/ios/INSTALL.md
https://github.com/mapbox/mapbox-gl-native/blob/master/platform/macos/INSTALL.md

Please file any issues you see in the mapbox-gl-native repository.

@ChrisLoer perhaps your team could consider mapbox-gl-rtl-text as the plugin name if it isn't already. Save a few letters.

@1ec5 @ChrisLoer how can we test the update to gl-js (that's slated to come in February)?

Thanks for this update and the effort as well. We can now map plans for 2017.


Q: @ChrisLoer Will the upcoming release correctly display Hebrew in addition to Arabic? If so, mapbox-gl-rtl-text would be a better name (since mapbox-gl-arabic-text implies Arabic-only).

BTW, only mapbox-gl.js is relevant to my projects, so I'm adding a +1 to @jimmont's request for a way to preview the .js plugin. I'd be happy to try it and post feedback.

Closing this ticket as part of an effort to merge this repo into the mapbox-gl-js repo. Work on this project is nearing completion. Status updates and conversation will continue at mapbox/mapbox-gl-js#3708

The initial phase of support for right-to-left and Arabic script is now complete.

The primary tracking issue for the remaining complex text challenges is now: mapbox/mapbox-gl-native#7774

Great!!!

Any consideration of using Graphite2 to handle this on top of FreeType (FT)?

@johnnybegood7 Our assumption is that if we were adding support for fonts that use Graphite, it would be through the HarfBuzz wrapper of Graphite (see discussion in mapbox/mapbox-gl-native#7774).

Was this problem solved for the Android SDK? :)

@3bo0o0odee We integrated ICU to support bidirectional text layout and Arabic text shaping with mapbox/mapbox-gl-native#6984 which is available in the Android SDK 5.x series. The remaining complex text work is ongoing and tracked on mapbox/mapbox-gl-native#7774.

Is there anybody who could write up a step-by-step fix for Android? Can we use Mapbox on Android for Persian or for Arabic-speaking countries?

@MahdiAstanei Upgrading Mapbox to the latest released version (currently 5.1) will include this fix. You don't have to do any special configuration.

Thanks for making progress there, but what about supporting this in Mapbox Studio and supporting the static (bitmap) tiles? We really can't design like this… @ChrisLoer can you give us an update?

@mushon The plan right now is that Studio support won't come until we've integrated RTL text into the core library. We are exploring using WebAssembly as a way to integrate more "native" code into GL JS; any solution we come up with there will probably include RTL text. I know that "just enable the plugin in Studio" would be a more immediate solution for you, and that's still an open discussion, but it's not our current plan. An alternative short-term solution we've discussed is having Studio preview raster tiles generated by api-gl, which would include the shaping support.

When you say static (bitmap) tiles, do you mean tiles generated by api-gl? They should already be fixed.

@ChrisLoer I would really appreciate at least a browser plugin that would give us something to work with in Studio until you implement a more robust solution.
As for the bitmap tiles, you are right: they are processed correctly both for static images and for Leaflet. It's just that the static image interface is inconsistent, because its preview currently doesn't match what it actually generates.