Complex Text Rendering

Question

mikemorris opened this issue 9 years ago · comments

What is the state of text rendering in Mapbox GL?

We currently do not render scripts that require bidirectional support or complex text shaping correctly in mapbox-gl-native or mapbox-gl-js. This ticket will track adding proper support to both projects.

What's missing?

We need to add the following functionality for proper text rendering:

- Unicode Bidi Algorithm to cut labels into logical segments and flip the display order of RTL (right-to-left) text.
  - Necessary for proper rendering of RTL scripts and mixed-script labels containing RTL scripts interspersed with LTR text runs like numerals.
- Complex text shaping
  - Necessary for proper display of scripts where adjacent glyphs should be transformed into new glyphs or rendered as combined glyphs or ligatures.

In terms of scripts affected by this, Hebrew requires bidi support, Indic scripts like Hindi can require complex text shaping, and Arabic requires both bidi and complex text shaping support. Additionally, implementing the Unicode line-breaking algorithm should improve support for cases like smarter line breaking in Chinese.

How do we currently handle fontstack fallbacks?

Currently, the Protobuf-encoded "glyph tiles" we create with node-fontnik contain a composited "fontstack": glyphs missing from fonts higher in the stack are filled in by glyphs from fonts further down the stack. We therefore end up with a combined "Helvetica, Arial Unicode" fontstack with per-glyph fallbacks in rendered text.

Fontstack	Coverage
"Helvetica"	Latin
"Arial Unicode"	Latin, Arabic
"Helvetica, Arial Unicode"	Helvetica Latin, Arial Unicode Arabic

Why will this not work for complex text shaping?

Because shaping tables are specific to a font file, to apply shaping properly we will need to work exclusively with glyphs from a single font. Instead of using "fontstack" glyph tiles, we will need tiles which contain all the glyphs in a given range for a single font. This approach should also limit glyph atlas duplication for multiple fontstacks with a common fallback.
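
To make the contrast concrete, here is a minimal sketch of the per-glyph compositing that the current fontstack tiles perform. It is an illustration only, not node-fontnik's code: `FontCoverage` and `compositeFontstack` are hypothetical names, and each font is reduced to the set of code points it covers.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a font: its name plus the code points it covers.
struct FontCoverage {
    std::string name;
    std::set<char32_t> codePoints;
};

// Build a composited fontstack: for every code point covered anywhere in the
// stack, record the first (highest-priority) font that provides it.
std::map<char32_t, std::string> compositeFontstack(const std::vector<FontCoverage>& stack) {
    std::map<char32_t, std::string> composite;
    for (const FontCoverage& font : stack) {
        for (char32_t cp : font.codePoints) {
            // emplace leaves the entry untouched when an earlier font already claimed cp.
            composite.emplace(cp, font.name);
            assert(composite.count(cp) == 1);  // exactly one owner per code point
        }
    }
    return composite;
}
```

With a "Helvetica, Arial Unicode" stack, Latin code points resolve to Helvetica and Arabic ones to Arial Unicode, so a single label can mix glyphs from both fonts. That mixing is what font-specific shaping tables cannot tolerate.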

How will we do this?

We will first need to segment each label into text runs (splitting words into individual segments, and splitting Arabic text segments from numerical segments for example) with the Unicode bidi algorithm. Then, for each segment, we will attempt, with each font in the fontstack until a match is found, to shape the text segment with a single font's shaping table and check whether all characters in the shaped result can be rendered by that font (using a glyph coverage file). If coverage is incomplete, we will fall back to the next font in the stack.

(It's possible we could check glyph coverage first, but the necessary glyphs may change after shaping, and the glyph coverage check would have to be repeated. We should test performance to determine whether a possibly inaccurate initial coverage check is faster than redundant shaping passes for fonts lacking glyph coverage.)
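
The shape-then-check loop described above could look roughly like this. It is a sketch under strong assumptions: real GSUB/GPOS shaping is modelled as a one-to-one substitution map, and `Font`, `shapeRun` and `selectFont` are hypothetical names rather than existing Mapbox GL APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified font.
struct Font {
    std::string name;
    std::set<char32_t> coverage;                 // glyphs the font can render
    std::map<char32_t, char32_t> substitutions;  // stand-in for shaping tables
};

// Apply the font's substitutions to one text run.
std::u32string shapeRun(const Font& font, const std::u32string& run) {
    std::u32string shaped;
    for (char32_t cp : run) {
        auto it = font.substitutions.find(cp);
        shaped.push_back(it == font.substitutions.end() ? cp : it->second);
    }
    return shaped;
}

// Return the index of the first font whose shaped output is fully covered,
// or -1 when no font in the stack can render the whole run.
int selectFont(const std::vector<Font>& stack, const std::u32string& run) {
    assert(!run.empty());  // a run contains at least one character
    for (std::size_t i = 0; i < stack.size(); ++i) {
        const std::u32string shaped = shapeRun(stack[i], run);
        bool covered = true;
        for (char32_t cp : shaped) {
            if (stack[i].coverage.count(cp) == 0) {
                covered = false;
                break;
            }
        }
        if (covered) return static_cast<int>(i);
    }
    return -1;
}
```

Note that coverage is checked on the shaped output, not on the input, which is why the check cannot simply be done once up front.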

Example

For the fontstack "Open Sans, Arial Unicode":
- Shape résumé with Open Sans/gsub.sfnt (no glyphs change).
- Do all characters in résumé exist in Open Sans.coverage.json?
- If not (say é is missing), reshape with Arial Unicode/gsub.sfnt, then check whether all characters in résumé exist in Arial Unicode.coverage.json.

Once a font with matching coverage has been determined, we can request glyph tiles from a single font containing the necessary glyphs, like Arial Unicode Regular/0-255.pbf.
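
As a small illustration of how a client might work out which tile to request, the sketch below maps a code point to its range name. It assumes 256-wide ranges, inferred from the "0-255.pbf" example above; `glyphRangeFor` is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>

// Name of the 256-wide glyph range containing a code point,
// e.g. "0-255" for 'a'. The range width is an assumption.
std::string glyphRangeFor(char32_t codePoint) {
    assert(codePoint <= 0x10FFFF);  // must lie inside the Unicode code space
    const unsigned long start = (static_cast<unsigned long>(codePoint) / 256) * 256;
    return std::to_string(start) + "-" + std::to_string(start + 255);
}
```

For the Arabic letter seen (U+0633) this gives "1536-1791", so the request would be for a tile such as Arial Unicode Regular/1536-1791.pbf.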

How will we get/use these "shaping tables"?

Shaping tables are contained in font files as GSUB (glyph substitution), GPOS (glyph positioning) and KERN (kerning) tables, which can be read by the FreeType function FT_Load_Sfnt_Table. We will need to extract these tables from uploaded font files, then request them from the client through an API. We've started work on extracting shaping tables but it isn't quite functional yet.

To use these shaping tables, we will need to pass them into HarfBuzz for mapbox-gl-native, or an emscripten port for mapbox-gl-js. I'm not sure if HarfBuzz currently has an interface for reading raw shaping tables (it generally works with full font files). If this interface doesn't currently exist, we'll need to add it.

Resources

Universal

http://www.unicode.org/Public/8.0.0/ucd/
Unicode Bidirectional Algorithm http://www.unicode.org/reports/tr9/
Unicode Line Breaking Algorithm http://www.unicode.org/reports/tr14/
Unicode Text Segmentation http://www.unicode.org/reports/tr29/
Unicode test data http://www.unicode.org/reports/tr41/tr41-17.html#Tests14

C++

Unicode Bidi Reference Sample http://unicode.org/Public/PROGRAMS/BidiReferenceCpp/v26/
HarfBuzz API Design https://mail.gnome.org/archives/gtk-i18n-list/2009-August/msg00025.html
Mozilla HarfBuzz integration https://bugzilla.mozilla.org/show_bug.cgi?id=449292
http://site.icu-project.org/

JavaScript

https://github.com/twitter/twitter-cldr-js#handling-bidirectional-text

/cc @mapbox/gl

Mushon Zer-Aviv commented 9 years ago

👍

PikaHacker commented 8 years ago

Great!!!

Justin Miller · Answer 1 · Thu Dec 10 2015 05:46:38 GMT+0800 (China Standard Time)

Awesome summary @mikemorris.

Mike Morris · Answer 2 · Thu Apr 14 2016 05:09:02 GMT+0800 (China Standard Time)

Made some initial progress on integrating HarfBuzz in mapbox-gl-native in https://github.com/mapbox/mapbox-gl-native/compare/harfbuzz, but the biggest stumbling block I've hit so far has been the requirement for using glyph indices (as opposed to Unicode code points) for layout.

From the ICU docs (but HarfBuzz shares the same pattern here):

Since many of the contextual forms, ligatures, and split characters needed to display complex text do not have Unicode code points, they can only be referred to by their glyph indices. Because of this, the LayoutEngine's output is a list of glyph indices. This means that the output must be displayed using an interface where the characters are specified by glyph indices rather than code points.
http://userguide.icu-project.org/layoutengine

This is complicated by our current SDF spec only tagging glyphs by char code, not glyph index, and will be another consideration in how a v2 SDF spec will need to be structured.
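
One way to picture the change is a glyph store keyed by font and glyph index instead of by char code. This is only a sketch of the idea, not a proposal for the v2 SDF spec; `GlyphKey`, `GlyphMetrics` and `GlyphStore` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// A glyph is identified by the font it came from plus the font-specific glyph
// index returned by the shaper, not by a Unicode code point.
using GlyphKey = std::pair<std::string, std::uint32_t>;

struct GlyphMetrics {
    float advance;
};

class GlyphStore {
public:
    void insert(const std::string& font, std::uint32_t glyphIndex, GlyphMetrics metrics) {
        assert(!font.empty());  // glyph indices are meaningless without their font
        glyphs_[GlyphKey(font, glyphIndex)] = metrics;
    }

    // Returns nullptr when the glyph has not been loaded yet.
    const GlyphMetrics* find(const std::string& font, std::uint32_t glyphIndex) const {
        auto it = glyphs_.find(GlyphKey(font, glyphIndex));
        return it == glyphs_.end() ? nullptr : &it->second;
    }

private:
    std::map<GlyphKey, GlyphMetrics> glyphs_;
};
```

The font has to be part of the key because the same glyph index refers to unrelated glyphs in different fonts.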

Jim Montgomery · Answer 3 · Tue Apr 19 2016 14:13:56 GMT+0800 (China Standard Time)

As noted, Hebrew labels are backward/flipped, in Tel Aviv for example: http://localhost:9966/#19/32.09430/34.78352
Someone who understands the language looked at my mapbox-gl-js map and said "they're nonsense" until I pointed out that the labels are just backward. I added a few name:en values via OSM, which hides some of this for me, but for anyone using Mapbox to get around in Israel, this issue makes matching the map to signs (the actual wayfinding) an awkward exercise, assuming (and hoping) they note the pattern in the first place. Is there a solution I can implement now?

Konstantin Käfer · Answer 4 · Tue Apr 19 2016 16:44:09 GMT+0800 (China Standard Time)

@mikemorris is currently working on fixing this

Mushon Zer-Aviv · Answer 5 · Tue Apr 19 2016 17:17:58 GMT+0800 (China Standard Time)

@mikemorris do let us know if you need some help with testing as this is definitely a pressing issue for many of us. Thanks!

Mike Morris · Answer 6 · Tue Apr 19 2016 23:31:21 GMT+0800 (China Standard Time)

A little extra help would certainly be appreciated @mushon! I'll continue to post updates here as I get a better idea of how to break this project down into concrete chunks to build and test.

@jimmont My initial work in mapbox/mapbox-gl-js#1841 may be an option for you. It only handles bidirectional text (not complex shaping), but it sounds like that might be all you need currently?

Arman Safikhani · Answer 7 · Thu Jun 02 2016 14:15:06 GMT+0800 (China Standard Time)

We are anxiously waiting for any update of the issue status.

Deleted user · Answer 8 · Tue Jun 07 2016 20:51:23 GMT+0800 (China Standard Time)

This issue is fatal (for use with CJK).
Is there a temporary workaround?

Mike Morris · Answer 9 · Tue Jun 07 2016 23:45:26 GMT+0800 (China Standard Time)

@epsg3857 Can you explain how/which CJK scripts are affected by the lack of bidirectional text or complex shaping? Are you referring to the line-breaking issue originally reported in mapbox/mapbox-gl-native#1223, vertical label support or something else?

Minh Nguyễn · Answer 10 · Wed Jun 08 2016 02:41:55 GMT+0800 (China Standard Time)

@epsg3857, mapbox/mapbox-gl-native#5077 was incorrectly linked to this issue. This ticket tracks complex font shaping and right-to-left text support, not CJK. The issue you’re running into is mapbox/mapbox-gl-native#1681, possibly exacerbated by mapbox/mapbox-gl-native#1444.

Deleted user · Answer 11 · Wed Jun 08 2016 06:38:53 GMT+0800 (China Standard Time)

I see.
understood.

Arman Safikhani · Answer 12 · Wed Aug 17 2016 12:26:02 GMT+0800 (China Standard Time)

Any updates? It's been a while :-(

Mushon Zer-Aviv · Answer 13 · Wed Aug 17 2016 13:00:19 GMT+0800 (China Standard Time)

I second that.

Just as an FYI, this is not some "nice to have" feature, it is a very serious bug. Right now every Mapbox GL map, no matter what label language it uses, shows a lot of meaningless reversed text all around the Middle East and North Africa, as many OSM labels don't have an English name.

And I must add, I haven't found myself having to beg a company for RTL support since the early days of Macromedia Flash. Somehow Mapbox always seemed to me like a company with a different image of the world, and a different vision of how different cultures and places should be represented on the web. It is quite frustrating and frankly insulting to see how an issue that should be a blocker for any beta release and affects hundreds of millions of users is continuously disregarded. While reversing our exotic letters for our cities and streets on every GL map makes them equally meaningless to you, for us this is the image of technological colonialism. On a map.

Forgive my harsh words, but I hope this helps you finally see this long ignored blind spot and address this issue more urgently.

With otherwise utter admiration and respect,

Mushon Zer-Aviv
Mushon.com | Shual.com | @mushon

Tom MacWright · Answer 14 · Wed Aug 17 2016 13:25:47 GMT+0800 (China Standard Time)

Hi Mushon,

We absolutely understand how this is an important issue, both in terms of parity with other technology and being able to represent all languages equally. I hope that, as this ticket's intro lays out and the many referencing issues explain: this is a very difficult issue. Flash and desktop applications can take advantage of existing C++ text-shaping logic, as well as fonts-on-disk. Mapbox GL JS loads fonts incrementally (yay) which means that this sort of problem is incredibly, deeply, months-long-difficult hard (boo).

We understand that this is a big issue - many Mapboxers aren't native English speakers and we want our maps to be a tool for equality and understanding. Unfortunately, this is, simply, incredibly hard, and thus it's taken an extremely long time to even get a prototype off the ground. If you know any tricks or want to connect us with people who know a shorter way to a solution, contributions or connections would be incredibly appreciated. But please, for the time being, understand that this isn't disregard or colonialism or etc., it's just unvarnished "difficulty".

Tom

Mahdi astanei · Answer 15 · Wed Aug 17 2016 19:20:57 GMT+0800 (China Standard Time)

Now... can I fix this on Android? Is there a way?

Arun Ganesh · Answer 16 · Wed Aug 17 2016 19:38:29 GMT+0800 (China Standard Time)

@tmcw as someone part of a very large crowd interested in using multilingual GL maps for the 1 billion in India who don't use English, how can someone external to Mapbox put in some effort to try and find a solution? Is there a list of open tasks that folks can start working on toward finding a solution?

Jim Montgomery · Answer 17 · Wed Aug 17 2016 23:37:02 GMT+0800 (China Standard Time)

@mikemorris, @tmcw Is the appropriate place to solve this problem in the mapbox-gl-js repo and is the mapbox/mapbox-gl-js/pull/1841/ PR a good place to start? Does the contributing doc have all the criteria (seems to cover setting up a development env for new devs, less conventions in the codebase)? I assume the only thing missing from it (for a PR) is passing tests?

Is anyone actively working on solving this? Was going to start taking a look but I'm not familiar with the details, at all, and don't want to be wasting time.

Minh Nguyễn · Answer 18 · Thu Aug 18 2016 03:21:27 GMT+0800 (China Standard Time)

mapbox/mapbox-gl-native#6057 has a simple proof of concept of right-to-left mirroring with naïve bidirectional support. To be clear, it does not implement contextual forms or ligatures, but it does appear to be sufficient for rendering Hebrew correctly.

Can a speaker of Arabic, Persian, Urdu, etc. comment on whether mirroring alone would at least improve readability, even if it remains painful to read this unshaped text? I’m hopeful we could at least land a change like this as a stopgap, along the same lines as mapbox/mapbox-gl-js#1841.

خالد حسني (Khaled Hosny) · Answer 19 · Sat Aug 20 2016 07:36:19 GMT+0800 (China Standard Time)

It won’t make much of a difference for Arabic, the text will be equally unreadable, and it makes absolutely no difference for Indic scripts as they are left-to-right scripts.

Mushon Zer-Aviv · Answer 20 · Sun Aug 21 2016 15:39:25 GMT+0800 (China Standard Time)

I agree with @khaledhosny. This fix might only address simpler RTL cases like Hebrew, which is not enough.
I am not in a position to help much with code or to even assess how complex fixing this might be, but I do know Mapbox is not the only company rolling GL based vector maps. Both big companies like Google and small ones like Mapzen provide GL vector maps that respect complex RTL scripts. While I don't think Google will be interested in sharing techniques, @mapzen which shares the Open Data / Open Source creed might.
Here's a comparison of how Google (👍), Mapzen (👍) and Mapbox (👎) render the label for Beirut:

@dphiffer do you or anyone on your team at @mapzen care to share how you solved this issue in your Tangram rendering engine?

Arman Safikhani · Answer 21 · Sun Aug 21 2016 21:41:21 GMT+0800 (China Standard Time)

My colleague and I tried GNU FriBidi with the source code, in the function "GlyphSet::getShaping()" in the file "src/mbgl/text/glyph_set.cpp". We tried to first convert the label string into correct bidi text and then use the corrected string from then on.
We added a function to convert raw label text to bidi-corrected text (credit goes to XBMC/Kodi):

bool logicalToVisualBiDi(const std::u32string& stringSrc, std::u32string& stringDst, FriBidiCharType base /*= FRIBIDI_TYPE_LTR*/, const bool failOnBadString /*= false*/)
  {
    stringDst.clear();

    const size_t srcLen = stringSrc.length();
    if (srcLen == 0)
      return true;

    stringDst.reserve(srcLen);
    size_t lineStart = 0;

    // libfribidi is not threadsafe, so make sure we make it so
    // CSingleLock lock(m_critSectionFriBiDi);
    do
    {
      size_t lineEnd = stringSrc.find('\n', lineStart);
      if (lineEnd >= srcLen) // equal to 'lineEnd == std::string::npos'
        lineEnd = srcLen;
      else
        lineEnd++; // include '\n'

      const size_t lineLen = lineEnd - lineStart;

      FriBidiChar* visual = (FriBidiChar*) malloc((lineLen + 1) * sizeof(FriBidiChar));
      if (visual == NULL)
      {
        // Allocation failed, so there is nothing to free.
        printf("%s: can't allocate memory\n", __FUNCTION__);
        return false;
      }

      bool bidiFailed = false;
      FriBidiCharType baseCopy = base; // preserve same value for all lines, required because fribidi_log2vis will modify parameter value
      if (fribidi_log2vis((const FriBidiChar*)(stringSrc.c_str() + lineStart), lineLen, &baseCopy, visual, NULL, NULL, NULL))
      {
        // Removes bidirectional marks
        const int newLen = fribidi_remove_bidi_marks(visual, lineLen, NULL, NULL, NULL);
        if (newLen > 0)
          stringDst.append((const char32_t*)visual, (size_t)newLen);
        else if (newLen < 0)
          bidiFailed = failOnBadString;
      }
      else
        bidiFailed = failOnBadString;

      free(visual);

      if (bidiFailed)
        return false;

      lineStart = lineEnd;
    } while (lineStart < srcLen);

    return !stringDst.empty();
  }

Testing with the code below and printing the converted string to the console shows that FriBidi works perfectly:

    std::u32string b;
    // the following string is considered a complex string (RTL and LTR texts in one string)
    std::u32string testStr = U"hi سلام";


    logicalToVisualBiDi(testStr, b, FRIBIDI_PAR_RTL, false);

    // convert the u32string to string in order to be printable
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    std::string b2 = conv.to_bytes(b);

    printf("%s", b2.c_str()); // works just fine

Using the code below:

const Shaping GlyphSet::getShaping(const std::u32string &string, const float maxWidth,
                                    const float lineHeight, const float horizontalAlign,
                                    const float verticalAlign, const float justify,
                                    const float spacing, const Point<float> &translate) const {
    Shaping shaping(translate.x * 24, translate.y * 24, string);

    // the y offset *should* be part of the font metadata
    const int32_t yOffset = -17;

    float x = 0;
    const float y = yOffset;

    // the bidi corrected string
    std::u32string bidiCorrected;
    logicalToVisualBiDi(string, bidiCorrected, FRIBIDI_PAR_RTL, false);

    // Loop through all characters of this label and shape.
    for (uint32_t chr : bidiCorrected) {
        auto it = sdfs.find(chr);
        if (it != sdfs.end()) {
            shaping.positionedGlyphs.emplace_back(chr, x, y);
            x += it->second.metrics.advance + spacing;
        }
    }

    if (shaping.positionedGlyphs.empty())
        return shaping;

    lineWrap(shaping, lineHeight, maxWidth, horizontalAlign, verticalAlign, justify, translate);

    return shaping;
}

results in all labels on the map being empty; even Latin characters vanish!

I tried to convert a simple string with only 3 chars, to see if the characters are actually being converted to something else (Another character with different numerical value):

std::u32string bidiCorrected;
std::u32string testStr = U"ساa";
logicalToVisualBiDi(testStr, bidiCorrected, FRIBIDI_PAR_RTL, false);


std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
std::string b2 = conv.to_bytes(bidiCorrected);
printf("%d, %d, %d, ---- %d, %d, %d\n", 
                bidiCorrected.c_str()[0], bidiCorrected.c_str()[1], bidiCorrected.c_str()[2],
                testStr.c_str()[0], testStr.c_str()[1], testStr.c_str()[2]);

97, 65166, 65203, ---- 1587, 1575, 97
As expected, the Persian characters are being converted in order to form the connected letters in the correct order and correct form.

The characters in "testStr" are as below, in the order they are numbered:

And the characters in the "bidiCorrected" string are as below, in the order they are numbered:

As you can see, FriBidi converts the complex text just fine, but when I use the converted string in the rest of the function, Persian characters in labels are not shown, while Latin characters are OK.

My guess is that somewhere, the needed glyphs are inaccessible and thus the Persian characters will not be shown.

If only we could fix the missing glyphs problem, I think it's done.

خالد حسني (Khaled Hosny) · Answer 22 · Sun Aug 21 2016 22:26:27 GMT+0800 (China Standard Time)

Note that FriBiDi does primitive shaping; it does not cover many languages using the Arabic script (e.g. Sindhi), does not handle glyph positioning (so vowel mark positioning will be poor to unreadable), fails with many fonts (e.g. Noto Nastaliq Urdu), and still does not handle any of the Indic scripts.

Proper text layout will involve mainly handling bidirectional text (e.g. using FriBiDi without its shaping part), and shaping (with HarfBuzz being essentially the only viable freely licensed library that does it correctly).

Arman Safikhani · Answer 23 · Mon Aug 22 2016 00:44:33 GMT+0800 (China Standard Time)

So what do you suggest? Where does HarfBuzz actually come into play?

خالد حسني (Khaled Hosny) · Answer 24 · Mon Aug 22 2016 01:40:27 GMT+0800 (China Standard Time)

The general process is usually:

- Determine bidi run boundaries and order (using FriBiDi or another bidi implementation).
- Determine script run boundaries.
- Determine font, language or other style run boundaries (e.g. if you are doing styled text).
- Split the text into runs of characters that have the same direction, script, language and font.
- Pass the runs to HarfBuzz for shaping, and get back glyph indices and positions from HarfBuzz and use them to draw the text.

From earlier replies I think the problem here is that the APIs used for text drawing take only text strings, not glyph indices. This is really a deal breaker, and no complex text layout can be done with such APIs (unless they are doing the above processing internally, and apparently they don’t).
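
The run-splitting part of that process can be sketched as below. This is deliberately naive and is not the Unicode Bidirectional Algorithm: it treats only the Hebrew and Arabic blocks as RTL, treats a space as neutral, ignores script, language and font boundaries, and uses hypothetical names (`Run`, `isRtl`, `splitIntoDirectionalRuns`). A real implementation would take run boundaries from FriBiDi or ICU.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Direction { LTR, RTL };

struct Run {
    std::size_t start;
    std::size_t length;
    Direction direction;
};

// Simplification: only the Hebrew and Arabic blocks (U+0590..U+06FF) count as RTL.
bool isRtl(char32_t cp) {
    return cp >= 0x0590 && cp <= 0x06FF;
}

// Split text into maximal runs of one direction; spaces join the current run.
std::vector<Run> splitIntoDirectionalRuns(const std::u32string& text) {
    std::vector<Run> runs;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char32_t cp = text[i];
        const bool neutral = (cp == U' ');
        const Direction direction = isRtl(cp) ? Direction::RTL : Direction::LTR;
        if (!runs.empty() && (neutral || runs.back().direction == direction)) {
            // Runs are contiguous: the current one ends where this character begins.
            assert(runs.back().start + runs.back().length == i);
            ++runs.back().length;
        } else {
            runs.push_back(Run{i, 1, direction});
        }
    }
    return runs;
}
```

Each resulting run would then be handed to the shaper on its own, with its direction set, and the shaper's glyph indices and positions used for drawing.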

خالد حسني (Khaled Hosny) · Answer 25 · Mon Aug 22 2016 01:43:39 GMT+0800 (China Standard Time)

If someone is looking for a C library that does the above (either for use or inspiration), please check Raqm.

Brett Camper · Answer 26 · Tue Aug 23 2016 06:12:46 GMT+0800 (China Standard Time)

Brett from Mapzen here. As @mushon points out, Mapzen's Tangram renderer does support complex text shaping in both native/C++ (https://github.com/tangrams/tangram-es) and web/JS (https://github.com/tangrams/tangram) variants, though the implementation is quite different in each.

Tangram ES uses a combination of HarfBuzz, ICU, and FreeType for its shaping pipeline, with the final glyphs rendered using Signed Distance Fields (SDF) (the latter is not specific to complex text shaping but is a similarity with Mapbox GL). Tangram loads the font files and renders SDFs for glyphs at run-time, rather than doing a server-side preprocessing step (I believe Mapbox GL preprocesses into its own format). The shaping logic is consolidated in the Alfons (https://github.com/hjanetzek/alfons) library, and @hjanetzek and @karimnaaji have done the bulk of the work on this process -- I am sure they would be happy to answer any questions or point to more specifics.

Tangram JS uses a simpler method, relying on the browser's Canvas element to render text strings that are then used as WebGL textures. The benefit of this is that we get all the text shaping support already in the browser "for free", and can use any browser font rather than requiring a custom format. The downside is we have greater constraints on curved text and general quality (we're not using SDF though we could). Different set of trade-offs :) Having seen the work that went into our C++ code (where well-established libraries for much of this already exist), I definitely sympathize with the complexity of implementing the whole pipeline "from scratch" in JS (therefore we would also consider this approach but not in the near future).

Mushon Zer-Aviv · Answer 27 · Sat Aug 27 2016 18:10:27 GMT+0800 (China Standard Time)

Thanks @bcamper that's a very generous overview. I hope the team here finds it helpful. I wonder indeed how they see it. @tmcw @mikemorris

Tom MacWright · Answer 28 · Sun Aug 28 2016 05:53:05 GMT+0800 (China Standard Time)

Thanks @bcamper! I'll leave the in-depth technical rundown to @mikemorris, from my perspective:

Tangrams

Mapbox

- The lack of curved text in Tangrams is a cartographic bummer - in my opinion a significant one but others might think differently.
- I think Canvas rendering in-browser is a solid option - afaik browsers have pretty robust rendering support. I can see this working as:
  - If the characters in the text are entirely non-shaped, we use the current algorithm
  - If there are shaped characters, we change the text placement algorithm to be basically "~30 degree tolerance, and then straight line on the tangent"
- Canvas can write from webfonts, but distributing webfonts is potentially different legally from distributing SDFs, which are derived rather than raw. Since Mapbox ingests and distributes user content, we'd have to get that lawyer-checked.
- This'd mean "two separate systems" for rendering fonts, which, if the Canvas technique is fairly simple, would probably be fine.

I wonder indeed how they see it.

I certainly appreciate @bcamper's input, and want to turn it into a practical result! And hope you trust that I and others are doing so, and assume good intentions on mine and others parts. Turning intentions into results is tough, and rather thankless. Or, well, favless.

Brett Camper · Answer 29 · Sun Aug 28 2016 06:07:10 GMT+0800 (China Standard Time)

Agreed, the lack of curved text is a significant bummer :) But I think this can be improved and achieve reasonable results with Canvas text by breaking the label into multiple segments (could potentially be per-character, or small groups of characters), taking care of course to not break text shaping -- possibly doing a simple unicode scan and limiting text that contains characters outside of a "simple Latin" whitelist to a simpler path, either with no curving, or only breaking on word boundaries (if that is ok for all shaped text? I am not sure). Anyway this is what we'll be working on soon for Tangram so we'll see how well it works! And I'll be interested to see what you come up with if you adopt some Canvas rendering for Mapbox GL as well.
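
A whitelist scan of that kind might look like the following sketch. The ranges are an assumption (printable Basic Latin plus Latin-1 Supplement and Latin Extended-A) and `isSimpleText` is a hypothetical name; neither comes from Tangram or Mapbox GL.

```cpp
#include <cassert>
#include <string>

// Inclusive range of code points.
struct CodePointRange {
    char32_t first;
    char32_t last;
};

// Assumed whitelist of text that can be placed glyph by glyph without shaping.
const CodePointRange kSimpleRanges[] = {
    {0x0020, 0x007E},  // printable Basic Latin
    {0x00A0, 0x017F},  // Latin-1 Supplement and Latin Extended-A
};

// True when every code point of the label falls inside the whitelist.
bool isSimpleText(const std::u32string& text) {
    for (char32_t cp : text) {
        bool allowed = false;
        for (const CodePointRange& range : kSimpleRanges) {
            assert(range.first <= range.last);  // guard against a mistyped table entry
            if (cp >= range.first && cp <= range.last) {
                allowed = true;
                break;
            }
        }
        if (!allowed) return false;
    }
    return true;
}
```

Labels that pass the scan could keep the per-glyph curved placement, while anything else would take the simpler straight-line path.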

Volodymyr Agafonkin · Answer 30 · Sun Aug 28 2016 14:52:42 GMT+0800 (China Standard Time)

It's critical for us to have properly scaling text as you zoom, so even if we use the Canvas hack for text shaping, we will have to dynamically generate SDFs from the rendered labels (unlike Tangram). This can get very tricky quickly, but worth considering.
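
For a sense of what generating an SDF from a rendered label involves, here is a brute-force sketch over a binary bitmap. It is not the algorithm Mapbox GL uses: the function name and the sign convention (positive inside, negative outside, in pixels) are assumptions, and production code would want a linear-time distance transform rather than this quadratic search.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Signed distance field for a small binary bitmap (non-zero = ink). Each value
// is the distance in pixels to the nearest pixel of the opposite kind.
std::vector<float> signedDistanceField(const std::vector<unsigned char>& bitmap,
                                       int width, int height) {
    assert(width > 0 && height > 0);
    assert(bitmap.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    std::vector<float> field(bitmap.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const bool inside = bitmap[y * width + x] != 0;
            // Stays infinite if the bitmap is entirely ink or entirely empty.
            float best = std::numeric_limits<float>::infinity();
            for (int v = 0; v < height; ++v) {
                for (int u = 0; u < width; ++u) {
                    if ((bitmap[v * width + u] != 0) == inside) continue;
                    const float dx = static_cast<float>(u - x);
                    const float dy = static_cast<float>(v - y);
                    best = std::min(best, std::sqrt(dx * dx + dy * dy));
                }
            }
            field[y * width + x] = inside ? best : -best;
        }
    }
    return field;
}
```

Because the field stores distances rather than coverage, the same texture can be thresholded at different scales, which is what keeps text crisp while zooming.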

Mushon Zer-Aviv · Answer 31 · Sun Aug 28 2016 15:52:51 GMT+0800 (China Standard Time)

@tmcw

I wonder indeed how they see it.

I certainly appreciate @bcamper's input, and want to turn it into a practical result! And hope you trust that I and others are doing so, and assume good intentions on mine and others parts. Turning intentions into results is tough, and rather thankless. Or, well, favless.

First of all, Thank you for everything you're doing and for your responsiveness here. (not thankless anymore)
Second, I really 👍 your comment. (so technically, not favless anymore either)
But seriously, I definitely appreciate the work that is constantly coming out of @mapbox and I admire your professional and social ethics as a company. We all have blindspots, in this case yours was the RTL/Complex scripts bug which was put on the back burner for quite some time in favor of other (complex, impressive, important and admirable) features. My impression is that your responses here are 100% sincere and committed, and so are mine. It's undoubtedly a hairy problem and it's heartwarming to see how caring and generous the responses here are, both from you guys, from some of your users and even from people working on possibly competing technologies. So if anything I think you should feel encouraged by the responses here.

I for one think this thread is inspiring. Keep it up.

Konstantin Käfer · Answer 32 · Sun Aug 28 2016 18:19:38 GMT+0800 (China Standard Time)

@mushon for what it's worth, Mapbox GL JS used to have support for Complex Text Layout, including Arabic and Hebrew in a very early development version. Back then, it worked by parsing the style on the server, and requesting style-specific vector tiles that contained text shaping information processed on the server. However, this setup requires too many server-side components to work (and a massive increase in CDN load) and renders one of the advantages of the GL ecosystem (switching styles without redownloading data) moot, so we removed it.

Arman Safikhani · Answer 33 · Sat Sep 03 2016 14:45:22 GMT+0800 (China Standard Time)

Can't we have a simple text shaping functionality for now? Curved text and proper text scaling while zooming is not "that" critical, critical is to have simple text shaping so the labels will be readable.

Alireza Kashian · Answer 34 · Thu Sep 15 2016 18:34:17 GMT+0800 (China Standard Time)

I second @Arman92's comment. While research and development on proper solutions goes on, we need an interim solution to at least show RTL text correctly now. Maybe curved text needs more months of work. Also I think more developers should pay attention to this issue. This needs inviting others to this thread and asking them to help. I remember when I was using Mapnik almost 8 years ago, it had the same issues. I contacted Artem and he released a new version which showed RTL script correctly. I remember that at the time, Artem did it in less than 1 or 2 weeks. I believe this is not a hard technological barrier. This is only a matter of concentrating on a specific big bug.

Arman Safikhani · Answer 35 · Sat Oct 15 2016 21:18:56 GMT+0800 (China Standard Time)

I can't believe this issue is still not fixed after 1 YEAR!
I'll start a new map-based app in the coming week, and given that we can't even have labels on the map, I'm going to have to choose Google Maps over Mapbox despite my wish...

Reza · Answer 36 · Sun Oct 16 2016 04:50:53 GMT+0800 (China Standard Time)

Unfortunately, it looks like RTL languages are not very important to the Mapbox team!

jingsam · Answer 37 · Sun Oct 16 2016 09:52:56 GMT+0800 (China Standard Time)

@RezaOruji It is not true that the Mapbox team does not care about RTL languages. It's just that implementing fully compatible text rendering in a WebGL context is so damn hard. Even in 2016, after over two decades of OpenGL, rendering text is not easy since OpenGL can only draw triangles and lines. You can check out these posts:
https://www.mapbox.com/blog/text-signed-distance-fields/
https://www.mapbox.com/blog/placing-labels/

I'm rather optimistic that the Mapbox geniuses will get this problem solved. Just be patient and kind to them.

Arman Safikhani · Answer 38 · Sun Oct 16 2016 18:51:59 GMT+0800 (China Standard Time)

@jingsam I know that solving this issue is really a challenge, but I wish someone from Mapbox would report the current progress and, if possible, give an estimated date for fixing this issue.

Chris Loer · Answer 39 · Fri Oct 28 2016 09:19:15 GMT+0800 (China Standard Time)

Hey all -- thanks for the continued feedback, and for your patience. We know that this is a critical feature for many people. I want to reiterate that the length of time it's taking is a reflection of the difficulty of the problem, not because we don't think it's an important feature.

I have some good news to report: we've hired an additional developer whose first and primary responsibility will be to improve support for complex text rendering across all the Mapbox GL SDKs. While we can't make concrete estimates on availability, you can expect to see development work commencing in November.

In other good news, I'm the new developer! I'm really excited to get started working with the excellent Mapbox team and also learning from all of you out there. @mikemorris's summary of this issue still describes our overall strategy. I'm starting by looking for a useful subset of features we can get out quickly, and I'm currently focused on these two:

Applying the Bidirectional Algorithm (or a subset of it) to labels. This at least gets glyphs in the right order, if not shaped correctly. We believe this helps a lot for languages like Hebrew.
Performing Arabic shaping using the Unicode "presentation forms" for the initial/medial/final variants of Arabic characters. This approach is attractive as a stop-gap because it doesn't require any significant changes to our glyph rendering, but it would only work for languages that use the Arabic script.
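
To illustrate the "presentation forms" idea in the second item, here is a minimal Python sketch. It is not the ICU code path and not necessarily what Mapbox ships: the helper names `_form` and `shape_arabic` are made up for this example, joining behaviour is inferred from which presentation forms Unicode names for a letter, and diacritics and lam-alef ligatures are ignored.

```python
import unicodedata


def _form(ch, suffix):
    """Return the named presentation form of ch (e.g. suffix "FINAL"), or None."""
    try:
        return unicodedata.lookup(f"{unicodedata.name(ch)} {suffix} FORM")
    except (KeyError, ValueError):
        # KeyError: no such presentation form; ValueError: ch has no name.
        return None


def shape_arabic(text):
    """Replace Arabic letters (logical order) with contextual presentation forms.

    Assumption for this sketch: a letter can connect to the following letter
    if Unicode defines an INITIAL form for it, and to the preceding letter
    if Unicode defines a FINAL form for it.
    """
    shaped = []
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else None
        nxt = text[i + 1] if i + 1 < len(text) else None
        joins_prev = (
            prev is not None
            and _form(prev, "INITIAL") is not None
            and _form(ch, "FINAL") is not None
        )
        joins_next = (
            nxt is not None
            and _form(ch, "INITIAL") is not None
            and _form(nxt, "FINAL") is not None
        )
        if joins_prev and joins_next:
            suffix = "MEDIAL"
        elif joins_prev:
            suffix = "FINAL"
        elif joins_next:
            suffix = "INITIAL"
        else:
            suffix = "ISOLATED"
        # Characters without presentation forms (Latin, digits, spaces) pass through.
        shaped.append(_form(ch, suffix) or ch)
    return "".join(shaped)


# Usage: beh + alef + beh in logical order.
word = "\u0628\u0627\u0628"
print([f"U+{ord(c):04X}" for c in shape_arabic(word)])
```

Because the output is still one code point per input character, the existing per-glyph rendering can draw it unchanged, which is why the approach works as a stop-gap. It also shows the limit mentioned above: only scripts with encoded presentation forms can be handled this way.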

On the gl-native side, we are still looking at using ICU and HarfBuzz. Our primary concern is making sure they don't increase the SDK size too much and making sure they're performant.

On the gl-js side, we're still searching for reliable libraries we could use. We're also considering porting functionality from C/C++ libraries, or using Canvas to render labels. Any tips very welcome!

Chris Loer · Answer 40 · Fri Nov 18 2016 07:11:40 GMT+0800 (China Standard Time)

We've just merged our first PR for this issue on the gl-native side: mapbox/mapbox-gl-native#6984

Our first step was to use ICU for bidirectional text support and Arabic text shaping. Here's a screenshot of my wife's home town before:

... and after:

One limitation of the current implementation is that it doesn't correctly handle line breaks in labels that combine LTR and RTL text (mapbox/mapbox-gl-native#7112). We plan to address that soon.
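
A rough Python sketch of why line breaks interact with bidirectional text: lines have to be broken in logical (reading) order first, and only then reordered line by line for display. This is a deliberately simplified run reversal for an RTL paragraph, not the full Unicode Bidirectional Algorithm and not the actual fix; `visual_order` and `wrap_then_reorder` are hypothetical names, and digits are treated like ordinary LTR text.

```python
import unicodedata


def visual_order(line):
    """Reorder one logical-order line for display, assuming an RTL paragraph."""
    # Classify each character: strong RTL, strong LTR, or neutral (None).
    kinds = []
    for ch in line:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            kinds.append("R")
        elif bidi_class in ("L", "EN", "AN"):
            kinds.append("L")  # simplification: digits behave like LTR text
        else:
            kinds.append(None)

    # Neutrals take their neighbours' direction when both sides agree,
    # otherwise the paragraph direction (assumed RTL here).
    resolved = list(kinds)
    i = 0
    while i < len(kinds):
        if kinds[i] is not None:
            i += 1
            continue
        j = i
        while j < len(kinds) and kinds[j] is None:
            j += 1
        before = kinds[i - 1] if i > 0 else "R"
        after = kinds[j] if j < len(kinds) else "R"
        resolved[i:j] = [before if before == after else "R"] * (j - i)
        i = j

    # Group consecutive characters with the same direction into runs.
    runs = []
    for ch, kind in zip(line, resolved):
        if runs and runs[-1][0] == kind:
            runs[-1][1] += ch
        else:
            runs.append([kind, ch])

    # Runs are placed right to left; RTL runs are also reversed internally.
    return "".join(
        text if kind == "L" else text[::-1] for kind, text in reversed(runs)
    )


def wrap_then_reorder(label, max_chars):
    """Greedy word wrap in logical order, then reorder each line separately."""
    lines, current = [], ""
    for word in label.split(" "):
        candidate = word if not current else current + " " + word
        if current and len(candidate) > max_chars:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return [visual_order(line) for line in lines]


# Usage: a Hebrew word followed by a Latin street name, wrapped at 8 characters.
for line in wrap_then_reorder("\u05d0\u05d1\u05d2 Main Street", 8):
    print(line)
```

If the whole label were reordered first and broken afterwards, the word that should be read first could end up on the second line. Breaking in logical order keeps each line's content in reading order regardless of direction.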

I'm currently working on porting these changes to mapbox-gl-js using Emscripten. So far, the results are promising -- it has not been too difficult to get ICU to run inside the browser, but I'm still working on slimming down the resulting JavaScript bundle.

After these changes are complete, I plan to turn to using HarfBuzz for the "full solution" (which will include support for Indic scripts, as well as better typography support for Latin scripts).

Please reach out to me with questions or comments!

Support bidirectional text and Arabic shaping in mapbox-gl-native
Support bidirectional text and Arabic shaping in mapbox-gl-js
Improved line breaking support for diglossic labels
Use HarfBuzz for complete complex text shaping support

Mushon Zer-Aviv · Answer 41 · Mon Nov 21 2016 15:47:58 GMT+0800 (China Standard Time)

@ChrisLoer we're very happy to see this.
Do you have an estimate of when we can expect this to become production-ready?

Chris Loer · Answer 42 · Tue Nov 22 2016 03:27:08 GMT+0800 (China Standard Time)

@mushon The current gl-native changes should go out in the Android SDK 4.3.0 and iOS SDK 3.5.0 releases. The timing of those releases depends on several other features, but the target is in the next few months. The gl-js changes aren't ready to merge yet, but I'm hopeful that within a few weeks I'll be able to merge them, at which point we'll know which release they'll go into.

As for the HarfBuzz support for Indic text, all I can say for sure is that I'll start working on it as soon as the current round of gl-js changes go in. So by the end of the year I should at least be able to provide a better estimate.

Chris Loer · Answer 43 · Thu Dec 01 2016 03:58:53 GMT+0800 (China Standard Time)

I've merged the gl-native fix for line-breaking diglossic text (mapbox/mapbox-gl-native#7112), and I'm continuing to work on porting the gl-native changes to gl-js.

Chris Loer · Answer 44 · Thu Dec 08 2016 04:23:58 GMT+0800 (China Standard Time)

mapbox/mapbox-gl-js#3758 contains an implementation of Arabic shaping and bidirectional layout for gl-js. We're not ready to merge it yet as we work through the performance implications, but if you're interested in seeing our progress or providing feedback, please take a look!

Igal · Answer 45 · Sun Jan 15 2017 19:21:27 GMT+0800 (China Standard Time)

@ChrisLoer it's been more than a month since the last update on this issue. Can you provide an update as to where it stands, what's remaining, etc.?

Also, relating to performance:
Is there a way to get some solution out there, then iterate and improve performance?
Perhaps this should be open for discussion, as quite a few people & projects are waiting for this fix (it's been more than a year since this issue was created). All projects which consider the Middle East important are pretty much prevented from relying on mapbox-gl. How much of a problem that is for Mapbox, I'm not sure, but I do think the priority of this issue should be re-evaluated.

Chris Loer · Answer 46 · Mon Jan 16 2017 00:49:49 GMT+0800 (China Standard Time)

@knigal For gl-js, we've settled on the idea of loading the support as a plugin (as a way to get the functionality out there as we still keep working to improve the performance). The changes and documentation are ready, we're just finalizing the relatively minor issue of how to name the plugin (mapbox-gl-arabic-text is the leading contender right now). This should go in very soon and be available in our February release of gl-js.

For gl-native, the changes are still waiting to go out as part of the Android SDK 4.3.0 and iOS SDK 3.5.0 releases, still targeted for the early months of this year.

Minh Nguyễn · Answer 47 · Mon Jan 16 2017 02:19:06 GMT+0800 (China Standard Time)

If you’re interested in trying out this functionality ahead of time on a mobile or desktop platform, check out our instructions for building the SDKs yourself:

https://github.com/mapbox/mapbox-gl-native/tree/master/platform/android#contributing-to-the-sdk
https://github.com/mapbox/mapbox-gl-native/blob/master/platform/ios/INSTALL.md
https://github.com/mapbox/mapbox-gl-native/blob/master/platform/macos/INSTALL.md

Please file any issues you see in the mapbox-gl-native repository.

Jim Montgomery · Answer 48 · Mon Jan 16 2017 02:23:04 GMT+0800 (China Standard Time)

@ChrisLoer perhaps your team could consider mapbox-gl-rtl-text as the plugin name if it isn't already. Save a few letters.

@1ec5 @ChrisLoer how can we test the update to gl-js (that's slated to come in February)?

Thanks for this update and the effort as well. We can now map plans for 2017.

Igal · Answer 49 · Mon Jan 16 2017 06:43:26 GMT+0800 (China Standard Time)

Q: @ChrisLoer Will the upcoming release correctly display Hebrew in addition to Arabic? If so, mapbox-gl-rtl-text would be a better name (since mapbox-gl-arabic-text implies Arabic-only)

btw only mapbox-gl.js is relevant to my projects. Thus adding a +1 on @jimmont's request for a way to preview the .js plugin. I'd be happy to try it and post feedback

Lucas Wojciechowski · Answer 50 · Wed Jan 18 2017 04:58:54 GMT+0800 (China Standard Time)

Closing this ticket as part of an effort to merge this repo into the mapbox-gl-js repo. Work on this project is nearing completion. Status updates and conversation will continue at mapbox/mapbox-gl-js#3708

Lucas Wojciechowski · Answer 51 · Thu Jan 19 2017 05:35:02 GMT+0800 (China Standard Time)

The initial phase of support for right-to-left and Arabic script is now complete.

The primary tracking issue for the remaining complex text challenges is now: mapbox/mapbox-gl-native#7774

johnnybegood7 · Answer 52 · Thu Mar 30 2017 02:46:24 GMT+0800 (China Standard Time)

Any consideration of using Graphite2 to handle this on top of FT?

Chris Loer · Answer 53 · Thu Mar 30 2017 03:37:31 GMT+0800 (China Standard Time)

@johnnybegood7 Our assumption is that if we were adding support for fonts that used Graphite, it would be through the HarfBuzz wrapper of Graphite (see discussion in mapbox/mapbox-gl-native#7774).

Abd Alrhman Bazrtwo · Answer 54 · Tue Apr 25 2017 22:16:35 GMT+0800 (China Standard Time)

Was this problem solved for the Android SDK? :)

Antonio Zugaldia · Answer 55 · Tue Apr 25 2017 22:40:56 GMT+0800 (China Standard Time)

@3bo0o0odee We integrated ICU to support bidirectional text layout and Arabic text shaping with mapbox/mapbox-gl-native#6984 which is available in the Android SDK 5.x series. The remaining complex text work is ongoing and tracked on mapbox/mapbox-gl-native#7774.

Mahdi astanei · Answer 56 · Sat Aug 05 2017 13:56:29 GMT+0800 (China Standard Time)

Could anybody write a step-by-step fix for Android? Can we use Mapbox on Android for Persian or for any Arabic-speaking countries?

Konstantin Käfer · Answer 57 · Mon Aug 07 2017 16:15:31 GMT+0800 (China Standard Time)

@MahdiAstanei Upgrading Mapbox to the latest released version (currently 5.1) will include this fix. You don't have to do any special configuration.

Mushon Zer-Aviv · Answer 58 · Mon Aug 07 2017 16:30:16 GMT+0800 (China Standard Time)

Thanks for making progress there, but what about supporting this in Mapbox Studio and supporting the static (bitmap) tiles? We really can't design like this… @ChrisLoer can you give us an update?

Chris Loer · Answer 59 · Tue Aug 08 2017 00:21:11 GMT+0800 (China Standard Time)

@mushon The plan right now is that Studio support won't come until we've integrated RTL text into the core library. We are exploring using WebAssembly as a way to integrate more "native" code into GL JS -- any solution we come up with there will probably include RTL text. I know that "just enable the plugin in Studio" would be a more immediate solution for you, and that's still an open discussion, but it's not our current plan. An alternative short-term solution we've discussed is having Studio preview raster tiles generated by api-gl, which would include the shaping support.

When you say static (bitmap) tiles, do you mean tiles generated by api-gl? They should already be fixed.

Mushon Zer-Aviv · Answer 60 · Tue Aug 08 2017 01:19:38 GMT+0800 (China Standard Time)

@ChrisLoer I would really appreciate at least a browser plugin that would give us something to work with in Studio until you implement a more robust solution.
As for the bitmap tiles, you are right, they are processed correctly both for static images and for Leaflet. It's just that the static image interface is inconsistent, as it currently shows a preview that differs from what it actually generates.