elixir-cldr / cldr

Elixir implementation of CLDR/ICU

Cldr.Territory.from_subdivision_code!/2

tomciopp opened this issue

I'm not sure whether this belongs here or in the Cldr Territories library, but I'm running into a problem and can't tell whether it's a code issue, a data issue, or an integration issue.

The basic problem I'm having is that I have a list of countries and I need to present their subdivisions to users so that they can input their addresses consistently.

However, I run into a problem when I get the Cldr.known_territory_subdivisions/2 of a country and attempt to look up all of the subdivisions from the codes provided. Take the United States as an example: if I look up its known territory subdivisions, I get a list that includes "uspr", which should return "Puerto Rico", but instead it raises a Cldr.UnknownSubdivisionError:

iex(1)> Cldr.Territory.from_subdivision_code!("uspr", Myapp.Cldr)
** (Cldr.UnknownSubdivisionError) The locale :en has no translation for :uspr.

I fixed the problem for my particular app with a hacky workaround, but there should be a better solution, and other people are likely to run into this problem too. I think the source of the error is the structure of the data from Unicode itself. I was able to find the missing entries by referencing the following page: https://www.unicode.org/cldr/charts/43/supplemental/territory_subdivisions.html

It looks like they use some weird pointer system for outlying territories instead of just putting them in the table. So theoretically this could be fixed, but I'm sure it would be a pain. Also, I ran into a few entries that don't seem to exist even through that pointer system ("usum" / U.S. Outlying Islands and "frwf" / Wallis & Futuna), so I'm not certain this approach would work.

If you need any more info or have any other suggestions on how to solve this problem, please let me know.

@tomciopp, thanks for the issue, which is more "interesting" than I expected. But, like pretty much everything in CLDR, it's logical. I'm also copying @Schultzer since I believe the resolution will need to be made in ex_cldr_territories.

Issue Summary

  • Cldr.known_territory_subdivisions/0 returns the list of valid subdivisions, including uspr. So far so good.
  • uspr, being Puerto Rico, actually refers to a self-governing territory and as such has its own ISO 3166 territory code, PR.
  • CLDR provides an alias facility whose data is returned by Cldr.Config.aliases/0. If we examine it, we can see that all of the self-governing US regions (and those of other territories) have aliases. For the US these are, in order: Northern Mariana Islands, Guam, US Virgin Islands, American Samoa, Puerto Rico, and US Minor Outlying Islands:
iex> Cldr.Config.aliases[:subdivision] |> Enum.filter(fn {k, _} -> String.starts_with?(k, "us") end)
[
  {"usmp", "MP"},
  {"usgu", "GU"},
  {"usvi", "VI"},
  {"usas", "AS"},
  {"uspr", "PR"},
  {"usum", "UM"}
]

Suggested Resolution

Therefore the correct resolution is for ex_cldr_territories to refer to the subdivision aliases when resolving Cldr.Territory.from_subdivision_code/2. This would involve:

  1. Caching the subdivision aliases in the library (Cldr.Config.aliases/0 should only be called at compile-time)
  2. If resolving a valid subdivision code fails, check the subdivision aliases for a territory alias and, if there is one, resolve the territory's name in the appropriate locale.
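The two steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the ex_cldr_territories implementation: the module name SubdivisionLookupSketch, its name/2 function and the hard-coded name tables are hypothetical stand-ins for the real CLDR data, and the alias map already uses atom territory codes.

```elixir
defmodule SubdivisionLookupSketch do
  # Hypothetical, hard-coded data standing in for the CLDR data that a real
  # library would load at compile time. Only a few entries are included.
  @subdivision_names %{
    en: %{"usca" => "California", "ustx" => "Texas"}
  }

  @territory_names %{
    en: %{PR: "Puerto Rico", GU: "Guam"}
  }

  # A subset of the subdivision aliases, with territory codes already
  # normalised to atoms.
  @subdivision_aliases %{"uspr" => :PR, "usgu" => :GU}

  # Returns {:ok, name} or {:error, {:unknown_subdivision, code}}.
  def name(code, locale) do
    case get_in(@subdivision_names, [locale, code]) do
      nil -> territory_fallback(code, locale)
      name -> {:ok, name}
    end
  end

  # Step 2: the subdivision has no translation, so check whether it is an
  # alias for a territory and, if so, return the territory's name instead.
  defp territory_fallback(code, locale) do
    with {:ok, territory} when is_atom(territory) <- Map.fetch(@subdivision_aliases, code),
         name when is_binary(name) <- get_in(@territory_names, [locale, territory]) do
      {:ok, name}
    else
      _ -> {:error, {:unknown_subdivision, code}}
    end
  end
end
```

With this data, "usca" resolves directly, "uspr" falls back to the territory name "Puerto Rico", and an unknown code returns an error tuple.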

I'm sure @Schultzer would appreciate a PR. If that's not practical for you, hopefully he can add this without too much effort. Worst case, I'll try to submit a PR this weekend.

@tomciopp and @kipcole9 thanks for this detailed report, I’m planning to cut a new release by the end of this week, so I’ll make sure this is included as well!

Thanks very much @Schultzer, that sounds great.

Note that the values returned from Cldr.Config.aliases[:subdivision] are, unfortunately, String.t/0 territory codes, whereas a territory code is required to be an atom (the type t:Cldr.Locale.territory_code/0). I will fix this in the next ex_cldr release. In the meantime, code using these aliases should convert a territory code to an atom if it's a string, but also check whether it's already an atom (as it will be in the next release) and, if so, leave it alone.

That is:

Current ex_cldr: {"usmp", "MP"} is returned and should be interpreted as {"usmp", :MP}
Next release ex_cldr: {"usmp", :MP} will be returned.

I'll leave this issue open until you get a chance to publish a new release that closes it.

Another implementation note: the following code will correctly embed the subdivision aliases into the appropriate module. It converts territory codes (2 letters) into atoms and leaves everything else alone, so it works with both the current and future ex_cldr releases:

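  # Embed the subdivision aliases at compile time so that
  # Cldr.Config.aliases/0 is never called at runtime.
  @subdivision_aliases Cldr.Config.aliases()
    |> Map.fetch!(:subdivision)
    |> Enum.map(fn
      # Two-letter strings are territory codes: convert them to atoms.
      # Longer strings (subdivision codes) are left unchanged.
      {k, v} when is_binary(v) ->
        if String.length(v) == 2, do: {k, String.to_atom(v)}, else: {k, v}
      # Atoms (as in the next ex_cldr release), lists and anything else pass through.
      other -> other
    end)
    |> Map.new()

  def subdivision_aliases do
    @subdivision_aliases
  end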

Last implementation note for @Schultzer:

The alias mechanism is also appropriate for all subdivision lookups. That is, recursively resolve a subdivision code through the subdivision aliases, applying these rules at each step:

  • if there is no alias, stop and use the subdivision code that was used for the lookup;
  • if the alias is another subdivision code, treat the alias as the subdivision code and keep recursively resolving aliases. If the resolved subdivision alias is a list, use the first element in the list (possibly not the best solution, but pragmatic);
  • if the alias is a territory code (because it's an atom), stop and use the territory code.
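A minimal sketch of this recursive resolution, assuming the next-release data shape where territory codes are atoms. The module SubdivisionAliasSketch, its resolve/2 function and every alias entry except "uspr" are hypothetical; the depth cap is an addition not in the rules above and only guards against an alias cycle in the data.

```elixir
defmodule SubdivisionAliasSketch do
  # Hypothetical alias table: "uspr" mirrors the real alias shown earlier,
  # while the "xx..." entries are invented to exercise chains and lists.
  @aliases %{
    "uspr" => :PR,
    "xxold" => "xxnew",
    "xxnew" => ["xxa", "xxb"]
  }

  # Maximum number of alias hops, in case the data ever contains a cycle.
  @max_depth 10

  # Returns {:territory, atom} or {:subdivision, code}.
  def resolve(code, aliases \\ @aliases), do: do_resolve(code, aliases, @max_depth)

  # Out of hops: stop and return the code reached so far.
  defp do_resolve(code, _aliases, 0), do: {:subdivision, code}

  defp do_resolve(code, aliases, depth) do
    case Map.fetch(aliases, code) do
      # No (further) alias: use the current code.
      :error ->
        {:subdivision, code}

      # Alias is a territory code (an atom): stop and use it.
      {:ok, territory} when is_atom(territory) ->
        {:territory, territory}

      # Alias is an empty list: nothing to follow, keep the current code.
      {:ok, []} ->
        {:subdivision, code}

      # Alias is a list: pragmatically follow its first element.
      {:ok, [first | _]} ->
        do_resolve(first, aliases, depth - 1)

      # Alias is another subdivision code: keep resolving.
      {:ok, subdivision} when is_binary(subdivision) ->
        do_resolve(subdivision, aliases, depth - 1)
    end
  end
end
```

Checking for an atom before anything else means a territory alias ends the recursion immediately, which matches the third rule; the caller can then look up a territory name or a subdivision name depending on the tag.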