uber / h3-js

h3-js provides a JavaScript version of H3, a hexagon-based geospatial indexing system.

Home Page: https://uber.github.io/h3

[Question] How to efficiently query using H3

mrousavy opened this issue

Hi!

I'm creating an app which shows nearby posts in a customizable radius (with an accuracy of ~100m). I've found h3 to be a very efficient library, but I couldn't quite figure out how to efficiently query my firestore NoSQL database using H3, since firestore's querying capabilities are very limited.

I've currently come up with this solution:

export async function loadNearbyPosts(coordinates: Coordinates, radius: number): Promise<Post[]> {
  const h3 = geoToH3(coordinates.latitude, coordinates.longitude, H3_RESOLUTION);
  console.log(`${JSON.stringify(coordinates)} -> ${h3}`);
  const neighbours = kRing(h3, 10); // <-- how do I convert my 'radius' in metres to the k-ring range ('10')?
  console.log(`Neighbours of ${h3} include: ${JSON.stringify(neighbours)}`);

  const batchedNeighbours: string[][] = [];
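  // slice() rather than splice(): splice() would shrink `neighbours` while `i` advances and skip every other batch
  for (let i = 0; i < neighbours.length; i += 10) batchedNeighbours.push(neighbours.slice(i, i + 10));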

  console.log(`Batched to size of 10s: ${JSON.stringify(batchedNeighbours)}`);
  console.log(`Running ${batchedNeighbours.length} queries...`);

  const start = global.nativePerformanceNow();
  // how do I remove this batching and instead use range checks? something like `greater than this h3 and smaller than this h3`
  const queries = batchedNeighbours.map((n) => firestore().collection('posts').where('location.h3', 'in', n).get());
  const results = await Promise.all(queries);
  const end = global.nativePerformanceNow();

  const docs: Post[] = [];
  results.forEach((r) => docs.push(...r.docs.map((d) => build<Post>(d))));

  console.log(`Executed ${batchedNeighbours.length} queries and received ${docs.length} results, all within ${end - start}ms.`);
  return docs;
}

While this does indeed return results for me, it is very inefficient. For this simple query, it actually executes 17 queries (!!) because firestore allows at most 10 items in the array of an 'in' query (that's why I'm batching the neighbours into arrays of 10 elements). I'm also comparing for exact matches, so I'm forced to use the same precision for all my h3 hexagons.

Using geohashes, I can find by range since they're alphabetically sorted. E.g. I can filter for bc -> bc~ which gives me all squares that start with bc.
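For comparison, the geohash prefix trick described above could look roughly like this. It is only a sketch: geohashPrefixRange is a hypothetical helper name, and the sample hashes are made up for illustration.

```typescript
// Hypothetical helper: inclusive string bounds that match every geohash starting with `prefix`.
function geohashPrefixRange(prefix: string): [string, string] {
  // '~' (ASCII 126) sorts after every character of the geohash alphabet (digits and lowercase letters).
  return [prefix, prefix + "~"];
}

const [lo, hi] = geohashPrefixRange("bc");
const hashes = ["bb9z", "bc", "bc12", "bczz", "bd00"]; // made-up sample values
const matches = hashes.filter((h) => h >= lo && h <= hi);
console.log(matches); // ["bc", "bc12", "bczz"]
```

The same two bounds can be passed to any store that supports ordered string comparisons, which is why a single range query is enough for one geohash square.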

Now let's get to my actual questions:

  1. Is it possible to "find by range" using h3, similar to geohashes? That way I can remove the batching code and don't have to run 17 queries for a simple "nearby" lookup. I couldn't really find a good explanation of how the h3 algorithm sorts its tiles, but it would help if I could reduce this to a smaller number of queries (e.g. everything in range 871e064d5ffffff -> 871e06498ffffff, plus everything in range 871e0649bffffff -> 871e06509ffffff, or in other words "larger than this h3 but smaller than this h3").
  2. How can I actually convert a range in metres to k-ring/hex-ring ranges?

Thanks for your help!

You can do something similar with H3. If you look at the bit layout of an H3 index at the bottom of this page you can see that the highest precision bits for the indexing are at the end. And looking at the resolution table resolution 10 has a long radius of ~66 meters (short radius would be ~57m), so the diameter would be ~132m at max and ~114m at min, so I think that would be the appropriate sizing.
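To make the radius-to-k-ring question concrete, here is a rough sketch of one way to estimate k from a radius in metres. Everything in it is an assumption for illustration: radiusToK is a hypothetical helper, the edge lengths are rounded averages from the H3 resolution table, and cells are treated as regular hexagons, which real H3 cells only approximate.

```typescript
// Approximate average hexagon edge lengths in metres per resolution
// (rounded from the H3 resolution table; real cells vary slightly in size).
const EDGE_LENGTH_M: Record<number, number> = { 9: 174.4, 10: 65.9, 11: 24.9, 12: 9.4 };

// Hypothetical helper: smallest k whose k-ring covers a circle of `radiusM`
// around any point inside the centre cell, assuming regular hexagons.
function radiusToK(radiusM: number, res: number): number {
  const edge = EDGE_LENGTH_M[res];
  if (edge === undefined) throw new Error(`no edge length listed for resolution ${res}`);
  // Every point closer than (1.5 * k + 0.5) * edge to the centre of the origin cell
  // lies inside the k-ring, and the query point can sit up to one edge length from
  // that centre, so we need radiusM + edge <= (1.5 * k + 0.5) * edge.
  return Math.ceil((radiusM + 0.5 * edge) / (1.5 * edge));
}

console.log(radiusToK(100, 10)); // 2
console.log(radiusToK(100, 11)); // 4
```

The estimate is deliberately conservative, so a final distance filter on the results is still needed to trim the area down to the exact circle.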

This is equivalent to the geohashing approach: index your data at resolution 10, then mask out all of the bits except the resolution digits you care about for the radius you've chosen from the resolution table. The catch is that the centerpoint of the search area is the center of the fixed hexagon that your query point happened to fall into, which could be up to 66 meters away (if the point only just fell inside that hexagon). If that's fine with you, you can stop here.

Otherwise I'd recommend indexing your data at resolution 11 and then querying at resolution 11 plus the k-ring of 1 around it. That reconstructs roughly the same area as a resolution 10 hexagon, but better centered on your query point (it will only be off by up to ~25 meters with this approach). If you could use resolution 12 you'd get that error down to ~9 meters, but that apparently requires too many indexes for firestore, so you'd have to break your query into two. This all assumes that you can do a list of range queries, though. If you can, you can improve the accuracy for every radius: for the chosen resolution, query one resolution finer plus the k-ring, then bitmask it for your query.

If you're further taking the returned list and filtering it to the exact radius the user queried, I would even more highly recommend the k-ring approach -- you only have to filter the outer ring of results, not the inner hexagon (or any inner rings). Here I would recommend going to resolution 12 for your base indexing and then do two queries: the k-ring of 1 around the centerpoint and then the "hexRing" (the hollow ring) of 2 around that; only the second query needs the actual radius filter pass and the amount of post-query filtering you have to do has been cut in ~half.
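The "only filter the outer ring" idea could be sketched like this. The two input arrays stand in for the results of the k-ring and hexRing queries; GeoPoint, haversineMeters and mergeRingResults are hypothetical names, and the haversine formula is just one common way to do the exact distance check.

```typescript
interface GeoPoint { latitude: number; longitude: number }

const EARTH_RADIUS_M = 6371008.8; // mean Earth radius in metres

// Great-circle distance between two points using the haversine formula.
function haversineMeters(a: GeoPoint, b: GeoPoint): number {
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const sinLat = Math.sin(toRad(b.latitude - a.latitude) / 2);
  const sinLng = Math.sin(toRad(b.longitude - a.longitude) / 2);
  const h = sinLat * sinLat +
    Math.cos(toRad(a.latitude)) * Math.cos(toRad(b.latitude)) * sinLng * sinLng;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// innerResults: posts returned for the k-ring of 1, kept without any distance check.
// outerResults: posts returned for the hollow ring of 2, filtered to the exact radius.
function mergeRingResults<T extends GeoPoint>(
  centre: GeoPoint, radiusM: number, innerResults: T[], outerResults: T[],
): T[] {
  const outerKept = outerResults.filter((p) => haversineMeters(centre, p) <= radiusM);
  return innerResults.concat(outerKept);
}
```

Whether the inner results can really skip the check depends on the chosen resolution being small enough that the inner k-ring lies entirely within the search radius.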

In reference to the indexing order, you can see a demonstration here: https://observablehq.com/@nrabinowitz/h3-indexing-order

You shouldn't rely on indexing order for range queries, because if you cross the boundary of a coarse-resolution ancestor cell you could get one or more cells in your k-ring with substantially different index numbers, and you wouldn't want all the indexes in between. The rule for H3 is that indexes that are numerically close are geographically close, but the reverse is not necessarily true.

Thanks for your quick and detailed responses. I've tried to play around with this approach in a sample environment, but I couldn't get it working. Note:

  1. I cannot do something like "starts with string" in firestore because it's a NoSQL database. I can only use direct comparisons (==), membership tests (array-contains, in) or range operations (<, <=, >, >=, or the startAt/endAt cursors). The range operations are useful because they're not really direct comparisons, e.g. I can query for everything between 626528546153906175 and 626528546154536959 (those are h3 indices in BigInt representation, but it also works for strings using ASCII-order comparisons). (see query limitations in firestore) I've tried to use the range operations similar to geohashes, which leads to my second point:
  2. So I have the k-ring around my current coordinates. The idea is to sort it in ascending order and, for every element that comes directly after the previous element, extend my query's endAt to it. If there's a "gap" between the indices, I start a new query with that index as its startAt. Did I get that right?

e.g. these are the cells in my k-ring:

  8b1e064ab2edfff
  8b1e064ab2ecfff
  8b1e064ab2e8fff
  8b1e064ab2e9fff
  8b1e064ab25afff
  8b1e064ab25efff
  8b1e064ab253fff

which represent the following numbers:

  626528546154536959
  626528546154532863
  626528546154516479
  626528546154520575
  626528546153934847
  626528546153951231
  626528546153906175

I don't really get where I need compact in this case, since again, I can't use something like string startswith in my database.
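The gap-merging idea from point 2 could be sketched as follows, using the seven cells listed above. This only illustrates the question; toRanges and IndexRange are hypothetical names, and as noted earlier in the thread, relying on index order this way can break down when a k-ring crosses a coarse ancestor boundary.

```typescript
type IndexRange = { start: bigint; end: bigint };

// At resolution 11 the digits for resolutions 12..15 fill the low 12 bits,
// so numerically adjacent resolution-11 cells differ by exactly 2^12.
const RES11_STEP = 1n << 12n;

// Sort the cells numerically and merge runs of adjacent values into ranges.
function toRanges(cells: string[], step: bigint): IndexRange[] {
  const sorted = cells
    .map((c) => BigInt("0x" + c))
    .sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
  const ranges: IndexRange[] = [];
  for (const value of sorted) {
    const last = ranges[ranges.length - 1];
    if (last !== undefined && value - last.end === step) {
      last.end = value; // directly follows the previous cell: extend the current range
    } else {
      ranges.push({ start: value, end: value }); // gap: start a new range
    }
  }
  return ranges;
}

const kRingCells = [
  "8b1e064ab2edfff", "8b1e064ab2ecfff", "8b1e064ab2e8fff", "8b1e064ab2e9fff",
  "8b1e064ab25afff", "8b1e064ab25efff", "8b1e064ab253fff",
];
const mergedRanges = toRanges(kRingCells, RES11_STEP);
console.log(mergedRanges.length); // 5
```

For this particular k-ring the seven cells only collapse into five ranges, so the saving over one query per cell is small.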

So you would need to bitmask, because above the index digits (in higher bits of the numeric representation) there is also a 4-bit field that stores the resolution itself. That field needs to be masked out, or replaced with the resolution at which you're indexing the data in firestore. The "b" in "8b1e..." is that resolution field, which here means resolution 11.
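As a small illustration of that resolution field, here is a sketch that reads and clears it for one of the cells from the list above. It assumes the standard H3 layout where the resolution occupies bits 52 to 55; the constant names are made up for this example.

```typescript
const index = BigInt("0x8b1e064ab2edfff");

// Assumed layout: the 4-bit resolution field sits at bits 52..55 of the 64-bit index.
const RESOLUTION_SHIFT = 52n;
const RESOLUTION_MASK = 0xfn << RESOLUTION_SHIFT;

// Read the resolution stored in the index.
const resolution = Number((index & RESOLUTION_MASK) >> RESOLUTION_SHIFT);
console.log(resolution); // 11

// Clear the field so that only the remaining bits take part in comparisons.
const withoutResolution = index & ~RESOLUTION_MASK;
console.log(withoutResolution.toString(16)); // "801e064ab2edfff"
```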

Since firestore is so limited, I would recommend dropping the k-ring idea and querying for a single "index", but storing the indexes in a "chopped" form. Compute the index for each datapoint you're storing, then bitmask everything to zero except the resolution bits, and then only from resolutions 0 to 11. Now when you query for whichever radius is appropriate, you snap the query to the resolution one level "up" from the specified radius and bitmask that H3 index to only the resolution bits that are set. Then you copy that trimmed H3 index and add a bitmask for all of the resolutions below the search resolution down to resolution 11 inclusive and you flip those bits all to 1s. This creates your min-max range that you can use in a compound query: https://firebase.google.com/docs/firestore/query-data/queries#compound_queries

Basically myTable.where('trimmedIndex', '>=', lowerBoundTrimmedIndex).where('trimmedIndex', '<=', upperBoundTrimmedIndex)

So the whole thing is kinda like:

const h3Index = h3.geoToH3(lat, lng, res)
// Lower bound: keep only the bits covered by resMask[0] through resMask[res]
let queryMask = 0
for (let i = 0; i <= res; i++) {
  queryMask |= resMask[i] // The resMask array would be constants, I can figure them out if you want to take this approach
}
const lowerBoundTrimmedIndex = h3Index & queryMask
// Upper bound: set every digit finer than res (down to the stored resolution) to all 1s
let upperBoundTrimmedIndex = lowerBoundTrimmedIndex
for (let i = res + 1; i <= 11; i++) { // Assuming the data is indexed at resolution 11
  upperBoundTrimmedIndex |= resMask[i]
}
const myData = await myTable.where('trimmedIndex', '>=', lowerBoundTrimmedIndex).where('trimmedIndex', '<=', upperBoundTrimmedIndex).get()

This still has the large offset issue described above: the search area is centered on the cell, not on your query point. You can resolve that by doing 7 queries, one per cell of a k-ring one resolution finer, but it looks like they have to be 7 separate queries in firestore because of its strange query limitations.

Yes, it sucks that they're still so limited after such a long time of "being in beta". @nrabinowitz and @dfellis thank you so much for your explanations, those look like very clever solutions/workarounds to my problem, I'm going to try this now. 👍

EDIT: What resMask would you use for this?

So the resMask is an array and these are the constants:

const resMask = [
  246290604621824n,
  30786325577728n,
  3848290697216n,
  481036337152n,
  60129542144n,
  7516192768n,
  939524096n,
  117440512n,
  14680064n,
  1835008n,
  229376n,
  28672n,
  3584n,
  448n,
  56n,
  7n,
]
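The constants above follow a simple pattern, so they can also be generated rather than typed out. The sketch below assumes entry i is the 3-bit mask 0b111 shifted to bit 45 - 3*i, which matches the listed values; generatedResMask and BASE_CELL_MASK are hypothetical names. One caveat: in the H3 bit layout the base cell field is 7 bits wide (bits 45 to 51), so entry 0 as listed keeps only its low 3 bits. BASE_CELL_MASK shows what a mask over the whole field would be, as an alternative that is not from this thread.

```typescript
// Entry i is 0b111 shifted left by (45 - 3 * i) bits; entries 1..15 line up
// with the 3-bit digits for resolutions 1..15.
const generatedResMask: bigint[] = Array.from(
  { length: 16 },
  (_, i) => 7n << BigInt(45 - 3 * i),
);

console.log(generatedResMask[0]);  // 246290604621824n
console.log(generatedResMask[11]); // 28672n
console.log(generatedResMask[15]); // 7n

// Hypothetical alternative for entry 0: cover the full 7-bit base cell field.
const BASE_CELL_MASK = 0x7fn << 45n;
```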

The n at the end of those numbers is on purpose; the only way for this to work (I forgot until just now) is if it's all BigInt in the browser, converting to a string at the last moment.

So you'll want to modify that const h3Index ... above like so:

const h3Index = BigInt('0x' + h3.geoToH3(lat, lng, res))

After that, put an n at the end of all of the numeric literals so they are BigInts too, and when you run the query turn the bounds back into strings like this:

const myData = await myTable.where('trimmedIndex', '>=', lowerBoundTrimmedIndex.toString()).where('trimmedIndex', '<=', upperBoundTrimmedIndex.toString()).get()

The trimmedIndex field needs to be populated by bitwise ORing together the resMask values from index 0 up to your base resolution and then bitwise ANDing that mask with the H3 index (after converting it to BigInt). This mask is a constant, though, so you can calculate it once and be done with it. E.g., for res 11 as the base index it would be: 281474976706560n (or 0xFFFFFFFFF000n). This is what you would use to calculate the numbers to store in firestore for this query.
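Putting the pieces together, a minimal end-to-end sketch of the stored value and the query bounds might look like this. It uses the hex strings from earlier in the thread in place of a live geoToH3 call; maskRange, trimmedIndex and queryBounds are hypothetical helpers. One assumption that is not from the thread: because the trimmed values stay below 2^48 they fit losslessly in a JavaScript number, which sidesteps the ordering problems that decimal strings of different lengths would have.

```typescript
const BASE_RES = 11; // resolution the data is stored at, as in the thread

// OR together the 3-bit masks for entries `from`..`to` inclusive.
function maskRange(from: number, to: number): bigint {
  let mask = 0n;
  for (let i = from; i <= to; i++) mask |= 7n << BigInt(45 - 3 * i);
  return mask;
}

const STORE_MASK = maskRange(0, BASE_RES); // 0xFFFFFFFFF000n

// Value to store alongside each document.
function trimmedIndex(h3Hex: string): bigint {
  return BigInt("0x" + h3Hex) & STORE_MASK;
}

// Inclusive bounds covering every stored cell under the ancestor at `queryRes`.
function queryBounds(h3Hex: string, queryRes: number): { lower: bigint; upper: bigint } {
  const lower = BigInt("0x" + h3Hex) & maskRange(0, queryRes);
  const upper = lower | maskRange(queryRes + 1, BASE_RES);
  return { lower, upper };
}

const stored = trimmedIndex("8b1e064ab2edfff");
const { lower, upper } = queryBounds("8b1e064ab2edfff", 9);
console.log(lower <= stored && stored <= upper); // true

// Assumption: store and compare as numbers, since the values are below 2^48.
const storedAsNumber = Number(stored);
```

A cell from the same k-ring with a different resolution-9 ancestor, such as 8b1e064ab25afff, falls outside these bounds, which is the offset issue mentioned earlier.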