uber / h3

Hexagonal hierarchical geospatial indexing system

Home Page: https://h3geo.org


bboxHexEstimate: Calculated area is way too big

sebisteiner opened this issue · comments

I noticed that maxPolygonToCellsSize is giving me numbers that are way too big for my input geometries. I was able to track it down to bboxHexEstimate: the calculated area there is way too big.

Here is an example geometry:
Polygon ((7.54402069689999966 47.13895667900000319, 7.54419258430000017 47.13895667900000319, 7.54419258430000017 47.15093149700000197, 7.54402069689999966 47.15093149700000197, 7.54402069689999966 47.13895667900000319))

[Screenshot of the example geometry on a map]

It is a street. So it is quite long but narrow.

bboxHexEstimate calculates a distance (bbox diagonal) of 1.33 km, which sounds about right. But the calculated area then is 103.87 km², which is way too big. We call polygonToCells with an index resolution of 15, and this causes memory problems in our application.
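
For reference, here is a minimal reproduction sketch of how we hit this (assuming the H3 v4 C API; the header path may differ depending on how H3 is installed):

#include <stdio.h>
#include <stdint.h>
#include <h3/h3api.h>

int main(void) {
    // The street bbox from the WKT above, converted to radians (lat, lng).
    LatLng verts[] = {
        {degsToRads(47.1389566790), degsToRads(7.5440206969)},
        {degsToRads(47.1389566790), degsToRads(7.5441925843)},
        {degsToRads(47.1509314970), degsToRads(7.5441925843)},
        {degsToRads(47.1509314970), degsToRads(7.5440206969)},
    };
    GeoLoop loop = {.numVerts = 4, .verts = verts};
    GeoPolygon polygon = {.geoloop = loop, .numHoles = 0, .holes = NULL};

    int64_t numCells = 0;
    H3Error err = maxPolygonToCellsSize(&polygon, 15, 0, &numCells);
    if (err != E_SUCCESS) {
        fprintf(stderr, "maxPolygonToCellsSize failed: %u\n", err);
        return 1;
    }
    // Hundreds of millions of cells are reported for a ~1.3 km street segment,
    // which blows up the allocation before polygonToCells even runs.
    printf("estimated cells at res 15: %lld\n", (long long)numCells);
    return 0;
}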

I do not understand your formula to calculate the bbox area:
double a = d * d / fmin(3.0, fabs((p1.lng - p2.lng) / (p1.lat - p2.lat)));

But it seems to work really badly for bboxes where one side is much longer than the other.

Do you have any idea how to fix this?

As it happens, I was also looking at this function recently ^^

From my understanding there are two issues here:

  • the quotient of the bbox sides is wrong
  • the capping to 3 seems to be hurting (at least in this case).

The ratio is always computed as lng diff / lat diff, which gives a value < 1 for narrow vertical shapes like yours (and since we divide the squared diagonal by this ratio, the result gets inflated).

With the current formula we get:

  • estimated area: ~123 km²
  • 351,145,612 cells

Which is way too big, indeed...
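
For concreteness, a rough back-of-the-envelope check of where that figure comes from (my own numbers, using the reported diagonal of $d \approx 1.33$ km): $\Delta\mathrm{lng} \approx 0.000172^\circ$ and $\Delta\mathrm{lat} \approx 0.011975^\circ$, so the ratio is $\approx 0.0144$ and $a \approx (1.33\,\mathrm{km})^2 / 0.0144 \approx 123\,\mathrm{km}^2$; dividing by a ratio far below 1 is what inflates the estimate.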

If we fix the formula by making sure we always compute the ratio as length/width:

double d1 = fabs(p1.lng - p2.lng);
double d2 = fabs(p1.lat - p2.lat);
double length = fmax(d1, d2);
double width = fmin(d1, d2);
double ratio = length/width;
// Derived constant based on: https://math.stackexchange.com/a/1921940
// Clamped to 3 as higher values tend to rapidly drag the estimate to zero.
double a = (d * d) / fmin(3, ratio);

We get a better result:

  • estimated area: ~0.6 km²
  • 1,680,136 cells
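
Sanity check with the same rough numbers as above: the length/width ratio is $\approx 69.7$, capped to 3, so $a \approx (1.33\,\mathrm{km})^2 / 3 \approx 0.59\,\mathrm{km}^2$.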

But we can get an even better estimate by removing the capping to 3:

double d1 = fabs(p1.lng - p2.lng);
double d2 = fabs(p1.lat - p2.lat);
double length = fmax(d1, d2);
double width = fmin(d1, d2);
double ratio = length/width;
// Derived constant based on: https://math.stackexchange.com/a/1921940
double a = (d * d) / ratio;

Now we get:

  • estimated area: ~0.025 km²
  • 72,362 cells
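
With the cap removed, the full ratio is used, i.e. $a \approx (1.33\,\mathrm{km})^2 / 69.7 \approx 0.025\,\mathrm{km}^2$ (again my rough numbers).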

Given that the polyfill of your shape actually returns ~20k cells at res 15, we only over-estimate by a factor of ~3.5 (instead of ~17,000×).

Now those are just my 2 cents, I'm not an expert on H3 internals though ^^
(maybe removing the capping to 3 has adverse effects in other cases, for instance).

Thanks for your detailed analysis @grim7reaper.

I would be more than happy even with your first example, where the capping to 3 is still present.

Shall I create a pull request containing these changes?

At first I didn't understand how the formula from the linked Stack Exchange answer was reduced from $d^2 \over \sqrt{2 + r^2 + {1 \over r^2}}$ to $d^2 \over r$. But I think that's just a (rough) approximation, right?
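
Writing out my reading of the linked answer (with $L$, $W$ the bbox sides, $r = L/W \ge 1$, and $d$ the diagonal): from $d^2 = L^2 + W^2$ and $A = LW$ we get $A = \frac{d^2}{r + \frac{1}{r}} = \frac{d^2}{\sqrt{r^2 + 2 + \frac{1}{r^2}}}$, which for $r \gg 1$ is $\approx \frac{d^2}{r}$.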

Yes, I think you can make a PR (not sure if it breaks any existing tests though).
You can probably also add your shape as a unit test to avoid future regression.
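
Something along these lines, maybe (a sketch using plain assert() rather than H3's own test harness, with an arbitrary but generous bound that would still catch a regression back to hundreds of millions of cells):

#include <assert.h>
#include <stdint.h>
#include <h3/h3api.h>

// streetPolygon is the narrow bbox polygon from this issue, built the same
// way as in the reproduction snippet above (4 vertices, coordinates in radians).
static void testNarrowBboxEstimate(const GeoPolygon *streetPolygon) {
    int64_t estimate = 0;
    assert(maxPolygonToCellsSize(streetPolygon, 15, 0, &estimate) == E_SUCCESS);
    // polygonToCells actually returns ~20k cells here; 1,000,000 is just a
    // placeholder bound, tight enough to catch the old behaviour.
    assert(estimate > 0);
    assert(estimate < 1000000);
}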

But I think that's just a (rough) approximation, right?

Yup, that's my guess too; they probably considered $2+\frac{1}{r^2}$ to be negligible compared to $r^2$ in this case.
To be confirmed by the authors.