`predict_tile()` output is incorrect when multiple GPUs are present
ethanwhite opened this issue
When `predict_tile()` is run on a machine with multiple GPUs, the GPUs are automatically detected and used. The output from these runs differs from single-GPU runs in both the number and the positions of the predicted boxes. This can be confirmed with the following reprex:
```python
from deepforest import main
from deepforest import get_data

# Load the released model weights
model = main.deepforest()
model.use_release()

# Predict on the bundled sample raster using overlapping 300 px patches
raster_path = get_data("OSBS_029.tif")
predicted_raster = model.predict_tile(raster_path, return_plot=False, patch_size=300, patch_overlap=0.25)
predicted_raster.to_csv("boxes.csv")
```
Running on an interactive instance with 1 GPU (on the HiPerGator) we get:
```
In [20]: one_gpu.sort(by = ['xmin', 'ymin'])
Out[20]:
shape: (94, 8)
┌─────┬───────┬───────┬───────┬───────┬───────┬──────────┬──────────────┐
│     ┆ xmin  ┆ ymin  ┆ xmax  ┆ ymax  ┆ label ┆ score    ┆ image_path   │
│ --- ┆ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---      ┆ ---          │
│ i64 ┆ f64   ┆ f64   ┆ f64   ┆ f64   ┆ str   ┆ f64      ┆ str          │
╞═════╪═══════╪═══════╪═══════╪═══════╪═══════╪══════════╪══════════════╡
│ 86  ┆ 0.0   ┆ 0.0   ┆ 16.0  ┆ 18.0  ┆ Tree  ┆ 0.27725  ┆ OSBS_029.tif │
│ 50  ┆ 0.0   ┆ 203.0 ┆ 7.0   ┆ 223.0 ┆ Tree  ┆ 0.430017 ┆ OSBS_029.tif │
│ 54  ┆ 0.0   ┆ 260.0 ┆ 21.0  ┆ 287.0 ┆ Tree  ┆ 0.422261 ┆ OSBS_029.tif │
│ 78  ┆ 0.0   ┆ 357.0 ┆ 11.0  ┆ 387.0 ┆ Tree  ┆ 0.297332 ┆ OSBS_029.tif │
│ 36  ┆ 2.0   ┆ 156.0 ┆ 42.0  ┆ 205.0 ┆ Tree  ┆ 0.543535 ┆ OSBS_029.tif │
│ …   ┆ …     ┆ …     ┆ …     ┆ …     ┆ …     ┆ …        ┆ …            │
│ 87  ┆ 386.0 ┆ 77.0  ┆ 400.0 ┆ 104.0 ┆ Tree  ┆ 0.274974 ┆ OSBS_029.tif │
│ 75  ┆ 387.0 ┆ 174.0 ┆ 400.0 ┆ 200.0 ┆ Tree  ┆ 0.309699 ┆ OSBS_029.tif │
│ 37  ┆ 388.0 ┆ 377.0 ┆ 400.0 ┆ 398.0 ┆ Tree  ┆ 0.529458 ┆ OSBS_029.tif │
│ 43  ┆ 392.0 ┆ 135.0 ┆ 399.0 ┆ 155.0 ┆ Tree  ┆ 0.482293 ┆ OSBS_029.tif │
│ 88  ┆ 393.0 ┆ 352.0 ┆ 400.0 ┆ 375.0 ┆ Tree  ┆ 0.274659 ┆ OSBS_029.tif │
└─────┴───────┴───────┴───────┴───────┴───────┴──────────┴──────────────┘
```
Running on an interactive instance with 2 GPUs, we get:
```
In [21]: two_gpu.sort(by = ['xmin', 'ymin'])
Out[21]:
shape: (86, 8)
┌─────┬───────┬───────┬───────┬───────┬───────┬──────────┬──────────────┐
│     ┆ xmin  ┆ ymin  ┆ xmax  ┆ ymax  ┆ label ┆ score    ┆ image_path   │
│ --- ┆ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---      ┆ ---          │
│ i64 ┆ f64   ┆ f64   ┆ f64   ┆ f64   ┆ str   ┆ f64      ┆ str          │
╞═════╪═══════╪═══════╪═══════╪═══════╪═══════╪══════════╪══════════════╡
│ 45  ┆ 0.0   ┆ 103.0 ┆ 7.0   ┆ 123.0 ┆ Tree  ┆ 0.430017 ┆ OSBS_029.tif │
│ 55  ┆ 0.0   ┆ 200.0 ┆ 17.0  ┆ 226.0 ┆ Tree  ┆ 0.406657 ┆ OSBS_029.tif │
│ 64  ┆ 0.0   ┆ 250.0 ┆ 11.0  ┆ 275.0 ┆ Tree  ┆ 0.341502 ┆ OSBS_029.tif │
│ 67  ┆ 0.0   ┆ 279.0 ┆ 15.0  ┆ 312.0 ┆ Tree  ┆ 0.337557 ┆ OSBS_029.tif │
│ 84  ┆ 0.0   ┆ 367.0 ┆ 6.0   ┆ 389.0 ┆ Tree  ┆ 0.257209 ┆ OSBS_029.tif │
│ …   ┆ …     ┆ …     ┆ …     ┆ …     ┆ …     ┆ …        ┆ …            │
│ 39  ┆ 288.0 ┆ 38.0  ┆ 300.0 ┆ 57.0  ┆ Tree  ┆ 0.480631 ┆ OSBS_029.tif │
│ 32  ┆ 288.0 ┆ 377.0 ┆ 300.0 ┆ 398.0 ┆ Tree  ┆ 0.529458 ┆ OSBS_029.tif │
│ 46  ┆ 289.0 ┆ 2.0   ┆ 300.0 ┆ 24.0  ┆ Tree  ┆ 0.428087 ┆ OSBS_029.tif │
│ 38  ┆ 292.0 ┆ 135.0 ┆ 299.0 ┆ 155.0 ┆ Tree  ┆ 0.482293 ┆ OSBS_029.tif │
│ 81  ┆ 293.0 ┆ 352.0 ┆ 300.0 ┆ 375.0 ┆ Tree  ┆ 0.274659 ┆ OSBS_029.tif │
└─────┴───────┴───────┴───────┴───────┴───────┴──────────┴──────────────┘
```
This shows two issues:
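- There are fewer boxes on 2 GPUs (86) than on 1 GPU (94).
- The boxes are shifted. Compare the last three rows of each table: they have the same scores and y coordinates, but the x coordinates are 100 smaller on 2 GPUs (e.g., `xmin` 393.0 vs. 293.0 for the box with score 0.274659). Row 2 on 1 GPU and row 1 on 2 GPUs are likely also the same tree (same score, 0.430017, and same x coordinates), but there it is the y coordinates that are 100 smaller, so the shift near 0,0 is not a single uniform offset.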
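The row-by-row comparison above can be done mechanically. This is a minimal sketch that assumes an identical score identifies the same detection in both runs; the boxes are the rows copied from the two tables, and `match_by_score` is a hypothetical helper, not part of DeepForest:

```python
from collections import defaultdict

# (xmin, ymin, xmax, ymax, score), copied from the 1-GPU and 2-GPU tables
one_gpu = [
    (0.0, 203.0, 7.0, 223.0, 0.430017),
    (388.0, 377.0, 400.0, 398.0, 0.529458),
    (392.0, 135.0, 399.0, 155.0, 0.482293),
    (393.0, 352.0, 400.0, 375.0, 0.274659),
]
two_gpu = [
    (0.0, 103.0, 7.0, 123.0, 0.430017),
    (288.0, 377.0, 300.0, 398.0, 0.529458),
    (292.0, 135.0, 299.0, 155.0, 0.482293),
    (293.0, 352.0, 300.0, 375.0, 0.274659),
]


def match_by_score(a, b, ndigits=6):
    """Pair boxes from two runs that share a (rounded) score and return the
    (dx, dy) shift of each pair's top-left corner, keyed by score."""
    by_score = defaultdict(list)
    for box in b:
        by_score[round(box[4], ndigits)].append(box)
    offsets = {}
    for box in a:
        candidates = by_score.get(round(box[4], ndigits), [])
        if len(candidates) == 1:  # skip ambiguous scores rather than guess
            other = candidates[0]
            offsets[box[4]] = (other[0] - box[0], other[1] - box[1])
    return offsets


for score, (dx, dy) in match_by_score(one_gpu, two_gpu).items():
    print(f"score={score}: dx={dx:+.0f}, dy={dy:+.0f}")
```

Applied to the full contents of both `boxes.csv` files, the same function would show how many detections survive and which offsets they pick up.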
This issue was discovered by @henrykironde, I'm just helping by putting together the reprex.
This has been worked around by forcing a single GPU for prediction (#653), but if anyone is familiar enough with multi-GPU inference in PyTorch Lightning to help make it work, we'd appreciate the help.
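One plausible mechanism, not confirmed in this issue, is a mismatch between how windows are sharded across GPUs and how window offsets are applied afterwards. Under DDP, each GPU typically receives an interleaved subset of the windows from a `DistributedSampler`-style sampler, and `trainer.predict` in each process may return only that process's predictions. If per-GPU outputs are concatenated in rank order and then matched to the window list by position, boxes receive the wrong window's origin. The pure-Python sketch below models only the index split of `torch.utils.data.DistributedSampler` (`shuffle=False`, `drop_last=False`, padding shorter than the dataset). Whether Lightning pads during prediction depends on its version, and all helper names are hypothetical:

```python
def shard_indices(n, num_replicas, rank):
    """Window indices seen by `rank`, modeled on DistributedSampler with
    shuffle=False and drop_last=False: pad to a multiple of num_replicas by
    repeating leading indices (assumes padding < n), then take every
    num_replicas-th index starting at `rank`."""
    indices = list(range(n))
    total = -(-n // num_replicas) * num_replicas  # round n up to a multiple
    indices += indices[: total - n]
    return indices[rank:total:num_replicas]


def gather_in_rank_order(n, num_replicas):
    """Concatenate each rank's window indices in rank order, as a naive
    gather of per-rank predictions would."""
    return [i for r in range(num_replicas) for i in shard_indices(n, num_replicas, r)]


def naive_pairs(n, num_replicas):
    """(window a prediction came from, window whose offset gets applied)
    when gathered predictions are zipped positionally with the window list."""
    gathered = gather_in_rank_order(n, num_replicas)
    return [(src, pos) for pos, src in zip(range(n), gathered)]


def fixed_pairs(n, num_replicas):
    """Keep each window index with its prediction, sort by it and drop
    duplicates introduced by padding before applying offsets."""
    unique = sorted(set(gather_in_rank_order(n, num_replicas)))
    return [(i, i) for i in unique]


print(naive_pairs(4, 2))  # [(0, 0), (2, 1), (1, 2), (3, 3)]
print(fixed_pairs(4, 2))  # [(0, 0), (1, 1), (2, 2), (3, 3)]
```

With 4 windows on 2 GPUs, windows 1 and 2 swap offsets in the naive version. Carrying the window index alongside each prediction and sorting on it before applying offsets removes the dependence on gather order.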