[GraphBolt] TorchData PyTorch support
mfbalin opened this issue
🔨Work Item
Project tracker: https://github.com/orgs/dmlc/projects/2
Description
pytorch/pytorch#124907 (comment)
Here, torch developers say that future versions of PyTorch may not support torchdata properly. That could make it a problem for us to support later PyTorch versions.
Previously we were trying to deprecate torchdata in favor of torch.utils.data for datapipe-related operations, since active development and releases of torchdata have been paused (mentioned here).
So for now, both the PyTorch and torchdata teams are deprecating torchdata?
I don't know the exact details. We need to look into it as it is a crucial dependency.
https://discuss.dgl.ai/t/importerror-cannot-import-name-dill-available-from-torch-utils-data-datapipes-utils-common/4363/2 might be a related problem. I saw a PR in the torch repo that fixes this issue.
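To check quickly whether a given environment is affected by the import error in that thread, something like the sketch below might help. It only probes whether a module exposes a name, and it does not need torch to be installed. The helper `has_attr` is hypothetical, not part of DGL or PyTorch. The module path and the `DILL_AVAILABLE` name are taken from the error message in the linked thread.

```python
import importlib


def has_attr(module_name: str, attr: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `attr`.

    Hypothetical helper for illustration only; not part of DGL or PyTorch.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError too, e.g. torch is not installed.
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    # Module path and attribute name come from the ImportError in the
    # linked forum thread. The result depends on the installed torch.
    affected = not has_attr(
        "torch.utils.data.datapipes.utils.common", "DILL_AVAILABLE"
    )
    print("DILL_AVAILABLE missing:", affected)
```

If the name is missing, the options are probably pinning a compatible torch/torchdata pair or waiting for the upstream fix. Which versions are compatible is not something this check can tell you.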
The way we implement DataLoader (https://github.com/dmlc/dgl/blob/658b2086b09bbd76c3d3f488af2b155a1c921052/python/dgl/graphbolt/dataloader.py#L79C7-L79C17) right now isn't perfect. It makes a lot of assumptions that might cause problems later. Once those problems hit, we should redesign it. We held off because torch.data already does a good job, but if we have to, we'll tackle it then.