pyg-team / pytorch-frame

Tabular Deep Learning Library for PyTorch

Home Page: https://pytorch-frame.readthedocs.io

dataset download errors

huyuaaaray opened this issue

Hi,
I ran into a dataset download error when running the example script provided in the README:

from torch_frame.datasets import Yandex
from torch_frame.data import DataLoader

dataset = Yandex(root='/Users/huyu/Github/test/pytorch-frame/adult', name='adult')
dataset.materialize()
train_dataset = dataset[:0.8]
train_loader = DataLoader(train_dataset.tensor_frame, batch_size=128,
                          shuffle=True)

and got the following error:

gaierror: [Errno 8] nodename nor servname provided, or not known

During handling of the above exception, another exception occurred:

URLError                                  Traceback (most recent call last)
Cell In[7], line 4
      1 from torch_frame.datasets import Yandex
      2 from torch_frame.data import DataLoader
----> 4 dataset = Yandex(root='[/Users/huyu/Github/test/pytorch-frame/adult](https://file+.vscode-resource.vscode-cdn.net/Users/huyu/Github/test/pytorch-frame/adult)', name='adult')
      5 dataset.materialize()
      6 train_dataset = dataset[:0.8]

File ~/Github/test/pytorch-frame/torch_frame/datasets/yandex.py:215, in Yandex.__init__(self, root, name)
    213 self.root = root
    214 self.name = name
--> 215 path = self.download_url(osp.join(self.base_url, self.name + '.zip'),
    216                          root)
    217 df, col_to_stype = get_df_and_col_to_stype(path)
    218 if name in self.regression_datasets:

File ~/Github/test/pytorch-frame/torch_frame/data/dataset.py:472, in Dataset.download_url(url, root, filename, log)
    453 @staticmethod
    454 def download_url(
    455     url: str,
   (...)
    459     log: bool = True,
    460 ) -> str:
    461     r"""Downloads the content of :obj:`url` to the specified folder
    462     :obj:`root`.
    463 
   (...)
    470             the console. (default: :obj:`True`)
    471     """
--> 472     return torch_frame.data.download_url(url, root, filename, log=log)

File ~/Github/test/pytorch-frame/torch_frame/data/download.py:44, in download_url(url, root, filename, log)
     41 os.makedirs(root, exist_ok=True)
     43 context = ssl._create_unverified_context()
---> 44 data = urllib.request.urlopen(url, context=context)
     46 with open(path, 'wb') as f:
     47     while True:

File ~/anaconda3/envs/pytorch/lib/python3.10/urllib/request.py:216, in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    214 else:
    215     opener = _opener
--> 216 return opener.open(url, data, timeout)

File ~/anaconda3/envs/pytorch/lib/python3.10/urllib/request.py:519, in OpenerDirector.open(self, fullurl, data, timeout)
    516     req = meth(req)
    518 sys.audit('urllib.Request', req.full_url, req.data, req.headers, req.get_method())
--> 519 response = self._open(req, data)
    521 # post-process response
    522 meth_name = protocol+"_response"

File ~/anaconda3/envs/pytorch/lib/python3.10/urllib/request.py:536, in OpenerDirector._open(self, req, data)
    533     return result
    535 protocol = req.type
--> 536 result = self._call_chain(self.handle_open, protocol, protocol +
    537                           '_open', req)
    538 if result:
    539     return result

File ~/anaconda3/envs/pytorch/lib/python3.10/urllib/request.py:496, in OpenerDirector._call_chain(self, chain, kind, meth_name, *args)
    494 for handler in handlers:
    495     func = getattr(handler, meth_name)
--> 496     result = func(*args)
    497     if result is not None:
    498         return result

File ~/anaconda3/envs/pytorch/lib/python3.10/urllib/request.py:1391, in HTTPSHandler.https_open(self, req)
   1390 def https_open(self, req):
-> 1391     return self.do_open(http.client.HTTPSConnection, req,
   1392         context=self._context, check_hostname=self._check_hostname)

File ~/anaconda3/envs/pytorch/lib/python3.10/urllib/request.py:1351, in AbstractHTTPHandler.do_open(self, http_class, req, **http_conn_args)
   1348         h.request(req.get_method(), req.selector, req.data, headers,
   1349                   encode_chunked=req.has_header('Transfer-encoding'))
   1350     except OSError as err: # timeout error
-> 1351         raise URLError(err)
   1352     r = h.getresponse()
   1353 except:

URLError: <urlopen error [Errno 8] nodename nor servname provided, or not known>
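The traceback contains two chained exceptions: the socket.gaierror raised by the host-name lookup, and the URLError that urllib raises while handling it, which keeps the original error in its `reason` attribute. A minimal sketch for telling a DNS failure apart from other download errors; `is_dns_failure` is a hypothetical helper, not part of torch_frame:

```python
import socket
import urllib.error


def is_dns_failure(exc: BaseException) -> bool:
    """Return True if `exc` is a urllib URLError caused by a failed DNS lookup.

    urllib re-raises low-level OSErrors as URLError and keeps the original
    exception in `.reason`; socket.gaierror is what getaddrinfo() raises
    when a host name cannot be resolved.
    """
    return (isinstance(exc, urllib.error.URLError)
            and isinstance(exc.reason, socket.gaierror))
```

For example, wrapping the `Yandex(...)` call in `try` / `except urllib.error.URLError as e:` and checking `is_dns_failure(e)` separates "cannot resolve the download host" from other failures such as an HTTP error on the file itself (for HTTPError, `reason` is a message string, so the check returns False).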

How can I resolve this problem?

Thanks,
Yu

I can download the dataset on my local laptop without problems:

>>> from torch_frame.datasets import Yandex
>>> dataset = Yandex("/tmp/", name='adult')
Downloading https://data.pyg.org/datasets/tables/revisiting_data/adult.zip
>>> dataset.materialize()
Yandex(name='adult')
>>> from torch_frame.data import DataLoader
>>> train_dataset = dataset[:0.8]
>>> train_loader = DataLoader(train_dataset.tensor_frame, batch_size=128, shuffle=True)

IMO it's some kind of network issue: the gaierror means the download host name could not be resolved (a DNS lookup failure).
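One way to confirm this on the machine that runs the notebook is to repeat the failing lookup directly, outside torch_frame. A minimal sketch; the helper name `can_resolve` is hypothetical, and port 443 is assumed because the dataset URL is HTTPS:

```python
import socket


def can_resolve(host: str, port: int = 443) -> bool:
    """Return True if `host` resolves via the system resolver.

    This is the same getaddrinfo() lookup that failed with
    "[Errno 8] nodename nor servname provided, or not known".
    """
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror:
        return False
    return True
```

If `can_resolve('data.pyg.org')` returns False while other sites work, that would point at the local DNS, proxy, or VPN setup rather than at the dataset code.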

@huyuaaaray Your Yandex(root=...) doesn't look correct to me:

dataset = Yandex(root='[/Users/huyu/Github/test/pytorch-frame/adult](https://file+.vscode-resource.vscode-cdn.net/Users/huyu/Github/test/pytorch-frame/adult)', name='adult')

I'd suggest double-checking the code you're running in the notebook cell.

Closing this for now. @huyuaaaray feel free to re-open if it still does not work for you.
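If the notebook machine cannot reach the download host at all, one possible workaround is to fetch the archive on a machine that can and copy it into `root`. This is a sketch under assumptions: the base URL is taken from the "Downloading ..." log line in this thread, the `<name>.zip` file name follows `yandex.py:215` in the traceback, and it assumes torch_frame's `download_url` reuses a file that already exists under `root` (as PyG's helper of the same name does; check `torch_frame/data/download.py` before relying on this). `dataset_url` and `fetch_zip` are hypothetical helpers:

```python
import os
import shutil
import urllib.request

# Assumed base URL, taken from the "Downloading ..." log line in this thread.
BASE_URL = 'https://data.pyg.org/datasets/tables/revisiting_data'


def dataset_url(name: str) -> str:
    # URLs always use '/', so join explicitly instead of using os.path.join.
    return f'{BASE_URL}/{name}.zip'


def fetch_zip(url: str, root: str) -> str:
    """Download `url` into `root`, keeping the remote file name.

    Hypothetical helper for manual pre-fetching; not part of torch_frame.
    """
    os.makedirs(root, exist_ok=True)
    path = os.path.join(root, url.rpartition('/')[2])
    # Default certificate verification is kept here (unlike download.py,
    # which uses an unverified SSL context).
    with urllib.request.urlopen(url) as response, open(path, 'wb') as f:
        shutil.copyfileobj(response, f)  # stream to disk in chunks
    return path
```

For example, run `fetch_zip(dataset_url('adult'), '/tmp/adult')` on a networked machine, then copy `adult.zip` into the `root` directory passed to `Yandex`.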