dataset download errors
huyuaaaray opened this issue
Yu Hu commented
Hi,
I ran into a dataset download error when running the script provided in the README:
from torch_frame.datasets import Yandex
from torch_frame.data import DataLoader
dataset = Yandex(root='/Users/huyu/Github/test/pytorch-frame/adult', name='adult')
dataset.materialize()
train_dataset = dataset[:0.8]
train_loader = DataLoader(train_dataset.tensor_frame, batch_size=128,
                          shuffle=True)
I then got the following error:
gaierror: [Errno 8] nodename nor servname provided, or not known
During handling of the above exception, another exception occurred:
URLError Traceback (most recent call last)
Cell In[7], line 4
      1 from torch_frame.datasets import Yandex
      2 from torch_frame.data import DataLoader
----> 4 dataset = Yandex(root='[/Users/huyu/Github/test/pytorch-frame/adult](https://file+.vscode-resource.vscode-cdn.net/Users/huyu/Github/test/pytorch-frame/adult)', name='adult')
      5 dataset.materialize()
      6 train_dataset = dataset[:0.8]

File ~/Github/test/pytorch-frame/torch_frame/datasets/yandex.py:215, in Yandex.__init__(self, root, name)
    213 self.root = root
    214 self.name = name
--> 215 path = self.download_url(osp.join(self.base_url, self.name + '.zip'),
    216 root)
    217 df, col_to_stype = get_df_and_col_to_stype(path)
    218 if name in self.regression_datasets:

File ~/Github/test/pytorch-frame/torch_frame/data/dataset.py:472, in Dataset.download_url(url, root, filename, log)
    453 @staticmethod
    454 def download_url(
    455 url: str,
    (...)
    459 log: bool = True,
    460 ) -> str:
    461 r"""Downloads the content of :obj:`url` to the specified folder
    462 :obj:`root`.
    463
    (...)
    470 the console. (default: :obj:`True`)
    471 """
--> 472 return torch_frame.data.download_url(url, root, filename, log=log)

File ~/Github/test/pytorch-frame/torch_frame/data/download.py:44, in download_url(url, root, filename, log)
     41 os.makedirs(root, exist_ok=True)
     43 context = ssl._create_unverified_context()
---> 44 data = urllib.request.urlopen(url, context=context)
     46 with open(path, 'wb') as f:
     47 while True:

File ~/anaconda3/envs/pytorch/lib/python3.10/urllib/request.py:216, in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    214 else:
    215 opener = _opener
--> 216 return opener.open(url, data, timeout)

File ~/anaconda3/envs/pytorch/lib/python3.10/urllib/request.py:519, in OpenerDirector.open(self, fullurl, data, timeout)
    516 req = meth(req)
    518 sys.audit('urllib.Request', req.full_url, req.data, req.headers, req.get_method())
--> 519 response = self._open(req, data)
    521 # post-process response
    522 meth_name = protocol+"_response"

File ~/anaconda3/envs/pytorch/lib/python3.10/urllib/request.py:536, in OpenerDirector._open(self, req, data)
    533 return result
    535 protocol = req.type
--> 536 result = self._call_chain(self.handle_open, protocol, protocol +
    537 '_open', req)
    538 if result:
    539 return result

File ~/anaconda3/envs/pytorch/lib/python3.10/urllib/request.py:496, in OpenerDirector._call_chain(self, chain, kind, meth_name, *args)
    494 for handler in handlers:
    495 func = getattr(handler, meth_name)
--> 496 result = func(*args)
    497 if result is not None:
    498 return result

File ~/anaconda3/envs/pytorch/lib/python3.10/urllib/request.py:1391, in HTTPSHandler.https_open(self, req)
   1390 def https_open(self, req):
-> 1391 return self.do_open(http.client.HTTPSConnection, req,
   1392 context=self._context, check_hostname=self._check_hostname)

File ~/anaconda3/envs/pytorch/lib/python3.10/urllib/request.py:1351, in AbstractHTTPHandler.do_open(self, http_class, req, **http_conn_args)
   1348 h.request(req.get_method(), req.selector, req.data, headers,
   1349 encode_chunked=req.has_header('Transfer-encoding'))
   1350 except OSError as err: # timeout error
-> 1351 raise URLError(err)
   1352 r = h.getresponse()
   1353 except:

URLError: <urlopen error [Errno 8] nodename nor servname provided, or not known>
How can I resolve this problem?
Thanks,
Yu
Zecheng Zhang commented
I can download the dataset on my local laptop:
>>> from torch_frame.datasets import Yandex
>>> dataset = Yandex("/tmp/", name='adult')
Downloading https://data.pyg.org/datasets/tables/revisiting_data/adult.zip
>>> dataset.materialize()
Yandex(name='adult')
>>> from torch_frame.data import DataLoader
>>> train_dataset = dataset[:0.8]
>>> train_loader = DataLoader(train_dataset.tensor_frame, batch_size=128, shuffle=True)
IMO it's some kind of network issue.
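One way to test the network hypothesis is to check whether the download host resolves at all, since `gaierror: [Errno 8]` is raised by the name lookup rather than by the download itself. The following is a minimal sketch using only the standard library; `can_resolve` is a hypothetical helper name, not part of torch_frame, and port 443 is assumed because the URL above uses HTTPS.

```python
import socket


def can_resolve(host: str, port: int = 443) -> bool:
    """Return True if `host` resolves to at least one address."""
    try:
        # getaddrinfo performs the name lookup that urllib relies on before
        # it opens a connection; a failed lookup raises socket.gaierror.
        return len(socket.getaddrinfo(host, port)) > 0
    except socket.gaierror as err:
        print(f"DNS lookup for {host!r} failed: {err}")
        return False
```

Calling `can_resolve("data.pyg.org")` on the affected machine should help narrow things down: if it returns `False`, the problem likely lies in the local DNS, proxy, or VPN setup rather than in the dataset code.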
Akihiro Nitta commented
@huyuaaaray Your Yandex(root=...) argument doesn't look correct to me:
dataset = Yandex(root='[/Users/huyu/Github/test/pytorch-frame/adult](https://file+.vscode-resource.vscode-cdn.net/Users/huyu/Github/test/pytorch-frame/adult)', name='adult')
I'd suggest double-checking the code you're running in the notebook cell.
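A quick way to do that check is to print the exact string passed as `root` and look for link-like residue. This is only a heuristic sketch: `looks_like_plain_path` is a hypothetical helper, and the link-like text in the traceback may simply be how the notebook renders paths rather than the actual argument.

```python
def looks_like_plain_path(root: str) -> bool:
    """Heuristic: True if `root` shows no Markdown-link or URL residue."""
    suspicious = ("](", "http://", "https://")
    return not root.startswith("[") and not any(s in root for s in suspicious)


# The path from the original script; repr() exposes any hidden characters.
root = "/Users/huyu/Github/test/pytorch-frame/adult"
print(repr(root))
print(looks_like_plain_path(root))
```

If the printed value is a plain filesystem path, the `root` argument is unlikely to be the cause, and the name-resolution failure is the more probable culprit.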
Weihua Hu commented
Closing this for now. @huyuaaaray feel free to re-open if it still does not work for you.