WinError 32 The process cannot access the file during load_dataset
elwe-2808 opened this issue · comments
Describe the bug
When I try to load the opus_book from hugging face (following the guide on the website)
from datasets import load_dataset, Dataset
dataset = load_dataset("Helsinki-NLP/opus_books", "en-fr", features=["id", "translation"])
I get an error:
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:/Users/Me/.cache/huggingface/datasets/Helsinki-NLP___parquet/ca-de-a39f1ef185b9b73b/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec.incomplete\\parquet-train-00000-00000-of-NNNNN.arrow'
Full stacktrace
AttributeError Traceback (most recent call last)
File c:\Users\Me\.conda\envs\ia\lib\site-packages\datasets\builder.py:1858, in ArrowBasedBuilder._prepare_split_single(self, gen_kwargs, fpath, file_format, max_shard_size, job_id)
[1857](file:///C:/Users/Me/.conda/envs/ia/lib/site-packages/datasets/builder.py:1857) _time = time.time()
-> [1858](file:///C:/Users/Me/.conda/envs/ia/lib/site-packages/datasets/builder.py:1858) for _, table in generator:
[1859](file:///C:/Users/Me/.conda/envs/ia/lib/site-packages/datasets/builder.py:1859) if max_shard_size is not None and writer._num_bytes > max_shard_size:
File c:\Users\Me\.conda\envs\ia\lib\site-packages\datasets\packaged_modules\parquet\parquet.py:59, in Parquet._generate_tables(self, files)
[58](file:///C:/Users/Me/.conda/envs/ia/lib/site-packages/datasets/packaged_modules/parquet/parquet.py:58) def _generate_tables(self, files):
---> [59](file:///C:/Users/Me/.conda/envs/ia/lib/site-packages/datasets/packaged_modules/parquet/parquet.py:59) schema = self.config.features.arrow_schema if self.config.features is not None else None
[60](file:///C:/Users/Me/.conda/envs/ia/lib/site-packages/datasets/packaged_modules/parquet/parquet.py:60) if self.config.features is not None and self.config.columns is not None:
AttributeError: 'list' object has no attribute 'arrow_schema'
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
File c:\Users\Me\.conda\envs\ia\lib\site-packages\datasets\builder.py:1882, in ArrowBasedBuilder._prepare_split_single(self, gen_kwargs, fpath, file_format, max_shard_size, job_id)
[1881](file:///C:/Users/Me/.conda/envs/ia/lib/site-packages/datasets/builder.py:1881) num_shards = shard_id + 1
-> [1882](file:///C:/Users/Me/.conda/envs/ia/lib/site-packages/datasets/builder.py:1882) num_examples, num_bytes = writer.finalize()
[1883](file:///C:/Users/Me/.conda/envs/ia/lib/site-packages/datasets/builder.py:1883) writer.close()
File c:\Users\Me\.conda\envs\ia\lib\site-packages\datasets\arrow_writer.py:584, in ArrowWriter.finalize(self, close_stream)
[583](file:///C:/Users/Me/.conda/envs/ia/lib/site-packages/datasets/arrow_writer.py:583) # If schema is known, infer features even if no examples were written
--> [584](file:///C:/Users/Me/.conda/envs/ia/lib/site-packages/datasets/arrow_writer.py:584) if self.pa_writer is None and self.schema:
...
--> [627](file:///C:/Users/Me/.conda/envs/ia/lib/shutil.py:627) os.unlink(fullname)
[628](file:///C:/Users/Me/.conda/envs/ia/lib/shutil.py:628) except OSError:
[629](file:///C:/Users/Me/.conda/envs/ia/lib/shutil.py:629) onerror(os.unlink, fullname, sys.exc_info())
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:/Users/Me/.cache/huggingface/datasets/Helsinki-NLP___parquet/ca-de-a39f1ef185b9b73b/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec.incomplete\\parquet-train-00000-00000-of-NNNNN.arrow'
Steps to reproduce the bug
Steps to reproduce:
Just execute these lines
from datasets import load_dataset, Dataset
dataset = load_dataset("Helsinki-NLP/opus_books", "en-fr", features=["id", "translation"])
Expected behavior
I expect the dataset to be loaded without any errors.
Environment info
Package | Version |
---|---|
transformers | 4.37.2 |
python | 3.9.19 |
pytorch | 2.3.0 |
datasets | 2.12.0 |
arrow | 1.2.3 |
I am using Conda on Windows 11.