piskvorky / smart_open

Utils for streaming large files (S3, HDFS, gzip, bz2...)

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Allow specifying metadata for azure uploads

ddelange opened this issue · comments

Problem description

I would like to upload an Azure blob with metadata. This is currently possible with s3 and gcs:

metadata = {"name": "bla"}
bucket = "test"
key = "test"
open(
    f"s3://{bucket}/{key}",
    "wb",
    transport_params={
        "client_kwargs": {"S3.Client.create_multipart_upload": {"Metadata": metadata},
    },
)
open(
    f"gs://{bucket}/{key}",
    "wb",
    transport_params={
        "blob_properties": {"metadata": metadata},
    },
)

For Azure, smart_open only has the possibility of passing a custom BlobClient but it looks like metadata needs to be explicitly passed to BlobClient.commit_block_list.

I suggest introducing blob_kwargs:

open(
    f"azure://{container}/{path}",
    "wb",
    transport_params={
        "blob_kwargs": {"metadata": metadata},
    },
)

Steps/code to reproduce the problem

N/A

Versions

Please provide the output of:

>>> import platform, sys, smart_open
>>> print(platform.platform())
>>> print("Python", sys.version)
>>> print("smart_open", smart_open.__version__)
macOS-10.15.7-x86_64-i386-64bit
Python 3.8.10 (default, Sep  1 2021, 10:14:33)
[Clang 12.0.0 (clang-1200.0.32.29)]
smart_open 6.0.0

Checklist

Before you create the issue, please make sure you have:

  • Described the problem clearly
  • Provided a minimal reproducible example, including any required data
  • Provided the version numbers of the relevant software