ansible-collections / amazon.aws

Ansible Collection for Amazon AWS

`amazon.aws.s3_object` copy mode repeatedly updates objects that were uploaded in multiple parts

colin-nolan opened this issue · comments

Summary

Using amazon.aws.s3_object in copy mode with objects that were uploaded in multiple parts (e.g. as happens with uploads via the web UI) results in the objects being copied every time the module is used, even when an object with the same content already exists in the target bucket.

I suspect that the issue is due to how the ETag is generated for the source versus how it is generated for the copy.

Issue Type

Bug Report

Component Name

s3_object

Ansible Version

$ ansible --version
ansible [core 2.16.4]
  config file = <redacted>
  configured module search path = <redacted>
  ansible python module location = <redacted>
  ansible collection location = <redacted>
  executable location = <redacted>
  python version = 3.12.1 (main, Mar  5 2024, 15:57:44) [Clang 15.0.0 (clang-1500.1.0.2.5)] (<redacted>)
  jinja version = 3.1.3
  libyaml = True

Collection Versions

$ ansible-galaxy collection list

# <redacted>/ansible_collections
Collection                               Version
---------------------------------------- -------
amazon.aws                               7.4.0  

# <redacted>/ansible_collections
Collection                               Version
---------------------------------------- -------
amazon.aws                               7.3.0  
ansible.netcommon                        5.3.0  
ansible.posix                            1.5.4  
ansible.utils                            2.12.0 
ansible.windows                          2.2.0  
arista.eos                               6.2.2  
awx.awx                                  23.8.1 
azure.azcollection                       1.19.0 
check_point.mgmt                         5.2.2  
chocolatey.chocolatey                    1.5.1  
cisco.aci                                2.8.0  
cisco.asa                                4.0.3  
cisco.dnac                               6.11.0 
cisco.intersight                         2.0.7  
cisco.ios                                5.3.0  
cisco.iosxr                              6.1.1  
cisco.ise                                2.7.0  
cisco.meraki                             2.17.2 
cisco.mso                                2.5.0  
cisco.nxos                               5.3.0  
cisco.ucs                                1.10.0 
cloud.common                             2.1.4  
cloudscale_ch.cloud                      2.3.1  
community.aws                            7.1.0  
community.azure                          2.0.0  
community.ciscosmb                       1.0.7  
community.crypto                         2.18.0 
community.digitalocean                   1.26.0 
community.dns                            2.8.1  
community.docker                         3.8.0  
community.general                        8.4.0  
community.grafana                        1.8.0  
community.hashi_vault                    6.1.0  
community.hrobot                         1.9.0  
community.library_inventory_filtering_v1 1.0.0  
community.libvirt                        1.3.0  
community.mongodb                        1.7.1  
community.mysql                          3.9.0  
community.network                        5.0.2  
community.okd                            2.3.0  
community.postgresql                     3.4.0  
community.proxysql                       1.5.1  
community.rabbitmq                       1.2.3  
community.routeros                       2.13.0 
community.sap                            2.0.0  
community.sap_libs                       1.4.2  
community.sops                           1.6.7  
community.vmware                         4.2.0  
community.windows                        2.1.0  
community.zabbix                         2.3.1  
containers.podman                        1.12.0 
cyberark.conjur                          1.2.2  
cyberark.pas                             1.0.25 
dellemc.enterprise_sonic                 2.4.0  
dellemc.openmanage                       8.7.0  
dellemc.powerflex                        2.1.0  
dellemc.unity                            1.7.1  
f5networks.f5_modules                    1.28.0 
fortinet.fortimanager                    2.4.0  
fortinet.fortios                         2.3.5  
frr.frr                                  2.0.2  
gluster.gluster                          1.0.2  
google.cloud                             1.3.0  
grafana.grafana                          2.2.5  
hetzner.hcloud                           2.5.0  
hpe.nimble                               1.1.4  
ibm.qradar                               2.1.0  
ibm.spectrum_virtualize                  2.0.0  
ibm.storage_virtualize                   2.2.0  
infinidat.infinibox                      1.4.3  
infoblox.nios_modules                    1.6.1  
inspur.ispim                             2.2.0  
inspur.sm                                2.3.0  
junipernetworks.junos                    5.3.1  
kubernetes.core                          2.4.1  
lowlydba.sqlserver                       2.3.1  
microsoft.ad                             1.4.1  
netapp.aws                               21.7.1 
netapp.azure                             21.10.1
netapp.cloudmanager                      21.22.1
netapp.elementsw                         21.7.0 
netapp.ontap                             22.10.0
netapp.storagegrid                       21.12.0
netapp.um_info                           21.8.1 
netapp_eseries.santricity                1.4.0  
netbox.netbox                            3.17.0 
ngine_io.cloudstack                      2.3.0  
ngine_io.exoscale                        1.1.0  
openstack.cloud                          2.2.0  
openvswitch.openvswitch                  2.1.1  
ovirt.ovirt                              3.2.0  
purestorage.flasharray                   1.26.0 
purestorage.flashblade                   1.15.0 
purestorage.fusion                       1.6.1  
sensu.sensu_go                           1.14.0 
splunk.es                                2.1.2  
t_systems_mms.icinga_director            2.0.1  
telekom_mms.icinga_director              1.35.0 
theforeman.foreman                       3.15.0 
vmware.vmware_rest                       2.3.1  
vultr.cloud                              1.12.1 
vyos.vyos                                4.1.0  
wti.remote                               1.0.5  

AWS SDK versions

$ pip show boto boto3 botocore
WARNING: Package(s) not found: boto
Name: boto3
Version: 1.34.59
Summary: The AWS SDK for Python
Home-page: https://github.com/boto/boto3
Author: Amazon Web Services
Author-email: 
License: Apache License 2.0
Location: <redacted>/.venv/lib/python3.12/site-packages
Requires: botocore, jmespath, s3transfer
Required-by: 
---
Name: botocore
Version: 1.34.59
Summary: Low-level, data-driven core of boto 3.
Home-page: https://github.com/boto/botocore
Author: Amazon Web Services
Author-email: 
License: Apache License 2.0
Location: <redacted>/.venv/lib/python3.12/site-packages
Requires: jmespath, python-dateutil, urllib3
Required-by: boto3, s3transfer

Configuration

$ ansible-config dump --only-changed
CONFIG_FILE() = <redacted>/ansible.cfg
DEFAULT_INVENTORY_PLUGIN_PATH(<redacted>/ansible.cfg) = ['<redacted>/ansible/plugins/inventory']
DUPLICATE_YAML_DICT_KEY(<redacted>/ansible/ansible.cfg) = ignore
INVENTORY_IGNORE_EXTS(<redacted>/ansible/ansible.cfg) = ["{{(REJECT_EXTS + ('.orig'", '.cfg', "'.retry'))}}"]

OS / Environment

macOS 14.3 (23D56)

Steps to Reproduce

  1. Create a file smaller than 5GB (the copy limit), e.g. head -c 64MB < /dev/zero > 64MB-zero.bin
  2. Perform a multipart upload of the file to an S3 bucket. If you use the web UI, that appears to upload it in 16MB parts.
  3. Observe the ETag, e.g. for the 64MB zero file, I have 05c46bd967d2892191397a04e43821b9-4. According to Amazon:

     Amazon S3 calculates the MD5 digest of each individual part. MD5 digests are used to determine the ETag for the final object. Amazon S3 concatenates the bytes for the MD5 digests together and then calculates the MD5 digest of these concatenated values. The final step in creating the ETag is when Amazon S3 adds a dash with the total number of parts to the end.

  4. Use amazon.aws.s3_object to copy the file to another bucket:

     - name: copy file that was uploaded in parts
       amazon.aws.s3_object:
         bucket: target-bucket
         mode: copy
         copy_src:
           bucket: source-bucket
           prefix: 64MB-zero.bin

  5. Repeat the above and observe that the task reports a change each time (i.e. it's not idempotent). The timestamp on the target object is updated but the contents are not, suggesting a copy operation occurred needlessly.
  6. Observe that the ETag in the target bucket does not match that of the source: it is not in multipart form, e.g. the zeros file has ETag e78585b8bfda6036cfd818710a210f23 (MD5 of 64MB of zeros).
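
The ETag rule quoted in step 3 can be sketched in a few lines of Python. This is only an illustration of the documented calculation, not code from the module; the function names `multipart_etag` and `plain_etag` are made up for this example, and the part size is whatever the uploading client happened to use.

```python
import hashlib


def plain_etag(data: bytes) -> str:
    """ETag form seen on the copied object in this report: the MD5 of the whole body."""
    return hashlib.md5(data).hexdigest()


def multipart_etag(data: bytes, part_size: int) -> str:
    """ETag for `data` uploaded in `part_size`-byte parts, per the quoted rule.

    Hypothetical helper: MD5 each part, MD5 the concatenated raw digests,
    then append a dash and the number of parts.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    if not data:
        raise ValueError("data must not be empty")
    digests = [
        hashlib.md5(data[offset:offset + part_size]).digest()
        for offset in range(0, len(data), part_size)
    ]
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}"


if __name__ == "__main__":
    body = b"\0" * 64  # tiny stand-in for the 64MB file
    print(multipart_etag(body, 16))  # multipart form, ends in "-4"
    print(plain_etag(body))          # single MD5, no suffix
```

The two values differ for identical content, which is consistent with the source and target ETags observed in steps 3 and 6.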

Expected Results

Module is idempotent, and does not repeatedly copy identical files.

Actual Results

A copy operation is performed on every run for files that were uploaded in multiple parts, regardless of their state in the target bucket.
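
To illustrate why this would happen if the change check is a plain ETag equality test, as the summary suspects, here is a minimal sketch using the two ETags from this report. `part_count` and `naive_needs_copy` are hypothetical names for this example and are not taken from the module's source.

```python
import re
from typing import Optional

# 32 hex characters, a dash, then the number of parts.
_MULTIPART_ETAG = re.compile(r"^[0-9a-f]{32}-(\d+)$")


def part_count(etag: str) -> Optional[int]:
    """Return the part count if `etag` is in multipart form, else None."""
    match = _MULTIPART_ETAG.match(etag.strip('"').lower())
    return int(match.group(1)) if match else None


def naive_needs_copy(src_etag: str, dst_etag: str) -> bool:
    """Hypothetical stand-in for an equality-only change check."""
    return src_etag.strip('"') != dst_etag.strip('"')


if __name__ == "__main__":
    source = "05c46bd967d2892191397a04e43821b9-4"  # source object (step 3)
    target = "e78585b8bfda6036cfd818710a210f23"    # copied object (step 6)
    print(part_count(source))                # 4
    print(part_count(target))                # None
    print(naive_needs_copy(source, target))  # True on every run
```

Because the copy gets a plain MD5 ETag while the source keeps its multipart one, an equality-only check like this can never report "unchanged" for such objects.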

Code of Conduct

  • I agree to follow the Ansible Code of Conduct

Thank you for taking the time to work on this @abikouo.

@colin-nolan could you please give this a try using #2024? Thanks