minio / minio

The Object Store for AI Data Infrastructure

Home Page:https://min.io/download

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

feat: Completing Multipart Uploads with Checksum That Contains Number of Parts Fails

jakub300 opened this issue · comments

Expected Behavior

When completing multipart upload checksum in format {checksum}-{numberOfParts} should be supported.

Current Behavior

When completing multipart upload checksum in format {checksum}-{numberOfParts}, InvalidArgument: Invalid arguments provided for checksum-bug/minio-checksum-bug/2024-05-04T17:52:34.793Z.mp4: (invalid/unknown checksum sent: invalid checksum) error is returned.

Various scenarios:

  1. both s3 and minio allow completing without checksum
  2. both s3 and minio fail on invalid checkum
  3. both s3 and minio allow completing with checksum that does not contain number of parts, eg. ejt070PMd2pF/mFNTgP6LDtAo2px3L+i/l91VTgvRSc=.
  4. only s3 allows completing when checkusm with number of parts as a suffix is provided, eg. ejt070PMd2pF/mFNTgP6LDtAo2px3L+i/l91VTgvRSc=-2.

Possible Solution

Fix somewhere in erasure-multipart.go.

Steps to Reproduce (for bugs)

  1. Create multipart upload with ChecksumAlgorithm: "SHA256" (likely all other algorithms are also impacted)
  2. Upload parts with checksum
  3. Complete multipart upload with checksum combined with number of parts
Sample code (JS)
// tested with node@20.12.2, minio version RELEASE.2024-05-01T01-11-10Z
// save as .mjs file
// set flowing env variables before starting or here:
// process.env.S3_BUCKET_NAME = "";
// process.env.AWS_REGION = ""; // only for s3
// process.env.AWS_ACCESS_KEY_ID = "";
// process.env.AWS_SECRET_ACCESS_KEY = "";
// process.env.AWS_ENDPOINT_URL = ""; // only for minio

import {
  CompleteMultipartUploadCommand,
  CreateMultipartUploadCommand,
  ListPartsCommand,
  S3Client,
  UploadPartCommand,
} from "@aws-sdk/client-s3"; // tested with version 3.569.0
import crypto from "node:crypto";

function getChecksum(buffer) {
  const hash = crypto.createHash("sha256");
  hash.update(new Uint8Array(buffer));
  return hash.digest();
}

const video = await fetch(
  "https://github.com/minio/minio/assets/610941/2653d680-2c87-42ea-98d0-8feab199d3ef"
).then((response) => response.arrayBuffer());

const s3 = new S3Client({
  forcePathStyle: !!process.env.AWS_ENDPOINT_URL,
});

const MB_5 = 5 * 1024 * 1024;
const KEY = `minio-checksum-bug/${new Date().toISOString()}.mp4`;
const PARTS = Math.ceil(video.byteLength / MB_5);

console.log({ KEY, PARTS });

const multipartUpload = await s3.send(
  new CreateMultipartUploadCommand({
    Bucket: process.env.S3_BUCKET_NAME,
    Key: KEY,
    ChecksumAlgorithm: "SHA256",
  })
);

const uploadId = multipartUpload.UploadId;

const checksums = [];

for (let i = 0; i < PARTS; i++) {
  const start = i * MB_5;
  const end = Math.min((i + 1) * MB_5, video.byteLength);
  const body = video.slice(start, end);
  const checksum = getChecksum(body);
  const checksumBase64 = checksum.toString("base64");
  checksums.push(checksum);
  console.log({ i, checksumBase64 });

  await s3.send(
    new UploadPartCommand({
      Bucket: process.env.S3_BUCKET_NAME,
      Key: KEY,
      UploadId: uploadId,
      PartNumber: i + 1,
      Body: body,
      ChecksumSHA256: checksum.toString("base64"),
    })
  );
}

const partsList = await s3.send(
  new ListPartsCommand({
    Bucket: process.env.S3_BUCKET_NAME,
    Key: KEY,
    UploadId: uploadId,
  })
);

const checksumTotal = getChecksum(Buffer.concat(checksums));
const checksumTotalBase64 = checksumTotal.toString("base64");
console.log({ checksumTotalBase64 });

await s3.send(
  new CompleteMultipartUploadCommand({
    Bucket: process.env.S3_BUCKET_NAME,
    Key: KEY,
    UploadId: uploadId,
    MultipartUpload: {
      Parts: partsList.Parts.map((part) => ({
        PartNumber: part.PartNumber,
        ETag: part.ETag,
        ChecksumSHA256: part.ChecksumSHA256,
      })),
    },
    ChecksumSHA256: "a" + checksumTotalBase64, // GOOD does not work in both s3 and minio
    ChecksumSHA256: checksumTotalBase64, // GOOD works in both s3 and minio
    ChecksumSHA256: `${checksumTotalBase64}-${PARTS}`, // BAD works in s3, does not work in minio
  })
);
Test file storage
bbb_sunflower_1080p_30fps_normal_cut_small.mp4

Context

Not a big deal for us at this point, we can remove part numbers from the checksum, but reporting this as a it is compatibility issue.

Regression

Probably not.

Your Environment

Testing locally on Windows WSL, not really relevant for this issue.

minio version RELEASE.2024-05-01T01-11-10Z (commit-id=7926401cbd5cceaacd9509f2e50e1f7d636c2eb8)
Runtime: go1.21.9 linux/amd64
License: GNU AGPLv3 <https://www.gnu.org/licenses/agpl-3.0.html>
Copyright: 2015-2024 MinIO, Inc.
Linux KUBA-PC 5.15.146.1-microsoft-standard-WSL2 #1 SMP Thu Jan 11 04:09:03 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Yeah. Seems like S3 added the part count later.

#19680 added.