testcontainers / testcontainers-scala

Docker containers for testing in Scala

"Keep getting: open `': Change reported by S3 during open at position 0. ETag was unavailable" when reading from S3

iwb-vhuysmans opened this issue · comments

Hi,

I'm currently using the following Maven dependencies in my project:

        <dependency>
            <groupId>com.dimafeng</groupId>
            <artifactId>testcontainers-scala_2.12</artifactId>
            <version>0.40.12</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.dimafeng</groupId>
            <artifactId>testcontainers-scala-dynalite_2.12</artifactId>
            <version>0.40.12</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.dimafeng</groupId>
            <artifactId>testcontainers-scala-localstack_2.12</artifactId>
            <version>0.40.12</version>
            <scope>test</scope>
        </dependency>

I have some code where I set up an AmazonS3 client using the LocalStackContainer:

  import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicAWSCredentials}
  import com.amazonaws.client.builder.AwsClientBuilder
  import com.amazonaws.services.s3.{AmazonS3, AmazonS3ClientBuilder}
  import com.dimafeng.testcontainers.LocalStackContainer
  import org.testcontainers.containers.localstack.LocalStackContainer.Service.S3

  override val container: LocalStackContainer = new LocalStackContainer(services = List(S3))
  implicit var client: AmazonS3 = null
  var sparkCsvReader: SparkCsvReader = null

  override protected def beforeAll(): Unit = {
    container.start()

    // S3 client pointed at the LocalStack endpoint instead of real AWS
    client = AmazonS3ClientBuilder
      .standard()
      .withEndpointConfiguration(
        new AwsClientBuilder.EndpointConfiguration(
          container.container.getEndpointOverride(S3).toString,
          container.container.getRegion
        )
      )
      .withCredentials(
        new AWSStaticCredentialsProvider(
          new BasicAWSCredentials(container.container.getAccessKey, container.container.getSecretKey)
        )
      )
      .build()

    // Point Spark's S3A filesystem at the same LocalStack endpoint and credentials
    ss.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", container.container.getEndpointOverride(S3).toString)
    ss.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", container.container.getAccessKey)
    ss.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", container.container.getSecretKey)
    ss.sparkContext.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    ss.sparkContext.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

    sparkCsvReader = new SparkCsvReader() // Class I would like to test
  }

When I use the client to create a bucket and upload two files, everything works fine. But when I try to read them back from the S3 bucket, I keep getting the following error:

open `s3a://bucket1/file1.csv': Change reported by S3 during open at position 0. ETag 079a45cc9a4cda24698dddf8f6263cdd was unavailable
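
For reference, this is roughly what the failing test does (a minimal sketch; the bucket and file names are just placeholders, and I read the file directly with ss.read here instead of going through SparkCsvReader):

  // Uploading through the AmazonS3 client works fine (bucket/keys are illustrative)
  client.createBucket("bucket1")
  client.putObject("bucket1", "file1.csv", "col1,col2\n1,2\n")
  client.putObject("bucket1", "file2.csv", "col1,col2\n3,4\n")

  // Reading the same object back through the s3a:// filesystem is what fails
  val df = ss.read
    .option("header", "true")
    .csv("s3a://bucket1/file1.csv")

  df.show() // throws: "Change reported by S3 during open at position 0. ETag ... was unavailable"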