googleapis / java-bigquerystorage

BigQuery - JsonStreamWriter handles conversion of repeated BigDecimal and Array[Byte] fields differently from non-repeated fields

agrawal-siddharth opened this issue

Environment details

  1. Specify the API at the beginning of the title (for example, "BigQuery: ...").
    General, Core, and Other are also allowed as types
  2. OS type and version: Linux
  3. Java version: 1.8
  4. version(s): 2.31.0

Steps to reproduce

The JsonStreamWriter converts BigDecimal and Array[Byte] values differently depending on whether the field is repeated. Repeated fields should use the same serialization path as non-repeated fields.

Code example

Table schema is:

0 = name: "a", type: NUMERIC, mode: REQUIRED
1 = name: "b", type: BYTES, mode: REQUIRED
2 = name: "c", type: BYTES, mode: REQUIRED
3 = name: "aa", type: NUMERIC, mode: REPEATED
4 = name: "bb", type: BYTES, mode: REPEATED
5 = name: "cc", type: BYTES, mode: REPEATED
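
The sample code below obtains its schema from a helper (serializationIssuesFmt) that is not shown. For anyone reproducing this without that helper, the schema above could be built directly with the storage v1 builders. This is a minimal sketch; the object name SerializationIssuesSchema and the field helper are hypothetical and not part of the original report.

```scala
import com.google.cloud.bigquery.storage.v1.{TableFieldSchema, TableSchema}

// Hypothetical holder object, introduced only for this sketch.
object SerializationIssuesSchema {
  import TableFieldSchema.{Mode, Type}

  // Hypothetical helper: builds a single field descriptor.
  private def field(name: String, tpe: Type, mode: Mode): TableFieldSchema =
    TableFieldSchema.newBuilder().setName(name).setType(tpe).setMode(mode).build()

  // Mirrors the six fields listed in the table schema above.
  val schema: TableSchema = TableSchema.newBuilder()
    .addFields(field("a", Type.NUMERIC, Mode.REQUIRED))
    .addFields(field("b", Type.BYTES, Mode.REQUIRED))
    .addFields(field("c", Type.BYTES, Mode.REQUIRED))
    .addFields(field("aa", Type.NUMERIC, Mode.REPEATED))
    .addFields(field("bb", Type.BYTES, Mode.REPEATED))
    .addFields(field("cc", Type.BYTES, Mode.REPEATED))
    .build()
}
```

SerializationIssuesSchema.schema can then be passed to JsonStreamWriter.newBuilder in place of the schema value used in the sample.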

Sample Scala code is:

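import java.nio.charset.StandardCharsets

import com.google.cloud.bigquery // assumed: lets the bigquery.storage.v1 prefix used below resolve
import com.google.cloud.bigquery.storage.v1.{BigQueryWriteClient, JsonStreamWriter, TableSchema}
import com.google.protobuf.ByteString
import org.json.{JSONArray, JSONObject}

case class SerializationIssuesRow(
  a: BigDecimal,
  b: Array[Byte],
  c: ByteString,
  aa: List[BigDecimal],
  bb: List[Array[Byte]],
  cc: List[ByteString],
)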

val streamName = s"projects/${tableId.getProject}/datasets/${tableId.getDataset}/tables/${tableId.getTable}/streams/_default"
val schema: TableSchema = serializationIssuesFmt.schema.toStorageSchema
val bigqueryWriteClient: BigQueryWriteClient = ctx.bqw
val streamWriter: JsonStreamWriter = JsonStreamWriter.newBuilder(streamName, schema, bigqueryWriteClient).build()

val data = SerializationIssuesRow(
  BigDecimal(1.1),
  "b".getBytes(StandardCharsets.UTF_8),
  ByteString.copyFrom("c", StandardCharsets.UTF_8),
  List(BigDecimal(1.1), BigDecimal(2.2)),
  List("bb1".getBytes(StandardCharsets.UTF_8), "bb2".getBytes(StandardCharsets.UTF_8)),
  List(ByteString.copyFrom("cc1", StandardCharsets.UTF_8), ByteString.copyFrom("cc2", StandardCharsets.UTF_8)),
)

// SUCCEEDS
val jsonArrayDifferentInnerSerialization = new JSONArray()
jsonArrayDifferentInnerSerialization.put({
  val obj = new JSONObject()
  // Non-repeated fields: NUMERIC and BYTES values are passed as ByteString.
  obj.put("a",
    bigquery.storage.v1.BigDecimalByteStringEncoder.encodeToNumericByteString(data.a.bigDecimal))
  obj.put("b", ByteString.copyFrom(data.b))
  obj.put("c", data.c)

  // Repeated NUMERIC: elements are passed as decimal strings.
  obj.put("aa", {
    val arr = new JSONArray()
    data.aa.foreach { a =>
      arr.put(a.bigDecimal.toString)
    }
    arr
  })

  // Repeated BYTES: elements are passed as raw Array[Byte].
  obj.put("bb", {
    val arr = new JSONArray()
    data.bb.foreach { b =>
      arr.put(b)
    }
    arr
  })

  // Repeated BYTES from ByteString: elements are converted to Array[Byte] first.
  obj.put("cc", {
    val arr = new JSONArray()
    data.cc.foreach { c =>
      arr.put(c.toByteArray)
    }
    arr
  })

  obj
})
streamWriter.append(jsonArrayDifferentInnerSerialization)

// FAILS
val jsonArraySameSerialization = new JSONArray()
jsonArraySameSerialization.put({
  val obj = new JSONObject()

  // Non-repeated fields: identical to the SUCCEEDS example.
  obj.put("a",
    bigquery.storage.v1.BigDecimalByteStringEncoder.encodeToNumericByteString(data.a.bigDecimal))
  obj.put("b", ByteString.copyFrom(data.b))
  obj.put("c", data.c)

  // Repeated NUMERIC: elements use the same ByteString encoding as field "a".
  obj.put("aa", {
    val arr = new JSONArray()
    data.aa.foreach { a =>
      arr.put(bigquery.storage.v1.BigDecimalByteStringEncoder.encodeToNumericByteString(a.bigDecimal))
    }
    arr
  })

  // Repeated BYTES: elements are passed as ByteString, as for field "b".
  obj.put("bb", {
    val arr = new JSONArray()
    data.bb.foreach { b =>
      arr.put(ByteString.copyFrom(b))
    }
    arr
  })

  // Repeated BYTES: elements are passed as ByteString, as for field "c".
  obj.put("cc", {
    val arr = new JSONArray()
    data.cc.foreach { c =>
      arr.put(c)
    }
    arr
  })

  obj
})
streamWriter.append(jsonArraySameSerialization)

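Notice that the SUCCEEDS example has to serialize the elements of repeated fields differently from the non-repeated fields of the same type: repeated NUMERIC elements are passed as decimal strings and repeated BYTES elements as raw Array[Byte], whereas the non-repeated fields are passed as ByteString. The FAILS example uses the ByteString encoding for both and is rejected.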
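
Until the two paths agree, the split can be kept in one place. The following is a minimal sketch of such a workaround, based only on the behavior reported above for version 2.31.0. The object JsonWriterEncoding and its method names are hypothetical, and the encodings are simply those used in the SUCCEEDS example.

```scala
import com.google.cloud.bigquery.storage.v1.BigDecimalByteStringEncoder
import com.google.protobuf.ByteString
import org.json.JSONArray

// Hypothetical helper object; the names are illustrative, not library API.
object JsonWriterEncoding {
  // Non-repeated NUMERIC: encoded as a ByteString, as for field "a".
  def numeric(v: BigDecimal): ByteString =
    BigDecimalByteStringEncoder.encodeToNumericByteString(v.bigDecimal)

  // Non-repeated BYTES: wrapped in a ByteString, as for field "b".
  def bytes(v: Array[Byte]): ByteString = ByteString.copyFrom(v)

  // Repeated NUMERIC: each element is passed as its decimal string, as for "aa".
  def repeatedNumeric(vs: Seq[BigDecimal]): JSONArray = {
    val arr = new JSONArray()
    vs.foreach(v => arr.put(v.bigDecimal.toString))
    arr
  }

  // Repeated BYTES: each element is passed as a raw Array[Byte], as for "bb".
  def repeatedBytes(vs: Seq[Array[Byte]]): JSONArray = {
    val arr = new JSONArray()
    vs.foreach(v => arr.put(v))
    arr
  }
}
```

With this in place, a row would be built with calls such as obj.put("a", JsonWriterEncoding.numeric(data.a)) and obj.put("aa", JsonWriterEncoding.repeatedNumeric(data.aa)), so only the helper needs to change if the writer's behavior is unified later.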