BigQuery - Conversion for repeated fields of BigDecimal and Array[Byte] is handled differently by the JsonStreamWriter
agrawal-siddharth opened this issue
Environment details
- OS type and version: Linux
- Java version: 1.8
- version(s): 2.31.0
Steps to reproduce
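JsonStreamWriter converts BigDecimal and Array[Byte] values differently depending on whether the field is repeated. In the example below, the encodings used for the REQUIRED fields cause the append to fail when the same encodings are used for the elements of the REPEATED fields. Repeated fields should use the same serialization path as non-repeated fields.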
Code example
Table schema is:
0 = name: "a", type: NUMERIC, mode: REQUIRED
1 = name: "b", type: BYTES, mode: REQUIRED
2 = name: "c", type: BYTES, mode: REQUIRED
3 = name: "aa", type: NUMERIC, mode: REPEATED
4 = name: "bb", type: BYTES, mode: REPEATED
5 = name: "cc", type: BYTES, mode: REPEATED
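For readers who want to reproduce this without the project-specific schema helper used in the sample, the schema listed above could be built directly with the storage v1 builders. This is only a sketch: the object name `ReproSchema` and the `field` helper are hypothetical, and it assumes the `google-cloud-bigquerystorage` artifact is on the classpath.

```scala
import com.google.cloud.bigquery.storage.v1.{TableFieldSchema, TableSchema}

// Hypothetical holder for the six-column schema described in this issue.
object ReproSchema {
  // Small helper (not part of the client library) to cut builder noise.
  private def field(
      name: String,
      tpe: TableFieldSchema.Type,
      mode: TableFieldSchema.Mode
  ): TableFieldSchema =
    TableFieldSchema.newBuilder().setName(name).setType(tpe).setMode(mode).build()

  val schema: TableSchema = TableSchema
    .newBuilder()
    .addFields(field("a", TableFieldSchema.Type.NUMERIC, TableFieldSchema.Mode.REQUIRED))
    .addFields(field("b", TableFieldSchema.Type.BYTES, TableFieldSchema.Mode.REQUIRED))
    .addFields(field("c", TableFieldSchema.Type.BYTES, TableFieldSchema.Mode.REQUIRED))
    .addFields(field("aa", TableFieldSchema.Type.NUMERIC, TableFieldSchema.Mode.REPEATED))
    .addFields(field("bb", TableFieldSchema.Type.BYTES, TableFieldSchema.Mode.REPEATED))
    .addFields(field("cc", TableFieldSchema.Type.BYTES, TableFieldSchema.Mode.REPEATED))
    .build()
}
```

`ReproSchema.schema` can then stand in for the `schema` value passed to `JsonStreamWriter.newBuilder` in the sample.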
Sample Scala code is:
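// Imports assumed by this snippet; tableId, ctx, and serializationIssuesFmt come from the surrounding project.
import java.nio.charset.StandardCharsets
import com.google.cloud.bigquery
import com.google.cloud.bigquery.storage.v1.{BigQueryWriteClient, JsonStreamWriter, TableSchema}
import com.google.protobuf.ByteString
import org.json.{JSONArray, JSONObject}

case class SerializationIssuesRow(
  a: BigDecimal,
  b: Array[Byte],
  c: ByteString,
  aa: List[BigDecimal],
  bb: List[Array[Byte]],
  cc: List[ByteString],
)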
val streamName = s"projects/${tableId.getProject}/datasets/${tableId.getDataset}/tables/${tableId.getTable}/streams/_default"
val schema: TableSchema = serializationIssuesFmt.schema.toStorageSchema
val bigqueryWriteClient: BigQueryWriteClient = ctx.bqw
val streamWriter: JsonStreamWriter = JsonStreamWriter.newBuilder(streamName, schema, bigqueryWriteClient).build()
val data = SerializationIssuesRow(
  BigDecimal(1.1),
  "b".getBytes(StandardCharsets.UTF_8),
  ByteString.copyFrom("c", StandardCharsets.UTF_8),
  List(BigDecimal(1.1), BigDecimal(2.2)),
  List("bb1".getBytes(StandardCharsets.UTF_8), "bb2".getBytes(StandardCharsets.UTF_8)),
  List(ByteString.copyFrom("cc1", StandardCharsets.UTF_8), ByteString.copyFrom("cc2", StandardCharsets.UTF_8)),
)
// SUCCEEDS
val jsonArrayDifferentInnerSerialization = new JSONArray()
jsonArrayDifferentInnerSerialization.put({
  val obj = new JSONObject()
  // Scalar NUMERIC: encoded ByteString
  obj.put("a",
    bigquery.storage.v1.BigDecimalByteStringEncoder.encodeToNumericByteString(data.a.bigDecimal))
  // Scalar BYTES: ByteString
  obj.put("b", ByteString.copyFrom(data.b))
  obj.put("c", data.c)
  // Repeated NUMERIC: decimal strings
  obj.put("aa", {
    val arr = new JSONArray()
    data.aa.foreach { a =>
      arr.put(a.bigDecimal.toString)
    }
    arr
  })
  // Repeated BYTES: raw byte arrays
  obj.put("bb", {
    val arr = new JSONArray()
    data.bb.foreach { b =>
      arr.put(b)
    }
    arr
  })
  obj.put("cc", {
    val arr = new JSONArray()
    data.cc.foreach { c =>
      arr.put(c.toByteArray)
    }
    arr
  })
  obj
})
streamWriter.append(jsonArrayDifferentInnerSerialization)
// FAILS
val jsonArraySameSerialization = new JSONArray()
jsonArraySameSerialization.put({
  val obj = new JSONObject()
  // Scalar fields: identical to the SUCCEEDS example
  obj.put("a",
    bigquery.storage.v1.BigDecimalByteStringEncoder.encodeToNumericByteString(data.a.bigDecimal))
  obj.put("b", ByteString.copyFrom(data.b))
  obj.put("c", data.c)
  // Repeated NUMERIC: same ByteString encoding as "a"
  obj.put("aa", {
    val arr = new JSONArray()
    data.aa.foreach { a =>
      arr.put(bigquery.storage.v1.BigDecimalByteStringEncoder.encodeToNumericByteString(a.bigDecimal))
    }
    arr
  })
  // Repeated BYTES: same ByteString encoding as "b" and "c"
  obj.put("bb", {
    val arr = new JSONArray()
    data.bb.foreach { b =>
      arr.put(ByteString.copyFrom(b))
    }
    arr
  })
  obj.put("cc", {
    val arr = new JSONArray()
    data.cc.foreach { c =>
      arr.put(c)
    }
    arr
  })
  obj
})
streamWriter.append(jsonArraySameSerialization)
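Notice that the SUCCEEDS example serializes scalar fields and the elements of repeated fields differently. NUMERIC is sent as a ByteString from BigDecimalByteStringEncoder for "a" but as a decimal string for each element of "aa". BYTES is sent as a ByteString for "b" and "c" but as a raw byte array for each element of "bb" and "cc". The FAILS example uses the scalar encoding for the repeated elements as well.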
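Until the two paths are unified, one possible workaround is to keep the two encodings behind small helpers so call sites do not have to remember which one applies. The sketch below only mirrors the SUCCEEDS example from this report (version 2.31.0) and has not been checked against other versions. `RepeatedFieldWorkaround` and its methods are hypothetical names, not part of the client library.

```scala
import com.google.cloud.bigquery.storage.v1.BigDecimalByteStringEncoder
import com.google.protobuf.ByteString
import org.json.JSONArray

// Hypothetical helpers mirroring the encodings used in the SUCCEEDS example.
object RepeatedFieldWorkaround {
  // Scalar NUMERIC: ByteString produced by the library's encoder.
  def numericScalar(v: BigDecimal): ByteString =
    BigDecimalByteStringEncoder.encodeToNumericByteString(v.bigDecimal)

  // Repeated NUMERIC: each element as its decimal string.
  def numericRepeated(vs: Seq[BigDecimal]): JSONArray = {
    val arr = new JSONArray()
    vs.foreach(v => arr.put(v.bigDecimal.toString))
    arr
  }

  // Scalar BYTES: wrap the raw bytes in a ByteString.
  def bytesScalar(v: Array[Byte]): ByteString = ByteString.copyFrom(v)

  // Repeated BYTES: each element as a raw byte array.
  def bytesRepeated(vs: Seq[Array[Byte]]): JSONArray = {
    val arr = new JSONArray()
    vs.foreach(v => arr.put(v))
    arr
  }
}
```

With these in place, the row in the sample could be built as `obj.put("a", RepeatedFieldWorkaround.numericScalar(data.a))` and `obj.put("aa", RepeatedFieldWorkaround.numericRepeated(data.aa))`, and likewise for the BYTES columns (converting `data.cc` elements with `toByteArray` first).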