circles-learning-labs / ecto_adapters_dynamodb

DynamoDB adapter for Elixir's Ecto Database layer.

** (MatchError) when you try to insert binary field

vshev4enko opened this issue

Prerequisites:

  1. table with a binary field
  2. appropriate schema with binary or UUID field
  3. :info must be in the log_levels config list, or log_levels left at its default (the error is raised in the logging code)

Steps:

  1. call Repo.insert/1

Actual result:

** (MatchError) no match of right hand side value: {:error, %Jason.EncodeError{message: "invalid byte 0xA2 in <<1, 162, 7, 200, 253, 74, 75, 218, 145, 239, 108, 158, 187, 104, 245, 96>>"}}
    (ecto_adapters_dynamodb) lib/ecto_adapters_dynamodb.ex:1189: Ecto.Adapters.DynamoDB.ecto_dynamo_log/4
    (ecto_adapters_dynamodb) lib/ecto_adapters_dynamodb.ex:533: Ecto.Adapters.DynamoDB.insert/6
    (ecto) lib/ecto/repo/schema.ex:661: Ecto.Repo.Schema.apply/4
    (ecto) lib/ecto/repo/schema.ex:263: anonymous fn/15 in Ecto.Repo.Schema.do_insert/4

Expected result:
{:ok, struct}

I'm using the following config as a workaround:

config :ecto_adapters_dynamodb,
  log_levels: []

Hi viacheslavshevchenko,

Sorry, we are very busy here. Do you know what is causing the issue? We would gladly review a PR if the fix seems reasonable for you to implement.

@ViacheslavShevchenko could you provide an example of the data that you are trying to insert?

Ah, I think I see the issue here: JSON encoding tools like Poison and Jason can't encode a raw binary that isn't a valid string -

iex> raw = <<1, 162, 7, 200, 253, 74, 75, 218, 145, 239, 108, 158, 187, 104, 245, 96>>
iex> Jason.encode(raw)
{:error,
 %Jason.EncodeError{
   message: "invalid byte 0xA2 in <<1, 162, 7, 200, 253, 74, 75, 218, 145, 239, 108, 158, 187, 104, 245, 96>>"
 }}
iex> Poison.encode(raw)
** (FunctionClauseError) no function clause matching in Poison.Encoder.BitString.chunk_size/3
# more Poison error info...

So one possible fix would be to convert the binary to a string prior to encoding -

iex> raw = <<1, 162, 7, 200, 253, 74, 75, 218, 145, 239, 108, 158, 187, 104, 245, 96>>
iex> str = Enum.join(for <<c::utf8 <- raw>>, do: <<c::utf8>>)
iex> Jason.encode(str)
{:ok, "\"\\u0001\""}
iex> Poison.encode(str)
{:ok, "\"\\u0001\""}

This example assumes that UTF-8 is the proper encoding; obviously, that may not always be the case.
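One caveat with the conversion above is worth spelling out: a `utf8` bitstring generator stops at the first byte sequence that does not match, which is why the encoded result contains only `\u0001`. The sketch below shows that data loss and, purely as an illustration, a lossless alternative using `Base.encode16/2` from the standard library. The hex encoding is my own suggestion, not something the adapter does.

```elixir
raw = <<1, 162, 7, 200, 253, 74, 75, 218, 145, 239, 108, 158, 187, 104, 245, 96>>

# The generator stops matching at byte 162 (0xA2), which is not valid UTF-8
# at that position, so everything after the first byte is silently dropped.
lossy = for <<c::utf8 <- raw>>, into: "", do: <<c::utf8>>
lost_bytes = byte_size(raw) - byte_size(lossy)

# Assumption for illustration: hex-encode the binary instead. The result is
# plain ASCII, so any JSON encoder can handle it, and it round-trips exactly.
hex = Base.encode16(raw, case: :lower)
{:ok, decoded} = Base.decode16(hex, case: :lower)

IO.inspect(lost_bytes, label: "bytes lost by the utf8 comprehension")
IO.inspect(decoded == raw, label: "hex round-trips")
```

For a log message the round trip rarely matters, but the comparison makes it clear that the UTF-8 comprehension is not a safe general-purpose conversion.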

It seems like it'd be easy enough for us to detect and encode binaries to strings prior to the Jason.encode() step, but I'm not familiar with best practices in this realm. I'd be interested in hearing from @bernardd, @alhambra1, and @ViacheslavShevchenko about their thoughts on the best way to proceed here.

Don't overthink it :) The issue is that we're creating the message to be stuffed into JSON using code of the form:

"Data: #{data}"

When what we really want is

"Data: #{inspect data}".

The former jams a binary straight into what was previously a legitimate string (creating a non-string binary which, as you note, Jason/Poison can't encode); the latter stringifies the binary first.

I've pushed a PR to fix this.
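The difference between the two forms can be checked with the standard library alone. This is a minimal sketch using the binary from the report; the variable names are illustrative and not taken from the adapter's source.

```elixir
raw = <<1, 162, 7, 200, 253, 74, 75, 218, 145, 239, 108, 158, 187, 104, 245, 96>>

# Interpolating a binary inserts its bytes unchanged, so the message
# stops being valid UTF-8 and a JSON encoder will reject it.
interpolated = "Data: #{raw}"

# inspect/1 renders a non-printable binary as its <<...>> literal,
# which is ordinary text.
inspected = "Data: #{inspect(raw)}"

IO.inspect(String.valid?(interpolated), label: "interpolated is a valid string")
IO.inspect(String.valid?(inspected), label: "inspected is a valid string")
IO.puts(inspected)
```

Because `inspect/1` always returns a valid string, the resulting message is safe to hand to Jason or Poison regardless of what the field contains.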

Hey. Sorry guys, I didn't see any notifications from GitHub. The "Data: #{inspect data}" solution is great. I just turned off logging to work around this issue, and that works well. Sorry I didn't prepare a PR; I also have a lot of work.

Ok, this patch has been released as version 2.0.2 - thanks all!