Binary string support
Ankk98 opened this issue
Data Insertion
- Why do we have this line (schema_statements.rb:14, the `sub` call in the backtrace below)?
- If the string contains data that is not valid UTF-8, this line raises an error, so we lose the ability to work with binary data. ClickHouse supports binary data, and so do the other parts of the system, such as ActiveRecord and HTTP (POST requests).
ArgumentError: invalid byte sequence in UTF-8
from /home/user/.rvm/gems/ruby-2.7.4/gems/clickhouse-activerecord-0.5.7/lib/active_record/connection_adapters/clickhouse/schema_statements.rb:14:in `sub'
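The failure can be reproduced outside the adapter. The following is a minimal sketch, not the gem's code: the SQL string and the regex are hypothetical stand-ins for whatever the `sub` call in the backtrace operates on. It shows that a UTF-8 tagged string with invalid bytes fails validation, while a binary-tagged copy of the same bytes goes through the same substitution.

```ruby
# Hypothetical SQL carrying raw bytes; "\xFF" can never appear in valid UTF-8.
sql = "INSERT INTO t (payload)  VALUES ('\xFF\xFE')"

# Stand-in pattern (an assumption); the pattern used by the gem may differ.
pattern = /\s+VALUES/

# On a UTF-8 tagged string with invalid bytes, the regex substitution is
# the step that raised ArgumentError in the backtrace above (Ruby 2.7.4).
utf8_outcome =
  begin
    sql.sub(pattern, " VALUES")
  rescue ArgumentError => e
    e.message
  end

# String#b returns a copy tagged ASCII-8BIT (Encoding::BINARY), so every
# byte sequence is valid and the same substitution succeeds.
binary_sql = sql.b
rewritten = binary_sql.sub(pattern, " VALUES")

puts sql.valid_encoding?                    # false
puts binary_sql.valid_encoding?             # true
puts rewritten.encoding == Encoding::BINARY # true
```

The bytes themselves are untouched by `String#b`; only the encoding tag changes, which is why the payload survives the rewrite.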
Data fetching
- We currently use the JSONCompact format to fetch data. This format only supports valid UTF-8, so any byte sequence that is not valid UTF-8 gets replaced with a placeholder character.
- Can we allow usage of JSONEachRow?
- This format does not replace invalid UTF-8 sequences, so data integrity is preserved.
- To do this, we will have to write a function that parses this data and inserts it into ActiveRecord::Result.
- This function can be added in SchemaStatements.
- In Ruby, binary strings are represented as strings with the ASCII-8BIT encoding.
- We can provide support for binary data similar to how MySQL does: we can add an encoding config option.
- If encoding is binary, we can use the ASCII-8BIT encoding; otherwise we can use the standard UTF-8 encoding.
- If you want, I can provide more links regarding these.
- I can also contribute.
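
To make the proposal concrete, here is a rough sketch of the parsing step combined with the encoding option. The helper name `parse_json_each_row` and the `encoding:` keyword are hypothetical and do not exist in the gem. The sample body is plain ASCII; whether `JSON.parse` passes invalid UTF-8 bytes through unchanged depends on the json gem version and is not verified here.

```ruby
require "json"

# Hypothetical helper; neither the name nor the `encoding:` option is part
# of clickhouse-activerecord.
# body:     raw JSONEachRow response, one JSON object per line
# encoding: "binary" tags string values as ASCII-8BIT; anything else keeps
#           the UTF-8 strings produced by JSON.parse
def parse_json_each_row(body, encoding: "utf8")
  records = body.each_line(chomp: true).reject(&:empty?).map { |line| JSON.parse(line) }
  # Column order is taken from the first row; Ruby hashes keep insertion order.
  columns = records.empty? ? [] : records.first.keys
  rows = records.map do |record|
    columns.map do |column|
      value = record[column]
      # String#b returns a copy tagged ASCII-8BIT without touching the bytes.
      encoding == "binary" && value.is_a?(String) ? value.b : value
    end
  end
  [columns, rows]
end

body = %({"id":1,"payload":"abc"}\n{"id":2,"payload":"def"}\n)
columns, rows = parse_json_each_row(body, encoding: "binary")
# In the adapter, these could then be wrapped as
# ActiveRecord::Result.new(columns, rows).
```

Returning plain `columns` and `rows` arrays keeps the sketch independent of ActiveRecord, and they map directly onto the first two arguments of `ActiveRecord::Result.new`.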
Is there any timetable for this problem to be fixed?
It wasn't in the plan. You can open a PR.
Hi @PNixx, I can open the PR. Do you have any recommendations on where and what would probably need to be modified?