y-scope / clp

Compressed Log Processor (CLP) is a free log management tool capable of compressing text logs and searching the compressed logs without decompression.

Home Page: https://yscope.com

clp-s truncates JSON bytes during compression

satya256 opened this issue · comments

Bug

I am trying to compress a JSON input file that contains large JSON records (possibly larger than 1 MB per line), with a total file size of around 100 MB. Compression outputs the following error line:

[error] Truncated JSON (323 bytes) at end of file

CLP version

a7368cf

Environment

Ubuntu 22.04

Reproduction steps

Input a JSON file larger than 30 MB that includes JSON lines larger than 1 MB each.
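A minimal sketch for generating such an input file, in case it helps with reproduction. The function name `write_large_ndjson`, the field names, and the default sizes are assumptions chosen for illustration; they are not taken from the original data set.

```python
import json
import os

def write_large_ndjson(path, num_records=32, payload_bytes=1_100_000):
    """Write num_records JSON lines, each slightly larger than payload_bytes.

    Returns the size of the resulting file in bytes.
    """
    with open(path, "w", encoding="utf-8") as f:
        for i in range(num_records):
            # Hypothetical record layout: one large string field per line.
            record = {"id": i, "message": "x" * payload_bytes}
            f.write(json.dumps(record) + "\n")
    return os.path.getsize(path)
```

With the defaults, `write_large_ndjson("large.ndjson")` writes 32 lines of about 1.1 MB each, roughly 35 MB in total, which matches the size thresholds described above.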

If I change the buffer size from 1 MB to, say, 300 MB, the issue is no longer observed.

https://github.com/y-scope/clp/blob/main/components/core/src/clp_s/JsonFileIterator.hpp#L25
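To illustrate why the buffer size could matter, here is a rough sketch of a newline-delimited reader with a fixed-size buffer. This is not the clp-s `JsonFileIterator` implementation, and the thread does not establish that this is the actual root cause; `iter_records`, `buf_size`, and `grow` are hypothetical names. With `grow=False`, a record that does not fit in the buffer is reported as an error, while `grow=True` keeps accumulating data until the record is complete.

```python
import io

def iter_records(stream, buf_size=1024, grow=True):
    """Yield complete newline-delimited records from a binary stream.

    With grow=False, a record that cannot fit in buf_size bytes raises
    ValueError, mimicking a reader that gives up on oversized records.
    """
    pending = b""
    while True:
        chunk = stream.read(buf_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last newline is complete; the tail is partial.
        *complete, pending = pending.split(b"\n")
        for line in complete:
            if line:
                yield line
        if not grow and len(pending) >= buf_size:
            raise ValueError(f"record exceeds buffer of {buf_size} bytes")
    if pending:
        yield pending  # final record without a trailing newline

# Usage: a record of over 100 bytes read through a 16-byte buffer.
data = b'{"a": 1}\n' + b'{"msg": "' + b"y" * 100 + b'"}\n'
records = list(iter_records(io.BytesIO(data), buf_size=16))
```

In this sketch, raising the buffer size hides the problem only as long as no record exceeds the new limit, whereas letting the pending data grow handles records of any size.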

Hi @satya256 thanks for the report. We're looking into this and putting together some changes to make the JSON parser a bit more robust.

I just have some clarifying questions that should help us narrow down the specific issue you've run into.

  1. Does your JSON log data contain UTF-8 characters?
  2. Is your JSON log data new-line delimited or delimited another way, and do JSON records ever contain a newline in the middle?

Hi @gibber9809. To answer your questions on behalf of @satya256:

  1. Yes, all characters in the JSON records are UTF-8.
  2. Yes, our JSON data is newline-delimited, and records do contain newlines in the middle, but these are escaped.
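For reference, the two answers above can be checked against a file with a small script along these lines. It is only a sketch: `check_ndjson` is a hypothetical helper, and the sample record is made up. It verifies that every physical line decodes as UTF-8 and parses as JSON, and it reports the longest line, which is the figure to compare against the reader's buffer size.

```python
import json

def check_ndjson(lines):
    """Return (record_count, max_line_bytes) for an iterable of byte lines.

    Raises UnicodeDecodeError or json.JSONDecodeError on the first bad line.
    """
    count = 0
    max_len = 0
    for raw in lines:
        line = raw.rstrip(b"\r\n")
        if not line:
            continue  # skip blank lines
        json.loads(line.decode("utf-8"))
        count += 1
        max_len = max(max_len, len(line))
    return count, max_len

# A message containing a newline and non-ASCII text stays on one physical
# line because json.dumps escapes the newline inside the string.
record = json.dumps({"msg": "first line\nsecond line, naïve café"},
                    ensure_ascii=False)
sample = [record.encode("utf-8") + b"\n"]
result = check_ndjson(sample)
```

To run it on a real file, open the file in binary mode and pass the file object directly, for example `check_ndjson(open(path, "rb"))`.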

Hey @bb-rajakarthik and @satya256, we merged #310 which significantly improves error handling and error reporting during compression. The issue you ran into should be fixed, but please let us know if you're still encountering any issues with compression.