halaxa / json-machine

Efficient, easy-to-use, and fast PHP JSON stream parser

Not decoding JSON

JohnnyWalkerDigital opened this issue

I guess this sounds strange, but I need to process JSON files that are gigabytes in size... and I don't want to decode the nodes to arrays or objects. I just want the JSON!

Obviously I could json_encode the array that's produced, but with millions of transactions it's worrying to put them through an unnecessary step where an error could be introduced in the decode/encode process.
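To make the concern concrete, here is a minimal sketch of how a decode/encode round trip can change an item. The sample item is made up for illustration; the behaviour shown is that of PHP's built-in json_decode and json_encode with default flags.

```php
<?php
// Raw item exactly as it might appear in the source file (made-up sample data).
$raw = '{"meta":{},"url":"a/b"}';

// Decode to an associative array, then encode again with default flags.
$roundTripped = json_encode(json_decode($raw, true));

// The empty object came back as an empty array, and the slash is now escaped.
echo $roundTripped, PHP_EOL;          // {"meta":[],"url":"a\/b"}
var_dump($roundTripped === $raw);     // bool(false)
```

Flags such as JSON_UNESCAPED_SLASHES or decoding to objects instead of arrays avoid these two differences, but the output is still a re-serialisation rather than the original bytes.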

(The background is this: I have millions of user transactions to feed into a webhook. The webhook is expecting JSON formatted exactly in the way the nodes in the JSON blob are formatted. I just need to take each node, feed it to the webhook, check the response, and move onto the next one.)

Any options here?

Quick and dirty? Fork it, go to https://github.com/halaxa/json-machine/blob/master/src/Parser.php#L177 and get rid of the json_decode call. You will then get the raw, undecoded JSON value.

On the other hand, if you json_encode it as you suggest, there might be minor overhead, but you don't have to maintain your own fork of JSON Machine. Also bear in mind that the tests will be rendered useless.

Thanks halaxa, that makes sense!

> Also bear in mind that the tests will be rendered useless.

What do you mean by this?

Also, stupid question I'm sure, but why don't you change the behaviour of json_decode on that line, rather than defaulting to arrays? Wouldn't it be simpler to allow the user to pass in a value and you change that line to $value = json_decode($jsonBuffer, $userValue);? Obviously I've not looked at the rest of the code, and I'm sure there's a great reason for it.
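For reference, this is what the second argument of PHP's json_decode changes. The variable name $jsonBuffer is taken from the line quoted above; the sample item is made up.

```php
<?php
$jsonBuffer = '{"id":7,"tags":["a","b"]}'; // sample item, made up for illustration

$asArray  = json_decode($jsonBuffer, true);  // JSON objects become associative arrays
$asObject = json_decode($jsonBuffer, false); // JSON objects become stdClass instances

echo $asArray['id'], PHP_EOL; // 7
echo $asObject->id, PHP_EOL;  // 7
var_dump(is_array($asArray), $asObject instanceof stdClass); // bool(true) bool(true)
```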

> What do you mean by this?

That probably belongs with the previous paragraph. I mean that if you fork it and decide to maintain your own fork, almost all the current tests will break unless you fix them. But that's obvious, of course.

> Also, stupid question I'm sure, but why don't you change the behaviour of json_decode on that line, rather than defaulting to arrays?

Not a stupid question at all. It's been there since the dawn of this library, and any change now needs to be thought through carefully so that it doesn't cause a huge BC break and stays easy for the user to specify. It also relates to what I wrote earlier, that I have a swappable decoder in mind. So should I add another argument to the API (bool $assoc) or make the decoder swappable? How do I implement it with only a minor BC break and a friendly API, or would it be better not to change the current API at all? Those are the things I'm thinking about. It is not unsolvable, I know. Do you have an idea?
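The swappable-decoder option could look roughly like the sketch below. All names here (ItemDecoder, NativeDecoder, RawDecoder, decodeItems) are hypothetical and chosen for illustration only; this is not JSON Machine's actual API, and decodeItems merely stands in for the parser loop.

```php
<?php
// Hypothetical names throughout; this is not JSON Machine's actual API.
interface ItemDecoder
{
    /** @return mixed whatever representation the decoder chooses */
    public function decode(string $jsonBuffer);
}

/** Wraps json_decode; the bool $assoc option lives here instead of in the parser API. */
final class NativeDecoder implements ItemDecoder
{
    private bool $assoc;

    public function __construct(bool $assoc = true)
    {
        $this->assoc = $assoc;
    }

    public function decode(string $jsonBuffer)
    {
        return json_decode($jsonBuffer, $this->assoc);
    }
}

/** Returns the buffered JSON text untouched. */
final class RawDecoder implements ItemDecoder
{
    public function decode(string $jsonBuffer)
    {
        return $jsonBuffer;
    }
}

/** Stand-in for the parser loop: applies the decoder to each buffered item. */
function decodeItems(iterable $jsonBuffers, ?ItemDecoder $decoder = null): Generator
{
    $decoder = $decoder ?? new NativeDecoder(); // default keeps the array behaviour
    foreach ($jsonBuffers as $jsonBuffer) {
        yield $decoder->decode($jsonBuffer);
    }
}

$buffers = ['{"id":1}', '{"id":2}'];
$arrays  = iterator_to_array(decodeItems($buffers));                   // decoded arrays
$raw     = iterator_to_array(decodeItems($buffers, new RawDecoder())); // original strings
```

Because the decoder argument is optional and defaults to array decoding, existing callers would keep working in this sketch, while the bool $assoc choice becomes a constructor option of one decoder rather than a parser argument.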

Ah, yes. I didn't think about BC, or what other future plans you might have. Hmm. I suppose I lean towards breaking BC in favour of a cleaner API. Just my personal taste. I suppose if you make it a major version number then people can easily specify the older API requirement in their composer.json if they need to?

Resolved with custom decoders on master. Can you check it? There is also a PassThruDecoder, which might do what you need. There may be some additional tweaks before version 0.4.0 comes out shortly.

Exactly the feature I need, and the PassThruDecoder seems to work fine.

I consume an API which responds with ~12 MB of JSON data. I deserialize the chunks with Symfony serializer and use yield to process the objects. Memory usage is about 36 MB (instead of 162 MB) and it's even faster (~1:30 min faster).
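A rough sketch of that yield-based pattern, under stated assumptions: the hard-coded array stands in for the raw chunks coming out of the parser, json_decode stands in for the Symfony serializer, and the Transaction class and hydrate function are hypothetical names.

```php
<?php
// Hypothetical value object; a real setup would use whatever class the serializer targets.
final class Transaction
{
    public int $id;

    public function __construct(int $id)
    {
        $this->id = $id;
    }
}

/** Turns raw JSON chunks into objects one at a time, so only one object is built per step. */
function hydrate(iterable $rawItems): Generator
{
    foreach ($rawItems as $rawItem) {
        $data = json_decode($rawItem, true); // stand-in for the serializer call
        yield new Transaction($data['id']);
    }
}

// Stand-in for the raw items a pass-through parser would produce.
$rawItems = ['{"id":1}', '{"id":2}', '{"id":3}'];

$ids = [];
foreach (hydrate($rawItems) as $transaction) {
    $ids[] = $transaction->id; // process each object, then let it go
}
```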

Do you know the release date for 0.4.0 yet? 😇

Resolved in 0.4.0