rurban / Cpanel-JSON-XS

Improved fork of JSON-XS

Home Page: http://search.cpan.org/dist/Cpanel-JSON-XS/

RFE: Don’t upgrade unless needed

FGasper opened this issue

> perl -Mblib -MCpanel::JSON::XS -MDevel::Peek -e'Dump( Cpanel::JSON::XS->new()->allow_nonref()->decode( q<"é"> ) )'
SV = PV(0x7f9e5c00c480) at 0x7f9e5c81d3c8
  REFCNT = 1
  FLAGS = (TEMP,POK,pPOK,UTF8)
  PV = 0x7f9e5bc093f0 "\303\203\302\251"\0 [UTF8 "\x{c3}\x{a9}"]
  CUR = 4
  LEN = 10

The input string is downgraded, so its two characters occupy only 2 bytes in the PV. The output is the same logical Perl string (the characters \x{c3} and \x{a9}), but because it is upgraded (UTF8 flag set) it takes 4 bytes internally.

The upgrade here seems inefficient. If utf8(1) isn’t set, the decoder should avoid the extra work of upgrading the string unless the result actually needs it.

This is nontrivial because of stuff like this (from the test suite):

> perl -MCpanel::JSON::XS -MDevel::Peek -e'Dump( Cpanel::JSON::XS->new()->allow_nonref()->decode( qq<"\\u0012\x{89}\\u0abc"> ) )'

The decoder needs to store \x89 upgraded because of the succeeding \u0abc, but it can’t know that when it first hits \x89.

This kind of thing is rare enough that it might make sense to rig up a “retry” mechanism that upgrades the SV then restarts the parse. Or, it might not.
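
To make the “retry” idea concrete, here is a rough pure-Perl sketch. It is not the XS decoder: `decode_body` is a hypothetical toy that handles only raw characters and \uXXXX escapes in a string body (no quotes, no surrogate pairs, no other escapes), and it imitates the C-level buffer by appending bytes by hand. The first pass writes one byte per character; if it meets a code point above 0xFF, it throws the buffer away and restarts in a second pass that writes UTF-8.

```perl
use strict;
use warnings;

# Hypothetical toy decoder, for illustration only.
sub decode_body {
    my ($in) = @_;
    for my $wide (0, 1) {            # pass 0: downgraded output; pass 1: UTF-8 output
        my $buf     = '';
        my $restart = 0;
        pos($in) = 0;
        while ($in =~ /\G(?:\\u([0-9a-fA-F]{4})|(.))/gcs) {
            my $cp = defined($1) ? hex($1) : ord($2);
            if (!$wide) {
                if ($cp > 0xFF) { $restart = 1; last; }   # cannot stay downgraded
                $buf .= chr($cp);                          # one byte per character
            }
            else {
                my $c = chr($cp);
                utf8::encode($c);                          # character -> UTF-8 bytes
                $buf .= $c;
            }
        }
        next if $restart;                  # discard $buf and run the wide pass
        utf8::decode($buf) if $wide;       # mark the UTF-8 bytes as characters
        return $buf;
    }
    return;
}

my $out = decode_body("\\u0012\x{89}\\u0abc");
printf "%d chars, upgraded: %s\n", length($out), (utf8::is_utf8($out) ? "yes" : "no");
# prints "3 chars, upgraded: yes"
```

Input with no code point above 0xFF never leaves the first pass and comes back downgraded; only the rare mixed case pays for a second parse.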

I’ll leave this open in case others want to comment.

I think this decision is made by perl5 itself, which treats all chars 128-255 in this manner.

@rurban:

> perl -MDevel::Peek -e'Dump "\x80"'
SV = PV(0x7fe11e00c680) at 0x7fe11e028c60
  REFCNT = 1
  FLAGS = (POK,IsCOW,READONLY,PROTECT,pPOK)
  PV = 0x7fe11dc0e9d0 "\200"\0
  CUR = 1
  LEN = 10
  COW_REFCNT = 0

Chars 128-255 can be stored either upgraded or downgraded.
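
A quick way to see this with core Perl only (a small sketch; `bytes::length` is used here just to peek at the internal storage size, not as something to rely on in real code):

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length() without enabling byte semantics

my $down = "\x{c3}\x{a9}";   # two characters, stored downgraded
my $up   = $down;
utf8::upgrade($up);          # same two characters, stored as UTF-8 internally

printf "equal: %s\n", ($down eq $up ? "yes" : "no");
printf "chars: %d vs %d\n", length($down), length($up);
printf "bytes: %d vs %d\n", bytes::length($down), bytes::length($up);
printf "UTF8 flag: %d vs %d\n",
    (utf8::is_utf8($down) ? 1 : 0), (utf8::is_utf8($up) ? 1 : 0);
# prints:
# equal: yes
# chars: 2 vs 2
# bytes: 2 vs 4
# UTF8 flag: 0 vs 1
```

The two scalars compare equal as Perl strings; only the internal representation differs, which is why leaving the decoded string downgraded would be safe whenever every character fits in 0-255.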

This will actually be more feasible if #190 is addressed. Once that happens, if utf8() is off then it’ll always be safe to leave the input PV downgraded.