lrhn / json-tool

Dart Package with experimental JSON tools.

improve parsing of unknown source values

ekuleshov opened this issue · comments

Currently the jsontool package supports two models of iterating through the source values:

  • expectX - expectNull, expectString, expectNum, etc - assumes that the current source value is of a specific type/kind (null, string, num, etc) and then either consumes it or throws an error
  • tryX - tryNull, tryString, tryNum, etc - checks that the current source value is of a specific type/kind and then consumes it, otherwise returns null and doesn't consume anything.

Unfortunately, when JSON data is not strongly typed, or even when it just has a lot of nullable string values, you have to wrap each value retrieval with additional checks. E.g. to read nullable string values you have to do tryNull() first and then tryString(). That adds significant overhead.
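
For illustration, the two-step pattern described above might look like the following minimal sketch. It assumes the jsontool reader API as used elsewhere in this thread (JsonReader.fromString, expectObject, nextKey, tryNull, tryString); the helper name readNullableString is hypothetical.

```dart
import 'package:jsontool/jsontool.dart';

/// Hypothetical helper: reads a value that is either a JSON string or null.
///
/// Returns null both for a JSON `null` (which is consumed) and for any
/// other non-string value (which is left unconsumed by tryString).
String? readNullableString(JsonReader reader) {
  // First probe: consume `null` if that is the next value.
  if (reader.tryNull()) return null;
  // Second probe: consume the value if it is a string.
  return reader.tryString();
}

void main() {
  // The sample input only contains strings and nulls.
  var reader = JsonReader.fromString('{"name": "json-tool", "license": null}');
  reader.expectObject();
  var fields = <String, String?>{};
  for (var key = reader.nextKey(); key != null; key = reader.nextKey()) {
    fields[key] = readNullableString(reader);
  }
  print(fields);
}
```

Every nullable field pays for two probes of the same source position, which is the overhead this issue is about.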

Some other JSON processing libraries (e.g. Java's popular Jackson) provide an API to get a value in a desired data type regardless of the JSON source value. So, it would be great if jsontool could provide a similar API. For example:

  • String? getString() - returns a null or a String value - even if json source has null or a non-quoted value, such as boolean, int or a double
  • num? getNum() - returns a null or a num value if json source value is an int, double or a string that can be parsed to an int or a double - num.tryParse()
  • int? getInt() - similar for int - int.tryParse()
  • double? getDouble() - similar for double - double.tryParse()
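
As a rough illustration of the conversion rules listed above, here is a minimal sketch that applies them to already-decoded values instead of a reader. The names coerceToString, coerceToNum and coerceToInt are hypothetical and not part of jsontool, and the treatment of lists, maps and non-integer numbers is an assumption made for this sketch.

```dart
/// Hypothetical model of the proposed getString() on a decoded value.
String? coerceToString(Object? value) => switch (value) {
      null => null,
      String s => s,
      num n => n.toString(),
      bool b => b.toString(),
      _ => null, // assumption: lists and maps are not converted
    };

/// Hypothetical model of the proposed getNum().
num? coerceToNum(Object? value) => switch (value) {
      num n => n,
      String s => num.tryParse(s),
      _ => null,
    };

/// Hypothetical model of the proposed getInt().
int? coerceToInt(Object? value) => switch (value) {
      int i => i,
      String s => int.tryParse(s),
      _ => null, // assumption: doubles such as 1.5 are not truncated
    };

void main() {
  print(coerceToString(true)); // true
  print(coerceToNum('12.5')); // 12.5
  print(coerceToInt('42')); // 42
  print(coerceToInt('4x')); // null
  print(coerceToNum('0x10')); // 16
}
```

Note that num.tryParse and int.tryParse accept some text that JSON numbers cannot express, such as the hexadecimal "0x10", so a real API would have to decide whether that is desired.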

Then json readers could directly return values without a need for additional checks. For example, the getString() in JsonByteReader could do something like this:

  String? getString([List<String>? candidates]) {
    var next = _nextNonWhitespaceChar();
    if (next == $quot) {
      if (candidates != null) {
        return _tryCandidateString(candidates);
      }
      return _scanString();
    } else if (next == $n) {
      assert(_startsWith("ull", _source, _index + 1));
      _index += 4;
      return null; // `null` was consumed, so there is no string to scan
    }
    return _scanString();
  }

This sounds very use-case specific.
Parsing strings in some cases, but not in others, suggests that there is some underlying domain logic or framework logic.

It sounds like a job for extension methods that are specially tailored to the problem they are used for.
I'll leave that to someone who knows what they want.
It should be easy to implement on top of the existing API:

extension on JsonReader {
  int? asIntOrNull() {
    var number = tryNum();
    if (number != null) return number.toInt();
    var str = tryString();
    if (str != null) {
      return num.tryParse(str)?.toInt();
    }
    skipAnyValue();
    return null;
  }
}

The null checks are not domain specific, but quite common.
I already implemented such extensions, but the overhead of multiple API calls adds up, and internally each call repeats similar checks. So, there is a significant performance cost.

I hope that having the suggested high-level API would reduce that overhead.

The null check makes sense. Give me String or give me null, but don't accept anything else.
So something like:

String? expectStringOrNull();

That's doable, but also easy as an extension:

extension ExpectOrNull<T> on JsonReader<T> {
  String? expectStringOrNull() => tryString() ?? (expectNull() as Null);
}

(Which suggests that expectNull should return Null instead of void.)

The more complicated conversions would be something like:

extension TryAs<T> on JsonReader<T> {
  /// Convert any non-`null`, non-list and non-map value into its string representation.
  String? tryAsString() => (tryString() ?? tryNum() ?? tryBool())?.toString();
  /// The next value as an `int`, if possible.
  ///
  /// If the next value is a number or a string which can be parsed as a number,
  /// it's converted to an `int`. Otherwise no value is consumed.
  int? tryAsInt() => tryAsNum()?.toInt();
  /// The next value as a `double`, if possible.
  ///
  /// If the next value is a number or a string which can be parsed as a number,
  /// it's converted to a `double`. Otherwise no value is consumed.
  double? tryAsDouble() => tryAsNum()?.toDouble();
  /// The next value as a `num`, if possible.
  ///
  /// If the next value is a number or a string which can be parsed as a number,
  /// that is the value. Otherwise no value is consumed.
  num? tryAsNum() => tryNum() ?? _tryStringConvert(num.tryParse);

  /// The next value as a `bool`, if possible.
  ///
  /// If the next value is a `bool` that value is used.
  /// If the next value is a number, it's converted to a `bool` by making zero values
  /// and NaN false, and any other value true.
  /// If the next value is a string which can be parsed to a `bool` using [bool.tryParse],
  /// then that is the value.
  /// Otherwise no value is consumed.
  bool? tryAsBool() => tryBool() ?? _tryStringConvert(bool.tryParse) ?? 
      (switch (tryNum()) {null => null, == 0 || double(isNaN: true) => false, _ => true});

  R? _tryStringConvert<R extends Object>(R? Function(String) convert) {
    var s = copy().tryString();
    if (s == null) return null; // not a string, nothing to convert
    var result = convert(s);
    if (result == null) return null;
    skipAnyValue();
    return result;
  }
}

(Not tested. Only tricky part is checking a string, without consuming it if it doesn't parse.)
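
To make the tricky part concrete, here is a minimal standalone sketch of the peek-then-commit idea: read the string from a copy of the reader, and only advance the real reader once the conversion succeeds. It assumes the copy(), tryString(), skipAnyValue(), expectArray() and hasNext() methods of jsontool; the helper name tryIntFromString is hypothetical.

```dart
import 'package:jsontool/jsontool.dart';

/// Hypothetical helper: if the next value is a string holding an integer,
/// consumes it and returns the integer. Otherwise consumes nothing.
int? tryIntFromString(JsonReader reader) {
  // Read from a copy so the original reader does not advance yet.
  var text = reader.copy().tryString();
  if (text == null) return null; // not a string
  var parsed = int.tryParse(text);
  if (parsed == null) return null; // a string, but not an integer
  reader.skipAnyValue(); // commit: now consume the string for real
  return parsed;
}

void main() {
  var reader = JsonReader.fromString('["42", "n/a"]');
  reader.expectArray();
  while (reader.hasNext()) {
    // Fall back to the raw string when the text is not an integer.
    Object value = tryIntFromString(reader) ?? reader.expectString();
    print(value);
  }
}
```

The cost of this approach is that a successful conversion scans the string twice, once on the copy and once to skip it.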

For every one of those, except tryAsString, I had doubts about which values to allow and which to reject.
That's why I'm saying that this is a domain-specific feature: the precise behavior depends on what the current project wants and how it treats its JSON.

Adding a specific method to this library is likely to help some people, but not be exactly what other people need. That's why it should be added on the side.

So, OrNull is possible, but so simple to write (as tryString() ?? expectNull(), at least if I make expectNull() have type Null), that I don't think a built-in is worth it.

I did some anecdotal testing with the suggested expectStringOrNull() extension, replacing all tryString() calls with it (the model allows nulls in most places), and it looks like it slows down parsing by about 7 to 10%.
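
A comparison of this kind could be set up roughly as in the sketch below. This is not the test that was actually run: the input, the iteration counts and the names stringOrNull and timeRead are all made up for illustration, and the extension is a cast-free variant of expectStringOrNull(). It assumes the jsontool reader methods used elsewhere in this thread.

```dart
import 'package:jsontool/jsontool.dart';

extension NullableStrings on JsonReader {
  /// Hypothetical variant of expectStringOrNull, written without the cast.
  String? stringOrNull() {
    var s = tryString();
    if (s == null) expectNull(); // throws unless the value is `null`
    return s;
  }
}

/// Reads every element of a JSON array with [read] and returns the
/// elapsed time in microseconds.
int timeRead(String source, String? Function(JsonReader) read) {
  var watch = Stopwatch()..start();
  var reader = JsonReader.fromString(source);
  reader.expectArray();
  while (reader.hasNext()) {
    read(reader);
  }
  return watch.elapsedMicroseconds;
}

void main() {
  // Only strings, so that plain tryString() consumes every element too.
  var source = '[${List.filled(200000, '"value"').join(',')}]';
  for (var round = 0; round < 5; round++) {
    var plain = timeRead(source, (r) => r.tryString());
    var nullable = timeRead(source, (r) => r.stringOrNull());
    print('tryString: $plain us, stringOrNull: $nullable us');
  }
}
```

Timings from a loop like this are noisy, so several rounds and a warm-up are needed before reading anything into the difference.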

My biggest struggle is that tryX methods don't consume any values, so for nullable string and numeric fields you have to do this extra work, such as adding your own extension methods, which don't work at the same level as the tryX or expectX methods.