Note: I was curious how I would answer this question: "Please articulate how you would write a simple JSON parser in Java that reads a JSON string and transforms it into collections of Java objects."
There are two stages in parsing JSON:
- lexical analysis (breaking down the input string into tokens)
  - recognize structural characters such as `{`, `}`, `[`, `]`, `:` and `,`
  - look for JSON strings (surrounded by `"`)
  - for example:
    - input string: `{"hello": ["world"]}`
    - consists of the following tokens: `{`, `hello`, `:`, `[`, `world`, `]`, `}`
  - note: tokens must be non-recursive (flat); nesting is handled in the next stage
- syntactic analysis
  - match groups of tokens according to the language grammar (see: http://json.org/)
  - to parse JSON (see: parse)
    - check the first token
    - if the first token is `{`, parse an object
    - if the first token is `[`, parse an array
    - otherwise, return the first token and the remaining tokens
  - to parse an array (see: parseArray)
    - initialize the resulting array
    - call parse on each element
    - add the element to the resulting array
    - look for a comma
    - repeat for the remaining elements until it sees `]`
  - to parse an object (see: parseObject)
    - initialize the resulting map
    - look for the first token (the key of the pair)
    - look for `:`
    - parse the value and set it on the resulting map
    - look for `,`
    - repeat for the remaining key-value pairs until it sees `}`
- note: many implementations combine both stages into a single pass over the input
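The two stages above can be sketched in a single Java class. This is a minimal, hypothetical example, not the com.dvliman.sjson code: the names MiniJson, Token, parseValue and peekIs are illustrative, an index cursor (pos) stands in for "return the first token and the remaining tokens", and numbers are read as Double. As with this library's TODO items, it does not handle escape characters, unicode escapes or number precision.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not the com.dvliman.sjson API); escapes/unicode/number precision not handled.
final class MiniJson {
    // kind: a structural char '{' '}' '[' ']' ':' ',', 's' for strings, 'v' for other scalars.
    static final class Token {
        final char kind;
        final Object value;
        Token(char kind, Object value) { this.kind = kind; this.value = value; }
    }

    // Stage 1: lexical analysis into a flat (non-recursive) list of tokens.
    static List<Token> tokens(String s) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if ("{}[]:,".indexOf(c) >= 0) {
                out.add(new Token(c, null));
                i++;
            } else if (c == '"') {
                int end = s.indexOf('"', i + 1); // assumes no escaped quotes inside
                if (end < 0) throw new IllegalArgumentException("unterminated string");
                out.add(new Token('s', s.substring(i + 1, end)));
                i = end + 1;
            } else {
                int start = i; // bare word: read until whitespace, quote or structural char
                while (i < s.length() && !Character.isWhitespace(s.charAt(i))
                        && "{}[]:,\"".indexOf(s.charAt(i)) < 0) i++;
                out.add(new Token('v', literal(s.substring(start, i))));
            }
        }
        return out;
    }

    static Object literal(String word) {
        switch (word) {
            case "true": return Boolean.TRUE;
            case "false": return Boolean.FALSE;
            case "null": return null;
            default: return Double.parseDouble(word); // lenient: also accepts e.g. "NaN"
        }
    }

    // Stage 2: syntactic analysis; pos marks the start of "the remaining tokens".
    private final List<Token> toks;
    private int pos = 0;
    private MiniJson(List<Token> toks) { this.toks = toks; }
    static Object parse(String json) {
        MiniJson p = new MiniJson(tokens(json));
        Object result = p.parseValue();
        if (p.pos != p.toks.size()) throw new IllegalArgumentException("trailing tokens");
        return result;
    }
    private Token next() {
        if (pos >= toks.size()) throw new IllegalArgumentException("unexpected end of input");
        return toks.get(pos++);
    }
    private boolean peekIs(char kind) { return pos < toks.size() && toks.get(pos).kind == kind; }

    // Dispatch on the first token: '{' -> object, '[' -> array, otherwise a scalar.
    private Object parseValue() {
        Token t = next();
        if (t.kind == '{') return parseObject();
        if (t.kind == '[') return parseArray();
        if (t.kind == 's' || t.kind == 'v') return t.value;
        throw new IllegalArgumentException("unexpected token " + t.kind);
    }

    private List<Object> parseArray() {
        List<Object> result = new ArrayList<>();
        if (peekIs(']')) { pos++; return result; } // empty array
        while (true) {
            result.add(parseValue());
            Token t = next();
            if (t.kind == ']') return result;
            if (t.kind != ',') throw new IllegalArgumentException("expected ',' or ']'");
        }
    }

    private Map<String, Object> parseObject() {
        Map<String, Object> result = new HashMap<>();
        if (peekIs('}')) { pos++; return result; } // empty object
        while (true) {
            Token key = next();
            if (key.kind != 's') throw new IllegalArgumentException("object key must be a string");
            if (next().kind != ':') throw new IllegalArgumentException("expected ':'");
            result.put((String) key.value, parseValue());
            Token t = next();
            if (t.kind == '}') return result;
            if (t.kind != ',') throw new IllegalArgumentException("expected ',' or '}'");
        }
    }
}
```

For example, `MiniJson.parse("{\"hello\": [\"world\"]}")` returns a HashMap mapping "hello" to a list containing "world". Input with a missing colon or a non-string key throws IllegalArgumentException, similar in spirit to testInvalidJson and testInvalidJsonObjectKey.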
How to use this library:
import java.util.HashMap;
import com.dvliman.sjson.JsonLexer;
import com.dvliman.sjson.JsonParser;

String json = "{\"hello\": \"world\"}";
Object result = JsonParser.parseJson(JsonLexer.tokens(json));
@SuppressWarnings("unchecked")
HashMap<String, Object> map = (HashMap<String, Object>) result; // a top-level object parses to a HashMap
System.out.println(map.get("hello")); // => "world"
See more examples in the test cases:
- testLexer: parses tokens from an input string
- testEmptyJson: parses empty JSON
- testJsonArray: parses a top-level JSON array
- testJsonObject: parses a top-level JSON object
- testJsonObjectArray: parses JSON array values
- testNestedJson: parses nested JSON objects/arrays
- testInvalidJson: expects a colon in each JSON pair
- testInvalidJsonObjectKey: expects JSON object keys to be strings
note: run `mvn test` to run all the tests
TODO:
- handle unicode characters
- handle escape characters
- parse numbers with precision
- use `Reader` or `PushbackReader` for streaming characters
- define type containers for `JsonArray`, `JsonObject`, `JsonValue`, primitives and so on
- handle edge cases (see: Parsing JSON is a minefield)
- handle top-level scalars (RFC 7158)
links:
- stleary/JSON-java: reference implementation in java
- FasterXML/jackson: robust, supports many features
- ralfstx/minimal-json: a very simple implementation built on a buffer and a reader; as fast as Jackson with a much smaller feature set
- clojure/data.json: Clojure implementation using `PushbackReader`, which allows you to 'un-see' characters in the stream
- fangyidong/json-simple: its lexer (Yylex) is generated with a lexer generator (JFlex)
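To make the 'un-see' idea concrete in Java: the JDK class java.io.PushbackReader lets a lexer read one character too far and push it back with `unread()`. The sketch below is hypothetical (PushbackDemo and readNumber are illustrative names) and only collects characters that may appear in a JSON number, without validating number syntax.

```java
import java.io.IOException;
import java.io.PushbackReader;

// Hypothetical sketch: scan a number from a character stream, then "un-see"
// the character that ended it so the caller's next read() returns it again.
final class PushbackDemo {
    static String readNumber(PushbackReader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        // Loose character set: digits, sign, decimal point, exponent marker.
        while ((c = in.read()) != -1 && "+-0123456789.eE".indexOf(c) >= 0) {
            sb.append((char) c);
        }
        if (c != -1) in.unread(c); // push the delimiter back (1-char buffer by default)
        return sb.toString();
    }
}
```

For example, on a `PushbackReader` wrapping `new StringReader("42,7")`, `readNumber` returns "42" and the following `read()` returns the pushed-back `','`. The `PushbackReader(Reader)` constructor provides a one-character pushback buffer, which is enough for this single-character lookahead.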