loadJsonStrict goes into an (apparently) infinite loop at 100% CPU trying to parse a 28MB JSON file
jimwhitson opened this issue · comments
I'm calling the library like this:
λ> import HaskellWorks.Data.Json.LoadCursor
λ> import HaskellWorks.Data.Json.Value
λ> x = loadJsonStrict "AllSets.json"
λ> fmap length <$> x
Read file
Created cursor
After which it starts using 100% CPU and doesn't appear to terminate. The JSON in question is:
It does the same thing with other large JSON files, but this is the only one I have a convenient link on the web for.
It appears to work correctly for a trivially small JSON file. I haven't tested progressively larger ones yet, as I only have 20MB+ ones on hand.
I'm on version 0.6.0.0 installed via stack, using ghc 8.2.2.
Thanks,
Jim
Hi Jim,
Be careful when you use the repl. If you run the repl inside the hw-json project, then the code will not be optimised. In particular, none of the functions will be inlined, which kills performance.
Instead, create a new project and reference hw-json as a library.
Alternatively, write your code in Main.hs of the hw-json library and run stack build.
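As a rough illustration of why this matters, here is a minimal sketch of the kind of small, hot helper that benefits from inlining. `popCountAll` is a hypothetical function made up for this example, not part of hw-json, and the assumption that the library's index code leans on helpers of this shape is mine. The `INLINE` pragma mainly pays off in an optimised build (for example `ghc -O2 Main.hs`); GHCi's interpreter does not apply those optimisations.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

import Data.Bits (popCount)
import Data.List (foldl')
import Data.Word (Word64)

-- Hypothetical helper (not from hw-json): total number of set bits
-- across a list of 64-bit words, using a strict left fold.
popCountAll :: [Word64] -> Int
popCountAll = foldl' step 0
  where
    -- The bang keeps the accumulator evaluated at every step.
    step !acc w = acc + popCount w
{-# INLINE popCountAll #-}

main :: IO ()
main = print (popCountAll [1 .. 1000000])
```

Timing the compiled binary against the same expression evaluated in the repl should give a feel for the gap, though the exact ratio will depend on the machine and GHC version.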
You should be able to drop the following code into Main.hs for an example of how to query the document:
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE ScopedTypeVariables #-}
module Main where
import Data.Function
import HaskellWorks.Data.Json.LightJson
import HaskellWorks.Data.Json.Load (indexJson)
import HaskellWorks.Data.Json.LoadCursor
import HaskellWorks.Data.Micro
import HaskellWorks.Data.MQuery
import qualified Data.DList as DL
main :: IO ()
main = do
  -- You only have to do this once. This creates index files in the same directory
  indexJson "data/allsets.json"
  -- Load the JSON file with index for speed
  !cursor <- loadJsonWithCsPoppyIndex "data/allsets.json"
  -- Special type that is similar to JSON Value, but does not force load everything
  -- into memory
  let !json = lightJsonAt cursor
  -- Wrap in a query DSL
  let q = MQuery (DL.singleton json)
  putStrLn "How many top-level JSON values"
  putPretty $ q & count
  putStrLn "First ten fields"
  putPretty $ q >>= entry & limit 10
  putStrLn "Accessing field named 'UGL'"
  putPretty $ q >>= entry >>= named "UGL" & limit 10
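To make the shape of those queries easier to follow, here is a toy model of the DSL built on plain lists. It is a sketch under stated assumptions, not hw-json's implementation: the `Json` and `Query` types are hypothetical stand-ins for `LightJson` and `MQuery`, the sample document is made up, and `entry`, `named`, `limit` and `count` are re-implemented here only to mirror how they are used above.

```haskell
module Main where

import Data.Function ((&))

-- Toy JSON value (assumption: stands in for LightJson, which is lazier).
data Json
  = JObject [(String, Json)]
  | JArray [Json]
  | JString String
  deriving (Eq, Show)

-- Toy query: just a list of results (MQuery wraps a DList instead).
newtype Query a = Query { runQuery :: [a] }

instance Functor Query where
  fmap f (Query xs) = Query (map f xs)

instance Applicative Query where
  pure x = Query [x]
  Query fs <*> Query xs = Query (fs <*> xs)

instance Monad Query where
  -- Run the next step on every result and concatenate the outputs.
  Query xs >>= f = Query (concatMap (runQuery . f) xs)

-- Expand an object into its (key, value) pairs; other values yield nothing.
entry :: Json -> Query (String, Json)
entry (JObject kvs) = Query kvs
entry _             = Query []

-- Keep the value of a pair only when its key matches.
named :: String -> (String, Json) -> Query Json
named key (k, v)
  | k == key  = Query [v]
  | otherwise = Query []

-- Truncate the result set to at most n items.
limit :: Int -> Query a -> Query a
limit n (Query xs) = Query (take n xs)

-- Replace the result set with its size.
count :: Query a -> Query Int
count (Query xs) = Query [length xs]

-- Made-up sample document with two top-level fields.
sample :: Json
sample = JObject
  [ ("UGL", JObject [("code", JString "UGL")])
  , ("LEA", JObject [("code", JString "LEA")])
  ]

main :: IO ()
main = do
  let q = Query [sample]
  print (runQuery (q & count))
  print (map fst (runQuery (q >>= entry & limit 10)))
  print (runQuery (q >>= entry >>= named "UGL" & limit 10))
```

Because `>>=` and `&` are both left-associative at the same precedence, `q >>= entry >>= named "UGL" & limit 10` reads left to right: expand the fields, keep the one called "UGL", then cap the results at ten.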
If there are no further issues, I will be closing this issue.