haskell-works / hw-json

loadJsonStrict goes into an (apparently) infinite loop at 100% CPU trying to parse a 28MB JSON file

jimwhitson opened this issue

I'm calling the library like this:

λ> import HaskellWorks.Data.Json.LoadCursor
λ> import HaskellWorks.Data.Json.Value
λ> x = loadJsonStrict "AllSets.json"
λ> fmap length <$> x
Read file
Created cursor

After that it starts using 100% CPU and doesn't appear to terminate. The JSON in question is:


It does the same thing with other large JSON files, but this is the only one I have a convenient link on the web for.

It appears to work correctly for a trivially small JSON file. I haven't tested progressively larger ones yet, as I only have 20MB+ files on hand.

I'm on version installed via stack, using ghc 8.2.2.


Hi Jim,

Be careful when you use the REPL. If you run the REPL inside the hw-json project, the code will not be optimised. In particular, none of the functions will be inlined, which kills performance.

Instead, create a new project and reference hw-json as a library.

Alternatively, write your code in Main.hs of the hw-json library and run stack build.
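A rough way to see the cost of running unoptimised code is to time the same strict loop once in the REPL and once in a binary built with optimisation. The sketch below uses only base. `sumSquares` and the loop size are hypothetical stand-ins and have nothing to do with hw-json's internals, and the size of the gap will vary with machine and GHC version.

```haskell
module Main where

import Control.Exception (evaluate)
import Data.List (foldl')
import System.CPUTime (getCPUTime)

-- Hypothetical workload: a strict left fold over a range of Ints.
-- Compiled with optimisation, GHC can usually turn this into a tight loop;
-- interpreted in the REPL, it runs as unoptimised bytecode.
sumSquares :: Int -> Int
sumSquares n = foldl' (\acc x -> acc + x * x) 0 [1 .. n]

main :: IO ()
main = do
  start <- getCPUTime                      -- CPU time in picoseconds
  total <- evaluate (sumSquares 5000000)   -- force the result before stopping the clock
  end   <- getCPUTime
  let millis = fromIntegral (end - start) / 1.0e9 :: Double
  putStrLn ("sum = " ++ show total ++ ", cpu time = " ++ show millis ++ " ms")
```

Loading this file in the REPL and calling `main`, then building it with optimisation and running the executable, should give two timings to compare.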

You should be able to drop the following code into Main.hs for an example of how to query the document:

{-# LANGUAGE BangPatterns        #-}
{-# LANGUAGE OverloadedStrings   #-}
{-# LANGUAGE ScopedTypeVariables #-}

module Main where

import Data.Function
import HaskellWorks.Data.Json.LightJson
import HaskellWorks.Data.Json.Load       (indexJson)
import HaskellWorks.Data.Json.LoadCursor
import HaskellWorks.Data.Micro
import HaskellWorks.Data.MQuery

import qualified Data.DList as DL

main :: IO ()
main = do
  -- You only have to do this once.  This creates index files in the same directory
  indexJson "data/allsets.json"

  -- Load the JSON file with index for speed
  !cursor <- loadJsonWithCsPoppyIndex "data/allsets.json"

  -- Special type that is similar to JSON Value, but does not force load everything
  -- into memory
  let !json = lightJsonAt cursor

  -- Wrap in a query DSL
  let q = MQuery (DL.singleton json)

  putStrLn "How many top-level JSON values"
  putPretty $ q & count

  putStrLn "First ten fields"
  putPretty $ q >>= entry & limit 10

  putStrLn "Accessing field named 'UGL'"
  putPretty $ q >>= entry >>= named "UGL" & limit 10

If there are no further issues, I will be closing this issue.