haskell-works / hw-json

loadJsonStrict goes into an (apparently) infinite loop at 100% CPU trying to parse a 28MB JSON file

jimwhitson opened this issue

I'm calling the library like this:

λ> import HaskellWorks.Data.Json.LoadCursor
λ> import HaskellWorks.Data.Json.Value
λ> x = loadJsonStrict "AllSets.json"
λ> fmap length <$> x
Read file
Created cursor

After that it starts using 100% CPU and doesn't appear to terminate. The JSON file in question is:

https://mtgjson.com/json/AllSets.json.zip

It does the same thing with other large JSON files, but this is the only one I have a convenient link on the web for.

It appears to work correctly for a trivially small JSON file. I haven't tested progressively larger ones yet, because I only have 20MB+ ones on hand.
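Synthetic files of growing size are one way to bracket where loading stops finishing without needing more real data. The sketch below is a hypothetical helper that uses only `base`. The `syntheticJson` function and the `synthetic-N.json` file names are illustrative and are not part of hw-json. The largest file has a million small objects, which puts it in the same tens-of-megabytes range as AllSets.json.

```haskell
import Data.List (intercalate)

-- Hypothetical helper: a JSON array of n small objects, e.g.
-- syntheticJson 1 == "[{\"id\":0,\"name\":\"card0\"}]"
syntheticJson :: Int -> String
syntheticJson n = "[" ++ intercalate "," (map obj [0 .. n - 1]) ++ "]"
  where
    obj i = "{\"id\":" ++ show i ++ ",\"name\":\"card" ++ show i ++ "\"}"

-- Write files whose element counts grow by a factor of 10 (10 .. 1,000,000),
-- so the size at which loading stops finishing can be bracketed.
main :: IO ()
main = mapM_ writeSized [10 ^ k | k <- [1 .. 6 :: Int]]
  where
    writeSized n = do
      let path = "synthetic-" ++ show n ++ ".json"
      writeFile path (syntheticJson n)
      putStrLn ("wrote " ++ path)
```

Load each file in turn from a compiled program, not GHCi. This shows whether run time grows smoothly with size or whether it stops finishing at some threshold.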

I'm on version 0.6.0.0 installed via stack, using ghc 8.2.2.

Thanks,
Jim

Hi Jim,

Be careful when you use the REPL. If you run the REPL inside the hw-json project, the code will not be optimised. In particular, none of the functions will be inlined, which kills performance.

Instead, create a new project and reference hw-json as a library.

Alternatively, write your code in Main.hs of the hw-json library and run stack build.

You should be able to drop the following code into Main.hs for an example of how to query the document:

{-# LANGUAGE BangPatterns        #-}
{-# LANGUAGE OverloadedStrings   #-}
{-# LANGUAGE ScopedTypeVariables #-}

module Main where

import Data.Function
import HaskellWorks.Data.Json.LightJson
import HaskellWorks.Data.Json.Load       (indexJson)
import HaskellWorks.Data.Json.LoadCursor
import HaskellWorks.Data.Micro
import HaskellWorks.Data.MQuery

import qualified Data.DList as DL

main :: IO ()
main = do
  -- You only have to do this once.  This creates index files in the same directory
  indexJson "data/allsets.json"

  -- Load the JSON file with index for speed
  !cursor <- loadJsonWithCsPoppyIndex "data/allsets.json"

  -- Special type that is similar to a JSON Value, but does not force everything
  -- to be loaded into memory
  let !json = lightJsonAt cursor

  -- Wrap in a query DSL
  let q = MQuery (DL.singleton json)

  putStrLn "How many top-level JSON values"
  putPretty $ q & count

  putStrLn "First ten fields"
  putPretty $ q >>= entry & limit 10

  putStrLn "Accessing field named 'UGL'"
  putPretty $ q >>= entry >>= named "UGL" & limit 10
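The following simplified model may make the query pipeline above easier to follow. It is not hw-json's implementation. The `Json` type, the list-backed `MQuery`, and the definitions of `entry`, `named`, `limit`, and `count` are hypothetical stand-ins that only mirror how those names are used above. In this toy, `count` returns a plain `Int`. The real `lightJsonAt` walks the index lazily, whereas this model holds the whole document in memory.

```haskell
import Data.Function ((&))

-- Toy JSON tree (hypothetical; the library's LightJson is cursor-backed).
data Json
  = JNum Int
  | JStr String
  | JArr [Json]
  | JObj [(String, Json)]
  deriving (Eq, Show)

-- A query is a list of results; >>= fans out over every current result.
newtype MQuery a = MQuery [a] deriving (Eq, Show)

results :: MQuery a -> [a]
results (MQuery xs) = xs

instance Functor MQuery where
  fmap f (MQuery xs) = MQuery (map f xs)

instance Applicative MQuery where
  pure x = MQuery [x]
  MQuery fs <*> MQuery xs = MQuery [f x | f <- fs, x <- xs]

instance Monad MQuery where
  MQuery xs >>= k = MQuery (concatMap (results . k) xs)

-- Every key/value pair of an object; no results for other values.
entry :: Json -> MQuery (String, Json)
entry (JObj kvs) = MQuery kvs
entry _          = MQuery []

-- Keep an entry's value only when its key matches.
named :: String -> (String, Json) -> MQuery Json
named k (k', v)
  | k == k'   = MQuery [v]
  | otherwise = MQuery []

-- Keep at most n results.
limit :: Int -> MQuery a -> MQuery a
limit n (MQuery xs) = MQuery (take n xs)

-- Number of results (a plain Int in this toy).
count :: MQuery a -> Int
count (MQuery xs) = length xs

main :: IO ()
main = do
  let doc = JObj [("UGL", JObj [("name", JStr "Unglued")]), ("LEA", JNum 1)]
      q   = MQuery [doc]
  print (q & count)
  print (q >>= entry & limit 10)
  print (q >>= entry >>= named "UGL" & limit 10)
```

Because `>>=` and `&` are both `infixl 1`, `q >>= entry & limit 10` parses as `(q >>= entry) & limit 10`. The limit therefore applies to the fanned-out entries rather than to the single top-level value.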

If there are no further problems, I will close this issue.