google / zoekt

Fast trigram based code search

Occasional Invalid JSON From Ctags

dharesign opened this issue

I'm seeing occasional errors when indexing git repos. I switched on the debug flag in ctags/json.go and saw the following:

2021/04/08 03:18:38 post "{\"command\":\"generate-tags\",\"filename\":\"flang/test/Preprocessing/pp003.F\",\"size\":338}\n"
2021/04/08 03:18:38 * function-like macros
      integer function IFLM(x)
        integer :: x
        IFLM = x
      end function IFLM
      program main
#define IFLM(x) ((x)+111)
      integer :: res
      res = IFLM(666)
      if (res .eq. 777) then
        print *, 'pp003.F pass'
      else
        print *, 'pp003.F FAIL: ', res
      end if
      end

2021/04/08 03:18:38 read "{\"_type\": \"tag\", \"name\": \"main\", \"path\": \"flang/test/Preprocessing/pp003.F\", \"pattern\": \"/^      program main$/\", \"language\": \"Fortran\", \"line\": 6, \"kind\": \"program\", \"roles\": \"def\"}"
2021/04/08 03:18:38 read "{\"_type\": \"tag\", \"name\": \"res\", \"path\": \"flang/test/Preprocessing/pp003.F\", \"pattern\": \"/^      integer :: res$/\", \"language\": \"Fortran\", \"line\": 8, \"kind\": \"variable\", \"roles\": \"def\", \"scope\": \"main\", \"scopeKind\": \"program\"}"
2021/04/08 03:18:38 read "path\": \"flang/test/Preprocessing/pp003.F\", \"pattern\": \"/^      integer :: res$/\", \"language\": \"Fortran\", \"line\": 8, \"kind\": \"variable\", \"roles\": \"def\", \"scope\": \"main\", \"scopeKind\": \"program\"}"

As you can see, the third line read is not valid JSON. In fact it appears to be a partial repeat of the second line.
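
A quick way to catch this kind of corruption mechanically is to try parsing every line read back from Ctags. This is only a minimal sketch, not zoekt's code: `firstInvalidLine` is a hypothetical helper, and the sample lines are shortened versions of the ones in the log above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// firstInvalidLine returns the index of the first line that does not parse
// as a JSON object, or -1 if every line parses. (Hypothetical helper.)
func firstInvalidLine(lines []string) int {
	for i, line := range lines {
		var obj map[string]interface{}
		if err := json.Unmarshal([]byte(line), &obj); err != nil {
			return i
		}
	}
	return -1
}

func main() {
	// Shortened versions of the three lines read in the log above.
	lines := []string{
		`{"_type": "tag", "name": "main", "kind": "program"}`,
		`{"_type": "tag", "name": "res", "kind": "variable"}`,
		`path": "flang/test/Preprocessing/pp003.F", "kind": "variable"}`,
	}
	fmt.Println(firstInvalidLine(lines)) // prints 2
}
```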

If I run ctags manually, I don't see the same issue:

$ /usr/local/bin/universal-ctags --_interactive=sandbox --fields=*
{"_type": "program", "name": "Universal Ctags", "version": "5.9.0"}
{"command":"generate-tags","filename":"flang/test/Preprocessing/pp003.F","size":338}
* function-like macros
      integer function IFLM(x)
        integer :: x
        IFLM = x
      end function IFLM
      program main
#define IFLM(x) ((x)+111)
      integer :: res
      res = IFLM(666)
      if (res .eq. 777) then
        print *, 'pp003.F pass'
      else
        print *, 'pp003.F FAIL: ', res
      end if
      end
{"_type": "tag", "name": "IFLM", "path": "flang/test/Preprocessing/pp003.F", "pattern": "/^      integer function IFLM(/", "language": "Fortran", "line": 2, "kind": "function", "roles": "def"}
{"_type": "tag", "name": "main", "path": "flang/test/Preprocessing/pp003.F", "pattern": "/^      program main$/", "language": "Fortran", "line": 6, "kind": "program", "roles": "def"}
{"_type": "tag", "name": "res", "path": "flang/test/Preprocessing/pp003.F", "pattern": "/^      integer :: res$/", "language": "Fortran", "line": 8, "kind": "variable", "roles": "def", "scope": "main", "scopeKind": "program"}
{"_type": "completed", "command": "generate-tags"}

Note that the output from zoekt is also missing the first tag. Any idea what the issue is?

Can you confirm that you see a test failure with https://gerrit-review.googlesource.com/c/zoekt/+/302663 ?

Download with

git fetch https://gerrit.googlesource.com/zoekt refs/changes/63/302663/1 && git checkout FETCH_HEAD

It works for me.

I can try. How do I run tests? I'm not familiar with Go.

One thing to note, though, is that I expect this simple test will pass for me as well. The repositories which fail are all on the large side: 4196 files (128M); 92K files (660M); 531K files (4042M).

Maybe have your test pipe the same file over and over until it gets to 100MB of source (or more) piped into Ctags? Interestingly it seemed to be consistently failing at the same place.
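
Something along these lines is what I have in mind. It is a rough sketch under assumptions: `writeRepeated` and `byteCounter` are made-up names, the request header copies the format from the debug log above, and the sink just counts bytes instead of talking to a real Ctags process.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
)

// request mirrors the JSON header seen in the debug log.
type request struct {
	Command  string `json:"command"`
	Filename string `json:"filename"`
	Size     int    `json:"size"`
}

// byteCounter is a stand-in sink that only counts what it receives.
// To stress Ctags itself, pass the stdin pipe of the child process instead.
type byteCounter struct{ n int64 }

func (b *byteCounter) Write(p []byte) (int, error) {
	b.n += int64(len(p))
	return len(p), nil
}

// writeRepeated sends the same file over and over until at least target
// bytes of file content have been written. It returns the request count.
func writeRepeated(w io.Writer, filename string, content []byte, target int64) (int, error) {
	if len(content) == 0 {
		return 0, fmt.Errorf("empty content for %s", filename)
	}
	hdr, err := json.Marshal(request{Command: "generate-tags", Filename: filename, Size: len(content)})
	if err != nil {
		return 0, err
	}
	hdr = append(hdr, '\n')

	requests := 0
	for sent := int64(0); sent < target; sent += int64(len(content)) {
		if _, err := w.Write(hdr); err != nil {
			return requests, err
		}
		if _, err := w.Write(content); err != nil {
			return requests, err
		}
		requests++
	}
	return requests, nil
}

func main() {
	content := []byte("      program main\n      end\n")
	sink := &byteCounter{}
	n, err := writeRepeated(sink, "flang/test/Preprocessing/pp003.F", content, 100<<20)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d requests, %d bytes written\n", n, sink.n)
}
```

With a real Ctags process on the other end, the replies have to be read as the requests are written (for example in a separate goroutine), otherwise both pipes fill up and the test hangs.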

I added some quick counters:

2021/04/08 16:50:05 Failure after writing 214372473 file bytes
2021/04/08 16:50:05 Failure after writing 217115443 bytes
2021/04/08 16:50:05 Failure after reading 133479912 bytes
2021/04/08 16:50:05 Failure after exchanging 350595355 bytes

The first number is the sum of the req.Sizes, the second is the total amount of data written to the Ctags pipe, the third is the total amount of data read from the Ctags pipe, and the fourth is the sum of the second and third.
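
For reference, counters like these can be collected with thin wrappers around the two pipes. This is a sketch, not the exact patch I used: `pipeCounters`, `countingWriter` and `countingReader` are illustrative names, and the demo uses in-memory stand-ins rather than a Ctags process.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// pipeCounters holds the totals reported in the log lines above.
type pipeCounters struct {
	fileBytes int64 // sum of the request sizes
	written   int64 // everything written to the Ctags pipe (headers + files)
	read      int64 // everything read back from the Ctags pipe
}

// exchanged is the fourth number: bytes written plus bytes read.
func (c *pipeCounters) exchanged() int64 { return c.written + c.read }

// countingWriter adds the bytes accepted by w to *n.
type countingWriter struct {
	w io.Writer
	n *int64
}

func (cw countingWriter) Write(p []byte) (int, error) {
	n, err := cw.w.Write(p)
	*cw.n += int64(n)
	return n, err
}

// countingReader adds the bytes delivered by r to *n.
type countingReader struct {
	r io.Reader
	n *int64
}

func (cr countingReader) Read(p []byte) (int, error) {
	n, err := cr.r.Read(p)
	*cr.n += int64(n)
	return n, err
}

func main() {
	var c pipeCounters
	// Stand-ins for the child's stdin and stdout.
	in := countingWriter{w: io.Discard, n: &c.written}
	out := countingReader{r: strings.NewReader("{\"_type\": \"completed\"}\n"), n: &c.read}

	body := "      program main\n      end\n"
	c.fileBytes += int64(len(body))
	fmt.Fprintf(in, "{\"command\":\"generate-tags\",\"size\":%d}\n", len(body))
	io.WriteString(in, body)
	io.Copy(io.Discard, out)

	fmt.Printf("file=%d written=%d read=%d exchanged=%d\n",
		c.fileBytes, c.written, c.read, c.exchanged())
}
```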

If I run your test, I only see the entries for main and res, not IFLM which I also get when I run Ctags directly. Any idea why?

It seems your test input had a space before the initial *, which is why it did not return the IFLM tag.

OK, so it seems to just be a bug in Ctags. I modified zoekt to create a new parser whenever it encountered an error. After the first failure it continued where it left off and failed again shortly thereafter. This second failure I can reproduce using Ctags directly:

$ universal-ctags --_interactive=sandbox --fields=*
{"_type": "program", "name": "Universal Ctags", "version": "5.9.0"}
{"command":"generate-tags","filename":"flang/test/Preprocessing/pp038.F","size":378}
* FLM call with closing ')' on next line (not a continuation)
      integer function IFLM(x)
        integer :: x
        IFLM = x
      end function IFLM
      program main
#define IFLM(x) ((x)+111)
      integer :: res
      res = IFLM(666
)
      if (res .eq. 777) then
        print *, 'pp038.F pass'
      else
        print *, 'pp038.F FAIL: ', res
      end if
      end
{"_type": "tag", "name": "main", "path": "flang/test/Preprocessing/pp038.F", "pattern": "/^      program main$/", "language": "Fortran", "line": 6, "kind": "program", "roles": "def"}
{"_type": "tag", "name": "res", "path": "flang/test/Preprocessing/pp038.F", "pattern": "/^      integer :: res$/", "language": "Fortran", "line": 8, "kind": "variable", "roles": "def", "scope": "main", "scopeKind": "program"}
path": "flang/test/Preprocessing/pp038.F", "pattern": "/^      integer :: res$/", "language": "Fortran", "line": 8, "kind": "variable", "roles": "def", "scope": "main", "scopeKind": "program"}
{"_type": "completed", "command": "generate-tags"}

Seems to be a bug in Ctags.
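
In case it helps anyone hitting the same thing, the restart-on-error change I described looks roughly like the following. It is a simplified sketch with hypothetical names (`tagParser`, `restartingParser`, `fakeParser`), not zoekt's actual parser interface, and the fake parser stands in for a Ctags process that chokes on one file.

```go
package main

import (
	"errors"
	"fmt"
)

// tagParser is a hypothetical stand-in for a wrapper around one
// long-lived Ctags process.
type tagParser interface {
	Parse(name string, content []byte) ([]string, error)
	Close()
}

// restartingParser throws the underlying parser away after any error,
// so the next file is handled by a fresh process with a clean stream.
type restartingParser struct {
	newParser func() (tagParser, error)
	cur       tagParser
	restarts  int
}

func (r *restartingParser) Parse(name string, content []byte) ([]string, error) {
	if r.cur == nil {
		p, err := r.newParser()
		if err != nil {
			return nil, err
		}
		r.cur = p
	}
	tags, err := r.cur.Parse(name, content)
	if err != nil {
		// The output stream is in an unknown state; do not reuse it.
		r.cur.Close()
		r.cur = nil
		r.restarts++
		return nil, err
	}
	return tags, nil
}

// fakeParser fails on one file name, imitating the corrupted output.
type fakeParser struct{ bad string }

func (f *fakeParser) Parse(name string, content []byte) ([]string, error) {
	if name == f.bad {
		return nil, errors.New("invalid JSON from parser")
	}
	return []string{"main"}, nil
}

func (f *fakeParser) Close() {}

func main() {
	rp := &restartingParser{newParser: func() (tagParser, error) {
		return &fakeParser{bad: "bad.F"}, nil
	}}
	for _, name := range []string{"a.F", "bad.F", "c.F"} {
		tags, err := rp.Parse(name, nil)
		fmt.Println(name, tags, err)
	}
	fmt.Println("restarts:", rp.restarts)
}
```

This only keeps one bad file from poisoning the rest of the run. A file that triggers the Ctags bug on its own will still fail every time.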