google / zoekt

Fast trigram based code search

surface more semantic information

hanwen opened this issue · comments

Support operators such as string:literal, type:Blabla, method:BlaBla, etc.

Universal-ctags extracts some of this information (but not very well, see below), and this will need language-specific mappings to deal with different notions of things (is an interface different from a type?).

{"_type": "tag", "name": "FileCount", "path": "api.go", "pattern": "/^\tFileCount int$/", "language": "Go", "line": 103, "kind": "member", "scope": "Stats", "scopeKind": "struct"}
{"_type": "tag", "name": "FileMatch", "path": "api.go", "pattern": "/^type FileMatch struct {$/", "language": "Go", "line": 27, "kind": "func"} ## ??
{"_type": "tag", "name": "FileMatch", "path": "api.go", "pattern": "/^type FileMatch struct {$/", "language": "Go", "line": 27, "kind": "struct"}
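To make the duplicate `FileMatch` entry easier to see, here is a minimal Go sketch that decodes this kind of JSON-lines output and groups the reported kinds by symbol name. The `Tag` struct, `parseTags`, `kindsByName`, and `sampleTags` are names made up for illustration; they only cover the fields visible in the lines above (with the trailing `## ??` note dropped so the line is valid JSON) and are not part of zoekt or ctags.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// Tag mirrors only the fields that appear in the sample lines above.
// It is an illustrative type, not a zoekt or ctags definition.
type Tag struct {
	Type      string `json:"_type"`
	Name      string `json:"name"`
	Path      string `json:"path"`
	Pattern   string `json:"pattern"`
	Language  string `json:"language"`
	Line      int    `json:"line"`
	Kind      string `json:"kind"`
	Scope     string `json:"scope,omitempty"`
	ScopeKind string `json:"scopeKind,omitempty"`
}

// sampleTags reproduces the three lines quoted in the issue.
const sampleTags = `{"_type": "tag", "name": "FileCount", "path": "api.go", "pattern": "/^\tFileCount int$/", "language": "Go", "line": 103, "kind": "member", "scope": "Stats", "scopeKind": "struct"}
{"_type": "tag", "name": "FileMatch", "path": "api.go", "pattern": "/^type FileMatch struct {$/", "language": "Go", "line": 27, "kind": "func"}
{"_type": "tag", "name": "FileMatch", "path": "api.go", "pattern": "/^type FileMatch struct {$/", "language": "Go", "line": 27, "kind": "struct"}
`

// parseTags decodes one JSON object per line and keeps entries whose
// _type is "tag". Blank lines are skipped.
func parseTags(input string) ([]Tag, error) {
	var tags []Tag
	sc := bufio.NewScanner(strings.NewReader(input))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var t Tag
		if err := json.Unmarshal([]byte(line), &t); err != nil {
			return nil, fmt.Errorf("decode %q: %w", line, err)
		}
		if t.Type == "tag" {
			tags = append(tags, t)
		}
	}
	return tags, sc.Err()
}

// kindsByName collects every kind reported for each symbol name, in input
// order, so conflicting kinds for the same symbol become visible.
func kindsByName(tags []Tag) map[string][]string {
	out := make(map[string][]string)
	for _, t := range tags {
		out[t.Name] = append(out[t.Name], t.Kind)
	}
	return out
}

func main() {
	tags, err := parseTags(sampleTags)
	if err != nil {
		panic(err)
	}
	for name, kinds := range kindsByName(tags) {
		fmt.Println(name, kinds)
	}
}
```

A language-specific mapping layer could sit on top of something like `kindsByName` to decide which of several conflicting kinds to trust.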

Should this use a separate corpus, or be integrated with the normal string search?

I was discussing this with @bzz for us to do this as part of an OSD by integrating bblfsh using the go-client.

Would you be open to a changeset that does this? It adds a dependency of running bblfsh as a container next to zoekt but opens up a lot of additional search types.

I think it should be straightforward to include support. See

Symbols []DocumentSection

you could start by having your binding generate this data. Optionally, you could abstract away the ctags parser here:

https://gerrit.googlesource.com/zoekt/+/de00d84e19b6761693ea2310f695e13c27433d37/build/builder.go#86

so it can invoke babelfish.
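One way to picture that abstraction is a small interface that any symbol source (ctags, bblfsh, or something else) could implement. The sketch below is only an assumption about the shape: `SymbolExtractor`, `regexExtractor`, and `newRegexExtractor` are hypothetical names, and `DocumentSection` is redeclared locally as a plain byte range rather than imported from zoekt. The regex-based implementation is a toy stand-in for a real parser and only recognizes top-level `type` and plain `func` declarations.

```go
package main

import (
	"fmt"
	"regexp"
)

// DocumentSection is a local stand-in for a byte range [Start, End)
// inside a file. It is declared here so the example is self-contained.
type DocumentSection struct {
	Start, End uint32
}

// SymbolExtractor is a hypothetical interface: the builder would call it
// without knowing whether ctags, bblfsh, or another tool is behind it.
type SymbolExtractor interface {
	Symbols(name string, content []byte) ([]DocumentSection, error)
}

// regexExtractor is a toy implementation used only to exercise the
// interface. It does not handle methods with receivers.
type regexExtractor struct {
	re *regexp.Regexp
}

func newRegexExtractor() regexExtractor {
	return regexExtractor{
		re: regexp.MustCompile(`(?m)^(?:type|func)\s+([A-Za-z_]\w*)`),
	}
}

// Symbols returns one section per matched identifier.
func (e regexExtractor) Symbols(_ string, content []byte) ([]DocumentSection, error) {
	var secs []DocumentSection
	for _, m := range e.re.FindAllSubmatchIndex(content, -1) {
		// m[2] and m[3] delimit the first capture group (the identifier).
		secs = append(secs, DocumentSection{Start: uint32(m[2]), End: uint32(m[3])})
	}
	return secs, nil
}

func main() {
	var ex SymbolExtractor = newRegexExtractor()
	content := []byte("package x\n\ntype FileMatch struct{}\n\nfunc Search() {}\n")
	secs, err := ex.Symbols("api.go", content)
	if err != nil {
		panic(err)
	}
	for _, s := range secs {
		fmt.Printf("%d-%d %s\n", s.Start, s.End, content[s.Start:s.End])
	}
}
```

With an interface like this, the heavier setup (for example a bblfsh container) stays behind one implementation, and the indexer depends only on the byte ranges it gets back.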

bblfsh looks intriguing, but it seems like an involved setup (with docker containers and whatnot). Is there a clean API separation to minimize the dependency from zoekt to bblfsh?

This bug here is really about supporting different types of DocumentSections and supporting querying for that, but it should be quite independent from the system that determines where those sections are.

Thank you @hanwen, I see that @mcuadros already answered your question with #37

I'll avoid further discussions on this issue, since I understand it's a bug report.

If I am reading the code properly, it's matching against documentation blocks, and not against the CTags information?

I was expecting a match of type:Method Name:Read or type:Interface Name:Reader but the current PR is type:Reader, right?

In this change series, there is still only one type of block, but you can search for matches inside those blocks quickly. The way universal CTags works is that sym:Read will typically give you both the type and the method.

Another TODO is to change the DocumentSection type to feature a type string or bitmap, and then filter by DocumentSections of a given type. That should be straightforward, but I want to roll this change out first.
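As a rough illustration of that TODO, the sketch below attaches a bitmap kind to each section and filters by a mask. Everything here is hypothetical: `SymbolKind`, its constants, `TypedSection`, and `filterByKind` are invented for the example and do not reflect how zoekt ends up representing this.

```go
package main

import "fmt"

// SymbolKind is a hypothetical bitmap; each kind occupies one bit so a
// query can ask for several kinds at once.
type SymbolKind uint8

const (
	KindType SymbolKind = 1 << iota
	KindMethod
	KindInterface
)

// TypedSection is an illustrative section type: a byte range plus the
// kind of symbol found there.
type TypedSection struct {
	Start, End uint32
	Kind       SymbolKind
}

// filterByKind keeps the sections whose kind overlaps with mask.
func filterByKind(secs []TypedSection, mask SymbolKind) []TypedSection {
	var out []TypedSection
	for _, s := range secs {
		if s.Kind&mask != 0 {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	secs := []TypedSection{
		{Start: 0, End: 6, Kind: KindInterface},
		{Start: 10, End: 14, Kind: KindMethod},
		{Start: 20, End: 29, Kind: KindType},
	}
	// Roughly what a query restricted to methods and interfaces would need.
	for _, s := range filterByKind(secs, KindMethod|KindInterface) {
		fmt.Printf("%d-%d kind=%d\n", s.Start, s.End, s.Kind)
	}
}
```

A bitmap keeps the per-section overhead to a single byte and makes multi-kind queries a single AND; a type string would be easier to extend per language at the cost of more storage.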

FYI, src-d / babelfish seems to be dead now. On the other hand, GitHub's parser https://github.com/github/semantic is open source (and uses Haskell & Bazel, yay \o/). It parses files in isolation using tree-sitter grammars and outputs limited definition and reference information.

Mentioning just in case.