google / codesearch

Fast, indexed regexp search over large file trees

Home Page: http://swtch.com/~rsc/regexp/regexp4.html

Some files are not indexed

nmklong opened this issue

Some files are not being indexed. Is there a way to check or fix this? I tried the same search term with both "csearch" and "git grep": "git grep" finds the files containing the term, but csearch does not. I'm pretty sure the command I'm using to index is correct:

cindex ~/dev-projects/project-temp/

You can run with the -logskip flag to see why the files were skipped.

@junkblocker I ran csearch -logskip "terms" but it's saying it's an incorrect flag

The flag applies at indexing time to cindex and not at search time to csearch.

@junkblocker Tried cindex -logskip ./ but it's still saying incorrect flag

Skipped files are generally ones thought to be binary: those with very long lines or too many invalid UTF-8 characters.

Here is how the file skipping is decided:

// Tuning constants for detecting text files.
// A file is assumed not to be text files (and thus not indexed)
// if it contains an invalid UTF-8 sequences, if it is longer than maxFileLength
// bytes, if it contains a line longer than maxLineLen bytes,
// or if it contains more than maxTextTrigrams distinct trigrams.
const (
	maxFileLen      = 1 << 30
	maxLineLen      = 2000
	maxTextTrigrams = 20000
)
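
To find out which limit a particular file trips, the criteria in that comment can be approximated in a small standalone program. This is only a sketch under stated assumptions: skipReason is a hypothetical helper, not a codesearch function, and it uses utf8.Valid and a plain byte-trigram count rather than cindex's own detection logic, so its verdict may differ from cindex in edge cases.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"unicode/utf8"
)

// Limits copied from the constants quoted in this thread.
const (
	maxFileLen      = 1 << 30
	maxLineLen      = 2000
	maxTextTrigrams = 20000
)

// skipReason is a hypothetical helper (not part of codesearch). It returns a
// non-empty explanation when data trips one of the documented limits, and ""
// when the data looks like indexable text. It only approximates cindex.
func skipReason(data []byte, lineLimit, trigramLimit int) string {
	if len(data) > maxFileLen {
		return "file too long"
	}
	if !utf8.Valid(data) {
		return "invalid UTF-8"
	}
	// Check every line against the line-length limit.
	for _, line := range bytes.Split(data, []byte("\n")) {
		if len(line) > lineLimit {
			return fmt.Sprintf("line too long (%d bytes)", len(line))
		}
	}
	// Count distinct 3-byte sequences, stopping once the limit is exceeded.
	seen := make(map[[3]byte]struct{})
	for i := 0; i+3 <= len(data); i++ {
		seen[[3]byte{data[i], data[i+1], data[i+2]}] = struct{}{}
		if len(seen) > trigramLimit {
			return "too many distinct trigrams"
		}
	}
	return ""
}

func main() {
	// Report a verdict for each file path given on the command line.
	for _, path := range os.Args[1:] {
		data, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		if r := skipReason(data, maxLineLen, maxTextTrigrams); r != "" {
			fmt.Printf("%s: likely skipped: %s\n", path, r)
		} else {
			fmt.Printf("%s: looks indexable\n", path)
		}
	}
}
```

Running it over the files that "git grep" finds but csearch misses should point at the limit involved, most often a long line in minified or generated sources.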

When I faced the same file skipping issue, I just changed maxLineLen to 10000 and recompiled and reindexed and everything worked just fine. Quick and dirty hack as it is.

Still, I can't help feeling that binary file skipping ought to be controlled by a command-line switch, like -maxlinelen defaulting to 2000.
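
A minimal sketch of what such a switch could look like with Go's standard flag package. The -maxlinelen name comes from the suggestion above; parseMaxLineLen and the "cindex-sketch" flag set are hypothetical names for illustration, and nothing here is taken from the cindex source.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseMaxLineLen parses a hypothetical -maxlinelen flag from args and
// returns the chosen limit plus the remaining arguments (paths to index).
func parseMaxLineLen(args []string) (int, []string, error) {
	fs := flag.NewFlagSet("cindex-sketch", flag.ContinueOnError)
	maxLineLen := fs.Int("maxlinelen", 2000,
		"skip files containing a line longer than this many bytes")
	if err := fs.Parse(args); err != nil {
		return 0, nil, err
	}
	if *maxLineLen <= 0 {
		return 0, nil, fmt.Errorf("-maxlinelen must be positive, got %d", *maxLineLen)
	}
	return *maxLineLen, fs.Args(), nil
}

func main() {
	limit, paths, err := parseMaxLineLen(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	// A real indexer would pass limit to its text-detection check here.
	fmt.Printf("would index %v with maxLineLen=%d\n", paths, limit)
}
```

Keeping 2000 as the default means existing invocations would behave as before, while something like -maxlinelen 10000 would reproduce the recompiled behaviour without editing the source.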

You might be interested in using my fork that implements this and some other options https://github.com/junkblocker/codesearch .

@junkblocker: ah, good to know, thanks!

@rns thanks for that, but a noob question: how do I recompile the Go code after modifying it locally?

I tried go get -u ~/localgocode/src/github.com/google/codesearch but it didn't work; ~/localgocode is my GOPATH.

@nmklong Run go build and go install, as described in https://golang.org/cmd/go/#hdr-Compile_packages_and_dependencies, after changing the source file locally.

I’m not quite sure about that, but go get -u might well undo any changes you made locally.

@rns thanks for your great help! got it working now