google / codesearch

Fast, indexed regexp search over large file trees

Home Page:http://swtch.com/~rsc/regexp/regexp4.html

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

index: merge of index fails

frederich opened this issue · comments

I've tried to index the Go sources on Windows. The first run of Cindex is like expected. The second run doesn't remove .csearchindex~ and has created a second file .csearchindex~~. I don't know whether the merge of .csearchindex and .csearchindex~ has been successful.

The expected behaviour is .csearchindex and .csearchindex~ will be merged and .csearchindex~ will removed.

OS: Windows 7
Go: go1.5rc1

First Run

v% cindex go
2015/08/06 09:36:34 index E:\home\visfj\go
2015/08/06 09:36:34 flush index
2015/08/06 09:36:34 merge 0 files + mem
2015/08/06 09:36:35 39563770 data bytes, 9212960 index bytes
2015/08/06 09:36:35 done
v% ls -al
total 9068
drwxr-xr-x 1 visfj Domänen-Benutzer       0 Aug  6 09:36 .
drwxr-xr-x 1 visfj Domänen-Benutzer       0 Jul 30 13:22 ..
-rw-r--r-- 1 visfj Domänen-Benutzer    6213 Aug  6 08:36 .bash_history
-rw-r--r-- 1 visfj Domänen-Benutzer     422 Aug  5 11:29 .bashrc
-rw-r--r-- 1 visfj Domänen-Benutzer 9212960 Aug  6 09:36 .csearchindex
-rw-r--r-- 1 visfj Domänen-Benutzer     105 Aug  5 14:26 .gitconfig
-rw-r--r-- 1 visfj Domänen-Benutzer     114 Jul 31 13:56 .gitcookies

Second Run

v% cindex
2015/08/06 09:36:45 index E:\home\visfj\go
2015/08/06 09:36:46 flush index
2015/08/06 09:36:46 merge 0 files + mem
2015/08/06 09:36:46 39563770 data bytes, 9212960 index bytes
2015/08/06 09:36:46 merge E:\home\visfj\.csearchindex E:\home\visfj\.csearchindex~
2015/08/06 09:36:46 done
v% ls -al
total 27068
drwxr-xr-x 1 visfj Domänen-Benutzer       0 Aug  6 09:36 .
drwxr-xr-x 1 visfj Domänen-Benutzer       0 Jul 30 13:22 ..
-rw-r--r-- 1 visfj Domänen-Benutzer    6213 Aug  6 08:36 .bash_history
-rw-r--r-- 1 visfj Domänen-Benutzer     422 Aug  5 11:29 .bashrc
-rw-r--r-- 1 visfj Domänen-Benutzer 9212960 Aug  6 09:36 .csearchindex
-rw-r--r-- 1 visfj Domänen-Benutzer 9212960 Aug  6 09:36 .csearchindex~
-rw-r--r-- 1 visfj Domänen-Benutzer 9212944 Aug  6 09:36 .csearchindex~~
-rw-r--r-- 1 visfj Domänen-Benutzer     105 Aug  5 14:26 .gitconfig
-rw-r--r-- 1 visfj Domänen-Benutzer     114 Jul 31 13:56 .gitcookies

I'm seeing this precise behavior as well. Windows Server 2012 RS Datacenter. Go version go1.5.2 windows/amd64.

I've looked into this a bit, and the proximal problem seems to be that the os.Rename at the end of cindex.main() (cindex.go:156 on what I've got right now) is failing. If I capture its error and print it, it claims:

2016/05/03 09:30:05 rename error rename C:\Users\mrsdev\.csearchindex~~ C:\Users\mrsdev\.csearchindex: The process cannot access the file because it is being used by another process.

I'm no go expert (or even novice), but I dug into the merge code a bit, and it appears that the index file is being accessed through a mmap (see codesearch/index/read.Open). I guess the mmap is holding on to the file longer than we want, or perhaps there's a missing call to release it.

I'm happy to help sort this out, but I'm a bit out of my depth with go here, and I don't really have a lot of time to devote to this...I would just like codesearch to work! Any ideas?

A bit more digging. The mmap call ultimately resolves to a call to mmapFile which has platform-specific implementations in codesearch/index/mmap_[linux, windows, bsd].go. None of these, as far as I can tell, provides facilities for unmapping the files, though it seems clear that this should be done.

The unmapping support appears to be provided by go in the form of syscall.Munmap. My guess right now is that we should a) expose some unmapping API at the same layer as the codesearch/index/read.mmap and b) call that new API to unmap the files appropriately when merging is complete.

@dgryski You seem to be keeping an eye on this repository. If I put in the time to make a patch for this, what are the chances that it'll be merged in? There doesn't seem to be a lot of life in this project, and I'd rather not spin my wheels for nothing.

I unfortunately don't have a commit bit here, so I can't merge any fixes in :(

However there is at least one other fork with activity that would appreciate the patch: https://github.com/junkblocker/codesearch

And having the patch attached as a PR to this repository means when/if somebody with a commit bit does pay attention, it's easy to merge.

Thanks for the pointer; I've taken the issue up over there. If a patch gets created, I'll see what I can do about getting it over here, too. It seems reasonable that everything on that fork might flow this way.

This issue has actually already been resolved over on junkblocker's fork.

commented

Yes I saw the same issue on Windows 10 and can confirm that junkblocker's fork fixes it.