boyter / searchcode-server

The official home of searchcode-server, where you can run searchcode locally. Note that master is generally unstable in the sense that it is not a release. Check releases for release versions: https://github.com/boyter/searchcode-server/releases

Home Page: https://searchcodeserver.com/

Exclude specific directory pattern from search results

opt9 opened this issue

When I search for a keyword such as "some", a large number of results are returned, and almost all of them are in ".../doc/..." directories.

I've found "Path Filter" in the UI, but it's disabled and I can't use it.

How can I exclude a specific directory pattern from search results?

Currently you cannot. However, it raises an interesting question. Some may want to remove certain portions of the file from the general search logic. Currently the indexing pipeline is hard-coded to add certain things such as the path and the like. Being able to control this would allow you to configure the above.

So this is now sitting in the issue179 branch.

You can now control what makes its way into the "all" database. By default this would be the following:

index_all_fields=content,filename,filenamereverse,path,interesting

In your case, however, you would want to remove path from the above. The path information will then not be added to the general index. You will still be able to click on a path inside the UI to narrow to just that directory, but if, say, you have ./examples/ in your codebase, a search for examples will only match files which actually have the name or text examples inside them.
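As a rough illustration of the idea, the sketch below joins only the configured fields into a single catch-all string. It is not searchcode-server's actual indexing code: the `build_all_field` function and the sample document are hypothetical, and only the field names come from the `index_all_fields` default above.

```python
# Hypothetical sketch only: this is not searchcode-server's indexing code.
# The field names are taken from the index_all_fields default in this thread.
DEFAULT_FIELDS = "content,filename,filenamereverse,path,interesting"


def build_all_field(document, index_all_fields=DEFAULT_FIELDS):
    """Join only the configured fields into the catch-all text that a general search hits."""
    wanted = [name.strip() for name in index_all_fields.split(",") if name.strip()]
    return " ".join(document.get(name, "") for name in wanted)


# Made-up document for illustration.
doc = {
    "content": "package main",
    "filename": "pig.go",
    "filenamereverse": "og.gip",
    "path": "examples/pig.go",
    "interesting": "",
}

with_path = build_all_field(doc)
without_path = build_all_field(doc, "content,filename,filenamereverse,interesting")

print("examples" in with_path)     # True: the directory name is searchable
print("examples" in without_path)  # False: path was left out of the catch-all text
```

With path removed from the list, the directory name no longer contributes to general matches, while the file's own name and content still do.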

Resolved all unit tests. Have verified that this works by setting index_all_fields=filename so that only the filename was indexed. Worked as expected. Going to merge into master.

@opt9 For your case, pull from master, build, set index_all_fields=content,filename,filenamereverse,interesting in your searchcode.properties file, and then start. Path will be removed from the index. A start triggers a full index, so it should take effect once indexing has finished.

Thanks for your efforts. :-)

There seems to be a miscommunication.
I'll clarify my use-case.

I'm using many open source libraries, including AngularJS.
When I search for the keyword "Foo", I get a huge number of results, because I have about 600 repositories.
Almost all of the results are AngularJS documents inside the "my_repo/resources/angularjs/doc/*" directory.

I'm not interested in the AngularJS documents.
I just want the results from my own code inside the "my_repo/src/*" directory.

So I want to exclude the "my_repo/resources/angularjs/doc/*" directory from the results.

For example, if I want to search for "{{{" in our code, I get lots of AngularJS example code from the AngularJS documents, which is not what I want and is annoying.

In other words, I want to narrow the search scope down to my own code, excluding 3rd party libraries.

If you have any questions, please do not hesitate to reply.

Thanks ;-)

If you don’t mind, would you please reopen #179?

I think you can already do that, based on what you have described.

When you do a search, have a look at the results. Each file has a line such as "pig.go in go /doc/codewalk/pig.go | 121 lines | Go", and each of the folders in it (other than the last one) is clickable. Click on one and the results will be filtered down to the directory you want.

For example, http://demo.searchcodeserver.com/?q=copyright&repo=go&fl=doc_codewalk is filtered down to the /doc/codewalk directory inside Go.

In your case you would need to use something like:

~/?q=foo&fl=my_repo_src

Which should produce what you want. Admittedly there could be a better way to allow this filtering to happen in the UI, rather than just allowing you to filter down through the click. Something I will have a think about.

Is your real intention to just limit the search, or to never index those files?
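For anyone building these URLs by hand or in a script, here is a minimal sketch. It assumes that the `fl` value is simply the directory path with slashes replaced by underscores, which is inferred from the single demo URL above rather than from any documented rule; `path_to_fl` and `search_url` are hypothetical helper names.

```python
from urllib.parse import urlencode


def path_to_fl(directory):
    """Turn a directory path into an fl value.

    Assumption: separators become underscores, as in /doc/codewalk -> doc_codewalk.
    """
    return directory.strip("/").replace("/", "_")


def search_url(base, query, directory, repo=None):
    """Build a search URL narrowed to one directory (hypothetical helper)."""
    params = {"q": query}
    if repo is not None:
        params["repo"] = repo
    params["fl"] = path_to_fl(directory)
    return base + "?" + urlencode(params)


print(search_url("http://demo.searchcodeserver.com/", "copyright", "/doc/codewalk", repo="go"))
# http://demo.searchcodeserver.com/?q=copyright&repo=go&fl=doc_codewalk
```

Note that this only narrows a search down to one directory; it does not exclude anything.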

@opt9 Does that work for you? I have a few ideas on how to make this easier to use through the filters, but I'm wondering if it at least unblocks you.

IMO, it’s not related to indexes.

Also, including a specific directory is different from excluding a specific directory pattern.

If I could search with “?q=foo&fl<>/doc/”, that would be perfect.

That's because I want to search all repositories for the “foo” keyword but exclude all document directories.
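Until something like that exists, one possible workaround is to filter the results on the client side after they come back. The sketch below is only an illustration of pattern-based exclusion: the shape of `results` (a list of dicts with a `path` key) and the `exclude_paths` helper are assumptions, not part of searchcode-server.

```python
from fnmatch import fnmatchcase


def exclude_paths(results, patterns):
    """Drop every result whose path matches any of the exclusion globs."""
    return [
        result
        for result in results
        if not any(fnmatchcase(result["path"], pattern) for pattern in patterns)
    ]


# Made-up results; the real response format may differ.
results = [
    {"path": "my_repo/src/app.js"},
    {"path": "my_repo/resources/angularjs/doc/guide.html"},
    {"path": "other_repo/doc/readme.md"},
]

kept = exclude_paths(results, ["*/doc/*"])
print([result["path"] for result in kept])
# ['my_repo/src/app.js']
```

In `fnmatchcase` patterns `*` also matches `/`, so `*/doc/*` catches a doc directory at any depth in any repository.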

Oops, TIL: GitHub Markdown changes a single asterisk to italics and double asterisks to bold.

Hmm, I'll have a think about whether that's easily possible. It should be easy to do through the current search, I think. It just requires some thought.