Depguard hangs with go modules
jirfag opened this issue
Hi! A user of golangci-lint reported the issue: golangci-lint without depguard runs in 17s and it runs 10m with depguard.
I can reproduce it in the golangci-lint repo with GO111MODULE=on and this .golangci.yml config:
linters-settings:
  depguard:
    list-type: blacklist
    packages:
      - github.com/sirupsen/logrus
linters:
  disable-all: true
  enable:
    - depguard
The problem is in function:
func (dg *Depguard) createImportMap(prog *loader.Program) (map[string][]token.Position, error) {
	importMap := make(map[string][]token.Position)
	//For the directly imported packages
	for _, imported := range prog.InitialPackages() {
		//Go through their files
		for _, file := range imported.Files {
			//And populate a map of all direct imports and their positions
			//This will filter out GoRoot depending on the Depguard.IncludeGoRoot
			for _, fileImport := range file.Imports {
				fileImportPath := cleanBasicLitString(fileImport.Path.Value)
				if !dg.IncludeGoRoot {
					pkg, err := dg.buildCtx.Import(fileImportPath, dg.cwd, 0)
					if err != nil {
						return nil, err
					}
					if pkg.Goroot {
						continue
					}
				}
				position := prog.Fset.Position(fileImport.Pos())
				positions, found := importMap[fileImportPath]
				if !found {
					importMap[fileImportPath] = []token.Position{
						position,
					}
					continue
				}
				importMap[fileImportPath] = append(positions, position)
			}
		}
	}
	return importMap, nil
}
It spends most of its time in dg.buildCtx.Import. I guess we need to find a more lightweight way to determine whether a package is in GOROOT.
I agree. I wonder if there is a way to cache all GOROOT packages in memory first and then do map lookups.
Also, depguard was developed against Go 1.10, so maybe there is an improvement in 1.11 that we can use.
I created a PR for this issue: #8. Not sure if this is the way to go (caching repeated calls that check if an import is a root package). In any case, only the implementation of RootChecker.IsRoot should change.
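The caching idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the names rootLookup, cachedRootChecker, and buildLookup are hypothetical, and only the IsRoot method name and the build.Context.Import call come from the thread.

```go
package main

import (
	"fmt"
	"go/build"
	"os"
)

// rootLookup reports whether an import path resolves to a GOROOT package.
// Hypothetical type, introduced only for this sketch.
type rootLookup func(path string) (bool, error)

// cachedRootChecker memoizes lookup results per import path, so the
// expensive lookup runs at most once per distinct path.
// It is not safe for concurrent use.
type cachedRootChecker struct {
	lookup rootLookup
	cache  map[string]bool
}

func newCachedRootChecker(lookup rootLookup) *cachedRootChecker {
	return &cachedRootChecker{lookup: lookup, cache: make(map[string]bool)}
}

// IsRoot returns the cached answer when there is one and otherwise
// delegates to the underlying lookup. Errors are not cached.
func (c *cachedRootChecker) IsRoot(path string) (bool, error) {
	if isRoot, ok := c.cache[path]; ok {
		return isRoot, nil
	}
	isRoot, err := c.lookup(path)
	if err != nil {
		return false, err
	}
	c.cache[path] = isRoot
	return isRoot, nil
}

// buildLookup wraps the slow build.Context.Import call from the issue.
func buildLookup(ctx *build.Context, dir string) rootLookup {
	return func(path string) (bool, error) {
		pkg, err := ctx.Import(path, dir, 0)
		if err != nil {
			return false, err
		}
		return pkg.Goroot, nil
	}
}

func main() {
	cwd, err := os.Getwd()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	checker := newCachedRootChecker(buildLookup(&build.Default, cwd))
	// The second "strings" lookup is answered from the cache.
	for _, p := range []string{"strings", "strings", "io/ioutil"} {
		isRoot, err := checker.IsRoot(p)
		fmt.Println(p, isRoot, err)
	}
}
```

A cache like this only reduces the number of Import calls; the first lookup of each distinct import path still pays the full cost.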
I want to keep this open because the caching is a step in the right direction. But I would like to investigate if I can pull all GOROOT packages for the currently running installation. Not sure if it is more performant to just cache the results or pull everything.
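One possible way to pull every GOROOT package for the running installation is to walk GOROOT/src once and record each directory that directly contains Go source files. The sketch below assumes that build.Default.GOROOT points at an installation with its sources present; collectPackages and gorootPackages are hypothetical names, and this is not something depguard is known to do.

```go
package main

import (
	"fmt"
	"go/build"
	"io/fs"
	"os"
	"path"
	"path/filepath"
	"strings"
)

// collectPackages walks a source tree and returns the set of directories
// (as slash-separated import paths) that directly contain non-test .go
// files. Build constraints are ignored, and for GOROOT/src the result
// also includes cmd/..., vendor/... and internal directories.
func collectPackages(fsys fs.FS) (map[string]struct{}, error) {
	pkgs := make(map[string]struct{})
	err := fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			if d.Name() == "testdata" {
				return fs.SkipDir // testdata never holds importable packages
			}
			return nil
		}
		if strings.HasSuffix(p, ".go") && !strings.HasSuffix(p, "_test.go") {
			if dir := path.Dir(p); dir != "." {
				pkgs[dir] = struct{}{}
			}
		}
		return nil
	})
	return pkgs, err
}

// gorootPackages builds the set for the running installation's GOROOT/src.
func gorootPackages() (map[string]struct{}, error) {
	src := filepath.Join(build.Default.GOROOT, "src")
	return collectPackages(os.DirFS(src))
}

func main() {
	pkgs, err := gorootPackages()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, p := range []string{"database", "database/sql", "github.com/sirupsen/logrus"} {
		_, isRoot := pkgs[p]
		fmt.Printf("%-30s root=%v\n", p, isRoot)
	}
}
```

After the one-time walk, every root check is a plain map lookup. Whether the walk is cheaper overall than caching individual Import results would need to be measured.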
I'll still investigate pulling all packages. But I think the fix from @hbandura will significantly decrease the time. Thanks for the PR BTW.
I have also tagged it so pulling it into Golangci-lint should be as simple as go get github.com/OpenPeeDeeP/depguard@v1.0.0
I didn't find a way to pull all GOROOT packages by using the build.Package type: if I'm not mistaken, this type is not used anywhere except in the build package itself and in a couple of commands (go, compile, doc), but in no other packages.
So if there is another way, it probably doesn't use this API but some other one that I don't know of.
I was considering checking whether there is a Go 1.11 way of doing it, but that would tie depguard to 1.11 and break it for anyone not using 1.11... I'll dig when I have time.
I've just tested golangci-lint with depguard 1.0 locally in the project that sparked my curiosity about depguard's performance: depguard went down from 2 minutes to 23 seconds.
I still think it's too much for what it's doing.
I wasn't able to run standalone depguard in the same way that golangci-lint runs it. Running it in the base folder of my repo would not recursively test all files, while running golangci-lint with depguard would. I'd like to be able to emulate this in order to fully understand where those remaining 23 seconds are coming from.
When running it standalone, I noticed that the conf.Load call would take a lot of time, but this is inside cmd/main.go, and I suspect that golangci-lint is not using that but is probably calling Run directly.
/edit I originally thought it was 35 seconds but that was the total time of golangci-lint, depguard took 23 seconds out of 35 seconds of golangci-lint.
/edit2 I've just found out I can send ./... as an argument to standalone depguard and it does what I needed. I'll try to see where most of the remaining time is.
Running locally some more, it seems that, yes, most (99.99%?) of the time spent in the Run function is inside the build.Context.Import call.
Interestingly, in my local environment it takes less than 1 ms for actual root packages (errors, time, strings, io/ioutil, etc.), but 300 ms for non-root packages (e.g. github.com/golang/protobuf/proto).
I tested that because I had the idea of maybe hardcoding the most common root packages in the RootChecker, but given these findings, it would be useless.
The only way to speed this up is either to avoid calling build.Context.Import on non-root packages entirely (for example, following your suggestion, by having the complete exhaustive list of root packages), or to reduce the number of calls (with the cache we added).
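The per-call measurements above can be reproduced with a small timing harness along these lines. This is a sketch only: result and timeLookup are hypothetical names, and the numbers it prints depend entirely on the local environment and on whether Go modules are enabled.

```go
package main

import (
	"fmt"
	"go/build"
	"os"
	"time"
)

// result records the outcome and wall-clock duration of one lookup.
type result struct {
	path    string
	isRoot  bool
	err     error
	elapsed time.Duration
}

// timeLookup runs lookup once for path and measures how long it takes.
func timeLookup(lookup func(string) (bool, error), path string) result {
	start := time.Now()
	isRoot, err := lookup(path)
	return result{path: path, isRoot: isRoot, err: err, elapsed: time.Since(start)}
}

func main() {
	cwd, err := os.Getwd()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// The same check that createImportMap performs for each import.
	lookup := func(path string) (bool, error) {
		pkg, err := build.Import(path, cwd, 0)
		if err != nil {
			return false, err
		}
		return pkg.Goroot, nil
	}
	paths := []string{"errors", "time", "strings", "io/ioutil", "github.com/golang/protobuf/proto"}
	for _, p := range paths {
		r := timeLookup(lookup, p)
		fmt.Printf("%-40s root=%-5v elapsed=%v err=%v\n", r.path, r.isRoot, r.elapsed, r.err)
	}
}
```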
I wonder if there's any other way of also reducing the number of calls. For example, if we know that github.com/golang/protobuf is a non-root package, then github.com/golang/protobuf/proto should also be a non-root package, right? There are probably many edge cases and weird behaviours with and without go modules here, but probably half of the calls to Import in my project are from project imports, and absolutely all of them start with myrepo/myproject/[package]. I'm thinking out loud, but I wonder if there's any way to avoid all the calls for subpackages.
My previous idea is not a good one: database/sql is a golang package, but database is not. The same happens with some of my local project paths.
I've just made the PR for golangci-lint: golangci/golangci-lint#589
Not sure what their process is or how fast they are to respond.
Going with your train of thought on caching: what if it cached the result for every import it processes, including whether it was flagged or not? Then we could check the cache first, before calling Import. But if it is a new import, we still have to see if it is in the root.
Your database vs database/sql example is interesting, but database isn't importable, so is it a concern?
The flagged or not flagged part is true, but it seems that the effort is not worth it. The time lost in the build.Context.Import function is so big that any optimization on the rest will go unnoticed (at least this happened in the project I'm working on, and it still took more than 20 seconds to complete).
Right, database was never imported, but I tried doing some experiments by checking if, say, the parent path is a root package or not. If it is, then this package is root also, and vice versa (I THINK). But there were a lot more invalid paths (like database) than real parent packages. In the end I couldn't make anything out of it. I tried just using the first part of the path, or the parent, etc.
Given this I'd say that the cache is probably our best solution if we keep the idea of checking via build.Context.Import.
And I'm almost certain that this whole IsRoot check is the only part that needs any performance boost right now, at least with the use cases I checked.
I think the next step should be to go with your suggestion: finding a completely different way to check if a path is a root package, without using build.Context.Import.
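One cheap alternative that avoids Import altogether is a purely syntactic check: import paths fetched from a host start with a domain name (github.com, golang.org, ...), while standard library paths have no dot in their first element. The sketch below uses a hypothetical looksStandard helper; it is a heuristic, not something the thread settled on, and it misclassifies dot-free local paths such as the myrepo/myproject/[package] imports mentioned earlier.

```go
package main

import (
	"fmt"
	"strings"
)

// looksStandard reports whether importPath could belong to the standard
// library, judged only by its first path element: standard library
// paths never contain a dot there. It does not touch the file system,
// so dot-free third-party or local paths are false positives.
func looksStandard(importPath string) bool {
	first := importPath
	if i := strings.IndexByte(importPath, '/'); i >= 0 {
		first = importPath[:i]
	}
	return first != "" && !strings.Contains(first, ".")
}

func main() {
	paths := []string{
		"strings",
		"database/sql",
		"github.com/golang/protobuf/proto",
		"myrepo/myproject/pkg", // hypothetical local path: a false positive
	}
	for _, p := range paths {
		fmt.Printf("%-40s looksStandard=%v\n", p, looksStandard(p))
	}
}
```

Because a false result is reliable, a check like this could serve as a fast pre-filter: paths with a dot in the first element skip the expensive lookup, and only the remaining ones need a real GOROOT check.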
Sorry, I follow now. I was thinking Import was called every time again... It is not 🤦
This has been merged into golangci-lint latest ( golangci/golangci-lint#589 )
Depguard is still slow with go modules. I'm testing it on golangci-lint's own repo: it takes 31s to execute.
Good to note, thanks. I'll dig into it soon. I had an internal tool that broke with go modules and had to switch to calling the go command directly via go list, and go list was slow at figuring out dependencies...
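Since go list comes up here: for the narrower question of which packages live in GOROOT, the go command accepts the reserved pattern std, so a single go list std invocation returns the whole standard library. The sketch below assumes a go binary on PATH; parsePackageList and listStd are hypothetical names, and whether one such call is fast enough under go modules would still need measuring.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// parsePackageList turns newline-separated `go list` output into a set
// of import paths, ignoring blank lines and surrounding whitespace.
func parsePackageList(out string) map[string]struct{} {
	pkgs := make(map[string]struct{})
	for _, line := range strings.Split(out, "\n") {
		if line = strings.TrimSpace(line); line != "" {
			pkgs[line] = struct{}{}
		}
	}
	return pkgs
}

// listStd runs `go list std` once and returns the standard library
// packages of the toolchain found on PATH.
func listStd() (map[string]struct{}, error) {
	out, err := exec.Command("go", "list", "std").Output()
	if err != nil {
		return nil, fmt.Errorf("go list std: %w", err)
	}
	return parsePackageList(string(out)), nil
}

func main() {
	std, err := listStd()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, p := range []string{"database/sql", "github.com/sirupsen/logrus"} {
		_, isRoot := std[p]
		fmt.Printf("%-30s root=%v\n", p, isRoot)
	}
}
```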