OpenPeeDeeP / depguard

Go linter that checks if package imports are in a list of acceptable packages.

Depguard hangs with go modules

jirfag opened this issue

Hi! A user of golangci-lint reported an issue: golangci-lint runs in 17s without depguard and takes 10m with it.

I can reproduce it in the golangci-lint repo with GO111MODULE=on and the following .golangci.yml config:

```yaml
linters-settings:
  depguard:
    list-type: blacklist
    packages:
      - github.com/sirupsen/logrus

linters:
  disable-all: true
  enable:
    - depguard
```

The problem is in this function:

```go
func (dg *Depguard) createImportMap(prog *loader.Program) (map[string][]token.Position, error) {
	importMap := make(map[string][]token.Position)
	//For the directly imported packages
	for _, imported := range prog.InitialPackages() {
		//Go through their files
		for _, file := range imported.Files {
			//And populate a map of all direct imports and their positions
			//This will filter out GoRoot depending on the Depguard.IncludeGoRoot
			for _, fileImport := range file.Imports {
				fileImportPath := cleanBasicLitString(fileImport.Path.Value)
				if !dg.IncludeGoRoot {
					pkg, err := dg.buildCtx.Import(fileImportPath, dg.cwd, 0)
					if err != nil {
						return nil, err
					}
					if pkg.Goroot {
						continue
					}
				}
				position := prog.Fset.Position(fileImport.Pos())
				positions, found := importMap[fileImportPath]
				if !found {
					importMap[fileImportPath] = []token.Position{
						position,
					}
					continue
				}
				importMap[fileImportPath] = append(positions, position)
			}
		}
	}
	return importMap, nil
}
```

It spends most of its time in dg.buildCtx.Import. I guess we need a more lightweight way to determine whether a package is in GOROOT.
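A minimal sketch of one lighter-weight check, assuming Go 1.16+ (for os.ReadDir) and that the GOROOT in use ships the standard library sources: treat an import path as a GOROOT package if $GOROOT/src/<path> exists and holds at least one non-test .go file. The helper name isGoRootPackage is hypothetical, not part of depguard, and it never calls build.Context.Import or the go command.

```go
package main

import (
	"fmt"
	"go/build"
	"os"
	"path/filepath"
	"strings"
)

// isGoRootPackage is a hypothetical helper (not depguard's API). It reports
// whether importPath names a directory under $GOROOT/src that contains at
// least one non-test .go file. Build constraints are ignored, so this is a
// heuristic rather than a full package resolution.
func isGoRootPackage(importPath string) bool {
	goroot := build.Default.GOROOT
	if goroot == "" {
		return false
	}
	dir := filepath.Join(goroot, "src", filepath.FromSlash(importPath))
	entries, err := os.ReadDir(dir)
	if err != nil {
		// No such directory (or unreadable): not a GOROOT package.
		return false
	}
	for _, e := range entries {
		name := e.Name()
		if e.IsDir() || strings.HasPrefix(name, "_") || strings.HasPrefix(name, ".") {
			continue
		}
		if strings.HasSuffix(name, ".go") && !strings.HasSuffix(name, "_test.go") {
			return true
		}
	}
	// Directories such as "database" only hold subpackages, not Go files.
	return false
}

func main() {
	for _, p := range []string{"fmt", "database", "database/sql", "github.com/golang/protobuf/proto"} {
		fmt.Printf("%-35s %v\n", p, isGoRootPackage(p))
	}
}
```

Because build tags are not evaluated, a directory whose only files target another GOOS would still count as GOROOT; for a yes/no filter like IncludeGoRoot that is probably acceptable.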

I agree. I wonder if there is a way to load all GOROOT packages into memory up front and then just do map lookups.

Also, depguard was developed against Go 1.10, so maybe Go 1.11 offers an improvement we can use.

I created a PR for this issue: #8. I'm not sure this is the way to go (caching repeated calls that check whether an import is a root package). In any case, only the implementation of RootChecker.IsRoot should need to change.

I want to keep this open: the caching is a step in the right direction, but I would like to investigate whether I can pull all GOROOT packages for the currently running installation. I'm not sure whether it is more performant to just cache the results or to pull everything.

I'll still investigate pulling all packages, but I think the fix from @hbandura will significantly decrease the time. Thanks for the PR, BTW.

I have also tagged a release, so pulling it into golangci-lint should be as simple as running go get github.com/OpenPeeDeeP/depguard@v1.0.0

I didn't find a way to pull all GOROOT packages using the build.Package type: if I'm not mistaken, this type isn't used anywhere except in the build package itself and in a couple of commands (go, compile, doc), and not in any other packages.

So if there is another way, it probably goes through some other API that I'm not aware of.

I was considering checking whether Go 1.11 offers a way of doing it, but that would tie depguard to 1.11 and break users on older versions... I'll dig into it when I have time.
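A version-independent alternative that avoids build.Package entirely is to ask the go command for the list: go list std prints the import path of every standard library package for the toolchain on PATH. A minimal sketch, assuming that go binary matches the GOROOT being linted; listStdPackages is a hypothetical name. It costs one process start per run rather than one Import call per import.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// listStdPackages (hypothetical helper) runs "go list std" once and returns
// the standard library import paths as a set for O(1) lookups.
func listStdPackages() (map[string]bool, error) {
	out, err := exec.Command("go", "list", "std").Output()
	if err != nil {
		return nil, err
	}
	pkgs := make(map[string]bool)
	// One import path per line.
	for _, p := range strings.Fields(string(out)) {
		pkgs[p] = true
	}
	return pkgs, nil
}

func main() {
	pkgs, err := listStdPackages()
	if err != nil {
		fmt.Println("go list std failed:", err)
		return
	}
	fmt.Println(len(pkgs), "standard library packages")
	fmt.Println("fmt:", pkgs["fmt"], "database:", pkgs["database"])
}
```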

I've just tested golangci-lint with depguard 1.0 locally in the project that sparked my curiosity about depguard's performance: depguard went down from 2 minutes to 23 seconds.

I still think it's too much for what it's doing.

I wasn't able to run standalone depguard the same way golangci-lint runs it. Running it in the base folder of my repo would not recursively check all files, while running golangci-lint with depguard would. I'd like to emulate this to fully understand where those remaining 23 seconds are coming from.

When running it standalone, I noticed that the conf.Load call took a lot of time, but it lives in cmd/main.go, and I suspect golangci-lint doesn't use it and calls Run directly instead.

/edit I originally thought it was 35 seconds, but that was golangci-lint's total time; depguard took 23 of those 35 seconds.

/edit2 I've just found out I can pass ./... as an argument to standalone depguard, and it does what I needed. I'll try to find where most of the remaining time goes.

After running it locally some more, it seems that, yes, most (99.99%?) of the time spent in the Run function is inside the build.Context.Import call.
Interestingly, in my local environment it takes less than 1 ms for actual root packages (errors, time, strings, io/ioutil, etc.), but 300 ms for non-root packages (e.g. github.com/golang/protobuf/proto).
I tested that because I had the idea of hardcoding the most common root packages in the RootChecker, but given these findings it would be useless: the root lookups are already fast.
The only way to speed this up is either to avoid calling build.Context.Import on non-root packages entirely (for example, via your suggestion of having a complete, exhaustive list) or to reduce the number of calls (with the cache we added).

I wonder if there's any other way of reducing the number of calls. For example, if we know that github.com/golang/protobuf is a non-root package, then github.com/golang/protobuf/proto should also be a non-root package, right? There are probably many edge cases and odd behaviours with and without Go modules here, but roughly half of the calls to Import in my project come from project imports, and absolutely all of them start with myrepo/myproject/[package]. I'm thinking out loud, but I wonder if there's any way to avoid all the calls for subpackages.

My previous idea is not a good one:

database/sql is a standard library package, but database is not. The same happens with some of my local project paths.
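A narrower version of the prefix idea does seem to hold in one direction: standard library import paths never have a dot in their first path element, and imports under the main module's own path are not in GOROOT. A minimal sketch of such a pre-filter, assuming Go 1.18+ (for strings.Cut) and that the module path (for example myrepo/myproject, read from go.mod) is obtained elsewhere and does not shadow a standard library path; definitelyNotGoRoot is a hypothetical name. It can only rule GOROOT out; a false result still needs the real check.

```go
package main

import (
	"fmt"
	"strings"
)

// definitelyNotGoRoot (hypothetical helper) returns true only when importPath
// certainly is not a GOROOT package. A false result means "unknown", so the
// caller must still fall back to the slower check.
func definitelyNotGoRoot(importPath, modulePath string) bool {
	// Assumption: the main module's path never shadows a standard library path.
	if modulePath != "" && (importPath == modulePath || strings.HasPrefix(importPath, modulePath+"/")) {
		return true
	}
	// Standard library paths have no dot in their first element;
	// hosted paths such as github.com/... always do.
	first, _, _ := strings.Cut(importPath, "/")
	return strings.Contains(first, ".")
}

func main() {
	for _, p := range []string{"database/sql", "github.com/golang/protobuf/proto", "myrepo/myproject/pkg"} {
		fmt.Printf("%-35s %v\n", p, definitelyNotGoRoot(p, "myrepo/myproject"))
	}
}
```

Matching on modulePath+"/" rather than a bare prefix avoids treating a sibling like myrepo/myprojectx as part of the module.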

I've just made the PR for golangci-lint: golangci/golangci-lint#589
I'm not sure what their process is or how quickly they respond.

Following your train of thought on caching: what if it cached every import it processes along with whether it was flagged? Then we could check the cache first, before calling Import. If it's a new import, though, we still have to check whether it is in GOROOT.

Your database vs. database/sql example is interesting, but database isn't importable, so is it a concern?

Caching the flagged/not-flagged result would work, but it doesn't seem worth the effort. So much time is lost in build.Context.Import that any optimization elsewhere goes unnoticed (at least that's what happened in the project I'm working on, which still took more than 20 seconds to complete).

Right, database was never imported, but I tried some experiments checking whether, say, the parent path is a root package. If it is, then this package is also root, and vice versa (I THINK). But there were far more invalid paths (like database) than real parent packages. I tried using just the first element of the path, the parent, etc., but in the end I couldn't make anything of it.
Given this, I'd say the cache is probably our best solution as long as we stick with checking via build.Context.Import.

And I'm almost certain that this IsRoot check is the only part that needs a performance boost right now, at least in the use cases I checked.

I think the next step should be to go with your suggestion: finding a completely different way to check whether a path is a root package, without using build.Context.Import.

Sorry, I follow now. I thought Import was still being called every time... It is not 🤦‍♂

This has been merged into the latest golangci-lint (golangci/golangci-lint#589).

Depguard is still slow with Go modules. I'm testing it on golangci-lint's own repo: it takes 31s to execute.

Good to know, thanks. I'll dig into it soon. I had an internal tool that broke with Go modules and had to switch to calling the go command directly via go list, and go list was slow at figuring out dependencies...