oss-review-toolkit / ort

A suite of tools to automate software compliance checks.

Home Page: https://oss-review-toolkit.org

Analyzer does not allow multiple independent projects with the same type / name / version

sschuberth opened this issue

See this TODO in the analyzer code:

// TODO: It might be, e.g. in the case of PIP "requirements.txt" projects, that different projects with
// the same ID exist. We need to decide how to handle that case.
val existingProject = projects.find { it.id == projectAnalyzerResult.project.id }

The same occurs when analyzing e.g. https://github.com/aws/glide-for-redis.git, as it contains multiple (independent) Cargo.toml files with the same content, like:

$ head -5 go/Cargo.toml
[package]
name = "glide-rs"
version = "0.1.0"
edition = "2021"
license = "Apache-2.0"
$ head -5 java/Cargo.toml
[package]
name = "glide-rs"
version = "0.1.0"
edition = "2021"
license = "Apache-2.0"

@oss-review-toolkit/core-devs, how about we simply add parent directory names as suffixes to the project name until the name is unique?

A couple of questions:

Do you propose this as a general solution or specific to Cargo?

Why add the directory names as suffixes and not prefixes? That seems unintuitive. I would rather prefix them and always take the full path, as it could otherwise be confusing. So for the example above, use the names java/glide-rs and csharp/lib/glide-rs.

Should this always happen or only if there are conflicting names?
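
The prefix variant with the full path is simple to express. A minimal Kotlin sketch, assuming definition file paths relative to the repository root with `/` as the separator; `prefixedName` is a hypothetical helper, not part of ORT:

```kotlin
// Hypothetical helper for the prefix variant: prefixes a project name with the full
// directory path of its definition file, e.g. "csharp/lib/Cargo.toml" -> "csharp/lib/glide-rs".
// Assumes paths relative to the repository root with '/' as the separator.
fun prefixedName(definitionFilePath: String, name: String): String {
    // Everything before the last '/', or "" for a definition file at the repository root.
    val dir = definitionFilePath.substringBeforeLast('/', missingDelimiterValue = "")
    return if (dir.isEmpty()) name else "$dir/$name"
}
```

Whether to apply it always or only on conflicts stays with the caller, matching the open question above.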

> Do you propose this as a general solution or specific to Cargo?

As a general solution, see also the PIP case mentioned in the quoted TODO.

> Why add the directory names as suffixes and not prefixes?

Because in the Cargo example, I find glide-rs-go / glide-rs-java to read nicer than go-glide-rs / java-glide-rs. (I probably should have said that I envisioned dashes instead of slashes as separators.)

> Should this always happen or only if there are conflicting names?

Probably only if there are conflicting names, as otherwise names could get unnecessarily complicated.
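
Taken together (suffixes, dashes as separators, applied only to conflicting names), the proposal could look roughly like this Kotlin sketch. `disambiguateBySuffix` is a hypothetical name, and two details are assumptions the discussion leaves open: the added directory names are kept in path order, and the sketch fails if directory names cannot help at all, e.g. for several definition files in the same directory.

```kotlin
// Hypothetical sketch of the suffix proposal: for definition files that all yield the
// same project name, append parent directory names (dash-separated) as suffixes,
// adding one more directory level per round, until the names are unique.
// Paths are assumed to be relative to the repository root with '/' as the separator.
// Note: clashes with names of other, unrelated projects are not checked here.
fun disambiguateBySuffix(name: String, definitionFilePaths: List<String>): Map<String, String> {
    // A single definition file conflicts with nothing, so keep the name as-is.
    if (definitionFilePaths.size < 2) return definitionFilePaths.associateWith { name }

    // Directory segments of each definition file, e.g. "csharp/lib/Cargo.toml" -> [csharp, lib].
    val dirs = definitionFilePaths.associateWith { it.split('/').dropLast(1) }
    val maxDepth = dirs.values.maxOf { it.size }

    for (depth in 1..maxDepth) {
        // Use the innermost `depth` directory names, kept in path order.
        val candidates = dirs.mapValues { (_, segments) ->
            (listOf(name) + segments.takeLast(depth)).joinToString("-")
        }

        // Stop at the shallowest depth that makes all names distinct.
        if (candidates.values.toSet().size == candidates.size) return candidates
    }

    // E.g. several definition files in the same directory: directory names cannot help.
    error("Cannot disambiguate '$name' by directory names: $definitionFilePaths")
}
```

For `go/Cargo.toml` and `java/Cargo.toml` this yields `glide-rs-go` and `glide-rs-java`. Two hypothetical files under `a/lib/` and `b/lib/` would need a second round and end up as `glide-rs-a-lib` and `glide-rs-b-lib`.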

I kind of like this approach; however, I'm not sure about the details. For example, this approach could be difficult for package managers that support project dependencies (e.g. Maven), because those references might break if we rename projects.
Could you maybe collect some more examples to show how the naming algorithm would work for repositories that are affected by this issue? That would be good input to further refine the idea.

Some thoughts here: are we aiming for a common global identification?
Or, if this is too much, instead of dashes we could go with something like Gradle's notation:

glide-for-redis.go.glide-rs:0.1.0

Isn't this a little more logical, considering that it tracks the exact folder better?
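
The Gradle-like notation can be sketched similarly. This is a minimal Kotlin sketch, assuming the repository name is known (it is passed in explicitly; where it would come from is left open) and that directory segments are joined with dots; `gradleStyleCoordinate` is a hypothetical helper:

```kotlin
// Hypothetical helper mirroring the suggested notation "glide-for-redis.go.glide-rs:0.1.0":
// repository name, directory segments of the definition file and project name joined
// by dots, followed by ":" and the version. Paths are assumed to use '/' separators.
fun gradleStyleCoordinate(repoName: String, definitionFilePath: String, name: String, version: String): String {
    val dirs = definitionFilePath.split('/').dropLast(1)
    val path = (listOf(repoName) + dirs + name).joinToString(".")
    return "$path:$version"
}
```

Because it always encodes the full directory path, the result does not depend on which other projects happen to collide, at the cost of longer names. Dots inside directory or project names would make it ambiguous, though.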

> For example, this approach could be difficult for package managers that support project dependencies (e.g. Maven), because those references might break if we rename projects.

IIRC, it could be analogous for GoMod.