ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.

Home Page: https://ziglang.org


package manager

andrewrk opened this issue · comments

Latest Proposal


Zig needs to make it so that people can effortlessly and confidently depend on each other's code.

Depends on #89

commented

My thoughts on Package Managers:

  • Packages should be immutable in the package repository (so the NPM problem doesn't arise).

  • Making a new release with a nice changelog should be simple. Maybe integration for reading releases from GitHub or other popular source hosts?

  • Packages should only depend on packages that are in the package repository (no GitHub urls like Go uses).

  • Private instances of the package repository should be supported from the get go (I believe this was a problem for Rust or some other newer language).


Thoughts that might not be great for Zig:

  • Enforcing Semver is an interesting concept, and projects like Elm have done it.

This is a good reference for avoiding the complexity of package managers like Cargo: minimal version selection is a unique approach that avoids lockfiles, and .modverify avoids deps being changed out from under you.

https://research.swtch.com/vgo

The features around verifiable builds and library verification are also really neat, as are staged upgrades of libraries and allowing multiple major versions of the same package to be in a program at once.

Packages should be immutable in the package repository (so the NPM problem doesn't arise).

I assume you mean authors can't unpublish without admin intervention. True immutability conflicts with the hoster's legal responsibilities in most jurisdictions.

minimal version selection

I'd wait a few years to see how that pans out for Go.

Note that by minimal, they mean minimal that the authors said was okay. i.e. the version they actually tested. The author of the root module is always free to increase the minimum. It is just that the minimum isn't some arbitrary thing that changes over time when other people make releases.
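
To make the idea concrete, here is a minimal sketch of minimal version selection in plain Zig (modern syntax, made-up version data, not any proposed package-manager API): each requirement records the lowest version its author actually tested, and resolution picks the maximum of those minimums rather than chasing the newest release.

const std = @import("std");

const Version = struct { major: u32, minor: u32, patch: u32 };

fn atLeast(a: Version, b: Version) bool {
    if (a.major != b.major) return a.major > b.major;
    if (a.minor != b.minor) return a.minor > b.minor;
    return a.patch >= b.patch;
}

pub fn main() void {
    // the root module requires C >= 1.2.0, a dependency requires C >= 1.4.0
    const requirements = [_]Version{
        .{ .major = 1, .minor = 2, .patch = 0 },
        .{ .major = 1, .minor = 4, .patch = 0 },
    };
    // minimal version selection: take the highest of the stated minimums,
    // i.e. a version somebody actually tested, and nothing newer
    var selected = requirements[0];
    for (requirements[1..]) |req| {
        if (atLeast(req, selected)) selected = req;
    }
    std.debug.print("selected C {d}.{d}.{d}\n", .{ selected.major, selected.minor, selected.patch });
}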

My top three things are:

  • No lockfiles
  • KISS: I don't want to fight with the package manager, and it should be fully integrated into build.zig, with no external programs.
  • Allow each dependency to carry a file that lets it download its own dependencies; this file should also maintain a changelog.

A good package manager can make or break a language; it's one of the reasons Go ditched at least one of its official package managers and completely redid it (it may even be two, I haven't kept up to date with that scene).

The first thing I'm going to explore is a decentralized solution. For example, this is what package dependencies might look like:

const Builder = @import("std").build.Builder;
const builtin = @import("builtin");

pub fn build(b: &Builder) void {
    const mode = b.standardReleaseOptions();

    var exe = b.addExecutable("tetris", "src/main.zig");
    exe.setBuildMode(mode);

    exe.addGitPackage("clap", "https://github.com/Hejsil/zig-clap",
        "0.2.0", "76c50794004b5300a620ed71ef58e4444455fd72e7f7e8f70b7d930a040210ff");
    exe.addUrlPackage("png", "http://example.com/zig-png.tar.gz",
        "00e27a29ead4267e3de8111fcaa59b132d0533cdfdbdddf4b0604279acbcf4f4");

    b.default_step.dependOn(&exe.step);
}

Here we provide a mapping from a name to a way for zig to download or otherwise acquire the source files of the package to depend on.

Since the build system is declarative, zig can run it and query the set of build artifacts and their dependencies, and then fetch them in parallel.

Dependencies are even stricter than version locking - they are source-locked. In both examples we provide a SHA-256 hash, so that even a compromised third party provider cannot compromise your build.

When you depend on a package, you trust it. It will run zig build on the dependency to recursively find all of its dependencies, and so on. However, by providing a hash, you trust only the version you intend to; if the author updates the code and you want the updates, you will have to update the hash and potentially the URL.

Running zig build on dependencies is desirable because it provides a package the ability to query the system, depend on installed system libraries, and potentially run the C/C++ compiler. This would allow us to create Zig package wrappers for C projects, such as ffmpeg. You would even potentially use this feature for a purely C project - a build tool that downloads and builds dependencies for you.
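
As a rough illustration of that last point, a wrapper package's own build.zig might look like the sketch below. It reuses the proposed addUrlPackage call from the example above; addStaticLibrary and linkSystemLibrary are existing build-system calls, and the library name, URL, and hash are placeholders.

const Builder = @import("std").build.Builder;

pub fn build(b: &Builder) void {
    const mode = b.standardReleaseOptions();

    // a Zig wrapper around a C library: the Zig root file declares the public API
    const lib = b.addStaticLibrary("miniz", "src/miniz.zig");
    lib.setBuildMode(mode);

    // proposed API: fetch the upstream C sources, pinned by SHA-256
    lib.addUrlPackage("miniz-src", "http://example.com/miniz-2.0.8.tar.gz",
        "<sha-256 of the tarball>");

    // the wrapper is free to query the system and link installed libraries
    lib.linkSystemLibrary("c");

    b.default_step.dependOn(&lib.step);
}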

and potentially run the C/C++ compiler
cmd/go: arbitrary code execution during “go get” #23672

although you might argue

Dependencies are even stricter than version locking - they are source-locked. In both examples we provide a SHA-256 hash, so that even a compromised third party provider cannot compromise your build.

in that case you'd have to check all the deps of all your deps recursively (manually?) on each shape change though to be really sure

in that case you'd have to check all the deps of all your deps recursively (manually?) on each shape change though to be really sure

This is already true about all software dependencies.

I've been considering how one could do this for the past few days; here is what I generally came up with (this is based on @andrewrk's idea). I've kept out hashes to make it easier, and I'm talking more about architecture than implementation details here:

  • No lock files, purely source driven
  • Have a central repository of all downloaded files (e.g. under /usr/local/zig-vendor) with a built-in way to access them, like @vImport("BraedonWooding/ZigJSON"); or have a unique vendor location for each 'zig' build file, or rather each project, in which case we autogenerate a nice index.zig file for you to access like const vendor = @import("vendor/index.zig"); const json = vendor.ZigJSON.
  • Utilise them during building; we can go and download any new dependencies.
  • Automatically download the latest dependency as per the user requirements, which is one of (for structure x.y.z):
    • Keep major version i.e. upgrade y and z but not x
    • Keep minor and major i.e. upgrade all 'z' changes.
    • Keep version explicitly stated
    • Always upgrade to latest

This would also solve the issue of security fixes, as most users would keep the second option, which is intended for small bug fixes that don't introduce anything new; the major version is for breaking changes and the minor is for new changes that are typically non-breaking.

Your build file would have something like this in your 'build' function:

...
builder.addDependency(builder.Dependency.Git, "github.com.au", "BraedonWooding", "ZigJSON", builder.Versions.NonMajor);
// Or maybe
builder.addDependency(builder.Dependency.Git, "github.com.au/BraedonWooding/ZigJSON", builder.Versions.NonMajor);
// Or just
builder.addGitDependency("github.com.au/BraedonWooding/ZigJSON", builder.Versions.NonMajor);
...

Keeping in mind that svn and mercurial (as well as plenty more) are also used quite a bit :). We could either use just a folder naming scheme to detect what we have downloaded, or have a simple file storing information about everything downloaded (note: NOT a lock file, just a file with information on what has been downloaded). We would use tags to determine versions, but could also have a simple central repository of versions linking to locations, like I believe other ecosystems have.

How would you handle multiple definitions of the same function? I find this to be the most difficult part of C/C++ package management. Or does Zig use some sort of package name prefixing?

@isaachier Well, you can't have multiple definitions of a function in Zig; function overloads aren't a thing (intentionally).

You would import a package like;

const Json = @import("JSON/index.zig");

pub fn main() void {
    Json.parse(...);
    // And whatever
}

When you 'include' things in your source Zig file, they exist under a variable, kind of like a namespace (but simpler); this means you should generally never run into multiple definitions :). If you want to 'use' an import, like using in C++, you can do something like use Json;, which lets you use the contents without having to refer to Json. In the above example it would just be parse(...) instead of Json.parse(...). You still can't use private functions, however.

If for some reason you 'use' two 'libraries' that both define the same function, you'll get an error and will most likely have to put one under a namespace/variable; very rarely should you use use :).
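
Here is a small, self-contained sketch of that namespacing (use is the keyword as it existed at the time, later renamed usingnamespace; the Json struct stands in for an imported package namespace):

// stand-in for a package namespace (normally const Json = @import("JSON/index.zig");)
const Json = struct {
    pub fn parse(text: []const u8) usize {
        return text.len;
    }
};

// each import lives under its own identifier, so two packages that both
// define `parse` never collide unless you pull them into scope yourself
fn namespaced() usize {
    return Json.parse("{}");
}

// `use` pulls Json's public declarations into the current scope; a second
// `use` that also exposed `parse` would only error where `parse` is referenced
use Json;

fn flat() usize {
    return parse("{}");
}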

I don't expect a clash in the language necessarily, but in the linker aren't there duplicate definitions for parse if multiple packages define it? Or is it automatically made into Json_parse?

@isaachier If you don't define your functions as export fn a() void, then Zig is allowed to rename the functions to avoid collisions.
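
For example (a minimal sketch, with placeholder bodies):

// a plain `fn` has no guaranteed symbol name; the compiler may rename it,
// so another package defining its own non-export `parse` cannot collide
// with this one at link time
fn parse(bytes: []const u8) usize {
    return bytes.len;
}

// `export` pins the symbol name "parse_v1" in the object file; only exported
// (or extern) functions participate in cross-package symbol collisions
export fn parse_v1(len: usize) usize {
    return len;
}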

OK that makes sense. About package managers, I'm sure I'm dealing with experts here 😄, but wanted to make sure a few points are addressed for completeness.

  • Should multiple versions of the same library be allowed? This can occur when library A relies on libraries B and C. A needs C version 2 and B needs C version 1. How do you handle that scenario? I'm not sure about the symbol exports, but that might be an issue if you intend to link in both versions.
  • Are the packages downloaded independently for each project or cached on the local disk (like maven and Hunter). In the latter case, you have to consider the use of build flags and their effect on the shared build.

These are important questions.

The first question brings up an even more fundamental question which we have to ask ourselves if we go down the decentralized package route: how do you even know that a given package is the same one as another version?

For example, if FancyPantsJson library is mirrored on GitHub and BitBucket, and you have this:

// in main package
exe.addGitPackage("fancypantsjson", "https://github.com/mrfancypants/zig-fancypantsjson",
    "1.0.1", "76c50794004b5300a620ed71ef58e4444455fd72e7f7e8f70b7d930a040210ff");

// in a nested package
exe.addGitPackage("fancypantsjson", "https://bitbucket.org/mirrors-r-us/zig-fancypants.git",
    "1.0.1", "76c50794004b5300a620ed71ef58e4444455fd72e7f7e8f70b7d930a040210ff");

Here, we know that the library is the same because the sha-256 matches, and that means we can use the same code for both dependencies. However, consider if one was on a slightly newer version:

// in main package
exe.addGitPackage("fancypantsjson", "https://github.com/mrfancypants/zig-fancypantsjson",
    "1.0.2", "dea956b9f5f44e38342ee1dff85fb5fc8c7a604a7143521f3130a6337ed90708");

// in a nested package
exe.addGitPackage("fancypantsjson", "https://bitbucket.org/mirrors-r-us/zig-fancypants.git",
    "1.0.1", "76c50794004b5300a620ed71ef58e4444455fd72e7f7e8f70b7d930a040210ff");

Because this is decentralized, the name "fancypantsjson" does not uniquely identify the package. It's just a name mapped to code so that you can do @import("fancypantsjson") inside the package that depends on it.

But we want to know if this situation occurs. Here's my proposal for how this will work:

comptime {
    // these are random bytes to uniquely identify this package
    // developers compute these once when they create a new package and then
    // never change it
    const package_id = "\xfb\xaf\x7f\x45\x86\x08\x10\xec\xdb\x3c\xea\xb4\xb3\x66\xf9\x47";

    const package_info = @declarePackage(package_id, builtin.SemVer {
        .major = 1,
        .minor = 0,
        .revision = 1,
    });

    // these are the other packages that were not even analyzed because they
    // called @declarePackage with an older, but API-compatible version number.
    for (package_info.superseded) |ver| {
        @compileLog("using 1.0.1 instead of", ver.major, ver.minor, ver.revision);
    }

    // these are the other packages that have matching package ids, but
    // will additionally be compiled in because they do not have compatible
    // APIs according to semver
    for (package_info.coexisting) |pkg| {
        @compileLog("in addition to 1.0.1 this version is present",
            pkg.sem_ver.major, pkg.sem_ver.minor, pkg.sem_ver.revision);
    }
}

The prototype of this function would be:

// these structs are declared in @import("builtin");
pub const SemVer = struct {
    major: @typeOf(1),
    minor: @typeOf(1),
    revision: @typeOf(1),
};
const Namespace = @typeOf(this);
pub const Package = struct {
    namespace: Namespace,
    sem_ver: SemVer,
};
pub const PackageUsage = struct {
    /// This is the list of packages that have declared an older,
    /// but API-compatible version number. So zig stopped analyzing
    /// these versions when it hit the @declarePackage.
    superseded: []SemVer,

    /// This is the list of packages that share a package id, but
    /// due to incompatible versions, will coexist with this version.
    coexisting: []Package,
};

@declarePackage(comptime package_id: [16]u8, comptime version: &const SemVer) PackageUsage

Packages would be free to omit a package declaration. In this case, multiple copies of the
package would always coexist, and zig package manager would be providing no more than
automatic downloading of a resource, verification of its checksum, and caching.

Multiple package declarations would be a compile error, as well as @declarePackage somewhere
other than the first Top Level Declaration in a Namespace.

Let us consider for a moment, that one programmer could use someone else's package id, and then
use a minor version greater than the existing one. Via indirect dependency, they could "hijack"
the other package because theirs would supersede it.

At first this may seem like a problem, but consider:

  • Most importantly, when a Zig programmer adds a dependency on a package and verifies the checksum,
    they are trusting that version of that package. So the hijacker has been approved.
  • If the hijacker provides a compatible API, they are intentionally trying to create a drop-in replacement
    of the package, which may be a reasonable thing to do in some cases. An open source maintainer
    can essentially take over maintenance of an abandoned project, without permission, and offer
    an upgrade path to downstream users as simple as changing their dependency URL (and checksum).
  • In practice, if they fail to provide a compatible API, runtime errors and compile time errors
    will point back to the hijacker's code, not the superseded code.

Really, I think this is a benefit of a decentralized approach.

Going back to the API of @declarePackage, here's an example of power this proposal gives you:

const encoding_table = blk: {
    const package_id = "\xfb\xaf\x7f\x45\x86\x08\x10\xec\xdb\x3c\xea\xb4\xb3\x66\xf9\x47";

    const package_info = @declarePackage(package_id, builtin.SemVer {
        .major = 2,
        .minor = 0,
        .revision = 0,
    });

    for (package_info.coexisting) |pkg| {
        if (pkg.sem_ver.major == 1) {
            break :blk pkg.namespace.FLAC_ENCODING_TABLE;
        }
    }

    break :blk @import("flac.zig").ENCODING_TABLE;
};

// ...

pub fn lookup(i: usize) u32 {
    return encoding_table[i];
}

Here, even though we have bumped the major version of this package from 1 to 2, we know that the FLAC ENCODING TABLE is unchanged, and perhaps it is 32 MB of data, so best to not duplicate it unnecessarily. Now even versions 1 and 2 which coexist, at least share this table.

You could also use this to do something such as:

if (package_info.coexisting.len != 0) {
    @compileError("this package does not support coexisting with other versions of itself");
}

And then users would be forced to upgrade some of their dependencies until they could all agree on a compatible version.

However for this particular use case it would be usually recommended to not do this, since there would be a general Zig command line option to make all coexisting libraries a compile error, for those who want a squeaky clean dependency chain. ReleaseSmall would probably turn this flag on by default.


As for your second question,

Are the packages downloaded independently for each project or cached on the local disk (like maven and Hunter). In the latter case, you have to consider the use of build flags and their effect on the shared build.

Package caching will happen like this:

  • Caching of the source download (e.g. this .tar.gz has this sha-256, so we can skip downloading it if we already have it)
  • The same binary caching strategy that zig uses for every file in every project

Caching is an important topic in the near future of zig, but it does not yet exist in any form. Rest assured that we will not get caching wrong. My goal is: 0 bugs filed in the lifetime of zig's existence where the cause was a false positive cache usage.

One more note I want to make:

In the example above I have:

exe.addGitPackage("fancypantsjson", "https://github.com/mrfancypants/zig-fancypantsjson",
    "1.0.2", "dea956b9f5f44e38342ee1dff85fb5fc8c7a604a7143521f3130a6337ed90708");

Note however that the "1.0.2" only tells Zig how to download from a git repository ("download the commit referenced by 1.0.2"). The actual version you are depending on is the one that is set with @declarePackage in the code that matches the SHA-256.

So the package dependency can be satisfied by any semver-compatible version indirectly or directly depended on.

With that in mind, this decentralized strategy with @declarePackage even works if you do any of the following things:

  • use a git submodule for the package
    • you can use addDirPackage("name", "/path/to/dir", "a3951217c609a5a9c5a100e5f3c37a4e8b14796642138ee613db46daca7d43c7").
  • just copy+paste the package into your own source code
    • same thing, use addDirPackage

You can also force your dependency's dependency's dependency (and so on) to upgrade, simply by adding a direct dependency on the same package id with a minor or revision bump.

And to top it off you can purposefully inject code into your dependency's dependency's dependency (and so on), by:

  • forking or otherwise copy pasting the package in question
  • bumping the minor version
  • adding a direct dependency on your fork
  • do your code edits in the fork

This strategy could be used, for example, to add @optimizeFor(.Debug) in some tricky areas you're trying to troubleshoot in a third party library, or perhaps you found a bottleneck in a third party library and you want to add @optimizeFor(.ReleaseFast) to disable safety in the bottleneck. Or maybe you want to apply a patch while you're waiting for upstream to review and accept it, or a patch that will be coming out in the next version but isn't released yet.
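
Concretely, the root build.zig would pin the fork directly; a sketch reusing the proposed addGitPackage call, where the fork URL, version, and hash are placeholders:

// the fork declares the same package_id as upstream but bumps the minor
// version, so it supersedes every indirect use of 1.0.x in the whole tree
exe.addGitPackage("fancypantsjson", "https://github.com/me/zig-fancypantsjson",
    "1.1.0", "<sha-256 of the forked source>");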

Another note: this proposal does not actually depend on the self hosted compiler. There is nothing big blocking us from starting to implement it. It looks like:

  • Implementation of @declarePackage in the c++ compiler. At this point you could test package management with the CLI.
  • Add the API to zig build system, e.g. addDirPackage. At this point we have a working package manager that you could use with git submodule or copy+pasting the package you want to depend on into your codebase. (But read on - this can be further improved)
  • Networking in the standard library. Probably best to use coroutines and async I/O for this. Related: #910. I also need to document coroutines (#367)
  • Add support for .tar.gz, .tar.xz, .zip, etc. to the standard library
  • Now we can implement addUrlPackage
  • Add support for downloading a particular revision via git / svn
  • Now we can implement addGitPackage / addSvnPackage

maybe worth considering p2p distribution and content addressing with ipfs?

see https://github.com/whyrusleeping/gx for example

just a thought

One important thing to note, especially for adoption by larger organization: think about a packaging format and a repo structure that is proxy/caching/mirroring friendly and that also allows an offline mode.

That way the organization can easily centralize their dependencies instead of having everyone going everywhere on the internet (a big no-no for places such as banks).

Play around a bit with Maven and Artifactory/Nexus if you haven't already 😉

The decentralized proposal I made above is especially friendly to p2p distribution, ipfs, offline modes, mirroring, and all that stuff. The sha-256 hash ensures that software is built according to expectations, and the matter of where to fetch the resources can be provided by any number of "plugins" for how to download something:

  • http (don't even need https because of the sha-256)
  • git
  • svn
  • torrent
  • ipfs
  • nfs
  • some custom thing, it doesn't matter, as long as the bytes can be delivered

Looks good but I'd have to try it out in practice before I can say for sure 😄

I'd have one suggestion: for naming purposes, maybe it would be a good idea to also have a "group" or "groupId" concept?

In many situations it's useful to see the umbrella organization from which the dependency comes. Made up Java examples:

  1. group: org.apache, name: httpclient.
  2. group: org.apache, name: regexutils.

Otherwise what happens is that people basically overload the name to include the group, everyone in their own way (apache-httpclient, regexutils-apache). Or they just don't include it and you end up with super generic names (httpclient).

It also prevents or minimizes "name squatting". I.e. the first comers get the best names and then they abandon them...

Structs provide the encapsulation you are looking for, @costincaraivan. They seem to act as namespaces would in C++.

I agree with @costincaraivan. npm has scoped packages for example: https://docs.npmjs.com/getting-started/scoped-packages.

In addition to minimizing name squatting and its practical usefulness (being able to more easily depend on a package if it is coming from an established organization or a well-known developer), honoring the creators of a package besides their creation sounds more respectful in general, and may incentivize people to publish more of their stuff :).

On the other hand, generic package names also come in handy because there is one less thing to remember when installing them.

I didn't want to clutter the issue anymore but just today I bumped into something which is in my opinion relevant for the part I posted about groups (or scoped packages in NPM parlance):

http://bitprophet.org/blog/2012/06/07/on-vendorizing/

Look at their dilemma regarding the options, one of the solutions is forking the library:

Fork and release our own package on PyPI as e.g. fluidity-invoke.

  • This works, but has many of the drawbacks of the vendorizing option and offers few of the benefits.
  • It also confuses things re: project ownership and who should receive/act on bug reports. Users new to the space might focus on your fork instead of upstream, forcing you to either handle their problems, or redirect them.

This would be easily solvable with another bit of metadata, the group. In Java world their issue would be solved by forking the library and then publishing it under the new group. Because of the group it's immediately obvious that the library was forked. Even easier to figure out in a repository browser of sorts since the original version would have presumably many versions while the fork will probably have 1 or 2.

Importers provide the name of the package that they will use to import it. It's ok to have everyone try to name their module httpclient. When you want to import the module, give it whatever identifier you want. There are no name collisions unless you do it to yourself.

Name squatting is not meaningful in a distributed package manager situation. There is no central registry of names. Even in an application, there's no central registry of names. Each package has its own registry of names that it has full control over.

The only collisions possible in this proposal are collisions on the package id, which is a large randomly generated number used to identify if one package dependency is an updated version of another. You can only get collisions on package id if someone deliberately does so.

A package manager cannot be detached from social issues. Yes, technically things would ideally be fully distributed, you would pull files from everywhere. But in real life, let's take the 3 most popular distributed protocols on the net:

  • email

  • Bittorrent

  • Git

All of them have a higher level that effectively centralizes them, or at least makes some nodes in this decentralized system much, much "stronger" than the average node, thereby centralizing the system to a great degree.

Email: Gmail, Microsoft, Yahoo. Probably 80+% of public mail goes through a handful of email hosters.

Bittorrent: torrent trackers, see the outcry when The Pirate Bay went down.

Git: GitHub 😃, GitLab, Bitbucket.

A package name tells me what the thing is. Generally it isn't unique, sometimes it's even non-descriptive (utils...). A hash is very precise, but far from human friendly. Any kind of other metadata I can get from the source is greatly appreciated.

What I'm saying is: make the package collection fully distributed but have provisions in the package format for centralization. It will happen anyway if the language becomes popular (Maven, npm.js, Pypi, etc.).

make the package collection fully distributed but have provisions in the package format for centralization.

That's already in the proposal.

I'll work on some more clear documentation on how packages will work in Zig, because there seems to be a lot of confusion and misunderstanding here.

I think it could be a good thing to also support digital signatures in addition to hashes.
For some software authors, Bob might trust Alice for some reason, but not have time to read every diff of Alice's package, and in this situation Bob may add a requirement that Alice's package must be signed with a key with specific fingerprint.

Hey @andrewrk , I just watched your localhost talk (and backed you, good luck!).
You centered the talk around a notion of making perfect software possible.
I agree with this sentiment.
However, relying on others' work in the way you propose (without additional constraints) leads away from that goal.
It seems you've focused primarily on the "how do we have a decentralized store of packages" part of packages,
and less on "what packages are", and what that means for creating stable software.

The assumption seems to be "semver and a competent maintainer will prevent incompatibilities from arising.".
I am asserting that this is incorrect. Even the most judicious projects break downstream software with upgrades.
These "upgrades" that fail are a source of fear and frustration for programmers and laymen alike.
(See also: the phenomena of single-file, no-dependency C libraries, and other languages' dependency-free libraries and projects.)

When you talk about a package:

exe.addGitPackage("fancypantsjson", "https://github.com/mrfancypants/zig-fancypantsjson", "1.0.1", "76c50794004b5300a620ed71ef58e4444455fd72e7f7e8f70b7d930a040210ff");
You have decided that the identity of a package is:
    ID == (name: str, url: url, version: semver, id: package_id, sha: hash) + other metadata

As the consumer of a package, the identity of the package is relevant only to find the package.
When working with the package, what matters is only the public API it exposes.
For example:

API|1.0.0: {
    const CONST: 1234
    frobnicate: fn(usize) -> !usize  // throws IO
    unused: fn(u8): u8
}

Let's imagine my project only relies on frobnicate and CONST.
It follows that I only care about these two functions.
No other information about the version, url, or name matters in the slightest.
Whether an upgrade is safe can be determined by observing the signatures of the things that I rely on.
(Ideally we'd rely not on the signatures, but on the "the exact same behavior given the same inputs", but solving the halting problem is out of scope.)

Some time later, the author of the package releases:

API|1.1.0: {
    const CONST: 1234
    frobnicate: fn(usize) -> !usize // throws IO // now 2x faster
    unused: fn(u8): u8
}
API|1.2.0: { // oops breaking minor version bump, but nobody used frobnicate.. right?
    const CONST: 1234
    unused: fn(u8): u8
}
API|1.3.0: { // added frobnicate back
    const CONST: 1234
    frobnicate: fn(usize) -> !usize // throws IO + BlackMagicError
    unused: fn(u8): u8
}

I cannot safely use API 1.2.0 or API 1.3.0:
1.2.0 breaks the API contract by omitting frobnicate.
1.3.0 breaks the contract by adding an error that my project (maybe) doesn't know it needs to handle.

Your note here:

// these are the other packages that have matching package ids, but
// will additionally be compiled in because they do not have compatible
// APIs according to semver

implies that I can trust the library author not to make mistakes when evaluating how their library upgrades will proceed in my project.
They cannot know that.
They should not have to know that.
It is the job of the compiler/package manager to understand the relationship between what a package provides and what a project requires.
API 1.2.0 and 1.3.0 might as well be completely alien packages from the perspective of frobnicate, the functions just happen to share a name.
However, if I only relied on CONST, all upgrades would have been safe.

What I am proposing is that package upgrading should be a deterministic process.
I should be able to ask the compiler: "Will this upgrade succeed without modification to my codebase".
I should also be able to ask the compiler: "What was incompatible" to be able to understand the impact of a breaking upgrade before biting the bullet.
The compiler needs to look at more than the pointer (id + metadata),
it must also look at the value of the package as determined by its API.
This is not the check that I want:

Author determined API 1.2.0 supersedes API 1.1.0:
    all OK using API 1.2.0

This is:

{CONST,frobnicate: fn(usize) -> !usize + throws IO} != {CONST,frobnicate: fn(usize) -> !usize + throws IO + BlackMagicError}
    API 1.3.0 returns a new unhandled error type BlackMagicError which results in (trace of problem), do you wish to proceed? (y/N)
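
A rough sketch of the flavor of check being asked for, using existing comptime reflection (@hasDecl is a real builtin; "fancypantsjson" is the hypothetical dependency from earlier in the thread):

const fancy = @import("fancypantsjson");

comptime {
    // fail the build if a declaration this project relies on disappeared,
    // instead of trusting the author's semver judgment
    if (!@hasDecl(fancy, "CONST")) @compileError("upgrade removed CONST");
    if (!@hasDecl(fancy, "frobnicate")) @compileError("upgrade removed frobnicate");
}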

TL;DR:

  • Humans can't be trusted to do semver (I'm not even sure semver should be a thing, it doesn't solve the problem it's supposed to solve)
  • A package's API makes it what it is.
  • Zig users should be able to upgrade without fear.

If you're proposing only checking the subset of compatibility that is knowable at comptime, then that sounds like #404. If you're proposing a general distrust of software changes, you can control all the dependencies that go into your application, and only upgrade packages when you choose.

@thejoshwolfe I'm basically proposing what Andrew mentioned on #404 an hour after your comment here.

for example when deciding to upgrade you could query the system to see what API changes - that you actually depend on - happened between your current version and the latest.

Major version bump enforcement just means that the API maybe broke for somebody. It prevents a certain class of error, but crudely.
What's relevant to the consumer of the library is what changed for them.
And that is inexpressible as a version number, but could be part of a package management system.

Edit: I do mean that subset, and yes I do generally distrust software and people, however well intentioned. If rules are not enforced, they will be broken, and their brokenness will become an unassailable part of the system, barring serious effort. Rust seems to be doing an ok job at undoing mistakes without major breakage, but most other projects and languages don't. See also: the linux kernel's vow to never break userspace, with the attending consequences (positive and negative).
Edit2: weird double paste of Edit1? removed.

I think I've seen this discussion before (just joking, but it's a little bit similar) here :D

Given Zig's goals of allowing programmers to write reliable software, I agree with @419928194516's thoughts... I wrote a little bit about the version problem myself, though my own thoughts were and still are rather unpolished, to be honest... Anyway, it seems a lot of good ideas coming from many different people and communities are converging; especially the idea that a version number is really not a good way to handle software evolution (though it still makes sense from a pure "marketing" perspective). I would +1 a proposal to automatically handle version updates and have the compiler (or a compiler plugin?) check that automatically (like japicmp does for Java APIs). This, together with the hash checks, makes Zig capable of offering something quite unique: perfect software evolution ;)

commented

In case this has not been mentioned yet, I strongly recommend reading this blog series on a better dependency management algorithm for Go.

Related to @binary132's earlier post, one of the Go package manager developers posted on Medium about his advice for implementing a package manager: https://medium.com/@sdboyer/so-you-want-to-write-a-package-manager-4ae9c17d9527. Old article, but still has some interesting insights.

So sdboyer is actually, as far as I followed the discussion, the developer of dep (which is not the official Go package manager), and if you look at some really long thread he disagrees with the now-accepted vgo and minimal version selection from Russ, which is becoming the official Go dependency management in Go 1.11.

Anyway, it's probably worth seeing both sdboyer's and Russ's arguments: https://sdboyer.io/blog/vgo-and-dep/, although I found sdboyer's hard to follow at times.

Is there an idea of how package discovery would work with this decentralized model? One of the benefits of a centralized system is having a single source for searching packages, accessing docs, etc.

Is there an idea of how package discovery would work with this decentralized model? One of the benefits of a centralized system is having a single source for searching packages, accessing docs, etc.

I agree that this is the biggest downside of a decentralized system. It becomes a community effort to solve the problem of package discovery. But maybe it's not so bad. If there becomes a popular package discovery solution made by the community, people will use it, but won't be locked in to it.

I can imagine, for example, one such implementation where each package is audited by hand for security issues and quality assurance. So you know you're getting a certain level of quality if you search this third party index. At the same time, maybe there's another third party package repository that accepts anything, and so it's a good place to look for more obscure implementations of things.

And you could include dependencies from both at the same time in your zig project, no problem.

Anyway, I at least think it's worth exploring a decentralized model.

commented

I don't think a centralized model is a good idea. Imagine if C had implemented a centralized model in the 1970's or 1980's.

One suggestion, the compiler itself should be a package in the repository so that updating the language is as simple as zig update zig.
Haxe does this and I love their implementation.

New to the project, so forgive me if I am missing a ton of context.

tl;dr Instead of add{Git,Http[s]}Package, how about resolve(URI) since URI is reasonably flexible, and make it easy for people to register resolvers for URIs?


I have seen lots of differing requirements about source code management in general, and most of them completely valid but conflicting with one another.

Are hashes necessary to be part of the spec? What about exposing some convenience functions around uri resolvers, and have resolvers (or the caching layer) decide how to handle integrity?

For example with ipfs, hashes are part of the address, but for https, maybe someone will add subresource integrity like so: https+sri://example-repository.com/package/v1.4/package-debug.zig#dea956b9f5f44e38342ee1dff85fb5fc8c7a604a7143521f3130a6337ed90708.

Or maybe someone will add package.lock-like functionality to their own https resolver.

Or maybe someone will register a stackoverflow:// resolver that just searches for the first .zig result matching the keywords provided in stackoverflow://how+to+quick+sort.

By providing good hooks into resolving URIs, and managing the artifacts, people can implement exactly the semantics they want for different projects, which they would otherwise be hacking around whatever decisions zig makes now.

There are downsides of course, but even if zig does coalesce around a single way of managing external code, building an api around uri resolution and file/code management will help with building that package manager.
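
A minimal sketch of what such a hook could look like (entirely hypothetical, nothing here is an existing Zig build API): the build system only asks a resolver to materialize the bytes behind a URI, and each resolver decides its own integrity policy.

const Resolver = struct {
    scheme: []const u8, // e.g. "ipfs", "https+sri", "stackoverflow"
    fetch: *const fn (uri: []const u8, dest_dir: []const u8) anyerror!void,
};

// a project (or the toolchain) registers whichever resolvers it trusts
const resolvers = [_]Resolver{
    .{ .scheme = "https+sri", .fetch = fetchHttpsSri },
    .{ .scheme = "ipfs", .fetch = fetchIpfs },
};

fn fetchHttpsSri(uri: []const u8, dest_dir: []const u8) anyerror!void {
    // download over https, then verify the #<hash> fragment before unpacking
    _ = uri;
    _ = dest_dir;
}

fn fetchIpfs(uri: []const u8, dest_dir: []const u8) anyerror!void {
    // content addressing: the hash is the address, so no extra integrity step
    _ = uri;
    _ = dest_dir;
}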

Another idea I had, as an alternative to random ids for decentralized package management:

Packages could be signed, and contain the public key of who signed them. Then instead of the package having a random id, it has a name that the maintainer chooses. The public key acts as a namespace, so the name only has to be unique to their own public key.

Then there is no concept of "hijacking". Only the package author would be able to publish new versions to supersede old ones.

Third party sites can do code audits and keep track of the "trust" and "quality" of public keys. Zig could support a configuration text file which is a blacklist of known-to-be-compromised public keys, and a third party service could keep this text file updated. Auditing tools could display the dependency tree of a Zig project and break it down by author (pubkey) potentially augmented with data from third party sites. This data could be things like: which pub keys are verified by the third party service, which pub keys are known to have high quality code, which pub keys are known to have a large number of other packages depending on them.

Third party sites could support "log in" by users signing their login token with their private key and then they know you own that pub key, and you could edit metadata, verify your public key, etc.

When upgrading dependencies, you would notice when the public key changes, because that means the package changed authors. Better double check that package before upgrading. When a package changes authors, it would be considered by Zig to be an entirely different package. This would be equivalent to a major version bump - each package that depends on it would have to explicitly publish a new version depending on the new package with the different public key, or otherwise would continue to get the old version, even if another package in the dependency tree explicitly depended on the new version.

This also enables packages to have some metadata, which is verified to be believed by the author, because of the signature. Such metadata might be:

  • "this package is no longer used by the maintainer and it needs a new maintainer"
  • "this package is funded by donations and the URL is xyz"
  • "this package has a security flaw or otherwise needs to be redacted, and a bug fix version is not available" (this would be done with a bug fix version bump, so that dependency upgrading tools would fetch it and be alerted to the problem)

The package signature would take the place of the sha256 hash, and would be stored in the declarative lock data, whatever that ends up being. Once this is in place, package downloads could take place over insecure connections such as HTTP or FTP, because the signature would verify that the contents were not tampered with. Even if an author published a new package without bumping a version number, projects that depend on it would detect the problem when Zig notices multiple packages with the same name/version and different signatures.
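
To sketch how that might read in a build script (purely hypothetical: addGitPackageSigned, the key fields, and the values below do not exist and are only placeholders for whatever the declarative lock data ends up being):

// the signature replaces the sha-256 field; plain http is acceptable because
// the signed contents are verified after download
exe.addGitPackageSigned("fancypantsjson",
    "http://mirror.example.com/zig-fancypantsjson",
    "1.0.2",
    .{
        .author_pubkey = "<maintainer's public key>",
        .signature = "<signature over the 1.0.2 source tree>",
    });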

I love that this idea requires almost zero support on Zig's side. Even the shipped blocklist of keys can just be part of whatever audit tool a third party creates.

However, the package signature taking the place of the sha256 hash requires people to buy in to self-identification through public keys. It would be great if I could just send someone a URI to something and have them be able to import it without depending on key-signing.

Any audit or linting tool can warn or forbid unsigned code as necessary, and a templating tool for new projects could encourage signature checking.

I think that's a good point.

Some use cases I want to support are:

  • Copying and pasting directories (or git submodules even though I'm annoyed about the usability of them) into one's source, and having that still understood via the package management tools (maybe this could simply be a file:// URL)
  • A compressed tarball downloaded through any file transfer mechanism, such as an https:// URL.
  • Publishing a package via a git tag, which could be fetched with a shallow clone.
  • Publishing a package via a subversion tag.
  • Any file transfer mechanism that gives you a directory of files.

It should be possible for a package to be bundled in all of these ways simultaneously, and have the same signature in all of them. So you could resolve, for example, a website giving a 404 for a project's dependency, by replacing the http URL with a git tag, and leaving the signature the same, and then everything is fixed. You could also provide multiple ways a package could be fetched, e.g. mirrors.

I believe I have realistic expectations of what people are willing to do in practice. For example, with this signing idea, it would only work if there was a single, clear, discoverable command, e.g. zig publish, that "just worked" including generating key pairs and putting the pubkey and signature in the correct places. Of course it would be configurable for those who read the docs and wanted more control over how it worked, but the defaults have to make it the laziest, easiest way to share software with each other.

Note that it will always be possible to give someone an URL and they can use it as a package with no formality - they could simply download the file(s) and then add a package dependency on the main file, either with --pkg-begin ... --pkg-end command line args, or with the Zig Build System. We already have support for this today. You can always bypass the package manager. Reasons to use the package manager are:

  • downloading directories of files, so that they don't have to be committed to source control.
  • detection of newly available packages, e.g. looking for new git tags, looking for new versions available at URLs.
  • declarative metadata to assist auditing and dependency management tools, to facilitate maintainability of a complex dependency tree - complexity existing in a computer science sense, but also existing in a social sense.
  • avoiding code bloat and increasing code reusability when Dependency A and Dependency B share a compatible version of Dependency C, as well as detection of incompatible versions of dependencies (which results in code bloat and potentially non-code-reusability)

The only language support for the package manager is a compiler builtin function that resolves the situation when one file should be superseded by another (see description above).

The other possibility that this facilitates is the idea of "trusted public keys" for which you would be willing to use precompiled binaries from. Then a third party service could pre-build packages for various combinations of targets and zig versions, and when using the package manager, if a match is found, you can save time in debug builds by using the pre-compiled artifacts. We're a long way away from something like this being practical, but it's important to consider what will and won't be possible.

commented

I want to deposit my long time issues with package managers that I feel are hugely important but always forgotten:

  1. I need to be able to easily switch from a binary-only dependency to debugging and changing that dependency's source code. It is such a pain in the ass to go from using a .jar to debugging that .jar's source code, modifying it, recompiling it. Same story with nuget. And when I say easily, I mean truly easily. I don't want to extract a .zip somewhere or copy it into my project folder. I don't want to have to do anything except press F5 in my IDE to debug and then I can step into the library code. (F5 of course automatically invoking the zig build, which in turn should be running the package manager.)
    I don't even want to modify my build configuration. It should probably be the default to use source code level debugging for all dependencies anyway. As far as I know golang always uses source level dependencies.
    I never ever want to do any extra work in gdb, for example telling gdb which directories contain new sources it should look for. That's the job of the package manager to make all this automatic, I don't want to care about that.

  2. I do not like symlinking, it never works properly with networked drives, shared folders and numerous other issues like copy paste. Please don't consider doing that in the package manager, I rather have multiple libraries duplicated on my drive than to have them all symlinked somewhere.

  3. I do not believe that a package manager will ever have security and comfort without a social aspect to it, probably in the form of star-based ratings. Andrew mentioned that already; there has to be some kind of trust established socially via ratings. Objectively, if I didn't write the code and I didn't review it line by line with 100% confidence, then I'm installing somebody's code blind, and that requires trust. It's how all marketplaces (e.g. Amazon, eBay, App Store, Play Store) work: through ratings, and you hope that people would make a big fuss if a developer/seller is releasing broken, dangerous products.

For some interesting work being done on social trust, signing, code review, etc, check out https://github.com/dpc/crev

Seems to me this CppCon talk, from video game industry veteran Scott Wardle, is relevant to this discussion: https://www.youtube.com/watch?v=NlyDUQS8OcQ .

@BenoitJGirard in #855 (comment) you mentioned "Policing the dependency graph for large projects is one of the headaches we have at my day job". I'm very interested in hearing about use cases like this. Can you share more about your pain points?

That goes for everybody in this thread. I want to hear about everybody's real world use cases of other package managers, and in what ways the experience is positive or negative.

Everybody has an opinion on how package managers should work, and not everybody can get their way, but it is certainly my goal that everyone can have their use cases solved reasonably well.

Sure, here goes.

At my day job, we have a large code base (millions of lines of code), C++ and C#, all in one repository. The code is divided into packages ("projects", in Microsoft Visual Studio parlance) and the direct dependencies of a package are fairly clear.

It's the indirect (transitive) dependencies that are a pain.

Among our regular problems is someone changing code in package P without realizing that package is used, indirectly, in obscure and infrequently tested application A and breaking a key behavior in that app.

Another is a programmer adding a new dependency and indirectly breaking a poorly-written installer, which does not realize that an extra library is now needed for its application to work.

Compounding this problem, there is no friction to adding dependencies between packages. After all, code is in packages to be reused! So dependency graphs grow and do not shrink.

Not to go too far in imagining solutions, but if we kept track of the full, transitive list of dependencies of each application, we could have the build system emit a warning (or error!) when a new direct or indirect dependency is added to an application, or emit a warning (or error!) if, say, the commit message does not mention all affected applications as having been tested.

Two other pain points from the day job, about NuGet, the C# package manager from Microsoft; these are keeping us from fully embracing NuGet.

  1. It's possible to (mis)configure NuGet so it always downloads the "latest and greatest" referenced packages, and their dependencies, on each build. This leads to the horror of non-deterministic builds, where there is no guarantee that syncing the code to a given change will give you the same executables every time.
  2. If a blocking bug is found in a NuGet package, how do we fix it quickly? The repository for that package has to be found, branched or forked, cloned locally, the code hacked to point to this local copy while we debug, the code changed in the branch/fork, then somehow we must create a new NuGet package from that branch/fork and use that. We find it less troublesome to just have the code of every package directly in our main repository.

Brilliant. Thank you for explaining these use cases. I've thought about how to model these problems and although this comment does not contain my responses, here's the plan:

  • It's about 1 month until 0.4.0 release. We're not going to have a working package manager by then. I'm going to focus on other issues for the release.
  • Shortly after 0.4.0 is released I will make a blog post with a detailed Zig Package Manager proposal and specification. This will enumerate all the use cases, and explain how they fit in with the proposal. I will specifically mention @BenoitJGirard's use cases, and anyone else in this thread who takes the time to explain in detail real world use cases.
  • Next, start working on the issues blocking the package manager:
    • basic copy elision semantics (#287)
    • coroutine rewrite (#1363), possibly with this research issue (#1778)
    • event-based standard library networking code (#910)
    • package manager (this issue)

Getting through this dependency chain of issues will be my primary focus in 0.5.0.

commented

This has probably been touched on by someone else, but I find that in Go, I like having a copy of my dependencies, including versioning metadata, and in C/C++, I like using git submodules with forks of third-party dependencies in my account, so that I don't depend on the stability of the third-party interface.

In both cases, this a) avoids availability issues with remote sources, b) facilitates minimally-complicated redistribution of my source, c) in the case of Go, allows me to perform a single source clone without multiple Git operations. This is important to me because in nearly all cases currently I am forced to use either a very slow and spotty personal network connection, or a very slow and spotty network connection to my remote Git server, which is on the other side of the planet and likes to time out and freeze.

For this reason, I want to propose that Git not be the primary means of source redistribution, instead being used primarily for source versioning and development.

I think that the primary means of dependency distribution should be as single-file compressed archive downloads over HTTP -- or even some other means, such as BitTorrent! Even if internally they are implemented using Git or another DCVS, they should be distributed over a much more reliable and performant transport. Git is not a good protocol for distributing archives; it is a good protocol for manipulating and exchanging versioned source trees.

Sure, use Git for interacting with the primary repo as a developer, but go back to the old ways for distributing dependencies. Even support the distribution of dependencies as binaries with headers only.

On that note, I think the Xcode "Framework" concept is kind of neat. I don't like hiding details of things, but perhaps some idea of a library being packaged with metadata and related dependencies could make life nicer for most users.

You should also force all packages to be versioned, even if the version is "checksum of all its contents because the user never versioned it."

commented

My thoughts on Package Managers...

In addition to these, it would be nice if the package name were not used as a unique identifier (like it is in cargo/rust).

For *nix systems the zig package manager and tooling should respect the XDG base directory specification, and use whatever the corresponding locations are for Windows. Getting this right from the start will save people a lot of pain later.

// in main package
exe.addGitPackage("fancypantsjson", "https://github.com/mrfancypants/zig-fancypantsjson",
    "1.0.2", "dea956b9f5f44e38342ee1dff85fb5fc8c7a604a7143521f3130a6337ed90708");

// in a nested package
exe.addGitPackage("fancypantsjson", "https://bitbucket.org/mirrors-r-us/zig-fancypants.git",
    "1.0.1", "76c50794004b5300a620ed71ef58e4444455fd72e7f7e8f70b7d930a040210ff");

Because this is decentralized, the name "fancypantsjson" does not uniquely identify the package. It's just a name mapped to code so that you can do @import("fancypantsjson") inside the package that depends on it.

But we want to know if this situation occurs. Here's my proposal for how this will work:

comptime {
    // these are random bytes to uniquely identify this package
    // developers compute these once when they create a new package and then
    // never change it
    const package_id = "\xfb\xaf\x7f\x45\x86\x08\x10\xec\xdb\x3c\xea\xb4\xb3\x66\xf9\x47";

Could the random bytes not instead be a secret, and then the current version be signed? This will head off some typosquatting-adjacent attacks wherein the random bytes are used in a second source not actually controlled by the author, or a domain is hijacked and hosts a malicious payload (at least for users of the code that already have the signature).

It will require manually re-verifying on ownership-change, but I feel that's more of a feature than a bug.

I also think one of the biggest problems with node (and something that will soon be seen with cargo) is that less trustworthy (in code quality, safety, and maliciousness) packages look very similar to highly trustworthy ones -- the cargo site has some metrics which help judge this, but they are not highly meaningful to beginners.

Some way of delegating some or all trust (or distrust) to the community (especially highly trusted members) would be nice for beginners who may not necessarily trust their own judgement -- I have some thoughts, but do not wish to further clutter the comment unless they are considered relevant.

Could the random bytes not instead be a secret, and then the current version be signed? This will head off some typosquatting-adjacent attacks wherein the random bytes are used in a second source not actually controlled by the author, or a domain is hijacked and hosts a malicious payload (at least for users of the code that already have the signature).

If I understood Andrew correctly, a specific package version has two things: the package_id that identifies the package and a cryptographic hash that verifies that the package is unchanged.

This allows hosting the same package on multiple sources without having the problem of "is this source trustworthy?", because you can check the hash of the package. It also allows the simple creation of forks when the original maintainer abandons the package: you can fork the package, change things, and publish it together with a new hash. The package manager will still recognize the package as the same package even though the hash will differ; you have to change the hash in the reference, though.

Some way of delegating some or all trust (or distrust) to the community (especially highly trusted members) would be nice for beginners who may not necessarily trust their own judgement

I agree with this 1000%. One of the biggest problems with npm is that packages are all "peers" and maintained separately. This leads to an explosion of mostly unverifiable content and duplicate functionality — e.g. underscore vs lodash vs ramda vs ES6 native vs ES6 shims.

What if we had a concept of trustworthiness embedded into packages? Stealing from the Arch model, they have a split between "core", "extra", "community" in pacman as well as "aur" and finally "decentralized". We could somehow have groups that could be more trustworthy than others. My big question is can we improve trust and unify functionality while simultaneous keeping all packages reasonably decentralized?

The beauty of decentralization is that any third party can step up and become an entity that curates trusted packages. As far as I'm concerned, this task is (currently) out of scope for the Zig project itself. Zig's role will be to provide helpful tools that provide insight when choosing what packages to depend on, and when doing the chore of upgrading / maintaining. For example, the package manager could provide various ways to visualize a dependency tree for easy auditing of trust. It could provide a way to integrate with third party services that provide extra metadata, such as public keys of known authors.

In summary, Zig's package manager will be a glorified downloading tool, that provides analysis and details to help the human perform the social process of choosing what set of other people's code to rely on, but without any central appeal to authority.

Would it be in scope to provide a canonical mechanism by which to assist that social process? My thought would be to sign assertions like 'reviewed', 'problem', 'supports', 'trusts', and 'distrusts'. I.e. if andrewrk 'reviewed' a version of a package, that gives you a data point that the code is not obviously malicious or egregiously low quality. 'trusts' could be transitive up to a limit. So if no one in your network 'trusts' or 'reviewed' a package, then installing it would require manually verifying that this is what you wanted to do. If even the author does not 'support' a package, then you know including it in production is probably a bad idea. If someone you 'trust' 'distrusts' a package and no one else trusts it, it won't install, etc.

Assertions could be hosted anywhere (including in the packages themselves), as well as produced by bots (e.g. fuzzbot 'reviewed', semverbot 'reviewed').
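To make the shape of such an assertion a bit more concrete, here is a minimal Zig sketch; every field name and size is hypothetical, assuming an Ed25519-style detached signature and the sha-256 content hashes discussed earlier in the thread:

// Hypothetical shape of a single signed trust assertion.
const AssertionKind = enum { reviewed, problem, supports, trusts, distrusts };

const Assertion = struct {
    package_id: [32]u8, // canonical package identifier being asserted about
    content_hash: [32]u8, // sha-256 of the specific package version
    kind: AssertionKind,
    author_pubkey: [32]u8, // public key of whoever makes the claim
    signature: [64]u8, // signature over all of the fields above
};

A tool could then walk the assertions reachable from keys you already trust (transitively, up to a limit, as described above) before deciding whether a fetch should proceed without a warning.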

I think something (vaguely) like this is the minimum for avoiding a bunch of attacks that have been used on the pip and node ecosystems. It also decouples curation and hosting in a way I quite like.

I realise this could all be done externally, but I don't think it would catch on without the package manager nagging you about low trust sources.

@MasterQ32 I think that automatically recognizing a package as 'the same package' if it has no involvement from the same author(s) is a bug though. Recognizing it if it claims to be the same package and there is a signature by someone whose authorship claim you have verified (either manually or by having it signed by another author) is the minimum. As such I see the benefit of the random 'canonical name' but still think not having a signature is a really bad idea.

  1. It's not a language package manager, but I would like to reference https://theupdateframework.io/security/. Malware was found in published packages in the past, so security is important.
  2. I do not see https://github.com/mrfancypants/zig-fancypantsjson and https://bitbucket.org/mirrors-r-us/zig-fancypants.git as different packages. We should consider them alternative download sources of the same package. I am not sure about the condition. I mean are packages the same if signatures match, or do we need anything else?
  3. I really do not like the idea of a dependency on any version control system. It will make the life of Windows users pure pain. Downloading a zip/tar.gz file over HTTP is easy, fast, firewall-friendly, and may be encrypted and signed. Downloading from random git repositories will trigger enterprise InfoSec guys, if it is allowed at all.
  4. I think there should be one central repository and the possibility to easily add alternative repositories on a project level, like apt/dnf do. DEB/RPM publishing is a pain, though, and package publishing should not be a pain. Otherwise any typos will lead to installing rogue packages. (https://www.bleepingcomputer.com/news/security/ten-malicious-libraries-found-on-pypi-python-package-index/)
    I mean this process must be intentionally two-step: first an explicit expression of trust in the author, and only after that a reference to a package.
  5. What are reasonable answers to leftpad story? What if an author wants to remove a package which is a dependency for another package? How exactly should everything break or how exactly should everything not break?
  6. What are specific closed-source requirements? I'd say "don't even need https because of the sha-256" will not apply anymore.
  7. What about binary packages? What about environment-specific packages? These are the problems Python wheels try to fix. What if I want to vendor curl on Windows, but reference the system one on Linux?
  8. What kind of reports does a developer need? A tree of dependencies, a directed graph of dependencies?
  9. I don't think "zig build" should download anything. That's pretty much "hidden control flow".

Hello,
As a passer-by who just stumbled upon Zig a few days ago, I'd like to add my 2-cents here.

TL;DR
Please, let the package manager be independent and optional. At least please provide first-class support for offline builds.

IMO, a package manager is out of scope for a programming language (Zig, or any other). Yes, they can be very helpful for encouraging adoption and sharing code, but I think they can perform that role just fine as separate, and optionally bundled tools.

Take Python's pip for example (my main language). I consider pip to be barely usable. It works well enough for pulling down packages into a virtualenv during development, but I sure am glad I don't have to rely on it. Pip's saving grace is that it is optional for Python development.

Take Rust's Cargo for example. Cargo is a great tool. It has been a positive binding force for the Rust community, it's easy to use, and it just works. But Cargo's strength is also its weakness, because it's an all-or-nothing deal. Cargo is great ... when it works. And Rust is useless without Cargo. I gave up on offline builds for Rust, and resorted to bringing my offline work laptop home to update the Rust software I use.

@andrewrk
In summary, Zig's package manager will be a glorified downloading tool, that provides analysis and details to help the human perform the social process of choosing what set of other people's code to rely on, but without any central appeal to authority.

I'm glad this is your stance on it.
In my opinion, I see little reason for it to be more than merely a convenient wrapper around git, wget, and gpg.

Edit

Maybe my point wasn't clear. I am not against a package manager for Zig. Nor am I against it having it as an official Zig project. I just don't see why it should be directly integrated into the language/compiler.

How about adding the hash algorithm (SHA-256) as an explicit parameter somewhere in addUrlPackage, so it's possible to migrate to a different hash function in the future?
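A hedged sketch of how that could look in build.zig, assuming a hypothetical variant of addUrlPackage that takes the algorithm as a field name; the package name, URL and digest below are made up:

const Builder = @import("std").build.Builder;

pub fn build(b: *Builder) void {
    var exe = b.addExecutable("app", "src/main.zig");

    // Hypothetical API: naming the algorithm up front means a later migration
    // (say, to BLAKE3) only changes this field name plus the digest string.
    exe.addUrlPackage("json", "https://example.com/zig-json.tar.gz", .{
        .sha256 = "0000000000000000000000000000000000000000000000000000000000000000",
    });

    b.default_step.dependOn(&exe.step);
}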

Just as a note, though I really have no preferences on the matter, easy offline builds and decentralized code distribution are absolutely essential when it comes to the adoption of Zig as a systems language. It's mostly ideological, but this weakness is really slowing down the penetration of Rust into hardcore libre software (and a lot of systems stuff tends to be on that end of the spectrum).

To this end, even if private package repositories are supported from the get-go, a "main" one would likely emerge pretty quickly. Building any kind of software without it will become effectively impossible.

Go's approach, though somewhat more verbose, seems preferable in this regard, and is a smaller development burden. It should be easy enough to compose a searchable index for convenience. And working offline is as simple as fetching once, and vendoring from then on.

@FlyingWombat Agree with you on this.
Personally, I think the solution is to have a few paths to search for dependencies: some managed by Zig's package manager, and some managed by the user or the distro. Then we can choose between using the package manager only, using the system only, or using one as a fallback for the other.
Distros definitely do not like packaging Rust, Go, Haskell and npm stuff, as it is a PITA. On the other hand, building C/C++/sh without relying on the system package manager is a PITA in places like Windows where there is no package manager.
A good way to think about it: have a directory relative to the zig binary marked as system, another marked for the Zig package manager, and a vendor directory for package-specific dependencies.

The way I see it, all the compiler needs is a way to resolve source imports in the filesystem. Then the package manager just does its work to get the dependencies, and simply reports the import paths to the build manager.
A few suggestions:

  • Add search paths to an env-var
  • Write file/dir locations to a manifest file "imports.txt", which is read by build.zig (a sketch of this follows below)
  • Symlink the downloaded files to a "vendor" dir in the build dir

Fetching dependencies and building could be as simple as zig-pkg fetch && zig build (if that's too much typing, just make a shell alias 😛 )
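A rough sketch of the manifest idea from the list above, assuming a hypothetical imports.txt with one "name path" pair per line written by whatever fetch tool is used; exact std APIs differ between Zig versions, so treat this only as an illustration of the approach:

const std = @import("std");
const Builder = std.build.Builder;

pub fn build(b: *Builder) void {
    var exe = b.addExecutable("app", "src/main.zig");

    // Map each "name path" line in imports.txt to a package the compiler can resolve.
    const manifest = std.fs.cwd().readFileAlloc(b.allocator, "imports.txt", 64 * 1024) catch |err|
        std.debug.panic("could not read imports.txt: {}", .{err});

    var lines = std.mem.tokenize(u8, manifest, "\r\n");
    while (lines.next()) |line| {
        var parts = std.mem.tokenize(u8, line, " ");
        const name = parts.next() orelse continue;
        const path = parts.next() orelse continue;
        exe.addPackagePath(name, path); // vendored or symlinked source location
    }

    b.default_step.dependOn(&exe.step);
}

The point of the sketch is only that the compiler never fetches anything: the manifest is produced offline, and build.zig merely maps names to paths.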

https://github.com/zigtools/zpm - Unofficial Package Manager for Zig

Protection from circular dependencies should be implemented.

I second the notion that this maybe shouldn't specifically involve the compiler in any way. IMO just do a simple flat text file format (YAML/TOML/JSON) but build in all these features discussed above regarding decentralization, hashes, etc.

Packages could be called zags and the package manager binary could be zag. Then you could do things like zag install, zag update etc...

In terms of decentralization, I think git is always the best place to start. If you pair that with cryptographic hashes of the tar of the repo, you can get all the properties you want in terms of preventing force pushes or repo changes that alter the public version of a package without changing a version number.

Bonus points for automation of changes to the YAML/TOML/JSON config file, such as things like zag add dep package-name, zag update --save (writes to config file), zag update package1 package2.

Additional bonus points for a --offline mode. I envision it working by running zag export deps [file] on a machine with internet and then zag import deps [file] on the machine without internet. Then as long as you add --offline to all your zag commands it won't try to hit the internet and will instead use the local repository.

Having a hashless, fetch-latest mode à la golang will also be good for small projects. There are tradeoffs between the two, and I think making the tradeoff should be delegated to the user, not the package manager or the compiler. The user knows their own context best.

No one has brought it up yet, but I think Deno's package handling could be a good source of inspiration, if we want a decentralized manager: https://deno.land/manual@v1.4.4/linking_to_external_code

From the above link to Deno's docs:

import { assertEquals } from "https://deno.land/std@0.73.0/testing/asserts.ts";

I don't think it's a good idea to bake URLs into Zig source code.

Deno is a JS/TS runtime -- a web technology. Allowing URLs in source imports makes sense for it.

Zig, on the other hand is a general purpose programming language, and needs 1st-class support for offline builds.

@FlyingWombat I am not saying it should mimic Deno, just use as a source of inspiration since they have solved similar problems.

I am not sure what you mean by offline builds. Deno is just a runtime for server-side code; it has the exact same requirements for building offline as Zig. If you read the documentation you will see that all dependencies are cached the first time you run (build) your application.

... just use as a source of inspiration ...

This I take no issue with. But I felt that many would look at Deno's import syntax as the primary feature (as I did), and I disagree with that syntax for Zig. How Deno implemented its package management could, yes, hold some valuable insight. If you have a specific feature or implementation detail from Deno's package management that you would like to highlight, please mention it.

What I mean by 1st-class support for offline builds is this: it should be just as easy to build a Zig project on an isolated system as it is on a connected one.

It must be straightforward to recursively download all dependencies on a connected machine. And likewise to transfer and vendor them locally on the isolated machine -- which will be the one performing the build. This is one reason why I've been advocating to keep the package manager and compiler as separate entities. All the connected machine would need in order to collect build dependencies is a small static-linked executable.

Offline builds and URL-namespaced dependencies are not mutually exclusive, but it would require intelligent caching or a vendoring mechanism. The Go toolchain is an example of this.

I agree with everything in this comment: #943 (comment)

An anecdote: Part of my job involves maintaining a .Net project. We use NuGet for that project. One of our direct dependencies had a dependency on some specific version of a library. Another one of our direct dependencies had a dependency on some other version of the same library. The standard "solution" for this problem in the .Net ecosystem is a binding redirect, which (I believe) basically means choosing one version of the library, linking it to an assembly that expects a different version, and then praying that nothing goes wrong at runtime. Using binding redirects wasn't an option for us (it's complicated) so we had to refactor the project.

The rest of this comment is basically an attempt to apply the dependency inversion principle. The dependency inversion principle says that software components should depend on abstractions, not on concrete implementations.

In object-oriented programming, the dependency inversion principle gives us dependency injection: Any IO class that has dependencies should express those dependencies with some set of interface types in the constructor parameter list. Before anything else happens in the application, the concrete dependencies for a given IO class are instantiated, and then the IO class itself is instantiated. IO classes do not choose their own concrete dependencies. The type system for the programming language is able to validate this process because each IO class declares which interfaces it implements, and any invalid declaration will cause an error.

In package-management, I think, the dependency inversion principle gives us an analogous goal: Any library that has dependencies should express those dependencies with some set of abstract specifications in the package import list. Before anything else happens in the build process, the concrete dependencies for a given library are resolved, and then the library itself is resolved. Libraries do not choose their own concrete dependencies. The type system for the package manager is able to validate this process because each library declares which abstract specifications it implements, and any invalid declaration will cause an error.

Obviously you would need some way to express an "abstract specification" that covers everything important about the API. In a language like C, you would probably want to use header files. And header files would probably be good enough, if you don't need the minor version distinction that SemVer tries to encode. But if you do need the minor version distinction, then you probably need some concept of subtyping in the type system for the package manager, and I'm not sure how viable that is outside of an object-oriented language.

Some notes: I talk about "IO classes" because I don't really use dependency injection for datatypes or utility classes. This raises some questions, and I'm not sure what the answers are. I'm also not sure if this comment really makes sense, but I figured I might as well share my perspective.

I think zig can already do "dependency injection" for your use case:
Just provide a different file under the package identifier which needs to fulfil the same public API ("header file") and it would work, as long as the API is truly compatible. The package manager should be able to allow such overrides
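For illustration, a hedged build.zig fragment of such an override, using the addPackagePath call mentioned later in this thread; the option name and file paths are invented:

const std = @import("std");
const Builder = std.build.Builder;

pub fn build(b: *Builder) void {
    var exe = b.addExecutable("app", "src/main.zig");

    // Both files expose the same public API; which one backs the "json"
    // package identifier is decided here, not inside the consuming library.
    const use_alt_json = b.option(bool, "alt-json", "use the alternative json implementation") orelse false;
    if (use_alt_json) {
        exe.addPackagePath("json", "vendor/alt-json/src/main.zig");
    } else {
        exe.addPackagePath("json", "vendor/json/src/main.zig");
    }

    b.default_step.dependOn(&exe.step);
}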

Hi. I just learned about Zig a few days ago and have been learning about it for a bit. It looks like a really awesome language.

I'd like to voice support for decentralized Deno-style package management:

Pros:

  • decentralized
  • identifying imports by URLs allows for adding support for new protocols later, or dynamically via registering code to fetch packages with specific URL protocols at compile time (ie. "git", "ipfs", "ssb", "dat, "magnet", etc.)

Cons:

  • cannot enforce immutability - this can be addressed with stored checksum verification and an offline cache which can be committed to source control if desired
  • cannot enforce availability - this can be addressed with a proxy, either public or private, and offline cache. Note that this can also be an issue with centralized repos: See the unpublishing of left-pad from NPM.

Ultimately I think the arguments for / against decentralized package management come down to trust. With a centralized package manager, you trust its maintainers to keep packages immutable and mostly always available. With a decentralized approach, you trust your chosen package hosts and proxy, and you get the advantage of being able to choose who you trust to fill those roles.

Additionally, Deno optionally allows for import maps to let the user decide which alias they want to use in their code. Theoretically, I suppose this could allow the user to depend on two different versions of the same package under separate aliases ("my-package", "my-package-v2") which is a pretty cool plus.
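Mapped onto the decentralized proposal earlier in this thread, that could be as simple as registering the same library twice under two aliases. Everything below (names, URL, versions, digests) is made up and assumes the addGitPackage shape proposed above:

// Inside the build function: one library at two versions, each under its own alias.
exe.addGitPackage("mylib", "https://example.com/mylib", "1.4.0",
    "1111111111111111111111111111111111111111111111111111111111111111");
exe.addGitPackage("mylib_v2", "https://example.com/mylib", "2.0.1",
    "2222222222222222222222222222222222222222222222222222222222222222");
// @import("mylib") and @import("mylib_v2") can then coexist in one program.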

I like the approach proposed in #943 (comment) as it seems to map closely to what I mentioned. But I'm curious: What's the disadvantage of putting the SHAs in a separate file like a lockfile? Maybe one could optionally omit those values in the addUrlPackage and in that case they will be automatically added to an optional lockfile or replaced in the build.zig inline? While working, I'd like to be able to specify a URL and have the language tooling calculate and store the SHA for me. Another small benefit of this would be the ability to specify URLs directly in import statements, without adding them to the build.zig file, which would make testing out new packages incredibly easy. Any thoughts?

Thanks for the amazing work!


I agree with everything in this comment: #943 (comment)

Great comment to point out. All current package managers are terrible in this regard: they let me upgrade packages, and then later at build time I get errors that technically the package manager could have known about. It knows which functions are exported. It knows which functions I use. It could tell me if an upgrade will let me build or not. I think I can guess why nobody ever implemented this, though... it's a lot of work for a quality-of-life improvement that is perceived as tiny or irrelevant by most people who don't work in very large projects with interconnected dependent libraries.

Hey guys. Just wanted to mention that a Node.js package manager, Yarn 2 (codenamed Berry), might be worth looking into.

It has a plugin-centric architecture, and there has already been experimentation for making it work with C/C++:
yarnpkg/berry#1697

Might be pretty easy to get an MVP ziglang package-manager quickly going, at least as an interim solution.

Honestly, an official, language-wide standard for managing dependencies would be a big win over C/C++ and their hodgepodge of warring factions.

I request that a section be devoted to development-time dependencies, such as linters and test frameworks, and an equivalent to bundle exec / npm bin for managing Zig utilities on a per-project basis.

@andrewchambers

  • Packages should be immutable in the package repository (so the NPM problem doesn't arise).

I would like to remind you about the node_modules dependency hell created by npm (yes, there are some solutions to eliminate that hell, but why should we have to work around it instead of avoiding it from the beginning?). The best solution here is the one made by Go & Deno.

  • Packages should only depend on packages that are in the package repository (no GitHub urls like Go uses).

Same as my first note. Also, how about my private libs? Should we pay for private repos? I suggest using the Deno & Go solution: it saves disk space, eliminates dependency folders such as node_modules, and saves bandwidth for clients and the repository server (less traffic & high availability).

Thoughts that might not be great for Zig:

  • Enforcing Semver is an interesting concept, and projects like Elm have done it.

Why is there something called SemVer? Isn't it there to remove dependency-compatibility hell? As a coder, it is easy for me to know whether I should upgrade my lib or not just by reading the x.y.z (BreakingChanges.Features.PatchBugs), without reading the changelog at all.

Additional ideas :

  • something such as npm scripts to register & reuse the registered scripts (to stop rewriting long commands & args).
  • ability to show the deps graph.
  • ability to configure the cache folder for deps.
  • ability to remove dead deps (orphan libs which are not under usage anymore to keep the disk-space clean).
  • built-in audited deps inside the central repository / a bot to scan zig files and detect vulnerabilities, such as CodeQL.
  • built-in docs builder.
  • built-in unit tests.
  • built-in test coverage.

Thoughts that might be so horrible for Zig:

I think zig can already do "dependency injection" for your use case:
Just provide a different file under the package identifier which needs to fulfil the same public API ("header file") and it would work, as long as the API is truly compatible. The package manager should be able to allow such overrides

This doesn't work when you want one package to be able to access another one. For example, if you have a "logging" package and an "auth" package, the auth package cannot access the logging package, because the addPackagePath resolution only applies to the root source file.

(I am using a workaround in one of my projects: instead of using addPackagePath in build.zig, I have the main (root) file import specific implementations of each "package" and make them pub, and then have other files access them via @import("root"). But this doesn't work in tests.)
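A minimal sketch of that workaround; the file names and the logging API are invented for the example:

// main.zig (the root file): import the chosen implementations and re-export them.
pub const logging = @import("vendor/logging.zig");
pub const auth = @import("vendor/auth.zig");

// auth.zig (any non-root file): reach the shared logging implementation
// through the root module instead of a package path.
const log = @import("root").logging;

pub fn login(user: []const u8) void {
    log.info("login attempt from {s}", .{user});
}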

Hm; @CantrellD's comment, and @419928194516's comment it builds upon, reminded me somewhat of the approaches explored in:

As such, I'm wondering if it could make some sense to explore some similar API<->implementation decoupling in the package manager. Notably, if yes, I would imagine:

  • with static typing, maybe hopefully for some package X, the set of package interfaces it depends on could be automatically generated by some tool? maybe as part of a release process of the package? maybe those could then be published for easy browsing and reviewing?
  • similarly, maybe the exposed interface/API of a published package could also be extracted by an automatic tool to a similarly formatted spec? again, maybe it could also be published for easy browsing and reviewing?
  • then, tools could maybe exist that could e.g.:
    • allow easy checking if particular package's "provided interface" does or does not implement some "depended interface", and thus can be "linked" by end-user as dependency for some other package?
      • if not, it could possibly list the API differences making it impossible, so that I could decide to write a wrapper package or something;
    • allow easy checking what changes does some version of a package have in its API vs. a different version, thus also allowing for "detection of breaking changes in the API"? possibly issuing a warning to package's author when releasing a package that they did so? or just allow package author if they want to check compatibility with some API spec they explicitly want to align with (presumably typically "of old versions of their package", but this being just a loose presumption; maybe e.g. they want to also/only align to API of some thirdparty package)

Just some ideas that came to my mind after reading a huge chunk of the thread above (though I can't 100% guarantee if I managed to internalize all of it). I'm not even sure if that's at all doable, notwithstanding whether this should be done at all; but I'm interpreting that this is still mostly a brainstorm phase, so throwing in my brainstorm contribution. Cheers and good luck!

Running zig build on dependencies is desirable because it provides a package the ability to query the system, depend on installed system libraries, and potentially run the C/C++ compiler. This would allow us to create Zig package wrappers for C projects, such as ffmpeg. You would even potentially use this feature for a purely C project - a build tool that downloads and builds dependencies for you.

For managing C dependencies (and maybe Zig dependencies too, eventually?) it might be useful to be able to request the system-provided version of a given library. At least on Linux.

Let's say you have a library that uses Curl and OpenSSL and you want to link it against the system-provided versions of those libraries. It'd be nice if the Zig package manager could be configured to skip downloading them if a compatible version of the development library is already installed on your system in the traditional way e.g. dnf install openssl-devel libcurl-devel, and to make sure that those copies are used.

Of course, that can get tricky with the differences between families (Debian, Fedora, Arch) and within families (Debian/Ubuntu, Fedora/RHEL/CentOS). You might need to list the package names for a couple of different distros and have some way of constraining the acceptable versions. I can see how the cost/benefit could be questioned, it just strikes me as an interesting idea, especially since Zig is so nicely suited to being used with dynamic linking.
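For the "use the system copy" half of this, the build API already exposes the basic hook (linkSystemLibrary and linkLibC are existing calls); the hypothetical part is only the package manager deciding to skip its own download when the library is found. A minimal fragment:

// Inside build.zig: link against the system-provided development packages
// (e.g. installed via `dnf install openssl-devel libcurl-devel`).
exe.linkLibC();
exe.linkSystemLibrary("curl");
exe.linkSystemLibrary("ssl");
exe.linkSystemLibrary("crypto");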

What are your thoughts on PNPM? It makes sure (among other things) that the same package (with same version) only has 1 copy of it in the system instead of the same package being used in multiple projects on the system in multiple node_modules folders.

While this is great for JS, I wonder if this is possible for a compiled language. (Apologies in advance if it has been mentioned before in this thread I couldn't go through all of it)

Go announced a security vulnerability today regarding malicious code execution when running go get: https://blog.golang.org/path-security

This is something to keep in mind for the package manager, and perhaps more broadly, the build system.

Yep that's one of the main reasons build scripts use a declarative API. Idea being that the package manager and build system could still run the build.zig script in a sandbox and collect a serialized description of how to build things. Packages which use the standard build API and do not try to run arbitrary code would be eligible for this kind of security, making it practical to make it a permissions flag you'd have to enable: to allow native code execution from a given dependency's build script.

Until we have such an advanced feature, however, it should be known that zig build is running arbitrary code from the dependency tree's build.zig scripts. And it is planned for build.zig logic to affect package management. However it is also planned that the set of possible dependencies will be purely declarative, so it will be practical to have a "fetch" step of package management that does not execute arbitrary code. However however, it is also planned for the package manager to support "plugin packages" that enable fetching content from new and exotic places, for example ipfs. For this feature to work, again, relies on execution of arbitrary code.

Ultimately I think what we will end up with is that some projects will want to execute arbitrary code as a part of the build script, but for most packages it will not be necessary, so that it can be no big deal to explicitly allow an "arbitrary code execution flag" on those dependencies that need it.

Idea being that the package manager and build system could still run the build.zig script in a sandbox and collect a serialized description of how to build things.

What if build.zig had to work completely at comptime? That way it'd already be sandboxed to whatever is possible at comptime.... (i.e. side-effect free)

Is it really necessary to have the dependencies inside build.zig? Wouldn't it be better to have them in a separate dependencies.zig?

I would expect building a package to be a step unrelated to downloading the sources and its dependencies.

That is, similarly to what one does when using sources to generate binary packages in a distribution. For example in debian, after you have downloaded some debian-aware sources for a program, you use "dpkg-checkbuilddeps" to check if the dependencies are satisfied, and "apt-get build-dep" to request the download of the dependencies. And then you can start the build/test/fix/rebuild cycle (be it with make, fakeroot debian/rules binary, dpkg-buildpackage or debuild).

I say this because it would make me nervous that doing a compilation could download and update a dependency (which would be bad if there are issues I want to replicate and debug, as the production code and my code would be using different versions of a dependency).

Note that I say this as an outsider that has recently discovered Zig and is evaluating doing some project with it, so I may be missing something or misunderstood the proposal.


In many situations it's useful to see the umbrella organization from which the dependency comes. Made up Java examples:

  1. group: org.apache, name: httpclient.
  2. group: org.apache, name: regexutils.

Otherwise what happens is that people basically overload the name to include the group, everyone in their own way (apache-httpclient, regexutils-apache). Or they just don't include it and you end up with super generic names (httpclient).

It also prevents or minimizes "name squatting". I.e. the first comers get the best names and then they abandon them...

Requiring people who publish packages to own the domain they publish to is a basic security check that is worth it in my opinion. Say somebody publishes a zig package called "com.microsoft.uriparser".
I've seen this suggestion here, which Maven apparently is doing: https://www.reddit.com/r/programming/comments/lhu44g/researcher_hacks_over_35_tech_firms_by_creating/gn11fwj?utm_source=share&utm_medium=web2x&context=3

If you want inspiration: I have not dug into how it works internally, but Dart Pub https://dart.dev/guides/packages is the best package manager I have worked with, and I work daily with npm, go, maven, pods and some others. It just works seamlessly. I'm constantly switching Flutter and Dart versions, and Pub reinstalls all dependencies quickly with zero trouble every time I switch between Flutter channels.

https://dart.dev/tools/pub/dependencies

@kidandcat would you elaborate on why Dart Pub is your best package manager compared to the others?

Maybe I'm just stupid, but wouldn't immutable packages make it really hard to push bug fixes and updates? Or would each version just be its own frozen entity?

I really recommend this talk by Rich Hickey (the author of Clojure) on dependency management https://github.com/matthiasn/talk-transcripts/blob/master/Hickey_Rich/Spec_ulation.md. You'll learn that semantic versioning is no panacea.
Btw, Clojure has a tool called tools.deps which can resolve dependencies that use a git repository + a sha (see https://clojure.org/guides/deps_and_cli#_using_git_libraries).

Related to what @gphilipp wrote, Leiningen, the most popular build tool for Clojure, can be extended with plugins. One of these plugins takes care of versioning following the approach 1 git commit = 1 version
https://github.com/roomkey/lein-v

Lein-v uses git metadata to build a unique, reproducible and meaningful version for every commit. Along the way, it adds useful metadata to your project and artifacts (jar and war files) to tie them back to a specific commit. Consequently, it helps ensure that you never release an irreproduceable artifact.

A massive issue that we face with external libs, however, is the fact that there's a rabbit hole of dependencies that have to be added recursively. Take one package that has only something like two dependencies: each of those dependencies has further dependencies, which have even more dependencies. As you can probably guess, this leads to a massive number of dependencies being added to your project just for importing one library.

We commonly see this issue when using rust crates. I would suggest that we make the user aware about the size and the implications of importing the library. It should be made known to the developer that they by importing the library, they are using x number of dependencies (not only the top level ones, all of them for the dependent packages too). This allows for developers to be more conscious. Additionally I highly recommend a metric that can be used to measure the negative impact or the size increase on the end-executable. I'm not exactly sure how this metric could be implemented but it's a suggestion for users who are conscious about their executable sizes and the cost of their imports.

I would suggest that we make the user aware of the size and the implications of importing the library

It would be cool if the compiler spat out a warning when your package's dependency tree becomes more than n levels deep, where n is at the border between reasonable and egregious.

Whilst I'm new to Zig and am learning it (so do take what I say with a grain of salt) - hopefully I can write some stuff down to add to this discussion that would be useful. I've skimmed all the comments on this issue - but apologies in advance if some of this stuff has already been covered.

So just to establish the basics - package managers solve an important problem: having to track down the dependencies of all the libraries (especially transitive ones) that you use and get them building easily. It's one of the biggest pain points in C/C++ and no one wants that.

However - they can introduce their own problems, which can be especially seen in the Rust/NodeJS ecosystems of "transitive micro-dependency hell" - which is where libraries are rarely standalone, and if you think you might be using something to solve a simple problem, you are in fact pulling loads of code (unawares) that is a massive surface vector for potential errors and issues that not only do you have little understanding of, but the library writers that you are grabbing directly might have little understanding of due to the large transitive network of packages. Whilst this is arguably a cultural issue as opposed to a technical one - the precedents set by technology can affect this.

It's an extreme version of what XKCD highlights, and it can cause very real problems. It's worth pointing out that issues like this even affect the Rust compiler - where the self-hosted compiler itself now depends on some pretty deep transitive dependency hierarchies IIRC (it's something a friend who contributes to the Rust compiler told me - but I could be wrong on that). EDIT: just opening up a few of the Cargo.toml files in the repo for the Rust compiler itself reveals dependencies on external dependencies. Even the driver which powers the update loop of the main compiler depends on a micro 200-line package that is still somehow not even version 1.0 yet.

It's also worth pointing out that there are fundamentally two different problems to deal with - package dependencies for libraries vs package dependencies for standalone projects (at the end of the stream). With the former you need to deal with the fact that you don't know what context your library will be used in, so you need flexibility in your dependencies to cope with other instantiated libraries having mutual dependencies. But in the latter case you do have full context, and being able to consistently and reliably build a standalone project is so important - I want to (at least as an option - but I think it should be the default, especially if package hosting is federated) be checking in all of my packages to my repository by default. I want to know that even in 50 years, if I have the right version of the zig compiler and a copy of my source tree, my project will build without needing to grab any dependencies from the internet. Sometimes you might even have different dependencies for different branches of a project (because often package upgrades will be implemented and tested in isolation from the main development branch) - and if you are switching back and forth a lot, triggering additional downloads from the internet each time is also no fun.

However, zig build could shout at me if I have an outdated version of a dependency that has recently received a minor update, and that could be an error by default to encourage the installation of security patches, for example.

Another thing worth considering is that, especially in performance-sensitive contexts like games, library modification/customisation can become a shipping necessity (unless a library has been really well designed). Therefore you might want to create custom forks of packages with your changes, but you might still want to know when the upstream package has been updated (so you know to update your fork).

As a library writer, I might want to make certain dependencies optional (especially if I'm designing it as a progressive API) - for example, if I'm writing some kind of graphical library, the default high-level API might require some kind of graphics back-end package as a dependency to allow it to be easily used and tested. But the lower-level API might allow the end-user to provide their own back-end, therefore allowing that dependency to be removed. It's a minor thing, but I thought it would be a use case worth mentioning.

The one thing that I can't stress enough is how important it is to allow (or even default to!) downstream end-users checking in all of their dependencies into their version control. It's something I've had to write tools for in the past, because otherwise you end up in situations where servers are offline, someone invalidates their cache somehow, and all of a sudden they are unable to work until the dependency servers are back up. I've been in situations where engineering effort had to be expended on implementing features like this for package managers which didn't support it, just because of how much of a problem that unreliability could become.

P.S. In fact, allowing arbitrary build artefacts to be put somewhere they can be checked in to source control could be useful. E.g. you might have a C++ library that you are building from source, but you know it is rarely going to change and each rebuild of it takes a good amount of time. Being able to declaratively say "I want this compiled artefact to be checked in" and have it put in a different folder could be useful - and would not require the infrastructure of a shared networked build cache. I often do that anyway with third-party libraries that I have built from source.

EDIT: Also, something else that needs to work simply is cross-compilation support. I imagine that it would be great to have an ecosystem of "wrapper packages" that simply wrap C/C++ projects but with everything set up so that they respect zig cross-compilation targets. In some cases this might have to involve linking against the correct pre-built binary libs.