Tilde (~) in path literal is illegal
aherrmann opened this issue · comments
Describe the bug
Path literals in the Nix language do not support ~ characters within the path (apart from the first character). However, such characters may occur in valid paths.
Steps To Reproduce
- Open nix repl
- Enter ./foo~
- Observe error: syntax error, unexpected invalid token, expecting end of file.
Expected behavior
Nix accepts path literals containing ~ characters in positions other than the beginning as legal paths.
nix-env --version output
nix (Nix) 2.11.1
Additional context
This is about ~ in locations other than the very beginning of the literal, where it stands for $HOME: ~/foo is accepted by Nix 2.11.1 and resolves to the same path as $HOME/foo.
Possible workarounds thanks to @thufschmitt and @shlevy:
./${"foo~"}
./. + "/foo~"
Possibly relevant part of the code thanks to @layus.
The issue occurs in the Bazel extension rules_nixpkgs in the context of nixopts location expansion when expanding labels to external workspaces in the new bzlmod dependency system. In that case the workspace directory is defined by the mangled name of the external workspace, which includes ~ characters, e.g. external/nixpkgs_location_expansion_test_file~override/test_file for @nixpkgs_location_expansion_test_file//:test_file. The above workaround would have to be performed by users of rules_nixpkgs, meaning the issue is user-facing for rules_nixpkgs.
This diff appears to allow this to work just fine without expanding the ~ to $HOME:
diff --git a/src/libexpr/lexer.l b/src/libexpr/lexer.l
index 462b3b602..1d2421f8a 100644
--- a/src/libexpr/lexer.l
+++ b/src/libexpr/lexer.l
@@ -114,7 +114,7 @@ ANY .|\n
ID [a-zA-Z\_][a-zA-Z0-9\_\'\-]*
INT [0-9]+
FLOAT (([1-9][0-9]*\.[0-9]*)|(0?\.[0-9]+))([Ee][+-]?[0-9]+)?
-PATH_CHAR [a-zA-Z0-9\.\_\-\+]
+PATH_CHAR [a-zA-Z0-9\.\_\-\+~]
PATH {PATH_CHAR}*(\/{PATH_CHAR}+)+\/?
PATH_SEG {PATH_CHAR}*\/
HPATH \~(\/{PATH_CHAR}+)+\/?
vin@scadrial ~/w/v/nix (master)> nix repl
Welcome to Nix 2.14.0pre20230126_37b4a9e. Type :? for help.
nix-repl> ./foo~
error: syntax error, unexpected invalid token, expecting end of file
at «string»:1:6:
1| ./foo~
| ^
# vs
vin@scadrial ~/w/v/nix (master)> ./outputs/out/bin/nix repl
Welcome to Nix 2.14.0. Type :? for help.
nix-repl> ./foo~
/home/vin/workspace/vcs/nix/foo~
But that list still looks fairly restrictive (e.g. disallows @, %, etc). I wonder if this is intended / desired, that special characters require that stringification workaround?
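To make the effect of the one-character change concrete, here is a minimal Python sketch that translates only the PATH rule from the diff into a Python regular expression. It is an approximation under stated assumptions: the real lexer is generated by flex and involves start conditions and rule ordering that this does not model, and the helper names (path_rule, is_path_literal) are made up for illustration.

```python
import re

# Character classes taken from the diff (flex syntax translated to Python).
OLD_PATH_CHAR = r"[a-zA-Z0-9._\-+]"
NEW_PATH_CHAR = r"[a-zA-Z0-9._\-+~]"


def path_rule(path_char: str) -> re.Pattern:
    # Python rendering of the flex rule: PATH_CHAR* ("/" PATH_CHAR+)+ "/"?
    return re.compile(rf"{path_char}*(/{path_char}+)+/?")


def is_path_literal(text: str, path_char: str) -> bool:
    # Hypothetical helper: True if the whole text matches the PATH rule.
    return path_rule(path_char).fullmatch(text) is not None


for candidate in ["./foo", "./foo~", "./foo@bar", "./foo bar"]:
    print(
        candidate,
        is_path_literal(candidate, OLD_PATH_CHAR),
        is_path_literal(candidate, NEW_PATH_CHAR),
    )
# ./foo True True
# ./foo~ False True
# ./foo@bar False False
# ./foo bar False False
```

Under this simplified model, the patch accepts ./foo~ while characters such as @ or a space remain outside the path literal syntax.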
One thing we have to be careful of when adding ~ to PATH_CHAR is that ~/foo doesn't change meaning. I think we have to reverse the order of each of these pairs of rules:
There are many characters that are valid in actual paths but not in Nix's path literals, because they might collide with other syntax. We might want to allow the escape character \ in literal paths?
@aherrmann How are these path values passed to rules_nixpkgs used in the end?
@infinisil On the command-line, e.g. as an --arg, but how exactly is left to the user. These paths are exposed through a WORKSPACE scope implementation of Bazel's location expansion, you can see an example here, the corresponding Nix expression is here. In that particular case the workaround is the following patch ($$ escapes $):
"external_file",
- "$(location @nixpkgs_location_expansion_test_file//:test_file)"
+ './$${"$(location @nixpkgs_location_expansion_test_file//:test_file)"}',
],
What's unfortunate is that this is not hidden in some implementation detail of rules_nixpkgs but exposed to its users.
@aherrmann I see, then I think --argstr should be used/recommended instead. It handles escaping correctly no matter what argument you pass, because the value always gets turned into a string. You can complain to users of --arg by checking that the passed value is a string using builtins.isString.
There are two differences that might need to be fixed then:
- Relative path expressions automatically get transformed to absolute path values at parse time. This won't happen anymore with --argstr. E.g. whereas previously --arg foo ./. would've set foo = /some/path, the new --argstr foo ./. only sets foo = "./.". If you need an absolute path again you'll have to prepend the base you need to it, e.g. "/some/path" + ("/" + foo). You won't need this if you only handle absolute paths though.
- Paths get implicitly imported into the store, which won't happen anymore with a string. But you can still do it explicitly using builtins.path, assuming you have an absolute path as a string:
  builtins.path { path = "/some/path"; }
  This is needed here.
@infinisil Thanks for looking into this!
- But you can still do it explicitly using builtins.path, assuming you have an absolute path as a string:
Bazel usually deals in relative paths and we try to avoid absolute paths to Bazel managed files because they are generally not reproducible: They'll typically have a prefix of the form /home/$USER/.cache/bazel/_bazel_$USER/$HASH/.... The advantage of relative path literals is that they'll get turned into absolute paths in the Nix store, which are reproducible. If we expose non-reproducible absolute paths to user defined code, there's a high risk they'll find their way into generated files and will ultimately cause cache misses or other issues. Is there a way to avoid that risk with --argstr?
Oh then that's great, because the current code using --arg expands relative paths to absolute ones during Nix evaluation, potentially causing such problems. But with --argstr no such expansion takes place (because you pass a string directly), so there's no risk using --argstr.
@infinisil I'm not sure I understand. When I try to use the stringly relative path then I get errors of the form
error: string './tests/location_expansion/test_file' doesn't represent an absolute path
So, IIUC I have to convert it to an absolute path first somehow. With a path literal that happens automatically and turns it into an absolute path in the Nix store, which is reproducible. But, if I want to do the same with builtins.path then I have to manually prepend the absolute prefix to the source file outside the Nix store, and that would not be reproducible. So, wouldn't the use of a path literal be safer?
Ohh I think you just need this actually (using --argstr):
cp ${./. + ("/" + local_file)} $out/out/local_file
cp ${./. + ("/" + external_file)} $out/out/external_file
That's also what Nix would do underneath if local_file/external_file were relative path expressions: they get evaluated relative to the current Nix file's directory (./.).
Note however that + is a bit unsafe, I'd recommend the new lib.path.append if that's already available for you (was merged very recently though):
cp ${lib.path.append ./. local_file} $out/out/local_file
cp ${lib.path.append ./. external_file} $out/out/external_file
This new function prevents appending absolute paths.
@infinisil Thank you for the pointer! lib.path.append looks like a good addition.
Sorry to be going back and forth on this so much. I gave this a shot, --argstr here and corresponding path_append here.
One issue I encountered when passing --argstr and then performing lib.path.append in the .nix file is that the path is now interpreted relative to the .nix file instead of the workspace root. That means users now need to remember to prefix the right amount of ../ to climb back to the workspace root (e.g. path_append ../. argstr_local_file). Using --arg instead doesn't have that problem, since it's evaluated at the nix-build invocation, which happens in the workspace root.
I keep having the impression that --arg exposes the user to fewer foot-guns... (The ~ issue being the only one for the rules_nixpkgs use-case.)
@aherrmann No problem! This is an interesting detail about --arg that I wasn't aware of, and imo the best way to fix it would be this:
# --arg arg_base ./.
cp ${path_append arg_base argstr_local_file} $out/out/argstr_local_file
Note that --arg arg_base ./. doesn't run into the same problem because it's always a correctly escaped expression. Alternatively (but then not using lib.path.append since you don't have a path expression anymore):
# --argstr argstr_base "$PWD"
cp ${builtins.path { path = "${argstr_base}/${argstr_local_file}"; }} $out/out/argstr_local_file
But since these arguments are essentially user-facing, this does make it more annoying to call. So here's an alternative that doesn't need passing an extra base directory argument, at the expense of being an impure expression (which doesn't matter much since the base directory is impure in any case and you're not using pure evaluation which would forbid it):
cp ${builtins.path { path = "${builtins.getEnv "PWD"}/${argstr_local_file}"; }} $out/out/argstr_local_file
I keep having the impression that --arg exposes the user to fewer foot-guns... (The ~ issue being the only one for the rules_nixpkgs use-case.)
Yeah I agree that it seems that way, but extending the Nix grammar just to allow ~ in paths so this one use-case works is just a hack. I agree that ~ is a somewhat common character in paths, but in the end you're still passing a Nix expression using --arg, which can still cause problems for a whole range of other paths, since only a small subset of characters is allowed. Here are some examples that break:
# Spaces
$ nix-instantiate --eval -E '{ x }: x' --arg x './foo bar'
error: undefined variable 'bar'
at «string»:1:7:
1| ./foo bar
| ^
# Dollar curly
$ nix-instantiate --eval -E '{ x }: x' --arg x './foo${}'
error: syntax error, unexpected '}'
at «string»:1:8:
1| ./foo${}
| ^
Passing a dynamic Nix expression to --arg would only be safe if you make sure to escape it properly beforehand, but this is rather cumbersome to do:
$ escapeNixString() {
    nix-instantiate --eval --json -E '{ str }: (import <nixpkgs/lib>).strings.escapeNixString str' \
      --argstr str "$1" | jq . -r
  }
$ evalPath() {
    nix-instantiate --eval -E '{ x }: x' --arg x "./. + (\"/\" + $(escapeNixString "$1"))"
  }
$ evalPath 'foo bar'
/home/tweagysil/foo bar
$ evalPath 'foo${}'
/home/tweagysil/foo${}
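For callers that assemble the --arg expression from another program, the same idea can be sketched in Python. This is a hand-rolled illustration rather than the nixpkgs escapeNixString implementation: it assumes that escaping backslashes, double quotes and dollar signs is enough to keep an arbitrary file name inside a Nix double-quoted string, and the function names are hypothetical.

```python
def escape_nix_string(text: str) -> str:
    # Assumption: inside a Nix double-quoted string, a backslash before
    # '\', '"' or '$' yields that literal character, so escaping these
    # three prevents the string from ending early or starting ${...}.
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("$", "\\$")
    )
    return f'"{escaped}"'


def relative_path_expr(relative_path: str) -> str:
    # Build the same expression shape as evalPath: ./. + ("/" + "<name>")
    return f'./. + ("/" + {escape_nix_string(relative_path)})'


print(relative_path_expr("foo bar"))  # ./. + ("/" + "foo bar")
print(relative_path_expr("foo${}"))   # ./. + ("/" + "foo\${}")
print(relative_path_expr("foo~"))     # ./. + ("/" + "foo~")
```

The resulting text would be passed as the value of --arg; the file name itself never has to be a valid path literal, which is why this form sidesteps the ~ restriction as well.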
This is not a bug, because Nix intentionally restricts what characters are allowed in store paths. However, it is a valid feature request: we could relax those rules.
Yeah I agree that it seems that way, but extending the Nix grammar just to allow ~ in paths so this one use-case works is just a hack. I agree that ~ is a somewhat common character in paths, but in the end you're still passing a Nix expression using --arg, which can still cause problems for a whole range of other paths, since only a small subset of characters are allowed.
Fair enough. Thank you for the discussion and suggestions on workarounds!
I've updated the rules_nixpkgs docs to include a recommendation on using --argstr in conjunction with --arg workspace_root ./. and path_append. The $${"..."} approach is still listed as well. In the common case it is the easiest option. Overly exotic characters in paths are not common in Bazel projects, e.g. spaces in paths are generally avoided as they tend to cause problems with existing Bazel extensions. But, in principle, symbols are allowed in target names.
Triaged in the Nix team meeting:
- @fricklerhandwerk: this looks like a bug
- @tomberek: there was a related bug where ~ was expanded to $HOME within paths, but this here is something else
- @thufschmitt: it would be fine to accept it
- @edolstra: that would be ad hoc, and we forbid many other characters
- @fricklerhandwerk: that's what all practical shells do, too, it could be enough to be consistent with convention here
- to discuss
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/2023-02-24-nix-team-meeting-minutes-35/25757/1
Discussed in the Nix team meeting 2023-02-27:
- @roberth: we have to take into account that [~/a~/b] evaluates to [ /home/user/a /home/user/b ]
- agreement: this is lower priority and has some potential for debate, but we're open for a pull request
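The [~/a~/b] concern can be illustrated with a small Python sketch of the HPATH rule alone. It assumes the lexer greedily takes the longest HPATH match at each position and ignores every other rule, so it is a simplification; the function name is hypothetical.

```python
import re

# PATH_CHAR before and after adding '~' (flex syntax translated to Python).
CURRENT_PATH_CHAR = r"[a-zA-Z0-9._\-+]"
TILDE_PATH_CHAR = r"[a-zA-Z0-9._\-+~]"


def split_home_paths(source: str, path_char: str) -> list:
    # Python rendering of the flex rule: HPATH = "~" ("/" PATH_CHAR+)+ "/"?
    hpath = re.compile(rf"~(/{path_char}+)+/?")
    tokens = []
    position = 0
    while position < len(source):
        match = hpath.match(source, position)
        if match is None:
            break  # anything that is not an HPATH is out of scope here
        tokens.append(match.group(0))
        position = match.end()
    return tokens


print(split_home_paths("~/a~/b", CURRENT_PATH_CHAR))  # ['~/a', '~/b']
print(split_home_paths("~/a~/b", TILDE_PATH_CHAR))    # ['~/a~/b']
```

Under that assumption, allowing ~ as a path character would turn the two list elements into a single path, which is the kind of behavior change any pull request would need to address.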
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/2023-02-27-nix-team-meeting-minutes-36/25890/1
Is there any workaround for prefetching URLs that contain tilde? Like https://mirror.msys2.org/msys/x86_64/msys2-keyring-1~20240410-1-any.pkg.tar.zst
@thomasfire this seems to be a different issue, because URLs are not file system paths, and should always be passed as a string. The tilde is merely not accepted as the symbolic part of the store path, which can be worked around by setting name:
nix-repl> builtins.fetchurl "https://mirror.msys2.org/msys/x86_64/msys2-keyring-1~20240410-1-any.pkg.tar.zst"
error: store path '0nk02idx70nv1dqi5dliibddznyx8wd6-msys2-keyring-1~20240410-1-any.pkg.tar.zst' contains illegal character '~'
nix-repl> builtins.fetchurl { name = "foo"; url="https://mirror.msys2.org/msys/x86_64/msys2-keyring-1~20240410-1-any.pkg.tar.zst";}
"/nix/store/q2jzkl6xvya8znhg0rqvf8vrz5xlgm1m-foo"
Needs documentation at https://nixos.org/manual/nix/unstable/language/builtins#builtins-fetchurl
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/illegal-character-error-trying-to-use-fetchurl/47287/2
#10941 has improved the error message to become:
$ nix run nix -- repl
Nix 2.24.0pre20240625_ccb679e
Type :? for help.
nix-repl> builtins.fetchurl "https://mirror.msys2.org/msys/x86_64/msys2-keyring-1~20240410-1-any.pkg.tar.zst"
error:
… while calling the 'fetchurl' builtin
at «string»:1:1:
1| builtins.fetchurl "https://mirror.msys2.org/msys/x86_64/msys2-keyring-1~20240410-1-any.pkg.tar.zst"
| ^
error: invalid store path name when fetching URL 'https://mirror.msys2.org/msys/x86_64/msys2-keyring-1~20240410-1-any.pkg.tar.zst': name 'msys2-keyring-1~20240410-1-any.pkg.tar.zst' contains illegal character '~'. Please pass an attribute set with 'url' and 'name' attributes to 'fetchurl', so that it can create a valid store path.
However, the documentation still needs to be fixed.