Provide an API to extract fields from Command builder

euclio opened this issue 7 years ago · comments

Andy Russell commented 7 years ago

This is now implemented on nightly via #77029.

Summary
The following accessors are available on Command behind the #![feature(command_access)] gate:
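For readers without the nightly docs at hand, here is a minimal sketch of reading values back from a builder. It assumes the accessor names `get_program`, `get_args`, `get_envs`, and `get_current_dir` as they appear in `std::process::Command` today; the program name, arguments, and directory are arbitrary, and `build_example` and `describe` are hypothetical helpers.

```rust
use std::process::Command;

/// Builds an example command; "ls", its flags, and "/tmp" are arbitrary.
fn build_example() -> Command {
    let mut cmd = Command::new("ls");
    cmd.arg("-l").arg("-a").env("LC_ALL", "C").current_dir("/tmp");
    cmd
}

/// Reads back what was set on the builder without running anything.
/// Non-UTF-8 values are converted lossily for display purposes.
fn describe(cmd: &Command) -> String {
    let program = cmd.get_program().to_string_lossy().into_owned();
    let args: Vec<String> = cmd
        .get_args()
        .map(|a| a.to_string_lossy().into_owned())
        .collect();
    let envs: Vec<String> = cmd
        .get_envs()
        .map(|(key, value)| match value {
            // Explicitly set variable.
            Some(v) => format!("{}={}", key.to_string_lossy(), v.to_string_lossy()),
            // Variable explicitly removed with env_remove.
            None => format!("unset {}", key.to_string_lossy()),
        })
        .collect();
    let dir = cmd.get_current_dir().map(|p| p.display().to_string());
    format!("program={} args={:?} envs={:?} dir={:?}", program, args, envs, dir)
}
```

Calling `describe(&build_example())` only inspects the builder; no process is spawned.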

Unresolved issues
Some concerns I had with the implementation, but probably not important:

Values with NULs on Unix will be returned as "<string-with-nul>". I don't think it is practical to avoid this, since otherwise a whole separate copy of all the values would need to be kept in Command.
Does not handle arg0 on Unix. This can be awkward to support in get_args and is rarely used. I figure if someone really wants it, it can be added to CommandExt as a separate method.
Does not offer a way to detect env_clear. I'm uncertain if it would be useful for anyone.
Does not offer a way to get an environment variable by name (get_env). I figure this can be added later if anyone really wants it. I think the motivation for this is weak, though. Also, the API could be a little awkward (return an Option<Option<&OsStr>>?).
get_envs could skip "cleared" entries and just return &OsStr values instead of Option<&OsStr>. I'm on the fence here. My use case is to display a shell command, and I only intend it to be roughly equivalent to the actual execution, and I probably won't display None entries. I erred on the side of providing extra information, but I suspect many situations will just filter out the Nones.
Could implement more iterator stuff (like DoubleEndedIterator).
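The `get_envs` trade-off in the list above can be illustrated with a short sketch. It assumes the behavior of `get_envs` in current std, where a variable removed with `env_remove` is reported with a `None` value; `explicitly_set` and `example` are hypothetical helpers, and the variable names are arbitrary.

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Returns only the variables that are explicitly set, dropping the
/// `None` entries that mark variables removed with `env_remove`.
fn explicitly_set(cmd: &Command) -> Vec<(&OsStr, &OsStr)> {
    cmd.get_envs()
        .filter_map(|(key, value)| value.map(|v| (key, v)))
        .collect()
}

/// Example builder with one variable set and one removed.
fn example() -> Command {
    let mut cmd = Command::new("make");
    cmd.env("CC", "clang").env_remove("MAKEFLAGS");
    cmd
}
```

A caller that wants to display a shell-like command would likely use the filtered form, while one that needs to reproduce the child environment exactly would keep the `None` entries.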

Original issue below

The std::process::Command builder is useful for building up a Command in a cross-platform way. I've found that it would be useful to extract the name, args, environment, etc. of a Command after they have been set.

There are at least two places in the Rust compiler that would benefit from such an API. Without it, the authors have had to resort to wrappers instead of using Command directly.

https://github.com/rust-lang/rust/blob/master/src/tools/compiletest/src/runtest.rs#L1527

https://github.com/rust-lang/rust/blob/master/src/librustc_trans/back/command.rs

lukaslueg · Answer 1 · Sat Sep 09 2017 20:55:31 GMT+0800 (China Standard Time)

Related to #42200

Thomas Wickham · Answer 2 · Wed Oct 04 2017 05:52:09 GMT+0800 (China Standard Time)

Let's say I am OK to implement this; how should I proceed? Directly with a PR, or with an RFC first?

David Tolnay · Answer 3 · Sun Nov 19 2017 09:29:32 GMT+0800 (China Standard Time)

Seems reasonable. I would be interested in seeing this explored in a PR.

Eyal Kalderon · Answer 4 · Thu Nov 21 2019 22:40:58 GMT+0800 (China Standard Time)

I am working on a crate that would directly benefit from this. Is anyone still working on this feature? If not, would someone mind mentoring me to open a PR?

Eyal Kalderon · Answer 5 · Sun Nov 24 2019 21:32:57 GMT+0800 (China Standard Time)

Also, I have a tangential question not related to this issue, but might be raised as a result of introducing this feature: is there any way in the standard library to check whether a given std::process::Stdio is set to inherit, piped, or null? If we add the ability to extract these fields from the Command builder, I think it makes sense to add the ability to inspect the values at run-time as well.

Eric Huss · Answer 6 · Sun Oct 04 2020 02:34:36 GMT+0800 (China Standard Time)

This is now implemented and available on nightly. I have updated the original description with details.

Kevin Staunton-Lambert · Answer 7 · Tue Feb 16 2021 08:59:55 GMT+0800 (China Standard Time)

I feel this would be generally useful for debugging Command processes

Jade Lovelace · Answer 8 · Sat Jun 12 2021 17:55:56 GMT+0800 (China Standard Time)

@rust-lang/libs Can we stabilize this? It seems like the remaining concerns listed would not require API break to change.

Oli Scherer · Answer 9 · Sat Jun 12 2021 17:56:53 GMT+0800 (China Standard Time)

cc @rust-lang/libs

Jane Losare-Lusby · Answer 10 · Tue Jun 22 2021 06:04:28 GMT+0800 (China Standard Time)

@rfcbot merge

Rust RFC bot · Answer 11 · Tue Jun 22 2021 06:04:29 GMT+0800 (China Standard Time)

Team member @yaahc has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

~~arg0~~ resolved by #44434 (comment)
~~preprocessing~~ resolved by #44434 (comment)
~~use-case~~ resolved by #44434 (comment)

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

Jane Losare-Lusby · Answer 12 · Fri Jul 23 2021 07:17:12 GMT+0800 (China Standard Time)

Checking off sfackler's checkbox on the FCP since he left the libs team.

Josh Triplett · Answer 13 · Thu Jul 29 2021 03:24:42 GMT+0800 (China Standard Time)

@rfcbot concern arg0

I'm concerned about the potential for confusion introduced by indexing args without accounting for arg0.

Josh Triplett · Answer 14 · Thu Jul 29 2021 03:28:45 GMT+0800 (China Standard Time)

@rfcbot concern preprocessing

As discussed in today's @rust-lang/libs-api meeting, we're concerned about any platform-specific preprocessing that Command may do internally, and whether the data provided back via these APIs is preprocessed in some platform-specific way or needs to have that preprocessing reversed.

Josh Triplett · Answer 15 · Thu Jul 29 2021 03:33:52 GMT+0800 (China Standard Time)

@rfcbot concern use-case

Based on discussion from today's @rust-lang/libs-api meeting:

Could we get some additional information about the use case of this? We understand the use case for setting these fields (e.g. "Change command to different path and run again", or "change one argument and run again", or "change one environment variable and run again"). Methods like spawn don't consume the Command, so it would be possible to use one Command to do several similar command invocations.

It's not as clear to us what the use case for getting these fields is, though, and a "get" API seems more complex (not least of which with all the unresolved questions), compared to a "set" API.

Jade Lovelace · Answer 16 · Thu Jul 29 2021 03:53:25 GMT+0800 (China Standard Time)

Regarding use case, the place where I initially wanted it was in rust analyzer where we were invoking cargo with a programmatically generated command line and I wanted to log invocations without maintaining a second copy of the invocation in parallel. I think we could use the Debug impl for this though as it only needs to be human readable.

Ralf Jung · Answer 17 · Thu Jul 29 2021 03:57:52 GMT+0800 (China Standard Time)

I needed an API like this (specifically for environment variables) at least once when writing a patch for rustc bootstrap.

Rustc bootstrap, unsurprisingly, creates rather complicated Commands involving a lot of careful setting of environment variables. For example, it needs to add various folders to LD_LIBRARY_PATH. Various abstraction layers might have to each add their own folders there. The problem is, given a Command, there is no way to extend the LD_LIBRARY_PATH that was set here so far. The only thing you can do is overwrite it. This leads to a much more fragile API than I would like. See #85959 for the concrete case where that came up.

The other possible usecase (that however probably has other solutions) is debugging: {:?} on a Command will print the command called and the arguments passed, but not the env vars being set. If you want to also show the env vars, you need to separately track them yourself.

The issue description also contains links to 2 places inside rustc that seem to benefit from these getters.

Eric Huss · Answer 18 · Thu Jul 29 2021 04:09:59 GMT+0800 (China Standard Time)

Just to reiterate the above, the use case is usually to display the command to the user (like for verbose output or logging). Rustbuild currently uses the debug display (stream_cargo) which is unsatisfactory for several reasons (doesn't include environment variables, may not be copy-pasteable, etc.). That is one place I wanted to use this.

A real-world example is Cargo which has its own wrapper around Command called ProcessBuilder to work around the limitations of the Command API. It has a Display impl which uses get methods to format in a quasi-shell syntax that can be used in most shells. I wouldn't want Command to do that directly, but adding get methods provides the ability to do such a thing more easily without a wrapper.
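To make the display use case concrete, here is a rough sketch of formatting a `Command` as a shell-like line using the getters. This is not Cargo's `ProcessBuilder` implementation; `quote` and `shell_line` are hypothetical helpers, the quoting rule is a deliberately conservative POSIX-style approximation, and non-UTF-8 values are converted lossily.

```rust
use std::process::Command;

/// Quotes a word for a POSIX-like shell unless it consists only of a
/// conservative set of safe characters.
fn quote(word: &str) -> String {
    let safe = !word.is_empty()
        && word
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || "-_./=:,+".contains(c));
    if safe {
        word.to_string()
    } else {
        // Wrap in single quotes and escape embedded single quotes.
        format!("'{}'", word.replace('\'', "'\\''"))
    }
}

/// Renders explicitly set variables, the program, and the arguments.
/// Variables removed with env_remove (None values) are skipped.
fn shell_line(cmd: &Command) -> String {
    let mut parts = Vec::new();
    for (key, value) in cmd.get_envs() {
        if let Some(value) = value {
            parts.push(format!(
                "{}={}",
                key.to_string_lossy(),
                quote(&value.to_string_lossy())
            ));
        }
    }
    parts.push(quote(&cmd.get_program().to_string_lossy()));
    parts.extend(cmd.get_args().map(|a| quote(&a.to_string_lossy())));
    parts.join(" ")
}
```

A tool could call something like this at different verbosity levels, for example omitting the environment prefix unless `-vv` is given.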

I'm personally not too concerned about preprocessing. I'm not aware of any of the impls that currently do anything like that. There is some concern about additional memory usage if it needs to keep multiple copies, but I don't consider that a high concern.

As noted in the original PR, if someone wants to add support for fetching arg0, that can be added to CommandExt. Custom arg0 usage is pretty rare, and doesn't fit well within the cross-platform model. If needed, I think the documentation could be extended to mention that arg0 is not included on unix platforms.

Josh Triplett · Answer 19 · Thu Jul 29 2021 04:23:46 GMT+0800 (China Standard Time)

@RalfJung Doing a read-modify-write of LD_LIBRARY_PATH doesn't seem ideal, especially if it needs to be repeatedly re-parsed. I agree that that shouldn't be open-coded, which seems like the issue with #85959. But I feel like that use case would work just as well by having a wrapper on Command that separately tracks a Vec<PathBuf>, allows adding to that Vec, and overwrites the environment variable. That would factor out the modification code into one single place.
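The wrapper approach described here might look roughly like the following sketch. `LibPathCommand` and `add_lib_dir` are hypothetical names; the sketch relies on `std::env::join_paths` to build the value and rewrites the variable from the full list on every addition.

```rust
use std::env;
use std::path::PathBuf;
use std::process::Command;

/// Hypothetical wrapper that owns the Command together with the list of
/// library directories, so the list never has to be read back or re-parsed.
struct LibPathCommand {
    cmd: Command,
    lib_dirs: Vec<PathBuf>,
}

impl LibPathCommand {
    fn new(program: &str) -> Self {
        Self {
            cmd: Command::new(program),
            lib_dirs: Vec::new(),
        }
    }

    /// Records `dir` and overwrites LD_LIBRARY_PATH with the joined list.
    /// Fails (without recording `dir`) if a path contains the separator.
    fn add_lib_dir(&mut self, dir: impl Into<PathBuf>) -> Result<(), env::JoinPathsError> {
        let dir = dir.into();
        let joined = env::join_paths(self.lib_dirs.iter().chain(Some(&dir)))?;
        self.lib_dirs.push(dir);
        self.cmd.env("LD_LIBRARY_PATH", joined);
        Ok(())
    }
}
```

The cost of this design is that every layer that wants to add a directory has to be handed the wrapper rather than a plain `Command`.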

Also, I think it'd be entirely reasonable for the Debug impl on Command to print environment variables and anything else that has been set.

Eric Huss · Answer 20 · Thu Jul 29 2021 04:43:34 GMT+0800 (China Standard Time)

> I think it'd be entirely reasonable for the Debug impl on Command to print environment variables and anything else that has been set.

I tested that out with bootstrap for a while, and the experience was pretty terrible. The amount of verbosity was overwhelming, and the tool usually wants more fine-grained control over what is exactly displayed (like no -v, -v, -vv). I personally would not like to see the Debug impl change until a long while after something like these getter methods are stabilized to give authors a chance to implement a better display. There is a separate issue #42200 for changing the Debug impl.

Izzy Muerte · Answer 21 · Thu Jul 29 2021 04:43:43 GMT+0800 (China Standard Time)

Hello! I had some use cases for this. In some instances it's to "fix" calling older versions of a specific tool (in this case, cmake). Newer versions of CMake support passing a JSON file as a "preset" of settings, and an external tool can technically take these JSON files and turn them into specific settings for configuration, while users are unaware of the underlying settings.

There's also the ability to receive a Command from any other API and parse specific values out via something like Clap. Basically a serde-process if you will. This would allow me to, for example, get the cc::Tool::to_command call and use that to extract the compiler itself for setting -DCMAKE_CXX_COMPILER or -DCMAKE_C_COMPILER, etc, while also being able to extract the additional flags users have set to set the CMAKE_CXX_FLAGS_INIT or other values.

Additionally, with these options I can give users the option to pass a Command to my APIs instead of requiring that a manual Path, string, or some other value is passed and then requiring that I find it with the which crate or some other mechanism.

Lastly, right now I can deserialize information from, say, JSON into a Command (the tests array), but I cannot serialize said information back. This extraction API would make it much easier as I would not have to wrap all of Command to get this behavior.
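As an illustration of the serialization use case, the sketch below copies a `Command` into a plain owned struct that a serializer could handle, and rebuilds a `Command` from it. `CommandSpec`, `snapshot`, and `to_command` are hypothetical names, no particular serialization crate is assumed, and values are converted to `String` lossily for simplicity.

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Hypothetical plain-data snapshot of a Command.
#[derive(Debug, PartialEq)]
struct CommandSpec {
    program: String,
    args: Vec<String>,
    /// `None` marks a variable that was explicitly removed.
    envs: Vec<(String, Option<String>)>,
    current_dir: Option<String>,
}

/// Lossy conversion; non-UTF-8 data would need a different representation.
fn lossy(s: &OsStr) -> String {
    s.to_string_lossy().into_owned()
}

/// Extracts the builder state through the getter methods.
fn snapshot(cmd: &Command) -> CommandSpec {
    CommandSpec {
        program: lossy(cmd.get_program()),
        args: cmd.get_args().map(lossy).collect(),
        envs: cmd.get_envs().map(|(k, v)| (lossy(k), v.map(lossy))).collect(),
        current_dir: cmd.get_current_dir().map(|p| p.to_string_lossy().into_owned()),
    }
}

/// Rebuilds a Command from the snapshot (the "deserialize" direction).
fn to_command(spec: &CommandSpec) -> Command {
    let mut cmd = Command::new(&spec.program);
    cmd.args(&spec.args);
    for (key, value) in &spec.envs {
        match value {
            Some(v) => cmd.env(key, v),
            None => cmd.env_remove(key),
        };
    }
    if let Some(dir) = &spec.current_dir {
        cmd.current_dir(dir);
    }
    cmd
}
```

State that the getters do not expose, such as stdio configuration or whether `env_clear` was called, is not captured by this snapshot.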

Josh Triplett · Answer 22 · Thu Jul 29 2021 04:55:21 GMT+0800 (China Standard Time)

On Wed, Jul 28, 2021 at 01:43:46PM -0700, Eric Huss wrote:

> > I think it'd be entirely reasonable for the Debug impl on Command to print environment variables and anything else that has been set.
>
> I tested that out with bootstrap for a while, and the experience was pretty terrible. The amount of verbosity was overwhelming, and the tool usually wants more fine-grained control over what is exactly displayed (like no `-v`, `-v`, `-vv`). I personally would not like to see the Debug impl change until a long while after something like these getter methods are stabilized to give authors a chance to implement a better display. There is a separate issue #42200 for changing the Debug impl.

That sounds completely fair; thanks for clarifying!

Ralf Jung · Answer 23 · Thu Jul 29 2021 14:58:41 GMT+0800 (China Standard Time)

> Doing a read-modify-write of LD_LIBRARY_PATH doesn't seem ideal, especially if it needs to be repeatedly re-parsed. I agree that that shouldn't be open-coded, which seems like the issue with #85959. But I feel like that use case would work just as well by having a wrapper on Command that separately tracks a Vec<PathBuf>, allows adding to that Vec, and overwrites the environment variable. That would factor out the modification code into one single place.

No (re-)parsing is necessary; adding a path to LD_LIBRARY_PATH is as trivial as prepending <new_path>: to the existing path (possibly with a special case for when LD_LIBRARY_PATH was not set at all yet).
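A minimal sketch of that prepend, assuming `get_envs` as it exists in current std; `prepend_lib_dir` is a hypothetical helper, and the `:` separator is Unix-specific.

```rust
use std::ffi::{OsStr, OsString};
use std::process::Command;

/// Prepends `dir` to the LD_LIBRARY_PATH already set on `cmd`, if any.
fn prepend_lib_dir(cmd: &mut Command, dir: &OsStr) {
    // Only values explicitly set on the builder are visible here; a value
    // inherited from the parent environment does not show up in get_envs.
    let existing: Option<OsString> = cmd
        .get_envs()
        .find(|(key, _)| *key == OsStr::new("LD_LIBRARY_PATH"))
        .and_then(|(_, value)| value)
        .map(OsStr::to_os_string);

    let mut new_value = dir.to_os_string();
    if let Some(old) = existing {
        // Special case: nothing to append if the variable was unset or empty.
        if !old.is_empty() {
            new_value.push(":");
            new_value.push(old);
        }
    }
    cmd.env("LD_LIBRARY_PATH", new_value);
}
```

Each abstraction layer can call this on the same `Command` without knowing what earlier layers added.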

I don't think one should have to write a wrapper for this. This kind of usecase should be supported out-of-the-box.

> I tested that out with bootstrap for a while, and the experience was pretty terrible. The amount of verbosity was overwhelming, and the tool usually wants more fine-grained control over what is exactly displayed (like no -v, -v, -vv). I personally would not like to see the Debug impl change until a long while after something like these getter methods are stabilized to give authors a chance to implement a better display. There is a separate issue #42200 for changing the Debug impl.

We can provide at least 2 levels of verbosity with {:?} and {:#?}... but yeah that is still not terribly flexible.

Josh Triplett · Answer 24 · Thu Aug 05 2021 03:02:11 GMT+0800 (China Standard Time)

@rfcbot resolve use-case

Josh Triplett · Answer 25 · Thu Aug 05 2021 03:02:34 GMT+0800 (China Standard Time)

@rfcbot resolve preprocessing

Chris Denton · Answer 26 · Thu Aug 05 2021 03:10:22 GMT+0800 (China Standard Time)

Am I right in thinking that Command was originally designed as being, essentially, a function call with optional arguments (or at least the Rust equivalent)? So its use as a container type is basically being retrofitted?

Josh Triplett · Answer 27 · Thu Aug 05 2021 03:18:47 GMT+0800 (China Standard Time)

I've resolved two of the three concerns.

I also want to confirm, here: the only plan is to support get_args which iterates over the arguments, but not to have something like get_arg(index: usize)? Is there a planned API for changing a specific argument by index?

As long as there's no API that accepts a numeric "index", I'll resolve the arg0 concern as well. I just want to make sure that anything accepting an "index" uses 0 to mean arg0, and 1 to mean the first command-line argument.
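To illustrate the indexing concern: in current std, `get_args` yields only the arguments added with `arg`/`args`, so its first item is the first command-line argument and not the program. A caller who wants argv-style numbering can prepend `get_program` explicitly, as in this sketch (`argv` is a hypothetical helper and ignores any custom arg0 set through `CommandExt`).

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Builds an argv-style list with the program name at index 0 and the
/// first command-line argument at index 1.
fn argv(cmd: &Command) -> Vec<&OsStr> {
    std::iter::once(cmd.get_program())
        .chain(cmd.get_args())
        .collect()
}
```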

Amanieu d'Antras · Answer 28 · Thu Aug 05 2021 05:12:04 GMT+0800 (China Standard Time)

My understanding is that we don't want to support any of these APIs because that is not the point of Command. As @ChrisDenton said, it's a function call with optional arguments, not long-term data storage.

There is some desire to support an extended Debug impl that also prints environment variables for logging purposes, but I feel that this should be a separate issue.

Ralf Jung · Answer 29 · Fri Aug 06 2021 21:15:22 GMT+0800 (China Standard Time)

> As @ChrisDenton said, it's a function call with optional arguments, not long-term data storage.

In building that function call incrementally (which is a common use case for builder patterns), some amount of read access is required, as I laid out above. This has nothing to do with using Command as long-term storage.

Josh Triplett · Answer 30 · Thu Aug 19 2021 03:18:16 GMT+0800 (China Standard Time)

@rfcbot resolve arg0

Rust RFC bot · Answer 31 · Thu Aug 19 2021 03:18:19 GMT+0800 (China Standard Time)

🔔 This is now entering its final comment period, as per the review above. 🔔

Rust RFC bot · Answer 32 · Sun Aug 29 2021 03:19:03 GMT+0800 (China Standard Time)

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

Ben Boeckel · Answer 33 · Fri Sep 03 2021 03:06:59 GMT+0800 (China Standard Time)

> Does not offer a way to get an environment variable by name (get_env). I figure this can be added later if anyone really wants it. I think the motivation for this is weak, though. Also, the API could be a little awkward (return an Option<Option<&OsStr>>?).

Not that it affects this PR, but something like this would be fine I think:

enum ProcessEnvSetting<'a> {
  Inherit,
  Unset,
  Set(&'a OsStr),
}

impl<'a> ProcessEnvSetting<'a> {
  // Convenience method.
  fn value_in_child(&self) -> Option<Cow<'a, OsStr>> {
    // query current environment if `Inherit`
  }
}

env_clear could then set an internal flag to return Unset rather than Inherit for any variable not explicitly set.
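The idea above can be fleshed out as a compilable sketch on top of `get_envs`. `get_env` is a hypothetical free function, not a std API. Since `Command` does not expose whether `env_clear` was called, this sketch takes that as an `env_cleared` flag the caller has to track, and `value_in_child` takes the variable name as an extra parameter so it can query the current environment.

```rust
use std::borrow::Cow;
use std::env;
use std::ffi::OsStr;
use std::process::Command;

#[derive(Debug, PartialEq)]
enum ProcessEnvSetting<'a> {
    Inherit,
    Unset,
    Set(&'a OsStr),
}

/// Hypothetical lookup by name. `env_cleared` is supplied by the caller
/// because the builder does not report whether env_clear was called.
fn get_env<'a>(cmd: &'a Command, name: &OsStr, env_cleared: bool) -> ProcessEnvSetting<'a> {
    match cmd.get_envs().find(|(key, _)| *key == name) {
        Some((_, Some(value))) => ProcessEnvSetting::Set(value),
        Some((_, None)) => ProcessEnvSetting::Unset,
        None if env_cleared => ProcessEnvSetting::Unset,
        None => ProcessEnvSetting::Inherit,
    }
}

impl<'a> ProcessEnvSetting<'a> {
    /// Convenience method: the value the child would see for `name`.
    fn value_in_child(&self, name: &OsStr) -> Option<Cow<'a, OsStr>> {
        match self {
            ProcessEnvSetting::Set(value) => Some(Cow::Borrowed(*value)),
            ProcessEnvSetting::Unset => None,
            // Query the current environment when the value is inherited.
            ProcessEnvSetting::Inherit => env::var_os(name).map(Cow::Owned),
        }
    }
}
```

The name comparison here is an exact match, so it does not model platforms where environment variable names are case-insensitive.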

Yuki Okushi · Answer 34 · Wed Jul 20 2022 05:24:09 GMT+0800 (China Standard Time)

Triage: The feature has been stabilized by #88436, closing as complete.