rust-lang / rust

Empowering everyone to build reliable and efficient software.

Home Page: https://www.rust-lang.org

Provide an API to extract fields from Command builder

euclio opened this issue · comments

This is now implemented on nightly via #77029.

Summary
The following accessors are available on Command behind the #![feature(command_access)] gate: get_program, get_args, get_envs, and get_current_dir.
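As a rough illustration of the accessors this summary refers to, here is a minimal sketch that reads back what was configured on a builder through `get_program`, `get_args`, `get_envs`, and `get_current_dir`. The helper names `example_command` and `inspect`, and the program, arguments, variable, and directory, are placeholders invented for this example. On toolchains where the feature is stable, no feature attribute is needed.

```rust
use std::ffi::OsStr;
use std::path::Path;
use std::process::Command;

/// Builds a small example command. Every value here is a placeholder
/// chosen for this sketch.
fn example_command() -> Command {
    let mut cmd = Command::new("ls");
    cmd.arg("-l").arg("-a").env("TERM", "dumb").current_dir("/tmp");
    cmd
}

/// Reads each configured field back out of the builder and prints it.
fn inspect(cmd: &Command) {
    // The program name exactly as passed to `Command::new`.
    let program: &OsStr = cmd.get_program();
    // The arguments, not including the program name.
    let args: Vec<&OsStr> = cmd.get_args().collect();
    // Explicit environment changes; a `None` value marks a removed variable.
    let envs: Vec<(&OsStr, Option<&OsStr>)> = cmd.get_envs().collect();
    // The working directory, if one was set.
    let dir: Option<&Path> = cmd.get_current_dir();
    println!("{:?} {:?} {:?} {:?}", program, args, envs, dir);
}
```

Calling `inspect(&example_command())` prints the four values using their `Debug` formatting.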

Unresolved issues
Some concerns I had with the implementation, but probably not important:

  • Values with NULs on Unix will be returned as "<string-with-nul>". I don't think it is practical to avoid this, since otherwise a whole separate copy of all the values would need to be kept in Command.
  • Does not handle arg0 on Unix. This can be awkward to support in get_args and is rarely used. I figure if someone really wants it, it can be added to CommandExt as a separate method.
  • Does not offer a way to detect env_clear. I'm uncertain if it would be useful for anyone.
  • Does not offer a way to get an environment variable by name (get_env). I figure this can be added later if anyone really wants it. I think the motivation for this is weak, though. Also, the API could be a little awkward (return an Option<Option<&OsStr>>?).
  • get_envs could skip "cleared" entries and just return &OsStr values instead of Option<&OsStr>. I'm on the fence here. My use case is to display a shell command, and I only intend it to be roughly equivalent to the actual execution, and I probably won't display None entries. I erred on the side of providing extra information, but I suspect many situations will just filter out the Nones.
  • Could implement more iterator stuff (like DoubleEndedIterator).
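The `get_envs` point above can be sketched as follows: the iterator yields `Option<&OsStr>` values, where `None` marks a variable that was explicitly removed, and a caller who only cares about what is set can filter those out. The helper name `explicitly_set` is made up for this example.

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Returns only the variables the command explicitly sets, dropping the
/// `None` entries that mark variables removed with `env_remove`.
fn explicitly_set(cmd: &Command) -> Vec<(&OsStr, &OsStr)> {
    cmd.get_envs()
        .filter_map(|(key, value)| value.map(|v| (key, v)))
        .collect()
}
```

A display routine that skips removed entries, as described in the bullet, could start from a helper like this one.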

Original issue below

The std::process::Command builder is useful for building up a Command in a cross-platform way. I've found that it would be useful to extract the name, args, environment, etc. of a Command after they have been set.

There are at least two places in the Rust compiler that would benefit from such an API. The authors have had to resort to wrappers instead of using Command directly.

https://github.com/rust-lang/rust/blob/master/src/tools/compiletest/src/runtest.rs#L1527

https://github.com/rust-lang/rust/blob/master/src/librustc_trans/back/command.rs

Related to #42200

Let's say I am OK to implement this; how should I proceed? Directly with a PR, or with an RFC first?

Seems reasonable. I would be interested in seeing this explored in a PR.

I am working on a crate that would directly benefit from this. Is anyone still working on this feature? If not, would someone mind mentoring me to open a PR?

Also, I have a tangential question not directly related to this issue, but one that might be raised as a result of introducing this feature: is there any way in the standard library to check whether a given std::process::Stdio is set to inherit, piped, or null? If we add the ability to extract these fields from the Command builder, I think it makes sense to add the ability to inspect the values at run-time as well.

This is now implemented and available on nightly. I have updated the original description with details.

I feel this would be generally useful for debugging Command processes

@rust-lang/libs Can we stabilize this? It seems like the remaining concerns listed would not require API break to change.

cc @rust-lang/libs

Team member @yaahc has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

Checking off sfackler's checkbox on the FCP since he left the libs team.

@rfcbot concern arg0

I'm concerned about the potential for confusion introduced by indexing args without accounting for arg0.

@rfcbot concern preprocessing

As discussed in today's @rust-lang/libs-api meeting, we're concerned about any platform-specific preprocessing that Command may do internally, and whether the data provided back via these APIs is preprocessed in some platform-specific way or needs to have that preprocessing reversed.

@rfcbot concern use-case

Based on discussion from today's @rust-lang/libs-api meeting:

Could we get some additional information about the use case of this? We understand the use case for setting these fields (e.g. "Change command to different path and run again", or "change one argument and run again", or "change one environment variable and run again"). Methods like spawn don't consume the Command, so it would be possible to use one Command to do several similar command invocations.

It's not as clear to us what the use case for getting these fields is, though, and a "get" API seems more complex than a "set" API (not least because of all the unresolved questions).

Regarding use case, the place where I initially wanted it was in rust analyzer where we were invoking cargo with a programmatically generated command line and I wanted to log invocations without maintaining a second copy of the invocation in parallel. I think we could use the Debug impl for this though as it only needs to be human readable.

I needed an API like this (specifically for environment variables) at least once when writing a patch for rustc bootstrap.

Rustc bootstrap, unsurprisingly, creates rather complicated Commands involving a lot of careful setting of environment variables. For example, it needs to add various folders to LD_LIBRARY_PATH. Various abstraction layers might have to each add their own folders there. The problem is, given a Command, there is no way to extend the LD_LIBRARY_PATH that was set here so far. The only thing you can do is overwrite it. This leads to a much more fragile API than I would like. See #85959 for the concrete case where that came up.

The other possible usecase (that however probably has other solutions) is debugging: {:?} on a Command will print the command called and the arguments passed, but not the env vars being set. If you want to also show the env vars, you need to separately track them yourself.

The issue description also contains links to 2 places inside rustc that seem to benefit from these getters.

Just to reiterate the above, the use case is usually to display the command to the user (like for verbose output or logging). Rustbuild currently uses the debug display (stream_cargo) which is unsatisfactory for several reasons (doesn't include environment variables, may not be copy-pasteable, etc.). That is one place I wanted to use this.

A real-world example is Cargo which has its own wrapper around Command called ProcessBuilder to work around the limitations of the Command API. It has a Display impl which uses get methods to format in a quasi-shell syntax that can be used in most shells. I wouldn't want Command to do that directly, but adding get methods provides the ability to do such a thing more easily without a wrapper.
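A minimal sketch of that kind of quasi-shell display built on the getters might look like the following. This is not Cargo's ProcessBuilder code: the function names `shell_quote` and `shell_display` are hypothetical, the quoting rule is a simplified one assumed for POSIX-style shells, and non-UTF-8 values are converted lossily.

```rust
use std::process::Command;

/// Wraps a value in single quotes unless it contains only characters that
/// are safe to leave bare. Simplified rule, assumed for this sketch.
fn shell_quote(s: &str) -> String {
    let safe = !s.is_empty()
        && s.chars()
            .all(|c| c.is_ascii_alphanumeric() || "-_/.,:".contains(c));
    if safe {
        s.to_string()
    } else {
        // Close the quote, emit an escaped quote, reopen the quote.
        format!("'{}'", s.replace('\'', "'\\''"))
    }
}

/// Formats a command as `KEY=value program arg1 arg2 ...`.
/// Variables removed with `env_remove` (value `None`) are skipped.
fn shell_display(cmd: &Command) -> String {
    let mut parts = Vec::new();
    for (key, value) in cmd.get_envs() {
        if let Some(value) = value {
            parts.push(format!(
                "{}={}",
                key.to_string_lossy(),
                shell_quote(&value.to_string_lossy())
            ));
        }
    }
    parts.push(shell_quote(&cmd.get_program().to_string_lossy()));
    for arg in cmd.get_args() {
        parts.push(shell_quote(&arg.to_string_lossy()));
    }
    parts.join(" ")
}
```

Because the caller owns the formatting, a tool can decide per verbosity level whether to include the environment at all.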

I'm personally not too concerned about preprocessing. I'm not aware of any of the impls that currently do anything like that. There is some concern about additional memory usage if it needs to keep multiple copies, but I don't consider that a high concern.

As noted in the original PR, if someone wants to add support for fetching arg0, that can be added to CommandExt. Custom arg0 usage is pretty rare, and doesn't fit well within the cross-platform model. If needed, I think the documentation could be extended to mention that arg0 is not included on unix platforms.

@RalfJung Doing a read-modify-write of LD_LIBRARY_PATH doesn't seem ideal, especially if it needs to be repeatedly re-parsed. I agree that that shouldn't be open-coded, which seems like the issue with #85959. But I feel like that use case would work just as well by having a wrapper on Command that separately tracks a Vec<PathBuf>, allows adding to that Vec, and overwrites the environment variable. That would factor out the modification code into one single place.
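The wrapper approach suggested here could be sketched roughly as below. `LibPathCommand` and `add_lib_path` are hypothetical names, and the variable name `LD_LIBRARY_PATH` is hard-coded only because it is the example under discussion.

```rust
use std::env;
use std::path::PathBuf;
use std::process::Command;

/// Hypothetical wrapper: owns the `Command` plus the list of library
/// directories, so the variable is regenerated from the list each time
/// and never re-parsed.
struct LibPathCommand {
    cmd: Command,
    lib_paths: Vec<PathBuf>,
}

impl LibPathCommand {
    fn new(program: &str) -> Self {
        LibPathCommand {
            cmd: Command::new(program),
            lib_paths: Vec::new(),
        }
    }

    /// Records one more directory and overwrites `LD_LIBRARY_PATH` with
    /// the full list, joined using the platform's path-list separator.
    fn add_lib_path(&mut self, dir: impl Into<PathBuf>) -> Result<(), env::JoinPathsError> {
        self.lib_paths.push(dir.into());
        match env::join_paths(&self.lib_paths) {
            Ok(joined) => {
                self.cmd.env("LD_LIBRARY_PATH", joined);
                Ok(())
            }
            Err(err) => {
                // Keep the list consistent with what was set on the command.
                self.lib_paths.pop();
                Err(err)
            }
        }
    }
}
```

The trade-off raised in the thread still applies: every abstraction layer has to pass this wrapper around instead of a plain `Command`.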

Also, I think it'd be entirely reasonable for the Debug impl on Command to print environment variables and anything else that has been set.

I think it'd be entirely reasonable for the Debug impl on Command to print environment variables and anything else that has been set.

I tested that out with bootstrap for a while, and the experience was pretty terrible. The amount of verbosity was overwhelming, and the tool usually wants more fine-grained control over what is exactly displayed (like no -v, -v, -vv). I personally would not like to see the Debug impl change until a long while after something like these getter methods are stabilized to give authors a chance to implement a better display. There is a separate issue #42200 for changing the Debug impl.

Hello! I had some use cases for this. In some instances it's to "fix" calls to older versions of a specific tool (in this case, CMake). Newer versions of CMake support passing a JSON file as a "preset" of settings, and an external tool can technically take these JSON files and turn them into specific settings for configuration, while users are unaware of the underlying settings.

There's also the ability to receive a Command from any other API and parse specific values out via something like Clap. Basically a serde-process if you will. This would allow me to, for example, get the cc::Tool::to_command call and use that to extract the compiler itself for setting -DCMAKE_CXX_COMPILER or -DCMAKE_C_COMPILER, etc, while also being able to extract the additional flags users have set to set the CMAKE_CXX_FLAGS_INIT or other values.

Additionally, with these options I can give users the option to pass a Command to my APIs instead of requiring that a manual Path, string, or some other value is passed and then requiring that I find it with the which crate or some other mechanism.

Lastly, right now I can deserialize information from, say, JSON into a Command (the tests array), but I cannot serialize said information back. This extraction API would make it much easier as I would not have to wrap all of Command to get this behavior.
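To make the serialization use case concrete, here is a sketch of a plain-data mirror of a Command that can be filled from the getters and turned back into a builder. `CommandSpec` and its methods are hypothetical names; a real version would presumably derive serde traits, which are left out here to keep the example free of external crates. The conversion is lossy for non-UTF-8 values and does not capture stdio settings or `env_clear`.

```rust
use std::path::PathBuf;
use std::process::Command;

/// Hypothetical plain-data snapshot of a `Command`.
#[derive(Debug, Clone, PartialEq)]
struct CommandSpec {
    program: String,
    args: Vec<String>,
    /// `None` means the variable is explicitly removed for the child.
    envs: Vec<(String, Option<String>)>,
    current_dir: Option<PathBuf>,
}

impl CommandSpec {
    /// Extracts the fields through the getter methods.
    fn from_command(cmd: &Command) -> Self {
        CommandSpec {
            program: cmd.get_program().to_string_lossy().into_owned(),
            args: cmd
                .get_args()
                .map(|a| a.to_string_lossy().into_owned())
                .collect(),
            envs: cmd
                .get_envs()
                .map(|(k, v)| {
                    (
                        k.to_string_lossy().into_owned(),
                        v.map(|v| v.to_string_lossy().into_owned()),
                    )
                })
                .collect(),
            current_dir: cmd.get_current_dir().map(|p| p.to_path_buf()),
        }
    }

    /// Rebuilds an equivalent `Command` from the snapshot.
    fn to_command(&self) -> Command {
        let mut cmd = Command::new(&self.program);
        cmd.args(&self.args);
        for (key, value) in &self.envs {
            match value {
                Some(value) => cmd.env(key, value),
                None => cmd.env_remove(key),
            };
        }
        if let Some(dir) = &self.current_dir {
            cmd.current_dir(dir);
        }
        cmd
    }
}
```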

Doing a read-modify-write of LD_LIBRARY_PATH doesn't seem ideal, especially if it needs to be repeatedly re-parsed. I agree that that shouldn't be open-coded, which seems like the issue with #85959. But I feel like that use case would work just as well by having a wrapper on Command that separately tracks a Vec<PathBuf>, allows adding to that Vec, and overwrites the environment variable. That would factor out the modification code into one single place.

No (re-)parsing is necessary; adding a path to LD_LIBRARY_PATH is as trivial as prepending <new_path>: to the existing path (possibly with a special case for when LD_LIBRARY_PATH was not set at all yet).
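With the getters, the prepend described here could be written roughly as follows. The helper name `prepend_env_path` is invented for this sketch, the `:` separator is Unix-specific, and only a value already set on the builder is considered; whatever the child would inherit from the parent environment is deliberately ignored here.

```rust
use std::ffi::{OsStr, OsString};
use std::process::Command;

/// Prepends `new_path` to the path-list variable `name` on `cmd`.
/// Uses `:` as the separator, so this sketch assumes a Unix platform.
fn prepend_env_path(cmd: &mut Command, name: &str, new_path: &OsStr) {
    // Copy out the value currently set on the builder, if any, so the
    // shared borrow of `cmd` ends before `cmd.env` is called.
    let existing: Option<OsString> = cmd
        .get_envs()
        .find(|(key, _)| *key == OsStr::new(name))
        .and_then(|(_, value)| value)
        .map(OsStr::to_os_string);

    let mut combined = new_path.to_os_string();
    // Special case: nothing set yet (or set to empty), so no separator.
    if let Some(old) = existing {
        if !old.is_empty() {
            combined.push(":");
            combined.push(old);
        }
    }
    cmd.env(name, combined);
}
```

Each abstraction layer can call this on a plain `Command` without knowing what earlier layers added.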

I don't think one should have to write a wrapper for this. This kind of usecase should be supported out-of-the-box.

I tested that out with bootstrap for a while, and the experience was pretty terrible. The amount of verbosity was overwhelming, and the tool usually wants more fine-grained control over what is exactly displayed (like no -v, -v, -vv). I personally would not like to see the Debug impl change until a long while after something like these getter methods are stabilized to give authors a chance to implement a better display. There is a separate issue #42200 for changing the Debug impl.

We can provide at least 2 levels of verbosity with {:?} and {:#?}... but yeah that is still not terribly flexible.

@rfcbot resolve use-case

@rfcbot resolve preprocessing

Am I right in thinking that Command was originally designed as being, essentially, a function call with optional arguments (or at least the Rust equivalent)? So its use as a container type is basically being retrofitted?

I've resolved two of the three concerns.

I also want to confirm, here: the only plan is to support get_args which iterates over the arguments, but not to have something like get_arg(index: usize)? Is there a planned API for changing a specific argument by index?

As long as there's no API that accepts a numeric "index", I'll resolve the arg0 concern as well. I just want to make sure that anything accepting an "index" uses 0 to mean arg0, and 1 to mean the first command-line argument.
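For callers who do want argv-style numbering, the convention described here (0 for the program slot, 1 for the first argument) can be built on top of the iterator. `argv_style` is a hypothetical helper, and it uses `get_program` for slot 0, which is only an approximation: a custom arg0 set through the Unix `CommandExt` is not visible through these getters.

```rust
use std::ffi::OsStr;
use std::iter;
use std::process::Command;

/// Numbers the program as index 0 and the arguments from index 1,
/// mirroring the usual argv layout. `get_args` on its own starts at the
/// first argument and does not include the program.
fn argv_style(cmd: &Command) -> Vec<(usize, &OsStr)> {
    iter::once(cmd.get_program())
        .chain(cmd.get_args())
        .enumerate()
        .collect()
}
```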

My understanding is that we don't want to support any of these APIs because that is not the point of Command. As @ChrisDenton said, it's a function call with optional arguments, not long-term data storage.

There is some desire to support an extended Debug impl that also prints environment variables for logging purposes, but I feel that this should be a separate issue.

As @ChrisDenton said, it's a function call with optional arguments, not long-term data storage.

In building that function call incrementally (which is a common use case for builder patterns), some amount of read access is required, as I laid out above. This has nothing to do with using Command as long-term storage.

@rfcbot resolve arg0

🔔 This is now entering its final comment period, as per the review above. 🔔

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

Does not offer a way to get an environment variable by name (get_env). I figure this can be added later if anyone really wants it. I think the motivation for this is weak, though. Also, the API could be a little awkward (return an Option<Option<&OsStr>>?).

Not that it affects this PR, but something like this would be fine I think:

enum ProcessEnvSetting<'a> {
  Inherit,
  Unset,
  Set(&'a OsStr),
}

impl<'a> ProcessEnvSetting<'a> {
  // Convenience method.
  fn value_in_child(&self) -> Option<Cow<'a, OsStr>> {
    // query current environment if `Inherit`
  }
}

env_clear could then set an internal flag to return Unset rather than Inherit for any variable not explicitly set.
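A self-contained sketch of how that could fit together is below. None of this is the standard library's implementation: `EnvConfig` is a hypothetical stand-in for the environment bookkeeping inside Command, and `value_in_child` takes an extra `name` parameter (an assumption of this sketch) because the `Inherit` case needs to know which variable to look up.

```rust
use std::borrow::Cow;
use std::collections::BTreeMap;
use std::env;
use std::ffi::{OsStr, OsString};

#[derive(Debug, PartialEq)]
enum ProcessEnvSetting<'a> {
    Inherit,
    Unset,
    Set(&'a OsStr),
}

impl<'a> ProcessEnvSetting<'a> {
    /// Convenience method. `name` is an added parameter for this sketch.
    fn value_in_child(&self, name: &OsStr) -> Option<Cow<'a, OsStr>> {
        match self {
            // Query the current environment if the variable is inherited.
            ProcessEnvSetting::Inherit => env::var_os(name).map(Cow::Owned),
            ProcessEnvSetting::Unset => None,
            ProcessEnvSetting::Set(value) => Some(Cow::Borrowed(*value)),
        }
    }
}

/// Hypothetical stand-in for the environment state kept by `Command`.
#[derive(Default)]
struct EnvConfig {
    cleared: bool,
    vars: BTreeMap<OsString, Option<OsString>>,
}

impl EnvConfig {
    fn set(&mut self, key: &str, value: &str) {
        self.vars.insert(key.into(), Some(value.into()));
    }

    fn remove(&mut self, key: &str) {
        self.vars.insert(key.into(), None);
    }

    /// Mirrors `env_clear`: forget explicit entries and raise the flag.
    fn clear(&mut self) {
        self.cleared = true;
        self.vars.clear();
    }

    /// Looks a variable up by name, avoiding `Option<Option<&OsStr>>`.
    fn get_env(&self, key: &str) -> ProcessEnvSetting<'_> {
        match self.vars.get(OsStr::new(key)) {
            Some(Some(value)) => ProcessEnvSetting::Set(value),
            Some(None) => ProcessEnvSetting::Unset,
            // Not mentioned explicitly: the flag decides the answer.
            None if self.cleared => ProcessEnvSetting::Unset,
            None => ProcessEnvSetting::Inherit,
        }
    }
}
```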

Triage: The feature has been stabilized by #88436, closing as complete.