murarth / gumdrop

Rust option parser with custom derive support

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Documentation should mention how to accept filenames as arguments.

vi opened this issue · comments

A filename or path is not a String, but OsString. I don't see any mention of OS strings in crate-level documentation for gumdrop.

Also I see parse(try_from_str and parse(from_str, but I don't see the exhausive list of what I can put into parse(...). Is there from_os_str?

gumdrop does not currently accept OsString values at all. The Parser structure accepts only values implementing AsRef<str>.

This is because the parser must check whether each string begins with the substring - or -- in order to report short and long options to the caller. OsStr does not currently provide a platform-independent method equivalent to str::starts_with. Nor does it allow iterating over the characters of a string (which is required for processing short option strings).

It may be possible to implement an internal gumdrop API that implements these operations atop the platform-specific Unix and Windows OsStr extensions, but I have not yet explored this possibility.

starts_with

Just view OsStr as byte slice, then use something like starts_with(b"--")? Should work, at least on Unix. I also found this (but haven't checked).

does not currently accept OsString values at all

It should be documented then, with specific mention about filenames. What happens if one specifies a non-UTF8 argument?

It would not be possible to implement this feature only on Unix, as it requires changing the public API to accept argument values as T: AsRef<OsStr> instead of T: AsRef<str>. Making such a change on only one platform would be messy and not a very good idea. However, it may be possible to also implement this feature on Windows. I will investigate this soon.

As to your last question, the functions parse_args_or_exit and parse_args_default_or_exit call std::env::args to the get the list of arguments as String values. That function will panic if any arguments cannot be stored in a String. I have just pushed a commit that adds this documentation.

In any case the limitation should be mentioned in the docs for current version. For example, example code blocks may include #[options(short="o")] output_file: Option<String>, with a notice in the comment about non-Unicode pitfall.

Bugs in handling of non-unicode filenames are easy to miss.

What's the concrete issue here? At least on recent Rust versions, I'm able to parse arguments to e.g. PathBuf. Perhaps this is facilitated by some recently added trait impls?

Internally, gumdrop processes arguments using str rather than OsStr, meaning that non-Unicode filenames given as arguments can cause problems. When you're parsing the arguments to PathBuf or whatever, it's actually just converting &str/String to PathBuf rather than &OsStr/OsString, so it will still break with non-Unicode paths.

All of the above assumes nothing has changed in the past few months (I'm pretty sure nothing has).

Internally, gumdrop processes arguments using str rather than OsStr

...but as @murarth already pointed out, the arguments come in as String, not OsStr(ing).

Can you give concrete, minified examples (outside of gumdrop, using std alone) of:

  1. How one of these breakages might occur
  2. How to avoid the issue

?

You can avoid the issue by using OsString rather than String. You can use std::env::args_os() to get an iterator returning the arguments as OsStrings (or &OsStrs? I can’t remember which). This crate uses std::env::args() if it’s getting Strings (as mentioned by @murarth).

You can easily break something that uses String for argument parsing by creating a file using garbage non-Unicode characters in the path and trying to use the path as an argument to the program.

If you want a concrete example that doesn’t use gumdrop, just find any program using the getopts crate and give it a non-Unicode path.

You can avoid the issue by using OsString rather than String.

Wouldn’t that have an analogous problem in that it would break parsing of Unicode String arguments?

Do you actually need to know that the argument is Unicode most of the time? If it’s a path that you use to open a file, you likely don’t. If it’s a command-line option, you can just use the OsStrExt or whatever methods to check the string. If you really need Unicode for whatever reason for one string, you can either do a lossy conversion or a checked conversion to String (Cow<‘_, str> for the lossy conversion).

The benefit of OsString is that you have the ability to handle everything (although it may be more effort), whereas String will just not work in certain cases.

clap-rs/clap#1524 might be of interest. I made an OsStr wrapper with string-like methods that tries to do-the-right-thing, and have had a go at replacing clap's existing OsStr handling with it.

For the record, this is the number-one reason why I'm not even considering switching from structopt to gumdrop to cut down the size of the template I use for writing little CLI utilities.

I have mojibake'd paths on my drives that would panic something that assumes arguments are valid UTF-8.

In fact, they revealed that Serde's PathBuf serialization fails unexpectedly if you try to output not-valid-UTF8 paths to JSON. My workaround for that was to come up with a newtype implementing an escaping scheme. (POSIX paths may contain any sequence of bytes except binary null, and spec-compliant JSON parsers are required to support strings containing binary nulls, so, by using binary null as an escape character that means "reinterpret the next codepoint as a byte", the 99+% of paths that are String-able pass through the escaping process unchanged.)

(The same project also discovered that Serde panics on attempting to serialize the pre-epoch modification timestamps from an old floppy disk with mildly corrupted filesystem metadata.)