fish-shell / fish-shell

The user-friendly command line shell.

Home Page: https://fishshell.com

associative arrays ("dictionaries") would be useful

maxfl opened this issue · comments

I thought I would implement this myself, but currently I have no time for it. So I am just writing down the idea in case somebody likes it and implements it, or in case I come back to it next year.

The idea is to extend the definition of variables so that strings can be used as access keys in addition to integers. One can think of them as std::map<string,string>, Python dictionaries, or Lua tables, which are the closest match. Dictionaries also exist in bash/zsh, though there they are distinct from ordinary variables.

Why is it useful?

The latest example I saw is in __terlar_git_prompt, where you need to get a color and symbol for named elements in a list:

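# The variable names have to be assembled as strings first, so the loop
# prints commands and pipes them to '.' (source) for evaluation.
for i in $fish_prompt_git_status_order
    if contains -- $i $gs
        echo 'set_color $fish_color_git_'$i
        echo 'echo -n $fish_prompt_git_status_'$i
    end
end | .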

Here $i iterates over the items in the list: added, modified, deleted, etc.

If one could use variables as dictionaries, this piece of code could be rewritten in the following way:

for i in $fish_prompt_git_status_order
    if contains -- $i $gs
        set_color $fish_color_git[$i]
        echo -n $fish_prompt_git_status[$i]
    end
end

To me this looks much nicer and simpler.

How not to break the ordinary variables?

That is actually very simple. The idea is almost the same as in lua tables (http://lua-users.org/wiki/TablesTutorial):

  • If the key is a number, it refers to the 'array' part of the variable, which is what variables currently are.
  • If the key is a string (starts with a letter, for example), it refers to the dictionary part.
  • The dictionary part should not affect any existing behaviour. "$variable" is expanded to the list in the array part. 'count $variable' will print the number of elements in the array part.
  • "set var " also works only with the array part.

The differences come with subscript handling:

  • 'set var[key] value' or 'set var[key1 key2] value1 value2' should be extended to handle string keys.
  • $var[key] or $var[$key] or $var[key1 key2] should return the values that were previously set.
  • There should also be functions for working with dictionaries, such as getting the list of keys for iteration, or getting the number of elements in the dictionary part.

set var[a b c] d e f
echo $var  #-><nothing>
count $var #->0
get --size var   #-> 3
get --keys var   #-> a b c 
get --values var #-> d e f
get --pairs var  #-> a d b e c f

Other issues

  • One should also update universal variables handling.
  • The dictionary part of a variable could not be exported.

That's it

Any ideas, thoughts, or criticism are appreciated.

Question: can the map value be a multi-element list, or is the value restricted to one element?

If no, then maps are actually inferior to the current environment maps, so not every use case of set prefix_$key $value can use maps instead, since $value may expand to multiple elements.

If yes, then get --pairs is broken, since there is no nested list structure in fish (yet). This is a relatively minor issue, as you can always use a for loop to iterate through all the keys. This also somewhat breaks getting/setting multiple keys at once: $var[a b] loses the boundary between $var[a] and $var[b] when both are lists, and the difference between set var[a] a b c and set var[a b c] a b c can be a bit confusing.

Trivia: there is a catch. Even when the answer is "yes", so that maps are not inferior to the current environment maps, maps still end up inferior to environment maps once they are implemented, since maps cannot contain maps, while environment maps can. OTOH, in a shell language, the trouble of nested data structures can be much larger than their benefits, and a non-nested data structure is good enough.

Originally the idea was to restrict it to one element list.
fish has one complex type (the array, which is what an environment variable is) that can contain elements of a basic type (strings). The idea is to extend the complex type while keeping it able to contain strings only. This requires minimal changes to the code.

The capabilities are of course fewer than those of environment maps, but even in this form they give a lot of advantages.
'set prefix_$key $value' is a bad example, because it is troublesome to get the value back by key: a simple 'echo $prefix_$key' doesn't work, because the key has to be expanded first. The solutions with 'eval' or with local variables are ugly.
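
For reference, the local-variable workaround looks roughly like this in current fish. This is only a sketch: the function name lookup_demo and the variables key, name and prefix_color are made up for illustration.

```fish
function lookup_demo
    set -l key color
    # Defines a variable named prefix_color holding two elements.
    set -l prefix_$key red green

    # 'echo $prefix_$key' expands $prefix_ and $key as two separate variables,
    # so build the full variable name first and dereference it with $$.
    set -l name prefix_$key
    echo $$name
end

lookup_demo   # prints: red green
```

The extra variable is what makes this pattern clumsy compared to a direct $prefix[$key] lookup.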

Allowing map elements to be lists creates a nesting problem and requires more planning. You start by thinking about adding a basic array type and end up with the idea that an environment variable is an array of arrays, which can give you a lot of benefits and a lot of trouble.

But your idea is good, though it breaks setting several map elements with set. The confusion can be resolved by specifying a rule: 'set var[a b c] a b c' sets one element per key or issues an error, and 'set var[a] a b c' sets a list for a key. The confusion about the boundary in '$var[a b]' is a minor issue.

Thank you for pointing this out.

I'd like to point out some limitations of maps:

  • Functions cannot take maps as arguments
  • With the list builtin in place (#445) you will be able to write list literals, but you still cannot write map literals. More generally, builtins and functions will be able to output lists, but not maps.

@xiaq, sure. But I can't think of it as a limitation, because maps were not intended to be passed as function arguments. A function can take an array of strings (and only one array), and there is no way to change that without breaking everything.

But a function can take the variable name and use it to get all the keys/values from it, if that's really needed.
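
Passing a variable by name already works for ordinary lists, so the same pattern would presumably carry over to maps. A minimal sketch with a hypothetical helper print_items; it assumes the variable is global or universal, since a function cannot see its caller's local variables.

```fish
# Hypothetical helper: receives the *name* of a variable, not its contents.
function print_items -a varname
    # $$varname dereferences the name to the list it refers to.
    for item in $$varname
        echo "item: $item"
    end
end

set -g colors red green blue
print_items colors
```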

What about special pieces of syntax, like $a[-1] or $a[1..3]? This proposal seems to be incompatible with such syntax.

@glitchmr, the point is to choose the right restrictions on key names. It depends on what will actually be implemented.

  1. If associative containers are distinct from the usual variables, then a different approach to '[]' expansion will do the job: ranges are allowed for arrays but disabled for containers. Container keys can contain any character. No clash with ranges is possible.

  2. If associative containers are implemented as an extension to arrays (like Lua tables), then key names must not start with '-', '..' or a digit. In this case the distinction between a container key and an array index/range makes the proposal compatible.

  3. A mixed mode is possible: the variable can be declared as a pure associative container with any key allowed, or as array+container with a restriction on the characters allowed in key names.
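
The naming rule in option 2 could be checked along these lines. This is a sketch only: subscript_kind is a hypothetical helper, and the regular expressions encode one possible reading of the rule (an index or range is made of digits, an optional leading '-', and '..'; a key starts with a letter or underscore).

```fish
# Hypothetical classifier for the text that appears between [ and ].
function subscript_kind -a sub
    if string match -q -r -- '^-?[0-9]+(\.\.-?[0-9]+)?$' $sub
        echo index     # plain index or range, such as -1 or 1..3
    else if string match -q -r -- '^[A-Za-z_]' $sub
        echo key       # dictionary key under the naming rule
    else
        echo invalid
    end
end

subscript_kind -1      # prints: index
subscript_kind 1..3    # prints: index
subscript_kind added   # prints: key
```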

This would also make the fish_user_abbreviations variable much nicer to use, IMO. A map makes much more sense than a list with key=value values in it. This also solves problems with duplicate keys, and makes it easier to add abbreviations without removing existing ones.

The fish_user_abbreviations "interface" is very much temporary. The current long term plan for abbreviations is something more akin to completions: opaque storage managed by a builtin.

Of all the issues targeted to "next-major" this would be my first pick for inclusion in fish 3.0. The actual implementation work should be straightforward. The hardest part is deciding on syntax and behavior.

I personally would like to have json as a first class citizen in a shell. With tools like jq to access, transform and pipe it around alongside. But I guess a debate about this would be too heated 😅.

But maybe the implementation could keep something like jsonpath in mind? The child operator would be the same as proposed above, I think []:
https://kubernetes.io/docs/user-guide/jsonpath/

@webwurst: I don't think the comparison with json fits here. This is about "dictionaries", i.e. "arrays with string keys". It's still not about nested data structures, which would be where jsonpath (or xquery or something) would become applicable.

Repeating my previous comment, I think this should be part of fish 3.0. Because if we don't do this for the 3.0 release it won't be possible before the 4.0 release -- which means we won't have dictionaries before 2019. I'd like people to start thinking about what this would mean for fish syntax and common patterns like for x in $var. Obviously we'll need to add flags to the set command but that's the easy part. Same for changing indexing (e.g., $var[1]) to allow non-integer keys.

Here's an example of a question we want to get right. What should the set syntax look like? Should it be set var key1 val1 key2 val2 or set var key1=val1 key2=val2? What does this mean for this pattern: set var $var new_val1 new_val2 (whatever we decide for specifying key/val pairs)? Perhaps we simply disallow that pattern for dictionaries and adding key/val pairs always extends the dictionary with some way (set --erase dict_var?) to put the dictionary in the empty state.

> set var key1 val1 key2 val2 or set var key1=val1 key2=val2?

IMO both are kind of hard to wrap your head around, but the latter seems better. It's too bad one couldn't just pass a dictionary literal as an argument to set.

@floam, What would a "dictionary literal" look like? Presumably you're talking about augmenting fish syntax so that dictionaries can be used in other contexts. Such as the Python dictionary literal syntax: tel = {'jack': 4098, 'sape': 4139}. How would that work in the context of fish? Do we need dictionary literals outside the context of the set command?

I find the Python syntax pretty ideal. I'd need to think about concrete examples of exactly which contexts one would use them in other places in a shell and what the ramifications might be.

For a shell context set var key1=val1 key2=val2 feels pretty natural, and would cause fewer complications, as set var key1 val1 key2 val2 is already used to declare a regular array.

@jlsjonas: And what happens if "key1" contains a "="?

@faho @jlsjonas The = approach might clash with possible values for variables already, say an option list for GCC or anything that supports --option=value syntax (or even more worryingly the --map-option=key=value syntax that I encountered once).

I think that telling set via a parameter that we actually want to treat the var as an associative array is more appropriate: set --dict / -D. Also, rather than having a delimiter for values, I think that values should be assigned lisp-style, or in strides of two: set -D dictVar key1 value1 key2 value2. This syntax has a number of advantages:

  1. No need for new parameter parsing logic, just assume that input follows that pattern. Any spaces in keys or values would already be handled by quoting them (e.g. set -D dictVar "key 1" "value 1" "key2" "value2").

  2. This makes it easier to procedurally assemble a dictionary from another command's output.

  3. Relating to number one, this doesn't require any special character escape handling for = and doesn't restrict possible values for the key name.

This doesn't clash with a command like set -D dictVar loneValue since it should not be possible to insert a free value into a dictionary, so uneven set -D directives should always fail.
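
The strides-of-two convention, including the rule that an uneven argument list fails, can be tried out on a plain list today. The function print_pairs below is a hypothetical stand-in for the pair handling that set -D would need; set -D itself does not exist.

```fish
# Hypothetical stand-in for 'set -D': walks a flat list in strides of two.
function print_pairs
    set -l n (count $argv)
    # An odd number of arguments means some key has no value.
    if test (math $n "%" 2) -ne 0
        echo "print_pairs: expected an even number of arguments" >&2
        return 1
    end
    test $n -gt 0; or return 0
    for i in (seq 1 2 $n)
        set -l j (math $i + 1)
        echo "$argv[$i] => $argv[$j]"
    end
end

print_pairs "key 1" "value 1" key2 value2
print_pairs loneValue   # fails with status 1
```

Quoting is all that is needed for keys or values containing spaces, which is the first advantage listed above.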

@RomanHargrave I like your thinking.

Expanding a dictionary var should result in just the keys. For example,

set -D mydict key1 val1 key2 val2
for key in $mydict
    echo "The key is |$key|"
    echo "The val is |$mydict[$key]|"
end

This way we don't have to introduce new syntax such as zsh's ${(k)mydict} to get the keys. Note that the default behavior in zsh is to replace the expansion with just the values.

I agree on the key expansion from a user experience perspective, although depending on how it's implemented, the overhead of the duplicate retrieval could be a pain. Optimizing that should be easy enough, though (you could literally just cache the result of the most recent enumeration in the internal dictionary object, as something like 99% of the time it's going to be the next reference).

@krader1961 what about for key in (keys $mydict) ...?

On another note, we should figure out how we want to handle variable scope (exporting) and storage (e.g. out-of-process/environment) for dictionaries:

Bash has associative arrays, but it has a lot of shortcomings. These variables are not stored in the environment, or even exported to child bash processes.

For fish, we have several options:

  1. We could do as bash does, and limit arrays to the session scope, and make them a "special case", which is my least favorite option.

  2. We could still treat arrays as a special case, and export them to other fish processes (children especially) using the universal variable store. This is a great option, but still could present issues for those who want to use dictionaries heavily in their scripts (multiple running scripts would interfere across sessions).

  3. We could encode the dictionary as plaintext and store it in the environment like a regular variable. This also means that other shells and processes could understand them.

Option 3 details

Luckily, there exists just the thing for this - delimiter characters.

Detail-wise, there are two ways we could go about this. The first is to store each pair as a single array element, with a unit separator (ASCII 31) between the key and the value. The second is similar to the first, with the same encoding scheme for pairs, but with a record separator (ASCII 30) between elements.

For reference, here's a pure fish implementation of an associative array using US-separated pairs stored as array elements: https://gist.github.com/c6330ab68bbb36268ba2720c2a01fdcb.
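
To make the first scheme concrete, here is a minimal sketch of a lookup over US-separated pairs stored as list elements. It is not the code from the linked gist; dict_get and the example paths are made up for illustration, and keys containing the separator byte are not handled.

```fish
# Hypothetical lookup: dict_get KEY PAIR..., where each PAIR is key<US>value.
function dict_get -a key
    set -l sep \x1f                # unit separator, ASCII 31
    set -e argv[1]                 # the remaining arguments are the pairs
    for pair in $argv
        # Split at the first separator only, so values may contain it.
        set -l kv (string split -m 1 -- $sep $pair)
        if test "$kv[1]" = "$key"
            echo $kv[2]
            return 0
        end
    end
    return 1
end

set -l sep \x1f
set -l paths "fish$sep/etc/fish" "nvim$sep/etc/nvim"
dict_get nvim $paths   # prints: /etc/nvim
```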

> what about for key in (keys $mydict) ...?

Note that expansion of $mydict happens before keys is executed. We could implement keys to return just odd numbered elements (assuming the usual one based indexing of fish vars) and also provide a vals command to return just even numbered elements. The problem with this is that anyone iterating over dicts has to use one or the other since we don't have anything like python tuples. This would be a major PITA. Better to just return keys when expanding the var. The user can then use the key to lookup the value.
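
A sketch of what such keys and vals commands could look like if a dictionary expanded to a flat key/value list. Both functions are hypothetical; they only illustrate the odd/even split described above.

```fish
# Hypothetical: print the odd-numbered elements (the keys), one per line.
function keys
    set -l n (count $argv)
    test $n -gt 0; or return 0
    for i in (seq 1 2 $n)
        echo $argv[$i]
    end
end

# Hypothetical: print the even-numbered elements (the values), one per line.
function vals
    set -l n (count $argv)
    test $n -gt 1; or return 0
    for i in (seq 2 2 $n)
        echo $argv[$i]
    end
end

set -l mydict key1 val1 key2 val2
keys $mydict   # prints key1 and key2
vals $mydict   # prints val1 and val2
```

Iterating with either command yields only half of each pair, which is the inconvenience the comment describes.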

Having $dict expand to key/value pairs would only be useful when doing set dict $dict new_key new_val. And it makes more sense to leverage the new --append flag I just introduced so you just write set -a dict new_key new_val.

> Bash has associative arrays, but it has a lot of shortcomings. These variables are not stored in the environment, or even exported to child bash processes.

This will be solved by serializing vars using google protobufs. See issue #3341.

> Luckily, there exists just the thing for this - delimiter characters.

That's what fish currently does. See the ARRAY_SEP and ENV_NULL symbols in the source. That approach has numerous problems. Not least of which is that it means you can't use those characters in the keys and values of a fish var without another level of abstraction.

> That's what fish currently does. See the ARRAY_SEP and ENV_NULL symbols in the source. That approach has numerous problems. Not least of which is that it means you can't use those characters in the keys and values of a fish var without another level of abstraction.

Those suggestions were more about passing arrays to other (potentially non-fish) processes. fish-to-fish is pretty much a non-issue.

> Those suggestions were more about passing arrays to other (potentially non-fish) processes.

You wrote 0x30 but clearly meant decimal 30 which is the 0x1E character fish currently uses: #define ARRAY_SEP_STR L"\x1e". That approach is fragile and AFAIK no non-fish processes have ever taken advantage of our current encoding. Obviously I could be wrong about that since I've only been using fish for just shy of two years. But it seems highly unlikely any other programs are correctly decoding fish vars that consist of more than one value. Also, if we want to make that easy we should be exporting our vars using a robust encoding such as google protobufs or JSON. See issue #3341.

> You wrote 0x30

Removed the leading 0x.

> Also, if we want to make that easy we should be exporting our vars using a robust encoding such as google protobufs or JSON.

My concern is primarily that fish transmits data using the standard method (the environment), even with its shortcomings. Using a standard encoding like Protobufs or JSON as armor with the environment is a great idea, and would certainly be a good solution.

> My concern is primarily that fish transmits data using the standard method ...

The main problem with our current encoding with regard to pushing vars into the environment is the same problem afflicting most CSV (comma-separated values) encodings. It doesn't allow for the magic separator value to be unambiguously included in the encoded data as data rather than a separator character. This affects not just sending vars from fish to non-fish programs but also fish to fish scripts. The mechanism you linked to, @RomanHargrave, is fundamentally flawed. It requires that the magic characters not be allowed in the data being exported/saved.
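
The ambiguity is easy to reproduce with a plain join and split. The sketch below uses the record separator \x1e purely as an example byte and does not reproduce fish's internal export code; roundtrip_demo is a made-up name.

```fish
function roundtrip_demo
    set -l sep \x1e
    # Two elements; the first one happens to contain the separator byte.
    set -l original "a"$sep"b" c
    # Encode by joining with the separator, then decode by splitting on it.
    set -l encoded (string join -- $sep $original)
    set -l decoded (string split -- $sep $encoded)
    echo (count $original) (count $decoded)
end

roundtrip_demo   # prints: 2 3
```

The list goes in with two elements and comes back with three, because the separator inside the data cannot be told apart from a real boundary.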

@RomanHargrave, The only sane way to export fish vars that represent lists or dicts is a robust encoding like JSON or google protobufs. Simplistic encodings like you linked to are no better than what we currently do.

Like I said, @krader1961, Protobufs or JSON as armor with the environment is a great idea, and would certainly be a good (though I should have said better) solution.

What's the alternative right now? Is there any way to recreate some kind of dictionary?

set fish 'fish' ~/.config/fish
set nvim 'nvim' ~/.config/nvim
set paths $fish $nvim

Would like to map a path to a name.

Never mind, after a couple of searches I eventually found @faho's answer

and ended up doing:

set keys 'fish' 'nvim'
set paths ~/.config/fish ~/.config/nvim

for key in $keys
  if set -l index (contains -i -- $key $keys)
    some_function $key $paths[$index]
  end
end

IMO this is a very complex feature and out of scope for fish 3.0.

Is there any plan to implement this feature?

Nobody is working on it, as far as I know.

i think it would be cool to implement dictionaries as named scopes which you can explicitly enter or leave.

while thinking about this i came up with another solution until we get a built-in: use functions with method-injection. (if you want modifiable dicts use a simple code-generator to rebuild the function)
very simple example:

function dict
    set -l -- foo my dict value
    $argv
end
function get --no-scope-shadowing -a __key
    echo $$__key
end
dict get foo  # my dict value

  • key-getters should exclude argv, (optional: _flag_*) and __* so you can use vars in methods without having to pass an exclude-list later on, simply by prefixing them with "__"
  • value-getters should always null-join values. this way multiline strings work and getting values always follows the same pattern: the returned value has to be null-split.
  • escape the values when building a definition for sourcing
  • get the dictionary name by looking at functions in the stack trace.
  • as for passing the dict to other programs, you could export the items and a variable containing the key names...

In the ye olde TODO list from fish 1.x: how to export keys and values was pretty simple: don't.

  • Map variables. (export only the values. When expanding with no key specified, expand to all values.)

Any news on this? It's very strange to me that fish doesn't have this, it's a regression from Bash

I have given up and have just defined an external command to represent a dictionary using an association list. I'm not fluent in fish shell language, so I wound up writing it in Lua. It's kind of a pity to have to fork a process for every dictionary operation, but forking a process isn't the big expense it was in my youth. And this approach could easily be ported into a fish function.

Why is it off topic

@JulesGM Because it's not moving the topic forward. Please stop, this isn't helpful.

If there really is an assoc array, we can make an OO module.

However, I think it would be feasible to mimic the assoc array behavior using PHPish "Variable Variables". Implement it in pure fish and then put it as a shell feature or extension in master.

Here's a proposal for association lists:

# create a new association list (hint: it's just a list)
set assoc key1 val1 key2 val2

# extend the for loop syntax to take two values at a time
for key val in $assoc
    echo -n "$key=$val;" # prints key1=val1;key2=val2;
end

# new index syntax [@key] finds first key match, then returns next value
echo $assoc[@key2] # prints val2

# use set commands to modify assoc lists
set assoc[@key2] "new value"     # change val2 to "new value"
set assoc[@new key] val          # add "new key" and val to assoc list
set -a assoc key3 val3 key4 val4 # combine lists
set -e assoc[@key1]              # remove key1 and val1 from assoc list

# add flags to other functions
contains --key key3 $assoc   # succeeds
contains --value nope $assoc # fails
count --pairs $assoc         # prints 4

These lack type safety which makes them more error prone (duplicate keys etc) but I think in the context of shell programming, reusing lists like this is the simplest approach. We can naturally represent and print association lists, keys and values can contain any chars including spaces and numbers, and they don't introduce breaking changes.
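
The lookup part of this proposal can be approximated in current fish with a helper function. A sketch under assumptions: assoc_get is a hypothetical name, and it mimics the proposed [@key] rule of finding the first matching key and returning the element after it.

```fish
# Hypothetical emulation of $assoc[@key]: assoc_get KEY LIST...
function assoc_get -a key
    set -e argv[1]                 # the remaining arguments are the flat list
    set -l n (count $argv)
    test $n -gt 0; or return 1
    # Keys sit at the odd positions, values at the even ones.
    for i in (seq 1 2 $n)
        if test "$argv[$i]" = "$key"
            set -l j (math $i + 1)
            echo $argv[$j]
            return 0
        end
    end
    return 1
end

set -l assoc key1 val1 "new key" val key2 val2
assoc_get key2 $assoc        # prints: val2
assoc_get "new key" $assoc   # prints: val
math (count $assoc) / 2      # number of pairs, prints: 3
```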

nice, but i believe count should not accept any args, because otherwise it would also have to accept a '--' to escape flags etc. there could be a new command, or if you know it's an assoc list, just divide by two...