Suggestion: Decide on a proper grammar before implementing a fully fledged parser


tjgurwara99 opened this issue 3 years ago · comments

Very cool project!!

I was thinking about this yesterday, and I think that before writing a concrete parser there should be a formal specification (tokens and grammar) of what kind of operators it should handle. Otherwise it will become technical debt and the parser will start to get messy.

Right off the bat, I think you should consider what operators should and shouldn't be legal in this shell. Once that is decided, then the grammatical (or syntactical) structure can be defined.

These are a few tokens that you may want to consider. Depending on your idea, you might want a simple shell, in which case most of them can be ignored. But if you're aiming for a fully fledged shell, then they might be useful to think about beforehand.

const (
	PIPE = iota
	DOT
	COMMA
	COLON
	SEMICOLON
	LPAREN
	RPAREN
	LBRACK
	RBRACK
	LBRACE
	RBRACE
	EQUALS
	PLUS
	MINUS
	MULT
	DIV
	MOD
	NOT
	AND
	OR
	XOR
	LSHIFT
	RSHIFT
	LOR
	LAND
	LNOT
	LT
	GT
	LE
	GE
	EQ
	NE
	INC
	DEC
	ASSIGN
	ADD_ASSIGN
	SUB_ASSIGN
	MUL_ASSIGN
	DIV_ASSIGN
	MOD_ASSIGN
	AND_ASSIGN
	OR_ASSIGN
	XOR_ASSIGN
	IF_STMT
	ELSE_STMT
	FOR_STMT
	WHILE_STMT
	BREAK_STMT
	CONTINUE_STMT
	RETURN_STMT
	FUNC_STMT
	BLOCK_STMT
	VAR_STMT
	ESCAPE // for escaping special tokens (the ones defined here)
)

Once the grammar is fixed, you would need to write a lexer that recognizes each of these tokens. Go is great when it comes to writing your own lexer! Anyways, if you need any help, let me know 😄 I'd love to help out with this one. It will be a very good learning experience for me 😄

Rak Laptudirm · Answer 1 · Wed Nov 10 2021 12:50:38 GMT+0800 (China Standard Time)

Thank you for being interested in mash!

So first of all, the package name is wrong; it is not a parser. The thing that is in the parser package is actually a lexer. I noticed it but was too lazy to change it (typical me).

The current syntax the lexer assumes is very simple, so it does not require a parser. It just breaks the command into "words": the first word is the command and the rest are the arguments.
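For readers who want to picture that behaviour, here is a minimal Go sketch of "split into words, first word is the command". The Command type and the splitCommand name are made up for illustration and are not taken from the mash source; the real lexer may well differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Command is a hypothetical container for one parsed line; the name is
// illustrative and is not taken from mash's source.
type Command struct {
	Name string
	Args []string
}

// splitCommand breaks a line into whitespace-separated words. The first word
// becomes the command name and the remaining words become its arguments.
func splitCommand(line string) (Command, bool) {
	words := strings.Fields(line)
	if len(words) == 0 {
		return Command{}, false // blank line: nothing to run
	}
	return Command{Name: words[0], Args: words[1:]}, true
}

func main() {
	cmd, ok := splitCommand("git commit -m fix")
	fmt.Println(ok, cmd.Name, cmd.Args)
}
```

A splitter like this has no notion of quoting or operators, which is the limitation the rest of the thread tries to address.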

Anyways, the current implementation is bad. It works, but it is hard to add anything new. Since the whole lexer is hidden behind its exposed API, it should not be hard to change the implementation. The problem is that I am currently trying to decide on the syntax. There are many things, for example, that I would like to take from bash but whose syntax I do not like. Once that is decided, a lexer should be easy, and a parser design will also be created.

I also want to put as much of the functionality as possible into the builtin commands, so that it remains more of a shell than a programming language. I would of course want to make sure that it can do all that a regular programming language can, maybe a bit less.

Currently, I am trying to stay on commands only (I will look at other types of statements later). I have been using this Stack Overflow question as a guide, and am trying to turn those operators into a more modern-looking syntax while keeping them easy to use as operators. If you have a syntax in mind, please let me know.

Here is the notes file I have been scribbling my thoughts in:

mash syntax:
1. Commands with args
2. Redirection operators [both stdout and stderr]
3. Logical operators

Logical operators:
1.  [command] && [command]
2.  [command] || [command]
3. ![command]

Redirection operators:
1. [command] [i[o]e]>  [stream]
2. [command] [i[o]e]>> [stream]
3. [command] [i[o]e]|  [command]
4. [command] <         [stream]

Streams:
Stdin:  i
Stdout: o
Stderr: e

Taj · Answer 2 · Wed Nov 10 2021 20:53:16 GMT+0800 (China Standard Time)

Ah cool, I think the operations you pointed out are enough to start with.

mash syntax:
1. Commands with args
2. Redirection operators [both stdout and stderr]
3. Logical operators

Logical operators:
1.  [command] && [command]
2.  [command] || [command]
3. ![command]

Redirection operators:
1. [command] [i[o]e]>  [stream]
2. [command] [i[o]e]>> [stream]
3. [command] [i[o]e]|  [command]
4. [command] <         [stream]

Streams:
Stdin:  i
Stdout: o
Stderr: e

If you'd like, I can start working on the lexer, so you can see whether you like this kind of approach, because it's quite a bit different from other approaches (e.g. Lex, the tool, and regular expressions). Once the lexer is done, we can work on the parser 😄. The benefit of my approach is that you can do lexing and parsing concurrently. I will push something in the next few days, if you're okay with me working on it?

Rak Laptudirm · Answer 3 · Wed Nov 10 2021 21:14:21 GMT+0800 (China Standard Time)

You can create a new branch on your fork and then open a PR here. It is open source for a reason :)

Here is a grammar specification. I have described it in plain English:

A command is composed of words, single quoted strings, and double
quoted strings, with white-space separating them (not required if two
different types are side by side). The strings will be parsed by
strconv.Unquote(). Normal words will be kept raw. Any white-space which
is not inside a string is discarded and acts as a separator.
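A rough Go sketch of that description follows. The function names splitArgs and closingQuote are invented for illustration. One assumption is labeled in the code: strconv.Unquote treats single-quoted input as a Go character literal, so it rejects something like 'abc'; the sketch therefore passes only double-quoted strings through strconv.Unquote and keeps single-quoted text literally. That choice is a guess at the intent, not something the specification above states.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitArgs scans a command line into words and quoted strings.
// Whitespace outside quotes separates items and is discarded.
func splitArgs(line string) ([]string, error) {
	var args []string
	i := 0
	for i < len(line) {
		c := line[i]
		switch {
		case c == ' ' || c == '\t':
			i++ // separator: skip it
		case c == '"' || c == '\'':
			end := closingQuote(line, i)
			if end < 0 {
				return nil, fmt.Errorf("unterminated string starting at byte %d", i)
			}
			raw := line[i : end+1]
			if c == '"' {
				s, err := strconv.Unquote(raw)
				if err != nil {
					return nil, err
				}
				args = append(args, s)
			} else {
				// Assumption: single-quoted text is kept literally, because
				// strconv.Unquote only accepts one character between ' '.
				args = append(args, raw[1:len(raw)-1])
			}
			i = end + 1
		default:
			// A word runs until whitespace or the start of a string.
			start := i
			for i < len(line) && !strings.ContainsRune(" \t\"'", rune(line[i])) {
				i++
			}
			args = append(args, line[start:i])
		}
	}
	return args, nil
}

// closingQuote returns the index of the quote that matches line[start],
// skipping backslash-escaped bytes, or -1 if the string never closes.
func closingQuote(line string, start int) int {
	q := line[start]
	for i := start + 1; i < len(line); i++ {
		switch line[i] {
		case '\\':
			i++ // skip the escaped byte
		case q:
			return i
		}
	}
	return -1
}

func main() {
	args, err := splitArgs(`echo "hello\tworld" 'raw text' plain`)
	fmt.Printf("%q %v\n", args, err)
}
```

Note that a word placed directly next to a string (no white-space between them) still yields two separate items, matching the "not required if two different types are side by side" remark.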

Operators are defined in the above comment by me.

If you need any help, feel free to ask me (though I think you won't).

Rak Laptudirm · Answer 4 · Thu Nov 11 2021 12:29:03 GMT+0800 (China Standard Time)

@tjgurwara99 Also, what is your opinion on the operator syntax I have created? Any suggestions would be greatly appreciated.

Taj · Answer 5 · Thu Nov 11 2021 17:29:10 GMT+0800 (China Standard Time)

Everything seems to be fine - nothing too drastic IMO, but I think there is an issue with the redirection operators. The redirect operators > and < write to or read from a file or stream (any io.Writer or io.Reader), but a pipe like command_1 | command_2 is essentially command_1 > tmp_file && command_2 < tmp_file, so I don't think it makes sense to have [i[o]e] with pipe. Unless you were thinking of it in some different way, in which case could you elaborate on what your thinking for pipe was?

Rak Laptudirm · Answer 6 · Thu Nov 11 2021 21:21:55 GMT+0800 (China Standard Time)

The i should not be there, that is true. But I thought they should be able to configure whether they want to pipe the Stdout, Stderr, or both.

Taj · Answer 7 · Thu Nov 11 2021 23:56:38 GMT+0800 (China Standard Time)

Is there a particular use case that you have in mind? I can't think of why you would need to pipe stderr.

Not to say that this would be difficult, but rather it is a question of design choice. I think it is worth considering which things are just syntactic sugar and which are actual features. We can easily work on the syntactic sugar later, but the features need to be robust so that there is no technical debt and sugar can be added without issues.

Rak Laptudirm · Answer 8 · Fri Nov 12 2021 00:10:20 GMT+0800 (China Standard Time)

Yes we can work on it later. The reason to pipe Stderr is simply to maintain a regularity with the other operators and just because we can. There can be specific areas where it can be useful, but actually it is more due to the above reasons.

Taj · Answer 9 · Wed Nov 17 2021 04:01:28 GMT+0800 (China Standard Time)

A question came to me today about this. Do we need the lexer to be public at all? We can do the whole lexing and parsing in the same package and the lexer doesn't even have to be public. What do you think?

Rak Laptudirm · Answer 10 · Wed Nov 17 2021 16:42:37 GMT+0800 (China Standard Time)

I don't think the lexer needs to be public. You can combine them into the same package.

Rak Laptudirm · Answer 11 · Mon Jan 24 2022 17:48:49 GMT+0800 (China Standard Time)

Suggestion for the Lexer

The Lexer will return the following types of tokens:

WORD
DOUBLE_QUOTED
SINGLE_QUOTED
EOF

The parser will extract relevant information from them depending on the context.

Lexing details

All tokens will be emitted at runs of white-space, except inside strings.
A semicolon will be emitted at each newline, whose preceding token was a string.
A semicolon will be inserted if the preceding word is a valid identifier.
A semicolon will be inserted if the preceding word is one of ), }, or ].
An EOF token will be emitted at the end of the file.

Definitions

string: a token of type SINGLE_QUOTED or DOUBLE_QUOTED
word: a token of type WORD
identifier: a word which matches [_a-zA-Z][_a-zA-Z0-9]*
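The semicolon rules above can be sketched as a single predicate in Go. The Token struct and the needsSemicolon name are hypothetical; only the token type names and the identifier pattern come from the suggestion itself.

```go
package main

import (
	"fmt"
	"regexp"
)

// TokenType enumerates the token kinds named in the suggestion.
type TokenType int

const (
	WORD TokenType = iota
	DOUBLE_QUOTED
	SINGLE_QUOTED
	EOF
)

// Token is a hypothetical token representation for this sketch.
type Token struct {
	Type TokenType
	Text string
}

// identifier is the pattern from the Definitions list, anchored at both ends.
var identifier = regexp.MustCompile(`^[_a-zA-Z][_a-zA-Z0-9]*$`)

// needsSemicolon reports whether a newline that directly follows prev
// should make the lexer emit a semicolon.
func needsSemicolon(prev Token) bool {
	switch prev.Type {
	case DOUBLE_QUOTED, SINGLE_QUOTED:
		return true // rule: preceding token was a string
	case WORD:
		switch prev.Text {
		case ")", "}", "]":
			return true // rule: preceding word closes a bracket
		}
		return identifier.MatchString(prev.Text) // rule: valid identifier
	}
	return false
}

func main() {
	for _, t := range []Token{{WORD, "echo"}, {WORD, "&&"}, {WORD, "}"}, {DOUBLE_QUOTED, "hi"}} {
		fmt.Printf("%q -> %v\n", t.Text, needsSemicolon(t))
	}
}
```

With these rules a line ending in an operator such as && continues onto the next line, while a line ending in a word like --force gets no semicolon either, which is one of the cases discussed later in the thread.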

Rak Laptudirm · Answer 12 · Mon Jan 24 2022 18:38:00 GMT+0800 (China Standard Time)

Here is the current parser grammar:

(* the entire mash script *)
program = { statement } EOF ;

(* a statement is an effector followed by a semicolon *)
statement = [single | block] ";" ;

(* a block statement is a list of statements *)
block = "{" { statement } "}" ;
(* a single is a non block statement *)
single = condition | loop | command ;

(* condition statement is the if-elif-else statements *)
condition = "if" expression block { "elif" expression block } [ "else" block ] ;
(* loop is the for loop in c-type languages *)
loop = "for" expression block ;

(* command is a builtin or executable *)
command = string { string } ;

string = WORD | SINGLE_QUOTED | DOUBLE_QUOTED ;
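As a sketch of how these productions might map onto a recursive-descent parser in Go, the following covers only program, statement, block and command. It leaves out condition and loop because expression is not defined yet, it assumes a block holds zero or more statements (as the block comment says), and it works on a pre-split token slice where "" stands in for EOF. The Node and parser types are invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical AST node: a command has Words, a block has Body.
type Node struct {
	Words []string
	Body  []*Node
}

// parser walks a pre-split token slice; "" stands in for EOF.
type parser struct {
	toks []string
	pos  int
}

func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

func (p *parser) next() string {
	t := p.peek()
	p.pos++
	return t
}

// program = { statement } EOF ;
func (p *parser) program() ([]*Node, error) {
	var out []*Node
	for p.peek() != "" {
		n, err := p.statement()
		if err != nil {
			return nil, err
		}
		out = append(out, n)
	}
	return out, nil
}

// statement = [ command | block ] ";" ;
func (p *parser) statement() (*Node, error) {
	var n *Node
	var err error
	if p.peek() == "{" {
		n, err = p.block()
	} else {
		n = p.command() // an empty command models the bare ";" case
	}
	if err != nil {
		return nil, err
	}
	if p.next() != ";" {
		return nil, errors.New("expected ';' after statement")
	}
	return n, nil
}

// block = "{" { statement } "}" ;
func (p *parser) block() (*Node, error) {
	p.next() // consume "{"
	b := &Node{}
	for p.peek() != "}" {
		if p.peek() == "" {
			return nil, errors.New("unterminated block")
		}
		n, err := p.statement()
		if err != nil {
			return nil, err
		}
		b.Body = append(b.Body, n)
	}
	p.next() // consume "}"
	return b, nil
}

// command = string { string } ;
func (p *parser) command() *Node {
	n := &Node{}
	for t := p.peek(); t != "" && t != ";" && t != "{" && t != "}"; t = p.peek() {
		n.Words = append(n.Words, p.next())
	}
	return n
}

func main() {
	p := &parser{toks: []string{"echo", "hi", ";", "{", "ls", "-l", ";", "}", ";"}}
	prog, err := p.program()
	fmt.Println(len(prog), err)
}
```

Each production becomes one method, which is what makes it easy to check the code against the grammar line by line.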

Rak Laptudirm · Answer 13 · Mon Jan 24 2022 18:40:25 GMT+0800 (China Standard Time)

If you are not familiar with Extended Backus-Naur form, I would suggest the Table of Symbols on this Wikipedia page.

Taj · Answer 14 · Mon Jan 24 2022 18:41:34 GMT+0800 (China Standard Time)

Regarding the parser's grammar, I think you also need to consider the order of pipes and redirects.

PS: I'm familiar with EBN form

Rak Laptudirm · Answer 15 · Mon Jan 24 2022 18:41:35 GMT+0800 (China Standard Time)

I am also updating the gists in which I previously wrote the grammar to use this one.

Rak Laptudirm · Answer 16 · Mon Jan 24 2022 18:44:44 GMT+0800 (China Standard Time)

Regarding the parser's grammar, I think you also need to consider the order of pipes and redirects.

PS: I'm familiar with EBN form

Yes, the productions for the grammar and expressions still need to be added. I just wanted to make sure we are on the same page about the operators. I posted this once before, but these were the suggested operators for commands:

mash syntax:
1. Commands with args
2. Redirection operators [both stdout and stderr]
3. Logical operators

Logical operators:
1.  [command] && [command]
2.  [command] || [command]
3. ![command]

Redirection operators:
1. [command] [b | e]>[c]  [stream]
2. [command] [b | e]>>[c] [stream]
3. [command] [b | e]|     [command]
4. [command] <[c]         [stream]

Streams:
Stdin:  i
Stdout: o
Stderr: e

Any suggestions to improve it?

Taj · Answer 17 · Mon Jan 24 2022 18:50:19 GMT+0800 (China Standard Time)

I think the operators are good, they seem to be different but not difficult to work with. The only thing I was concerned about was to have it in Backus-Naur form. For example, currently bash has this BN form (if I remember correctly)

cmd [arg]* [ | cmd [arg]* ]* [ [> filename] [< filename] [ >& filename] [>> filename] [>>& filename] ]* [&]

If you can think of the order of operations and priority that would be infinitely more useful than having a proper grammar - because grammar can be extended later based on the order and priority of operations (think BIDMAS in maths). Does that make sense?

Rak Laptudirm · Answer 18 · Mon Jan 24 2022 18:52:03 GMT+0800 (China Standard Time)

If you can think of the order of operations and priority that would be infinitely more useful than having a proper grammar - because grammar can be extended later based on the order and priority of operations (think BIDMAS in maths). Does that make sense?

I am not sure what you tried to say there, could you elaborate?

Taj · Answer 19 · Mon Jan 24 2022 18:59:16 GMT+0800 (China Standard Time)

Sure, I'll try to construct an example. But I'm at work right now so will have to explain it to you in the evening (in about 8 hours) 😓

Taj · Answer 20 · Mon Jan 24 2022 19:07:32 GMT+0800 (China Standard Time)

In the meantime, could you try to write the grammar in Backus-Naur form (not extended BN form)?

Rak Laptudirm · Answer 21 · Mon Jan 24 2022 19:26:02 GMT+0800 (China Standard Time)

I am actually much more familiar with EBNF, and I also know how to convert each of its operators into code. If it is better to use BNF, I will try to learn it though.

Taj · Answer 22 · Tue Jan 25 2022 06:13:22 GMT+0800 (China Standard Time)

The benefit of BNF is that it forces you to think about the order in which operations are legal. The EBNF is equivalent (in fact better for certain complex languages) but I think your EBNF is not complete.

As mentioned earlier, the BNF of shell is something like this

cmd [arg]* [ | cmd [arg]* ]* [ [> filename] [< filename] [ >& filename] [>> filename] [>>& filename] ]* [&]

The [...]* means zero or more can be present and [...] means zero or one. So a simple command will be something like

You can note from this that:

| filename is illegal.
cmd | cmd2 is legal.
The & at the end means that the commands on the left and right would be run asynchronously and can be present without the right hand side command.
a < b < c is illegal - can't have two inputs on the same line. In your current syntax on the linked gist it's possible to have more than one input, I think, but should it really be?
a < is also illegal - can't have no file redirection on input.
Also forces the whitespace characters around the redirections (although that's my personal taste and not the language spec I think)

etc etc.
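A few of those rules can be turned into a small checker, which also shows the kind of test cases a precise grammar makes possible. This is only a sketch in Go under simplifying assumptions: operators must be separated by whitespace, only |, <, > and >> are recognized, and the trailing & rule is not modeled. The checkLine name is made up.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// checkLine validates a whitespace-separated line against three rules from
// the list above: an operator needs something on its left, an operator needs
// something on its right, and at most one input redirection is allowed.
func checkLine(line string) error {
	inputs := 0
	expectOperand := true // true at the start and right after an operator
	for _, w := range strings.Fields(line) {
		switch w {
		case "|", "<", ">", ">>":
			if expectOperand {
				return fmt.Errorf("operator %q has nothing valid on its left", w)
			}
			if w == "<" {
				inputs++
				if inputs > 1 {
					return errors.New("more than one input redirection")
				}
			}
			expectOperand = true
		default:
			expectOperand = false
		}
	}
	if expectOperand {
		return errors.New("line is empty or ends with an operator")
	}
	return nil
}

func main() {
	for _, line := range []string{"cmd | cmd2", "| filename", "a < b < c", "a <"} {
		fmt.Printf("%-12q %v\n", line, checkLine(line))
	}
}
```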

As you can see, this makes it clear what the precise nature of the language is. Indeed, you can choose to leave these ambiguous, but if you do that and write a specification, then the next person who implements your language might produce something that does not behave the same in all conditions, even though it is essentially the same language. Python comes to mind, as their specification is a bit loose in certain areas and different variants of Python (Jython, PyPy, Python etc) behave a bit differently (although the basics are the same in all variants since they were unambiguous).

So I wanted you to consider writing these down precisely so that we can write proper tests in order to completely nail the basic language and extend from there.

Rak Laptudirm · Answer 23 · Tue Jan 25 2022 19:30:37 GMT+0800 (China Standard Time)

The final grammar is of course going to be unambiguous. Before writing the grammar for an expression, we need to decide on what operators to include in expressions. Notice the distinction between the command and a normal expression or statement. That is one of the design choices I think should work, but I would like your perspective on it.

Currently, I am thinking to treat statements which do not start with a keyword as a command. So:

if true {
  # do stuff
}

echo "Hello, World!" # command

This does create some problems, which I would like your help on.

Automatic semicolon insertion does not work, because the last token may be a non-identifier word.
Is treating commands as their own separate entity a good idea?

Rak Laptudirm · Answer 24 · Tue Jan 25 2022 19:41:33 GMT+0800 (China Standard Time)

Also, here are the suggested expression operators.

+    sum                    integers, floats, strings
-    difference             integers, floats
*    product                integers, floats
/    quotient               integers, floats
%    remainder              integers

&    bitwise AND            integers
|    bitwise OR             integers
^    bitwise XOR            integers
&^   bit clear (AND NOT)    integers

<<   left shift             integer << integer >= 0
>>   right shift            integer >> integer >= 0

==    equal
!=    not equal
<     less
<=    less or equal
>     greater
>=    greater or equal

&&    conditional AND
||    conditional OR
!     NOT

I am using Go's operator set, as it is smaller than most others but provides much better functionality.
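Since the operator set is borrowed from Go, one option would be to borrow Go's five binary precedence levels as well. Whether mash adopts them is not decided in this thread; the Go sketch below simply assumes it does and uses precedence climbing to show how a flat token list would be grouped. The group function is hypothetical, expects well-formed input (operand, operator, operand, ...), and ignores the unary ! operator.

```go
package main

import "fmt"

// precedence mirrors the binary-operator levels in the Go specification:
// a higher number binds tighter.
var precedence = map[string]int{
	"||": 1,
	"&&": 2,
	"==": 3, "!=": 3, "<": 3, "<=": 3, ">": 3, ">=": 3,
	"+": 4, "-": 4, "|": 4, "^": 4,
	"*": 5, "/": 5, "%": 5, "<<": 5, ">>": 5, "&": 5, "&^": 5,
}

// group fully parenthesizes the expression starting at toks[*pos] using
// precedence climbing. It stops at the first operator weaker than minPrec.
func group(toks []string, pos *int, minPrec int) string {
	left := toks[*pos] // an operand
	*pos++
	for *pos < len(toks) {
		op := toks[*pos]
		prec, ok := precedence[op]
		if !ok || prec < minPrec {
			break
		}
		*pos++
		// prec+1 makes operators of equal precedence left-associative.
		right := group(toks, pos, prec+1)
		left = "(" + left + " " + op + " " + right + ")"
	}
	return left
}

func main() {
	pos := 0
	fmt.Println(group([]string{"a", "+", "b", "*", "c", "==", "d"}, &pos, 1))
	// prints ((a + (b * c)) == d)
}
```

Writing the levels down as a table like this is one way to answer the earlier "order of operations and priority" question without committing to a full grammar first.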

Rak Laptudirm · Answer 25 · Wed Jan 26 2022 00:26:44 GMT+0800 (China Standard Time)

Also, another little implementation detail. I am thinking of parsing to an AST, which we will traverse to form bytecode. How does that sound? Do you have another suggestion for code execution?

Rak Laptudirm · Answer 26 · Thu Jan 27 2022 01:06:49 GMT+0800 (China Standard Time)

@tjgurwara99 I have updated the formal grammar spec with more details. Please see and review it.
https://gist.github.com/raklaptudirm/9aa25462cbb434906a340d047184a23e

Taj · Answer 27 · Thu Jan 27 2022 02:58:41 GMT+0800 (China Standard Time)

Automatic semicolon insertion does not work, because the last token may be a non-identifier word.

Can you give an example of a non-ident where this doesn't work? If you look at the Go spec, under the section on lexical elements there should be a subsection on semicolons which explains their considerations - I think the same principle would apply here as well.

Is treating commands as their own separate entity a good idea?

It is a design consideration. I don't think there is a problem with that; it might need some more thinking when it comes to constructing the AST though.

The second thing, which I think you are confusing yourself with, is that you consider a statement that does not start with a keyword to be a command. But what do you mean by keyword? The grammar doesn't explain that, right? Also, if it is what I think it is, then if is a keyword, but echo is also a built-in command (which would be a keyword)?

Anyways, to not go down the road of constant improvisation on every new feature, I recommend constructing a simple baseline program in your newly confirmed syntax (a program that does very basic things and you want your language to execute this). This way, we would have a baseline to work towards and build on top of - does that make sense?

I like the operators that you have chosen, but then what happens when you use these with commands; remember we always work with the assumption that user (me) is stupid 😂.

Also, another little implementation detail. I am thinking of parsing to an AST, which we will traverse to form bytecode. How does that sound? Do you have another suggestion for code execution?

IMO, there is no need to do byte-code if you're going to convert it to AST to begin with. Just parse it to AST from lexer directly - I don't think that is difficult. Unless I misunderstood you here - I don't know what you mean by "traverse from bytecode" - I'm assuming from byte-code to AST, am I right in my assumption?

I left some comments on your gist about the grammar, let me know if anything doesn't make sense there.

Also, I recommend looking at some programming language lexers and tokens. Here's the source for Go's lexer and parser.

Taj · Answer 28 · Thu Jan 27 2022 03:24:28 GMT+0800 (China Standard Time)

Also, in the meantime till we have the grammar confirmed, if you'd like to sharpen your language writing skills, I can help you write the Monkey language interpreter with AST... if you think that would help and not waste time. Let me know 😄

Rak Laptudirm · Answer 29 · Thu Jan 27 2022 12:03:22 GMT+0800 (China Standard Time)

Can you give an example of a non-ident where this doesn't work?

I think I have actually got it figured out. If the current line is a command, automatically insert a semicolon at the end of line if the last token is a string.

I think you are confusing yourself with is that you consider that if a statement is not starting with a keyword then it is a command, but what do you mean by keyword?

If you look at the grammar for the statement, it has a command nonterminal, followed by others (like conditional, expression, etc). All the nonterminals other than command start with a certain keyword, if you check the grammar. That is what I mean when I say that a statement starts with a keyword. Hope I cleared that up.

IMO, there is no need to do byte-code if you're going to convert it to AST to begin with. Just parse it to AST from lexer directly - I don't think that is difficult. Unless I misunderstood you here

You did misunderstand me. I meant we first parse the tokens to form the AST and then go through the AST (traverse it) to form or create bytecode which can then be run by the virtual machine.
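The pipeline described here (tokens to AST, AST to bytecode, bytecode to virtual machine) can be illustrated with a toy example in Go. Everything in it is invented for illustration: the node types, the three opcodes and the stack machine are assumptions, not the mash design, and only integer + and * are supported.

```go
package main

import "fmt"

// Expr is a hypothetical AST node for integer expressions.
type Expr interface{}

// Num is a literal; Binary applies Op ('+' or '*') to two sub-expressions.
type Num struct{ Value int }
type Binary struct {
	Op          byte
	Left, Right Expr
}

// OpCode and Instr form a tiny, made-up bytecode.
type OpCode int

const (
	OpPush OpCode = iota
	OpAdd
	OpMul
)

type Instr struct {
	Op  OpCode
	Arg int
}

// compile walks the AST in post-order, so operands are pushed before the
// instruction that consumes them.
func compile(e Expr, code []Instr) []Instr {
	switch n := e.(type) {
	case Num:
		return append(code, Instr{OpPush, n.Value})
	case Binary:
		code = compile(n.Left, code)
		code = compile(n.Right, code)
		if n.Op == '+' {
			return append(code, Instr{Op: OpAdd})
		}
		return append(code, Instr{Op: OpMul})
	}
	panic("unknown node type")
}

// run executes the bytecode on a stack machine and returns the top of stack.
func run(code []Instr) int {
	var stack []int
	for _, in := range code {
		switch in.Op {
		case OpPush:
			stack = append(stack, in.Arg)
		case OpAdd, OpMul:
			a, b := stack[len(stack)-2], stack[len(stack)-1]
			stack = stack[:len(stack)-2]
			if in.Op == OpAdd {
				stack = append(stack, a+b)
			} else {
				stack = append(stack, a*b)
			}
		}
	}
	return stack[len(stack)-1]
}

func main() {
	// AST for 1 + 2 * 3
	ast := Binary{'+', Num{1}, Binary{'*', Num{2}, Num{3}}}
	code := compile(ast, nil)
	fmt.Println(len(code), run(code)) // prints 5 7
}
```

The alternative mentioned in the thread, evaluating the AST directly, would replace compile and run with a single recursive function; the bytecode step is an extra layer rather than a requirement.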

Rak Laptudirm · Answer 30 · Thu Jan 27 2022 12:04:12 GMT+0800 (China Standard Time)

Also, in the meantime till we have the grammar confirmed, if you'd like to sharpen your language writing skills, I can help you write the Monkey language interpreter with AST... if you think that would help and not waste time. Let me know 😄

We surely can. Tell me what to do and we will start.

I looked at it and the language is surprisingly close to the syntax of mash, which just makes it a better exercise.

Taj · Answer 31 · Thu Jan 27 2022 17:39:02 GMT+0800 (China Standard Time)

Here's what I did yesterday on Monkey - we can collaborate there and work with that as a simple language to work with and go from there.

There's a book that I read a few years back (I think 2017-18) called "Writing an Interpreter in Go" - it develops Monkey and I followed that for my first implementation of Monkey because I wanted to write my own language once. At the time, I followed the book but now I have more experience with this so I'm writing things in a bit of a different way. You can refer to the book as a basis for my implementation on the repo and we can work there.

Rak Laptudirm · Answer 32 · Sat Jan 29 2022 21:50:52 GMT+0800 (China Standard Time)

@tjgurwara99 What is your opinion on this:
There will be no function statements. Creating a function will be like creating a variable and assigning a function expression to it. Also, there will be no argument list. Every function body will have an args array, which will contain all the arguments provided to the current running instance of the function.

let function := func {
  # function body
}

Taj · Answer 33 · Sat Jan 29 2022 21:53:55 GMT+0800 (China Standard Time)

I think it's alright 😄 I'm all for first class functions and it keeps things simple.

I'm not sure about the no arguments part of it though... I personally don't like it (as it's not explicit about what is happening, nor reader friendly) but it's up to you...

On a sidenote, not sure if you want to have a distinction between := and =

Rak Laptudirm · Answer 34 · Sat Jan 29 2022 21:55:58 GMT+0800 (China Standard Time)

What I wanted to do is require variable declarations. So := is for declaring a variable while = is for assigning to it.
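One way to picture the distinction is an environment with two separate operations, sketched here in Go. The Env type and its method names are hypothetical. Treating a second := on the same name as an error is an assumption; the thread only says that := declares and = assigns.

```go
package main

import "fmt"

// Env is a hypothetical variable store for this sketch.
type Env struct{ vars map[string]string }

// NewEnv returns an empty environment.
func NewEnv() *Env { return &Env{vars: map[string]string{}} }

// Declare models `let name := value`: it creates a new variable.
// Assumption: declaring an existing name is rejected.
func (e *Env) Declare(name, value string) error {
	if _, ok := e.vars[name]; ok {
		return fmt.Errorf("%s is already declared", name)
	}
	e.vars[name] = value
	return nil
}

// Assign models `let name = value`: the variable must already exist.
func (e *Env) Assign(name, value string) error {
	if _, ok := e.vars[name]; !ok {
		return fmt.Errorf("%s is not declared", name)
	}
	e.vars[name] = value
	return nil
}

func main() {
	env := NewEnv()
	fmt.Println(env.Assign("x", "1"))  // error: x is not declared
	fmt.Println(env.Declare("x", "1")) // <nil>
	fmt.Println(env.Assign("x", "2"))  // <nil>
}
```

Requiring declarations this way turns a misspelled variable name in an assignment into an error instead of silently creating a new variable.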

Taj · Answer 35 · Sat Jan 29 2022 21:58:01 GMT+0800 (China Standard Time)

What I wanted to do is require variable declarations. So := is for declaring a variable while = is for assigning to it.

That's all good, but then do you need let in the assign statement too? If not, we shouldn't have let in declarations either. Does that make sense?

Rak Laptudirm · Answer 36 · Sat Jan 29 2022 21:59:42 GMT+0800 (China Standard Time)

I understand what you are trying to say, but non-command statements need to start with a keyword. You can think of let as saying that there will be some sort of declaration after it.

Rak Laptudirm · Answer 37 · Sat Jan 29 2022 22:00:57 GMT+0800 (China Standard Time)

Otherwise, if expressions start with an identifier, we will be wondering whether it is a command or an identifier.

git ...

Rak Laptudirm · Answer 38 · Sat Jan 29 2022 22:05:31 GMT+0800 (China Standard Time)

@tjgurwara99 How does the mechanism sound to you? Do you have another way we could do this?

Taj · Answer 39 · Sat Jan 29 2022 22:07:29 GMT+0800 (China Standard Time)

I understand what you are trying to say, but non-command statements need to start with a keyword. You can think of let as saying that there will be some sort of declaration after it.

Hmm... Not sure what you mean here. Even if you look at the current bash or zsh, we have simple assigns, right? For example, GO_PATH environment variables and such. And setting it is as simple as GO_PATH=/path/to/gopath/ so as you can see this is not a command but rather something within the language. I think the declaration can just as easily be the same...

I think you are viewing the program as command and not command, but personally I think you should consider it the other way around. If it's a program, the syntactical analysis will take care of it; if it's not a program, then it falls back to the simple shell (look for the command and if the command exists then execute it). Does that make sense?

Rak Laptudirm · Answer 40 · Sat Jan 29 2022 22:09:57 GMT+0800 (China Standard Time)

Yes, but that creates problems while lexing. We will be extracting keywords from where it is actually an argument to a command. This mechanism makes it easy to lex the command properly.

And non-command statements should start with the keyword for that very reason.

Rak Laptudirm · Answer 41 · Sat Jan 29 2022 22:15:24 GMT+0800 (China Standard Time)

Those are the reasons the let keyword is required in my opinion. What do you think @tjgurwara99 ?

Taj · Answer 42 · Sat Jan 29 2022 22:24:26 GMT+0800 (China Standard Time)

Yes, but that creates problems while lexing. We will be extracting keywords from where it is actually an argument to a command. This mechanism makes it easy to lex the command properly.

And non-command statements should start with the keyword for that very reason.

The lexer's job is to lex tokens, that's all; the parser's job is to create the AST; and the evaluator's job is to execute the program's AST (usually).

Once the parser has the AST, the way to recognise that it's language syntax is simply to think of it as a program first, and then if it can't be a program it must either be a command or invalid syntax.

Whether you go with command or not command vs program or not program, you will have two different branches anyways.

Also, making it easier to work with your language is much more important than making it easier to write your language. Simple is better than complex, but simplicity is difficult to achieve, so even if it may be a bit tricky to do it the other way around, I think it's worth it in the long run. Anyways, this is also a design decision, so ultimately you're the one to go whichever way you think is best. Let me know what you decide 😄

Taj · Answer 43 · Sat Jan 29 2022 22:25:48 GMT+0800 (China Standard Time)

Those are the reasons the let keyword is required in my opinion. What do you think @tjgurwara99 ?

I think in that case let will be required even for assigns then and that is not ideal either 😅

Rak Laptudirm · Answer 44 · Sat Jan 29 2022 22:26:02 GMT+0800 (China Standard Time)

How do you suggest we should do it?

Taj · Answer 45 · Sat Jan 29 2022 22:29:46 GMT+0800 (China Standard Time)

How do you suggest we should do it?

My suggestion is to go the program or not program way, and nail the small language first and then work on the fallback to commands later since that is well studied in terms of shells

Rak Laptudirm · Answer 46 · Sat Jan 29 2022 22:37:58 GMT+0800 (China Standard Time)

@tjgurwara99
Here is a list of issues which I think the let keyword will solve:

Commands like echo ====== will be lexed as echo, ======, and not as echo, ==, ==, == (treating == as the operator), because statements will be separated from commands. Otherwise, separating the arguments of commands will become more complicated.
Deciding whether a line is a command or a statement will be trivial, as the statements will start with predefined keywords.
It will prevent the problem where an expression with an incorrect syntax is evaluated as a command, which can be really hard to diagnose. For example, i = a * b c is invalid, probably an operator is missing, but it will be treated like a command and the vm will try to run it, which can lead to undesirable circumstances.

Replacing the current let keyword will require an elegant solution to all of these problems. If you do have one, we can proceed with the newer one.

Taj · Answer 47 · Sun Jan 30 2022 00:38:19 GMT+0800 (China Standard Time)

Yeah now I see the problem, I didn't think about the first point... Nice catch!

Regarding your second point, I think the triviality of writing the language is beside the point of making the language easier to work with - if you catch my point 😅

My problem with the let keyword is that I think we will use let on both = and := and therefore, I don't think it's an elegant solution to this problem either.

I don't have a solution yet, unfortunately. Let's think this through, and we can circle back to it after finishing parts of Monkey and after you finish the lexer for this. It seems like you have thought through the direction of going with the command or not command route anyways, so it's not like we're completely out of options, but I think it's worth considering better routes (if they exist - right now, I'm not sure my approach would be a better route).

My thinking on shells like bash or zsh is that essentially they are dumb by design, because they only need to work with binaries, and therefore binaries are "first-class citizens" in the language of the shell. But the plan for mash is to be a bit more nuanced, so I don't know...

Rak Laptudirm · Answer 48 · Sun Jan 30 2022 00:46:29 GMT+0800 (China Standard Time)

My problem with let keyword is that I think we will use let on both = and := and therefore, I don't think it's an elegant solution to this problem either.
If you change your viewpoint to what I intend the let keyword to mean, I think it is elegant. You should think of let as saying "I am going to assign something to this assignable", and how we are going to do it depends on the assignment operator. It is like the for keyword in Go, where you know it is going to be some sort of loop, but to know what sort you have to look further.

Let me know your opinions @tjgurwara99 .

Taj · Answer 49 · Sun Jan 30 2022 01:01:43 GMT+0800 (China Standard Time)

But then what is the difference between let something = 1 and let something_else := 2? In the AST they would both be the same thing then. Plus, let's say that you have

let something = 1
...

something=2

What happens then? At first glance there doesn't seem to be any problem, but let's say that we have a binary (command) called something. What would happen to the variable something, given that syntactically it would be fine as this begins with not command? Because of this we will come back to the complicated impasse again 😂 where we would have to distinguish the program syntax vs command syntax. This is why I don't think that is an elegant solution either. Does that make sense? @raklaptudirm

Rak Laptudirm · Answer 50 · Sun Jan 30 2022 01:05:35 GMT+0800 (China Standard Time)

something=2

That will be treated as a command, and it will also not do anything to the existing variable, which is a completely different thing. Commands and variables are not related. So having a something command won't change anything in the variable.

Taj · Answer 51 · Sun Jan 30 2022 01:08:18 GMT+0800 (China Standard Time)

You've lost me there, I don't understand what you mean. How will that be treated as a command and how will we be able to access the something variable then?

Like how do I access something variable that we assigned with let something = 2 then?

Rak Laptudirm · Answer 52 · Sun Jan 30 2022 01:14:43 GMT+0800 (China Standard Time)

First of all, anything that does not begin with a keyword will be treated as a command. So the lexer will read the first "word" and then use a lookup to see if it is an IDENT or one of the keywords. If it is a keyword, it will lex the rest of the line like normal lexers do. Otherwise, it will say that the line is a command, and will lex it accordingly (so no operators, keywords and so on).
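That lookup can be sketched in a few lines of Go. The keyword set below (if, for, let) is an assumption pieced together from this thread rather than a confirmed list, and isStatement is a made-up name.

```go
package main

import (
	"fmt"
	"strings"
)

// keywords is an assumed set of statement-starting keywords.
var keywords = map[string]bool{"if": true, "for": true, "let": true}

// isStatement reports whether the first word of line is a keyword. Lines for
// which it returns false would be lexed in command mode: plain words and
// strings, with no operator or keyword recognition.
func isStatement(line string) bool {
	words := strings.Fields(line)
	return len(words) > 0 && keywords[words[0]]
}

func main() {
	for _, line := range []string{"let x := 1", "echo ======", "something=2", "if true {"} {
		mode := "command"
		if isStatement(line) {
			mode = "statement"
		}
		fmt.Printf("%-14q %s\n", line, mode)
	}
}
```

Under this rule echo ====== stays a command with the single argument ======, and something=2 is also handed to the command path instead of being read as an assignment.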

Now, when do you use a variable? In a statement. Inside statements (starting with a keyword) the language will act like you expect it to: the identifiers are variables and so on, and you cannot run a command there. So there is no problem using a variable with the same name.

Now, it is possible you want to embed the data of a variable into a command. That is possible because we will add a feature for embedding expressions inside strings, and commands don't work inside expressions, so there will still be no ambiguity.

In other words, commands and variables are used in separate places, so there will be no ambiguity in what is which.
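
The rule described above can be sketched in Go roughly as follows. This is only an illustration of the idea, not mash's actual lexer: the names classify, LineKind and the keyword table are hypothetical, and let is the only keyword actually mentioned in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// LineKind says how the rest of a line should be lexed.
type LineKind int

const (
	Command   LineKind = iota // first word is not a keyword
	Statement                 // first word is a keyword
)

func (k LineKind) String() string {
	if k == Statement {
		return "statement"
	}
	return "command"
}

// keywords is a hypothetical lookup table; "if" and "for" are assumptions.
var keywords = map[string]bool{"let": true, "if": true, "for": true}

// classify looks only at the first whitespace-separated word of the line.
// A keyword selects statement lexing; anything else is a command.
func classify(line string) LineKind {
	fields := strings.Fields(line)
	if len(fields) > 0 && keywords[fields[0]] {
		return Statement
	}
	return Command
}

func main() {
	lines := []string{`let something := "hello"`, "echo something", "something=2"}
	for _, l := range lines {
		fmt.Printf("%-9v %s\n", classify(l), l)
	}
}
```

Under this sketch something=2 is classified as a command because its first word is not a keyword, which is the behaviour described in the comments above.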

Hope I made it clear @tjgurwara99

Taj · Answer 53 · Sun Jan 30 2022 01:50:18 GMT+0800 (China Standard Time)

I still think there is a lot of ambiguity here. I have a few cases that I am considering, and from what you've written down I think I understand what you meant, but I still don't think it is valid because it is still not clear what happens to the variable (according to your EBNF). Consider the following:

In the current shell implementations we have something like this

The variables here are accessed with $echo. Note that echo is a command, but the token = takes precedence. This is possible because the shell language is a language first and falls back to commands when it fails to work the line through as language syntax.
Another thing to note is that we never had to use a let keyword, yet assignment worked fine. So, on to the next part.

Now, the main question I had was how you are going to use the variable after you've assigned a value to it. The way it's done in current shells is simply by prefixing the name with $, so will you be doing the same? If that is the case, then is there really a need for the let in the following

let something := "hello"

This could just as easily be

$something := "hello"

but there is no benefit to having this because

something="hello"

will do the exact same thing. So there is no difference between := and =, which is why I'm a bit confused as to why you think the := declaration and = are different to begin with.

To your point about command and variables are used in different places, what happens here then:

let something = "hello"
echo something

Does it print "something" or does it print "hello"? With this question we get to the same point again: how are you accessing this variable?

Finally, the evaluator of shell in fact expands all the values before executing, for example:

So we have something much more dangerous to consider, because essentially you can have a self-executing variable (a term I just invented lol - maybe there is a better term), which is what the Log4j fiasco is all about. For example, consider something like this:

let something = "rm -r /"
$something  # please don't run this command even though shell would just consider the whole string to be

In the current implementations of shell it's not dangerous though, because the variable is one ident, but this should always be kept in mind I guess. Anyways, I digress.

Rak Laptudirm · Answer 54 · Sun Jan 30 2022 02:02:59 GMT+0800 (China Standard Time)

@tjgurwara99 It seems like we have some misunderstanding regarding this.

:= declares or creates the variable. The other assignment operators just assign to it.
Any identifier in a command (a line that does not start with a keyword) is treated as a string. So:

let something := "hello"
echo something

will print something, not hello.

Using variable values inside a command is done inside a string, with something like template literals. I am still not sure exactly how I want to implement it, but I am thinking along the lines of:

echo {something}
echo "{something}, World!"
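
A rough Go sketch of how such {name} placeholders might be expanded in a command argument is shown below. It is purely illustrative: the expand function, the placeholder pattern and the choice to report an undefined name as an error (rather than substituting an empty string like existing shells do) are all assumptions, since the exact design is still undecided.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholder matches {name}, where name is a simple identifier (assumed syntax).
var placeholder = regexp.MustCompile(`\{([A-Za-z_][A-Za-z0-9_]*)\}`)

// expand replaces every {name} in arg with vars[name].
// Text outside braces is left alone, so a bare identifier stays a plain string.
func expand(arg string, vars map[string]string) (string, error) {
	missing := ""
	out := placeholder.ReplaceAllStringFunc(arg, func(m string) string {
		name := m[1 : len(m)-1] // strip the surrounding braces
		v, ok := vars[name]
		if !ok && missing == "" {
			missing = name // remember the first undefined name
		}
		return v
	})
	if missing != "" {
		return "", fmt.Errorf("undefined variable %q", missing)
	}
	return out, nil
}

func main() {
	vars := map[string]string{"something": "hello"}
	for _, arg := range []string{"something", "{something}", "{something}, World!"} {
		s, err := expand(arg, vars)
		fmt.Printf("%q -> %q (err: %v)\n", arg, s, err)
	}
}
```

With this sketch the bare word something is passed through unchanged, while {something} is replaced by the variable's value.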

Taj · Answer 55 · Sun Jan 30 2022 02:27:40 GMT+0800 (China Standard Time)

Lol I feel like we're going around in circles haha.

:= declares or creates the variable. The other assignment operators just assign to it.

Lol I know what that means but there are a few things that don't sit well with me. My precise concern is that in any shell we have the failsafe of being able to use a variable without having defined it. So the shell never raises an error when something is not defined; it just gives back "", the empty string. So my main question is: are you thinking that the something variable doesn't exist to begin with? In that case I understand better what you meant, but another concern pops up in my head (sorry to be a pain about this 😅 - I have a tendency to understand requirements before I begin with something). Now that I understand the preliminaries, what happens in this case:

let something := "echo"
something=hahaha

Will the above work in your language? If so, what is the difference between let something = "hahaha" and something=hahaha? Note that the first one is not a declaration but an assignment. Is the second syntax even going to be valid in your new language (I don't think I would call that language a shell anymore haha)?

Another concern is that, what happens when the variable has not been declared...

something=hahaha

or in the other case where the above syntax is invalid

let something = "hahaha"

Like what happens when something is not declared but you use it anyways in the assign. Do you throw an error? Or do you assign it anyways? If you assign it anyways, there is no need for := IMO. If you don't then, it's a different story but then this deviates a lot from shell specification and I would say we should consider making something else entirely. Maybe something like a distinction between a command mode and language mode similar to what vim has a command mode and insert mode - not something that I recommend though, since there would then be a distinction between scripts and the language, and I firmly think there shouldn't be any such distinction...

Anyways, let me know 😄

Taj · Answer 56 · Sun Jan 30 2022 07:20:11 GMT+0800 (China Standard Time)

Hey @raklaptudirm, I thought a lot about this and I think the best approach for us currently is to look at what a program would look like in the actual language, for both approaches: the command one and the keyword one that you talked about.

Once you write it down, I'll try and create a pseudo AST and see if that makes sense. After that we can discuss problems that arise after that...

Rak Laptudirm · Answer 57 · Sun Jan 30 2022 17:05:13 GMT+0800 (China Standard Time)

After that we can discuss problems that arise after that...

I would like to leave some final comments regarding the points you raised in your previous comment @tjgurwara99

Like what happens when something is not declared but you use it anyways in the assign.

Assigning to a non declared variable is considered an error.
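
As a minimal sketch of that rule in Go (hypothetical names throughout; Env, Declare and Assign are not taken from the mash code base), declaration with := and assignment with = could be kept apart like this. Treating redeclaration as an error is an extra assumption that has not been discussed in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	// ErrUndeclared is returned when assigning to a variable that was never declared.
	ErrUndeclared = errors.New("assignment to undeclared variable")
	// ErrRedeclared is an assumption: declaring the same name twice is rejected.
	ErrRedeclared = errors.New("variable already declared")
)

// Env holds variable values; commands live elsewhere and never touch it.
type Env struct{ vars map[string]string }

func NewEnv() *Env { return &Env{vars: make(map[string]string)} }

// Declare models `let name := value`.
func (e *Env) Declare(name, value string) error {
	if _, ok := e.vars[name]; ok {
		return ErrRedeclared
	}
	e.vars[name] = value
	return nil
}

// Assign models `let name = value`; the variable must already exist.
func (e *Env) Assign(name, value string) error {
	if _, ok := e.vars[name]; !ok {
		return ErrUndeclared
	}
	e.vars[name] = value
	return nil
}

func main() {
	env := NewEnv()
	fmt.Println(env.Assign("something", "hahaha"))  // fails: not declared yet
	fmt.Println(env.Declare("something", "hello"))  // succeeds
	fmt.Println(env.Assign("something", "hahaha"))  // succeeds now
}
```

This is what would make := and = genuinely different: only the former can bring a new name into existence.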

If you don't then, it's a different story but then this deviates a lot from shell specification and I would say we should consider making something else entirely.

In this shell, I am trying to take the positives from existing shell implementations, but not to let them prevent me from doing things my way.

Maybe something like a distinction between a command mode and language mode similar to what vim has a command mode and insert mode.

I don't see how that would work, as there is only one language, and the shell will just be a REPL for that language.

The distinction between a statement (starting with a keyword) and a command (not starting with a keyword), along with the let statement for assignments, fixes all the above problems elegantly IMO. Let me know what you think :)