Refactoring Evaluation and Encoding

joha2 opened this issue 8 years ago

After digging through the code for a while I have some ideas for refactoring (also partly mentioned at the Wiki):

First: Maybe we should get rid of the / notation and the ; BEFORE evaluation or encoding (e.g. by transforming * 3 | + 1 2; into the standard form (* 3 (+ 1 2))), since the former notation is only more human-friendly than the latter one.
Afterwards we obtain a list of list of list of ... list of strings from that standard form as output. This would simplify the evaluation and encoding code because there are fewer cases to consider.
First and a half: After discovering that so-called process mining can help with the automatic decoding of a message like cosmicos (tested with pypm and a properly formatted old message), we should think about the bit encoding scheme of the commands again: The code words should have equal lengths, and commands should be separated from data in their respective formats (i.e. intro is atm encoded as 203, and the binary digit 0 in the unary operator is encoded as 203, too). Maybe commands should have 6-bit codes (like 21010103, as already seen in the message) and data should be 8-bit (e.g. 2000000003 for 0, 2000000013 for 1 and so on). This means that intro is encoded as 20000003 instead of 203. In my opinion this would simplify the automatic decoding very much. Does this make sense?
Second: Perhaps we should split up evaluation and message encoding, but start them both after the same preprocessing phase (i.e. the expression is in a well-defined state where it is easy to evaluate and easy to encode, too). In the current state of the code there is IMHO no such intermediate well-defined state (or I messed it up ;-)). Also, encodeSymbols is used for both evaluation (e.g. parsing of ints, or vocabulary lookup) and codifying (e.g. generating bitcodes for commands and numbers).
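
Editor's sketch of the "First" point, in Python rather than the project's Haxe. It assumes that the marker (written / in the text, | in the example) opens a group that runs to the end of the enclosing expression, and that ; ends a statement. The names tokenize, parse_statement, desugar and render are hypothetical and not taken from the cosmicos code base.

```python
from typing import List, Union

Expr = Union[str, List["Expr"]]
MARKERS = {"|", "/"}  # assumption: both spellings mean "nest the rest"


def tokenize(src: str) -> List[str]:
    """Split source text into atoms, parentheses, markers and ';'."""
    for ch in "();|/":
        src = src.replace(ch, f" {ch} ")
    return src.split()


def parse_statement(tokens: List[str]) -> Expr:
    """Turn the tokens of one statement (no ';') into nested lists."""
    pos = 0

    def parse_list(inside_parens: bool) -> List[Expr]:
        nonlocal pos
        items: List[Expr] = []
        while pos < len(tokens):
            tok = tokens[pos]
            if tok == ")":
                if not inside_parens:
                    raise SyntaxError("unexpected ')'")
                return items  # leave ')' for the caller that saw '('
            pos += 1
            if tok == "(":
                items.append(parse_list(True))
                if pos >= len(tokens) or tokens[pos] != ")":
                    raise SyntaxError("missing ')'")
                pos += 1
            elif tok in MARKERS:
                # Everything up to the end of the enclosing list is one group.
                items.append(parse_list(inside_parens))
                return items
            else:
                items.append(tok)
        return items

    return parse_list(False)


def desugar(src: str) -> List[Expr]:
    """Return one nested list per ';'-terminated statement."""
    statements: List[Expr] = []
    current: List[str] = []
    for tok in tokenize(src):
        if tok == ";":
            statements.append(parse_statement(current))
            current = []
        else:
            current.append(tok)
    if current:  # tolerate a missing final ';'
        statements.append(parse_statement(current))
    return statements


def render(expr: Expr) -> str:
    """Print a nested list in the fully parenthesised standard form."""
    if isinstance(expr, str):
        return expr
    return "(" + " ".join(render(e) for e in expr) + ")"
```

With these assumptions, render(desugar("* 3 | + 1 2;")[0]) gives (* 3 (+ 1 2)), and both evaluation and encoding would only ever see nested lists of strings.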

Alan Williams · Answer 1 · Wed Jul 13 2016 05:30:43 GMT+0800 (China Standard Time)

With regard to comment 1, I would think that would make sense. You can make a pull request; however, I'll hold off merging it due to @paulfitz's absence.
With comment 2, that also would make sense, and if you put in a pull request for that, I think it should be immediately mergeable. One clarification though: when you say "start them both", do you mean create a second process?

joha2 · Answer 2 · Wed Jul 13 2016 06:22:01 GMT+0800 (China Standard Time)

I will prepare a pull request once I have removed all the trace commands from the code ;-)
Sorry for the unclear sentences :-( By 'start them both' I mean that there is a unique preprocessing stage and both processes (encoding and evaluation) take its result as an input. The result of this preprocessing step is to be determined, but an initial step could be the first point I mentioned above.
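
Editor's sketch of that structure in Python: one preprocessing function produces nested lists, and evaluation and encoding are two independent consumers of that result. Everything here is a toy under stated assumptions: the input is already fully parenthesised, only + and * on integers are evaluated, and the codify callback that turns an atom into its code word is hypothetical.

```python
from math import prod
from typing import Callable, List, Union

Expr = Union[str, List["Expr"]]


def preprocess(src: str) -> List[Expr]:
    """Shared stage: parse fully parenthesised text into nested lists."""
    tokens = src.replace("(", " ( ").replace(")", " ) ").split()
    stack: List[List[Expr]] = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise SyntaxError("unexpected ')'")
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise SyntaxError("missing ')'")
    return stack[0]


def evaluate(expr: Expr) -> int:
    """Consumer 1: evaluate a toy subset (+ and * on integers)."""
    if isinstance(expr, str):
        return int(expr)
    op, args = expr[0], [evaluate(e) for e in expr[1:]]
    if op == "+":
        return sum(args)
    if op == "*":
        return prod(args)
    raise ValueError(f"unknown operator: {op!r}")


def encode(expr: Expr, codify: Callable[[str], str]) -> str:
    """Consumer 2: brackets become 2 and 3, atoms go through codify."""
    if isinstance(expr, str):
        return codify(expr)
    return "2" + "".join(encode(e, codify) for e in expr) + "3"


# Both consumers start from the same intermediate representation.
ir = preprocess("(* 3 (+ 1 2))")
value = evaluate(ir[0])
message = encode(ir[0], lambda atom: f"<{atom}>")
```

Neither consumer calls the other, so a change to the code words would not touch evaluation, and vice versa.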

Alan Williams · Answer 3 · Wed Jul 13 2016 06:26:05 GMT+0800 (China Standard Time)

Okay. That makes sense. Thanks for clarifying.

Paul Fitzpatrick · Answer 4 · Wed Jul 27 2016 21:37:41 GMT+0800 (China Standard Time)

Re first - / and ;, I'd vote for preserving this structure long enough to be used in encoded messages. I'd be totally cool with encodings that don't use them. But a previous iteration of cosmicos didn't have them, and things felt a lot better when they showed up. Messages got flatter and more clearly delineated. Agreed that it would be simpler in implementation to discard them, but inasmuch as I can put myself in the shoes of a receiver, it feels better to keep them (or at least the option of using them in an encoding). Happy to discuss further, not religious about this.

Re first and a half - totally agree with changing encoding. I was initially excited to not have any symbols in cosmicos, with intro being identical to the number 0 in the final message. Lack of symbols simplified the language a lot, especially later on when it comes to self-reference and quoting, and also made the messages very short in terms of bit count. In retrospect, it is a dumb idea, making understanding the initial part of the message harder. So my current idea is to make the encoding of symbols a plug-in, where you can basically choose what you want. When encoding in bits, I'd agree that the length of symbols (especially symbols used early in the message) should be longer, to make them easier to spot prior to understanding details of language.
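
Editor's sketch of the fixed-width scheme proposed in the opening post, in Python. It uses only the examples given there: 2 and 3 frame each code word, commands carry 6 bits (intro as 20000003, and 21010103 as seen in the message) and data carries 8 bits (2000000003 for 0, 2000000013 for 1). The function names and the idea of telling the two kinds apart by length alone are assumptions made for illustration.

```python
from typing import Tuple

COMMAND_BITS = 6  # proposed width for command code words
DATA_BITS = 8     # proposed width for data code words


def _frame(value: int, width: int) -> str:
    """Write value as a fixed-width binary string framed by 2 and 3."""
    if not 0 <= value < 2 ** width:
        raise ValueError(f"{value} does not fit in {width} bits")
    return "2" + format(value, f"0{width}b") + "3"


def encode_command(index: int) -> str:
    """Encode a command by its (hypothetical) vocabulary index."""
    return _frame(index, COMMAND_BITS)


def encode_data(number: int) -> str:
    """Encode a non-negative integer as a data word."""
    return _frame(number, DATA_BITS)


def decode_word(word: str) -> Tuple[str, int]:
    """Classify a framed word by its length and return (kind, value)."""
    if len(word) < 2 or word[0] != "2" or word[-1] != "3":
        raise ValueError(f"not a framed code word: {word!r}")
    bits = word[1:-1]
    if set(bits) - {"0", "1"}:
        raise ValueError(f"payload is not binary: {word!r}")
    kinds = {COMMAND_BITS: "command", DATA_BITS: "data"}
    if len(bits) not in kinds:
        raise ValueError(f"unexpected payload width: {len(bits)}")
    return kinds[len(bits)], int(bits, 2)
```

Under this scheme intro (command 0) and the number 0 no longer share a code word, so a decoder can separate commands from data before it understands either.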

Re second - there's definitely some cleanup to do here, agreed :-)

joha2 · Answer 5 · Thu Jul 28 2016 00:36:07 GMT+0800 (China Standard Time)

Re re first: I totally understand your point. For me as a human reader the / and ; also simplify and clarify the syntax. My suggestion to get rid of them arose only due to the following points:

/ and ; also have to be encoded in the message (atm by using 023 and 2233) and therefore introduce another symbol which has to be understood by the receiver
the task of automatic decoding of the message is made much more complicated by using these symbols, because brackets introduce some nesting, but from 023 a receiver may not understand whether to perform the commands in a nested or in a sequential manner.
and simplification of the syntax by using these symbols is maybe a human-centric perspective

As already mentioned in the wiki: When I substituted the 2, 3 by (, ) for me it was not clear that 023 means: Perform the following commands in a nested manner; but maybe this is also human-centric perspective ;-) As a compromise maybe one can add a flag to the compilation whether to use those symbols or not, would that be possible?
By introducing two functions which transparently switch between those two syntax schemes (one is already implemented in my fork) this should maybe not be tooooo difficult ;-)
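
Editor's sketch of such a switch in Python, working on nested lists of strings. It assumes the marker may replace the parentheses around a non-empty final sub-expression, written | as in the example from the opening post. The function names and the use_marker flag are hypothetical and unrelated to the fork mentioned above.

```python
from typing import List, Union

Expr = Union[str, List["Expr"]]


def render_nested(expr: Expr) -> str:
    """Fully parenthesised form, no marker and no ';'."""
    if isinstance(expr, str):
        return expr
    return "(" + " ".join(render_nested(e) for e in expr) + ")"


def _flat_body(expr: List[Expr]) -> str:
    """Body of a list where a trailing sub-list is written with '|'."""
    parts = []
    for i, item in enumerate(expr):
        is_last = i == len(expr) - 1
        if isinstance(item, str):
            parts.append(item)
        elif is_last and item:
            # The final group runs to the end, so the marker suffices.
            parts.append("| " + _flat_body(item))
        else:
            parts.append("(" + _flat_body(item) + ")")
    return " ".join(parts)


def render_flat(expr: List[Expr]) -> str:
    """Human-friendly form: marker for trailing groups, ';' at the end."""
    return _flat_body(expr) + ";"


def render(expr: List[Expr], use_marker: bool) -> str:
    """Single entry point controlled by a compile-time style flag."""
    return render_flat(expr) if use_marker else render_nested(expr)
```

For example, render(["*", "3", ["+", "1", "2"]], use_marker=True) yields the flat statement from the opening post, and use_marker=False yields its standard form.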

Re re first and a half: Sorry I did not really understand your second point. The final message still should only contain the symbols 0123 where identifiers are encoded in terms of some fancy calculated bitcodes. Is this correct? I did not understand how to introduce symbols (texty ones?) into the final message. Or do you want them only to appear at the evaluation stage?

Re re second: Further discussion maybe in #14 and #16

Paul Fitzpatrick · Answer 6 · Sat Aug 06 2016 06:47:31 GMT+0800 (China Standard Time)

Yes, a flag in compilation is totally possible. I'd like in fact to have specification of the coding details factored out more, so it is easy to have multiple possible encodings of the message. I've played with many myself and it would be nice to have them and others side by side as a "zoo", rather than being forced to commit to one true encoding.

Re re re first and a half :-) - sorry for being unclear. I'm considering alternate encodings, not limited to 4 symbols. Ideally there'd be flexibility to map to the physics of the transmission process. See e.g. the one used in the dearet scenario. Or Lincos had from time to time embedded clicks and noise and other bits and bobs. I haven't fully fleshed out my thoughts here, but something like:

Ideally, we'd like names to be quirky and easy to search for and recognize directly in a recording of whatever stream the receiver gets. We want names that are distinctive on as low a level as possible in the medium being used. For just about any model of the receiver's intelligence, this will help them out. Distinctive numbers are okay, but maybe we can do better with pops and squeals and gurgles and who knows what? We have to at some point consider error correction and compression, but that can be something we bootstrap to once the essentials of communication are in place.
(http://cosmicos.github.io/2014/08/29/more-naming.html)

Paul Fitzpatrick · Answer 7 · Sat Aug 06 2016 06:53:14 GMT+0800 (China Standard Time)

So maybe let's start with this issue of how / gets encoded. We'll need to set a flag somewhere, ideally in a way that can scale as more variants are added. Any preferences?

joha2 · Answer 8 · Sat Aug 06 2016 15:30:11 GMT+0800 (China Standard Time)

For the user who just wants to play around with the final encoded message, the flag should be specified at the Makefile level, e.g. make --encoding="lincos". Further, if I understood correctly, there are a few self-references in the message, so we need a transparent encoding/decoding functionality. For this, the codify functions should not be hardcoded anymore but maybe extended to a codify-class interface. Is there a possibility to resolve the strong coupling between encoding/decoding and evaluation? Or is there at least the possibility to reduce this strong coupling to a well-defined class interface (by communicating through a well-defined transfer format)?

I am just too inexperienced with Haxe to know these things, but I think the codify class should afterwards be extended through some kind of inheritance, such that there is a basic interface and every new encoder/decoder overrides these functions.
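
Editor's sketch of such an interface, written in Python instead of Haxe. A base class fixes the encode/decode contract, each variant overrides two methods, and a small registry lets a build flag pick a variant by name. All class names, registry keys and the 8-bit width are hypothetical choices for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class Codec(ABC):
    """Basic interface that every encoder/decoder variant implements."""

    def __init__(self, vocabulary: List[str]) -> None:
        self.vocabulary = list(vocabulary)

    @abstractmethod
    def encode_atom(self, atom: str) -> str:
        """Turn one identifier into its code word."""

    @abstractmethod
    def decode_atom(self, word: str) -> str:
        """Turn one code word back into its identifier."""

    def encode(self, atoms: List[str]) -> List[str]:
        return [self.encode_atom(a) for a in atoms]

    def decode(self, words: List[str]) -> List[str]:
        return [self.decode_atom(w) for w in words]


class FourSymbolCodec(Codec):
    """Vocabulary index as 8 binary digits framed by 2 and 3."""

    WIDTH = 8  # assumes fewer than 256 vocabulary entries

    def encode_atom(self, atom: str) -> str:
        index = self.vocabulary.index(atom)  # ValueError if unknown
        return "2" + format(index, f"0{self.WIDTH}b") + "3"

    def decode_atom(self, word: str) -> str:
        return self.vocabulary[int(word[1:-1], 2)]


class TextCodec(Codec):
    """Pass-through variant, handy for debugging."""

    def encode_atom(self, atom: str) -> str:
        return atom

    def decode_atom(self, word: str) -> str:
        return word


# The registry is what a flag such as --encoding would index into.
CODECS: Dict[str, Type[Codec]] = {
    "four-symbol": FourSymbolCodec,
    "text": TextCodec,
}


def make_codec(name: str, vocabulary: List[str]) -> Codec:
    try:
        return CODECS[name](vocabulary)
    except KeyError:
        raise ValueError(f"unknown encoding: {name!r}") from None
```

Since decode is part of the same interface, self-references inside the message could go through the selected codec instead of a hardcoded scheme.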

Do you always want to start from this scheme-like format?

Paul Fitzpatrick · Answer 9 · Sun Aug 14 2016 05:39:29 GMT+0800 (China Standard Time)

About self-references: I looked at the message as is and there isn't much, just this short section http://cosmicos.github.io/message.html#section53. I think I played with this just to see if it would be possible to have a way to concretely talk about the message within itself, following some tips in Lincos. Nothing depends on this right now. Tempted to drop it for simplicity. Will see if there's a design that can reduce the coupling as you suggest.

About starting from this scheme-like format: I'm not committed to it, it is just the most practical I've come up with so far. Do you have an alternative in mind?

joha2 · Answer 10 · Sun Aug 14 2016 06:31:53 GMT+0800 (China Standard Time)

The question for the scheme-like format: No it was only for clarification.

So the building scheme of the message is like:

1. files generating scm, scm code
2. pure scm code (with some self-refs via primer command)
3. encoded message (at the moment four symbols)?

Is this correct? If it is correct: Is it possible to write down the full scm code in step 2 in the syntax scheme without /, such that the encoded message is easier, as described above?

Paul Fitzpatrick · Answer 11 · Mon Aug 15 2016 05:32:45 GMT+0800 (China Standard Time)

That looks right, with the caveat that "scm" code is not in fact scheme code. I've tended to give it a .scm extension just to trigger a compatible syntax highlighter. From time to time I've referred to this format as "fritz/ftz" (for immodest reasons). Apologies for the confusion.

You can see the full combined code of step 2 in build/transform/assem.txt or the corresponding .json file. It would be possible to strip / from the message at that time. I've started work on a branch to allow this option.

joha2 · Answer 12 · Mon Aug 15 2016 05:40:21 GMT+0800 (China Standard Time)

Yeah, I also referred to the message code by using the 'scm' abbreviation. But for clarification we could also speak of ftz or fritz. :-) I am curious about the new fork you started :-)

Paul Fitzpatrick · Answer 13 · Tue Aug 16 2016 21:14:42 GMT+0800 (China Standard Time)

See #21 - by setting COSMIC_VARIANT to nested (using cmake gui or doing cmake -DCOSMIC_VARIANT=nested . in build directory), after the next make the build/index.json file should now show a preprocessed field that has your expansion in it, and the message should be appropriately encoded.

Paul Fitzpatrick · Answer 14 · Mon Aug 22 2016 10:48:43 GMT+0800 (China Standard Time)

#22 takes some steps to reduce unnecessary entanglement of parsing and evaluation.

Paul Fitzpatrick · Answer 15 · Fri Sep 07 2018 10:15:17 GMT+0800 (China Standard Time)

There's been quite a lot of cleanup since this issue was opened. I've updated the README.