gcl1: improvement proposals for GCL

Question

kervinck opened this issue 5 years ago · comments

Marcel van Kervinck commented 5 years ago

======================================================
Improvement proposals for GCL notation (gcl0x -> gcl1)
======================================================

Coming from gcl0x, there are 3 big areas of improvement:

1. Consistency issues
2. Constant and label definitions
3. Macro definitions

For what can become a well-rounded `gcl1', let's focus on 1 and 2.
We can leave 3 to a later `gcl2'.

Marcel van Kervinck · Answer 1 · Thu Jun 20 2019 05:54:11 GMT+0800 (China Standard Time)

==========================
Part 1: Consistency issues
==========================

Naming is a concept that is orthogonal to that of variables and
constants. gcl0x notation doesn't fully reflect this, and it is
inconsistent in some ways because of that. It got somewhat confused
because there are two namespaces at play: that of automatically
allocated user variables, and that of system defined constants
(some but not all may then refer to system variables).

As a result, there are shortcomings to gcl0x. For example, there is
no way to PEEK from a fixed known address, and accessing system
variables looks ugly in general.

But it can be improved. If we ignore byte-sized entities at first, we
then can have the following syntax scheme for the basic operations:

+-----------------------------------------------+-----------------------+
|                 GCL notation                  |   vCPU compilation    |
+-----------------------+-----------------------+-----------------------+
|    Named variables    |    Named constants    |                       |
+-----------------------+-----------------------+-----------------------+
|           old   new   |           old   new   |                       |
| Load       X     X    | Load      \C;   .C    |  LDW  $DD             |
| Store      X=    X=   | Store     \C:   .C=   |  STW  $DD             |
| Read       X;    X;   | Read            .C;   |  LDI  $DD + DEEK      |
| Write      X:    X:   | Write           .C:   |  DOKE $DD             |
| Address         &X    | Value     \C    \C   *|  LDI  $DD (or LDWI*)  | *See also table 3
+-----------------------+-----------------------+-----------------------+
|   Unnamed variables   |   Unnamed constants   |                       |
+-----------------------+-----------------------+-----------------------+
|           old   new   |           old   new   |                       |
| Load       i;   *i    | Load                  |  LDW  $DD             |
| Store      i:   *i=   | Store                 |  STW  $DD             |
| Read            *i;   | Read                  |  LDI  $DD + DEEK      |
| Write           *i:   | Write                 |  DOKE $DD             |
| Address    i     i    | Value      i ii  i ii |  LDI  $DD (or LDWI*)  | *See also table 3
+-----------------------+-----------------------+-----------------------+
        Table 1. Consistency improvements for word operations

Where:
  X     is a GCL variable's name, automatically allocated in the zero page
  i     is a small integer constant, e.g. 123 or $30
  C     is a symbol representing a constant on the host platform.
        Typically predefined in interface.json when compiling with
        compilegcl.py, or set in preceding assembly code with
        define() or label() when compiling from ROMvX.py. Typical examples
        are `vLR', `rawSerial', `screenMemory', `fontData', `buttonUp'.
        (This is how GCL programs link to application-specific SYS functions.)

Change summary
--------------
 .C     Treats symbol C as a GCL variable with address taken from the symbol table
        All operations possible on normal variables are possible (e.g. `.vLR=')
 \C     Unchanged, but now `\C;' and `\C:' have a nicer alternative syntax
 *i     Treats a small integer as a GCL variable with zero page address i
 &X     Takes the address of the automatically assigned GCL variable X.
        Added for completeness. Not even sure if we've needed it already.

Code example
------------
Old:    [if<=0 do \frameCount; #\DOKE #\vPCH loop]
New:    [if<=0 do .frameCount .vPCH: loop]
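
As a rough illustration of how the new named-constant forms in Table 1 could be lowered, here is a minimal Python sketch. It is not taken from gcl.py: the function name `lower_word_op`, the token pattern and the addresses in `SYMBOLS` are assumptions made for this example. The Read form (`.C;`) is left out, and `\C` is only handled for small values.

```python
import re

# Placeholder symbol table. The names come from the example above; the
# addresses are illustrative assumptions, not read from interface.json.
SYMBOLS = {'frameCount': 0x0e, 'vPCH': 0x17}

# Suffix of a `.C` token -> vCPU mnemonic, as listed in Table 1.
DOT_OPS = {'': 'LDW', '=': 'STW', ':': 'DOKE'}

# Prefix (`.` or `\`), symbol name, optional `=` or `:` suffix.
TOKEN = re.compile(r'([.\\])([A-Za-z_]\w*)([=:]?)')

def lower_word_op(token):
    # Returns (mnemonic, operand) for a single token.
    m = TOKEN.fullmatch(token)
    if not m:
        raise ValueError('unsupported token: ' + token)
    prefix, name, suffix = m.groups()
    address = SYMBOLS[name]
    if prefix == '\\':
        if suffix:
            raise ValueError('unsupported token: ' + token)
        return ('LDI', address)       # \C: the constant's value itself
    return (DOT_OPS[suffix], address)  # .C forms: C used as a variable

for token in '.frameCount .vPCH:'.split():
    print(token, '->', lower_word_op(token))
```

With these assumptions, the two tokens of the new-style example map to LDW and DOKE, matching the inline assembly of the old-style line.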

Marcel van Kervinck · Answer 2 · Thu Jun 20 2019 05:54:47 GMT+0800 (China Standard Time)

===============
Byte operations
===============

GCL variables are always word sized, while many system variables
are single byte, for example `rawSerial' and `romType'.

We can already address the individual bytes of named variables
with prefixes `<' and `>', but again the notation is ugly:
        X<,     Get low byte of X
        X<.     Set low byte of X
        X<++,   Increment byte of X

Only peek and poke look good (`X,' and `X.') but there's no
notation for using an unnamed variable for these. This leads to
cases of inline assembly:

  $7400 [do \POKE# \vAC# \vAC>++ if>0loop]      {Racer_v1.gcl}

With the prefix operators it is only mildly better:
        <X. >X.
        <X++ >X++

The instructions of concern are LD, ST, INC, PEEK and POKE (and LDI).
Let's make a table as we did above for the word operations.

+-----------------------------------------------+-----------------------+
|                 GCL notation                  |   vCPU compilation    |
+-----------------------+-----------------------+-----------------------+
|    Named variables    |    Named constants    |                       |
+-----------------------+-----------------------+-----------------------+
|           old   new   |           old   new   |                       |
| Load      X<,   <X    | Load      \C,   <.C   |  LD   $DD             |
|           X>,   >X    |                 >.C   |  LD   $DD+1           |
| Store     X<.   <X=   | Store     \C.   <.C=  |  ST   $DD             |
|           X>.   >X=   |                 >.C=  |  ST   $DD+1           |
| Increment X<++  <X++  | Increment \C<++ <.C++ |  INC  $DD             |
|           X>++  >X++  |           \C>++ >.C++ |  INC  $DD+1           |
| Read      X,     X,   | Read             .C,  |  LDI  $DD + PEEK      |
| Write     X.     X.   | Write            .C.  |  POKE $DD             |
| Address         &<X   | Value           <\C   |  LDI  $DD             |
|                 &>X   |                 >\C   |  LDI  $DD             |
+-----------------------+-----------------------+-----------------------+
|   Unnamed variables   |   Unnamed constants   |                       |
+-----------------------+-----------------------+-----------------------+
|           old   new   |           old   new   |                       |
| Load      i,    <*i   | Load                  |  LD   $DD             |
|                 >*i   |                       |  LD   $DD+1           |
| Store     i.    <*i=  | Store                 |  ST   $DD             |
|                 >*i=  |                       |  ST   $DD+1           |
| Increment i<++  <*i++ | Increment             |  INC  $DD             |
|           i>++  >*i++ |                       |  INC  $DD+1           |
| Read             *i,  | Read                  |  LDI  $DD + PEEK      |
| Write            *i.  | Write                 |  POKE $DD             |
| Address         <i <ii| Value           <i <ii|  LDI  $DD             |
|                 >i >ii|                 >i >ii|  LDI  $DD             |
+-----------------------+-----------------------+-----------------------+
        Table 2. Consistency improvements for byte operations

Our Racer example then becomes

Old:    $7400 [do \POKE# \vAC# \vAC>++ if>0loop]
New:    $7400 [do .vAC. >.vAC++ if>0loop]
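
The named-constant byte forms from Table 2 can be sketched the same way. This is a hypothetical illustration, not gcl.py code: `lower_byte_op` and the address used for `vAC` are assumptions, and the sketch takes the high byte to live at $DD+1 for LD, ST and INC alike.

```python
import re

# Illustrative address for vAC; an assumption made for this sketch.
SYMBOLS = {'vAC': 0x18}

# Optional `<` or `>` selector, `.`, symbol name, optional operator.
TOKEN = re.compile(r'([<>]?)\.([A-Za-z_]\w*)(\+\+|=|\.|)')

def lower_byte_op(token):
    # Returns (mnemonic, operand) for one byte-sized `.C` token.
    m = TOKEN.fullmatch(token)
    if not m:
        raise ValueError('unsupported token: ' + token)
    half, name, suffix = m.groups()
    address = SYMBOLS[name] + (1 if half == '>' else 0)
    if half:
        # <.C  <.C=  <.C++ and their `>` counterparts
        op = {'': 'LD', '=': 'ST', '++': 'INC'}.get(suffix)
    else:
        # .C. pokes through the pointer stored in C
        op = {'.': 'POKE'}.get(suffix)
    if op is None:
        raise ValueError('unsupported token: ' + token)
    return (op, address)

for token in '.vAC. >.vAC++'.split():
    print(token, '->', lower_byte_op(token))
```

Under these assumptions the new Racer line lowers to a POKE through vAC followed by an INC of its high byte, which is what the old inline assembly spelled out by hand.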

Marcel van Kervinck · Answer 3 · Thu Jun 20 2019 05:55:10 GMT+0800 (China Standard Time)

===============
Stack variables
===============

With prefix notation we can improve the LDLW/STLW operations:

        old     new
        ---     ---
        i%      %i      Get variable at stack offset i          LDLW  $DD
        i%=     %i=     Set variable at stack offset i          STLW  $DD

This hints better that '%i' is the name of a variable we can get and set.

For ALLOC, `i++' and `i--' are plain confusing. Without much prior
knowledge and coming from a C background, one expects it modifies
vAC. But it modifies vSP instead. The notation must therefore also
improve. We have a choice between two concepts:

Option A
        old     new
        ---     ---
        i++     i%+     Add i to vSP                            ALLOC $DD
        i--     i%-     Subtract i from vSP                     ALLOC -$DD

Option B
        old     new
        ---     ---
        i++     %i+     Add i to vSP                            ALLOC $DD
        i--     %i-     Subtract i from vSP                     ALLOC -$DD

I don't like option B, because we're not "adding" the stack variable `%i'.
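
For illustration, the prefix stack notation together with option A could be recognised as follows. This is a sketch only: `lower_stack_op` is a hypothetical helper, and encoding a negative ALLOC operand as a two's complement byte is an assumption of this example.

```python
import re

def lower_stack_op(token):
    # %i and %i= : get/set the variable at stack offset i
    m = re.fullmatch(r'%(\d+)(=?)', token)
    if m:
        return ('STLW' if m.group(2) else 'LDLW', int(m.group(1)))
    # i%+ and i%- : option A, add i to or subtract i from vSP
    m = re.fullmatch(r'(\d+)%([+-])', token)
    if m:
        n = int(m.group(1))
        # Assumed encoding: negative operand as an 8-bit two's complement
        return ('ALLOC', n if m.group(2) == '+' else -n & 0xff)
    raise ValueError('unsupported token: ' + token)

for token in ['%4', '%4=', '2%+', '2%-']:
    print(token, '->', lower_stack_op(token))
```

Keeping `%` in front for variables and behind the number for ALLOC makes the two cases distinguishable from the first character of the token, which suits a single-pass compiler.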

Marcel van Kervinck · Answer 4 · Thu Jun 20 2019 05:55:45 GMT+0800 (China Standard Time)

======================================
Part 2. Constant and label definitions
======================================

Now we can use the same notation to define labels and use them.
In the above terminology these are nothing but named constants.

First, I suggest retiring the `$300:' notation for setting the
compilation address, because in gcl0x the meaning of the postfix
colon depends on the magnitude... With prefix notation we have a
perfectly readable alternative:

        *=$300

This then leads to the following notation for defining labels:

        label=*

Or defining other constants:

        Blue=$20

We can use the normal operators with such constants:

        indent=2
        Pos \indent+ Pos=               { Same as: Pos 2+ Pos= }

We can even define our own zero page variables and bypass the
automatic allocation:

        V=$81
        1 .V=
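
The definitions above all reduce to one rule: `name=value`, where `*` stands for the current compilation address on either side. A minimal sketch of that bookkeeping, assuming a hypothetical `Constants` class that is not part of gcl.py:

```python
class Constants:
    """Toy symbol table for `name=value` definitions (hypothetical)."""

    def __init__(self):
        self.pc = 0        # current compilation address, written `*`
        self.table = {}    # named constants and labels

    @staticmethod
    def parse(text):
        # `$` prefixes a hexadecimal number, otherwise decimal
        return int(text[1:], 16) if text.startswith('$') else int(text)

    def define(self, statement):
        name, _, expr = statement.partition('=')
        value = self.pc if expr == '*' else self.parse(expr)
        if name == '*':
            self.pc = value            # *=$300 sets the address
        else:
            self.table[name] = value   # label=*  Blue=$20  indent=2

c = Constants()
for statement in ['*=$300', 'label=*', 'Blue=$20', 'indent=2', 'V=$81']:
    c.define(statement)
print(c.table)
```

In this model a label is nothing special: it is a constant whose value happens to be copied from `*`.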

Marcel van Kervinck · Answer 5 · Thu Jun 20 2019 05:56:59 GMT+0800 (China Standard Time)

==============
Implementation
==============

If constants or labels are defined later than used, a two-pass
compilation approach is needed. However, gcl.py is single-pass and
for simplicity we would really like to keep it that way. Fortunately,
asm.py is already doing something like this with its own symbol
table. The only limitation is that its mechanism only works for
retro-fitting byte values, not for arbitrary word values.  In GCL
we can still make use of the same mechanism if we tell it whether
we want a byte or a word.

As a notation, I propose we use single-\ for byte values, and double-\
for word constants. This is the most in line with previous usage:

        \C              for forcing a LDI instruction
        \\C             for forcing a LDWI instruction

In code generation the difference is between
        emitOp('LDI'); emit(lo('C'))
and
        emitOp('LDWI'); emit(lo('C')); emit(hi('C')).

However, in the current implementation of asm.py the following
will NOT give a warning:
        \C { ...code... } C=ii

This is dangerous: users rely on gcl0x automatically selecting the
correct instruction sequence when they write `\C', so that a value
is never silently truncated. It's not too easy to add '\\' to gcl0x,
so we need a slightly different approach: let the assembler do the
range check. It currently has _refsL[] and _refsH[] lists that end()
uses to pick the desired word half. We can add a third list that
behaves as _refsL[] but fails for out of range values: let's call it
_refsB[] and define:

def byte(name):
  _refsB.append((name, _romSize))
  return 0 # placeholder

We then get something like this:
        GCL     Generation
        ---     ----------
        \C      emitOp('LDI');  emit(byte('C'))
        \\C     emitOp('LDWI'); emit(lo('C')); emit(hi('C'))
        <C      emitOp('LDI');  emit(lo('C'))
        >C      emitOp('LDI');  emit(hi('C'))

        +-----------------------+-----------------------+
        |     GCL notation      |   vCPU compilation    |
        +-----------------------+-----------------------+
        |    Named constants    |                       |
        +-----------------------+-----------------------+
        |           old   new   |                       |
        | Value     \C    \C    |  LDI  $DD             |
        |                 \\C   |  LDWI $DDDD           |
        +-----------------------+-----------------------+
        |   Unnamed constants   |                       |
        +-----------------------+-----------------------+
        |           old   new   |                       |
        | Value      i     i    |  LDI  $DD             |
        |            ii    ii   |  LDWI $DDDD           |
        +-----------------------+-----------------------+
        Table 3. Consistency improvements for constant values
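
To make the proposed range check concrete, here is a simplified stand-in for the mechanism. It is a sketch under assumptions: the `Asm` class, its attribute names and the error message are invented for this example, and the real asm.py keeps this state in module-level variables and may resolve references differently.

```python
class Asm:
    """Toy model of deferred symbol resolution (not the real asm.py)."""

    def __init__(self):
        self.rom = []          # emitted bytes
        self.symbols = {}      # name -> value, possibly defined late
        self.refsL, self.refsH, self.refsB = [], [], []

    def emit(self, value):
        self.rom.append(value)

    # Each helper records where the byte will land and returns a placeholder.
    def lo(self, name):
        self.refsL.append((name, len(self.rom)))
        return 0

    def hi(self, name):
        self.refsH.append((name, len(self.rom)))
        return 0

    def byte(self, name):
        self.refsB.append((name, len(self.rom)))
        return 0

    def end(self):
        # Patch all placeholders once every symbol is known.
        for name, where in self.refsL:
            self.rom[where] = self.symbols[name] & 255
        for name, where in self.refsH:
            self.rom[where] = (self.symbols[name] >> 8) & 255
        for name, where in self.refsB:
            value = self.symbols[name]
            if not 0 <= value <= 255:
                raise ValueError('%s=%d does not fit in a byte' % (name, value))
            self.rom[where] = value

a = Asm()
a.emit(a.byte('Blue'))                    # operand of an LDI
a.emit(a.lo('screen'))                    # operands of an LDWI
a.emit(a.hi('screen'))
a.symbols.update(Blue=0x20, screen=0x0800)  # defined after use
a.end()
print(a.rom)
```

The point of the third list is only the failure case: a symbol used through `byte()` that later turns out to be a word value stops the build instead of silently dropping its high byte.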

Edit 2019-06-21:

    \\C             for forcing a LDWI instruction

I figure the \\-notation isn't necessary. We should simply always emit LDWI for labels that are still unresolved. Therefore we can always keep typing single-\ as before.
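
The rule from this edit can be sketched as a small selection function. `select_load` is a hypothetical name and the example symbols are assumptions; the sketch only shows the decision, not the code emission.

```python
def select_load(name, symbols):
    # Pick the instruction for a constant reference in a single pass.
    value = symbols.get(name)          # None while still unresolved
    if value is not None and 0 <= value <= 0xff:
        return 'LDI'                   # already known and fits in a byte
    return 'LDWI'                      # unresolved, or needs a full word

known = {'Blue': 0x20, 'screenMemory': 0x0800}
for name in ['Blue', 'screenMemory', 'definedLater']:
    print(name, '->', select_load(name, known))
```

The trade-off is one extra byte for forward references to small constants, in exchange for never truncating a value.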

Marcel van Kervinck · Answer 6 · Thu Jun 20 2019 05:57:47 GMT+0800 (China Standard Time)

==============
Migration path
==============

gcl0x: Add warnings for notations that will be removed or change in gcl1
       Accept the new notations already, to help migration
                Warn i=   -> i:
                Warn \ii  -> \\ii
                Warn ii:  -> *=ii
                Support #i `text
                Support <X++ >X++
                Support <X, >X.
                Support \\ii

gcl1:  Make incompatible changes
                Remove i=
                Remove ii:
                Remove \ii
       Add warnings for old notations that have a nicer new alternative
                Warn i#   -> #i or `text
                Warn X<++ -> <X++
                Warn X>++ -> >X++
                Warn X<,  -> <X,
                Warn X>.  -> >X.

gclN:  Hypothetical future version
                Remove i#
                Remove X<++ X>++
                Remove X<, X>.
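
As an example of what the gcl1 warnings could look like, the postfix byte selectors can be rewritten mechanically to their prefix form. This is a hypothetical helper (`suggest_prefix_form` is not an existing function), covering only the `X<++`, `X>++`, `X<,` and `X>.` style notations from the list above.

```python
import re

# Name, then `<` or `>`, then `++`, `,` or `.`
POSTFIX = re.compile(r'([A-Za-z_]\w*)([<>])(\+\+|[,.])')

def suggest_prefix_form(token):
    # Returns the new spelling for an old postfix token, or None.
    m = POSTFIX.fullmatch(token)
    if not m:
        return None
    name, half, op = m.groups()
    return half + name + op

for token in ['X<++', 'X>++', 'X<,', 'X>.', '<X++']:
    new = suggest_prefix_form(token)
    if new:
        print('Warning: %s is deprecated, use %s' % (token, new))
```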

Marcel van Kervinck · Answer 7 · Thu Jun 27 2019 01:36:42 GMT+0800 (China Standard Time)

Labels now supported with this commit: 7347915

Marcel van Kervinck · Answer 8 · Tue Oct 22 2019 17:23:19 GMT+0800 (China Standard Time)

I plan another change:

The SYS call operator should change from i! to i!!, with i still the maximum number of needed cycles. For example 134!! when calling SYS_VDrawBits_134. The reason is the new CALLI operator needs a notation, and i! is by far the most logical (or actually ii! for an immediate call to address ii).

This way ! refers to RAM calls (e.g. F! and $2600!), and !! refers to ROM calls. Different address spaces, different instruction set, different meaning of the operand.

In the transition period, the compiler can emit SYS for i<256 and not complain. With gcl1 this usage should be flagged as deprecated.
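
A sketch of that transition rule for numeric operands, assuming a hypothetical `lower_call` helper. It returns the instruction, the operand as written, and a flag marking the legacy `i!` spelling so the caller can decide whether to warn. Calls through a variable such as `F!` are not covered.

```python
import re

# Decimal or $hex number followed by `!` or `!!`
CALL = re.compile(r'(\$[0-9A-Fa-f]+|\d+)(!!?)')

def lower_call(token):
    m = CALL.fullmatch(token)
    if not m:
        raise ValueError('unsupported token: ' + token)
    text, bangs = m.groups()
    value = int(text[1:], 16) if text.startswith('$') else int(text)
    if bangs == '!!':
        return ('SYS', value, False)    # ROM call, value = max cycles
    if value < 256:
        return ('SYS', value, True)     # legacy i!, still accepted
    return ('CALLI', value, False)      # RAM call to address ii

for token in ['134!!', '134!', '$2600!']:
    print(token, '->', lower_call(token))
```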

Marcel van Kervinck · Answer 9 · Mon Apr 06 2020 18:01:34 GMT+0800 (China Standard Time)

From Docs/GCL-language.txt:

We foresee three versions of GCL: gcl0x, gcl1 and gcl2.

gcl0x is what we used to make the built-in applications of ROM v1. It is still evolving, sometimes in backward incompatible ways.

gcl1 will be the final update in notation once we've settled on what GCL should really look like. gcl0x has some inconsistencies in notation that are confusing. Some aren't easy to resolve while maintaining its spirit. We won't take this step easily.

gcl2 will add a macro system. Parentheses are reserved for that.