aardappel / lobster

The Lobster Programming Language

Home Page: http://strlen.com/lobster

Language cleanup: remove co-routines?

aardappel opened this issue

Co-routines have been in Lobster since almost the beginning, because they are in theory a great match for a language that aims to be used for programming games and other interactive things: games require you to program over time (across frames), which languages have no natural construct for, and co-routines allow code to be resumed in steps (across frames). It seemed like a natural match.

But other than examples, I am not actually using them in my own code. Why?

  • They are fairly expensive. Upon each yield or resume, they copy the part of the stack they occupy, including stack metadata, and including managing reference counts etc. Over time, Lobster has moved from a dynamic and very high level language to a static, fast, and still fairly high level language, and I prefer to use features I know are fast.
  • For simple use cases the difference between using them and whatever handwritten code would replace them is fairly small. Only complex webs of co-routines with complex internal state and complex interactions would allow the feature to really shine, but.. I don't write such code.
  • As soon as you start using co-routines as your game objects, you want to start using them as "objects". But they can't have additional methods beyond "resume". Lobster even has a feature that, to my knowledge, is pretty unique: it allows you to access co-routine state while the co-routine is dormant. But it is all still pretty clumsy.

For comparison, here's the shooter tutorial code in different forms:

  • Basic, imperative case with global variables: https://github.com/aardappel/lobster/blob/master/samples/shooter_tutorial/tut6.lobster. Note how you need global variables because the state needs to survive multiple iterations of the frame loop. This is the simplest for tutorial purposes, though.
  • With co-routines: https://github.com/aardappel/lobster/blob/274f76ca1f36742469090befbdb7d7eaa108af8f/samples/shooter_tutorial/tut_coro.lobster. And yes, this code indeed does look nicer: state is all local, and algorithms read more naturally; in particular, look at bullet and how it simply is a while loop that advances position and renders itself! But already here the cracks show: notice how enemy needs to access the state of the bullets. This works because of Lobster's special state access feature (see the -> operator), but it would not scale to more complex interactions.
  • Using frame state: https://github.com/aardappel/lobster/blob/5f7a73fb718d4af87ae5b1affac4017fff2d5606/samples/shooter_tutorial/tut_log.lobster. Wait, what's this? This is an even less known and used Lobster feature that I would like to kill alongside co-routines. The ?= operator allows access to the value of that variable from the previous frame (!). Magic! It is inspired by "Functional Reactive Programming", but is an imperative version of it. It indeed appears to result in the simplest code: no global state, no co-routine abstractions necessary. But it scales even less than co-routines. It doesn't work well with for-loops, so for that state we still have to use vectors. It would absolutely not scale to a complex game.
  • Using objects: https://github.com/aardappel/lobster/blob/master/samples/shooter_tutorial/tut_obj.lobster. I only just added this version, for the purpose of this discussion. Why did I add this very obvious version only just now? Well, for one because I've always been a big hater of "object oriented" over-engineering, and with the multimethods Lobster used to have, this code would have been somewhat clumsy. But frankly, this might well be the nicest version: not really any more complex than the co-routine version, and easier to extend. It's not even object oriented, in the sense that no dynamic dispatch is used anywhere.

Besides the above reasons why for programmers this may not be quite as amazing a feature as you'd think, co-routines have an overly large complexity footprint on the implementation. When I added the type inferencer and the lifetime checker, I must have spent at least 20% of my time (representing months of work) making it all work with co-routines. Co-routines deeply integrate with function calling, the stack, variables, and memory management, in complex ways.

My aim for Lobster is to keep moving it forward to become a faster language, and the next step would be to rework how function calling works, which would require a ton of special cases to keep co-routines alive.

For these reasons, I am considering cutting the feature out entirely.

As scary as that sounds, I also think that most programming languages accumulate features indefinitely until they collapse under their own weight. You typically can't cut features without angering users. But co-routines are something that hardly anyone uses, so it can still be done, and it would make Lobster a simpler, sleeker, faster language. Also, I am but one person, so my ability to make the language better will be improved with fewer features to support.

You might say, but, can we make co-routines simpler/faster?

You could reduce co-routines to just "generators" (like in Python). The advantage of generators is that you don't need to do all this stack saving, since you can simply "call" the for-loop body that drives the use of the generator from the yield statement, leaving the generator on the stack. But wait! We already have this functionality in Lobster, with higher order functions! With non-local returns, you can even exit from these "generators" (the caller is in control, much like with generators). They are not composable in the same way as generators, but most importantly, they are super fast, and well supported by the language and type system.
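To illustrate the pattern, here is a minimal sketch in Lua (the language used elsewhere in this thread; countdown is a made-up name, and Lobster's own higher order functions would look different): instead of yielding values to its caller, the "generator" calls the caller-supplied loop body directly, so nothing ever needs to leave the stack.

-- Hypothetical generator-as-higher-order-function: the caller passes in
-- the loop body, and the "generator" simply calls it at each step.
local function countdown(n, body)
  while n > 0 do
    body(n)      -- this call stands in for "yield n"
    n = n - 1
  end
end

countdown(3, function(i) print(i) end)  -- prints 3, 2, 1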

Another alternative is to regard co-routines merely as a code transformation, similar to C++20 co-routines. A co-routine would be translated to a class, with all its local variables becoming member variables, and all its control flow changed into a switch such that you can yield from anywhere. You add an extra variable that indicates where you yielded from, such that on the next resume you can return to where you left off. This is not trivial, since it requires translating all control flow into this form, but at least you'd end up with predictable performance, and more importantly, not another runtime feature (since it would use existing constructs).

A further idea is something with explicit states, like UnrealScript used to have.

Either way, I don't want to commit to a new co-routine feature right now, but I am likely to remove it until we get there. So, please discuss, throw in your objections or alternative ideas :)

If I remove them, I will make sure it will be in one neat commit, so we can point to a clear diff that shows how they (used to) fit into the language. Just in case we ever change our minds :)

Boiling down a complex idea into my own intuitions:

Remove coroutines for now. Don't think about them or design around "here's where I need to leave space for such a thing," because you've already run the experiment and found them lacking. It will take time/energy/complexity budget away from other potential improvements, for a feature you don't personally use.

In principle, the amount of work you'll need to do for a hypothetical Lobster 2.0 that has coroutines is going to be some fixed amount, whether you do the work to add coroutines now or later. And if you're not totally sold on the concept, why add them in 1.1 when they can wait for 1.9? (Version numbers made up to give a sense of relative timescale.)

In my experience the problem with coroutines is that they fall short of what you really want, which is cooperative multithreading. I have only seen proprietary game scripting languages (like CoD's GSC) get this really right, though Lua's coroutine implementation comes reasonably near.

The issue with the above example is when the game flow gets more complicated than "bool playing". Here's a simple WIP from something my kid and I have been making called ScreamBox; I can provide the code, but GitHub won't let me post it. Keeping track of the problem flow would be really hard without coroutines, yet they make it incredibly straightforward.

[screenshot: ScreamBox gameplay]

Essentially what you want is the ability to call a function which can internally freeze the entire callstack into a "waitable thread" which will be resumed after some condition. For example if you wanted ambient sparks you would write:

function spark_thread(x, y)
  while true do
    spawn add_spark(x, y)
    wait(random(0.1, 0.5))
  end
end

function ambient_sparks()
  spawn spark_thread(1, 5)
  spawn spark_thread(3, 5)
  spawn spark_thread(5, 5)
end

In this case spawn runs the function in a new thread context, and wait puts the current thread to sleep for the given number of seconds. It's also important to have signals, so that wait can be given a signal name instead of a time, and threads can unlock other threads when conditions are met.

Doing this with Lua's asymmetric coroutines is possible since a yielding coroutine goes all the way up to the top level function. I believe it's even possible to write a debugger which shows the list of active threads and what they are waiting on, can quickly switch contexts while debugging, etc. This kind of fast and user friendly debugger is an essential part of any professional game scripting system.

@wadetb thanks for the feedback!

Lobster's current co-routines are indeed asymmetric much like Lua, and each have their own stack that is not restricted to the top level function. This is powerful, but indeed also expensive.

Something like wait can be implemented on top of this system, in the simplest case by having the yield return how long they want to wait, and the code that drives calling resume to take this into account. Or any other structure.
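For instance, here is a toy sketch of that simplest case, written in Lua for brevity (its asymmetric co-routine API is close in spirit; all names are made up): the co-routine yields the delay it wants, and the driver waits that long before resuming it.

local co = coroutine.create(function()
  for i = 1, 3 do
    print("tick", i)
    coroutine.yield(0.5)  -- "please wait 0.5 seconds before resuming me"
  end
end)

local wake_at = 0
local function drive(now)  -- call once per frame with the current time in seconds
  if now >= wake_at and coroutine.status(co) == "suspended" then
    local ok, delay = coroutine.resume(co)
    if ok and delay then wake_at = now + delay end
  end
end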

I agree that it makes for compelling examples with code that is drastically simpler than tracking the time yourself. I am not sure it scales though. Often, you may need to interrupt the wait, still check for other things (or advance an animation) while waiting, or generally do multiple things at the same time. You can try to bake that into the system but it gets complicated fast.

Ah cool, I didn't know Lobster's coroutines are asymmetric - that's great. TBH I just got to this ticket via Twitter and haven't used Lobster yet! Looks cool though; it would be fun to put it into TIC-80. Also, I didn't know that Lua's coroutines were capable of this either until very recently, though I've been using TIC-80/PICO-8 for quite a while.

I totally get that coroutines are expensive to maintain, but I think part of the problem is people are still mentally stuck on FSMs when they should be using cooperative multithreading.

Regarding waiting on multiple things, the wait primitive works as you suggest, though it can take a time value (in frames, even though I gave seconds above), a signal name, or a list of signal names, and ideally a timeout value (like select in POSIX). This provides efficiency, since the scheduler can wake up coroutines only when their conditions are met, rather than activating the coroutine and polling each frame. For instance, a coroutine that is waiting for the player to hit a trigger costs nothing. Of course, any arbitrary wait condition can still be implemented as a loop over a single-frame wait.
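As a hypothetical sketch of that wakeup scheme (not the actual game code), the scheduler can index suspended coroutines by signal name, so that a waiting thread consumes no cycles until its signal actually fires:

local waiting = {}  -- signal name -> list of suspended coroutines

local function wait_signal_indexed(sig)
  waiting[sig] = waiting[sig] or {}
  table.insert(waiting[sig], coroutine.running())
  coroutine.yield()  -- sleep until someone fires the signal
end

local function set_signal_indexed(sig)
  local woken = waiting[sig]
  waiting[sig] = nil
  for _, co in ipairs(woken or {}) do
    coroutine.resume(co)  -- wake exactly the threads that asked for this signal
  end
end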

Animation and other asynchronous tracks are accomplished by just adding lots of cooperative threads - each with their own stack and wait state. Cleanup is also easy: just drop coroutines from the list of threads and let the GC collect their state, for instance when leaving a level via the pipe.

In my example above the toad floating animation and dialogue are managed in separate threads. The subtle bounce when you hit a block is a separate thread per block. Each coin is a thread. And then there is a principal thread which advances the problem state. The principal thread communicates to the hint thread through signals.

If a language supported cooperative multithreading as a core feature (via coroutines or something in the language runtime itself), with an excellent debugger (breakpoints on signals and so forth), it would be amazing. This could be built on Lua using the debugger hooks, and likely Lobster too.

Just to give a concrete example, here is the scheduler from the above game. It doesn't have everything you'd want and it's not that efficient, but hopefully gets the idea across:

function add_thread(id, fn, fields)
  -- mergo is a user helper that merges the optional 'fields' table into the defaults
  eq.threads[id] = mergo({
    id = id,
    co = coroutine.create(fn),
    wait = 0
  }, fields or {})
end

function update_threads()
  for k,t in sorted_pairs(eq.threads) do
    if t.wait > 0 then
      t.wait = t.wait - 1
    else
      local status, r = coroutine.resume(t.co, table.unpack(t.args or {}))
      if not status then
        trace("thread " .. k .. " died:\n" .. r)
        eq.threads[k] = nil
      else
        if r ~= nil then
          t.wait = r
        else
          eq.threads[k] = nil
        end
      end
    end
  end
end

function wait_frames(delay)
  coroutine.yield(delay)
end

function wait_signal(signals)
  if type(signals) == "string" then
    signals = {signals}
  end
  while true do
    for _, sig in ipairs(signals) do
      if eq.signals[sig] ~= nil then
        eq.signals[sig] = nil
        return sig
      end
    end
    coroutine.yield(1)
  end
end

function set_signal(signal)
  eq.signals[signal] = true
end

And as an example of usage, here's the little floating hint guy:

  add_thread("boo_float", boo_float)

  add_thread("boo_sum_digits", function()
    local sum_exclaims = { "good! ", "hee hee! ", "ya ha! ", "wow! " , "" , "" , "" } 
    local carry_exclaims = { "", "well carried! ", "carry-ific! ", "carry-tastic! ", "...? " , "...? " , "...? " , "...? " } 
    local places = {"1s", "10s", "100s", "1000s", "10000s", "100000s", "millions"}
  
    local n_carried = 0
    local d = 1
    local exclaim = "hint: "
    
    while true do
      local sig = wait_signal({"boo_next_digit", "boo_carry", "boo_finished"})

      if sig == "boo_next_digit" then
        boo_say(exclaim .. "hit the box to add the " .. places[d])
        exclaim = sum_exclaims[d]
        d = d + 1

      elseif sig == "boo_finished" then
        boo_say("and now you have your answer. hee hee!")
        wait_frames(180)
        boo_disappear()

        break

      elseif sig == "boo_carry" then
        if n_carried == 0 then
          boo_say("ya ha! the sum of the " .. places[d - 1] .. " is greater than ten")
        else
          boo_say("wha?! the sum of the " .. places[d - 1] .. " is ALSO greater than ten")
        end
        n_carried = n_carried + 1
        exclaim = carry_exclaims[d]
      end
    end

  end)

@wadetb Yup, I had something similar in mind to your code, and this would work in Lobster as-is just the same. The signals are a nice addition, though your current code checks for them once per frame, which can be fixed.

As someone who doesn't use Lobster (yet?), one thing to be aware of is that Lobster was never meant as a pure "gameplay scripting" language, one that sits in an engine and is only called when an entity needs to react. It is meant as a language you write the whole game in, including parts you'd traditionally think of as part of the engine, with only the most performance sensitive bits of rendering/physics code remaining in C++ library code. As in, the language is the whole program. Even in the simple examples linked in my original post, you can see the language is in control of the frame-loop, not the engine. It's the language calling into the engine to render stuff, not the engine calling into the language to do gameplay stuff. As such, it has a focus on being a relatively static, fast language that can do everything on a per frame basis, much like most C++ engine code.

I'd still like to enable the kind of gameplay patterns you are talking about, but I'd rather add lightweight features that enable people to program such a system in user code than the currently heavyweight language feature that is asymmetric co-routines.

You can implement your system in a language without co-routines, by passing to wait either a lambda / function value that is the continuation (execute this function once we're done waiting), or passing a string/enum and having a switch statement that tests for those values. That of course is excessively clumsy, especially with regard to variables that need to be available both before and after wait, or even worse, if you wanted to do wait inside a loop. So the question is, what language features can we supply that make this easier?
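To make that clumsiness concrete, here is a rough continuation-passing sketch in Lua (all names invented for illustration):

local pending = {}  -- { wake = frame_number, k = continuation }
local frame = 0

local function wait(frames, continuation)
  pending[#pending + 1] = { wake = frame + frames, k = continuation }
end

local function pump()  -- call once per frame
  frame = frame + 1
  for i = #pending, 1, -1 do
    if frame >= pending[i].wake then
      local job = table.remove(pending, i)
      job.k()
    end
  end
end

-- The pain point: any state that must survive the wait has to be captured
-- by the closure, and a "loop containing a wait" turns into re-registration.
local sparks = 0
local function spark_loop()
  sparks = sparks + 1
  wait(15, spark_loop)  -- roughly a quarter second at 60fps
end
spark_loop()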

To elaborate on my original post, when I mentioned that I'd rather have a co-routine feature that would translate to existing functionality, I mean something like this:

coroutine co(n:int):
    while n:
        yield n
        n--
    return 0

To get translated into:

struct co:
    n:int
    state:int = 0
    def resume():
        while true:
            switch state:
                case 0:
                    if n:          // while condition
                        state = 1
                        return n   // body up to yield
                    else:
                        state = 2  // exit loop
                case 1:            // body following yield
                    n--
                    state = 0      // loop back
                case 2:            // beyond loop
                    return 0

That may look like a horrible code explosion, but it is actually very efficient, certainly an order of magnitude more efficient than the current co-routines. And it requires zero support in the codegen and runtime: these things would behave like regular objects. And it can still implement most of what @wadetb likes, with the exception that it is not stackful anymore, so you can't yield from a function you call.
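To underline that no runtime magic is involved, the same translated co-routine can be written out by hand today; here it is in Lua as a plain table with a resume function, mirroring the struct above:

local function make_co(n)
  local self = { n = n, state = 0 }
  function self.resume()
    while true do
      if self.state == 0 then
        if self.n ~= 0 then
          self.state = 1
          return self.n           -- body up to the yield
        else
          self.state = 2          -- exit loop
        end
      elseif self.state == 1 then -- body following the yield
        self.n = self.n - 1
        self.state = 0            -- loop back
      else                        -- beyond the loop
        return 0
      end
    end
  end
  return self
end

local c = make_co(3)
print(c.resume(), c.resume(), c.resume(), c.resume())  -- 3  2  1  0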

That said, even implementing the above (unpacking a while loop to all these states) would be a bit of work, so not making any promises just yet :)

I've been thinking about other co-routine-like features that are less "heavy" than an actual co-routine.

Essentially, with a co-routine you have a function call that can be cut down the middle. Execute some of it now, and the rest later, after the caller does some stuff in between.

This relates to another feature which I would like to have at some point: proper tail calls:

def f(a):
    // some stuff goes here
    return g(a + 1)

Normally, g would execute while f is still unfinished (and thus on the stack), so such calls, if they're recursive, risk blowing up the stack. But what if instead we unwind the call to f, and only then call g? It would be pretty simple; it would just require a different kind of return that takes a function value (in addition to the args to pass it), and calls it as the very last thing it does.

This is cool because you can use tail calls like a kind of goto: jump to this function next, without any stack "cost". E.g. to implement state machines.
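Lua happens to guarantee tail-call elimination already, so this "goto" use can be sketched there directly; the three states below are made up for illustration:

local green, yellow, red  -- forward declarations so the states can refer to each other

function green(steps)
  if steps == 0 then return "done" end
  return yellow(steps - 1)  -- proper tail call: no stack growth
end

function yellow(steps)
  if steps == 0 then return "done" end
  return red(steps - 1)
end

function red(steps)
  if steps == 0 then return "done" end
  return green(steps - 1)
end

print(green(1000000))  -- a million state transitions in constant stack space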

But what if instead of calling this function value at the end of return.. we store it? Like a mini "closure" object (except that it doesn't close over anything, it just has arguments), to be called at any point in the future. This essentially would implement something like co-routines.

def iterate(n):
    if n:
        return n, tail iterate(n - 1)
    else:
        return 0, nil

v, c = iterate(10)
// do other stuff in between
// call the "tail"
v, c = resume c

(syntax experimental).
Maybe not as syntactically convenient as a while loop in a co-routine, but it circumvents the whole need to turn all control structures into a switch, and to move all variables into a class. Here, the control flow is the function pointer, and the data is whatever you pass it, maximally making use of existing features.
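For comparison, this shape can be emulated today with closures; here in Lua (noting that a Lua closure captures its environment, whereas the proposed "tail" object would only carry its arguments):

local function iterate(n)
  if n > 0 then
    return n, function() return iterate(n - 1) end  -- a value, plus "the rest"
  else
    return 0, nil
  end
end

local v, c = iterate(10)
-- do other stuff in between
v, c = c()  -- "resume" the stored tail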

You could extend this with a way to make these tail calls be unnamed functions, e.g.

return tail fn(n): ...

where n is a variable automatically passed in from the context.

Aaand.. it is done.. co-routines are gone.

The last version of the language that does contain co-routines is marked with the label last_coroutine in git, and the diff showing exactly what was all removed (and thus gives a good picture of how it used to work) is 9483b70.

Similarly, the last version that has frame log functionality is last_frame_log and the diff is 274f76c.

So I'm a bit late to the party, but I'd like to reply to this question:

So the question is, what language features can we supply that make this easier?

I'd like to suggest another paradigm to look into that hopefully has some good ideas to steal from, and might partially solve the problem: synchronous concurrency, as used in languages like Céu and Esterel[0][1]. (also, I have the feeling I have suggested this before, but I couldn't find anything in this repo)

Céu has a single-threaded and synchronous concurrency model, and instead of threads it uses trails. Now, trails do not execute in parallel, but suspend execution and await in parallel. They execute like normal sequential code until they hit an await keyword, at which point they suspend execution until the event they are waiting for fires.

Here's a simple Céu code sample showing how this works:

loop do
    par/and do
        await 100ms;
        _printf("Hello ");
    with
        await 250ms;
        _printf("World!\n");
    end
end

We construct two trails with the par/and construct. Both immediately suspend and tell the scheduler to be woken up later: one in 100ms, and one in 250ms (as you can see, Céu supports waiting for a certain time as an event). After 100ms pass, the scheduler wakes up the first trail, which prints "Hello " and then terminates. Then 150ms later the second trail resumes, prints "World!" and terminates.

The and in par/and means that the parallel composition waits until all trails are finished before rejoining and continuing with the code below it - in this case it loops. There is also the par/or construct, which terminates when any one trail terminates (technically speaking, the remaining trails are aborted rather than resumed). For example:

input  none   BUTTON;
output on/off LED;
par/or do
    await BUTTON;
with
    loop do
        await 1s;
        emit LED(on);
        await 1s;
        emit LED(off);
    end
end
emit LED(off);

We have an external input event that reacts to a button press, and an output event that is wired up to a bit of code that turns an LED on or off.

Here the first trail immediately suspends and waits for a button press, after which the second trail starts its loop and suspends itself for one second. Now both trails are suspended. Assuming no button is pressed, the second trail "awakes" one second later, emits LED(on), then suspends for a second again, after which it will awake and emit LED(off), loop, and suspend for another second, etc.

However, as soon as a button press event is fired, the first trail resumes and terminates. Because this is a par/or group of trails, this aborts the second trail as well. In other words: this code says "blink every second until someone pushes a button, which turns off the LED and ends the program". I personally find it very intuitive to model concurrency this way.

This paradigm was originally designed for real-time embedded systems. Combined with the built-in language support for reacting to timed events, that gives it some very interesting properties:

  • the code is essentially organized around waking up suspended trails. That means that the above example basically lets an embedded device go into a sleep state for one second, do a tiny bit of work, and sleep again. That makes it quite easy to write software that is energy efficient, which is useful in embedded situations where battery life is a concern.
  • trails don't need a stack and are very memory-efficient: they have only a mere handful of bytes of overhead per trail. Most forms of concurrency have at least a few kilobytes per (green) thread. The only examples I know of that can compete are "fibers" and other thread systems designed for embedded use (this seems to be a product of designing around embedded programming constraints).
  • the order of trail execution is deterministic. When two trails resume execution in response to the same event, they do so in top-to-bottom order. Take the following rewrite of the first example:
par/and do
    // "every X do <body> end" is syntactic sugar for:
    // loop do
    //     await X;
    //     <body>
    // end
    every 100ms do
        _printf("Hello ");
    end
with
    every 250ms do
        _printf("World! ");
    end
end

This will print Hello Hello World! Hello Hello Hello World! - after 500ms both trails wake up "at the same time", in which case the trails are awakened in lexical order. Hence three Hellos in a row, instead of Hello Hello World! Hello Hello World! Hello.

It should be noted that external events (like the button press) are inherently asynchronous inputs - when many small embedded devices communicate, they will not agree on "global" time (say, when two devices fire events at each other, they might disagree on which event fired first). Céu guarantees that reactions to external events are scheduled in order of arrival, so the determinism mentioned before only applies to that, and to the order of execution of local trails and internal (synchronous) events.

One downside to trails is that the scheduler does not scale very efficiently CPU-wise when using a lot of trails in one "program". I don't know if that is inherent to the paradigm or an implementation detail of Céu (I think it uses a simple O(n²) algorithm, possibly to minimize memory overhead). Either way, in the context of embedded devices that is rarely an issue - there aren't hundreds or thousands of trails running in parallel the way other languages use (green) threads. In practice, languages in this paradigm design their programs with a "globally asynchronous, locally synchronous" model - one could say that each device is like a CPU running its own thread, and scaling happens by using multiple devices that communicate.

What Céu has been experimenting with over the years is trying to combine this concurrency approach with object-oriented approaches to create "reactive" data structures:

code/await Hello_World (none) -> NEVER do
    every 1s do
        _printf("Hello World!\n");  // prints "Hello World!" every second
    end
end
await Hello_World();                // never awakes

I think @wadetb's example of ambient sparks could easily be modeled with this, but this post is long enough already.

So anyway, I'm quite a fan of all this, because I really think that a lot of the time when reaching for concurrency we're really dealing with "small, local" concurrency where this would be the most intuitive approach. If it is combined with "big, heavy" thread- or agent-based concurrency to build more complicated systems, the scaling problem might go away too. For example, each thread could handle its own synchronous events, and they communicate via asynchronous events.

Sadly, Céu is basically a semi-abandoned research language at this point, so I'm always trying to get other people to steal cool ideas from it :p. The (fairly small) manual and the published research papers cover a lot more design ideas that were tried out, and give a feeling for how the language "ticks"[2][3].

Of particular interest might be:

Structured Synchronous Reactive Programming for Game Development — Case Study: On Rewriting Pingus from C++ to Céu

A GALS Approach for Programming Distributed Interactive Multimedia Applications

Hope there were some interesting ideas in here :)

Keep up the good work!

[0] http://ceu-lang.org/

[1] https://en.wikipedia.org/wiki/Esterel

[2] https://ceu-lang.github.io/ceu/out/manual/v0.30/

[3] http://ceu-lang.org/publications.html

@JobLeonard Thanks for the explanation! I am actually somewhat familiar with both Esterel and Céu already, as I've done a fair bit of design around dataflow languages, and find these synchronous languages super elegant.

There are a couple of things though that make these languages not directly suitable for games. First is that the amount of parallelism needs to be dynamic, i.e. you'd rarely have a par statement with exactly N cases; instead, you want each game object to have a "cooperative thread" attached, and these objects spawn and die constantly. There may be hundreds or even thousands of them, and they may either run the same or different code.

Ideally they would support function calling inside that thread, though I am ready to give up on the idea that they should be able to yield from within a function call, as that is way too expensive implementation-wise.

Then there is timing: games are first and foremost frame based, and often want to do something every frame. You may not assume a frame is always 16.6ms; if things slow down for whatever reason, the simulation should work reasonably with a timestep of any amount of ms. That makes things less deterministic than these synchronous languages, though all "threads" will experience the same timing changes.

In the end, if Lobster re-gains a feature to support cooperative threads, I am looking for something fairly efficient that people can build their use case on, whether that looks like co-routines, async-await, or something else.

Hah, I don't know why but I feel like I should have expected you to be familiar with these languages! I'll just reply to your remarks as FYIs to keep in mind, not in an attempt to convince you to use this model. I'm pretty certain you'll come up with a good trade-off between an efficient yet elegant solution in the long run.

Regarding the dynamic threads, I assume you're already familiar with the pool containers of Céu and that this doesn't quite do the job for some reason? Just in case you are not: essentially, one can spawn those code/await objects inside a pool, which may be bound or unbound:

Bound pool that can spawn at most five Birds:

[screenshot: Céu code spawning Bird instances into a bounded pool of five]

Unbound pool that is only limited by memory and scheduler constraints:

[screenshot: Céu code spawning Bird instances into an unbound pool]

(screenshots from An overview of Céu, code can be found at https://github.com/fsantanna/ceu-sdl-birds)

If the second image had a bound pool, it would simply not spawn anything if there were five active birds - that makes it quite easy to, say, make a particle spawner with a limited number of particles.

Regarding frames, the ceu-sdl implementation gives one example of how one could handle this: by treating every frame as an external (hence async) event that passes the time difference with the previous frame as a number. That way one can make the code respond to variable time steps.

It's also easy to make multiple loops that react to an initiated frame: just do every DRAW do <...> end

Recall that Céu is also guaranteed to always schedule external events in order, so in the (hopefully rare) case that so much calculation happens in one frame that it takes more than the 16.6ms frame budget, the next frame event is still scheduled to fire immediately after all events triggered by the first frame end.

For example, the Bird class.. actor.. er.. the Bird code/await object above reacts to two events: UPDATE and DRAW, each of which is initiated by the SDL environment:

[screenshot: the Bird code/await object reacting to the UPDATE and DRAW events]

What's kind of cool here is that the code/await object frees up its entry in the pool when it terminates. Similarly, pools are also lexically scoped, so that makes memory management quite simple:

[screenshot: Céu code with a lexically scoped pool]

(aside: honestly, to me this code/await stuff sounds just like a struct that can only be communicated with via events.. which sounds awfully actor-like to me but I guess those have formal, rigorous definitions that aren't quite met)

Anyway, you're absolutely right that the async external events make the code less deterministic, but it's still "locally deterministic": if one module has multiple trails that react to the same event, they still do so in a deterministic fashion. So that's still pretty easy to reason about.

Writing this out reminded me why I ended up not using Céu so much, aside from it feeling a bit verbose: the language makes it impossible to hold on to references or pointers outside of scope, to ensure we never interact with a "dead" code/await instance. That is an understandable goal for safety-critical embedded applications, but as a result the ways to interact with code/await structures are really limited: only through events or iterators over pools. All in all, that makes it really hard or maybe even impossible to write efficient data structures. For example, in the demo, hit detection between Birds is implemented like so:

image

We just iterate over the pool twice and compare rectangles - that's O(n²). That's fine for drawing twenty flappy birds on the screen, but it isn't going to cut it in larger games with hundreds or thousands of interacting objects on the screen! What we'd like to do instead is, say, use spatial binning or a quad-tree and only compare Birds that are guaranteed to be close enough to each other, but as far as I can tell there's no way to do that directly in Céu. I don't think that limitation is inherent to the paradigm, but I mention it in case it is.

@JobLeonard I was not aware it had "unlimited" pools, indeed!

While I love the strict lexical / nested resource / lifetime management (I am actually designing a language currently that does something similar, but for different purposes than co-routines), it is a limitation, and I am not sure how well it would fit Lobster, which mostly has arbitrary structure. The birds example is very helpful to look through, but it also makes it clear how much relies on that strict structure.. once you must be able to do all creation/communication actions in arbitrary places, it may not work so well.

Which I guess comes from the description: it's synchronous. Meaning that underneath, it works most efficiently if it can interleave all these parallel functions statically. I bet the unbounded pools are a lot less efficient than the bounded ones for that reason.

It is theoretically possible to take a set of co-routines, as long as there's statically a bounded number of types of them, and compile it all down to one big soup of gotos, meaning one yield jumps directly into the code of the next one of the same type, or to the next type. If there's an exactly known amount of them, that gets even more efficient. I bet Céu can do that too, as it often has even more information (it can use static timing information to know ordering ahead of time), but I'm not sure if the compiler is that advanced.

If this goto translation could be guaranteed, that would make it very suitable for a future Lobster feature.

One other fun language to look at in this context is Flogo II by Chris Hancock (https://llk.media.mit.edu/papers/ch-phd.pdf) which has one of the nicest high-level parallel control structures I know of.