Evaluate languages

Question

joe-no-body opened this issue 5 years ago · comments

joe-no-body commented 5 years ago

https://github.com/joe-no-body/beyond-spreadsheets/wiki/Language-selection

Raymond Neilson · Answer 1 · Mon Sep 02 2019 22:53:48 GMT+0800 (China Standard Time)

Couple quibbles:

- I'm not actually convinced we need to run client code in the browser. In fact, architecturally I think we really have to run it all server-side for multi-user, so we can properly do locking, and should anyways, because we can do better sandboxing there.
- I don't think Scheme is actually beginner-friendly at all, tbh.

I'm also going to add Elixir to the list, if that's alright...

joe-no-body · Answer 2 · Tue Sep 03 2019 05:28:23 GMT+0800 (China Standard Time)

> client code in the browser

Agreed. Running server-side also lets data be cached, provides a mechanism for users to write code that interacts with non-HTTP services like databases, and enables secure credential storage. Server-side is definitely the way to go.

> Scheme

Agreed it's not a great option. I also kinda conflated it with Lisps in general in my head. My main reasons were: the simple syntax would be easy to marshal and unmarshal between text and UI representation, it'd be easy to roll our own if desired, most Lisps have good DSL support, and it's functional. However, the syntax is pretty tough for me to get over, so I'm not really inclined to go with it.

> I'm also going to add Elixir to the list, if that's alright...

Yep! Feel free to modify any of the pages, open issues, comment, push branches, etc. as you see fit. There's only one hard boundary (for me as well): don't push to master without a PR. I've applied branch protection to master to enforce that.

P.S. I've started another thread on #2 for general discussion if that works for you.

Raymond Neilson · Answer 3 · Tue Sep 03 2019 05:32:20 GMT+0800 (China Standard Time)

Oh gods, I never push to master without a PR -- hell, even in my personal repos I'll do work only in branches (I'll cop to skipping PRs for simple fast-forwards).

I'm going to strike out client-side code and Scheme from the lang page, then, since (I think) we're both agreed on rejecting them.

joe-no-body · Answer 4 · Tue Sep 03 2019 07:02:35 GMT+0800 (China Standard Time)

Cool. I kinda figured it was a given but I thought I'd throw it out just in case, especially since I locked down the branch. Saw the strikeouts and I agree with that as well. I think we're all good.

Raymond Neilson · Answer 5 · Wed Sep 04 2019 07:20:43 GMT+0800 (China Standard Time)

I glanced at your PoC, and we are way more similar than I thought. That JSON-based DSL-that's-almost-an-AST is something I've built three times in the last year at work.

With that, and the Venn diagram of our strong points, I'm going to say we should go with Python (3.7, obvs) for the implementation language. We're both fluent, it's got tons of useful libraries for data (it's basically the lingua franca of data analysis these days), we can optimize with NumPy (and pandas, and and and) as we need, and it's got our dynamic code generation covered.

In fact, I'll also say I'm heavily leaning towards Python for our user-facing language, too. Specifically, limiting (inline) user code to expressions gets us at least most of the functional paradigm, and controlling the namespace contexts gives us quite a bit of sandboxing. I've been going over the docs for the ast module (as well as the excellent Green Tree Snakes expansion on same), and I think it's totally doable. Constraining user-supplied modules is another, trickier problem - mostly because useful things for users are also useful for attackers - but there are approaches we can use. Plus, it means exporting to Jupyter notebooks is easier, interfacing with ML libraries is easier, et cetera ad nauseam.
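To make the expressions-only idea concrete, here is a minimal sketch of what it could look like with the standard `ast` module. The names `eval_cell`, `ALLOWED_NODES`, and `SAFE_NAMES` are hypothetical, and the whitelist is only illustrative. It assumes Python 3.8 or later, where literals parse to `ast.Constant` (on 3.7 they parse to `ast.Num`, `ast.Str`, and friends). It narrows what user code can say, but it is not a complete sandbox: for example, it does nothing about expressions that are merely expensive to compute.

```python
import ast

# Illustrative whitelist: arithmetic, comparisons, conditionals, simple calls.
# Deliberately excludes ast.Attribute, ast.Subscript, ast.Lambda, comprehensions.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.BoolOp, ast.Compare, ast.IfExp,
    ast.Call, ast.Name, ast.Constant, ast.List, ast.Tuple,
    ast.expr_context, ast.operator, ast.unaryop, ast.boolop, ast.cmpop,
)

# Hypothetical set of functions exposed to user expressions.
SAFE_NAMES = {"sum": sum, "min": min, "max": max, "abs": abs, "round": round}


def eval_cell(source, cells):
    """Evaluate one user expression against a dict of cell values."""
    # mode="eval" accepts a single expression; statements raise SyntaxError.
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id not in SAFE_NAMES and node.id not in cells:
            raise ValueError(f"unknown name: {node.id}")
    code = compile(tree, "<cell>", "eval")
    # Empty __builtins__ plus an explicit namespace controls what names resolve.
    return eval(code, {"__builtins__": {}}, {**SAFE_NAMES, **cells})
```

With this sketch, `eval_cell("a1 * 2 + max(b1, 3)", {"a1": 5, "b1": 1})` evaluates against the supplied cells, while `eval_cell("().__class__", {})` is rejected because attribute access is not on the whitelist.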

Bonus: we can comb through PySpread and see if there's anything useful to borrow...

joe-no-body · Answer 6 · Thu Sep 12 2019 10:48:02 GMT+0800 (China Standard Time)

Yo that's awesome. I was debating whether the PoC is maybe too low level for what we actually want but I'm glad you've been thinking similarly.

No objections to Python from me, on both ends. Common fluency is good and the syntax is so close to existing spreadsheet formula syntax that it fits the language need.

One part that may be tricky with Python is the bidirectional serialization from code to UI for blocks. Seems doable but potentially painful. Green Tree Snakes (which is indeed excellent 👍) mentions astor for AST->Python code so that could work. (I also may be totally overlooking some easier approach we could take here.)
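As a rough illustration of the round trip, one possible approach is to mirror the AST as plain dicts that a UI could render as blocks, then rebuild the AST from them. This is a sketch under assumptions, not the project's design: `node_to_dict` and `dict_to_node` are hypothetical helpers, and it only handles constants that JSON can represent (no bytes, complex numbers, or `...`).

```python
import ast
import json


def node_to_dict(node):
    """Convert an AST node (or list or plain value) into JSON-friendly data."""
    if isinstance(node, ast.AST):
        out = {"type": type(node).__name__}
        for name, value in ast.iter_fields(node):
            out[name] = node_to_dict(value)
        return out
    if isinstance(node, list):
        return [node_to_dict(item) for item in node]
    return node  # str, int, float, bool, None


def dict_to_node(data):
    """Rebuild an AST node from the structure produced by node_to_dict."""
    if isinstance(data, dict):
        cls = getattr(ast, data["type"], None)
        # Only accept real AST node classes, never other attributes of `ast`.
        if not (isinstance(cls, type) and issubclass(cls, ast.AST)):
            raise ValueError(f"not an AST node type: {data['type']!r}")
        fields = {k: dict_to_node(v) for k, v in data.items() if k != "type"}
        return cls(**fields)
    if isinstance(data, list):
        return [dict_to_node(item) for item in data]
    return data


tree = ast.parse("price * qty + 1", mode="eval")
blocks = json.loads(json.dumps(node_to_dict(tree)))  # what the UI would store
# Line/column info is not serialized, so fill it in before compiling.
rebuilt = ast.fix_missing_locations(dict_to_node(blocks))
result = eval(compile(rebuilt, "<blocks>", "eval"), {"__builtins__": {}}, {"price": 3, "qty": 4})
```

Turning the rebuilt tree back into source text is the remaining step; astor, or `ast.unparse` on Python 3.9 and later, covers that direction.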

Re: sandboxing/security, I know I've read that Python sandboxing is a pretty intractable problem (see e.g. pysandbox), so to start with, I think taking a Jupyter-style approach and basically just running localhost-only with some client security makes sense (see their doc). At the point where we have to worry about untrusted user code running on a server, running the whole app in an isolated environment would probably be easiest, although we could also outsource execution of user code to a sandboxed Python interpreter or something.
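For the "outsource execution" option, the simplest starting point might be running each piece of user code in a separate interpreter process with a timeout. The sketch below assumes that shape; `run_user_code` is a hypothetical name. A separate process is not a sandbox by itself: it still has the parent's filesystem and network access unless the environment around it (container, VM, restricted user) takes that away.

```python
import subprocess
import sys


def run_user_code(source, timeout_s=2.0):
    """Run source in a fresh interpreter; return (returncode, stdout, stderr).

    Raises subprocess.TimeoutExpired if the code runs longer than timeout_s.
    """
    proc = subprocess.run(
        # -I is isolated mode: ignores PYTHON* environment variables and the
        # user site directory, and keeps the current directory off sys.path.
        [sys.executable, "-I", "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return proc.returncode, proc.stdout, proc.stderr
```

A call such as `run_user_code("print(6 * 7)")` returns the child's exit code and captured output, which keeps a crash or infinite loop in user code from taking down the main app.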