Extremys / refpersys

Reflexive & Persistent System (artificial intelligence)

Home Page:http://refpersys.org/


refpersys

This project is on https://gitlab.com/bstarynk/refpersys/ and has its own web site on http://refpersys.org/ where more details are given.

A research project

The Reflective Persistent System language is a research project, taking many good ideas from Bismon, sharing a lot of goals (except static source code analysis) with it but avoiding bad ideas from it.

For Linux/x86-64 only. Don't even think of running that on non-Linux systems, unless you provide patches for that. And we need a 64-bit processor.

We have multi-threading in mind, but in some limited way. We think of a pool of a few dozen Pthreads at most (but not of a thousand Pthreads).

We absolutely want to avoid any GIL (global interpreter lock).

Don't expect anything useful from RefPerSys before at least 2023. But you could have fun sharing our ideas and experimenting yours.

A rewrite of RefPerSys in C happens on refpersys-in-c.

We previously considered using the garbage collector from Ravenbrook MPS. Since that project is now obsolete, we gave up that idea.

Don't expect RefPerSys to be a realistic project. It is not (and certainly not before 2025).

Some draft design ideas are written in the RefPerSys design draft which is very incomplete work in progress.

If you happen to know about any research call for proposals or funding opportunities in Europe (Euro zone) about this (e.g. related to artificial general intelligence goals) please mention them to Basile Starynkevitch (France) by email to basile@starynkevitch.net.

persistent values

Like Bismon, RefPerSys manages an evolving, persistable heap of dynamically typed, garbage-collected values (see §2 Data and its persistence in Bismon of the Bismon draft report...). The semantics (but not the syntax) of values is on purpose close to that of Lisp, Python, Scheme, JavaScript, Go, or even Java.

Most RefPerSys values are immutable: for example boxed strings, sets (with dichotomic search inside them), tuples of references to objects, closures, etc. But some RefPerSys values are mutable objects, and by convention every mutable value is called an object. Each mutable object has its own lock, and any access or update of mutable data inside an object is generally made under its lock. As an exception, a very few, very often accessed, mutable fields inside objects (e.g. their class) are atomic pointers, for performance reasons.

Objects have (exactly like in Bismon) attributes, components, and some optional payload. An attribute is an association between an object (called the key of that attribute) and some arbitrary non-nil RefPerSys value (called the value of that attribute), and each object has its own mutable associative table of attributes. A component is an arbitrary RefPerSys value, and each object has some mutable vector of them. The payload is any additional mutable data (e.g. a string buffer, a mutable vector or hashtable of values, some class metadata, etc.), owned by the object. So the data model of a RefPerSys object is as flexible as the data model of JavaScript. However, RefPerSys objects have a mutable class defining their behavior (not their fields, which are represented as attributes), and that class is used for dynamic message dispatching.
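The object data model described above can be sketched in a few lines of C++17. This is only an illustration under stated assumptions: every name here (SketchObject, SketchValue, ob_attrs, put_attr, etc.) is hypothetical and is not taken from refpersys.hh, and std::shared_ptr merely stands in for a garbage-collected pointer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical names throughout; not the real refpersys.hh declarations.
struct SketchObject;
using ObjRef = std::shared_ptr<SketchObject>;  // stands in for a GC-ed object pointer

// A value is nil, a boxed integer, a boxed string, or a reference to an object.
using SketchValue =
    std::variant<std::monostate, std::intptr_t, std::string, ObjRef>;

struct SketchObject {
  std::atomic<SketchObject*> ob_class{nullptr};  // often read, so atomic and lock-free
  std::mutex ob_mtx;                             // guards the three fields below
  std::map<ObjRef, SketchValue> ob_attrs;        // attribute key (an object) -> non-nil value
  std::vector<SketchValue> ob_comps;             // components
  std::shared_ptr<void> ob_payload;              // optional additional mutable data

  // Put or replace an attribute under the object's lock; nil is rejected.
  bool put_attr(const ObjRef& key, SketchValue val) {
    if (!key || std::holds_alternative<std::monostate>(val)) return false;
    std::lock_guard<std::mutex> guard(ob_mtx);
    ob_attrs[key] = std::move(val);
    return true;
  }

  // Fetch an attribute under the lock; gives nil (monostate) when absent.
  SketchValue get_attr(const ObjRef& key) {
    std::lock_guard<std::mutex> guard(ob_mtx);
    auto it = ob_attrs.find(key);
    return it == ob_attrs.end() ? SketchValue{} : it->second;
  }

  // Append a component under the lock and return the new component count.
  std::size_t append_comp(SketchValue val) {
    std::lock_guard<std::mutex> guard(ob_mtx);
    ob_comps.push_back(std::move(val));
    return ob_comps.size();
  }
};
```

In this sketch the class pointer is read without taking the lock, while attributes and components are only touched under it, mirroring the locking convention stated above.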

Worker threads and agenda of tasklets

RefPerSys will have a small fixed set of worker threads (perhaps a dozen of them), each running some agenda loop. We would have some central data structure, called the agenda like in Bismon (see §1.7 of the Bismon draft report...), organizing runnable tasklets (e.g. as a few FIFO queues of them). A tasklet should conceptually run quickly (in a few milliseconds) and is allowed to add runnable tasklets to the agenda or remove them from it (including itself). Each worker thread is looping: fetching a runnable tasklet from the agenda, then running that tasklet.
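The agenda loop can be illustrated with a minimal C++17 sketch. It assumes a single FIFO queue, represents a tasklet as a std::function (in RefPerSys tasklets would be objects), and, unlike a real agenda, lets the workers stop once the queue is drained and no tasklet is running, so that the example terminates. The names SketchAgenda, add_tasklet and run are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch; the real agenda and tasklets would be RefPerSys objects.
class SketchAgenda {
public:
  using Tasklet = std::function<void(SketchAgenda&)>;

  // Add a runnable tasklet at the end of the FIFO queue.
  void add_tasklet(Tasklet t) {
    {
      std::lock_guard<std::mutex> guard(mtx_);
      queue_.push_back(std::move(t));
    }
    cond_.notify_one();
  }

  // Start nbworkers threads and wait until all of them have stopped.
  void run(unsigned nbworkers) {
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < nbworkers; i++)
      workers.emplace_back([this] { worker_loop(); });
    for (auto& w : workers) w.join();
  }

private:
  // Each worker loops: fetch a runnable tasklet, then run it.
  void worker_loop() {
    for (;;) {
      Tasklet t;
      {
        std::unique_lock<std::mutex> lock(mtx_);
        cond_.wait(lock, [this] { return !queue_.empty() || running_ == 0; });
        if (queue_.empty()) return;  // nothing queued and nothing running: stop
        t = std::move(queue_.front());
        queue_.pop_front();
        running_++;
      }
      t(*this);  // the tasklet may add other tasklets to the agenda
      {
        std::lock_guard<std::mutex> guard(mtx_);
        running_--;
      }
      cond_.notify_all();  // wake idle workers so they can re-check the queue
    }
  }

  std::mutex mtx_;
  std::condition_variable cond_;
  std::deque<Tasklet> queue_;
  unsigned running_ = 0;  // number of tasklets currently being run
};
```

A tasklet that needs more than a few milliseconds would, in this style, do a small step and then re-add itself (or a continuation) with add_tasklet instead of blocking its worker thread.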

License and copyright

This research project is GPLv3+ licensed and copyrighted by the RefPerSys team, currently made of:

  •  Basile Starynkevitch <basile@starynkevitch.net>, 
     homepage http://starynkevitch.net/Basile/
     near Paris, France. So usual timezone `TZ=MEST`
    
  •  Abhishek Chakravarti <abhishek@taranjali.org>
    
  •  Nimesh Neema <nimeshneema@gmail.com>
    

Some files might be "borrowed" from other similar GPLv3+ licensed projects (notably from Bismon...) and could retain their original copyright owner.

Contributing

Please ask, by email, the above RefPerSys team for C++ coding conventions before starting non-trivial contributions to the C++ runtime of RefPerSys. If you are contributing to its C++ runtime, please run make clean after any git pull.

The GPLv3+ license of RefPerSys is unlikely to change before 2025 (and probably even after).

File conventions

The RefPerSys runtime is implemented in C++17, with hand-written C++ code in *_rps.cc, and has a single C++ header file refpersys.hh. We don't claim to be C++ gurus. Most C++ experts could write more genuine C++ code than we do and will find our C++ code pitiful. We just want our runtime to work, not to serve as an example of well-written C++17 code.

The preferred C++ compiler (in 2020Q1) for RefPerSys is GCC version 8 or 9.

It could be worthwhile to sometimes compile RefPerSys with clang++ (see http://clang.llvm.org/ for more). In practice, run make clean then make RPS_BUILD_CXX=clang++. The Clang static analyzer could be useful, but expect a lot of warnings, since C++ doesn't have flexible array members but we need something similar.
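To show why static analyzers tend to complain, here is a minimal sketch of one common way to emulate a flexible array member in C++17: a fixed header followed by a variable number of slots inside the same allocation. This is an assumption about the general technique, not a copy of what the RefPerSys runtime does, and SketchTuple is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical tuple-like zone: a header immediately followed by len_ slots.
class SketchTuple {
  using Slot = void*;
  std::size_t len_;
  explicit SketchTuple(std::size_t len) : len_(len) {}
  // The slots live just past the header, in the same malloc-ed block.
  Slot* slots() { return reinterpret_cast<Slot*>(this + 1); }

public:
  SketchTuple(const SketchTuple&) = delete;
  SketchTuple& operator=(const SketchTuple&) = delete;

  // Allocate header and slots together, then construct both in place.
  static SketchTuple* make(std::size_t len) {
    void* mem = std::malloc(sizeof(SketchTuple) + len * sizeof(Slot));
    if (mem == nullptr) throw std::bad_alloc();
    SketchTuple* t = new (mem) SketchTuple(len);
    for (std::size_t i = 0; i < len; i++)
      new (t->slots() + i) Slot(nullptr);
    return t;
  }

  // Release a tuple obtained from make().
  static void destroy(SketchTuple* t) {
    t->~SketchTuple();
    std::free(t);
  }

  std::size_t size() const { return len_; }
  void* at(std::size_t i) { assert(i < len_); return slots()[i]; }
  void set(std::size_t i, void* p) { assert(i < len_); slots()[i] = p; }
};

// The slots must be correctly aligned right after the header.
static_assert(sizeof(SketchTuple) % alignof(void*) == 0,
              "slots would be misaligned after the header");
```

The pointer arithmetic past the end of the header is exactly the kind of code that an analyzer cannot easily prove safe, hence the expected warnings.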

RefPerSys may later also use generated C++ code in some _*.cc files, some generated C code in some _*.c files, and generated C or C++ headers in some _*.h files. By convention, files starting with an underscore are generated (but they may or may not be git-versioned). Some generated C++ files which are git add-ed are under the generated/ subdirectory.

We could later need some C++ generating program (maybe similar in spirit to Bismon's BM_makeconst.cc). It would then be named rps_* for the executable, and fit in a single self-sufficient rps_*.cc C++ file. Perhaps we'll later have some rps_makeconst executable to generate some C++, and its source in some rps_makeconst.cc. So the convention is that any future C++ generating source code is in some rps_*.cc C++ file. In commit 65a8f84aeffc9ba4e468 or newer, the dumping facility is scanning hand-written C++ source files to emit generated/rps-constants.hh.

Building and dependencies.

The build automation tool used here is GNU make since commit 6d56f50660c7cc41b9 (it was omake before).

You should have compiled and installed Ian Taylor's libbacktrace, e.g. under /usr/local/. You may need to add /usr/local/lib/ in your /etc/ld.so.conf and run ldconfig -v -a after installation of that libbacktrace.

You also need JsonCPP, and a mail command in your $PATH.

To install the dependencies on a recent Debian 10 buster or Ubuntu 20 or 21 system, you could run the following steps:

  • sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test (for Ubuntu 20.04)
  • sudo apt install -y gcc-11 g++-11 clang-11 libc++-11-dev libc++abi-11-dev (for Ubuntu 20.04)
  • sudo apt install libunistring-dev
  • sudo apt install libjsoncpp-dev
  • sudo apt-get install libssl-dev
  • sudo apt install ccache g++ make build-essential remake gdb automake
  • sudo apt install ttf-unifont ttf-mscorefonts-installer unifont msttcorefonts fonts-ubuntu fonts-tuffy fonts-spleen fonts-roboto fonts-recommended fonts-yanone-kaffeesatz fonts-play fonts-eurofurence fonts-ecolier-court fonts-dejavu fonts-croscore fonts-cegui fonts-inter fonts-inconsolata
  • git clone https://github.com/ianlancetaylor/libbacktrace.git
  • cd libbacktrace
  • ./configure
  • make
  • make install

compiling FLTK with DWARF debug information

RefPerSys is using (e.g. in its commit 843a6f0ddf1c22...) the FLTK graphical user interface toolkit (e.g. FLTK version 1.3.8 or newer...). That toolkit should be compiled with both debug information and optimization, by configuring it with ./configure --enable-debug --with-optim="-O2"

Build instructions

You need a recent C++17 compiler such as g++ (we use GCC 10 or GCC 11) or clang++ (Clang 11). Look into, and perhaps improve, our Makefile. Build using make -j 3 or more.

You should also do a make clean after any git pull.

You may want to edit your $HOME/.refpersys.mk file to contain definitions of GNU make variables for your particular C and C++ compiler, like e.g.

 # file ~/.refpersys.mk
 RPS_BUILD_CC= gcc-11
 RPS_BUILD_CXX= g++-11

You then build with make -j4 refpersys && make all.

Garbage collection

RefPerSys is a multi-threaded and garbage-collected system. We are fully aware that multi-thread friendly and efficient garbage collection is a very difficult topic.

The reader unaware of garbage collection terminology (precise vs. conservative GC, tracing garbage collection, copying GC, GC roots, GC locals, mark and sweep GC, incremental GC, write barrier) is advised to read the GC handbook and is expected to have read very carefully the Tracing Garbage Collection wikipage.

We considered using Ravenbrook MPS. Unfortunately for us, that very good GC implementation seems unmaintained and, with almost a hundred thousand lines of code, is very difficult to grasp, understand, and adopt. In the end, using MPS is not reasonable in our eyes.

We also did consider using Boehm GC. That conservative GC is really simple to use (basically, use GC_MALLOC instead of malloc, etc...) and is C++ friendly. However, it is rather slow (even for allocations of GC-ed zones, and we would have many of them) and might be quite unsuitable for programs having lots of circular references, and reflexive programs have lots of them.

Garbage collection ideas

So we probably are heading towards developing our own precise and multi-thread friendly GC (hopefully "better" than Boehm, but worse than MPS), with the following ideas:

  • local roots in the local frame are explicit, like in Bismon (LOCALFRAME_BM macro of bismon/cmacros_BM.h) or OCaml (see its §20.5 Living in harmony with the garbage collector and the CAMLlocal*, CAMLparam* and CAMLreturn* macros). The local call frame is conventionally reified as the _ local variable, so an automatic variable holding a GC-ed pointer foo is coded _.foo in our C++ runtime. A local frame in RefPerSys should be declared in C++ using RPS_LOCALFRAME.

  • our garbage collector manages memory zones inside a set of mmap-ed memory blocks: either small blocks of a megaword, that is 8 megabytes (i.e. RPS_SMALL_BLOCK_SIZE), or large blocks of 8 megawords, that is 64 megabytes (i.e. RPS_LARGE_BLOCK_SIZE). Values are inside such memory zones. Mutable objects may contain, perhaps indirectly, pointers to quasivalues (notably in their payload), that is to garbage collected zones which are not first-class values. A typical example of quasivalue could be some bucket in some (fully RefPerSys-implemented) array hash table (appearing as the payload of some object), in which buckets would be some small and mutable dynamic arrays of entries with colliding hashes. Such buckets are indeed garbage collected zones, but are not themselves values (since they are mutable, but not reified as objects).

  • The GC allocation operations are explicitly given the pointer to the local frame (i.e. &_, named RPS_CURFRAME), which is linked to the previous call frame and so on. That pointer is passed to every routine needing the GC (i.e. allocating or mutating values); only functions which don't allocate or mutate (e.g. accessor or getter functions) can avoid getting that local frame pointer.

  • The C++ runtime, and any code generated in RefPerSys, should explicitly be in A-normal form. So coding z = f(g(x),y) is forbidden in C++ (where f and g are C++ functions using the GC). Instead, reserve a local slot such as _.tmp1 in the local frame, then code _.tmp1 = g(RPS_CURFRAME, _.x); _.z = f(RPS_CURFRAME, _.tmp1, _.y); In less pedantic terms, we should do only one call (to GC-aware functions) or one allocation per statement; and every such call to some allocation primitive, or to a GC-aware function, should pass the RPS_CURFRAME and use RPS_LOCALFRAME in the calling function.

  • A write barrier should be called after object or quasivalue updates, and before any other allocation or update of some other object, value, or quasivalue. In practice, code _.foo.rps_write_barrier(RPS_CURFRAME) or more simply _.foo.RPS_WRITE_BARRIER()

  • Every garbage-collection aware thread (a thread allocating GC-ed values, mutating GC-ed quasivalues or objects, or running the GC forcibly) should call the Rps_GarbageCollector::maybe_garbcoll routine quite often, typically once every few milliseconds. If this is not possible (e.g. before a potentially blocking read or poll system call), special precautions should be taken. Forgetting to call that maybe_garbcoll function often enough could crash the system.

  • Consequently, as a rule of thumb, any routine which can directly or indirectly allocate GC-ed values or quasivalues, or directly or indirectly mutate GC-ed values or quasivalues, should take a calling callframe argument. We might need to consider putting that specific callframe argument in some global register, using the GCC register ... asm extension to define global register variables, and compiling with the -ffixed-reg code generation option. By coding convention, that calling callframe argument should preferably be named callingfra, and should be the first argument of every function or method (member function in C++ classes) requiring the GC.
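The conventions above (explicit local frames linked to their caller, the callingfra first argument, and A-normal form) can be sketched as follows. This is a toy illustration only: the real RPS_LOCALFRAME and RPS_CURFRAME macros are not reproduced here, slots are indexed (_[TMP1]) rather than named (_.tmp1), and SketchZone, SketchCallFrame, SketchLocalFrame, sketch_alloc, sketch_f and sketch_g are all hypothetical names. The "heap" is a plain vector that never collects anything.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins; the real RefPerSys frames and values differ.
struct SketchZone { long payload; };  // some GC-ed memory zone

struct SketchCallFrame {
  SketchCallFrame* prev;  // calling frame, or nullptr at the bottom
  std::size_t nbslots;    // number of GC-ed pointers in this frame
  SketchZone** slots;     // the explicit local roots
};

// A local frame with N root slots, linked to the caller's frame.
template <std::size_t N>
struct SketchLocalFrame {
  SketchZone* slot[N] = {};
  SketchCallFrame frame;
  explicit SketchLocalFrame(SketchCallFrame* caller) : frame{caller, N, slot} {}
  SketchLocalFrame(const SketchLocalFrame&) = delete;
  SketchZone*& operator[](std::size_t i) { return slot[i]; }
};

// Walk the chain of frames, as a precise GC would when scanning local roots.
inline std::size_t sketch_count_live_roots(const SketchCallFrame* fr) {
  std::size_t n = 0;
  for (; fr != nullptr; fr = fr->prev)
    for (std::size_t i = 0; i < fr->nbslots; i++)
      if (fr->slots[i] != nullptr) n++;
  return n;
}

// Toy heap: zones are kept alive until program exit, nothing is collected.
inline std::vector<std::unique_ptr<SketchZone>>& sketch_heap() {
  static std::vector<std::unique_ptr<SketchZone>> heap;
  return heap;
}

// Allocation takes the calling frame first, so a real GC could find the roots.
inline SketchZone* sketch_alloc(SketchCallFrame* callingfra, long v) {
  (void)callingfra;  // unused in this toy allocator
  sketch_heap().push_back(std::make_unique<SketchZone>(SketchZone{v}));
  return sketch_heap().back().get();
}

// Two GC-aware functions: g doubles its argument, f adds its two arguments.
inline SketchZone* sketch_g(SketchCallFrame* callingfra, SketchZone* x) {
  return sketch_alloc(callingfra, 2 * x->payload);
}
inline SketchZone* sketch_f(SketchCallFrame* callingfra, SketchZone* a,
                            SketchZone* b) {
  return sketch_alloc(callingfra, a->payload + b->payload);
}

// z = f(g(x), y) in A-normal form: one GC-aware call per statement,
// every intermediate result stored in a slot of the local frame.
inline long sketch_anf_demo(SketchCallFrame* callingfra, long xv, long yv) {
  enum { X, Y, TMP1, Z, NBSLOTS };
  SketchLocalFrame<NBSLOTS> _(callingfra);
  _[X] = sketch_alloc(&_.frame, xv);
  _[Y] = sketch_alloc(&_.frame, yv);
  _[TMP1] = sketch_g(&_.frame, _[X]);
  _[Z] = sketch_f(&_.frame, _[TMP1], _[Y]);
  return _[Z]->payload;
}
```

Because every intermediate pointer sits in a frame slot before the next GC-aware call, a collector walking the frame chain from &_.frame would see all live local roots, which is the point of forbidding nested calls such as f(g(x),y).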

useful references

For Bismon, see http://github.com/bstarynk/bismon and read its draft Bismon report (updated quite often).

For the C++17 language, see this C++ reference.

For Linux programming, see Advanced Linux Programming and the syscalls(2) man page.

For GCC, see notably its Invoking GCC chapter.

For garbage collection, read Paul Wilson's old paper Uniprocessor Garbage Collection Techniques, then read the GC handbook.

useful and relevant libraries

We already need the following libraries:

We may want to use, either soon or within a few years, (usually after 2022) interesting C or C++ libraries such as:

We should list other libraries interesting for us here, just in case (to avoid forgetting them).

past contributors

Thanks to Niklas Rosencrantz (Sweden) for past minor contributions.

HTTP service

We are adding HTTP service in RefPerSys. So libonion is required. For many months, we just hope to use http://localhost:9090/ in a recent (e.g. Firefox 80) web browser.

We really need to be able to show a demo of RefPerSys on a laptop without Internet connection. So all required resources should be copied here, under webroot/. Be careful about copyright and licensing issues.

Web conventions.

The webroot/ subdirectory holds resources useful for HTTP requests. In particular the following subdirectories:

  • webroot/css/ for -hand-written- style sheets.

  • webroot/img/ for additional images. Prefer SVG or PNG formats.

  • webroot/js/ for JavaScript code.

Dependency installation notes (Ubuntu 20.04 Focal Fossa)

  • apt install make
  • apt install pkg-config
  • apt install libcurl4-openssl-dev
  • apt install zlib1g-dev
  • apt install libreadline-dev
  • apt install libjsoncpp-dev
  • apt install qt5-default
  • apt install cmake
  • apt install build-essential
