AtomSpace MOSES Port (part I)

Question

AtomSpace MOSES Port (part I)

ngeiswei opened this issue 6 years ago · comments

This is the initial plan to get started with the AS-MOSES port. It
only treats the first steps, though does so in details.

Initiate the as-moses repo

AS-MOSES represents a rather major departure from the existing MOSES,
I believe it is best to move it to its own repository. This will
minimize confusions to the user and increase the awareness that MOSES
is transitioning to something different.

Here are the steps involved to seed the as-moses repository with the
old MOSES:

Create a doc/as-moses folder under this repo
Move the root README.md under doc/as-moses
Fork the moses repo by rebasing as-moses onto
git@github.com:opencog/moses.git and force-push it to
git@github.com:opencog/as-moses.git (I'll temporary disable branch
protection on the master after step 1 and 2 have been carried).

Replace Combo by Atomese

This is a rather big undertaking and will be done
progressively. He start here with 2 tasks

Port fitness evaluation to Atomese
~~Port Reduct to Atomese~~ For now rely on combo reduct

More tasks like storing the population and meta-population in
AtomSpaces will come next.

Port Fitness Evaluation to Atomese

This task can be decomposed into 3 subtasks

Implement Combo to Atomese converter.
Represent problem data in AtomSpace.
~~Update the Atomese interpreter to handle problem data.~~ Implement dedicated atomese interpreter.
Add Atomese interpreter in fitness evaluation.

Convert Combo to Atomese

To make Atomese program as compact as possible we will use higher
level operators, that is working with predicates or concepts as
opposed to individuals, let me give some examples.

Let's assume the following data to fit

+--+--+--+--+
|  |i1|i2|o |
+--+--+--+--+
|r1|0 |1 |1 |
+--+--+--+--+
|r2|1 |0 |1 |
+--+--+--+--+
|r3|0 |0 |0 |
+--+--+--+--+

with 2 input features i1, i2, an output feature o, and 3
observations r1 to r3.

In the boolean domain a possible Combo candidate would be:

(or $i1 $i2)

The corresponding Atomese candidate will look like

(Or (Predicate "i1") (Predicate "i2"))

you may notice that the variables have been replaced by
predicates. That is because Or is not operating on individuals, but
rather on predicates, so here Or actually represent the union of the
satisfying sets of i1 and i2. Doing that allows us not to
generates more compact candidates.

Let's assume that the domain was actually real numbers, not Boolean.
Then a possible Combo candidate would be:

(+ $i1 $i2)

Likewise in Atomese this will be translated into

(Plus (Schema "i1") (Schema "i2"))

Notice that i1 and i2 are schema, not predicates. That is because
since the domain is Real, not Boolean, they can no longer be
predicates. Likewise Plus can be overloaded for schemata, similarly
to how one can add 2 mathematical functions where h = f + g means
ForAll x, h(x) = f(x) + g(x)

Let us give another example

(+ $i1 $i2 3)

Its Atomese translation could be

(Plus (Schema "i1") (Schema "i2") 3)

but what is 3? In principle 3 should be a constant function that
returns the number 3 for each input. However I suppose it would be
acceptable to overload Plus so that it can mix schemata with
constants and assume constants are in fact constant functions. So that

(Plus (Schema "i1") (Schema "i2") (Number 3))

would be understood as

(Plus (Schema "i1") (Schema "i2") (Lambda X (Number 3)))

We will see how it goes but I think it's doable.

Note that there is an existing Combo to Atomese converter here

https://github.com/opencog/moses/blob/master/moses/comboreduct/main/combo-fmt-converter.cc

but it is extremely limited and outputs strings, while what we want is
something that builds atoms directly.

The code can probably be added in a converter folder of

https://github.com/opencog/moses/tree/master/moses/comboreduct

it should not require an atomspace (it should be up to the user to add
it to the atomspace of his/her choice). So rather than using functions
such as AtomSpace::add_link it should use createLink, etc.

Represent Problem Data in AtomSpace

In order to use the Atomese interpreter to measure the fitness of a
program over some data, the data need to be loaded to some
AtomSpace.

Let us reconsider the example data

+--+--+--+--+
|  |i1|i2|o |
+--+--+--+--+
|r1|0 |1 |1 |
+--+--+--+--+
|r2|1 |0 |1 |
+--+--+--+--+
|r3|0 |0 |0 |
+--+--+--+--+

obtained from the CSV file

i1,i2,o
0,1,1
1,0,1
0,0,0

over the Boolean domain for now.

We need to tell how instances/observations relate to features. Here is
how it could be done. For instance r1 could be represented as

(Evaluation (stv 0 1)
  (Predicate "i1")
  (Node "r1"))
(Evaluation (stv 1 1)
  (Predicate "i2")
  (Node "r1"))
(Evaluation (stv 1 1)
  (Predicate "o")
  (Node "r1"))

Assuming the domain of data is Real instead of Boolean, then
Execution must be used instead of Evaluation. For instance r1
would be represented as

(Execution
  (Schema "i1")
  (Node "r1")
  (Number 0))
(Execution
  (Schema "i2")
  (Node "r1")
  (Number 1))
(Execution
  (Schema "o")
  (Node "r1")
  (Number 1))

This can be made more compact by considering that a function, say f,
is a set of pairs (x, f(x)). For instance i1 can be described with

(Similarity (stv 1 1)
  (Schema "i1")
  (Set
    (List (Node "r1") (Number 0))
    (List (Node "r2") (Number 1))
    (List (Node "r3") (Number 0))))

Or even more compact considering the Cartesian product over features,
using ListLink, to represent a Cartesian product between functions
over the same domain.

(Similarity (stv 1 1)
  (List (Schema "o") (Schema "i1") (Schema "i2"))
  (Set
    (List (Node "r1") (List (Number 1) (Number 0) (Number 1)))
    (List (Node "r2") (List (Number 1) (Number 1) (Number 0)))
    (List (Node "r3") (List (Number 0) (Number 0) (Number 0)))))

I recommend to use the last representation as it is more compact and I
suspect might actually be easier for the interpreter to process.

Update the Atomese Interpreter to Handle Problem Data

Assuming the AtomSpace is loaded with the table above, how to evaluate

(Plus (Schema "i1") (Schema "i2"))

?

Such schema i1+i2 is expected to be represented as

(Execution
  (Plus (Schema "i1") (Schema "i2"))
  (Node "r1")
  (Number 1))
(Execution
  (Plus (Schema "i1") (Schema "i2"))
  (Node "r2")
  (Number 1))
(Execution
  (Plus (Schema "i1") (Schema "i2"))
  (Node "r3")
  (Number 0))

Or equivalently, seeing a function as a set of input/output pairs

(Similarity (stv 1 1)
  (Plus (Schema "i1") (Schema "i2"))
  (Set
    (List (Node "r1") (Number 1))
    (List (Node "r2") (Number 1))
    (List (Node "r3") (Number 0))

However when invoking the cog-execute! on

(Plus (Schema "i1") (Schema "i2"))

the result could simply be

(Set
  (List (Node "r1") (Number 1))
  (List (Node "r2") (Number 1))
  (List (Node "r3") (Number 0)))

The Atomese interpreter would recognize that Plus is applied to
schemata, gather their associated data, iteratively apply the lower
level operator Plus to the associated numbers, and reconstruct the
result as a mapping from inputs (observations r1 to r3) to outputs
((Number 1), etc).

Replace Combo by Atomese Interpreter in Fitness Evaluation

Due to Atomese Reduction not supported, we can only support Atomese
right after reduction. Basically, intances will be turned into reduced
combo trees, then turned into Atomese program, then fitness
evaluated. Let's recall how the fitness function is being called, in
the hill-climbing algorithm, the call occurs here

https://github.com/opencog/moses/blob/master/moses/moses/optimization/hill-climbing.cc#L234

which is an iscorer (for instance scorer), so we want to implement a
new class inheriting iscorer_base

https://github.com/opencog/moses/blob/master/moses/moses/representation/instance_scorer.h#L35

that turns the instance into an Atomese program and evaluate its
fitness. Then upgrade fitness functions to support Atomese programs.

To break it down, the subtasks are

Rename complexity_based_scorer to combo_based_scorer
Implement atomese_based_scorer similar to
combo_based_scorer, that turns the instance into a reduced
combo_tree, convert it to atomese, then call the fitness on this
atomese program.
Overload bscore_base::operator() (with a default dummy
implementation to allow it to be optional for now). Since
instance_scorer.h will start growing, it would be best to create
a instance_scorer.cc and move the implementations there.
Start implementing it for various fitness functions. I
recommand to start with logical_bscore which is probably the
simplest fitness function type.
Test MOSES using atomese_based_scorer instead
combo_based_scorer for the implemented fitness function types.
Will be done in the next iteration.

Port Reduct to Atomese

The approach suggested at the moment is to explicitely represent the
result of a reduction as a relationship between Atomese program and
normal form. I think it makes sense to introduce a link type just for
that called ReductLink For instance

(ReductLink 
  (And
    (Predicate "f1")
    (Predicate "f1"))
  (Predicate "f1")))

would represent that f1 and f1 reduces to f1.

To acheive that, ReductLink as well as the operators involved in the
Atomese programs must be axiomatized. Then the URE alone should be
able to perform the reduction.

This work is already under way, so far by Yidne.

If its completion takes too long, once we have a Combo to Atomese
converter and vise versa, we could probably just wrap the existing
Combo reduct engine into a URE rule, and use this as a temporary
replacement for Atomese reduction just so that we can keep going with
the remaining of the port.

Linas Vepštas · Answer 1 · Fri Apr 27 2018 00:41:04 GMT+0800 (China Standard Time)

Instead of Similarity (stv 1 1) Don't you want to use Equivalence ? I know that in PLN, these are the "same thing". However, equivalence suggests a definition, while similarity suggests a possibility. I'd like to keep formally defined, externally imposed definitions distinct from statistically-discovered similarities.

Linas Vepštas · Answer 2 · Fri Apr 27 2018 00:47:50 GMT+0800 (China Standard Time)

This thing: (Span (Schema "i1") (Schema "i2") (Schema "o")) is not a category-theoretic span in any way.

Nil Geisweiller · Answer 3 · Fri Apr 27 2018 00:48:36 GMT+0800 (China Standard Time)

According to the PLN book Similarity is for concepts, while Equivalence is for predicates. Since these functions are seen as sets, which are concepts, Similarity should be the right link type.

Linas Vepštas · Answer 4 · Fri Apr 27 2018 00:53:07 GMT+0800 (China Standard Time)

OK, I guess we can use Similarity for concepts. However, I want to abolish SetLink in favor of MemberLink ...

Linas Vepštas · Answer 5 · Fri Apr 27 2018 00:55:39 GMT+0800 (China Standard Time)

So:

(Member (stv 1 1)
  (List (Node "r1") (Number 0))
  (Schema "i1"))
  
(Member (stv 1 1)
  (List (Node "r2") (Number 1))
  (Schema "i1"))

(Member (stv 1 1)
  (List (Node "r3") (Number 0))
  (Schema "i1"))

Nil Geisweiller · Answer 6 · Fri Apr 27 2018 00:55:39 GMT+0800 (China Standard Time)

Regarding SpanLink I'd love to have the right term. I'm not category theory expert and I couldn't find a perfect term. In my Category Theory bible, Category Theory for the Sciences, they do mention some form of Cartesian product over the same domain, but they don't give it a name! They just say, page 44, Section 3.1.1.9
"
We say this function is induced by f and g, and we denote it
<f,g>:A->X x Y, where <f,g>(a)=(f(a),g(a))
"
So what I am supposed to call that? InducedLink?

Then I saw further down the book the introduction of span, which I know is no is not exactly that, but is the closest I could find. Other suggestions are welcome.

Nil Geisweiller · Answer 7 · Fri Apr 27 2018 00:58:54 GMT+0800 (China Standard Time)

I don't think we want to abolish SetLink, in some places it really makes things more compact, and unlike sets built with MemberLink, it is immutable.

However what we do need is a construct to recursively step into a set, like Cons for List, perhaps should it be called ConstSet or something...

Nil Geisweiller · Answer 8 · Fri Apr 27 2018 01:07:31 GMT+0800 (China Standard Time)

Regarding span, just to be absolutely clear, the reason I'm not using List (which could be a good candidate to represent Cartesian product), or perhaps more clearly CartesianProd is because it is not exactly what I want. CartesianProd would duplicate the inputs, and in order to minimize redundancy I don't want that.

To reuse the same example, using CartesianProd instead of Span would give

(Similarity (stv 1 1)
  (CartesianProd (Schema "i1") (Schema "i2") (Schema "o"))
  (Set
    (List (List (Node "r1") (Node "r1") (Node "r1")) (List (Number 0) (Number 1) (Number 1)))
    (List (List (Node "r1") (Node "r1") (Node "r2")) (List (Number 0) (Number 0) (Number 1)))
    ...

You see the duplicated inputs, (List (Node "r1") (Node "r1")... That "span" link, or "Induced" link, or whatever-is-the-right-term link, is just to avoid that.

Nil Geisweiller · Answer 9 · Fri Apr 27 2018 01:20:59 GMT+0800 (China Standard Time)

Actually CartesianProd is a lot worse than that, because it would consider all combinations of inputs (I've updated the message above to reflect that). So we really need another link type, it just needs a name.

Linas Vepštas · Answer 10 · Fri Apr 27 2018 01:28:31 GMT+0800 (China Standard Time)

There's no such thing as a Cartesian product in category theory. The Cartesian product is a completely unrelated concept from some other completely different area of mathematics, got nothing to do with anything.

That said, the Cartesian product is an example of a https://en.wikipedia.org/wiki/Pullback_(category_theory) - its an example of a pullback, taken over a singleton (a single point). Pullbacks are generically a way of gluing two other objects together, over some common object to which they are both related. That is, if X and Y are two objects, having Z in common between them, then there exists a unique, universal object P which is the gluing-together of X and Y over Z.

The Cartesian product is only defined over the category of sets (i.e. over set theory). The only thing that two generic sets have in common is that they are made out of points (singletons, sets containing only one element). Thus, when gluing together sets, one takes Z == a singleton == some arbitrary (any) set containing a single element. The pullback is built by mapping each and every element of X to the singleton. Also map each and every element of Y to the singleton. Think of each of these as being like peacocks, with a fan of arrows. Glue these two fan-shaped things together at the singleton (two peacocks having homo sex!?). The result looks like a fan of arrows all pointing into Z, and that result can be uniquely, universally represented by pairs of objects, one chosen from X and one from Y. But the set of all possible pairs is conventionally called the "Cartesian product".

Not all categories have Cartesian products. Set theory is an example of a category that does have a Cartesian product. Not all categories have pullbacks. If they exist, they have to be proven to exist.

Most of what we do in opencog is much closer to model theory than to category theory, and so I'm kind-of sorry I ever talked about category theory w.r.t. opencog.

Linas Vepštas · Answer 11 · Fri Apr 27 2018 02:02:06 GMT+0800 (China Standard Time)

Part 2:

"
We say this function is induced by f and g, and we denote it
<f,g>:A->X x Y, where <f,g>(a)=(f(a),g(a))
"

That is an example of a Cartesian product. For example: https://en.wikipedia.org/wiki/Product_(category_theory) where the product can be thought of as the "trivial" pullback. In the category of sets, if you have a function f:A->X and a function g:A->Y, then you can prove that there exists a unique, universal function <f,g>:A->X x Y that maps an element a to the pair (f(a), g(a)).

For set theory, this is kind of trivial and goofy and "self-evident" (to some typical college freshman). The issue being tackled here is that there are categories for which products do not exist. When products do exist, then there are certain generic statements and proofs that can be made, that would apply to all categories that have products. This avoids the trouble of having to write the same proof, over and over, once for each different kind of category. (viz, once for set theory, once for group theory, once for topology, etc.) Its a short-cut.

For moses, pretty much everything is naive set theory (well, that, and relational algebra). The notion of a table of rows exists only in set theory: each row is a cartesian product, a "tuple", and the table itself is a set. In set theory, a map from a "row" or "tuple" to some value is just a plain-old ordinary function. In set theory, morphisms are just plain-old functions.

So what I am supposed to call that? InducedLink?

Does it need a name? Call it RowLink or SequenceLink or OrderedSetLink or TupleLink or or VectorLink or ArrayLink or just .. ListLink.

The biggest problem in opencog is that ListLinks are not computer-science lists; they aren't editable (the way a singly-linked list is). They should have been called VectorLinks or TupleLinks or ArrayLinks, to emphasize their immutability.

Linas Vepštas · Answer 12 · Fri Apr 27 2018 02:11:40 GMT+0800 (China Standard Time)

SetLink makes things more compact, and unlike sets built with MemberLink, it is immutable.

During learning, while moses is running, one might want to alter the set of inputs. This becomes hard and klunky with set links.

Linas Vepštas · Answer 13 · Fri Apr 27 2018 02:24:51 GMT+0800 (China Standard Time)

You see the duplicated inputs, (List (Node "r1") (Node "r1")), ... That "span" link, or "Induced" link, or whatever-is-the-right-term link, is just to avoid that.

I don't understand why there is some need "duplicate the inputs". At any rate, Cartesian products and functions have nothing to do with one-another; they are unrelated concepts.

The tables in moses are just tables. You do NOT have to specify which column is the dependent column in advance. Just build the table; then, later on, you can indicate that column p42 is the one that moses should train on. This way, you can just throw some generic tables into the atomspace, and decide, at a later point in time, which columns should be trained on. Maybe a table has two or three dependents, and you want to run two or three instances of moses, training on each.

Linas Vepštas · Answer 14 · Fri Apr 27 2018 02:27:20 GMT+0800 (China Standard Time)

Re tables: I have a comment, kind-of off to the side, but -- in language learning, I have tables that are extremely sparse, with only one out of a million or one out of a billion entries that are non-zero. Representing these as ordinary tables would blow out RAM. (They already blow out RAM, even when they are sparse...)

Linas Vepštas · Answer 15 · Fri Apr 27 2018 05:56:46 GMT+0800 (China Standard Time)

I'm working on an alternate way of representing tabular data right now, but decided to distract myself and read email today. Upshot, I think there's a simpler way of representing data than what's given above.

Nil Geisweiller · Answer 16 · Fri Apr 27 2018 13:19:32 GMT+0800 (China Standard Time)

@linas totally open to have alternate ways of representing tabular data, spare, dense, etc. However, keep in mind that this table will then have to be used to evaluate Atomese programs. Wouldn't it be really cool if the semantics corresponding to these tables directly guides how to evaluate Atomese programs?

This is the reason why I didn't use a mere Link to group together columns. One way to see columns is as function descriptions, mapping inputs to outputs. Likewise one way to see tuples of columns (table) could be as groups of functions (mapping inputs to outputs for each function), etc. Doing that means we don't need to hack the Atomese interpreter (aka pattern matcher) specifically to handle MOSES tables. If we use arbitrary links, without worrying about their semantics, I think we get into the risk of bloating the Atomese interpreter.

Nil Geisweiller · Answer 17 · Fri Apr 27 2018 13:31:20 GMT+0800 (China Standard Time)

For example: https://en.wikipedia.org/wiki/Product_(category_theory) where the product can be thought of as the "trivial" pullback.

Well, that is exactly what I was looking for. So it's just a mere product (with a codomain that is defined as the Cartesian product of the codomains of the features, that is what confused me). So the link type name I'm after is really a ProductLink. So LinkLink might be just fine, if we agree to give it the semantics of a product.

Linas Vepštas · Answer 18 · Sat Apr 28 2018 04:54:50 GMT+0800 (China Standard Time)

LinkLink

??? ListLink is already a link that implements a Cartesian product. In fact, every link type that derived from OrderedLink is a Cartesian product. Why in the heck do we need yet another Cartesian product link? What's wrong with just OrderedLink?

Nil Geisweiller · Answer 19 · Sat Apr 28 2018 11:02:49 GMT+0800 (China Standard Time)

I meant ListLink.

In fact, every link type that derived from OrderedLink is a Cartesian product.

That feels true.

Yidnekachew Wondimu · Answer 20 · Wed May 09 2018 21:29:31 GMT+0800 (China Standard Time)

I'm working on an alternate way of representing tabular data right now, but decided to distract myself and read email today.

@linas Can I have a look what that alternative way looks like & when are we going to get it?