opencog / link-grammar

The CMU Link Grammar natural language parser

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Sharing parse choice elements

linas opened this issue · comments

I have an idea of how to memory-spare parse choice elements, just like parse set elements are shared through the x-table.
Currently, they cannot be shared because each has a unique value.
I thought of these data structures (typedefs omitted):

struct Parse_set_struct
{
   Parse_choice_desc *desc;
   ...
};
struct Parse_choice_desc_struct
{
   Parse_choice_desc *next;
   Parse_choice_lr *l_pc, *r_pc;
};
struct Parse_choice_lr
{	
   Parse_set * set;
   Disjunct    *md;           /* the chosen disjunct for the middle word */
   int32_t id;                /* the tracon (l_id for l_pc and r_id for r_pc  */
};

(What is now set[1] and set[2] will be in l_pc and r_pc, correspondingly.)
Of course, there is a question of how much sharing will be done (if any, I don't know even that) and what will be the CPU overhead.

Originally posted by @ampli in #1402 (reply in thread)

I was going to write something clever here, but I now realize I don't understand the idea. It would seem like this results in two copies of Disjunct *md; instead of just one. There's also an extra int32_t id; that uses more space. I don't understand what Connector *le, *re; are. I don't understand which of these structs you think is sharable...

Sharing this one:

struct Parse_choice_lr
{	
   Parse_set * set;
   Disjunct    *md;           /* the chosen disjunct for the middle word */
   int32_t id;                /* the tracon (l_id for l_pc and r_id for r_pc  */
};

two copies of Disjunct *md; instead of just one

You are right. It should be moved to Parse_choice_desc_struct.

I don't understand what Connector *le, *re; are.

Where? They are not used here.

Also note that id replaces what is now Parse_choice::lc and Parse_choice::rc (depending if this is l_pc or r_pc).
A connector pointer can be used instead, but it seems that using tracon_id will lead to more sharing (to be checked).

I didn't explain the basis of this idea:
Currently Parse_choise has information of both sides at once., This make all of its elements unique so they cannot be shared. The idea is to separate it to LHS and RHS like set[0] and set[1].

id replaces what is now Parse_choice::lc

Ah! OK, that makes sense now.

The Connector *le, *re; are in the Parse_set. I'm guessing wildly that they are connectors on Parse_set_struct::lw and rw? If so, can they be removed and replaced by get_tracon(Parse_set_struct::l_id) ?

The Connector *le, *re; are in the Parse_set. I'm guessing wildly that they are connectors on Parse_set_struct::lw and rw? If so, can they be removed and replaced by get_tracon(Parse_set_struct::l_id) ?

Yes. But the "encoding for parsing" step would then need to produce a tracon table (to get the tracon address by its number). Such a table is produced by the "encoding for pruning" step, which shares most of its code, so it is easy to produce it. However, we will need to find how much overhead, if any, this adds.

would then need to ...

OK. I guess we can leave this alone for now. Things are complicated enough, as it is.

I guess we can leave this alone for now. Things are complicated enough, as it is.

But the opposite thing, i.e. remove int32_t l_id, r_id; seems easy and without overhead (unless I missed something) so I will try that.

The Connector *le, *re; are in the Parse_set. I'm guessing wildly that they are connectors on Parse_set_struct::lw and rw? If so, can they be removed and replaced by get_tracon(Parse_set_struct::l_id) ?

I found a way to replace Connector *lc, *rc; in Parse_choice with int32_t l_id, r_id; (saving 8 bytes) and get rid of int32_t l_id, r_id; in Parse_set w/o a noticeable overhead (PR soon).

BTW, by producing a tracon/disjunct mapping table (while encoding for parsing), it seems possible to also get rid of Disjunct *md; in Parse_choice. Even w/o additional table mapping, with some small overhead, it is possible to get rid of uint8_t lw, rw; in Parse_set, but then we still have uint8_t null_count that prevents the saving of 8 bytes. However, it is possible to get rid even of it if an x_table per null_count is implemented (which can be a good idea anyway).

it seems possible to also get rid of Disjunct *md

I Could not do it after all.