hashashin / SimpleChat

I've got fed up - heartily fed up - with AIML, so this is a chat bot system based on ChatScript (but in Java) (and cruder)

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

SimpleChat

SimpleChat is a system for creating chatbots, written in Java. It provides the (unique, as far as I can tell) facility for multiple instances of individual bots, and for multiple bots and their instances to be running at the same time. It provides the core for the ChatCitizen project, a Minecraft plugin which uses Citizens2 to make conversational NPCs.

Rationale

There are a few chatbot systems out there already - why write another? One of the most popular is AIML, which is written in Java and has already been incorporated into a Minecraft plugin. However, AIML is terribly verbose, and writing complex conversational structures is a nightmare. Yes, you can do it - and it’s really powerful - but the XML nature of everything and the way topics work really burns my brain.

Chatscript is another system which is easier, much easier, but it’s in C++. And my essential usecase is Minecraft chatbots, so I can’t use that. It also has quite a hefty runtime with a lot of (to me) unnecessary stuff. I don’t want to pass the Turing test just yet, I just want to have some simple conversations.

So SimpleChat is a simplified (hah) thing, written in Java, but based heavily on Chatscript. There is also an emphasis on actions which are procedural rather than necessarily being just text responses: I want my NPCs to do things, rather than just say them.

This documentation is really just for me, at the moment.

Note
THIS WHOLE PROJECT IS VERY EARLY WORK. (See the todo list)

Building

For SimpleChat itself, there are two Maven manifests POMs provided:

  • pom.xml just builds the library

  • pom-test.xml builds a test program with a simple interface.

There are no dependencies. The Minecraft plugin is a separate project which uses the library, and is in the very early days.

Bots

  • The primary object is a Bot, which has a set of topics (see below)

  • Each Bot can have multiple BotInstance objects, which are a single speaker sharing the same chat data (but having some private, perhaps). Thus you could have a whole lot of "soldier" or "shopkeeper" BotInstances, all based on one Bot.

  • The users are represented by Object objects, and each conversation between a BotInstance and a source has some private data, called "conversation data". It’s a raw Object because it’s likely to be something in your code.

Creating a bot and using it

Because bots sometimes need to load other bots in order to run, and because we don’t necessarily know where those bots live in the file system (due to oddities like how people install Minecraft plugins), we need to tell the system how to turn bot names into filenames. This is done with a Bot.PathProvider object you have to provide. This simply takes a bot name and returns a path to its directory (all bots have their own directories).

Having done this, we can then load the bot by using the static Bot.loadBot() method - this is not a constructor; it will return a previously loaded bot if one of the same name exists.

Here’s an example, which assumes all the bots are in the current directory.

Bot.setPathProvider(new Bot.PathProvider() {
    public Path path(String name) {
	return Paths.get(name);
    }
});
Bot b = Bot.loadBot("mybot");
// we provide a name - the instance variable (q.v.) @botname is set to this.
instance = new BotInstance(b,"some name");
instance.runInits();
source = new Object(); // could be anything, it's for you to use.

See the source of the test program org.pale.chattest.MainWindow.java to see some different code, which takes a bot directory from a dialog box and assumes any bots it might inherit live in sister directories to it. Typically a conversation is done by repeatedly calling

String reply = instance.handle(string,source);
Note
  • handle() can return a string, or null if a none value was on the stack at the end of the action (you’ll find out what this means). Be prepared to deal with null return values!

  • The init block is not run automatically, you have to invoke it with runInits(). This is because some applications need to do slightly odd things. For example, Minecraft loads and unloads bots quite often as it loads and unloads the chunks they are in. It automatically saves and loads the instance variables when it does this. We therefore only run the inits when the bot is first created.

Custom commands you write might need to access some special data connected to your bot. In this case, an Object can be passed to the instance constructor, so the bot can find the data inside your command code:

Bot b = new Bot(Paths.get("/some/directory/or/other");
instance = new BotInstance(b,"Fred the Bot",mythingy);
instance.runInits();
source = new Object();
Note

Don’t get confused with the source and the instance object. The source is your user, the instance object is something attached to individual bots.

Be careful with the casting you’ll have to do inside the commands!

Calling action language functions

Sometimes you might want to get a string or perform an action in response to some kind of event rather than user text. Pattern matching here (see below) would be wasteful, so it’s possible to run user-defined action language functions directly:

// assuming instance and source are set, and funcName is the name
// you defined the function with..

String msg;
if(instance.bot.hasFunc(funcname)){
    msg = instance.runFunc(funcName,source);
    sendMessageToThePlayer(msg);
}

The action function should return a string on top of the stack or leave a string in the string builder, just like an action called from a topic pattern (see below). However, it can also return none, in which case the result from the Java call will be null.

Note
  • It’s a good idea to check the function exists and to do (or not do) something if it doesn’t!

  • As before, runFunc() can return a null if none was on the stack. Deal with this case.

  • By convention, functions that get called externally are named ALLINCAPS, like RANDSAY in the example config file below. I haven’t enforced this in the language, however.

Bot directory

The bot directory should contain

  • config.conf file listing the topics, substitutions, categories, lists etc.

  • subsidiary .conf files containing more of the above included with include

  • .sub files with substitutions

  • .topic files each containing a topic

The configuration file

The config file must be called config.conf. It contains the following:

  • a # starts a comment

  • topics entries each giving a list of topics, each of which is loaded from a .topic file. A topic is a set of pattern/action pairs: when a pattern is matched, the action fires and pattern matching stops.

  • subs entries each giving the name of a substitution set, which is loaded from a .sub file

  • an optional init entry followed by a block of Action language (see below) which will set up initial values for conversation variables and maybe do some other things.

  • an optional global entry followed by a block of Action language (see below) which will set up values for bot-global variables, which are rather more system-friendly than instance variables but cannot be changed (see bot-global variables below)

  • category and phrase list definitions (q.v.)

  • any number of action language functions, which can be called from action language or from your application code.

  • include "filename" lines to include subsidiary conf files

  • message "some string" items to print messages to standard out

  • ifskip..endskip blocks to skip code under certain conditions (see Skip blocks)

  • abort "some string" items to abort the load (typically used in skip blocks)

Here is an example:

# This is a test bot!

skipif extension ChatCitizen
    # skip this block if we are running as part of the ChatCitizen
    # plugin and so actually have minecraft commands. This will
    # load a set of stubs to replace them.

    message "Minecraft not detected"
    include "minecraftstubs.conf"
endskip

# The calling program might invoke this function with runFunc() to
# respond to some kind of event in the world or a random tick.

:RANDSAY
    [
        "It's exciting here!",
        "Hello trees! Hello flowers!",
        "SPOON!",
        "Bored now."
    ] choose;


# here are some substitution files.

subs "subs1.sub"
subs "subs2.sub"

# primary topics, which can be rearranged in priority from within
# action code.

topics {main cats dogs}

# topics in different lists can be promoted and demoted but not
# outside their list, so these will always run after the topics
# above. The last topic list is generally for "catch-all" patterns.

topics {bottom}

# and here's an init block which just sets the instance variable
# `foo` to zero.
init
    0 int !@foo
;

Regex substitutions

Each bot can have a file (or set of files) containing regex substitutions associated with it. These will be processed before any other input, and are always processed. They are typically used to substitute things like "I’m" and "I am" with "IAM" to make parsing easier. Multiple bots can share substitution sets.

A substitution file is appended to a bot’s substitutions by using a line of the form

subs <subfilename>

in the config file. The file path is relative to the bot directory.

The format for the files is lines consisting of a regex and a replacement string, separated by default by a colon. Two directives exist, which should be on their own lines. The "#include" directive has a file argument and will include a file of substitutions. The "#sep" directive has a string (actually regex) argument and changes the separator for this file. The argument is separated by a space. All other "#" lines are comments. A (very brief) example:

# a comment
[iI]'m:Iam
[Ii]\s+am:Iam
[yY]ou\s+are:youre
[yY]ou're:youre
#include more.subst

Initial action

This is written in the action language (see below and here) and runs when an instance of this bot is created, but just throws away the output. It is typically used to initialise instance variables. Setting a conversation variable will cause a runtime error, because the bot isn’t in a conversation.

Topics

Topics are (loosely speaking) subjects of conversation. Each topic consists of a list of pattern/action pairs, which are run through in order when the user provides input. When a pattern matches, the action runs and produces some output which is passed to the user (as well as perhaps doing other things). All processing then stops. More specific patterns should therefore be at the top of the topic file, so they get a chance to match first.

Sometimes a special "pseudotopic" can be in play, such as when the next command is used in action code to specify a set of patterns to try to match with the next input. This is done to produce dialogue tree effects. In this case, the pseudotopic will try to match its patterns before any real topics.

Topics are arranged into lists. Within each list, topics can be promoted or demoted to the top and bottom of the list by actions. There can be any number of lists, but the example config above is a typical case, using only two: a main list for all the general conversational topics, and a bottom list for catch-all phrases. The topics are processed within their list, and their lists are processed in order. This is so that you can (say) demote a topic, but have it still try to match its patterns before any catch-all patterns try.

The topics command in the config file specifies a new topic list. Following it, in curly braces, are the topic names. These are loaded from .topic files in the same directory as the bot, so the line topics {main} will load the main.topic file.

Here is an example topic file:

# this is a named pattern/action pair. Following the '+' is an optional
# pattern name (preceded by a slash if present). Then a pattern node,
# in this case a sequence. The bit between the end of the sequence,
# which is delimited by brackets (other pattern nodes  have other delimiters)
# and the semicolon is the action. This one stacks the output "Hi, how are you?",
# and then sets up a subpattern tree and tells the system to use it to parse
# responses to this output.

+/hellopattern ([hello hi] .*)
    "hi how are you?"
    {
        # each subpattern is a pattern/action pair.
        # the pattern is this bit. It matches:
        # - possibly "I am"
        # - then either good, fine or well
        # - then everything else.

        +(?(I am) [good fine well] .*)

            # and this is the action, which just stacks an output

            "Glad to hear it.";

        # This pattern matches
        # - "I am" optionally
        # - then "bad" or the sequence "not too"
        # - then everything else

        +(?(I am) [bad (not too)] .*)
            "Oh, I'm sorry";
    }
    # "next" tells the system to try to match from the subpattern list
    # we have just put on the stack, the next time we get input.
    next;

# this anonymous pattern catches everything, and runs when nothing
# else in the topic has matched. It captures the input as "$foo"
# and this gets used to generate the output. You'd normally
# put this in a topic in the bottom topic list.

+$foo=.*
    "I don't know how to respond to " $foo +;

Note that each pair is preceded by +, and if the next character is '/' the optional name. Then comes a single pattern node, followed by the actions and a semicolon. The pattern name can be used to disable and enable a pattern in a topic from inside an action.

Whole topics can also be enabled and disabled, as well as being promoted and demoted to the top or bottom of their list.

Inherited topics

If you want to create a sub-bot which has exactly the same topics as its parent, you can just write

topics inherit

instead of a full topics block. If you do this, you can’t add any more topics blocks: your bot will have exactly the same topics as the parent. Naturally, you must have used inherit to set a parent bot. This is useful for creating sub-bots which just have different variables and maybe functions. I use it for creating different kinds of "shopkeeper", all of which have the same topics but sell and buy different items.

Patterns

For matching, the input is lower-cased, all punctuation is removed and finally it is split into words. Pattern matching is done per-word. The entire pattern must be in a pair of quotes. Most patterns will be sequences, so you’ll see a lot of (…​).

Pattern Elements

  • plain words match themselves

  • ^ negates the next pattern

  • [..] matches any of the included patterns

  • (..) matches all the included patterns in sequence it always succeeds

  • ? matches the next pattern, but carries on if it fails

  • ` matches at least one token of the previous node until the next node matches; so the `. in (.+ foo) will match one or more tokens until it hits a "foo";

  • * is similar, but matches zero or more of the previous node;

  • ^ negates the following pattern, but does not consume - it should be followed by what you want in that place. A common pattern might be ^cat . which will match "not a cat"

Note
  • Negate nodes are "fun".

Labels

Putting $labelname= before a pattern node marks it so that the data it matches will be stored in a variable. In the case of '*' and '+', the variable $labelname_ct is set to the match count.

Reductions

Following AIML usage, a "reduction" is a pattern/action pair which replaces some text with a shorter or canonical form, and then sends that straight back into the pattern matcher. For example, there are lots of ways of saying "Hello". We could reduce them to one pattern by something like this:

+ (hi .*)" "HELLO" recurse;
+ (wotcher .*) "HELLO" recurse;
+ (good [morning afternoon evening]) "HELLO" recurse;
+ ([awright (all right)] .*) "HELLO" recurse;
+ (hello .+) "HELLO" recurse
+ (hey .*) "HELLO" recurse

and so on. The recurse command sends the string on top of the stack back into the interpreter. Naturally we could do a lot of this with string substitutions (and it’s probably faster), but often reductions are easier to read, and are able to do more complicated things. More complex reductions could be:

+ (I think $a=.+) "${$a}" recurse;
+ (do you think that $a=.+ is $b=.+)  "is ${$a} ${$b}" recurse;

Reductions typically live in a topic of their own.

Actions

These are in the form of a sequence of instructions in an RPN language, which should either leave a string on the stack or build one using print statements. They are always terminated by a semicolon. The simplest is just a string:

+([hello hi] $name=.*)
    "Hi, how are you?";

One special and complex instruction is an entire set of subpatterns and actions. When these are set using the next command, the conversation will try these patterns first. They are pattern/action pairs as normal, but defined in curly brackets:

+pat ([hello hi] .*)
    "hi how are you?"
    {
        +([good fine well] .*)
            "Glad to hear it.";
        +([bad (not too)] .*)
            "Oh, I'm sorry";
    }

More details on the action language here.

Note

If the action doesn’t leave anything behind on the stack (or in the string builder, see the action language docs) the system considers the whole pattern as having failed to match, and moves on to try the next one. This can be useful for adding additional code to test things.

Categories

Words can belong to hierarchies categories, rather like (OK, very like) "concepts" in ChatScript. They can be defined in topic files, and are local to each bot. Here’s an example of a category block from a topic file:

~animal=
    [
        "small dinosaur"
        big_dinosaur
        bird pig aardvark yak
        ~dog=[dog dogs puppy puppies]
        ~cat=[cat cats kittens "puddy tat"]
    ]
~human= [
        ~man=[Steve Dave "Big Paul" him he]
        ~woman=[Sharon Alice her she]
        they them
    ]

This defines two top level categories, ~animal and ~human, each of which have some subcategories. Steve is in both the categories human and man, while bird is only in animal. There are two kinds of "leaf" entry in a category tree: single words and word lists. Single words are entered just using the word; while lists are entered either using space-separated lists of words in quotes, or by separating the words with underscores. Words just match words, while lists of words have to match all the words in order.

Matching in a pattern is done with the `~categoryname' symbol. Here’s an example:

+(is $n=(?a ~cat) a cat) "Yes ${$n} is a cat";
+(is ?a ~dog a cat) "No, it's a dog";
+(is $n=(?a ~animal) a cat) "No, but a ${$n} is some kind of animal!"+;
+(is $n=.+ cat) "No, I don't know what ${$n} is"+;
Note
You can use categories inside other categories before the former categories are declared; the outer category will create an entry which the later declaration will fill in.

Category post-modifiers

Some useful hacks are available for modifying a category list. After the square brackets, the / symbol precedes a set of modifiers. These are characters followed by some data.

  • /+suffix adds an optional suffix to a category. If the match fails, then we can try again with the suffix removed from the matching data. Thus [say talk]/+ing will match "saying" and "talking."

While these are occasionally useful, we don’t use them often.

Categories - an example

For an example of how use categories to handle pluralisation and synonyms, look at the structures in matlists.conf in the ChatCitizens2 root robot. This handles material names in Minecraft. Here, we set up two ACTIONS.adoc, both of which map from categories (the materials) to lists of strings (possible plurals and singulars). These maps give us choices of strings to output.

After these maps, we define a category for each material giving all the possible singular and plural strings. We can match on these categories. Finally, a single ~material category contains all these categories so we can match on it to see if we have a material.

A very useful function here is subcat (string category — category) This takes a string and a category which contains other categories (like ~material in our example). The string must match the input category. The function will return the subcategory of the input category which matches the string. Given our example, this means that if we call

"blocks of stone" ~material subcat

we will get the category ~stone as the result.

Note
  • We could add multiple levels of subcategory in our materials exampe, but this would make it harder to use subcat.

  • The file matlists.conf was generated using an Angort script script from a CSV file. You could write a similar thing in a more mainstream language fairly easily.

Phrase lists

Lists are lists of strings which are accessible from action language. They are in many ways like categories, but cannot be matched on - instead they are intended for generating content and customising this content to sub-bots of a bot, and so can be inherited (see the section on Inheritance).

Phrase lists are specified in a config file by data of the form:

^listname = [word "a phrase" another_phrase]

So very similar to categories. They cannot, however, be nested.

Bot inheritance

It’s often the case that many disparate bots share many characteristics, from some of the more basic substitutions, through the so-called "reduction" topics, up to full conversational topics. To help do this without copying code or requiring more memory, a bot can inherit the properties of another bot. To do this, put a line of the form

inherit "botpath"

near the top of your config file, for example

inherit "bots/rootbot"

The new bot will inherit its parents categories and functions, unless they are overriden in the child. Topics are also inherited, but not topic lists - you have to add the topic into the topic list by name as usual, but if it already exists in the parent it will not require loading. The init function of a parent bot will run before that of the child bot. Substitutions are also inherited, but the system needs to be told where they should run relative to the bot’s own substitutions. To do this, add a subs parent line into the lines where you load your substitutions. For example:

subs "subs1.subs"
subs parent
subs "subs2.subs"

Quite often you’ll just have a subs parent line by itself, since most English substitutions should be in your "root" bot.

Bots can be nested to any level - if a category, topic or function does not exist, the system will go "up the family tree" to find it. Init functions will run so that the root init function runs first.

Skip blocks

Skip blocks in a config file let the system ignore blocks of code under certain conditions. They have the syntax:

skipif condition
    ...
endskip

or

skipif !condition
    ...
endskip

to negate the condition. Currently the only condition supported is extension <name>, which returns true if InstructionCompiler.addExtension() has been called with <name>. This is done when a new set of action language commands is added, as described in this document. A typical use is to provide action language "stub" functions for functions which don’t exist when an extension is not loaded, as shown in the example in the example config file.

Bot-global variables and global blocks

If you create many instances of the same bot, each instance will have its own copy of the instance variables set up in the init block. Often, they’ll be identical and never change. This is wasteful. Even worse, some systems will often delete and re-instantiate bots — the Minecraft Citizens2 plugin does this whenever the instance’s chunk is unloaded and reloaded. This causes a lot of activity if you have a lot of instance variables, or very large instance variables - consider the materials lists we discussed in Categories above.

To avoid this, we can create "bot-global" variables using a global block. Here, the code runs exactly as it does for an init block, but it runs on a "dummy" instance which is not visible to the user. Whenever we request an instance variable, and it cannot be found in the instance, we look in the dummy instance. If it cannot be found there, we look in the dummy instance created for the parent, and so on up the inheritance tree.

The upshot is that

  • bot-global variables look exactly like instance variables

  • there is only one copy of them for each loaded bot, not separate copies for each instance of that bot

  • they can be inherited

  • writing to a bot-global variable will create a new instance variable visible only to the instance, it will not change the variable for other instances

This last point is important. Reading bot-global variables will get the shared data; writing to them will cause a new instance variable to be created which will override the shared data in the instance which did the writing. If you really want to have writable global data, you can do this by having a bot-global map, and storing your data there: since it’s always the same map and we never write to the variable holding the map, stored data in the map will always be global.

About

I've got fed up - heartily fed up - with AIML, so this is a chat bot system based on ChatScript (but in Java) (and cruder)


Languages

Language:Java 99.9%Language:Shell 0.1%