SimpleChat is a system for creating chatbots, written in Java. It provides the (unique, as far as I can tell) facility for multiple instances of individual bots, and for multiple bots and their instances to be running at the same time. It provides the core for the ChatCitizen project, a Minecraft plugin which uses Citizens2 to make conversational NPCs.
There are a few chatbot systems out there already - why write another? One of the most popular is AIML, which is written in Java and has already been incorporated into a Minecraft plugin. However, AIML is terribly verbose, and writing complex conversational structures is a nightmare. Yes, you can do it - and it’s really powerful - but the XML nature of everything and the way topics work really burns my brain.
Chatscript is another system which is easier, much easier, but it’s in C++. And my essential usecase is Minecraft chatbots, so I can’t use that. It also has quite a hefty runtime with a lot of (to me) unnecessary stuff. I don’t want to pass the Turing test just yet, I just want to have some simple conversations.
So SimpleChat is a simplified (hah) thing, written in Java, but based heavily on Chatscript. There is also an emphasis on actions which are procedural rather than necessarily being just text responses: I want my NPCs to do things, rather than just say them.
This documentation is really just for me, at the moment.
Note
|
THIS WHOLE PROJECT IS VERY EARLY WORK. (See the todo list) |
For SimpleChat itself, there are two Maven manifests POMs provided:
-
pom.xml just builds the library
-
pom-test.xml builds a test program with a simple interface.
There are no dependencies. The Minecraft plugin is a separate project which uses the library, and is in the very early days.
-
The primary object is a
Bot
, which has a set of topics (see below) -
Each Bot can have multiple
BotInstance
objects, which are a single speaker sharing the same chat data (but having some private, perhaps). Thus you could have a whole lot of "soldier" or "shopkeeper" BotInstances, all based on one Bot. -
The users are represented by
Object
objects, and each conversation between aBotInstance
and a source has some private data, called "conversation data". It’s a rawObject
because it’s likely to be something in your code.
Because bots sometimes need to load other bots in order to
run, and because we don’t necessarily know where those bots
live in the file system (due to oddities like how people install
Minecraft plugins), we need to tell the system how to turn bot names
into filenames. This is done with a Bot.PathProvider
object
you have to provide. This simply takes a bot name and returns
a path to its directory (all bots have their own directories).
Having done this, we can then load the bot by using the static
Bot.loadBot()
method - this is not a constructor; it will
return a previously loaded bot if one of the same name exists.
Here’s an example, which assumes all the bots are in the current directory.
Bot.setPathProvider(new Bot.PathProvider() {
public Path path(String name) {
return Paths.get(name);
}
});
Bot b = Bot.loadBot("mybot");
// we provide a name - the instance variable (q.v.) @botname is set to this.
instance = new BotInstance(b,"some name");
instance.runInits();
source = new Object(); // could be anything, it's for you to use.
See the source of the test program
org.pale.chattest.MainWindow.java
to see some different code,
which takes a bot directory from a dialog box and assumes any bots it might
inherit live in sister directories to it.
Typically a conversation
is done by repeatedly calling
String reply = instance.handle(string,source);
Note
|
|
Custom commands you write might need to access some special data
connected to your bot. In this case, an Object
can be passed
to the instance constructor, so the bot can find the data inside
your command code:
Bot b = new Bot(Paths.get("/some/directory/or/other");
instance = new BotInstance(b,"Fred the Bot",mythingy);
instance.runInits();
source = new Object();
Note
|
Don’t get confused with the source and the instance object. The source is your user, the instance object is something attached to individual bots. |
Be careful with the casting you’ll have to do inside the commands!
Sometimes you might want to get a string or perform an action in response to some kind of event rather than user text. Pattern matching here (see below) would be wasteful, so it’s possible to run user-defined action language functions directly:
// assuming instance and source are set, and funcName is the name
// you defined the function with..
String msg;
if(instance.bot.hasFunc(funcname)){
msg = instance.runFunc(funcName,source);
sendMessageToThePlayer(msg);
}
The action function should return a string on top of the stack or leave a string
in the string builder, just like an action called
from a topic pattern (see below). However, it can also return none
, in which
case the result from the Java call will be null.
Note
|
|
The bot directory should contain
-
config.conf
file listing the topics, substitutions, categories, lists etc. -
subsidiary
.conf
files containing more of the above included withinclude
-
.sub
files with substitutions -
.topic
files each containing a topic
The config file must be called config.conf
. It contains the following:
-
a
#
starts a comment -
topics
entries each giving a list of topics, each of which is loaded from a.topic
file. A topic is a set of pattern/action pairs: when a pattern is matched, the action fires and pattern matching stops. -
subs
entries each giving the name of a substitution set, which is loaded from a.sub
file -
an optional
init
entry followed by a block of Action language (see below) which will set up initial values for conversation variables and maybe do some other things. -
an optional
global
entry followed by a block of Action language (see below) which will set up values for bot-global variables, which are rather more system-friendly than instance variables but cannot be changed (see bot-global variables below) -
category and phrase list definitions (q.v.)
-
any number of action language functions, which can be called from action language or from your application code.
-
include "filename"
lines to include subsidiary conf files -
message "some string"
items to print messages to standard out -
ifskip..endskip
blocks to skip code under certain conditions (see Skip blocks) -
abort "some string"
items to abort the load (typically used in skip blocks)
# This is a test bot! skipif extension ChatCitizen # skip this block if we are running as part of the ChatCitizen # plugin and so actually have minecraft commands. This will # load a set of stubs to replace them. message "Minecraft not detected" include "minecraftstubs.conf" endskip # The calling program might invoke this function with runFunc() to # respond to some kind of event in the world or a random tick. :RANDSAY [ "It's exciting here!", "Hello trees! Hello flowers!", "SPOON!", "Bored now." ] choose; # here are some substitution files. subs "subs1.sub" subs "subs2.sub" # primary topics, which can be rearranged in priority from within # action code. topics {main cats dogs} # topics in different lists can be promoted and demoted but not # outside their list, so these will always run after the topics # above. The last topic list is generally for "catch-all" patterns. topics {bottom} # and here's an init block which just sets the instance variable # `foo` to zero. init 0 int !@foo ;
Each bot can have a file (or set of files) containing regex substitutions associated with it. These will be processed before any other input, and are always processed. They are typically used to substitute things like "I’m" and "I am" with "IAM" to make parsing easier. Multiple bots can share substitution sets.
A substitution file is appended to a bot’s substitutions by using a line of the form
subs <subfilename>
in the config file. The file path is relative to the bot directory.
The format for the files is lines consisting of a regex and a replacement string, separated by default by a colon. Two directives exist, which should be on their own lines. The "#include" directive has a file argument and will include a file of substitutions. The "#sep" directive has a string (actually regex) argument and changes the separator for this file. The argument is separated by a space. All other "#" lines are comments. A (very brief) example:
# a comment [iI]'m:Iam [Ii]\s+am:Iam [yY]ou\s+are:youre [yY]ou're:youre #include more.subst
This is written in the action language (see below and here) and runs when an instance of this bot is created, but just throws away the output. It is typically used to initialise instance variables. Setting a conversation variable will cause a runtime error, because the bot isn’t in a conversation.
Topics are (loosely speaking) subjects of conversation. Each topic consists of a list of pattern/action pairs, which are run through in order when the user provides input. When a pattern matches, the action runs and produces some output which is passed to the user (as well as perhaps doing other things). All processing then stops. More specific patterns should therefore be at the top of the topic file, so they get a chance to match first.
Sometimes a special "pseudotopic" can be in play, such as when
the next
command is used in action code to specify a set
of patterns to try to match with the next input. This is done
to produce dialogue tree effects. In this case, the pseudotopic
will try to match its patterns before any real topics.
Topics are arranged into lists. Within each list, topics can be promoted or demoted to the top and bottom of the list by actions. There can be any number of lists, but the example config above is a typical case, using only two: a main list for all the general conversational topics, and a bottom list for catch-all phrases. The topics are processed within their list, and their lists are processed in order. This is so that you can (say) demote a topic, but have it still try to match its patterns before any catch-all patterns try.
The topics
command in the config file specifies a new topic
list. Following it, in curly braces, are the topic names. These
are loaded from .topic
files in the same directory as the bot,
so the line
topics {main}
will load the main.topic
file.
Here is an example topic file:
# this is a named pattern/action pair. Following the '+' is an optional # pattern name (preceded by a slash if present). Then a pattern node, # in this case a sequence. The bit between the end of the sequence, # which is delimited by brackets (other pattern nodes have other delimiters) # and the semicolon is the action. This one stacks the output "Hi, how are you?", # and then sets up a subpattern tree and tells the system to use it to parse # responses to this output. +/hellopattern ([hello hi] .*) "hi how are you?" { # each subpattern is a pattern/action pair. # the pattern is this bit. It matches: # - possibly "I am" # - then either good, fine or well # - then everything else. +(?(I am) [good fine well] .*) # and this is the action, which just stacks an output "Glad to hear it."; # This pattern matches # - "I am" optionally # - then "bad" or the sequence "not too" # - then everything else +(?(I am) [bad (not too)] .*) "Oh, I'm sorry"; } # "next" tells the system to try to match from the subpattern list # we have just put on the stack, the next time we get input. next; # this anonymous pattern catches everything, and runs when nothing # else in the topic has matched. It captures the input as "$foo" # and this gets used to generate the output. You'd normally # put this in a topic in the bottom topic list. +$foo=.* "I don't know how to respond to " $foo +;
Note that each pair is preceded by +
, and if the next character is '/' the optional name.
Then comes a single pattern node, followed by the actions and a semicolon.
The pattern name can be used to disable and enable a pattern in a topic
from inside an action.
Whole topics can also be enabled and disabled, as well as being promoted and demoted to the top or bottom of their list.
If you want to create a sub-bot which has exactly the same topics as its parent, you can just write
topics inherit
instead of a full topics block. If you do this, you can’t add any more topics blocks: your bot
will have exactly the same topics as the parent. Naturally, you must have used inherit
to set
a parent bot.
This is useful for creating sub-bots which just have different variables and maybe functions. I
use it for creating different kinds of "shopkeeper", all of which have the same topics but sell
and buy different items.
For matching, the input is lower-cased, all punctuation is removed
and finally it is split into words. Pattern matching is done per-word.
The entire pattern must be in a pair of quotes. Most patterns
will be sequences, so you’ll see a lot of (…)
.
-
plain words match themselves
-
^
negates the next pattern -
[..]
matches any of the included patterns -
(..)
matches all the included patterns in sequence it always succeeds -
?
matches the next pattern, but carries on if it fails -
` matches at least one token of the previous node until the next node matches; so the `.
in(.+ foo)
will match one or more tokens until it hits a "foo"; -
*
is similar, but matches zero or more of the previous node; -
^
negates the following pattern, but does not consume - it should be followed by what you want in that place. A common pattern might be^cat .
which will match "not a cat"
Note
|
|
Putting $labelname=
before a pattern node marks it so that
the data it matches will be stored in a variable. In the case of '*' and
'+', the variable $labelname_ct
is set to the match count.
Following AIML usage, a "reduction" is a pattern/action pair which replaces some text with a shorter or canonical form, and then sends that straight back into the pattern matcher. For example, there are lots of ways of saying "Hello". We could reduce them to one pattern by something like this:
+ (hi .*)" "HELLO" recurse; + (wotcher .*) "HELLO" recurse; + (good [morning afternoon evening]) "HELLO" recurse; + ([awright (all right)] .*) "HELLO" recurse; + (hello .+) "HELLO" recurse + (hey .*) "HELLO" recurse
and so on. The recurse
command sends the string on top of the stack
back into the interpreter. Naturally we could do a lot of this
with string substitutions (and it’s probably faster), but often
reductions are easier to read, and are able to do more complicated
things. More complex reductions could be:
+ (I think $a=.+) "${$a}" recurse; + (do you think that $a=.+ is $b=.+) "is ${$a} ${$b}" recurse;
Reductions typically live in a topic of their own.
These are in the form of a sequence of instructions in an RPN language, which should either leave a string on the stack or build one using print statements. They are always terminated by a semicolon. The simplest is just a string:
+([hello hi] $name=.*) "Hi, how are you?";
One special and complex instruction is an entire set of subpatterns and
actions. When these are set using the next
command, the conversation will
try these patterns first. They are pattern/action pairs as normal, but
defined in curly brackets:
+pat ([hello hi] .*) "hi how are you?" { +([good fine well] .*) "Glad to hear it."; +([bad (not too)] .*) "Oh, I'm sorry"; }
More details on the action language here.
Note
|
If the action doesn’t leave anything behind on the stack (or in the string builder, see the action language docs) the system considers the whole pattern as having failed to match, and moves on to try the next one. This can be useful for adding additional code to test things. |
Words can belong to hierarchies categories, rather like (OK, very like) "concepts" in ChatScript. They can be defined in topic files, and are local to each bot. Here’s an example of a category block from a topic file:
~animal= [ "small dinosaur" big_dinosaur bird pig aardvark yak ~dog=[dog dogs puppy puppies] ~cat=[cat cats kittens "puddy tat"] ] ~human= [ ~man=[Steve Dave "Big Paul" him he] ~woman=[Sharon Alice her she] they them ]
This defines two top level categories, ~animal
and ~human
, each of which
have some subcategories. Steve
is in both the categories human
and man
,
while bird
is only in animal
. There are two kinds of "leaf" entry in a
category tree: single words and word lists. Single words are entered just
using the word; while lists are entered either using space-separated lists of
words in quotes, or by separating the words with underscores. Words just match
words, while lists of words have to match all the words in order.
Matching in a pattern is done with the `~categoryname' symbol. Here’s an example:
+(is $n=(?a ~cat) a cat) "Yes ${$n} is a cat"; +(is ?a ~dog a cat) "No, it's a dog"; +(is $n=(?a ~animal) a cat) "No, but a ${$n} is some kind of animal!"+; +(is $n=.+ cat) "No, I don't know what ${$n} is"+;
Note
|
You can use categories inside other categories before the former categories are declared; the outer category will create an entry which the later declaration will fill in. |
Some useful hacks are available for modifying a category list. After
the square brackets, the /
symbol precedes a set of modifiers.
These are characters followed by some data.
-
/+suffix
adds an optional suffix to a category. If the match fails, then we can try again with the suffix removed from the matching data. Thus[say talk]/+ing
will match "saying" and "talking."
While these are occasionally useful, we don’t use them often.
For an example of how use categories to handle pluralisation and synonyms, look at the structures in matlists.conf in the ChatCitizens2 root robot. This handles material names in Minecraft. Here, we set up two ACTIONS.adoc, both of which map from categories (the materials) to lists of strings (possible plurals and singulars). These maps give us choices of strings to output.
After these maps, we define a category for each material giving all the possible singular and plural
strings. We can match on these categories. Finally, a single ~material
category
contains all these categories so we can match on it to see if we have a material.
A very useful function here is subcat (string category — category)
This takes a string
and a category which contains other categories (like ~material
in our example). The
string must match the input category. The function will return the subcategory of the input
category which matches the string. Given our example, this means that if we call
"blocks of stone" ~material subcat
we will get the category ~stone
as the result.
Lists are lists of strings which are accessible from action language. They are in many ways like categories, but cannot be matched on - instead they are intended for generating content and customising this content to sub-bots of a bot, and so can be inherited (see the section on Inheritance).
Phrase lists are specified in a config file by data of the form:
^listname = [word "a phrase" another_phrase]
So very similar to categories. They cannot, however, be nested.
It’s often the case that many disparate bots share many characteristics, from some of the more basic substitutions, through the so-called "reduction" topics, up to full conversational topics. To help do this without copying code or requiring more memory, a bot can inherit the properties of another bot. To do this, put a line of the form
inherit "botpath"
near the top of your config file, for example
inherit "bots/rootbot"
The new bot will inherit its parents categories and functions, unless
they are overriden in the child. Topics are also inherited, but not
topic lists - you have to add the topic into the topic list by
name as usual, but if it already exists in the parent it will
not require loading.
The init function of a parent bot will run before that of the child
bot.
Substitutions are also inherited, but the system
needs to be told where they should run relative to the bot’s own
substitutions. To do this, add a subs parent
line into the lines
where you load your substitutions. For example:
subs "subs1.subs" subs parent subs "subs2.subs"
Quite often you’ll just have a subs parent
line by itself, since
most English substitutions should be in your "root" bot.
Bots can be nested to any level - if a category, topic or function does not exist, the system will go "up the family tree" to find it. Init functions will run so that the root init function runs first.
Skip blocks in a config file let the system ignore blocks of code under certain conditions. They have the syntax:
skipif condition ... endskip
or
skipif !condition ... endskip
to negate the condition.
Currently the only condition supported is extension <name>
, which
returns true if InstructionCompiler.addExtension()
has been called
with <name>
. This is done when a new set of action language commands is added,
as described in this document. A typical use is
to provide action language "stub" functions for functions which don’t
exist when an extension is not loaded, as shown in the example in
the example config file.
If you create many instances of the same bot, each instance will have its own
copy of the instance variables set up in the init
block. Often, they’ll be identical
and never change. This is wasteful. Even worse, some systems will often delete and re-instantiate
bots — the Minecraft Citizens2 plugin does this whenever the instance’s chunk is unloaded
and reloaded. This causes a lot of activity if you have a lot of instance variables, or
very large instance variables - consider the materials lists we discussed in Categories
above.
To avoid this, we can create "bot-global" variables using a global
block.
Here, the code runs exactly as it does for an init
block, but it runs on a "dummy" instance
which is not visible to the user. Whenever we request an instance variable, and it cannot
be found in the instance, we look in the dummy instance. If it cannot be found there,
we look in the dummy instance created for the parent, and so on up the inheritance tree.
The upshot is that
-
bot-global variables look exactly like instance variables
-
there is only one copy of them for each loaded bot, not separate copies for each instance of that bot
-
they can be inherited
-
writing to a bot-global variable will create a new instance variable visible only to the instance, it will not change the variable for other instances
This last point is important. Reading bot-global variables will get the shared data; writing to them will cause a new instance variable to be created which will override the shared data in the instance which did the writing. If you really want to have writable global data, you can do this by having a bot-global map, and storing your data there: since it’s always the same map and we never write to the variable holding the map, stored data in the map will always be global.