SANTO

System to gather data for ontology-driven Slot filling tasks with a web-based ANnotation TOol.

Description

Supervised machine learning algorithms require training data whose generation for complex relation extraction tasks tends to be difficult. Because they are optimized for relation extraction at the sentence level, many annotation tools offer little support for annotating relational structures that are spread widely across the text. This leads to non-intuitive and cumbersome visualizations, making the annotation process unnecessarily time-consuming. We propose SANTO, an easy-to-use, domain-adaptive annotation tool specialized for complex slot filling tasks which may involve problems of cardinality and referential grounding. The web-based architecture enables fast and clearly structured annotation for multiple users in parallel. Relational structures are formulated as templates following the conceptualization of an underlying ontology. Further, import and export procedures for standard formats enable interoperability with external sources and tools.

Citation

If you use this project, please cite:

Hartung M, ter Horst H, Grimm F, Diekmann T, Klinger R, Cimiano P (In Press) In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (System Demonstrations). Association for Computational Linguistics. available at https://pub.uni-bielefeld.de/publication/2919923

Online Demo

A demo instance that is routinely reset is available at http://psink.techfak.uni-bielefeld.de/santo/

Requirements

Apache >= 2.4
PHP 5.6
MySQL / MariaDB 5.5

Installation

System setup

Clone this repository into your web root.
```
git clone https://github.com/ag-sc/SANTO.git
```
Alternatively, you can download a zipped version of the repository and unzip it in your web server's root directory.
```
wget https://github.com/ag-sc/SANTO/archive/master.zip
unzip master.zip -d SANTO/
```
Create a MySQL database and user for the project.
```
mysql -u<user> -p
```
```
CREATE DATABASE anno;
```
Make sure to create a user with access privileges to the newly created database. Test your login at https://serveruri/
Import schema.sql into the database you created in the previous step.
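One way to create such a user is sketched below. The account name and password are the placeholder values used in config/annodb.config, localhost as the client host is an assumption, and user_sql is a hypothetical helper name. The function only prints the statements so that they can be reviewed first.

```shell
# Placeholder values; adjust them to your installation.
DB_NAME="anno"
DB_USER="anno_username"
DB_PASS="anno_password"

# Print the SQL that creates the user and grants access to the database.
user_sql() {
  cat <<SQL
CREATE USER '${DB_USER}'@'localhost' IDENTIFIED BY '${DB_PASS}';
GRANT ALL PRIVILEGES ON ${DB_NAME}.* TO '${DB_USER}'@'localhost';
SQL
}

user_sql
```

To apply the statements, the output can be piped into the client as an administrative user, for example `user_sql | mysql -u root -p`.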
```
mysql -u<user> -p <database name> < schema.sql
```
Adjust config/annodb.config and provide connection details (hostname, username, password, schema (= database name)) for your installation.
```
[database]
host=localhost
user=anno_username
password=anno_password
schema=anno
```
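To check that the values in the configuration file actually allow a login, they can be read back from the file and handed to the mysql client. The sketch below assumes the plain key=value layout shown above; ini_get is a hypothetical helper, not part of SANTO, and it is demonstrated on a temporary copy of the example configuration.

```shell
# Print the value of KEY from an INI-style file (first match wins).
# Usage: ini_get FILE KEY
ini_get() {
  awk -v key="$2" '
    index($0, key "=") == 1 { print substr($0, length(key) + 2); exit }
  ' "$1"
}

# Demonstrate on a throwaway copy of the example configuration.
tmp_config=$(mktemp)
printf '%s\n' '[database]' 'host=localhost' 'user=anno_username' \
  'password=anno_password' 'schema=anno' > "$tmp_config"
ini_get "$tmp_config" schema   # prints: anno
rm -f "$tmp_config"
```

With the real file, a login test could then look like `mysql -h "$(ini_get config/annodb.config host)" -u "$(ini_get config/annodb.config user)" -p "$(ini_get config/annodb.config schema)" -e 'SELECT 1;'`.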

Adding users and ontology data

In the repository root, create an admin user (replace admin with a username and secret with a password):
```
php php/cli_createuser.php admin secret 1
```
The "1" indicates that the created user should be granted curator privileges.
For the slot filling functionality to be more useful, you need to specify which groups of ontology classes should be visible on the right-hand side. To do so, create a file called groups.csv (tab-delimited, see examples/groups.csv) in which each line specifies the (internal) group name, the heading of the section in the UI, the name of the class, and a numeric order (1-N) in which the sections will be rendered.
```
Person	Persons	Person	1
Country	Countries	Country	2
```
The above would generate two sections, one for Person objects and one for Country objects, each headed by the respective plural form.
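Because tabs are easily lost when copying text between editors, the file can also be generated with printf. This is a minimal sketch, assuming the Person and Country groups described above; write_groups is a hypothetical helper name and the output path is passed as an argument.

```shell
# Write a tab-delimited groups file to the path given as $1.
# Columns: group name, UI heading, class name, render order.
write_groups() {
  printf '%s\t%s\t%s\t%s\n' \
    Person Persons Person 1 \
    Country Countries Country 2 > "$1"
}

write_groups groups.csv
```

printf reuses the format string for every four arguments, so each group ends up on its own line.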
Upload your ontology descriptor files (see the examples/ folder) at https://serveruri/Upload.html. The descriptor files contain tab-separated fields:
- examples/dbpedia_2014_classes.csv: Classname, "true"/"false", Description, where the boolean indicates whether or not the class describes individual names
- examples/dbpedia_2014_subclasses.csv: Superclass, Subclass
- examples/dbpedia_2014_relations.csv: domainClass, relation, rangeClass, from ("1", "m"), to ("1", "m"), isDataTypeProperty ("true", "false"), mergedName, description
- groups.csv: the groups file described above.
Several notes here:
- Please be aware that uploading drops all existing data. We will likely provide a more convenient upgrade mechanism in the future.
- Since there is quite a mismatch between ontologies and what annotators are actually required to use, you currently have to provide those files manually. We are working on an add-on that lets you pick and choose which parts of an ontology file you want to import.
The configuration also lets you set URI prefixes for the triples in the RDF and annotation export files (examples are for DBpedia URIs).
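Since an upload replaces all existing data, it can be worth checking the descriptor files before uploading them. The sketch below verifies only the number of tab-separated fields per line, using the field counts from the file descriptions above. check_fields is a hypothetical helper, and the check assumes the files contain no header or empty lines.

```shell
# Fail if any line of FILE does not have exactly N tab-separated fields.
# Usage: check_fields FILE N
check_fields() {
  awk -F'\t' -v n="$2" '
    NF != n {
      printf "%s:%d: expected %d fields, found %d\n", FILENAME, NR, n, NF
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}

# Field counts taken from the descriptions above.
for spec in \
  "examples/dbpedia_2014_classes.csv 3" \
  "examples/dbpedia_2014_subclasses.csv 2" \
  "examples/dbpedia_2014_relations.csv 8" \
  "groups.csv 4"
do
  file=${spec% *}
  count=${spec##* }
  if [ -f "$file" ]; then
    check_fields "$file" "$count" || echo "fix $file before uploading"
  fi
done
```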

Adding your (pre-annotated) corpus

Import a zipped dataset (tokenized publication + annotations). The bulk import script will automatically assign users to their respective publications. Filenames follow the scheme PublicationName_username.extension, where extension is csv for tokenizations and annodb for pre-existing annotations (see example files).
```
php php/cli_import.php /path/to/importfile.zip
```
The import script automatically maps tokenized publications and their annotations to existing users by following the naming convention: <Publication Name>_<Username>.<Extension>. Make sure to add users beforehand (see "Adding users and ontology data" above). Note that existing annotations must match class names as defined in the ontology.
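File names that do not follow this convention cannot be mapped to a user, so a quick check before zipping may save a failed import. The following is a sketch under two assumptions: the username is the text after the last underscore, and the files to import sit in a directory called corpus/. Both helper names are hypothetical.

```shell
# Succeed if the file name has the form <Publication Name>_<Username>.<Extension>
# with extension csv or annodb.
valid_import_name() {
  case "$(basename "$1")" in
    ?*_?*.csv | ?*_?*.annodb) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the username part, assuming it is the text after the last underscore.
import_username() {
  name=$(basename "$1")
  name=${name%.*}
  printf '%s\n' "${name##*_}"
}

# Report files in a (hypothetical) corpus/ directory that would not be mapped.
for f in corpus/*; do
  [ -e "$f" ] || continue
  valid_import_name "$f" || echo "unexpected name: $f"
done
```

The names printed by import_username can then be compared against the users created with php/cli_createuser.php.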

Acknowledgements

This work has been funded by the Federal Ministry of Education and Research (BMBF, Germany) in the PSINK project (grant 031L0028A).

License

See the LICENSE file in the root directory of this repository.
