This package is an adaptation of ReVerb to the French language. It was created at RALI (Université de Montréal) by Philippe Langlais and his team.
The original ReVerb (on GitHub) was developed by the following people at the University of Washington's Turing Center as part of the KnowItAll Project:
- Anthony Fader http://www.cs.washington.edu/homes/afader
- Michael Schmitz http://www.schmitztech.com/
- Robert Bart (rbart at cs.washington.edu)
- Janara Christensen http://www.cs.washington.edu/homes/janara
- Niranjan Balasubramanian http://www.cs.washington.edu/homes/niranjan
- Jonathan Berant http://www.cs.tau.ac.il/~jonatha6
It is worth noting that the French version does not provide a confidence function for the triples produced.
By default, this package works exactly like the English version of ReVerb, which is documented in the original GitHub repository, and therefore produces English triples from English text.
However, additions have been made for users who wish to produce French triples from French text.
Start by building ReVerb from source using Apache Maven (http://maven.apache.org).
Go to the core directory and run the following command to download the required dependencies, compile, and create a single executable jar file:
mvn clean compile assembly:single
If all goes well, the file target/reverb-core-fr-1.0.0-jar-with-dependencies.jar is produced. You can then run the command-line utility for French extraction by running the script ./reverb-fr and piping in input sentences, one per line.
For instance,
echo "La mairesse a été élue en 2017." | ./reverb-fr
will produce
Arg1=La mairesse
Rel=a été élue en
Arg2=2017
RelLemma=avoir être élire en
RelCanon=être élire en
The output contains Arg1, the relation (Rel) and Arg2, followed by a lemmatized version of the relation (RelLemma) and a simplified version (RelCanon). The simplified version results from applying a few heuristics to the relation in order to provide a canonical representation. This reduces the number of distinct relations when ReVerb is applied to large amounts of text, which in turn makes it easier to collate triples for various purposes.
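To illustrate how the canonical relation helps with collating, the output can be read back into a small data structure and counted per RelCanon value. The following Java sketch assumes that each extraction is printed as consecutive Key=value lines exactly as in the example above, and that every extraction begins with its Arg1 line. The class and method names (ReverbOutputParser, parse, countByCanonicalRelation) are hypothetical and are not part of this package.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of this package: parses the Key=value
// lines printed by ./reverb-fr, assuming the layout shown in the example.
class ReverbOutputParser {

    // Returns one map per extraction, e.g. {Arg1=..., Rel=..., Arg2=..., ...}.
    static List<Map<String, String>> parse(List<String> lines) {
        List<Map<String, String>> extractions = new ArrayList<>();
        Map<String, String> current = null;
        for (String line : lines) {
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue; // skip blank or unrecognized lines
            }
            String key = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            // Assumption: every extraction starts with its Arg1 line.
            if (key.equals("Arg1") || current == null) {
                current = new LinkedHashMap<>();
                extractions.add(current);
            }
            current.put(key, value);
        }
        return extractions;
    }

    // Counts extractions per canonical relation (the RelCanon field).
    static Map<String, Integer> countByCanonicalRelation(
            List<Map<String, String>> extractions) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map<String, String> extraction : extractions) {
            String canon = extraction.get("RelCanon");
            if (canon != null) {
                counts.merge(canon, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Read the extractor's output from standard input as UTF-8.
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        List<String> lines = in.lines().collect(Collectors.toList());
        countByCanonicalRelation(parse(lines))
                .forEach((rel, n) -> System.out.println(n + "\t" + rel));
    }
}
```

Saved as ReverbOutputParser.java, the sketch could be fed the extractor's output on standard input; it then prints one line per canonical relation with its count.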
To process multiple sentences, redirect a file containing one sentence per line to the program, e.g.
./reverb-fr < input-file
The program uses the UTF-8 encoding.
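Since the program reads one sentence per line in UTF-8, an input file prepared from other data needs to respect both constraints. The Java sketch below is one way to do this. It assumes that the sentences are already segmented, and the class ReverbInputWriter and its methods are hypothetical, not part of this package.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of this package: prepares an input file
// for ./reverb-fr with one sentence per line, encoded in UTF-8.
class ReverbInputWriter {

    // Collapses runs of ASCII whitespace (including line breaks) into a
    // single space so that each sentence fits on one line, and drops
    // sentences that end up empty.
    static List<String> toInputLines(List<String> sentences) {
        List<String> lines = new ArrayList<>();
        for (String sentence : sentences) {
            String oneLine = sentence.replaceAll("\\s+", " ").trim();
            if (!oneLine.isEmpty()) {
                lines.add(oneLine);
            }
        }
        return lines;
    }

    // Writes the cleaned sentences to the given file, explicitly in UTF-8.
    static void write(Path file, List<String> sentences) throws IOException {
        Files.write(file, toInputLines(sentences), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: ReverbInputWriter <input-file> <sentence>...");
            return;
        }
        // First argument: output path; remaining arguments: sentences.
        write(Paths.get(args[0]), Arrays.asList(args).subList(1, args.length));
    }
}
```

The resulting file can then be passed to the script with ./reverb-fr < input-file.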
See the class ca.umontreal.rali.reverbfr.FrenchReVerbApplication for the source code of the program mentioned above, as well as an example of how to use this package. Make sure that the instruction
ReverbConfiguration.setLocale(Locale.FRENCH);
precedes all extraction logic. This switches ReVerb from English to French. Without it, ReVerb will expect English. Switching back to English within the same program does not work.
For more information, please visit the ReVerb homepage at the University of Washington: http://reverb.cs.washington.edu.
For questions about the French adaptation, please contact Philippe Langlais.
The licenses for this package explicitly forbid commercial use.
If you use ReVerb in your academic work, please cite ReVerb with the following BibTeX citation:
@inproceedings{ReVerb2011,
author = {Anthony Fader and Stephen Soderland and Oren Etzioni},
title = {Identifying Relations for Open Information Extraction},
booktitle = {Proceedings of the Conference on Empirical Methods
in Natural Language Processing ({EMNLP} '11)},
year = {2011},
month = {July 27-31},
address = {Edinburgh, Scotland, UK}
}
If you use the French adaptation, please also cite:
@article {GottiLanglaisReverb2016,
title = {From French Wikipedia to Erudit: A test case for cross-domain open information extraction},
journal = {Computational Intelligence},
year = {2016},
keywords = {Entity classification, Named entities, Natural language processing, Open information extraction},
doi = {10.1111/coin.12120},
url = {https://onlinelibrary.wiley.com/doi/abs/10.1111/coin.12120},
pdf = {http://rali.iro.umontreal.ca/rali/sites/default/files/publis/coin.12120.pdf},
author = {Fabrizio Gotti and Philippe Langlais}
}
The embedded French lemmatizer uses OpenNLP and the work of Nicolas Hernandez. See his website for more information.