fanfannothing / ProPPR

Graph-algorithm inferences over local groundings of first-order logic programs

ProPPR: PROGRAMMING WITH PERSONALIZED PAGERANK
==============================================
This is a Java package for using graph walk algorithms to perform inference tasks over local groundings of first-order logic programs. The package uses parallelization to substantially speed up processing, making it practical even for large databases.

Contents:
1. Build
2. Run
   2.0. Overview of Java main() classes
   2.1. Experiment pipeline
        2.1.0. Experiment: full pipeline
        2.1.1. ExampleCooker: Grounding or "Cooking"
        2.1.2. Trainer: SGD learning
        2.1.3. Tester: Evaluating results
   2.2. Utilities
        2.2.0. QueryAnswerer: Answering queries against a program
        2.2.1. Prompt: Interact with ProPPR machinery in a shell
3. File Formats


1. BUILD
========

ProPPR $ ant clean build


2. RUN
======

For all run phases, control logging output using conf/log4j.properties.


2.0. RUN: JAVA MAIN CLASSES
===========================

edu.cmu.ml.praprolog.Experiment - Full pipeline: ground, train, test.

edu.cmu.ml.praprolog.ExampleCooker - for grounding (used by ground.sh)

edu.cmu.ml.praprolog.trove.Trainer - single-threaded parameter-learning over labeled examples
edu.cmu.ml.praprolog.trove.MultithreadedTrainer - multithreaded parameter-learning
    - if the trove *Trainer classes give you trouble, there are less memory-efficient but potentially easier-to-debug versions that use string keys:
      edu.cmu.ml.praprolog.Trainer
      edu.cmu.ml.praprolog.MultithreadedTrainer

edu.cmu.ml.praprolog.Tester - single-threaded evaluation of trained parameters over labeled examples

edu.cmu.ml.praprolog.prove.Prompt - interactive prompt for exploring state trees in grounding proofs - good for debugging rulefiles

edu.cmu.ml.praprolog.QueryAnswerer - compute query solutions, using trained or untrained parameters


2.1. RUN: EXPERIMENT PIPELINE
=============================

2.1.0. RUN: EXPERIMENT PIPELINE: EXPERIMENT
===========================================

Use the file formats below to create files that specify your program's
rules and fact DBs, and optionally a graph. Create training and testing data
files of positive and negative examples for queries on the
program. Details on file format are below. Place these files in their
own directory.

ProPPR $ ls my_program
facts.facts	graph.graph	rules.rules train.data test.data

Use the provided script 'compile.sh' to convert these files into 
ProPPR-readable formats:

ProPPR $ scripts/compile.sh my_program
compiling my_program/rules.rules to my_program/rules.crules
parsing my_program/rules.rules
Converting my_program/facts.facts

Now you can run the full pipeline on the resulting files. The compile script
generates the --programFiles argument for you:

ProPPR $ export PROGRAMFILES=`cat my_program/programFiles.arg` 
ProPPR $ java -cp .:bin/:lib/*:conf/ edu.cmu.ml.praprolog.Experiment \
--programFiles ${PROGRAMFILES%:} \
--train my_program/toytrain.data --output my_program/train.cooked \
--test my_program/toytest.data --params my_program/params.wts
 INFO [Component] Loading from file 'my_program/rules.crules' with alpha=0.0 ...
 INFO [Component] Loading from file 'my_program/facts.cfacts' with alpha=0.0 ...
 INFO [Component] Loading from file 'my_program/graph.graph' with alpha=0.0 ...
 INFO [Experiment] Cooking training examples from my_program/toytrain.data...
 INFO [Experiment] Training model parameters...
 INFO [Trainer] Importing cooked examples from my_program/train.cooked
 INFO [Trainer] Imported 11 examples
 INFO [Trainer] Training on cooked examples...
 INFO [Trainer] epoch 1 ...
 INFO [Trainer] epoch 2 ...
 INFO [Trainer] epoch 3 ...
 INFO [Trainer] epoch 4 ...
 INFO [Trainer] epoch 5 ...
 INFO [Trainer] epoch 6 ...
 INFO [Trainer] epoch 7 ...
 INFO [Trainer] epoch 8 ...
 INFO [Trainer] epoch 9 ...
 INFO [Trainer] epoch 10 ...
 INFO [Trainer] epoch 11 ...
 INFO [Trainer] epoch 12 ...
 INFO [Trainer] epoch 13 ...
 INFO [Trainer] epoch 14 ...
 INFO [Trainer] epoch 15 ...
 INFO [Trainer] epoch 16 ...
 INFO [Trainer] epoch 17 ...
 INFO [Trainer] epoch 18 ...
 INFO [Trainer] epoch 19 ...
 INFO [Trainer] epoch 20 ...
 INFO [Trainer] Finished in 1816 ms
 INFO [Experiment] Saving parameters to my_program/params.wts...
 INFO [Experiment] Testing on my_program/toytest.data...
 INFO [Tester] pairTotal 7.0 pairErrors 0.0 errorRate 0.0 map 1.0
result= running time 2836
result= pairs 7.0 errors 0.0 errorRate 0.0 map 1.0

To see the full list of options (prover type, number of threads, loss
output, etc.), run Experiment with no arguments.
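
The --programFiles value in the command above is a colon-separated list of
compiled files, and ${PROGRAMFILES%:} drops one trailing colon from the
contents of programFiles.arg. The Java sketch below mirrors that handling for
anyone building the argument programmatically. The class ProgramFilesArg is
hypothetical and not part of ProPPR, and the trailing colon in
programFiles.arg is an assumption inferred from the shell expansion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ProPPR.
final class ProgramFilesArg {
    /**
     * Mirrors the shell expansion ${PROGRAMFILES%:}: drops one trailing
     * colon, if present. trim() stands in for the trailing-newline
     * stripping that the backtick substitution performs.
     */
    static String stripTrailingColon(String raw) {
        String s = raw.trim();
        return s.endsWith(":") ? s.substring(0, s.length() - 1) : s;
    }

    /** Splits the cleaned argument into individual file paths. */
    static List<String> split(String raw) {
        List<String> paths = new ArrayList<>();
        for (String part : stripTrailingColon(raw).split(":")) {
            if (!part.isEmpty()) {
                paths.add(part);
            }
        }
        return paths;
    }
}
```

For example, ProgramFilesArg.split applied to the three-file list shown in
section 2.1.1 yields the rules, facts, and graph paths in order.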


2.1.1. RUN: EXPERIMENT PIPELINE: EXAMPLECOOKER
==============================================

Use the provided script 'ground.sh'.

Use the file formats below to create files that specify your program's
rules and fact DBs, and optionally a graph. Create a training data
file of positive and negative examples for queries on the
program. Details on file format are below. Place these files in their
own directory.

ProPPR $ ls my_program
facts.facts	graph.graph	rules.rules train.data

Run the grounding script on the directory, and specify the training data file explicitly:

ProPPR $ scripts/ground.sh my_program my_program/train.data
--- Compiling my_program first:
scripts/compile.sh my_program
compiling my_program/rules.rules to my_program/rules.crules
parsing my_program/rules.rules
Converting my_program/facts.facts
--- Cooking my_program/train.data:
java -cp .:bin/:lib/*:conf/ edu.cmu.ml.praprolog.ExampleCooker --programFiles my_program/rules.crules:my_program/facts.cfacts:my_program/graph.graph --data my_program/train.data --output my_program/train.cooked
 INFO [Component] Loading from file 'my_program/rules.crules'...
 INFO [Component] Loading from file 'my_program/facts.cfacts'...
 INFO [Component] Loading from file 'my_program/graph.graph'...
171
Done.
--- Done.
ProPPR $ ls my_program/
programFiles.arg  rules.crules  rules.rules  facts.cfacts  facts.facts  train.cooked  train.data  graph.graph


You can pass additional arguments to the cooker if you wish to
specify prover or multithreading settings:

	[--prover { ppr | dpr[:eps] | tr[:depth] }]
		(default prover is ppr)
		(default dpr epsilon=0.0001)
		(default tr depth=5)
	[--threads NUMBER]
		use multithreading (default implementation is single-threaded)

ProPPR $ scripts/ground.sh my_program my_program/toytrain.data --prover dpr:0.00001 --threads 3
--- Compiling my_program first:
scripts/compile.sh my_program
compiling my_program/textcat.rules to my_program/textcat.crules
parsing my_program/textcat.rules
Converting my_program/toylabels.facts
--- Cooking my_program/toytrain.data:
java -cp .:bin/:lib/*:conf/ edu.cmu.ml.praprolog.ExampleCooker --programFiles my_program/textcat.crules:my_program/toylabels.cfacts:my_program/toywords.graph --data my_program/toytrain.data --output my_program/toytrain.cooked --prover dpr:0.00001 --threads 3
 INFO [Component] Loading from file 'my_program/textcat.crules'...
 INFO [Component] Loading from file 'my_program/toylabels.cfacts'...
 INFO [Component] Loading from file 'my_program/toywords.graph'...
1124
Done.
--- Done.
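
The --prover syntax listed above (a name, optionally followed by ':' and a
numeric argument) can be read as in the following Java sketch. The class
ProverSpec is hypothetical and is not ProPPR's own option parser; only the
prover names and the default values (dpr epsilon 0.0001, tr depth 5) come
from the option listing above.

```java
// Hypothetical illustration of the "--prover" value syntax; not ProPPR code.
final class ProverSpec {
    final String name;    // "ppr", "dpr", or "tr"
    final double epsilon; // used by dpr only; NaN otherwise
    final int depth;      // used by tr only; -1 otherwise

    private ProverSpec(String name, double epsilon, int depth) {
        this.name = name;
        this.epsilon = epsilon;
        this.depth = depth;
    }

    /** Parses "ppr", "dpr", "dpr:EPS", "tr", or "tr:DEPTH". */
    static ProverSpec parse(String spec) {
        String[] parts = spec.split(":", 2);
        boolean hasArg = parts.length == 2;
        switch (parts[0]) {
            case "ppr":
                if (hasArg) {
                    throw new IllegalArgumentException("ppr takes no argument: " + spec);
                }
                return new ProverSpec("ppr", Double.NaN, -1);
            case "dpr":
                // Default epsilon taken from the option listing.
                return new ProverSpec("dpr", hasArg ? Double.parseDouble(parts[1]) : 0.0001, -1);
            case "tr":
                // Default depth taken from the option listing.
                return new ProverSpec("tr", Double.NaN, hasArg ? Integer.parseInt(parts[1]) : 5);
            default:
                throw new IllegalArgumentException("unknown prover: " + spec);
        }
    }
}
```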


To reset your project directory to just the original rule, fact,
graph, and data files, use the script:

ProPPR $ scripts/clean.sh my_program/
ProPPR $ ls my_program
facts.facts	graph.graph	rules.rules train.data


2.1.2. RUN: EXPERIMENT PIPELINE: TRAINER
========================================

Given a cooked file we can learn the parameter weights that best
support the training examples:

ProPPR $ java edu.cmu.ml.praprolog.trove.Trainer my_program/toytrain.cooked my_program/toytrain.params

You can specify additional options if you like:
    --epochs {int}	Number of epochs (default 5)
    --traceLosses	Turn on traceLosses (default off)
		             	NB: example count for losses is sum(x.length() for x in examples)
		             	and won't match `wc -l cookedExampleFile`


For multithreaded learning, use:

ProPPR $ java edu.cmu.ml.praprolog.trove.MultithreadedTrainer my_program/toytrain.cooked my_program/toytrain.params

With the same options as above, plus:
    --threads {int}  Number of threads (default 4)
    --rr             Use round-robin scheduling (default: queue) (RECOMMENDED)


2.1.3. RUN: EXPERIMENT PIPELINE: TESTER
=======================================

Given a cooked file of testing examples and the model parameters output from 
Trainer, we can compute the mean average precision (MAP) of the trained model
over the test data.

If you are running this step you should already have a compiled program 
my_program, likely cooked training data, and the trained parameters:

ProPPR $ ls my_program/
facts.cfacts  facts.facts  graph.graph  params.wts programFiles.arg  
rules.crules  rules.rules  test.data train.cooked  train.data

Run the tester:

ProPPR $ export PROGRAMFILES=`cat my_program/programFiles.arg` 
ProPPR $ java -cp .:bin/:lib/*:conf/ edu.cmu.ml.praprolog.Tester \
--programFiles ${PROGRAMFILES%:} \
--test my_program/test.data --params my_program/params.wts
 INFO [Tester] flags: 0x59
 INFO [Component] Loading from file 'my_program/rules.crules' with alpha=0.0 ...
 INFO [Component] Loading from file 'my_program/facts.cfacts' with alpha=0.0 ...
 INFO [Component] Loading from file 'my_program/graph.graph' with alpha=0.0 ...
 INFO [Tester] Testing on my_program/test.data...
 INFO [Tester] pairTotal 7.0 pairErrors 0.0 errorRate 0.0 map 1.0
result= running time 113
result= pairs 7.0 errors 0.0 errorRate 0.0 map 1.0
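
The final "result=" line of the transcript above lists its metrics as
alternating name and value tokens. If you want to collect those numbers
across runs, a small Java sketch such as the following may help. The class
TesterResultLine is hypothetical and not part of ProPPR, and the line layout
is assumed from the sample output only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for reading the metrics line; not part of ProPPR.
final class TesterResultLine {
    /**
     * Parses a line such as
     * "result= pairs 7.0 errors 0.0 errorRate 0.0 map 1.0"
     * into an ordered name -> value map. It does not handle the
     * "result= running time ..." line, whose label spans two words.
     */
    static Map<String, Double> parse(String line) {
        String body = line.trim();
        String prefix = "result=";
        if (body.startsWith(prefix)) {
            body = body.substring(prefix.length()).trim();
        }
        String[] tokens = body.split("\\s+");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException("expected name/value pairs: " + line);
        }
        Map<String, Double> metrics = new LinkedHashMap<>();
        for (int i = 0; i < tokens.length; i += 2) {
            metrics.put(tokens[i], Double.parseDouble(tokens[i + 1]));
        }
        return metrics;
    }
}
```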


2.2. RUN: UTILITIES
===================

2.2.0. RUN: UTILITIES: QUERYANSWERER
====================================

If you want to use a program to answer a series of queries, you can
use the QueryAnswerer class.  If you are running this step you should
already have a compiled program and a file containing a list of
queries, one per line.  Each query is a single goal.

ProPPR $ cat testcases/family.queries
sim(william,X)
sim(rachel,X)

ProPPR $ java edu.cmu.ml.praprolog.QueryAnswerer \
  --programFiles testcases/family.cfacts:testcases/family.crules \
  --queries testcases/family.queries --output answers.txt
 INFO [Component] Loading from file 'testcases/family.cfacts' with alpha=0.0 ...
 INFO [Component] Loading from file 'testcases/family.crules' with alpha=0.0 ...

ProPPR $ cat answers.txt
# proved	sim(william,-1)	47 msec
1	0.8838968104504825	-1=c[william]
2	0.035512510088781264	-1=c[lottie]
3	0.035512510088781264	-1=c[rachel]
4	0.035512510088781264	-1=c[sarah]
5	0.002391414820793351	-1=c[poppy]
6	0.0017935611155950133	-1=c[lucas]
7	0.0017935611155950133	-1=c[charlotte]
8	0.0017935611155950133	-1=c[caroline]
9	0.0017935611155950133	-1=c[elizabeth]
# proved	sim(rachel,-1)	18 msec
1	0.9094251636624519	-1=c[rachel]
2	0.0452874181687741	-1=c[caroline]
3	0.0452874181687741	-1=c[elizabeth]
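
The answers file shown above groups ranked solutions under a "# proved"
header for each query. A minimal Java sketch for reading it back follows.
The class AnswersFile is hypothetical and not part of ProPPR, and the sketch
assumes the fields are tab-separated as they appear in the sample.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader for the QueryAnswerer output; not part of ProPPR.
final class AnswersFile {
    static final class Answer {
        final int rank;
        final double score;
        final String binding; // e.g. "-1=c[william]", kept verbatim

        Answer(int rank, double score, String binding) {
            this.rank = rank;
            this.score = score;
            this.binding = binding;
        }
    }

    /** Groups answers under the query named in the preceding "# proved" line. */
    static Map<String, List<Answer>> parse(List<String> lines) {
        Map<String, List<Answer>> byQuery = new LinkedHashMap<>();
        List<Answer> current = null;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] fields = line.split("\t");
            if (line.startsWith("#")) {
                if (fields.length < 2) {
                    throw new IllegalArgumentException("malformed header: " + line);
                }
                current = new ArrayList<>();
                byQuery.put(fields[1], current);
            } else {
                if (current == null || fields.length != 3) {
                    throw new IllegalArgumentException("unexpected line: " + line);
                }
                current.add(new Answer(Integer.parseInt(fields[0]),
                        Double.parseDouble(fields[1]), fields[2]));
            }
        }
        return byQuery;
    }
}
```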


2.2.1. RUN: UTILITIES: PROMPT
=============================

An interactive prompt can be useful while debugging logic program issues, because you can examine a single query in detail. If you are running this step you should already have a compiled program.

Starting up the prompt:

"""
ProPPR $ java -cp conf/:bin/:lib/* edu.cmu.ml.praprolog.prove.Prompt --programFiles ${PROGRAMFILES%:}
Starting up beanshell...
prv set: edu.cmu.ml.praprolog.prove.TracingDfsProver@57fdc2d
 INFO [Component] Loading from file 'kbp_prototype/doc.crules' with alpha=0.0 ...
 INFO [Component] Loading from file 'kbp_prototype/kb.cfacts' with alpha=0.0 ...
 INFO [Component] Loading from file 'kbp_prototype/lp_predicate_SF_ENG_001-50doc.graph' with alpha=0.0 ...
lp set: edu.cmu.ml.praprolog.prove.LogicProgram@2225a091

Type 'help();' for help, 'quit();' to quit; 'list();' for a variable listing.

BeanShell 2.0b4 - by Pat Niemeyer (pat@pat.net)
bsh % 
"""

When it starts up, Prompt instantiates the logic program from the command line as 'lp', and a default prover which prints a depth-first-search-style proof of a query (default maximum depth is 5). You can specify a different prover on the command line if you wish. For information on built-in commands and interpreter syntax, type 'help();':

"""
bsh % help();
This is a beanshell, a command-line interpreter for java. A full beanshell manual is available at <http://www.beanshell.org/manual/contents.html>.

Type java statements and expressions at the prompt. Don't forget semicolons.

Type 'help();' for help, 'quit();' to quit; 'list();' for a variable listing.

'show();' will toggle automatic printing of the results of expressions. Otherwise you must use 'print( expr );' to see results.

'javap( x );' will list the fields and methods available on an object. Be warned; beanshell has trouble locating methods that are only defined on the superclass.

'[sol = ]run(prover,logicprogram,"functor(arg,arg,...,arg)")' will prove the associated state.

'pretty(sol)' will list solutions first, then intermediate states in descending weight order.

bsh %
"""
 

3. FILE FORMATS
===============

****** File format: *.rules

Example:
predict(X,Y) :- hasWord(X,W),isLabel(Y),related(W,Y)  #r.
related(W,Y) :- # w(W,Y).

Grammar:
    line= lhs ':-' rhs ('#' featureList)? '.'
    lhs= goal
    rhs=
      |= goal (',' goal)*
    featureList=
              |= goal (',' goal)*
    goal= functor
       |= functor '(' argList ')'
    argList= constantArgList
          |= variableArgList
          |= constantArgList ',' variableArgList
    constantArgList= constantArg (',' constantArg)*
    variableArgList= variableArg (',' variableArg)*
    constantArg= [a-z][a-zA-Z0-9]*
    variableArg= [A-Z][a-zA-Z0-9]*
    functor= [a-z][a-zA-Z0-9]*
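
The goal production above translates directly into a regular expression,
which can be handy for checking rule and fact files before compiling them.
The Java sketch below is one such check. The class GoalSyntax is
hypothetical and not part of ProPPR, and it follows the grammar exactly as
written, so it allows no whitespace inside a goal and requires constants to
precede variables.

```java
import java.util.regex.Pattern;

// Hypothetical syntax check built from the grammar above; not ProPPR code.
final class GoalSyntax {
    private static final String CONST = "[a-z][a-zA-Z0-9]*";
    private static final String VAR = "[A-Z][a-zA-Z0-9]*";
    private static final String CONSTS = CONST + "(?:," + CONST + ")*";
    private static final String VARS = VAR + "(?:," + VAR + ")*";
    // argList: constants then variables, or constants only, or variables only.
    private static final String ARGS =
            "(?:" + CONSTS + "," + VARS + "|" + CONSTS + "|" + VARS + ")";
    // goal: a functor, optionally followed by a parenthesized argList.
    // The functor pattern is the same as the constant pattern.
    private static final Pattern GOAL =
            Pattern.compile(CONST + "(?:\\(" + ARGS + "\\))?");

    /** Returns true if the whole string is a goal under the grammar. */
    static boolean isGoal(String text) {
        return GOAL.matcher(text).matches();
    }
}
```

Under this sketch, predict(X,Y), isLabel(pos), and sim(william,X) are
accepted, while predict(X,pos) is rejected because a variable precedes a
constant.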

****** File format: *.facts

Example:
isLabel(pos)
isLabel(neg)

Grammar:
    line= goal

****** File format: *.graph

Example:
hasWord bk      punk
hasWord bk      queen
hasWord bk      barbie
hasWord bk      and
hasWord bk      ken
hasWord rb      a
hasWord rb      little
hasWord rb      red
hasWord rb      bike
hasWord mv      a
hasWord mv      big
hasWord mv      7-seater
hasWord mv      minivan
hasWord mv      with
hasWord mv      an
hasWord mv      automatic
hasWord mv      transmission
hasWord hs      a
hasWord hs      big
hasWord hs      house
hasWord hs      in
hasWord hs      the
hasWord hs      suburbs
hasWord hs      with
hasWord hs      crushing
hasWord hs      mortgage

Grammar:
    line= edge '\t' sourcenode '\t' destnode
    edge= functor
    sourcenode,destnode= constantArg
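
As a rough illustration of the grammar above, the following Java sketch
reads graph lines into a nested map from edge label to source node to
destination nodes. The class GraphFileSketch is hypothetical and not part of
ProPPR, and it makes no claim about how ProPPR stores graphs internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader for *.graph lines; not part of ProPPR.
final class GraphFileSketch {
    /** Returns edge label -> source node -> destination nodes, in file order. */
    static Map<String, Map<String, List<String>>> parse(List<String> lines) {
        Map<String, Map<String, List<String>>> graph = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            // The grammar requires exactly three tab-separated fields.
            String[] fields = line.split("\t");
            if (fields.length != 3) {
                throw new IllegalArgumentException(
                        "expected edge, source, destination: " + line);
            }
            graph.computeIfAbsent(fields[0], k -> new LinkedHashMap<>())
                 .computeIfAbsent(fields[1], k -> new ArrayList<>())
                 .add(fields[2]);
        }
        return graph;
    }
}
```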

****** File format: *.data

Example:
predict(bk,Y)   -predict(bk,neg)        +predict(bk,pos)
predict(rb,Y)   -predict(rb,neg)        +predict(rb,pos)
predict(mv,Y)   +predict(mv,neg)        -predict(mv,pos)
predict(hs,Y)   +predict(hs,neg)        -predict(hs,pos)

Grammar:
    line= query '\t' exampleList
    query= goal
    exampleList= example ('\t' example)*
    example= positiveExample
          |= negativeExample
    positiveExample= '+' goal
    negativeExample= '-' goal
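
To make the grammar above concrete, the Java sketch below splits one data
line into its query and its positive and negative example goals. The class
DataLineSketch is hypothetical and not part of ProPPR. It leaves the goals
as plain strings rather than parsing them further.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for one *.data line; not part of ProPPR.
final class DataLineSketch {
    final String query;
    final List<String> positives = new ArrayList<>();
    final List<String> negatives = new ArrayList<>();

    private DataLineSketch(String query) {
        this.query = query;
    }

    /** Parses "query TAB (+goal | -goal) (TAB (+goal | -goal))*". */
    static DataLineSketch parse(String line) {
        String[] fields = line.split("\t");
        if (fields.length < 2) {
            throw new IllegalArgumentException(
                    "expected a query and at least one example: " + line);
        }
        DataLineSketch parsed = new DataLineSketch(fields[0]);
        for (int i = 1; i < fields.length; i++) {
            String example = fields[i];
            if (example.startsWith("+")) {
                parsed.positives.add(example.substring(1));
            } else if (example.startsWith("-")) {
                parsed.negatives.add(example.substring(1));
            } else {
                throw new IllegalArgumentException(
                        "example must start with '+' or '-': " + example);
            }
        }
        return parsed;
    }
}
```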
