dig-team / amie

Mavenized AMIE+Typing

OutOfMemoryError

davidshumway opened this issue · comments

Is there any way to increase the available memory space within the program? The TSV file size here is 340MB.

$ java -jar amie.jar -const -nc 2 -verbose sample.tsv 
Using the default schema relations
Assuming 9 as type relation
Loading files... 
  Starting sample.tsv

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-0"

Hello,

How much memory do you have? How big is your file? You could try with -XX:-UseGCOverheadLimit -Xmx[AVAILABLE_MEMORY], e.g., AVAILABLE_MEMORY=16G

Best,
Luis

So just for example that would be:

$ java -XX:-UseGCOverheadLimit -Xmx16G  -jar amie.jar -const -nc 2 -verbose sample.tsv


The file size is a little over 1GB.

Still seems to run into the same issue (Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-0")

Hi David,

Well, it seems your input KG is just too complex for a system with 16GB of RAM (is that the actual amount of RAM in your system?). How many triples do you have? How many different predicates are there? These two factors define how much resources are needed.
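To answer those two questions for a tab-separated input file, standard Unix tools are enough. A sketch (here `demo.tsv` is a stand-in for your actual file, created inline so the commands are self-contained; AMIE's TSV format is subject<TAB>predicate<TAB>object):

```shell
# Stand-in input: three triples, two distinct predicates
printf 'a\tp1\tb\nb\tp2\tc\nc\tp1\td\n' > demo.tsv

# Number of triples = number of lines
wc -l < demo.tsv

# Number of distinct predicates = unique values in the second column
cut -f2 demo.tsv | sort -u | wc -l
```

On the demo file these print 3 and 2, respectively.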

Cheers,
Luis

Hi @lagarra,

Recently, I ran into a similar issue to the one @davidshumway mentioned above. I have 18863779 triples with 9876686 different entities and 550 different predicates. Here is my command:

$ java -XX:-UseGCOverheadLimit -Xmx32G -jar amie-milestone-intKB.jar all_triples.txt --bias lazy --minpca 0.8 --htr similarTo --maxad 5 --full > mined_rules.txt

where all_triples.txt is my input file, mined_rules.txt is my expected output file, and similarTo is my target rule head. However, it hit some exceptions after running for nearly a day (24 hours). Here are the results:

Assuming 9 as type relation
Loading files... 
  Starting all_triples
  Finished all_triples, still running: 0
Loaded 18863779 facts in 1 min, 38 s using -696 MB
Using HeadCoverage as pruning metric with minimum threshold 0.01
Using recursivity limit 3
Using the FULL configuration.
Enabling functionality heuristic with ratio for pruning of low confident rules
Building overlap tables for confidence approximation...... Overlap tables computed in 1 min, 8 s using 16 threads.
Lazy mining assistant that stops counting when the denominator gets too high
No minimum threshold on standard confidence
Filtering on PCA confidence with minimum threshold 0.1
Constants in the arguments of relations are disabled
Lossless (query refinement) heuristics enabled
MRT calls: 0
Starting the mining phase... Using 16 threads
Rule	Head Coverage	Std Confidence	PCA Confidence	Positive Examples	Body size	PCA Body size	Functional variable
Using the default schema relations
java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at amie.mining.assistant.MiningAssistant.applyMiningOperators(MiningAssistant.java:1389)
        at amie.mining.AMIE$RDFMinerJob.run(AMIE.java:422)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at it.unimi.dsi.fastutil.ints.Int2IntOpenHashMap.rehash(Int2IntOpenHashMap.java:1128)
        at it.unimi.dsi.fastutil.ints.Int2IntOpenHashMap.insert(Int2IntOpenHashMap.java:267)
        at it.unimi.dsi.fastutil.ints.Int2IntOpenHashMap.put(Int2IntOpenHashMap.java:275)
        at amie.data.U.increase(U.java:58)
        at amie.data.U.increase(U.java:47)
        at amie.data.KB.countBindings(KB.java:2899)
        at amie.data.KB.countProjectionBindings(KB.java:3070)
        at amie.mining.assistant.DefaultMiningAssistant.getClosingAtoms(DefaultMiningAssistant.java:173)
        ... 7 more
Exception in thread "Thread-23" java.lang.NullPointerException
        at amie.mining.AMIE$RDFMinerJob.run(AMIE.java:434)
        at java.base/java.lang.Thread.run(Thread.java:829)

Would you have any suggestions? I also noticed that AMIE3 can mine rules on Wikidata 2019 (a really big dataset) in nearly 16 hours according to Table 3 of the AMIE3 paper; could you provide the corresponding bash script as a reference? Furthermore, considering that we only use a specific predicate as the rule head, maybe we can obtain rules in less time? Thanks in advance!

Hi @nxznm,

If you need the bash script, @lajus is the right person to ask.

> Furthermore, considering that we only use a specific predicate as the rule head, maybe we can obtain rules in less time?

That is probably the best you can do. You can use the --htr "relation1,relation2,..." argument to limit your search space to rules that predict a subset of the relations. That will certainly reduce your memory consumption.

I also realized that you are aiming for very long rules (up to 5 atoms). Is it a hard constraint for you? Could you for example check whether mining shorter rules is viable?
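Combining both suggestions, the invocation might look like the sketch below. The relation names after --htr are placeholders for your own predicates, and the file names are taken from your earlier command; adjust everything to your setup:

```shell
# Sketch: restrict the rule heads to a few target relations (--htr)
# and cap the rule length at 3 atoms (--maxad) to shrink the search space.
java -XX:-UseGCOverheadLimit -Xmx32G -jar amie-milestone-intKB.jar \
     all_triples.txt --bias lazy --minpca 0.8 \
     --htr "similarTo,anotherRelation" --maxad 3 --full > mined_rules.txt
```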

Cheers,
Luis

Thanks for your valuable suggestions! I will follow your advice and check whether fewer atoms (--maxad) help.
And sorry to bother you @lajus, but could you please provide the bash scripts used for some large datasets like DBpedia 3.8 or Wikidata 2019 in AMIE3 as a reference? I would appreciate your help!