c-amr / camr

Transition-based tree-to-graph AMR Parser

Preprocessing data

kexinliao opened this issue · comments

Hi there,

I got empty outputs after running the preprocessing command:
python amr_parsing.py -m preprocess [input_sentence_file]
(My input sentence file was a raw document with one sentence per line.)
This generated .tok, .prp and .charniak.parse.dep files, but all of them were empty.
Can anyone help with this issue?

Log info:
Start Stanford CoreNLP...
java -Xmx2500m -cp stanfordnlp/stanford-corenlp-full-2013-06-20/stanford-corenlp-3.2.0.jar:stanfordnlp/stanford-corenlp-full-2013-06-20/stanford-corenlp-3.2.0-models.jar:stanfordnlp/stanford-corenlp-full-2013-06-20/joda-time.jar:stanfordnlp/stanford-corenlp-full-2013-06-20/xom.jar:stanfordnlp/stanford-corenlp-full-2013-06-20/jollyday.jar edu.stanford.nlp.pipeline.StanfordCoreNLP -props stanfordnlp/default.properties
Loading Models: 0/4
Loading Models: 1/4
Loading Models: 2/4
Loading Models: 3/4
Loading Models: 4/4
Read token,lemma,name entity file test_input.txt.sent.prp...
Loading Charniak parser model: WSJ+Gigaword ...
Begin Charniak parsing ...
Convert Charniak parse tree to Stanford Dependency tree ...
Read dependency file test_input.txt.sent.tok.charniak.parse.dep...
Done preprocessing!

commented

Hi, try removing all the old empty files and re-running it. The parser caches its intermediate files and will not overwrite a file that already exists.
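
For anyone cleaning up by hand, here is a minimal sketch of that step. It assumes the intermediate files sit in the same directory as the input and are named with the input file name as a prefix, as the log above suggests (test_input.txt.sent.prp and so on). The helper names find_empty_outputs and remove_empty_outputs are made up for this example and are not part of the parser.

```python
import os


def find_empty_outputs(input_path):
    """Return zero-byte files next to input_path whose names start with its name plus a dot."""
    directory = os.path.dirname(os.path.abspath(input_path))
    prefix = os.path.basename(input_path) + "."  # assumed naming scheme, taken from the log
    empty = []
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        if name.startswith(prefix) and os.path.isfile(full) and os.path.getsize(full) == 0:
            empty.append(full)
    return empty


def remove_empty_outputs(input_path):
    """Delete the stale empty files and return the paths that were removed."""
    removed = find_empty_outputs(input_path)
    for path in removed:
        os.remove(path)
    return removed
```

Calling find_empty_outputs first shows what would be deleted; only zero-byte files are touched, so the input file and any non-empty outputs stay in place.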

So should the input file in the command be the original file name, or the
file name with the .sent extension?

On Thu, Jul 14, 2016 at 10:54 AM, IceIceRabbit notifications@github.com
wrote:

Make a copy of the sentence file and give it the same name with the .txt.sent
extension. It should work, provided your sentence file contains only sentences.


Your original file. Your .sent file should contain all the sentences extracted from the original file.
If the original file contains only sentences, you can just copy it and rename the copy with the .sent extension, from what I have understood.
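
The copy step can be sketched as below. It assumes the .sent file is simply the input file name with ".sent" appended, which matches the names in the log above (test_input.txt.sent.prp); make_sent_copy is a made-up helper name, not something the parser provides.

```python
import shutil


def make_sent_copy(sentence_file):
    """Copy sentence_file to sentence_file + '.sent' and return the new path."""
    target = sentence_file + ".sent"  # assumed naming scheme, taken from the log
    # copyfile overwrites an existing target, so a stale empty .sent file is replaced.
    shutil.copyfile(sentence_file, target)
    return target
```

For example, make_sent_copy("test_input.txt") would write test_input.txt.sent next to the original.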

@IceIceRabbit Thanks! Finally I'm able to preprocess the sentence file.
Now I got the following error when parsing the sentence file to amr.

Traceback (most recent call last):
  File "amr_parsing.py", line 439, in <module>
    main()
  File "amr_parsing.py", line 390, in main
    if args.section != 'all':
  File "/home/kexin/AMRParsing/model.py", line 352, in load_model
    model = pickle.load(f)
cPickle.UnpicklingError: invalid load key, 'B'.

The pre-trained model 'LDC2013E117.train.basic-abt-charniak.m' was downloaded from the link in the readme file.
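
A quick diagnostic for this kind of error is to look at the first bytes of the model file. "invalid load key, 'B'" means the file starts with the byte 'B', which is not what a binary pickle (protocol 2 or later) starts with. The sketch below only reports what the file looks like; whether the downloaded model was actually compressed or in an older format is not confirmed anywhere in this thread, and sniff_model_file is a made-up helper name.

```python
def sniff_model_file(path):
    """Guess a file's container format from its leading magic bytes."""
    with open(path, "rb") as f:
        head = f.read(3)
    if head.startswith(b"BZh"):
        return "bz2"      # bzip2 streams start with the bytes 'BZh'
    if head.startswith(b"\x1f\x8b"):
        return "gzip"     # gzip streams start with 0x1f 0x8b
    if head.startswith(b"\x80"):
        return "pickle"   # pickle protocol 2 and later start with the PROTO opcode 0x80
    # Protocol 0/1 pickles and everything else end up here.
    return "unknown"
```

If this reports a compressed format, decompressing the file before loading may be worth trying; if it reports "unknown", the file is probably not something pickle.load can read directly.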

Yeah, there seems to be a problem with the old model. I believe it was trained with the parser before its latest update. You can try training your own model; that should work.

@Juicechuan Hi Chuan,
Could you provide us with a pre-trained model that was trained on the updated parser?

commented

@kexinliao Hi, I've uploaded the new model (there is still a problem with the semeval model, but I should be able to upload it soon). Let me know if you have any questions or run into any problems.

@Juicechuan Thanks! It works now.