Preprocessing data

Question

kexinliao opened this issue 8 years ago

Hi there,

I got empty outputs after running the preprocessing command:
python amr_parsing.py -m preprocess [input_sentence_file]
(My input sentence file is a raw document with one sentence per line.)
This generated .tok, .prp and .charniak.parse.dep files, but all of them were empty.
Can anyone help with this issue?

Log info:
Start Stanford CoreNLP...
java -Xmx2500m -cp stanfordnlp/stanford-corenlp-full-2013-06-20/stanford-corenlp-3.2.0.jar:stanfordnlp/stanford-corenlp-full-2013-06-20/stanford-corenlp-3.2.0-models.jar:stanfordnlp/stanford-corenlp-full-2013-06-20/joda-time.jar:stanfordnlp/stanford-corenlp-full-2013-06-20/xom.jar:stanfordnlp/stanford-corenlp-full-2013-06-20/jollyday.jar edu.stanford.nlp.pipeline.StanfordCoreNLP -props stanfordnlp/default.properties
Loading Models: 0/4
Loading Models: 1/4
Loading Models: 2/4
Loading Models: 3/4
Loading Models: 4/4
Read token,lemma,name entity file test_input.txt.sent.prp...
Loading Charniak parser model: WSJ+Gigaword ...
Begin Charniak parsing ...
Convert Charniak parse tree to Stanford Dependency tree ...
Read dependency file test_input.txt.sent.tok.charniak.parse.dep...
Done preprocessing!

Chuan · Answer 1 · Fri Jul 15 2016 12:01:50 GMT+0800 (China Standard Time)

Hi, try removing all the old empty files and re-running. The parser caches its intermediate files and will not overwrite a file that already exists.
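
A minimal sketch of that cleanup, assuming the intermediate files sit next to the input file and share its name as a prefix (as the log's test_input.txt.sent.prp suggests). The helper name remove_empty_outputs is hypothetical and not part of the parser; it deletes only zero-byte files, so non-empty cached results are left alone.

```python
import glob
import os


def remove_empty_outputs(input_path):
    """Delete zero-byte files named '<input_path>.<something>'.

    Hypothetical helper, not part of amr_parsing.py. It assumes the
    preprocessing outputs are written alongside the input file.
    """
    removed = []
    # glob.escape guards against special characters in the path itself.
    for path in glob.glob(glob.escape(input_path) + ".*"):
        if os.path.isfile(path) and os.path.getsize(path) == 0:
            os.remove(path)
            removed.append(path)
    return sorted(removed)
```

For example, remove_empty_outputs("test_input.txt") would clear any empty test_input.txt.* files before the preprocess command is run again.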

kexinliao · Answer 2 · Mon Jul 18 2016 13:52:25 GMT+0800 (China Standard Time)

So should the input file in the command be the original file name, or the
file name with the .sent extension?

On Thu, Jul 14, 2016 at 10:54 AM, IceIceRabbit notifications@github.com
wrote:

Make a copy of the sentence file and rename the copy to the same name with the
.txt.sent extension. It should work, provided your sentence file contains only sentences.

IceIceRabbit · Answer 3 · Mon Jul 18 2016 20:02:56 GMT+0800 (China Standard Time)

Use your original file. The .sent file should contain all the sentences extracted from the original file.
As far as I understand, if the original file contains only sentences, you can just copy it and give the copy a .sent extension.
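
The copy step can be sketched as follows, assuming the expected name is simply the original file name with .sent appended (the log's test_input.txt.sent.prp points that way). make_sent_copy is a hypothetical helper, not something the parser provides.

```python
import shutil


def make_sent_copy(sentence_file):
    """Copy a one-sentence-per-line file to '<sentence_file>.sent'.

    Hypothetical helper. The '.sent' naming is an assumption based on
    the file names that appear in the preprocessing log.
    """
    sent_path = sentence_file + ".sent"
    shutil.copyfile(sentence_file, sent_path)
    return sent_path
```

With test_input.txt as input this produces test_input.txt.sent, and the preprocess command is still given the original file name.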

kexinliao · Answer 4 · Mon Jul 18 2016 20:49:39 GMT+0800 (China Standard Time)

@IceIceRabbit Thanks! Finally I'm able to preprocess the sentence file.
Now I get the following error when parsing the sentence file into AMR:

Traceback (most recent call last):
  File "amr_parsing.py", line 439, in <module>
    main()
  File "amr_parsing.py", line 390, in main
    if args.section != 'all':
  File "/home/kexin/AMRParsing/model.py", line 352, in load_model
    model = pickle.load(f)
cPickle.UnpicklingError: invalid load key, 'B'.

The pre-trained model 'LDC2013E117.train.basic-abt-charniak.m' was downloaded from the link in the readme file.
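
An "invalid load key" error means the first byte of the file is not something the unpickler recognizes. The thread does not say what the downloaded file actually contained, so the sketch below is only a diagnostic: it reads the first few bytes and compares them with well-known magic numbers. sniff_file_type is a hypothetical helper, and the check cannot tell whether a model matches the current parser code.

```python
def sniff_file_type(path):
    """Guess a file's container format from its leading bytes.

    Hypothetical diagnostic helper. It only inspects magic numbers
    and does not try to load or validate the model.
    """
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(b"BZh"):
        return "bzip2"
    if head.startswith(b"\x1f\x8b"):
        return "gzip"
    if head.startswith(b"PK\x03\x04"):
        return "zip"
    if head.startswith(b"\x80"):
        # Pickle protocol 2 and later begin with the PROTO opcode 0x80.
        return "pickle (protocol 2 or higher)"
    return "unknown"
```

If the result were "bzip2", for instance, the file would need decompressing before pickle.load could read it. A result of "unknown" says nothing either way, since older pickle protocols have no fixed header.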

IceIceRabbit · Answer 5 · Tue Jul 19 2016 00:00:15 GMT+0800 (China Standard Time)

Yeah, there seems to be a problem with the old model. I believe it was trained with the parser before its latest update. You can try training your own model; that should work.

kexinliao · Answer 6 · Wed Jul 20 2016 04:49:55 GMT+0800 (China Standard Time)

@Juicechuan Hi Chuan,
Could you provide us with a pre-trained model that was trained on the updated parser?

Chuan · Answer 7 · Sun Jul 24 2016 12:01:00 GMT+0800 (China Standard Time)

@kexinliao Hi, I've uploaded the new model. (There is still a problem with the SemEval model, but I should be able to upload it soon.) Let me know if you have any questions or problems.

kexinliao · Answer 8 · Sun Jul 24 2016 21:48:17 GMT+0800 (China Standard Time)

@Juicechuan Thanks! It works now.