Problem with the input data format in preprocessing

Rainbow0625 opened this issue 4 years ago

"The input data format for parsing should be raw document with one sentence per line."

I put one sentence, ending in a period like the one above, in a file with no suffix, but all the files produced by preprocessing are 0 bytes.
Why is that?

Please help, thank you very much!
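
For reference, here is a minimal sketch of what "one sentence per line" could look like in practice. The helper names `write_sentences` and `check_one_sentence_per_line` are hypothetical and not part of the parser. The file name `rawData.sent` is taken from the log in the follow-up. Whether the preprocessing script requires the `.sent` suffix is an assumption based on that follow-up.

```python
from pathlib import Path


def write_sentences(sentences, path):
    """Write one non-empty sentence per line (hypothetical helper)."""
    # Collapse internal whitespace and newlines so each sentence stays on one line.
    cleaned = [" ".join(s.split()) for s in sentences if s.strip()]
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


def check_one_sentence_per_line(path):
    """Return the non-blank lines of the file; raise if there are none."""
    text = Path(path).read_text(encoding="utf-8")
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError(f"{path} contains no sentences")
    return lines


if __name__ == "__main__":
    write_sentences(["The boy wants to go.", "She left early."], "rawData.sent")
    print(check_one_sentence_per_line("rawData.sent"))
```

Checking the input file this way before running the preprocessing step at least rules out an empty or blank-only input as the cause of 0-byte output files.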

Rainbow0625 · Answer 1 · Wed Jun 24 2020 22:18:23 GMT+0800 (China Standard Time)

After I changed the suffix of the input file to 'xxx.sent', I got a new error:

Start Stanford CoreNLP...
java -Xmx2500m -cp stanfordnlp/stanford-corenlp-full-2015-04-20/stanford-corenlp-3.5.2.jar:stanfordnlp/stanford-corenlp-full-2015-04-20/stanford-corenlp-3.5.2-models.jar:stanfordnlp/stanford-corenlp-full-2015-04-20/joda-time.jar:stanfordnlp/stanford-corenlp-full-2015-04-20/xom.jar:stanfordnlp/stanford-corenlp-full-2015-04-20/jollyday.jar:stanfordnlp/stanford-corenlp-full-2015-04-20/protobuf.jar:stanfordnlp/stanford-corenlp-full-2015-04-20/javax.json.jar:stanfordnlp/stanford-corenlp-full-2015-04-20/ejml-0.23.jar edu.stanford.nlp.pipeline.StanfordCoreNLP -props stanfordnlp/default.properties
Loading Models: 4/4
Read token,lemma,name entity file rawData.sent.prp...

[ERROR] Timeout
Traceback (most recent call last):
  File "/Users/Rainbow/Desktop/AMR/AMRParsing/stanfordnlp/corenlp.py", line 508, in parse
    data = parse_parser_results_new(result)
  File "/Users/Rainbow/Desktop/AMR/AMRParsing/stanfordnlp/corenlp.py", line 154, in parse_parser_results_new
    seqs = re.split("\r\n", text)
  File "/anaconda3/lib/python3.7/re.py", line 213, in split
    return _compile(pattern, flags).split(string, maxsplit)
TypeError: expected string or bytes-like object

Traceback (most recent call last):
  File "amr_parsing.py", line 437, in <module>
    main()
  File "amr_parsing.py", line 170, in main
    instances = preprocess(amr_file,START_SNLP=True,INPUT_AMR=args.amrfmt, PRP_FORMAT=args.prpfmt)
  File "/Users/Rainbow/Desktop/AMR/AMRParsing/preprocessing.py", line 439, in preprocess
    instances = proc1.parse(tmp_sent_filename)
  File "/Users/Rainbow/Desktop/AMR/AMRParsing/stanfordnlp/corenlp.py", line 511, in parse
    raise e
  File "/Users/Rainbow/Desktop/AMR/AMRParsing/stanfordnlp/corenlp.py", line 508, in parse
    data = parse_parser_results_new(result)
  File "/Users/Rainbow/Desktop/AMR/AMRParsing/stanfordnlp/corenlp.py", line 154, in parse_parser_results_new
    seqs = re.split("\r\n", text)
  File "/anaconda3/lib/python3.7/re.py", line 213, in split
    return _compile(pattern, flags).split(string, maxsplit)
TypeError: expected string or bytes-like object
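
The `TypeError` at the bottom of the traceback is what `re.split` raises when its second argument is not a string or bytes object, for example `None`. A plausible reading, not confirmed in this thread, is that the CoreNLP call hit the `[ERROR] Timeout` shown above and `text` was never filled in. The sketch below reproduces the error in isolation and shows one way to fail earlier with a clearer message. `split_result` is a hypothetical stand-in, not the actual code in `corenlp.py`.

```python
import re


def split_result(text):
    """Split parser output on CRLF, failing early with a clearer message.

    Hypothetical stand-in for the split at corenlp.py line 154.
    """
    if not isinstance(text, (str, bytes)):
        raise ValueError(
            f"no parser output to split (got {type(text).__name__}); "
            "the CoreNLP process may have timed out"
        )
    # The pattern must be the same type (str or bytes) as the input.
    sep = "\r\n" if isinstance(text, str) else b"\r\n"
    return re.split(sep, text)


if __name__ == "__main__":
    # Reproduce the original error: re.split rejects None.
    try:
        re.split("\r\n", None)
    except TypeError as exc:
        print("TypeError:", exc)

    # The guarded version reports the likely cause instead.
    try:
        split_result(None)
    except ValueError as exc:
        print("ValueError:", exc)
```

If this reading is right, the thing to investigate is the timeout itself (for example whether the Java process started and produced any output), since the `TypeError` is only a downstream symptom.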