atulkum / pointer_summarizer

PyTorch implementation of "Get To The Point: Summarization with Pointer-Generator Networks"


Decoder output

ankitnit opened this issue · comments

I am getting the same output for all the batches.

You mean you are getting similar output within a single batch? During decode the batch size is the same as the beam size, so similar output within a single batch is expected.
You can compare with the output I got here.

No, I have trained the model for 410,000 iterations with is_coverage=True,
and I am getting the same summary for every batch when I run decode.py.

You might not want to train with is_coverage=True from the start; that makes training unstable. Try training with is_coverage=False first, then compare the results.
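
For reference, the paper uses the same two-phase schedule: train the pointer-generator without coverage, then fine-tune briefly with the coverage loss. A rough sketch of that schedule, assuming the repo's is_coverage flag lives in a config module; train() is a hypothetical stand-in for the training loop, and the iteration counts are the paper's ballpark figures, not verified against this codebase:

import config  # wherever the repo's is_coverage flag lives

# Phase 1: train the pointer-generator alone (coverage loss disabled).
config.is_coverage = False
train(iters=230000)  # hypothetical training loop; ~230k iterations in the paper

# Phase 2: switch the coverage loss on for a short fine-tune only.
config.is_coverage = True
train(iters=3000)    # ~3k iterations in the paper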

Can you send me some example output? I want to see whether it is a random string.

I tried with is_coverage=False, ran 175,000 iterations, and compared the results; it still generates the same summary for every batch.

Previous result with is_coverage=True: https://drive.google.com/open?id=1vjMsFpoQxdSQCukLxnRNK_ghjJD3ENnG

Just check your input. The batcher expects strings, not raw bytes. If your inputs are read as bytes, decode them to strings first.

Have you solved the problem, @ankitnit?

As @DominicSong said, I solved my problem by converting the article and abstract from bytes to strings, around line 217 of batcher.py. I run in Python 3.
article = str(article, encoding='utf-8')
abstract = str(abstract, encoding='utf-8')
abstract_sentences = [sent.strip() for sent in data.abstract2sents(abstract)]
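
For robustness, a small helper can make the batcher tolerate both str and bytes inputs. This is a minimal sketch, not part of the repo; ensure_text is a hypothetical name, and it assumes the fields come back as bytes because the tf.Example features read from the .bin files are byte strings in Python 3:

# Hypothetical helper, not in the repo: accept both str and bytes.
def ensure_text(value, encoding='utf-8'):
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

article = ensure_text(article)
abstract = ensure_text(abstract)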

Hi, I don't understand why the beam size and batch size are equal in the current decode. When I set them equal the code works fine; otherwise it throws a dimension-mismatch error. I believe the two are independent, so might there be a better decode implementation?

This is done to take advantage of GPU batch processing. If the beam size is B, you need to run B RNN decoders in parallel; treating the beam as a batch keeps the decoding code cleaner and computationally efficient on the GPU. If you want a larger beam, increase the batch size.
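
For context, here is a minimal, self-contained sketch of "beam as batch" decoding. It is illustrative only: dummy_decoder_step stands in for the real attention/RNN decoder, and the names do not match the repo's decode.py. The point is that the last tokens of all B live hypotheses are stacked into one tensor, so each decoder step is a single batched forward pass, which is why the decode batch size equals the beam size.

import torch

def dummy_decoder_step(tokens):
    # Stand-in for one step of the decoder: returns fake next-token
    # log-probabilities for each hypothesis in the "batch".
    torch.manual_seed(int(tokens.sum()))
    return torch.log_softmax(torch.randn(tokens.size(0), 50), dim=-1)

def beam_search(start_token, beam_size=4, max_steps=5):
    hyps = [([start_token], 0.0)]  # each hypothesis: (token ids, summed log prob)
    for _ in range(max_steps):
        # Stack the last token of every live hypothesis into ONE batch, so a
        # single forward pass advances all of them: batch size == beam size.
        last = torch.tensor([tokens[-1] for tokens, _ in hyps])
        log_probs = dummy_decoder_step(last)               # (num_hyps, vocab)
        topk_lp, topk_ids = log_probs.topk(beam_size, dim=-1)
        candidates = [
            (tokens + [topk_ids[i, j].item()], score + topk_lp[i, j].item())
            for i, (tokens, score) in enumerate(hyps)
            for j in range(beam_size)
        ]
        # Keep the best beam_size hypotheses by length-normalized score.
        hyps = sorted(candidates, key=lambda h: h[1] / len(h[0]),
                      reverse=True)[:beam_size]
    return hyps

print(beam_search(start_token=1))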