atulkum / pointer_summarizer

PyTorch implementation of "Get To The Point: Summarization with Pointer-Generator Networks"


Python 3 support?

johntiger1 opened this issue

Any chance we'll get a Python 3 version? Thanks

I think this code will run on Python 3. Let me know if you find any problems.

I migrated it to Python 3, but it's not fully tested yet: https://github.com/atulkum/pointer_summarizer/tree/transformer_encoder
You can try that branch.

Thanks! Will take a look

Hey, I noticed that this branch uses a transformer as the encoder. So it is not exactly comparable to the main branch, right?

You can disable the transformer encoder by setting use_lstm = True in config.py.
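For reference, here is a minimal sketch of how such a flag could gate the encoder choice. Only the name use_lstm comes from the branch's config.py; the dispatch, dimensions, and layer settings below are illustrative, not the repo's actual model code.

```python
import torch.nn as nn

# Only the flag name use_lstm comes from the branch's config.py;
# this dispatch is an illustrative sketch, not the repo's model code.
use_lstm = True

def build_encoder(emb_dim=128, hidden=256):
    if use_lstm:
        # bidirectional LSTM: two directions of hidden // 2 concatenate to hidden
        return nn.LSTM(emb_dim, hidden // 2,
                       batch_first=True, bidirectional=True)
    layer = nn.TransformerEncoderLayer(d_model=emb_dim, nhead=4,
                                       batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=2)
```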

Thanks! And I just realized we can merge the branches; it's only readme.md that differs (and some config stuff).

I will merge it after testing the transformer encoder.

Thanks, it seems to be OK with my current testing. I'll let you know once training is done. Btw, have you heard of tqdm? It could help reduce some of the logging/timing code, as in the sketch below.
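A minimal sketch of the idea (the loop body and iteration count are placeholders, not the repo's actual training code):

```python
from tqdm import tqdm

def train_one_batch():
    return 0.0  # placeholder for the real training step

# tqdm wraps the iterator and renders a live progress bar with rate and ETA
for it in tqdm(range(1000), desc="training"):
    loss = train_one_batch()
    if it % 100 == 0:
        tqdm.write(f"iter {it}: loss {loss:.4f}")  # log without breaking the bar
```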

So I left it running overnight on a Titan X and it just printed this line a bunch: INFO:tensorflow:Bucket queue size: 100, Input queue size: 800. Is that normal? No loss or accuracy printed anywhere.

You can start TensorBoard (e.g. tensorboard --logdir=<your log directory>) and watch the learning curve.

Thanks! This is currently what I get, I guess it's working OK?

[TensorBoard screenshot of the training curve]

Btw, do you know how long your 500k iterations took? I'm worried the training is taking too long right now (it seems like 1 day for just 5k).

Hey @atulkum, is there any update on this?

1 day for 5k is too long; on my GTX 1070 (8 GB) it took 3 days to train 500k iterations.
By the way, if you want, you can try the transformer encoder. It might improve the speed, but you might need to fine-tune the parameters. This is the relevant set of parameters:

https://github.com/atulkum/pointer_summarizer/blob/transformer_encoder/training_ptr_gen/model.py#L53

I am thinking about the idea that you can replace an LSTM with a transformer encoder. I already got similar results for NER on the CoNLL-2003 dataset:

https://github.com/atulkum/sequence_prediction/blob/master/neural_ner/model_tx.py
https://github.com/atulkum/sequence_prediction/blob/master/neural_ner/model_lstm.py
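For a rough sense of what the swap looks like, here is a minimal sketch of the shape equivalence in PyTorch. The dimensions, head counts, and layer counts are placeholders, not the settings from either repo, and positional encodings are omitted for brevity.

```python
import torch
import torch.nn as nn

emb_dim, hidden, batch, seq_len = 128, 256, 8, 40

# BiLSTM encoder: two directions of hidden // 2 concatenate to hidden
lstm = nn.LSTM(emb_dim, hidden // 2, batch_first=True, bidirectional=True)

# Transformer encoder producing the same output width; a linear
# projection lifts the embeddings to d_model = hidden first
proj = nn.Linear(emb_dim, hidden)
layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4,
                                   dim_feedforward=512, batch_first=True)
tx = nn.TransformerEncoder(layer, num_layers=2)

x = torch.randn(batch, seq_len, emb_dim)
out_lstm, _ = lstm(x)     # (8, 40, 256)
out_tx = tx(proj(x))      # (8, 40, 256)
print(out_lstm.shape, out_tx.shape)
```

Because both encoders map (batch, seq_len, emb_dim) to (batch, seq_len, hidden), the downstream attention and decoder code can stay largely unchanged.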

Let me know if you are interested in running these experiments.