Hellisotherpeople / CX_DB8

a contextual, biasable, word-or-sentence-or-paragraph extractive summarizer powered by the latest in text embeddings (BERT, Universal Sentence Encoder, Flair)

Home Page: https://huggingface.co/spaces/Hellisotherpeople/Unsupervised_Extractive_Summarization

Support *actual* textrank

Hellisotherpeople opened this issue

I'm not actually doing the proper TextRank algorithm and I should experiment with that to see how effective it is.

Going to implement it with networkx most likely, shouldn't be difficult. Might be slow for large documents with word level models.
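For anyone following along, here is a minimal sketch of what a networkx-based TextRank could look like. It is not the code from this repo: the function names `cosine` and `textrank` and the toy 2-D "embeddings" are made up for illustration, and it assumes the text units (words, sentences, or paragraphs) have already been embedded as plain lists of floats. Recent networkx releases also need SciPy installed for `nx.pagerank`.

```python
import math

import networkx as nx


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def textrank(embeddings):
    """Return unit indices ordered from most to least central (hypothetical helper)."""
    graph = nx.Graph()
    graph.add_nodes_from(range(len(embeddings)))
    # Fully connected similarity graph: O(n^2) edges, which is why
    # word-level models on long documents can get slow.
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            weight = cosine(embeddings[i], embeddings[j])
            if weight > 0:  # keep edge weights strictly positive for PageRank
                graph.add_edge(i, j, weight=weight)
    scores = nx.pagerank(graph, weight="weight")
    return sorted(scores, key=scores.get, reverse=True)


# Toy stand-in for real embeddings: three similar units and one outlier.
toy = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
ranking = textrank(toy)
```

The summary would then be the top-k entries of `ranking`, mapped back to the original text units. The outlier at index 3 shares little similarity with the rest, so it ends up last.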

Wow! Implementing TextRank properly dramatically increased the coherence of my summaries. I guess it makes sense that a walk through the word-embedding-powered graph gives more coherent summaries.

Unfortunate side effect: summarization speed takes a sizeable hit unless I can find a faster implementation of PageRank.
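One option that might be worth trying is to skip the graph object and run PageRank as a power iteration directly on the dense similarity matrix with NumPy. The sketch below is only an assumption about what that could look like, not something measured against the current implementation: `pagerank_dense` and its parameters are hypothetical names, and whether it is actually faster depends on document size. It also assumes a non-empty, square similarity matrix.

```python
import numpy as np


def pagerank_dense(sim, damping=0.85, tol=1e-6, max_iter=100):
    """Power-iteration PageRank over a dense similarity matrix (hypothetical helper)."""
    sim = np.asarray(sim, dtype=float).copy()
    np.fill_diagonal(sim, 0.0)      # no self-loops
    sim = np.clip(sim, 0.0, None)   # drop negative similarities
    n = sim.shape[0]
    col_sums = sim.sum(axis=0)
    safe_sums = np.where(col_sums > 0, col_sums, 1.0)
    # Column-stochastic transition matrix; columns with no outgoing
    # weight fall back to a uniform jump.
    transition = np.where(col_sums > 0, sim / safe_sums, 1.0 / n)
    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        updated = (1.0 - damping) / n + damping * (transition @ scores)
        if np.abs(updated - scores).sum() < tol:
            return updated
        scores = updated
    return scores


# Toy stand-in for real embeddings: three similar units and one outlier.
vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]])
unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
scores = pagerank_dense(unit @ unit.T)  # cosine similarity matrix
```

`np.argsort(-scores)` then gives the units from most to least central. The similarity matrix is still O(n^2) in memory, so this does not help with very large word-level graphs.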

I haven't actually merged that code into the repo yet. I'll do that soon so that other people can try TextRank or other graph algorithms.