blester125 / text-rank

Text Rank with Python

Scipy sparse

blester125 opened this issue

Eventually we may need a SparseAdjacencyMatrix graph class to handle large graphs. The AdjacencyList might be enough to handle them, though; it depends on speed.

This is low priority and should only be done when we find a graph we can't handle.

On second thought, this seems rather pointless. Text rank is run on each document separately, so even a giant corpus should be OK: memory usage is bounded by the usage for the largest document.
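
A rough back-of-the-envelope sketch of that bound. It assumes a dense n x n matrix of 8-byte floats per document and that each document's matrix is freed before the next one is built; the function names are hypothetical and not part of this repo.

```python
def dense_matrix_bytes(n_sentences, itemsize=8):
    """Bytes needed for a dense n x n adjacency matrix (assumes float64 entries)."""
    return n_sentences * n_sentences * itemsize


def peak_bytes(sentence_counts, itemsize=8):
    """Peak matrix memory when documents are ranked one at a time.

    Only one matrix is alive at once, so the peak is set by the largest
    document, not by the total size of the corpus.
    """
    return max((dense_matrix_bytes(n, itemsize) for n in sentence_counts), default=0)


# Hypothetical corpus: sentence counts for three documents.
counts = [40, 250, 1200]
print(peak_bytes(counts))                           # 11520000, from the 1200-sentence document
print(sum(dense_matrix_bytes(n) for n in counts))   # 12032800, if all matrices were kept at once
```

Adding more small documents to `counts` grows the second number but leaves the first one unchanged.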

Even for a massive document, a sparse implementation won't help much for summarization. The summary graph is built by computing a similarity score between each sentence and every other sentence in the text, which yields a fully connected graph, so there is no sparsity to exploit. For keyword extraction it could help on massive documents, because that graph should be sparser.
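
To make the density argument concrete, here is a minimal sketch that counts edges in both kinds of graph. It assumes the keyword graph links words that co-occur within a small window, which is a common text rank setup but may not match what this repo does; the helper names and the `window` parameter are hypothetical.

```python
def summary_edge_count(n_sentences):
    """Every sentence is scored against every other one: n * (n - 1) / 2 edges."""
    return n_sentences * (n_sentences - 1) // 2


def keyword_edges(tokens, window=2):
    """Undirected edges between distinct words at most `window` positions apart."""
    edges = set()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                edges.add(frozenset((word, other)))
    return edges


def density(n_edges, n_nodes):
    """Fraction of possible undirected edges that are actually present."""
    possible = n_nodes * (n_nodes - 1) // 2
    return n_edges / possible if possible else 0.0


# Summary graph: always fully connected, so density is 1.0 at any size.
print(density(summary_edge_count(500), 500))

# Keyword graph: each word only links to its neighbours in the text.
tokens = "the quick brown fox jumps over the lazy dog".split()
edges = keyword_edges(tokens, window=2)
print(len(edges), density(len(edges), len(set(tokens))))
```

In the keyword case each token adds at most `window` edges, so the edge count grows roughly linearly with document length while the number of possible edges grows quadratically with vocabulary size. That is where a sparse representation could start to pay off.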