Athanasius / ed-devtracker

Scraper and RSS generation for Frontier Developments (makers of Elite: Dangerous) forum developer posts.

Fetch and store+index (for search) full text of each post.

Athanasius opened this issue

There are two downsides to not currently doing this:

  1. We only get a truncated portion of the text on larger replies.
  2. We don't get the context of any text quoted in a reply.

  • Retrieve full text on new posts.
  • Back-fill older stored posts with their full text.
  • Index the full text, both with and without quotes.
  • Add further options to the search facility to include, or not, the quote text if present.
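
To illustrate the "with and without quotes" idea, here is a minimal Python sketch that extracts both variants of a post's text in one pass. It is not the project's code: it assumes quoted material arrives wrapped in `<blockquote>` elements, which the issue does not state, and the names `QuoteAwareText` and `extract_texts` are hypothetical.

```python
from html.parser import HTMLParser


class QuoteAwareText(HTMLParser):
    """Collect post text twice: once in full, once with quoted text left out.

    Assumption: quotes are marked up as <blockquote> elements.
    """

    def __init__(self):
        super().__init__()
        self.quote_depth = 0       # greater than 0 while inside a (nested) quote
        self.with_quotes = []
        self.without_quotes = []

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self.quote_depth += 1

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.quote_depth > 0:
            self.quote_depth -= 1

    def handle_data(self, data):
        self.with_quotes.append(data)
        if self.quote_depth == 0:
            self.without_quotes.append(data)


def _normalise(parts):
    # Join the text fragments with spaces and collapse runs of whitespace.
    return " ".join(" ".join(parts).split())


def extract_texts(post_html):
    """Return (text_with_quotes, text_without_quotes) for one post body."""
    parser = QuoteAwareText()
    parser.feed(post_html)
    parser.close()
    return _normalise(parser.with_quotes), _normalise(parser.without_quotes)
```

Storing both strings per post would let the search facility pick the variant to match against without re-parsing the HTML at query time. Joining fragments with a space keeps adjacent blocks apart, at the cost of occasionally splitting a word that contains inline markup.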

Commits 9c851e9...7c88a0b contain the code that does the full-text scraping and stores the result in the DB.

Output to the RSS file is controlled by a new config.txt parameter, 'rss_fulltext'. Only if its value is the text 'true' (case-insensitive), and nothing else, will the collector attempt to use something other than the old 'precis' text as the description output for each item.
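
The rule for 'rss_fulltext' can be sketched in Python as follows. This is an illustration only, not the collector's code: the function names are hypothetical. The issue does not say whether surrounding whitespace is tolerated, so the sketch treats it as "something else". Falling back to the precis when no full text is stored is also an assumption, suggested by the word "attempt".

```python
def rss_fulltext_enabled(value):
    """True only when the config value is exactly 'true', ignoring case."""
    # Assumption: any extra characters, including whitespace, disable the option.
    return value is not None and value.lower() == "true"


def choose_description(precis, full_text, rss_fulltext):
    """Pick the text to use as an RSS item's description."""
    if rss_fulltext_enabled(rss_fulltext) and full_text:
        return full_text
    # Assumed fallback: keep the old precis when full text is off or missing.
    return precis
```

With this reading, values such as 'yes', '1', or 'true ' (with a trailing space) all leave the feed on the old precis text.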

Next up will be the back-fill.

The following commits implement the back-fill script:

579ce4c
bfd318c
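
For readers without the commits to hand, the general shape of such a back-fill can be sketched in Python. Everything concrete here is an assumption made for illustration: the `posts` table, its `id`, `url` and `full_text` columns, the use of SQLite, and the `fetch_full_text` callback. The actual script may be organised quite differently.

```python
import sqlite3


def backfill_full_text(conn, fetch_full_text):
    """Fill in the full text for stored posts that do not have it yet.

    fetch_full_text(url) is a caller-supplied function returning the post's
    full text, or None on failure. Returns the number of rows updated.
    """
    rows = conn.execute(
        "SELECT id, url FROM posts WHERE full_text IS NULL ORDER BY id"
    ).fetchall()
    updated = 0
    for post_id, url in rows:
        text = fetch_full_text(url)
        if text is None:
            continue  # leave the row untouched so a later run can retry it
        conn.execute(
            "UPDATE posts SET full_text = ? WHERE id = ?", (text, post_id)
        )
        updated += 1
    conn.commit()
    return updated
```

Selecting only rows whose full text is still NULL makes the script safe to re-run: posts that were already filled are skipped, and posts whose fetch failed are picked up again next time.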

The collector now outputs two files: the old-format one with just the precis text, and a new one with the full text.

2f6fe48
04fa72a