eklem / nrk-sapmi-crawler

Crawler for NRK Sapmi news bulletins that will be the basis for Sami stopword lists and an example search engine for content in Sami.

Get list data and process

eklem opened this issue

A function that should:

  • get the JSON data
  • read the ID file if it exists, and create it if not
  • if the file does not exist, write all IDs to it
  • if the file exists, find the IDs not yet present and add them. Each entry needs a crawled flag.
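The "read file if exists, create if not" step could be sketched roughly as below. This is only an illustration in Python, not the code from this repository. The helper names `load_id_list` and `save_id_list` and the idea of passing the file path as an argument are assumptions made for the example.

```python
import json
from pathlib import Path


def load_id_list(path):
    """Return the stored entries, or an empty list if the file does not exist yet.

    Hypothetical helper: the name and signature are assumptions.
    """
    file = Path(path)
    if not file.exists():
        # Nothing stored yet, so the caller can simply write all fetched IDs.
        return []
    return json.loads(file.read_text(encoding="utf-8"))


def save_id_list(path, entries):
    """Write the entries to disk as JSON, creating the file if it is missing."""
    Path(path).write_text(json.dumps(entries, indent=2), encoding="utf-8")
```

Returning an empty list for a missing file lets the "no file yet" and "file exists" cases share the same code path: in both cases the caller combines the stored entries with the fetched ones and saves the result.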

Array of objects to write (and read):

[
  {
    "id": "1.15778840",
    "parent-id": "1.13572943",
    "crawled": false
  }
]
  • Working: Writing list when no data previously exists.
  • Missing: Merging arrays of objects
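One possible way to do the missing merge, sketched in Python under a few assumptions: entries are matched on `"id"` only, existing entries are kept untouched so that their `"crawled"` state survives, and newly found IDs start with `"crawled": false`. The function name `merge_entries` is hypothetical and this is not necessarily how the commit referenced below solved it.

```python
def merge_entries(existing, fetched):
    """Return existing entries plus any fetched entries whose ID is not stored yet.

    Hypothetical helper: matching on "id" alone is an assumption.
    """
    known_ids = {entry["id"] for entry in existing}
    merged = list(existing)  # copy, so the stored list is not mutated
    for entry in fetched:
        if entry["id"] not in known_ids:
            merged.append({
                "id": entry["id"],
                "parent-id": entry["parent-id"],
                "crawled": False,  # new IDs have not been crawled yet
            })
            known_ids.add(entry["id"])  # guards against duplicates within fetched
    return merged
```

Because stored entries are never overwritten, an ID that has already been crawled stays marked as crawled even if it shows up again in a later list.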

Done with 6e0ac43