eklem / nrk-sapmi-crawler

Crawler for NRK Sapmi news bulletins that will be the basis for Sami stopword lists and an example search engine for content in Sami.

Dependencies

eklem opened this issue

  • Batr: Playwright is the important part, but the testing could be useful too.
  • Cheerio or daq-proc, depending on the use case: just crawling, or crawling plus document processing.
  • node-fetch, to get the list of JSON.
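For the node-fetch item, a minimal sketch of what fetching a JSON list could look like. The function name `fetchJsonList` and the URL are hypothetical, and the fetch implementation is passed in, so either node-fetch or the `fetch` built into newer Node versions should work:

```javascript
// Sketch only: `fetchJsonList` is a hypothetical helper, not part of the crawler.
// `fetchImpl` can be node-fetch or the global fetch in Node 18+.
async function fetchJsonList (url, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url)
  // Fail loudly on HTTP errors instead of trying to parse an error page
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`)
  }
  // Parse the response body as JSON
  return response.json()
}

// Usage with node-fetch v2 (placeholder URL):
// const fetch = require('node-fetch')
// fetchJsonList('https://example.com/list.json', fetch).then(console.log)
```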

I don't think I need Cheerio, since Playwright has CSS selectors built in.
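A rough sketch of what that could look like, using a Playwright locator with a plain CSS selector. The helper name `extractTexts`, the selector, and the URL in the usage comment are placeholders, not the real NRK Sapmi markup:

```javascript
// Sketch only: `extractTexts` is a hypothetical helper.
// `page` is a Playwright Page; `selector` is any CSS selector.
async function extractTexts (page, selector) {
  // locator() accepts CSS selectors; allTextContents() returns the text of every match
  const texts = await page.locator(selector).allTextContents()
  // Trim whitespace and drop empty strings
  return texts.map((text) => text.trim()).filter((text) => text.length > 0)
}

// Usage (assumes `npm install playwright`; URL and selector are placeholders):
// const { chromium } = require('playwright')
// const browser = await chromium.launch()
// const page = await browser.newPage()
// await page.goto('https://example.com/')
// console.log(await extractTexts(page, 'article h2'))
// await browser.close()
```

Since the selection happens inside the browser page, there is no separate HTML parsing step where Cheerio would be needed.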