neuml / txtchat

💭 Retrieval augmented generation (RAG) and language model powered search applications

Wikipedia dataset improvements

davidmezzetti opened this issue

Make the following improvements to the Wikipedia datasets builder.

  • Use argparse and reduce the number of hardcoded parameters
  • Change the page views process to use Pageviews Complete
  • Add a benchmark dataset in this format for evaluation testing
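
As a rough illustration of the first item, a minimal argparse sketch might look like the following. The option names (`--output`, `--language`, `--pageviews`, `--limit`) and the `build_parser` helper are hypothetical and are not taken from the txtchat codebase; they only show how hardcoded values could become command line parameters with defaults.

```python
import argparse


def build_parser():
    """Builds a command line parser. All option names here are hypothetical."""
    parser = argparse.ArgumentParser(
        description="Build a Wikipedia dataset (illustrative sketch)"
    )

    # Required output location, instead of a hardcoded path
    parser.add_argument("--output", required=True, help="output directory")

    # Values that might otherwise be hardcoded become options with defaults
    parser.add_argument("--language", default="en", help="Wikipedia language code")
    parser.add_argument("--pageviews", default=None, help="path to a page views file")
    parser.add_argument("--limit", type=int, default=None, help="maximum number of articles")

    return parser


if __name__ == "__main__":
    # Parse arguments from the command line and show the resolved settings
    args = build_parser().parse_args()
    print(vars(args))
```

Keeping the parser in its own function makes the defaults easy to check in isolation, for example `build_parser().parse_args(["--output", "data"])`.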