venkataramanab / ads-txt

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Update ads.txt

Ads.txt Scraper

Steps

  • Open command prompt on this directory OR 'cd' into this directory.

  • Initialize dependencies (required only on first install)

      pip3 install -r requirements.txt
    
      OR
    
      pip install -r requirements.txt
    
  • Run command

      python3 start.py
    
      OR
    
      python start.py
    
  • Enter required filenames:

    1. Targets filename - includes the urls/bundle_ids to be targeted - must be a txt file
    2. Searches filename - includes texts to be searched - must be a txt file - if not provided, takes in default columns from program itself
    3. Output filename - a csv file that will contain the output - if not provided, "results.csv" is taken as default

Info

  • In case of any error while scraping a page, the scraper will retry after 2 seconds of wait.
  • In case of any error at target level, the scraper will retry after 5 seconds of wait.
  • Whatever maybe the case, the failed target will be dumped into a file named "failed.txt"
    • This automatically generated file can be used again as the targets file.

About


Languages

Language:Python 100.0%