KarlErickson / inventory-tool

A simple content inventory tool to run across local folders containing .md files

These scripts use the Windows findstr command-line tool and some Python processing to search for terms across multiple documentation repositories. (Repositories are assumed to use the metadata formats for docs.microsoft.com.)

To use this tool:

  1. Make sure you have Python 3 installed. Download from https://www.python.org/downloads.

  2. Run pip install -r requirements.txt to install needed libraries. (If you want to use a virtual environment instead of your global environment, run python -m venv .env then .env\scripts\activate before running pip install.)

  3. Modify folders.txt to list the local folders you want to search and a base URL for the publish target. Each line contains a docset name, the local path (not including "*.md" which is appended automatically), and a base URL separated by any amount of whitespace. In the output, the URL is generated by replacing the folder in a file path with the base URL, removing ".md", and changing \ to /, as is suitable for Microsoft documentation platforms.

  4. Modify terms.txt to list the case-insensitive terms you want to search. Each line has an individual term and can include spaces and regular expressions (which are allowed by findstr). (If you need a case-sensitive search, remove /I from the findstr command line in take-inventory.py.)

  5. Run "python take-inventory.py". Output is written to two files: results_<date>_<random_int>.csv and results_<date>_<random_int>-with-metadata.csv. The second file adds various metadata values extracted from the matching files (see extract-metadata.py, which is invoked at the end of take-inventory.py).
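
The folders.txt format and URL mapping described in step 3 can be sketched roughly as follows. This is an illustration only, not the code in take-inventory.py: the function names, the example docset name, the example path, and the example base URL are all hypothetical, and the slash trimming at the join is an assumption.

```python
def parse_folders_line(line):
    """Split one folders.txt line into (docset, folder, base_url).

    Assumes exactly three whitespace-separated fields, so a local
    path containing spaces is not handled in this sketch.
    """
    parts = line.split()
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {line!r}")
    return tuple(parts)


def file_path_to_url(file_path, folder, base_url):
    """Map a local .md file path to a URL under base_url."""
    if not file_path.lower().startswith(folder.lower()):
        raise ValueError("file is not under the docset folder")
    # Drop the folder prefix, then the .md extension.
    relative = file_path[len(folder):]
    if relative.lower().endswith(".md"):
        relative = relative[:-3]
    # Change \ to / and join with the base URL (slash trimming is an assumption).
    relative = relative.replace("\\", "/").lstrip("/")
    return base_url.rstrip("/") + "/" + relative


# Hypothetical folders.txt line and file path, for illustration only.
docset, folder, base_url = parse_folders_line(
    r"azure-docs   C:\repos\azure-docs\articles   https://docs.microsoft.com/azure"
)
url = file_path_to_url(
    r"C:\repos\azure-docs\articles\storage\overview.md", folder, base_url
)
print(url)
```

Because fields are split on whitespace, this sketch rejects local paths that contain spaces. Whether the real script handles that case is not stated here.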

Note that after a run, the text_results folder contains intermediate files from the findstr command line, which are of the form <docset>-<search-term>.txt. These can be deleted once you have the .csv files.
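
As a rough sketch of how one docset and one search term might turn into a findstr command line and an intermediate file name, consider the following. The exact flags used by take-inventory.py are not listed in this README apart from /I, so the other switches here (/S, /N, /R, /C:) are assumptions, as are the function names and the file-name sanitizing rule. The code only builds the argument list; it does not run findstr.

```python
import re


def build_findstr_command(term, folder, case_sensitive=False):
    """Build a findstr argument list for one term (assumed flags).

    /S recurses into subfolders, /N prints line numbers, /R treats the
    search string as a regular expression, and /C: keeps a term that
    contains spaces together as one search string.
    """
    flags = ["/S", "/N", "/R"]
    if not case_sensitive:
        flags.append("/I")  # the README says removing /I makes the search case-sensitive
    # "*.md" is appended automatically to the folder path.
    pattern = folder.rstrip("\\") + "\\*.md"
    return ["findstr", *flags, f"/C:{term}", pattern]


def intermediate_file_name(docset, term):
    """Name the per-term results file as <docset>-<search-term>.txt.

    Replacing characters that Windows forbids in file names is an
    assumption of this sketch, not documented behavior.
    """
    safe_term = re.sub(r'[<>:"/\\|?*]', "_", term)
    return f"{docset}-{safe_term}.txt"


# Hypothetical docset, folder, and term, for illustration only.
cmd = build_findstr_command("azure functions", r"C:\repos\azure-docs\articles")
name = intermediate_file_name("azure-docs", "azure functions")
print(cmd)
print(name)
```

On Windows, a list like this could be passed to subprocess.run with stdout redirected to the intermediate file in text_results.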

The first time you run a search in a particular folder, the findstr command may take a minute or two, depending on the number of files. Subsequent runs are much faster thanks to Windows file system caching, so you can modify the search terms and run the tool again without a long wait.
