michaelnetbiz / parse-smcl

Parse SMCL Help Files into Markdown and HTML

PARSE-SMCL | Parse SMCL Help Files into Markdown and HTML

This project contains a Python script that converts Stata help files in SMCL format (usually with the .sthlp or .hlp extensions) into proper HTML5 files.

It has several advantages over the current best alternative, log2html:

  • Instead of copying the text in a monospaced font and discarding most of the document's structure and meaning, it preserves them.
  • Links, anchors, headings, etc. are preserved and tagged, so it's very easy to change their look with CSS files.
  • Tables are automatically generated (standard tables as well as syntax tables).
  • Navigation menus ("see also" and "jump to") are preserved.
  • When possible, fragments that contain enumerations are translated into actual ordered or unordered lists. Similarly, code samples are wrapped in pre tags.
  • Because it uses CSS, you can change the styles freely: for instance, add or remove heading numbering, add syntax coloring for code samples, etc.
  • It uses responsive CSS so it's easier to read on tablets, large screens, etc. Similarly, it's easier to print.
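
To illustrate what "preserved and tagged" can mean in practice, here is a minimal sketch of translating a few inline SMCL directives into HTML tags. It is not the script's actual code: the function name `inline_to_html`, the tag choices, and the `cmd` CSS class are all assumptions made for this example, and nested directives are not handled.

```python
import html
import re

# Hypothetical mapping from a few inline SMCL directives to HTML tags.
# The tag and class choices are assumptions made for this sketch.
INLINE_TAGS = {
    "bf": ("<strong>", "</strong>"),
    "it": ("<em>", "</em>"),
    "cmd": ('<code class="cmd">', "</code>"),
}

# Matches {name:text} where text contains no nested braces.
INLINE_RE = re.compile(r"\{(bf|it|cmd):([^{}]*)\}")


def inline_to_html(line: str) -> str:
    """Translate simple {bf:...}, {it:...}, and {cmd:...} directives."""
    def repl(match):
        open_tag, close_tag = INLINE_TAGS[match.group(1)]
        # Escape the directive's text so characters like "<" stay literal.
        return open_tag + html.escape(match.group(2)) + close_tag

    return INLINE_RE.sub(repl, line)


print(inline_to_html("Use {cmd:regress} with {bf:robust} errors."))
```

Because every directive ends up as a tag with a predictable name or class, a stylesheet can restyle commands, emphasis, and so on without touching the converter.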

Sample Output

Some examples include:

  • generate and summarize by StataCorp. They work without problems.
  • regress and var by StataCorp work well, but are missing a few directives.
  • reghdfe and [hdfe](http://scorreia.com/demo/hdfe.html) work without problems.
  • psmatch2 by Edwin Leuven and Barbara Sianesi.
  • a2reg by Amine Ouazad. Even though it uses the old version of the help files, it still works.
  • bayesmh by StataCorp. It uses many advanced (Stata 14) directives but is still quite readable.

(Note: I do not own the copyright of the original files; they are used merely as examples of the use case.)

Usage

To use this script, just run smcl2html.py:

> smcl2html
usage: smcl2html.py [-h] [--output OUTPUT] [--adopath ADOPATH] [--standalone]
                    [--view] [--xml]
                    filename

The arguments and flags are:

  • filename: the name of the file with .sthlp or .hlp extension.
  • output: (optional) the name of the output file. If not given, same as filename but with a .html extension.
  • adopath: the path of the stata/ado/base folder. Needed to replace the INCLUDE xyz directives.
  • standalone: instead of outputting a bare fragment meant to be embedded in another page, it wraps the output in full html tags, including CSS and font links. Always use this option unless you want to embed the results into another page.
  • view: opens the resulting file in the browser.
  • xml: outputs an intermediate file, only for debug purposes.
  • help: shows this information
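
The usage message above has the shape that Python's argparse module generates. As a rough sketch of how such an interface could be declared (an assumption for illustration, not the script's actual source; the help strings and the `build_parser` name are invented here):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the arguments and flags listed above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="smcl2html.py")
    parser.add_argument("filename",
                        help="help file with .sthlp or .hlp extension")
    parser.add_argument("--output",
                        help="output file; defaults to filename with .html")
    parser.add_argument("--adopath",
                        help="path of the stata/ado/base folder")
    parser.add_argument("--standalone", action="store_true",
                        help="wrap the output in full html tags")
    parser.add_argument("--view", action="store_true",
                        help="open the resulting file in the browser")
    parser.add_argument("--xml", action="store_true",
                        help="write an intermediate file for debugging")
    return parser


# Parse a sample command line instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["somehelpfile.sthlp", "--standalone", "--view"])
print(args.filename, args.standalone, args.view, args.xml)
```

The `store_true` flags default to False, which matches the description of standalone, view, and xml as opt-in switches.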

A typical command line would be:

smcl2html.py somehelpfile.sthlp --adopath=C:\Stata13\ado\base --view --standalone
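
To convert a whole folder of help files, the same command line can be driven from a small wrapper. The sketch below uses only the flags documented above; the helper names `build_command` and `convert_folder`, and the assumption that smcl2html.py sits in the current directory, are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_command(help_file, adopath=None, script="smcl2html.py"):
    """Build the argument list for one conversion (script path is assumed)."""
    cmd = [sys.executable, script, str(help_file), "--standalone"]
    if adopath is not None:
        cmd.append(f"--adopath={adopath}")
    return cmd


def convert_folder(folder, adopath=None):
    """Run the converter once per .sthlp file found directly in folder."""
    for help_file in sorted(Path(folder).glob("*.sthlp")):
        # check=True raises CalledProcessError if a conversion fails.
        subprocess.run(build_command(help_file, adopath), check=True)


# Show the command that would be run for a single file.
print(build_command("somehelpfile.sthlp", r"C:\Stata13\ado\base"))
```

The --view flag is left out on purpose here, since opening a browser tab for every file in a batch is rarely wanted.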

Installation

  1. Download the latest Python 3.x: https://www.python.org/downloads/
  2. Install the lxml library: http://lxml.de/installation.html (or http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml for Windows; read this if you get stuck).
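
After both steps, a quick check along the following lines can confirm that the interpreter and lxml are in place before running the converter. This is only a convenience sketch; the `check_environment` name is made up for this example.

```python
import importlib.util
import sys


def check_environment():
    """Return a list of problems found; an empty list means ready to go."""
    problems = []
    if sys.version_info[0] < 3:
        problems.append("Python 3.x is required")
    # find_spec returns None when the package cannot be imported.
    if importlib.util.find_spec("lxml") is None:
        problems.append("the lxml library is not installed")
    return problems


for problem in check_environment():
    print("Problem:", problem)
```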

Missing Features

Since this is a work in progress, there are still a few limitations:

  • The existing CSS is a proof of concept and can still be improved considerably.
  • Some SMCL directives, such as {c} and {space}, are not supported. However, these are relatively minor and easy to implement.
  • Some advanced directives, such as {findalias}, are not supported yet, but implementing them is quite doable.
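
As an indication of how small such an addition could be, here is a sketch of handling {space}, assuming the directive takes a count and stands for that many blank characters. It is not part of the script; the `expand_space` name and the choice of non-breaking spaces as output are assumptions.

```python
import re

# Matches {space N}, where N is a non-negative integer.
SPACE_RE = re.compile(r"\{space\s+(\d+)\}")


def expand_space(line: str) -> str:
    """Replace each {space N} with N non-breaking space entities."""
    return SPACE_RE.sub(lambda m: "&nbsp;" * int(m.group(1)), line)


print(expand_space("a{space 3}b"))
```

Non-breaking spaces are used so that the browser does not collapse the run into a single space; inside a pre tag, plain spaces would work as well.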

License: MIT License


Languages

Python 78.8%, CSS 21.2%