MarshalX / libprisma

Code highlight tokenizer written in C++

This is a C++ port of the prism.js library. The code depends on Boost.Regex, as it is faster and more comprehensive than the standard library's std::regex.

The grammars file is generated from the prism.js source code itself; see "How to update" later in this file for instructions.

Key concepts:

std::string text = ReadFile("grammars.dat");
m_highlighter = std::make_shared<SyntaxHighlighter>(text);
TokenList tokens = m_highlighter->tokenize(code, language);

for (auto it = tokens.begin(); it != tokens.end(); ++it)
{
    auto& node = *it;
    if (node.isSyntax())
    {
        const auto& child = dynamic_cast<const Syntax&>(node);
        // child.type() <- main token type (eg "include")
        // child.alias() <- "base" token type (eg "keyword")
        // child.begin() + child.end() <- list of nested tokens
    }
    else
    {
        const auto& child = dynamic_cast<const Text&>(node);
        // child.value() <- the actual text to highlight
    }
}
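
Because a syntax token can itself contain a list of tokens, a renderer usually has to walk the result recursively. The following is a minimal sketch of that walk, not libprisma's own code: Node, Text, Syntax, Span and flatten are hypothetical stand-ins defined here so the example is self-contained, and only the isSyntax() check mirrors the snippet above. It flattens a token tree into (type, text) pairs, tagging each piece of text with the type of its innermost enclosing syntax token.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the library's token classes (assumption:
// these are NOT libprisma's real definitions, only the same general shape).
struct Node {
    virtual ~Node() = default;
    virtual bool isSyntax() const = 0;
};

struct Text : Node {
    explicit Text(std::string v) : value(std::move(v)) {}
    bool isSyntax() const override { return false; }
    std::string value;  // the actual text to highlight
};

struct Syntax : Node {
    explicit Syntax(std::string t) : type(std::move(t)) {}
    bool isSyntax() const override { return true; }
    std::string type;                             // e.g. "keyword"
    std::vector<std::unique_ptr<Node>> children;  // nested tokens
};

// One run of text plus the type of the innermost syntax token around it.
struct Span {
    std::string type;  // empty for text outside any syntax token
    std::string text;
};

// Depth-first walk: syntax nodes recurse into their children,
// text nodes are emitted with the type of their enclosing syntax node.
void flatten(const std::vector<std::unique_ptr<Node>>& nodes,
             const std::string& enclosingType,
             std::vector<Span>& out)
{
    for (const auto& node : nodes) {
        if (node->isSyntax()) {
            const auto& syntax = static_cast<const Syntax&>(*node);
            flatten(syntax.children, syntax.type, out);
        } else {
            const auto& text = static_cast<const Text&>(*node);
            out.push_back({enclosingType, text.value});
        }
    }
}
```

Calling flatten(tokens, "", spans) on a top-level list fills spans in source order, so a renderer can map each Span::type to a color and append Span::text. With the real library, the same recursion would go through child.begin() and child.end() instead of the children vector used in this sketch.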

How to update

As mentioned, the grammars dictionary is generated from the prism.js source code. Currently, this is done manually through Prism's test drive page: select all the languages, open the browser console, and paste in both isEqual.js and generate.js. After a few seconds, the browser downloads the file grammars.dat.

TODO: it would be great to automate this step, or at least to make the script auto-execute rather than require all the user input.

License: MIT License


Languages: JavaScript 71.0%, C++ 29.0%