rekow / lexis

a very simple lexer

# lexis

A very simple lexer.

## Installation

### node

Add lexis to the dependencies in your package.json:

  ...
  "dependencies": {
    "lexis": "~0.0.4",
    ...
  },
  ...

or install directly:

npm install --save lexis

then just import the lexis module:

var lexis = require('lexis');

### browser

Include the minified JavaScript file in your project. The library is exported globally as window.lexis and is also defined as an AMD module, so it can be loaded with RequireJS:

// using the globally exported library
var lexis = window.lexis;

// using requireJS
define('myModule', ['lexis'], function (lexis) { ... });

## Usage

### instantiating

var lexer = lexis({
  'LPAREN': '(',
  'RPAREN': ')',
  'ALPHANUM': 'RE::[a-zA-Z0-9]+'
});

Call lexis() to get a new Lexer instance. lexis() optionally accepts either a Lexicon instance or a map of rule definitions from which to build one. If no lexicon is provided at construction time, call lexer.setLexicon() with one before lexing.

### defining rules

Rules are how a Lexer matches input to a valid Token, and the full set of rules a Lexer can match against forms its Lexicon. Rules are declared as a map from Token types to matcher strings:

var rules = {
  'LPAREN': '(',
  'RPAREN': ')',
  'ALPHANUM': 'RE::[a-zA-Z0-9]+'
};

There are two kinds of matcher strings. The first is a plain string match: in the example above, ( is processed as a Token of type LPAREN. String matchers are not limited to single characters. With the rules below, a lone ( matches LPAREN, while (( is processed as a single DBL_LPAREN rather than two LPARENs:

var rules = {
  'LPAREN': '(',
  'DBL_LPAREN': '(('
};
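The behaviour described above, where (( becomes one DBL_LPAREN rather than two LPARENs, amounts to picking the longest matching string rule at each position. The sketch below illustrates that idea under this assumption; longestStringMatch and tokenizeStrings are hypothetical helpers written for this explanation, not functions exported by lexis, and lexis's actual tie-breaking and internals may differ.

```javascript
// Hypothetical helper, not part of lexis: among plain-string rules, pick the
// rule whose literal is the longest one found in `input` at position `pos`.
function longestStringMatch(rules, input, pos) {
  let best = null;
  for (const type of Object.keys(rules)) {
    const literal = rules[type];
    // startsWith(search, position) tests for the literal at `pos`
    if (input.startsWith(literal, pos) &&
        (best === null || literal.length > best.text.length)) {
      best = { type: type, text: literal };
    }
  }
  return best; // null when no rule matches at `pos`
}

// Hypothetical helper: split a whole string using only string rules.
// Tokens here are { type, text } objects (an assumed shape).
function tokenizeStrings(rules, input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    const match = longestStringMatch(rules, input, pos);
    if (match === null) {
      throw new Error('No rule matches at position ' + pos);
    }
    tokens.push(match);
    pos += match.text.length; // consume the matched literal
  }
  return tokens;
}

const rules = { 'LPAREN': '(', 'DBL_LPAREN': '((' };
console.log(tokenizeStrings(rules, '((('));
```

With these rules, the input ((( yields a DBL_LPAREN token followed by an LPAREN token: at position 0 both literals match and the longer one wins, and at position 2 only ( is left.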

The second kind of matcher string is a RegExp matcher: the special prefix RE:: followed by a regular expression source string. If the pattern uses a backslash escape (such as \b), escape the backslash in the matcher string ('RE::\\b') so the final RegExp is constructed correctly. In a JavaScript string literal, an unescaped '\b' is a backspace character, not a word boundary:

var rules = {
  'ALPHANUM': 'RE::[a-zA-Z0-9]+',
  'BOUNDARY': 'RE::\\b'
};
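To show why the extra backslash matters, the sketch below turns a matcher string into a function that reports the length of a match at a given position. It is a minimal illustration, not lexis's implementation: compileMatcher is a hypothetical helper, and anchoring the RegExp with the sticky y flag is an assumption about how a lexer might apply RE:: patterns at the current input position.

```javascript
// Hypothetical helper, not part of lexis: compile a matcher string into a
// function returning the match length at `pos`, or -1 when nothing matches.
const RE_PREFIX = 'RE::';

function compileMatcher(spec) {
  if (spec.startsWith(RE_PREFIX)) {
    // The sticky 'y' flag only matches exactly at re.lastIndex
    const re = new RegExp(spec.slice(RE_PREFIX.length), 'y');
    return function (input, pos) {
      re.lastIndex = pos;
      const m = re.exec(input);
      return m ? m[0].length : -1; // zero-width matches return 0
    };
  }
  // Plain string matcher
  return function (input, pos) {
    return input.startsWith(spec, pos) ? spec.length : -1;
  };
}

// 'RE::\\b' in source code contains the two characters \ and b,
// so the RegExp is a word boundary
const boundary = compileMatcher('RE::\\b');
// 'RE::\b' contains a single backspace character (U+0008),
// so this RegExp looks for a literal backspace instead
const backspace = compileMatcher('RE::\b');
const alphanum = compileMatcher('RE::[a-zA-Z0-9]+');

console.log(alphanum('(abc1234)', 1));  // 7
console.log(boundary('(abc1234)', 1));  // 0 (zero-width match)
console.log(backspace('(abc1234)', 1)); // -1
```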

### lexing

var futureTokenList = lexer.lex('(abc1234)');

Calling lex() on a Lexer with a source string returns a Future (a promise) for the list of tokens. If no source is passed, lexer.setSource() must have been called with a string beforehand, or an error will occur.

The Future is resolved asynchronously once lexing has completed. To access the results, call the Future's then() method with a callback; it receives the token list, or a falsy value and an error if lexing failed:

futureTokenList.then(function (tokens, error) {
  if (!tokens && error) {
    // falsy tokens means a lexing error, which will be passed through as error
    return;
  }

  // tokens here is an array of Tokens, do whatever you'd like with them
  ...
});
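If you would rather use native promises or async/await, the (tokens, error) callback convention above can be wrapped. The sketch assumes only what this section shows, namely that then() calls its callback with either a token list, or a falsy value plus an error. toPromise is a hypothetical adapter, not part of lexis, and fakeFuture is a stand-in for the Future returned by lexer.lex().

```javascript
// Hypothetical adapter, not part of lexis: convert a Future whose then()
// callback receives (value, error) into a native Promise.
function toPromise(future) {
  return new Promise(function (resolve, reject) {
    future.then(function (value, error) {
      if (!value && error) {
        reject(error); // falsy value plus error: lexing failed
      } else {
        resolve(value);
      }
    });
  });
}

// Stand-in for a lexis Future: invokes the callback asynchronously.
function fakeFuture(value, error) {
  return {
    then: function (callback) {
      setTimeout(function () { callback(value, error); }, 0);
    }
  };
}

async function main() {
  const tokens = await toPromise(fakeFuture([{ type: 'LPAREN' }], null));
  console.log(tokens.length); // 1
  try {
    await toPromise(fakeFuture(null, new Error('unexpected input')));
  } catch (err) {
    console.log(err.message); // unexpected input
  }
}

main();
```

With a real lexer, the same adapter would be called as toPromise(lexer.lex(source)), provided its Future follows the callback convention shown above.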

### control flow

Future.then() returns a new Future that waits on the previous one, which gives a simple but powerful control-flow mechanism. Return a value from the callback to fulfill the new Future with it, rethrow an error to pass it to the next step, or return another Future to pause the chain until that Future resolves:

var lexTask = lexer.lex('(abc1234)')
  .then(function (tokens, error) {
    if (!tokens && error) {
      // Rethrow runtime errors to pipe them to the next step
      throw error;
    }

    // Do something with the tokens - we'll filter out whitespace
    return tokens.filter(function (token) {
      return token.type !== 'WHITESPACE';
    });
  })
  .then(function (filteredTokens, error) {
    if (!filteredTokens && error) {
      // For custom error handling use significant return values
      // rather than rethrowing
      return -1;
    }

    // Create a Future and fulfill it with the tokens after 1000ms
    var result = new Future();
    setTimeout(function () { result.fulfill(filteredTokens); }, 1000);

    // If a Future is returned from a callback, the current
    // execution flow is paused until that Future is resolved
    return result;
  })
  .then(function (filteredTokens, error) {
    // This will execute 1000ms after the previous step returned
    if (!filteredTokens && error) {
      // If error is passed, a runtime error was thrown by the
      // previous step itself: that step reports lexing errors by
      // returning -1 instead of rethrowing them
      throw error;
    }

    if (filteredTokens === -1) {
      // Handle the significant return value from the previous
      // control step as an error
      throw new Error('An unknown error occurred.');
    }

    // continue to do something with the tokens...
  });
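To make the chaining rules above concrete, here is a minimal Future sketch that reproduces the behaviour the example relies on: then() returns a new Future, a plain return value fulfills it, a thrown error reaches the next callback as its second argument, and a returned Future is waited on. It is an illustrative stand-in, not lexis's own Future. The class name MiniFuture and its reject() method are assumptions; only fulfill() and then() appear in the example above.

```javascript
// Illustrative stand-in, NOT lexis's Future. Callbacks receive (value, error).
class MiniFuture {
  constructor() {
    this.settled = false; // true once fulfilled or rejected
    this.failed = false;
    this.value = undefined;
    this.error = undefined;
    this.pending = []; // callbacks registered before settlement
  }

  fulfill(value) {
    this._settle(value, undefined, false);
  }

  // Hypothetical failure method; the name is an assumption
  reject(error) {
    this._settle(undefined, error, true);
  }

  _settle(value, error, failed) {
    if (this.settled) return; // a Future settles at most once
    this.settled = true;
    this.failed = failed;
    this.value = value;
    this.error = error;
    const pending = this.pending;
    this.pending = [];
    pending.forEach((run) => setTimeout(run, 0)); // always asynchronous
  }

  then(callback) {
    const next = new MiniFuture();
    const run = () => {
      let result;
      try {
        result = callback(this.value, this.error);
      } catch (err) {
        next.reject(err); // a thrown error becomes the next step's `error`
        return;
      }
      if (result instanceof MiniFuture) {
        // A returned Future pauses the chain until it settles
        result.then((value, error) => {
          if (result.failed) next.reject(error);
          else next.fulfill(value);
        });
      } else {
        next.fulfill(result); // a plain return value fulfills the next step
      }
    };
    if (this.settled) setTimeout(run, 0);
    else this.pending.push(run);
    return next;
  }
}

// Usage mirroring the chain above, with hand-made tokens
const source = new MiniFuture();
source
  .then((tokens, error) => {
    if (!tokens && error) throw error;
    return tokens.filter((token) => token.type !== 'WHITESPACE');
  })
  .then((filtered) => {
    const delayed = new MiniFuture();
    setTimeout(() => delayed.fulfill(filtered.length), 10);
    return delayed; // the next step waits for this Future
  })
  .then((count, error) => {
    // prints "tokens kept: 2"
    console.log(error ? 'failed: ' + error.message : 'tokens kept: ' + count);
  });

source.fulfill([{ type: 'LPAREN' }, { type: 'WHITESPACE' }, { type: 'RPAREN' }]);
```

Dispatching every callback through setTimeout keeps resolution asynchronous even when then() is called on an already-settled Future, matching the asynchronous behaviour described for lex().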

## API

Coming soon.

License: MIT License


Languages

Language: JavaScript 100.0%