erkobridee / ds-ast

DS ( Data Structure ) AST ( Abstract Syntax Tree )

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

ds-ast

DS ( Data Structure ) AST (Abstract Syntax Tree)

Implementations

Definitions

Lexer

A lexer transforms a sequence of characters into a sequence of tokens.

A compiler lexer is a crucial component in the compilation process of a programming language. It is responsible for breaking down the source code into smaller, meaningful units called tokens or lexemes. These tokens are then fed into the parser, which constructs the abstract syntax tree (AST) of the program.

enum TokenType {
  ...
}

interface TokenPosition {
  /** line */
  ln: number;
  /** column */
  col: number;
}

interface Token {
  type: TokenType;
  lexeme: string;

  /** it's useful for debugging proprose later one for a given programming language */
  position?: TokenPosition;
}

type LexerFn = (input: string) => Token[];

Key Features of a Compiler Lexer

  • Tokenization: The lexer converts the source code into tokens, which are the smallest syntactic units of the language. These tokens can be identifiers, keywords, literals, operators, or other special characters.

  • Regular Expressions: The lexer uses regular expressions to define the patterns for identifying these tokens. This approach allows for efficient and flexible token recognition.

  • State Transition Table: The lexer can be implemented using a state transition table, which is a table-driven approach that directly jumps to follow-up states via goto statements. This approach can produce faster lexers than hand-coded ones.

Parser

A parser is a software component that takes input data (typically text) and builds a data structure, often a parse tree or abstract syntax tree (AST), giving a structural representation of the input while checking for correct syntax. It is a crucial part of the compilation process, particularly in compiler design.

RDP - Recursive Descent Parser

AST

An Abstract Syntax Tree (AST) is a data structure used in computer science to represent the structure of a program or code snippet. It is a tree-like representation of the source code, abstracting away the syntax and semantics of the programming language. The AST is designed to preserve essential information such as variable types, the location of each declaration, the order of executable statements, left and right components of binary operations, and identifiers and their assigned values.

Tools

References

Theory

The Language of languages

BNF - Backus-Naur Form

EBNF - Extended Backus-Naur Form

Compiler

Lexer

Parser

AST

Implementation

ANTLR - ANother Tool for Language Recognition

About

DS ( Data Structure ) AST ( Abstract Syntax Tree )

License:MIT License


Languages

Language:TypeScript 100.0%