antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

Home Page: http://antlr.org

[TorqueScript: C++] Old scripting language conversion

marauder2k7 opened this issue

I am currently converting TorqueScript to use ANTLR instead of Bison/Flex.

Almost everything is working well so far.

The original scripting language's scanner had built-in functions that return a specific token type based on certain criteria, and I am having trouble figuring out how to replicate this, because the different token types are consumed by different parser rules.

I have this ID token:

ID : LETTER IDTAIL*;

In the original Flex scanner, the ID pattern's action calls a helper function:

{ID}        { return Sc_ScanIdent(); }

static int Sc_ScanIdent()
{
   ConsoleBaseType *type;

   CMDtext[CMDleng] = 0;

   if((type = ConsoleBaseType::getTypeByName(CMDtext)) != NULL)
   {
      /* It's a type */
      CMDlval.i = MakeToken< int >( type->getTypeID(), lineIndex );
      return TYPEIDENT;
   }

   /* It's an identifier */
   CMDlval.s = MakeToken< StringTableEntry >( StringTable->insert(CMDtext), lineIndex );
   return IDENT;
}

IDENT and TYPEIDENT are matched by different parser rules, so I was curious whether this behavior can be replicated in ANTLR.

Target: C++

This isn't handled well in ANTLR because the lexer runs independently of the parser. This is why I've been advocating parser-state-aware lexers for ANTLR 5, and this is an example of where one would be useful.

The alternative is to parse the input twice: once to collect definitions in a symbol table, and a second time to consult that symbol table during the parse.

No worries, thanks for your reply. This is something we really need, since we add a lot of tokens at runtime. I will keep a watchful eye on ANTLR 5, though.