Lexical-Analyzer in C++
Introduction
Lexical analysis is the first phase of a compiler. Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis. (Tools such as Lex let you specify a lexer with a pattern-action language.)
In this project, the lexical analyzer reads the source program from the input.txt file and writes all the tokens it finds to an output file.
Tokens, Patterns & Lexemes
When talking about lexical analysis, we use the terms "token," "pattern," and "lexeme" with specific meanings:
- A token is a sequence of characters that can be treated as a unit/single logical entity.
- A pattern is the set of rules describing how the lexemes of a token are formed from the input characters.
- A lexeme is the sequence of characters in the source program matched by the pattern for a token.
For example, in the Pascal statement const pi = 3.1416; the substring pi is a lexeme for the token "identifier".
Examples of Tokens
- Keywords: int, cin, cout
- Identifiers: sum, A, B
- Operators: +, =, >>, <<
- Special symbols: ; , ( )
Task of Lexical Analyzer
- Generating a sequence of tokens.
- Stripping out comments and whitespace.
- Making a copy of the source program with error messages marked in it.
Assumptions
The assumptions I have made while writing the code for the lexical analyzer in C++ are:
- Keywords: int, cin, cout
- Special symbols: ; , { } ( )
- Operators: + = >> <<
- Identifiers: a single character, or a sequence of letters followed by letters or digits, like sum, A, B, C
- Pre-processor directive: include
- Library: iostream
Note: This is a lexical analyzer for a C++ program that sums two numbers. It may not be able to generate tokens for all other programs.