
LexLuthor

Lexer in C language using DFA state machine (unlike the one used in gcc)

The current lexer.c is a hardcoded one. Another one, lex.c, has been made: it cleanly shows a lexer using a DFA in C. Hope this helps you.

The objective here is to implement a lexer using state machines (deterministic finite automata, to be precise). The lexical analysis must yield a listing of keywords and identifiers. We are allowed to have a hardcoded array of keywords. I am unable to find any tokens.h or keywords.h library containing such a built-in list, so I'll have to list the keywords of each language to analyse by hand. This is tedious and obviously not the preferred way of doing this sort of thing. The tokenizer has to be built on the logic that a keyword or identifier is a letter followed by more letters or digits, and so on; a minimal sketch of that idea is below.
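Here is a rough sketch of that logic, assuming a hypothetical hardcoded `keywords[]` array and `is_keyword()` helper (these names are mine, not the ones in lexer.c): a two-state DFA that enters an "in word" state on a letter or underscore, stays there on letters and digits, and classifies the finished lexeme as a keyword or identifier when anything else is seen.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical hardcoded keyword list -- illustration only, not the one in lexer.c */
static const char *keywords[] = { "int", "if", "else", "while", "for", "return" };

static int is_keyword(const char *lexeme) {
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(lexeme, keywords[i]) == 0)
            return 1;
    return 0;
}

/* Two-state DFA: START, and IN_WORD once a letter or underscore is seen. */
void scan_words(const char *src) {
    enum { START, IN_WORD } state = START;
    char lexeme[64];
    int len = 0;

    for (const char *p = src; ; p++) {
        char c = *p;
        if (state == START) {
            if (isalpha((unsigned char)c) || c == '_') {    /* first char must be a letter */
                lexeme[len++] = c;
                state = IN_WORD;
            }
        } else {
            if (isalnum((unsigned char)c) || c == '_') {    /* letters/digits keep the word going */
                if (len < 63) lexeme[len++] = c;
            } else {                                        /* word ended: classify it */
                lexeme[len] = '\0';
                printf("%-10s %s\n", is_keyword(lexeme) ? "keyword" : "identifier", lexeme);
                len = 0;
                state = START;
            }
        }
        if (c == '\0') break;
    }
}
```

For example, `scan_words("int count = x1 + 2;")` would print `int` as a keyword and `count` and `x1` as identifiers.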

>> Lots left to do. Gah! <<

(25/09) One idea I have is to provide some visualizations of the number of tokens identified, contrasting the word classifications of the contents of each sample.txt input (a rough sketch is below). We obviously need to work on actual file handling first.
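Purely as a sketch of what that visualization could look like (the token classes and counters here are made up for illustration, not taken from lexer.c), the lexer could tally tokens per class while scanning and print one crude text bar per class:

```c
#include <stdio.h>

/* Hypothetical token classes -- illustration only */
enum { KEYWORD, IDENTIFIER, NUMBER, OPERATOR, NCLASSES };
static const char *class_names[NCLASSES] = { "keywords", "identifiers", "numbers", "operators" };

/* Print one line per class: its count and a bar of '#' characters. */
void print_token_histogram(const int counts[NCLASSES]) {
    for (int c = 0; c < NCLASSES; c++) {
        printf("%-12s %3d  ", class_names[c], counts[c]);
        for (int i = 0; i < counts[c]; i++)
            putchar('#');
        putchar('\n');
    }
}
```

Running this with the counts gathered from two different sample.txt inputs would show the contrast at a glance.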

[tba wiki] Yesterday, my friend Reshma Roy gave a talk on Lexical Analysis of English, Vietnamese, and French. This has opened my eyes to the possibility that, beyond programming languages, a lexer has applications in everyday linguistic analysis. Perhaps, using word embeddings (with word2vec) and the like, we could expand LexLuthor into a very general lexer and word classifier. As of now, it only accepts C programs, mostly C99, and displays which words are keywords, numbers, identifiers, and relational/logical operators.

This has been a good one.

How to compile and run C programs

$ gcc lexer.c
$ ./a.out

Cannibalizing the program

$ cp lexer.c sample.txt
$ ./a.out

Running Lex programs

Lex programs have extension .l

$ lex prgm.l
$ gcc lex.yy.c
$ ./a.out

I don't know. This is what Hareesh sir wrote on the board. I haven't gotten to the lex language part yet.

I'm still struggling to construct the required FSM to identify tokens. So, the first step is to think about just the operators; say, relational operators like ==, !=, <, >, <=, >=. Maybe using a switch-case and some if-statements I could identify those, but I'm not sure how all the pieces are going to fit together (a guess at that piece is sketched below). So better to draw the diagram... I'm sleepy... ZZzzzz...
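Here is a guess at how just the relational-operator piece could look (the function, state handling, and token names are mine, not from lexer.c): each first character acts as its own DFA state, and one character of lookahead decides which accepting state you land in.

```c
#include <stdio.h>

/* Recognize a relational operator starting at src[*i], advance *i past it,
   and return its name; return NULL if no relational operator starts here.
   Sketch only: each case is effectively a DFA state, and the lookahead
   character picks the accepting state. */
const char *scan_relop(const char *src, int *i) {
    char c = src[*i];
    char next = c ? src[*i + 1] : '\0';   /* one character of lookahead */

    switch (c) {
    case '<':
        if (next == '=') { *i += 2; return "LE"; }   /* <= */
        *i += 1; return "LT";                        /* <  */
    case '>':
        if (next == '=') { *i += 2; return "GE"; }   /* >= */
        *i += 1; return "GT";                        /* >  */
    case '=':
        if (next == '=') { *i += 2; return "EQ"; }   /* == */
        return NULL;                                 /* a lone '=' is assignment, not relational */
    case '!':
        if (next == '=') { *i += 2; return "NE"; }   /* != */
        return NULL;                                 /* a lone '!' is logical not */
    default:
        return NULL;
    }
}
```

For example, with `src = "a<=b"` and `*i == 1`, the call returns "LE" and leaves `*i` pointing at `b`. The same switch-plus-lookahead shape extends to `&&`, `||`, and the arithmetic operators.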

Things I ought to refer to but haven't yet, because I am me and hello, nice to meet you. Do you like ring-a-ring-a-roses, pockets full of spiders?

Related Projects on Github

Do tell me if you find anything... a simple lexeme categorizer doesn't need ML, does it?

Credits

Myself, Rithu Augustine, Rinu Sibi Kurian, Noora Fathima, Rohit Venugopal, Bob Kane.


Watch, Star, Fork, Pull. Contribute to Free and Open Source.
Support FOSS, Linus Torvalds and Richard Stallman!
Fork this repo today!


License: MIT


Languages

C 95.2%, Lex 4.8%