lunacookies / eldiro

Learn to make your own programming language with Rust

Home Page: https://lunacookies.github.io/lang/

Lexing comments

Nexalam opened this issue · comments

Hi,
nice project and tutorial series!

I have a question regarding the 14th part of your tutorial: is there any future benefit to having an explicit comment token? Leaving doc comments etc. aside, it would be easier to filter out comments with a simple logos::skip, like this:

pub(crate) enum SyntaxKind {
// snip

    #[error]
    #[regex("#.*", logos::skip)] // added here because Error is some kind of fallback token
    Error,

// snip
}

That would mean less code that is easier to understand, possibly faster lexing, and a simpler parse tree. The only drawback is that it would no longer be possible to parse the content of comments in the future.

Yes, there is! Ideally Eldiro would eventually get tooling built for it, e.g. a language server. Any feature that has to interact with the source file directly (e.g. automatic refactorings or ‘expand selection’) needs to have information about every single character in the source. To accomplish this, the parser has to be lossless, meaning that its output represents the input fully. Although omitting whitespace, comments and other trivia from the parser output would make it more concise and easier to work with, it makes implementing those features that need to interact with the source text much more difficult.
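To make "lossless" concrete, here is a minimal sketch of a lexer that keeps whitespace and comments as tokens. It is hand-written rather than Logos-based so that it has no dependencies, and the names Kind, Token, lex and prefix_len are hypothetical, chosen for illustration only; they are not Eldiro's actual SyntaxKind or lexer.

```rust
// Hypothetical token kinds for illustration; not Eldiro's actual SyntaxKind.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Whitespace,
    Comment,
    Word,
    Symbol,
}

#[derive(Debug, PartialEq)]
struct Token<'a> {
    kind: Kind,
    text: &'a str,
}

// Splits the input into tokens without discarding anything:
// whitespace and `#` comments become tokens of their own.
fn lex(input: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    let mut rest = input;

    while let Some(first) = rest.chars().next() {
        let (kind, len) = if first.is_whitespace() {
            (Kind::Whitespace, prefix_len(rest, |c| c.is_whitespace()))
        } else if first == '#' {
            // A comment runs up to, but not including, the end of the line.
            (Kind::Comment, prefix_len(rest, |c| c != '\n'))
        } else if first.is_alphanumeric() {
            (Kind::Word, prefix_len(rest, |c| c.is_alphanumeric()))
        } else {
            // Any other character is treated as a one-character symbol.
            (Kind::Symbol, first.len_utf8())
        };

        let (text, remaining) = rest.split_at(len);
        tokens.push(Token { kind, text });
        rest = remaining;
    }

    tokens
}

// Byte length of the longest prefix whose characters all satisfy `pred`.
fn prefix_len(s: &str, pred: impl Fn(char) -> bool) -> usize {
    s.char_indices()
        .find(|&(_, c)| !pred(c))
        .map_or(s.len(), |(i, _)| i)
}
```

Because no character is skipped, concatenating the text of every token gives back the original source exactly. That round-trip property is what tooling such as a refactoring or "expand selection" would rely on; a lexer that skips comments cannot offer it.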

Someone else has also asked this question on Reddit, so I think I’ll add an explanation to the post.

I’ve added an explanation to the website.

Can I close this issue?

Sorry for the late answer. You can close this issue, question answered :)
Thank you very much!

Glad I could help :)