z7zmey / php-parser

PHP parser written in Go

Home Page: https://php-parser.com

Some lexer errors are printed to stderr

quasilyte opened this issue

If the underlying lexer encounters an input error, it falls back to its default error handling function, defaultErrorf, which prints the error to stderr:

func (l *Lexer) defaultErrorf(pos token.Pos, msg string) {
	l.Error(fmt.Sprintf("%v: %v", l.File.Position(pos), msg))
}

// Error Implements yyLexer[2] by printing the msg to stderr.
func (l *Lexer) Error(msg string) {
	fmt.Fprintf(os.Stderr, "%s\n", msg)
}

On a large codebase, this sometimes produces the following message:

unicode (UTF-8) BOM in middle of file

The caller has no way to suppress or capture this output, and that is the problem.

I'm proposing a change (example below) that registers our own error handling function, which appends each lex error to the high-level lexer's Errors list. This way, errors can propagate and be handled without polluting stderr.

diff --git a/scanner/lexer.go b/scanner/lexer.go
index 52d47c7..d54b8bc 100644
--- a/scanner/lexer.go
+++ b/scanner/lexer.go
@@ -4,6 +4,7 @@ package scanner
 import (
        "bufio"
        "bytes"
+       "go/token"
        t "go/token"
        "io"
        "unicode"
@@ -62,23 +63,32 @@ func Rune2Class(r rune) int {
        return classOther
 }
 
+func (l *Lexer) lexErrorFunc(p token.Pos, msg string) {
+       pos := position.NewPosition(
+               l.File.Line(p),
+               l.File.Line(p),
+               int(p),
+               int(p),
+       )
+       l.Errors = append(l.Errors, errors.NewError(msg, pos))
+}
+
 // NewLexer the Lexer constructor
 func NewLexer(src io.Reader, fName string) *Lexer {
+       lexer := &Lexer{
+               StateStack:    []int{0},
+               tokenBytesBuf: &bytes.Buffer{},
+               TokenPool:     &TokenPool{},
+       }
+
        file := t.NewFileSet().AddFile(fName, -1, 1<<31-3)
-       lx, err := lex.New(file, bufio.NewReader(src), lex.RuneClass(Rune2Class))
+       lx, err := lex.New(file, bufio.NewReader(src), lex.RuneClass(Rune2Class), lex.ErrorFunc(lexer.lexErrorFunc))
        if err != nil {
                panic(err)
        }
+       lexer.Lexer = lx
 
-       return &Lexer{
-               Lexer:         lx,
-               StateStack:    []int{0},
-               PhpDocComment: "",
-               FreeFloating:  nil,
-               heredocLabel:  "",
-               tokenBytesBuf: &bytes.Buffer{},
-               TokenPool:     &TokenPool{},
-       }
+       return lexer
 }
 
 func (l *Lexer) Error(msg string) {

You're right, lexer errors should be saved into lexer.Errors.
Thank you for the proposal, I have used it as is.
I have also covered this case with a test.