z7zmey / php-parser

PHP parser written in Go

Home Page: https://php-parser.com

scanner doesn't support \x80-\xff identifiers

ganlvtech opened this issue.

On the dev branch, the scanner's .rl file defines:

varname = /[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/;

The .rl file includes the \x7f-\xff range, yet the corresponding state in the generated scanner only checks ASCII bytes:

php-parser/scanner/scanner.go

Lines 4932 to 4952 in a7e37ad

st128:
	if (lex.p)++; (lex.p) == (lex.pe) {
		goto _test_eof128
	}
st_case_128:
	if lex.data[(lex.p)] == 95 {
		goto st128
	}
	switch {
	case lex.data[(lex.p)] < 65:
		if 48 <= lex.data[(lex.p)] && lex.data[(lex.p)] <= 57 {
			goto st128
		}
	case lex.data[(lex.p)] > 90:
		if 97 <= lex.data[(lex.p)] && lex.data[(lex.p)] <= 122 {
			goto st128
		}
	default:
		goto st128
	}
	goto tr260
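To make the gap concrete, here is a hand translation of the st_case_128 checks into a plain byte predicate. The name acceptsInTail is hypothetical, and this is only a sketch of what the generated code does, not code from the repository. Running it shows that every byte from 0x7f to 0xff falls through (to tr260 in the generated code) instead of continuing the identifier.

```go
package main

import "fmt"

// acceptsInTail mirrors the byte checks of the generated st_case_128 state.
// Hypothetical hand translation for illustration only: it reports whether
// byte b keeps the scanner inside the identifier.
func acceptsInTail(b byte) bool {
	if b == 95 { // '_'
		return true
	}
	switch {
	case b < 65: // below 'A': only '0'..'9' continue the identifier
		return 48 <= b && b <= 57
	case b > 90: // above 'Z': only 'a'..'z' continue; 0x7f..0xff are rejected
		return 97 <= b && b <= 122
	default: // 'A'..'Z'
		return true
	}
}

func main() {
	for _, b := range []byte{'a', 'Z', '7', '_', 0x7f, 0x80, 0xff} {
		fmt.Printf("0x%02x accepted: %v\n", b, acceptsInTail(b))
	}
}
```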

There is no branch for the high bytes, such as:

if lex.data[(lex.p)] >= 127 {
	goto st128
}
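The same state can be sketched as an ordinary loop with the suggested high-byte branch added. scanIdentTail is a hypothetical name, and since scanner.go appears to be generated from the .rl file, a real fix would presumably go into the grammar and be regenerated rather than edited by hand. This sketch follows the suggestion above and accepts bytes >= 127.

```go
package main

import "fmt"

// scanIdentTail is a hypothetical loop equivalent of the st128 state with the
// missing branch added. Starting at p, it advances over bytes that may continue
// an identifier and returns the index of the first byte that cannot.
func scanIdentTail(data []byte, p int) int {
	for ; p < len(data); p++ {
		c := data[p]
		switch {
		case c == '_',
			'0' <= c && c <= '9',
			'A' <= c && c <= 'Z',
			'a' <= c && c <= 'z':
			// ASCII identifier bytes, as in the generated code.
		case c >= 127:
			// The suggested extra branch: high bytes continue the identifier.
		default:
			return p
		}
	}
	return p
}

func main() {
	src := []byte("$\x80ab=1;")
	end := scanIdentTail(src, 1) // start just after '$'
	fmt.Printf("identifier bytes: %q\n", src[1:end])
}
```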

By the way, according to the PHP manual (Variables: Basics), a valid variable name, expressed as a regular expression, is ^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$. That suggests the range should start at \x80, not \x7f as in the .rl file.
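For comparison, the manual's rule can be checked byte by byte. In this sketch the helper names are hypothetical; it uses 0x80 as the lower bound, so it rejects 0x7f, which the .rl pattern would accept. Working on raw bytes rather than runes also means any UTF-8 encoded name passes, because every byte of a multi-byte UTF-8 sequence is at least 0x80.

```go
package main

import "fmt"

// isLabelStart reports whether b may begin a name: [a-zA-Z_\x80-\xff].
func isLabelStart(b byte) bool {
	return b == '_' || ('a' <= b && b <= 'z') || ('A' <= b && b <= 'Z') || b >= 0x80
}

// isLabelChar reports whether b may continue a name: [a-zA-Z0-9_\x80-\xff].
func isLabelChar(b byte) bool {
	return isLabelStart(b) || ('0' <= b && b <= '9')
}

// isValidVarName is a hypothetical helper implementing the full pattern
// ^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$ for a name without the leading '$'.
func isValidVarName(name []byte) bool {
	if len(name) == 0 || !isLabelStart(name[0]) {
		return false
	}
	for _, b := range name[1:] {
		if !isLabelChar(b) {
			return false
		}
	}
	return true
}

func main() {
	for _, s := range []string{"foo", "\x80", "café", "_x1", "1x", "\x7f"} {
		fmt.Printf("%q valid: %v\n", s, isValidVarName([]byte(s)))
	}
}
```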

Test script:

main.go

package main

import (
	"fmt"
	"os"

	"github.com/z7zmey/php-parser/php5"
	"github.com/z7zmey/php-parser/visitor"
)

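func main() {
	// $\x80 is a valid variable name under PHP's rules, so this should parse without errors.
	bytes := []byte("<?php $\x80=1;echo $\x80;")

	parser := php5.NewParser(bytes)
	parser.Parse()

	// Print any errors reported while scanning and parsing.
	for _, e := range parser.GetErrors() {
		fmt.Println(e, e.Pos)
	}

	// Dump the resulting AST to stdout.
	dumper := visitor.Dumper{
		Writer: os.Stdout,
		Indent: "",
	}

	rootNode := parser.GetRootNode()
	rootNode.Walk(&dumper)
}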

test.php

Thank you for the pull request. I have also added a test for this case.