z7zmey / php-parser

PHP parser written in Go

Home Page:https://php-parser.com

Wrong line number for multiline strings with variables

quasilyte opened this issue

I believe there is a regression introduced in 2c64915 e4a208e (#93) that is still present on the master branch.

Problematic code example:

$foo = "test
	$var";

That 2-line literal is scanned as a 3-line literal: it gets StartLine=1 and EndLine=3, while the expected EndLine is 2.

I believe this happens because we emit newlines twice:

  1. Once for the constant_string_new_line actions.
  2. And another time for newline.

We probably want to avoid adding the redundant newline in one of these two places.

I tried to come up with a good solution today, but my knowledge of Ragel is limited.

Full reproducer (in the form of a test):
// add this test to node/scalar/t_encapsed_test.go and run it.
func TestEncapsedMultiline(t *testing.T) {
	src := `<? "test
	$var";`

	expected := &node.Root{
		Position: &position.Position{
			StartLine: 1,
			EndLine:   2,
			StartPos:  3,
			EndPos:    16,
		},
		Stmts: []node.Node{
			&stmt.Expression{
				Position: &position.Position{
					StartLine: 1,
					EndLine:   2,
					StartPos:  3,
					EndPos:    16,
				},
				Expr: &scalar.Encapsed{
					Position: &position.Position{
						StartLine: 1,
						EndLine:   2,
						StartPos:  3,
						EndPos:    15,
					},
					Parts: []node.Node{
						&scalar.EncapsedStringPart{
							Position: &position.Position{
								StartLine: 1,
								EndLine:   2,
								StartPos:  4,
								EndPos:    10,
							},
							Value: "test\n\t",
						},
						&expr.Variable{
							Position: &position.Position{
								StartLine: 2,
								EndLine:   2,
								StartPos:  10,
								EndPos:    14,
							},
							VarName: &node.Identifier{
								Position: &position.Position{
									StartLine: 2,
									EndLine:   2,
									StartPos:  10,
									EndPos:    14,
								},
								Value: "var",
							},
						},
					},
				},
			},
		},
	}

	php7parser := php7.NewParser([]byte(src), "7.4")
	php7parser.Parse()
	actual := php7parser.GetRootNode()
	assert.DeepEqual(t, expected, actual)

	php5parser := php5.NewParser([]byte(src), "5.6")
	php5parser.Parse()
	actual = php5parser.GetRootNode()
	assert.DeepEqual(t, expected, actual)
}

It seems this problem is described in the response to "Maintaining char & line counts in a scanner":

The only worry is backtracking. If your
scanner patterns backtrack over newlines, then you've got double
counting happening.