ruff check --fix modifies files with SyntaxError

Question

Dobatymo opened this issue 22 days ago

When running ruff format, Ruff will not modify a file if it encounters a SyntaxError (which I think is expected). However, ruff check --fix encounters the same SyntaxError but still goes on to "fix" and modify the file. I don't know if this is expected, but I find the behaviour surprising.

List of keywords you searched for before creating this issue: fix, syntaxerror

Code snippet:

typedef struct _STORAGE_DEVICE_NUMBER_EX {
    ("PartitionNumber", ULONG),
} STORAGE_DEVICE_NUMBER_EX, *PSTORAGE_DEVICE_NUMBER_EX;

ruff check --fix will remove the ; at the end of the last line (which I didn't expect, since the file is not actually Python).

Version: ruff 0.4.4 (3e8878a 2024-05-09)

Dhruv Manilawala · Answer 1 · Fri May 17 2024 16:11:24 GMT+0800 (China Standard Time)

Can you provide some more details? How are you running Ruff? Is it from the command-line or is it via an editor? If it is in an editor context, could it be that it's another extension which is removing the semicolon?

Dobatymo · Answer 2 · Fri May 17 2024 16:31:08 GMT+0800 (China Standard Time)

Just using the normal command line. Windows 11. I made sure it must be ruff which modifies the file.

error: Failed to parse syntaxerror.py:1:9: Simple statements must be separated by newlines or semicolons
syntaxerror.py:1:9: E999 SyntaxError: Simple statements must be separated by newlines or semicolons
Found 2 errors (1 fixed, 1 remaining).

That's the output.

Dhruv Manilawala · Answer 3 · Fri May 17 2024 20:19:31 GMT+0800 (China Standard Time)

Ruff uses the file extension to determine whether it's a Python file or not as otherwise it's difficult to reliably determine whether a file contains Python source code or not. It seems that the file name is syntaxerror.py which has the .py extension.

Can you give us some context on why you have non-Python source code in a Python file?

Dobatymo · Answer 4 · Fri May 17 2024 21:56:01 GMT+0800 (China Standard Time)

I am working on a simple C header to Python ctypes translator, and I was using ruff format to check whether the files I generated are syntactically correct. I know I could have just used the Python built-in parser, but getting a nicely formatted file whenever the translation produced syntactically valid output was a nice side effect.
Then, to investigate further whether the generated files are correct, I ran ruff check --fix instead and was surprised to see that the file was modified even though it wasn't syntactically valid (something ruff format didn't do). So if this is expected behaviour it should be documented, but in my opinion it should not make any modifications (just like ruff format).
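The "Python built-in parser" check mentioned above could look roughly like the sketch below. It is only an illustration, not something from the thread: the helper names `is_valid_python` and `fix_if_valid` are made up, the `ruff` executable is assumed to be on PATH, and `ast.parse` follows the grammar of the interpreter running the script, which may differ from the Python version Ruff targets.

```python
import ast
import subprocess


def is_valid_python(source: str) -> bool:
    """Return True if the built-in parser accepts `source`."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):  # ValueError: null bytes on older Pythons
        return False
    return True


def fix_if_valid(path: str) -> bool:
    """Run `ruff check --fix` on `path` only if the file parses.

    Returns True when Ruff was invoked, False when the file was skipped.
    """
    with open(path, encoding="utf-8") as handle:
        source = handle.read()
    if not is_valid_python(source):
        return False
    # Assumption: `ruff` is installed and available on PATH.
    subprocess.run(["ruff", "check", "--fix", path], check=False)
    return True
```

With a guard like this, a generated file that is not valid Python is left untouched, which matches what ruff format already does.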

Zanie Blue · Answer 5 · Sat May 18 2024 01:56:43 GMT+0800 (China Standard Time)

I think we'll move more towards Ruff working on files with syntax errors. I'd recommend either excluding these files from lint checks or selecting E999 to only check for syntax errors.

Charlie Marsh · Answer 6 · Sun May 19 2024 01:30:40 GMT+0800 (China Standard Time)

I honestly thought we didn't apply fixes when a file contains a syntax error (but looks like I'm wrong). I would be open to changing the behavior, not sure what @dhruvmanila thinks.

Dhruv Manilawala · Answer 7 · Mon May 20 2024 13:24:09 GMT+0800 (China Standard Time)

I think this is a side effect of the non-AST-based rules, which work with tokens, logical lines, etc., and of the fact that Ruff checks all of the tokens up to the first error token. In your case, the code produces a valid token stream, so Ruff checks for the violations that are detected from the token stream. Ruff doesn't know that the code is syntactically invalid until it has been processed by the parser.
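The distinction between a valid token stream and valid syntax can be reproduced with Python's own standard library. This is only an analogy: the `tokenize` module stands in for Ruff's lexer and `ast.parse` for its parser, and Ruff's actual implementation is separate from both. The helper names `tokenizes` and `parses` are hypothetical.

```python
import ast
import io
import tokenize

# The snippet from the original report (C-like, not Python).
SNIPPET = (
    "typedef struct _STORAGE_DEVICE_NUMBER_EX {\n"
    '    ("PartitionNumber", ULONG),\n'
    "} STORAGE_DEVICE_NUMBER_EX, *PSTORAGE_DEVICE_NUMBER_EX;\n"
)


def tokenizes(source: str) -> bool:
    """Return True if the source lexes into tokens without any error."""
    try:
        for token in tokenize.generate_tokens(io.StringIO(source).readline):
            # Older Python versions emit ERRORTOKEN instead of raising.
            if token.type == tokenize.ERRORTOKEN:
                return False
    except (tokenize.TokenError, SyntaxError):
        return False
    return True


def parses(source: str) -> bool:
    """Return True if the source is accepted by the parser."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


print("tokenizes:", tokenizes(SNIPPET))
print("parses:   ", parses(SNIPPET))
```

Every piece of the snippet (names, brackets, a string, commas, `*`, `;`) is a legal Python token, so a token-based rule has something to inspect even though the parser rejects the file.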

> Then for further investigation if the generated files are correct, I can ruff check --fix instead and was surprised to see that the file was modified even when it wasn't syntactically valid (something ruff format didn't do).

ruff format doesn't do this because it works with the AST, and the code you mentioned is syntactically invalid, so no AST can be built for it.

> I honestly thought we didn't apply fixes when a file contains a syntax error (but looks like I'm wrong).

This isn't exactly possible today because Ruff can't know whether the token stream contains a syntax error until the parser has processed it, and the code provided by the author produces a valid token stream. This will change this week, when we combine the lexing and parsing steps; Ruff will then have this knowledge.

I'm not opposed to this idea although this does mean that we'd stop fixing code like the following:

x;

# unterminated f-string
f"hello

In the future, we do want to make Ruff capable of applying a fix even if the source code contains a syntax error. This will be done after we expand Ruff's capabilities to allow diagnosing issues in a file containing syntax errors.

tl;dr I'm more in favor of documenting this behavior than of disabling fix generation.

Dobatymo · Answer 8 · Mon May 20 2024 15:11:33 GMT+0800 (China Standard Time)

Better documentation sounds good!

Dhruv Manilawala · Answer 9 · Mon May 20 2024 15:30:38 GMT+0800 (China Standard Time)

This seems like a good place for the content: https://docs.astral.sh/ruff/linter/#fixes