hhvm / hhast

Mutable AST library for Hack with linting and code migrations

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

[ RFC ] What changes would you like to make to the hhast-lint.json format?

lexidor opened this issue · comments

I'd like to rethink the hhast-lint.json config settings, since there are some short comings.
Here is just a short list of properties that I'd like to have, in no particular order.
This is not an issue that is meant to be implemented; it is meant to start a discussion.

Updating HHAST should not cause more linters to be activated (causing failing travis builds).

When upgrading hhast versions, linters that have been written for that version are automatically turned on. This makes upgrading hhast (which you are sometimes required to do, because of hhvm AST changes) a more painful process than it should be.

There should be more granularity in the built-in linters than 'none' | 'default' | 'all'

Currently all, doesn't mean all linters.
The only way to know that you are missing these linters is by digging into hhast and finding little gems like NoEmptyStatementsLinter or FinalOrAbstractClassLinter.
They might have been left out of the all option (NON_DEFAULT_LINTERS in the source) by mistake.

The default option currently excludes 8 linters, but it is not clear what makes a linter non-default.

The none option is made for people who require a stable output from their linters and who do not want and upgrade to introduce more linters that their code will fail under.
These people will list out every single linter they care about, one by one, and they will not be made aware of new and exiting linters.

A hhast-lint.json file that is valid in version B should be valid for version A, even if linters that are declared in the hhast-lint.json file are not present.

Currently, when you name a linter that does not exist, you get this error:
Expected type 'Facebook\HHAST\BaseLinter', got type 'string'
This stops HHAST dead in its tracks, since it is an uncaught exception.
I'd like for HHAST to report that linter X could not be found and continue linting with the rest of the linters.

I would be nice to be able to refer to groups of linters, instead of individual linters.

I'd like to be able to define Lexidor's pedantic linter pack: 1.4, Fully autofixable linters: *, Code style linters: 0.8 and have that be recognized by hhast.

I would like to introduce a new hhast-lint.lock file.
This would allow us to:

  • Add new linters that would be enabled under the hhast-lint.json config, without having them pop-up upon upgrading hhast.
  • Break BC in the hhast-lint.json format, without bricking every travis job, since hhast-lint.lock would still work under the newer version of hhast.
  • Introduce a hhast-lint-update script, which will tell you which changes it is about to make.
commented

Updating HHAST should not cause more linters to be activated (causing failing travis builds).

I like this suggestion. Perhaps we could tag each new linter with the HHAST version in which it was added (or really any incremental number, since the HHAST version might not be known at the time a linter is added), then people could specify max linter version in their config. That way people could update HHAST itself without causing new lint errors, then later bump the linter version number if/when they're ready.

That said, it's probably not too hard to update HHAST + immediately add any new linters to disabledLinters until they're ready to deal with the errors.

There should be more granularity in the built-in linters than 'none' | 'default' | 'all'
I would be nice to be able to refer to groups of linters, instead of individual linters.

I'm not opposed to this but I can't imagine how we'd decide which linter goes in which group (like you said, right now it's unclear what should be default, if we have more groups it will become even more confusing).

I wonder if we should just have "all". I expect most people would have 1--2 linters they don't like but they are unlikely to be the same ones for everyone, so everyone'd just use disableLinters? (+ maybe the version number)

This discussion should be tied to (not merged with) #279 :)

Updating HHAST should not cause more linters to be activated (causing failing travis builds).

I strongly disagree with this for all; default, I need to think about some more - my initial stance was that we match HHVM versions, and this would hold us to a higher standard than HHVM and Hack, which doesn't feel appropriate.

One of the proposed alternatives to 'all' and 'default' could be better

it is not clear what makes a linter non-default.

Definitely; initially, it was "is something like this needed for FB projects", but it's drifted and pretty arbitrary now.

Currently all, doesn't mean all linters.

I believe this is currently always a bug; the exception I expect at some point is completely contradictory linters - e.g. "always use \ , even in XHP" and "Use : for XHP wherever possible"

There should be more granularity in the built-in linters than 'none' | 'default' | 'all'

Perhaps get rid of default, add all-x.yy ?

The none option is made for people who require a stable output from their linters

This is a valid use case for it, but FWIW, it's made for people who want to gradually start linting their codebase, moving to all or default over time.

I would be nice to be able to refer to groups of linters, instead of individual linters.

#13 :)

That said, now the AST's moving again, I'm not sure how viable third-party repositories of linters are, and this impacts the 'default' vs 'all' discussion:

if we strongly disagree with the intent of a linter, but know some people really want it, it's well-written, should we accept or reject the pull request? Should it be included in whatever our "recommended" set is, whether that's "default" or "all-x.yy" ? Practicality of third-party repositories does change this, but the need for new codegen can be a pain.

I'm going to add another one of these:

Should HHAST be split up in to hhast-lib,hhast-tools, and hhast-linters projects?

  • keep hhast-lib versioning matching HHVM versions
  • hhast-tools and hhast-linters could use SEMVER
  • hhast-tools contains the executables and linting framework, but no linters
  • changes that only affect the codegen - but not the majority of linters - would only affect hhast-lib
  • change the way config works so that hhast-linters don't have any special-casing - third-party linters are on truly equal footing
  • projects could depend on 'any hhast-lib that is compatible with my hhvm', hast-tools a.b, and hhast-linteres c.d

There's a risk of that being substantially more maintenance work, but I think that's my gut reaction - thinking through the last few months, I don't think that would actually be true in practice - @jjergus ?

We could continue publishing an 'hhast' package, that would just depend on all 3, but contain no actual code

More specifically:

  • hhvm/hhast-lib
  • hhvm/hhast-tools
  • facebook/hhast-linters

... and none of the version numbers - even majors - need to match.

In that world:

  • schema changes
    • new hhvm/hhast-lib release
  • we change what is codegened from the schema
    • new hhvm/hhast-lib release
    • probably not a new hhvm/hhast-tools release
    • maybe new releases of any linter packages, depending on what nodes/features they use.
  • we add a new default/all linter
    • new major release of facebook/hhast-linters
    • nothing else
  • we change what is acceptable in hhast-lint.json
    • new major release of hhvm/hhast-tools
    • nothing else

@xixixao also likely has some input here, and #279 more generally (and has several other issues that are related to #279)

So, regardless, we're going to end up with some concept of "we recommend this set/configuration for new projects if you don't want to whitelist specific ones"; it's basically the same problem whether that means:

  • 'all' vs 'default'
  • in facebook/hhast-linters vs "put it in a non-FB repo"

I don't think there's a good answer here: linters are not for soundness, they are for enforcing stylistic choices, which are fundamentally subjective. Even the ones that exist pretty much to highlight common mistakes have a counterpoint of "(I know what I'm doing|we idiomatically use them in a safe way), and we like writing code that way".

The FB internal approach is "anyone can turn it on, and if there's too much push-back it gets disabled/limited", which won't work here.

If we split out the linters to a separate package under facebook/ , perhaps "this matches FB coding standards and would be an appropriate linter at FB" is the correct separator; problems are:

  • that doesn't feel appropriate while this is a monorepo
  • what about the other ones that are already here? One option (that I don't really like, but can't think of anythign better) would be:
    • no new ones that don't enforce part of FB coding styles - they need to go to third-party repos
    • existing ones are 'grandfathered in', but not enabled by default - need an explicit opt-in to get these enabled
    • if someone stops being in FB coding styles, it joins 'off-by-default' like the current ones

would be an appropriate linter at FB

This has non-style factors, such as how much 'good' code it complains about, and whether it's autofixable

commented

There's a risk of that being substantially more maintenance work, but I think that's my gut reaction - thinking through the last few months, I don't think that would actually be true in practice - @jjergus ?

There's definitely some extra maintenance work for each project right now (e.g. an AST change in HHVM requires new HHAST release + potentially a new release of anything else that depends on HHAST like definition-finder, which would now include hhast-tools and hhast-linter).

If we proceed with the plan to merge some projects into a "monorepo" that we've been discussing, that would make it easier (assuming we have good automation to publish a release of each project from the monorepo).

Fair - I"m hoping that "just update hhast-lib without enabling more linters" in downstream projects would offset that

Splitting hhast up into linters, tooling, and core would be the right thing to do for projects that want to depend on hhast.
Most of those libraries that rely on hhast want access to the AST, but not to the linters.

I am torn on the maintainability question.
If we decide in favor of this, it becomes painful to add methods onto Node classes that you want to use in a new linter.
ExampleQualifiedName::isFullyQualified() would require a release of hhast-lib, and hhast-linters would require that fresh hhast-lib version.

This discourages adding things like this into hhast-lib and may lead to people placing functions like this as free standing functions into hhast-linters instead to reduce dependency hell.

One thing I would like to note, since HHAST liners are a "tool" ( not a library to use or build upon ), its okay to have internal BC-breaks, such as changing class names or architecture structure. however, this shouldn't affect projects that disable linters.

therefore, I suggest that each linter now implements a new method getName() and hhast recommend using that in the configuration files when referring to a linter instead of FQN.

Some form of name spacing is desired so that third party linters can be created independently of each other without risk of conflict with other linters, bundles or not.

Having this be distinct to the FQN feels like solving the same problem twice

For conciseness, it does support a configuration equivalent to “use namespace”

Some form of namespacing is desired so that third party linters can be created independently of each other without the risk of conflict with other linters, bundles or not.

the linter name itself can be prefixed, e.g: facebook_async_gen_prefix, nuxed_camel_case_enum_members.

with official linters ( e.i ones shipped with hhast ), not using prefix, e.g: await_in_a_loop

Also, it would be nice to have to ability to disable a linter within a linter.

e.g: CamelCaseEnumMemberssLinter::getConflicts returns a vector of linters classes that should be disabled ( e.g shout case enum members linter )

with official linters ( e.i ones shipped with hhast ), not using prefix, e.g: await_in_a_loop

The intent is for bundled linters to be less special than they currently are; unprefixed is done for suppression as the odds of collision on the same line/node are somewhat low

Also, it would be nice to have to ability to disable a linter within a linter.

Pretty strongly against this: I do not think users would expect installing a library to disable linters they currently have enabled.

it is possible for hhast to inform the users that 2 linters conflicts with each other and the user can decide on which one to use ( hhast can update the configuration file itself, and not run until the conflict is resolved ).

making things harder for end-users is not a good thing.

and having third party libraries reimplement their own runner to provide a better DX, is also not the best idea.

Inversely, adding linters can not be fully automatic, and will need an explicit opt-in for security: installing a library should not lead to code execution without any extra steps.

This also applies to HHAST itself, which is why the vscode (and previously atom) plugins prompt the user before executing it.

I do not think users would expect installing a library to disable linters they currently have enabled.

I don't think so.

when installing a set of linters that ensures enum members use camel case over shout case, I would expect it to disable the shout case linter.

Conflicts and a resolution flow would be OK, as long as it’s pure data (e.g in vendor/foo/bar/hhast-lint.json), no code execution or configuration changes until user says yes

I do not think users would expect installing a library to disable linters they currently have enabled.

I don't think so.

when installing a set of linters that ensures enum members use camel case over shout case, I would expect it to disable the shout case linter.

Say a library I make gets taken over by someone malicious, and they add a fake linter that’s actually a Trojan but says it’s a naming convention change linter. Code can not automatically tell the difference between this and a real one.

I would agree for user expectations when explicitly installing a conflicting linter, but that can’t be distinguished from malicious cases - or changing the project naming linter because of a non-linting libraries preference.

(While yes, updating a library does change the code you’re executing, opening an editor to review those changes does not automatically execute them - when HHAST’s involved, it does)

Say a library I make gets taken over by someone malicious

in this case, being able to disable or enable linters should be the latest concern, libraries can define scripts to run once they are installed in composer.json, or can simply add code to a partial hack file:

<?hh // partial

namespace Foo;

class SomeLibrary {

}


function hack_it(): void { }

hack_it();

or even better, can replace your library's binary ( vendor/hhvm/hhast/bin/hhast ) with their own.


IMO, the best way to go about this is as follow:

  • allow linters to specify other linters that they conflict with with getConflicts()

  • post-install/post-update or while starting, hhast checks for all conflicts, and if any are found, promote the user with a message about the conflict to choose which one to run:

$ composer update

// updates

[HHAST] The following linters conflict with each other:

- [1] `FooBarLinter` 
- [2] `BazQuxLinter`

Please choose which one to use: |

After the user has chosen, hhast should update the configuration file to fix the conflict once and for all.

in this case, being able to disable or enable linters should be the latest concern, libraries can define scripts to run once they are installed in composer.json

This feature will almost certainly not be supported in whatever we eventually replace composer with. A potential security issue elsewhere does not make knowingly introducing another one OK.

pseudomains

Only an issue of code from that file is executed. Pseudomains are also going away

conflicts

Again fine with this existing as long as it is purely data-based, with all code for resolving them being in HHAST itself - or at least HHAST prompting user before any third-party code is executed

Not tested right now, but at least as of 2018 composer only runs post install scripts for top level projects, and didn’t for dependencies for similar security concerns - composer/composer#1193

there's always a workaround.

it's possible to turn your stolen library into a composer plugin and hijack composer downloader ( https://github.com/symfony/flex/ ), or hijack composer autoloader ( https://github.com/ircmaxell/PhpGenerics ).

commented

Back to the original question, I'd add:

  • per-linter configuration
    • we could just have EnumCaseLinter where you specify camel/shout/whatever as an option
    • indentation linter where you specify how many spaces/tabs
    • came up in #298 ("ban newline as first line of a block" -- which kinds of blocks? classes/functions/statements)

There should be more granularity in the built-in linters than 'none' | 'default' | 'all'

Coming back to here because of hh_client --lint (cc @Atry)

  • I think 'all' should be entirely removed (or, officially deprecate it and log a warning when used)
  • we should split 'default' into:
    • 'definitely an error, but technically valid in the type system', e.g. 'condition is always false', 'unreachable'. In some cases, these literally used to be type errors
    • 'code smell', e.g. await in a loop
    • 'fb-style' or 'hhast-style': stuff that isn't really 'bad', just not the way we want people to do things in FB-owned projects, e.g. "DontHaveTwoEmptyLinesInARowLinter"

'default' is currently pretty close to error + codesmell + some fb-style; I'm not sure if that's a good default, but if so, it should be explicit. 'error + codesmell' would be a more objective default, but less useful. I'm not sure what's best.

There should be additional style linters that are not part of fb-style, but it shouldn't be possible to enable 'all': style linters can reasonably have exactly opposite viewpoints/recommendations.

'all' should be removed: it exists because of the incorrect assumption that projects would actually want to enable all linters - or, more specifically, that FB projects would. This implies that HHAST shouldn't merge linters that FB projects don't want. This is my mistake in the original design, we should be more open.

hh_client --lint makes this a bit more pressing, as it reports stuff in all of these categories.

I also philosophically don't like the idea of a 'fb-style' built-in, but:

  • as a practicality, it's what HHAST will use, and it's a good starting point for others
  • BasedOnStyle: LLVM|Google|WebKit|Microsoft|GNU in clang-format feels similar and is definitely useful even for projects not based on any of them
    I also don't like th

default etc: those could also be versioned (as discussed previously), optionally, and it would be nice to be able to specify different constraints. For example, I might want:

  • errors: latest
  • smells: 4.123
  • fb-style: 4.123

Why do we configure linters in a JSON file? Shall we allow the users to create their own functions to select lint rules, for example, a Hack file as the configuration, similar to PAC?

Sure, though that's a bit tangental; regardless of how they write the configuration, we should provide better groupings and versioning data.

Historically:

  • there's not been a need for the level of flexibility that code provides
  • it would either require unsafe dynamic calls to potentially undefined functions that we intend to make impossible in HHVM, or installing the library would have to introduce a type error due to a missing definition
  • it's a bit awkward to design: for example, we probably don't want to ask users to extend a config class - as then they'll need to exclude their hhast config from production builds or make hhast a real dependency rather than a dev dependency

Specifically JSON over other formats like JSONC, YAML, TOML: HHVM has very few built-in data formats. There's serialize(), json, and INI files (if you want exactly PHP's semantics for INI files). JSON seems like the clear winner there; there also weren't really other viable options in the ecosystem.

it's a bit awkward to design: for example, we probably don't want to ask users to extend a config class - as then they'll need to exclude their hhast config from production builds or make hhast a real dependency rather than a dev dependency

no need to, a neat solution would be is to introduce a new binary, call it "hhast-installer", executing this binary will create a new binary in your desired directory ( default to tools/hhast, or bin/hhast ) with the following content:

#!/usr/bin/hhvm
namespace Tools;

use namespace HHAST;

<<__EntryPoint>>
async function main(): Awaitable<noreturn> {
  $config = HHAST\Config::create()
    ->withFoo()
    ->withBar('bar')
  ;

  await HHAST\Application::run($config);

  exit(0);
}

users can then edit that newly created binary to customize HHAST, and execute it instead of vendor/bin/hhast.


Libraries would need to add this generated file to their .gitattribute as /tools/hhast export-ignore ( the installer can do this automatically if the current directory is a git repository, after asking the user permission ).

libraries would need to add this generated file to their .gitattribute as /tools/hhast export-ignore

This doesn't solve the problem for top-level projects (e.g. executables or websites); it would also need ignoring in .hhconfig, removing a lot of the safety

This doesn't solve the problem for top-level projects

On the other hand, the same's true for tests/ dev-depends on hacktest and fbexpect, and that's somethign we all live with.

the same's true for tests/ dev-depends on hacktest

was gonna say that, it should be dealt with the same way people deal with tests/ and other files, strip them from production build.

  • it would either require unsafe dynamic calls to potentially undefined functions that we intend to make impossible in HHVM, or installing the library would have to introduce a type error due to a missing definition

I don't understand this. Linting is simply another type of tests that can be executed by the IDE continuously. If vendor/bin/hacktest is able to execute tests written in Hack, why cannot vendor/bin/hhast_lint execute lint configurations written in Hack?

also, it would be nice to make input and output handle arguments for run functions, something like:

$input = IO\input_handle();
$output = IO\output_handle();

$application = new HHAST\Application($config);

await $application->run($input, $output);

This was it would be possible to test the output of hhast in hhast itself, using MemoryHandle as input/output.

I don't understand this. Linting is simply another type of tests that can be executed by the IDE continuously. If vendor/bin/hacktest is able to execute tests written in Hack, why cannot vendor/bin/hhast_lint execute lint configurations written in Hack?

if it's a generated entrypoint like @azjezz suggests, it's not an issue

Otherwise the difference would be whether it's "execute all lint configs you find" vs "there must be exactly one"

also, it would be nice to make input and output handle arguments for run functions

This is already supported if the LinterCLI is directly constructed instead of using the runAsync() helper

I don't understand this. Linting is simply another type of tests that can be executed by the IDE continuously. If vendor/bin/hacktest is able to execute tests written in Hack, why cannot vendor/bin/hhast_lint execute lint configurations written in Hack?

if it's a generated entrypoint like @azjezz suggests, it's not an issue

Otherwise the difference would be whether it's "execute all lint configs you find" vs "there must be exactly one"

For manual execution of vendor/bin/hhast_lint, I would propose to let it always require the user to specify the linter configuration, like what hacktest does. For IDE executed linters, it could prompt something like: "No linter configuration is found at tests/linters/LintConfig.hack. Do you want to create it? Yes/No"

I would propose to let it always require the user to specify the linter configuration, like what hacktest does.

That IMHO, is annoying, i would rather run ./tools/hhast than ./vendor/bin/hhast -c config/hhast.hack or similar, if hhast would to go with the "hhast-installer" method, i think it would be a good idea to do the same for other tools such as hacktest ( hacktest-installer ), so it can be something like a standard.

Linting is simply another type of tests that can be executed by the IDE continuousl

I don't think this is the right model; I don't think you're concretely suggesting that lint configs extend HackTest, but IMO if they go in tests/ and we think of them like unit tests, they should.

However, they are not automatically-ran unit tests in IDE mode: HHAST is a full language server, a long-running process with request/response communication with the IDE. It's closer to hh_client

I also consider the need to pass a path to hacktest to be a TODO item that I've not got around to yet: it shouldn't be necessary, and there should be a project configuration with a single canonical definition of what does it mean for the project to have a clean test run - it shouldn't depend on each caller (human or script) passing the right arguments.

Similarly, I don't want different behavior for IDE and CLI users; these should generally behave identically, and also the same as CI (which is largely CLI, but with the arguments being fixed).

--

Also, I'm going to ask for github discussions to be enabled in this repository, then we need to split this up a bit :) I think there's three main questions, and we're trying to discuss them all in one thread:

  • what 'bundles' of linters should we have?
    • do we need versioning?
    • can third parties define their own bundle?
  • should third party libraries be able to modify built-in bundles?
  • how should users express their configuration?

I'll withhold my input until discussions are enabled. This whole thing went by pretty fast and I can't quite follow along.

  • #423
  • #425
  • #426 (I previously misphrased this as "should third party libraries be able to modify built-in bundles?")
  • Split up to -tools, -core, -schema: we've discussed this relatively recently, and for now, the maintenance burden is pretty high. There are no plans to change it. Feel free to file a separate discussion if this is something you want to re-raise