spiwn / FactorioApiScraper

A small script that goes through the Factorio API documentation HTML and exports a JSON file intended to be used by autocomplete extensions such as https://github.com/simonvizzini/vscode-factorio-lua-api-autocomplete

Ideas and suggestions mostly about defines, events and json

JanSharp opened this issue · comments

By accident i found out that if you do

---@type table<string, number>
---@class defines.foo
defines.foo = {
  bar = 0,
  baz = 0,
}

You not only get IntelliSense for the fields and the class, you can also use for k, v in pairs(defines.foo) do end and it knows the types of the keys and values. It might be worth doing since sometimes you do loop through defines.

On that note, notice how i used a table constructor. You can save quite a bit of file size using those instead of indexing all the way for every field. It would also make sense for that to load faster, but i have not measured it. I've already tried it and it's not too hard. It just means that the generic writeSingleDefine function doesn't make that much sense anymore, but i've not looked into how necessary it is for it to be generic and recursive.
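On the generator side, the table-constructor idea could be sketched roughly like this. It is a minimal sketch and not the scraper's actual code: the helper names write_define_table and write_define_indexed are hypothetical, and every value is written as 0, as in the snippet above.

```python
def write_define_table(path, names):
    """Render one group of defines as a single annotated Lua table constructor."""
    lines = [
        "---@type table<string, number>",
        f"---@class {path}",
        f"{path} = {{",
    ]
    for name in names:
        # Each field is written once, without repeating the full path.
        lines.append(f"  {name} = 0,")
    lines.append("}")
    return "\n".join(lines)


def write_define_indexed(path, names):
    """Render the same group by indexing all the way for every field."""
    return "\n".join(f"{path}.{name} = 0" for name in names)


table_form = write_define_table("defines.foo", ["bar", "baz"])
indexed_form = write_define_indexed("defines.foo", ["bar", "baz"])
print(table_form)
```

In the indexed form the full path is repeated on every line, which is where the file size difference would come from for groups with many fields.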

You can also see that i defined a class called defines.foo. This can sometimes also be wanted/needed, in fact the api docs themselves use it sometimes to refer to all values an enum may have.

Also on the topic of defines: i don't believe it makes sense to put the event parameter classes in there and name them exactly like the define would be named, though the name is really whatever. Either way i'd like the defines to be purely enums (not talking about defines.prototypes for now) and have the event parameters be in a separate file/location. It just doesn't make logical sense for them to be there. This is a bigger deal for the json format (=> data structure) because when i tried to use that it got more complex than it needed to be trying to get the actual event parameter definitions. I hope this makes sense.

About the json: i'd also move the event stuff out of defines.json, which would come naturally with the previous change. And in general my idea behind the json format would be that it contains as much raw data as possible where using it doesn't require any additional string parsing or data clean up. I believe that is what you have done already, but i just wanted to have it said.

That is all i've run into or noticed so far. Important to note is that these are all my opinions and/or wishes and i'm open to discussion about any of them.

The defines need to be worked on - after the move to Lua Doc I just got them to some seemingly working state. What you suggested seems like a good way to represent them.

In the Lua Doc the event parameters are written next to their respective event and the events are part of the defines. In the context.json the parameters are part of the event. This seems sensible.

This is a bigger deal for the json format

Is this the context.json that contains all the data?

it got more complex

I see that the json has some redundant nesting, I might work on that, but otherwise it should be straightforward:

for event in json.defines.defines.events.defines
  for class in event.contains
    print(class.name)
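As runnable Python, the loop above might look like the following. This is only a sketch under assumptions: the sample data is a made-up miniature and not the real context.json, each defines level is assumed to be a mapping from name to object, and contains is assumed to be a list of objects with a name key. The function name event_class_names is hypothetical.

```python
import json

# Made-up miniature of the file; the real layout may differ.
SAMPLE = """
{
  "defines": {
    "defines": {
      "events": {
        "defines": {
          "foo": {"contains": [{"name": "defines.events.foo"}]},
          "bar": {"contains": [{"name": "defines.events.bar"}]}
        }
      }
    }
  }
}
"""


def event_class_names(context):
    """Collect the names of the parameter classes attached to each event."""
    events = context["defines"]["defines"]["events"]["defines"]
    names = []
    for event in events.values():
        # "class" is a reserved word in Python, hence "cls".
        for cls in event.get("contains", []):
            names.append(cls["name"])
    return names


context = json.loads(SAMPLE)
print(event_class_names(context))
```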

Having the event parameter name be exactly like the event name seems like the best option. I have not seen any problems caused by conflicting names so far. It reduces the clutter in the autocomplete/intellisense, which is already substantial with hundreds of classes and values. It also removes the need for the user to know details about the generated documentation - no need to recall what cryptic name the scraper generates - you already know it and it is right in front of you when you need it - the name of the event, making it easier to annotate your code.

About the json: i'd also move the event stuff out of defines.json

The format for that autocomplete extension can only have those two files (classes.json, defines.json). Most of the actual data has to be in Markdown strings. It has very little room for improvement. It gets superseded by Lua Doc. With that in mind, I would rather not support it at all.

The context.json is the raw data. I do not intend to alter its format, beyond small tweaks (to compensate for data loss and weirdness caused by the conversion). That said, it is not guaranteed to be stable (at least for now) - if the internal representation has to be changed, the context.json format will change with it.

The intention is if a specific format is desired (be it json or something else) a formatter/generator/marshaller should be written that will output the desired format by using the internal data representation (like the existing one for Lua Doc). If someone prefers to not do it in python and/or as part of this project, they can use the context.json to achieve the same.

If you contribute a formatter to this project or describe it to me (and wait until I can implement it), it will be the responsibility of this project's contributors (me) to keep it updated if the internal representation changes.

I see, i didn't notice there were multiple different jsons in play here. The only one i knew about was what you are calling context.json. I didn't know you were either partially supporting or planning to support (what i'm assuming is) the current factorio autocomplete extension, or some other generic format i'm not aware of. However, i have nothing to add to this other than "i don't think i need it personally".

... I do not intend to alter its format...

That is perfectly fine.

it is not guaranteed to be stable (at least for now)

As the project gets closer to being done it'll be pretty stable - probably - and that's also fine.

Regarding defines and events, to keep it simple, i don't see the need to basically merge and intertwine the events page into the defines page (figuratively speaking).

The reason for that is that i never truly considered the defines to be the "name of the event". In my head it's always been a plain enum like any other and it's always bothered me that the event id is called name in the api. However with this in mind i can understand where you're coming from.

I see that the json has some redundant nesting, I might work on that, but otherwise it should be straightforward:

Thank you for pointing this out, this removes the mentioned complexity and with it any reason for me to push for a change. For some reason i thought i'd have to check if the current defines "enum" as i call them is defines.events and extract data differently for that one as i'm looping through all defines. I don't think i explained this well, however it doesn't matter, ...

All you need to know is that how you're doing it now - even if it isn't how i'd do it myself - it works and it's alright with me.

Note: this includes using defines.events.foo as class names being good.

I pushed an updated representation of the defines and a bit more (no release yet, you will have to run the code).

And an update about the "redundant" nesting in json.defines.defines.events.defines: it is not so much redundant nesting, but rather bad naming. json.defines.children.events.children would make it clearer. Having them nested is required because the object (a group of defines) can have more than just its children - some of them have a description (for example: https://lua-api.factorio.com/latest/defines.html#defines.behavior_result).
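To illustrate why the nesting is needed, here is a minimal sketch of walking such a tree. It assumes the children naming suggested above and a description key; both key names, the walk_defines helper, and the sample tree (including its placeholder member names and description text) are assumptions for illustration, not the actual context.json layout.

```python
def walk_defines(node, path="defines"):
    """Yield (path, description) for a group of defines and everything below it."""
    # A group can carry its own data (here a description) next to its children.
    yield path, node.get("description", "")
    for name, child in node.get("children", {}).items():
        yield from walk_defines(child, f"{path}.{name}")


# Made-up sample tree; member names and description are placeholders.
tree = {
    "children": {
        "behavior_result": {
            "description": "placeholder description",
            "children": {"foo": {}, "bar": {}},
        },
        "events": {"children": {"foo": {}}},
    }
}

result = list(walk_defines(tree))
for path, description in result:
    print(path, "-", description or "(no description)")
```

If a group were stored as a bare mapping of its children, there would be no place to attach the description, which is the point made above.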

Update: I can't get the define's table<string, integer> to work with pairs while also having a class/alias (to use as return/parameter type). I will have to experiment more.

I pushed an updated representation of the defines and a bit more (no release yet, you will have to run the code).

Looks good!

And an update about the "redundant" [...]

I like it. Naming improvements are well worth it.

Update: I can't get the define's table<string, integer> to work with pairs while also having a class/alias (to use as return/parameter type). I will have to experiment more.

Oh i see, yea it doesn't work. I didn't think to use inheritance like you did, but i've tried all combinations of inheritance and multi typing i could think of and all have their downsides that are just not worth it. In the rare occasions that one needs to loop through defines it's simple enough to annotate the variables manually. Sorry for not noticing that earlier.