lifting-bits / sleigh

Unofficial CMake build for Ghidra's C++ SLEIGH code

Write a script to check for new files

tetsuo-cpp opened this issue

When new files are added to Sleigh upstream, our weekly sync with Ghidra sometimes keeps working fine, so we don't notice them. This leaves us with missing headers that we have to fix later down the track (#107).

We should write a script, as part of our weekly sync, that identifies new files and either adds them to the PR or fails loudly so we can fix things manually.

A partial improvement was made in commit b80fefc.

It should now list the changed files (including new and removed ones) in the PR and the commit message.

@ekilmer

Had a think about this today. I don't think it's realistic for us to update the CMake configuration automatically, since there's no way for us to know which sub-library (libsla, libdecomp, etc.) a new file belongs to. However, I'd like new files to be more obvious than they are now, since additions in the git diff output can easily get lost in a sea of modifications.

I want to use a regex to parse the git diff output and find which files were added. If there are new sources, the PR bot would leave a comment saying something like: "Manual intervention required. This update contains the following new C++ sources."

Does that seem ok to you?
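Here is a rough bash sketch of the regex idea. It rests on a few assumptions. First, the diff is taken with git diff --no-renames --name-status, which gives one "STATUS<TAB>path" line per file, instead of the full patch. Second, .cc, .hh, .y and .l are the extensions worth flagging. Third, the function names new_cpp_sources and comment_body are hypothetical and are not part of the existing sync script.

```bash
#!/usr/bin/env bash
# Sketch only: flag newly added C++ sources in `git diff --name-status` output.
# Assumed input: one "STATUS<TAB>path" line per file, taken with --no-renames.
# Assumed extensions worth flagging: .cc, .hh, .y, .l (adjust as needed).

# Hypothetical helper: print the path of every added file whose extension matches.
new_cpp_sources() {
  local re=$'^A\t(.+\\.(cc|hh|y|l))$'   # $'...' turns \t into a real tab
  local line
  while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
      printf '%s\n' "${BASH_REMATCH[1]}"
    fi
  done
}

# Hypothetical helper: build the PR comment body, or print nothing if no new sources.
comment_body() {
  local sources f
  sources=$(new_cpp_sources)
  if [[ -z $sources ]]; then
    return 0
  fi
  printf '%s\n' "Manual intervention required. This update contains the following new C++ sources:"
  while IFS= read -r f; do
    printf -- '- %s\n' "$f"
  done <<< "$sources"
}
```

Usage would be along the lines of git diff --no-renames --name-status <old> <new> | comment_body, with the output handed to whatever posts the PR comment. The --no-renames flag matters because git diff detects renames by default. Without it, a renamed source would appear as an R line, which this regex skips.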

@tetsuo-cpp

> Had a think about this today. I don't think it's realistic for us to update the CMake configuration automatically, since there's no way for us to know which sub-library (libsla, libdecomp, etc.) a new file belongs to. However, I'd like new files to be more obvious than they are now, since additions in the git diff output can easily get lost in a sea of modifications.

Very good point, and I agree with both statements.

> I want to use a regex to parse the git diff output and find which files were added. If there are new sources, the PR bot would leave a comment saying something like: "Manual intervention required. This update contains the following new C++ sources."
>
> Does that seem ok to you?

A regex would work, but there's also a native way to filter for added, modified, and deleted files that I learned about recently: git diff's --diff-filter option.

git diff --diff-filter=M

We could probably run it three times, once each for M (modified), A (added), and D (deleted). A and D would likely require manual intervention.
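The function below is one way to do the three passes. It assumes the sync can be described as a diff between two commits passed as arguments. Each pass uses --name-only so that it prints only file paths, and --no-renames so that a moved file is reported as a deletion plus an addition. The name summarize_sync is hypothetical. The function prints each non-empty list and returns non-zero when anything was added or deleted, which is one possible way to "fail loudly".

```bash
#!/usr/bin/env bash
# Sketch only: summarize a sync with git's native --diff-filter, one pass each
# for M (modified), A (added) and D (deleted).
# Assumption: the sync can be described as a diff between two commits.

# Hypothetical helper: summarize_sync BASE [HEAD]
# Prints each non-empty list; returns 1 if anything was added or deleted.
summarize_sync() {
  local base=$1 head=${2:-HEAD}
  local filter label files needs_manual=0
  for filter in M A D; do
    case $filter in
      M) label="Modified" ;;
      A) label="Added" ;;
      D) label="Deleted" ;;
    esac
    # --name-only: paths only; --no-renames: a move shows up as D + A, not R.
    files=$(git diff --no-renames --name-only --diff-filter="$filter" "$base" "$head")
    if [[ -z $files ]]; then
      continue
    fi
    printf '%s:\n%s\n' "$label" "$files"
    if [[ $filter != M ]]; then
      needs_manual=1
    fi
  done
  if (( needs_manual )); then
    echo "Manual intervention required: files were added or deleted." >&2
    return 1
  fi
  return 0
}
```

Returning a status instead of exiting lets the caller choose between failing the job outright and only posting a PR comment.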

I also think there are some additional improvements to make for the sleighspec directory. We should do one (or a combination) of the following, or something equivalent:

  1. Ignore Java, manifest, and similar files.
  2. Iterate over all of the file extensions used for SLEIGH specifications.
  3. Be more precise about which directories we look for changes in (e.g. Ghidra/Processors/*/data/languages, though there might be other paths we want too).
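Options 2 and 3 could be combined into a single filter over the changed paths. The sketch below assumes paths are relative to the Ghidra repository root. It treats .slaspec, .sinc, .pspec, .cspec and .ldefs as the SLEIGH specification extensions, but that list is an assumption and may be incomplete. Everything else, such as Java sources and Module.manifest, is dropped, which also covers option 1. The name sleigh_spec_changes is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: keep changed paths that look like SLEIGH specification files.
# Assumptions: paths are relative to the Ghidra repo root, and the extension
# list below is illustrative and may be incomplete.

# Hypothetical helper: reads one path per line on stdin, prints the matches.
sleigh_spec_changes() {
  local path
  while IFS= read -r path; do
    # Option 3: only look under the processor language directories.
    # Note: in a case pattern, * also matches '/'.
    case $path in
      Ghidra/Processors/*/data/languages/*) ;;
      *) continue ;;
    esac
    # Option 2: enumerate SLEIGH-related extensions explicitly; anything else
    # (Java sources, Module.manifest, ...) is dropped, which covers option 1.
    case ${path##*.} in
      slaspec | sinc | pspec | cspec | ldefs) printf '%s\n' "$path" ;;
    esac
  done
}
```

For example: git diff --name-only <old> <new> -- Ghidra/Processors | sleigh_spec_changes. Listing extensions explicitly is stricter than ignoring known-irrelevant ones. A new kind of spec file would be missed silently, so the list is worth revisiting when upstream changes.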