orangeduck / mpc

A Parser Combinator library for C

In theory: Can MPC parse C?

IngwiePhoenix opened this issue

I am looking for a way to extract function declarations and the like from C source files, to use as a basis for writing wrappers in the V language.

Short summary:

struct my_struct_s {int a;};
void something(struct my_struct_s* s);

becomes:

#include "my_struct.h"
struct C.my_struct_s {
  a int
}
fn C.something(&C.my_struct_s)

This works the same way for enums, unions and the like. But to get there I need to extract the relevant information from header files.

Is it possible to use mpc to do this? And if so, how?

Short answer: no

Long answer: yes

Basically, complete parsing of C is difficult because of typedefs: without knowing which typedefs are in scope, you can't tell whether a symbol is a type name or an ordinary identifier, and that changes how the surrounding code has to be parsed. So you will probably struggle with a grammar-only approach, since you need to evaluate typedefs as you encounter them while parsing.
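To make the typedef problem concrete: the statement `T * x;` declares a pointer `x` if `T` is a typedef name, but is a multiplication expression if `T` is a variable. Below is a minimal sketch in plain C (it does not use mpc) of a typedef table that gets consulted during parsing. All names here (`typedef_table`, `classify_star_statement`, and so on) are hypothetical and only illustrate the idea; a real parser would also have to handle scoping and shadowing.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TYPEDEFS 64
#define MAX_NAME 64

/* Hypothetical typedef table: names recorded from `typedef ... name;` so far. */
typedef struct {
    char names[MAX_TYPEDEFS][MAX_NAME];
    int count;
} typedef_table;

/* Record a typedef name; returns 0 on success, -1 if it does not fit. */
static int typedef_table_add(typedef_table *t, const char *name) {
    if (t->count >= MAX_TYPEDEFS || strlen(name) >= MAX_NAME) return -1;
    strcpy(t->names[t->count++], name);
    return 0;
}

/* Returns 1 if name was recorded as a typedef, 0 otherwise. */
static int typedef_table_contains(const typedef_table *t, const char *name) {
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0) return 1;
    return 0;
}

/* Skip whitespace, copy one identifier into out; returns the position
   after it, or NULL if no identifier starts here or it does not fit. */
static const char *read_ident(const char *p, char *out, size_t cap) {
    size_t n = 0;
    while (isspace((unsigned char)*p)) p++;
    if (!(isalpha((unsigned char)*p) || *p == '_')) return NULL;
    while (isalnum((unsigned char)*p) || *p == '_') {
        if (n + 1 >= cap) return NULL;
        out[n++] = *p++;
    }
    out[n] = '\0';
    return p;
}

/* Skip whitespace, then require the character c. */
static const char *skip_char(const char *p, char c) {
    while (isspace((unsigned char)*p)) p++;
    return *p == c ? p + 1 : NULL;
}

typedef enum { STMT_DECLARATION, STMT_EXPRESSION, STMT_UNKNOWN } stmt_kind;

/* Classify a statement of the exact shape `A * b;`. The answer depends
   on the typedef table, not on the text of the statement alone. */
static stmt_kind classify_star_statement(const typedef_table *t, const char *src) {
    char first[MAX_NAME], second[MAX_NAME];
    const char *p = read_ident(src, first, sizeof first);
    if (p) p = skip_char(p, '*');
    if (p) p = read_ident(p, second, sizeof second);
    if (p) p = skip_char(p, ';');
    if (!p) return STMT_UNKNOWN;
    return typedef_table_contains(t, first) ? STMT_DECLARATION : STMT_EXPRESSION;
}
```

With an empty table, `classify_star_statement(&t, "T * x;")` yields `STMT_EXPRESSION`; after `typedef_table_add(&t, "T")` the same text yields `STMT_DECLARATION`. That dependence on earlier input is what a context-free grammar alone cannot express.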

However, mpc lets you attach callbacks (functions that are applied to the result of a particular parsing rule) when you build your parser via the functional approach, so it is probably possible to hack something together that can parse C.

Some subset of C may well be fairly straightforward, but ultimately I think you are in for a difficult time if you want to parse the complete C language via mpc.

Sorry, I am only seeing this answer now... oops.

So if I understand correctly: parsing the raw syntax should be possible, at least well enough to generate an AST. But extracting full information would mean evaluating typedefs myself (as in, building a type map as I go and resolving defined types against it as they are encountered)? That would be perfectly fine. Since V compiles to C, there is no real need to fully resolve types; I just need the plain declarations of other symbols (i.e. translate libraryType funcName(); to the equivalent notation in V, which would be fn C.funcName() C.libraryType).

Again, sorry for the super late reply!
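As a rough illustration of the translation described above (libraryType funcName(); to fn C.funcName() C.libraryType), here is a minimal sketch in plain C that does not use mpc. It only handles prototypes of the exact shape `ret name();` with a single-identifier return type and an empty parameter list, and it treats every non-void return type as a library type (so a built-in such as `int` would wrongly come out as `C.int`). The function name `translate_prototype` and its helpers are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAME 64

/* Skip whitespace, copy one identifier into out; returns the position
   after it, or NULL if no identifier starts here or it does not fit. */
static const char *read_ident(const char *p, char *out, size_t cap) {
    size_t n = 0;
    while (isspace((unsigned char)*p)) p++;
    if (!(isalpha((unsigned char)*p) || *p == '_')) return NULL;
    while (isalnum((unsigned char)*p) || *p == '_') {
        if (n + 1 >= cap) return NULL;
        out[n++] = *p++;
    }
    out[n] = '\0';
    return p;
}

/* Skip whitespace, then require the character c. */
static const char *skip_char(const char *p, char c) {
    while (isspace((unsigned char)*p)) p++;
    return *p == c ? p + 1 : NULL;
}

/* Translate `ret name();` into a V-style C function declaration.
   Returns 0 on success, -1 on unsupported input or a too-small buffer. */
static int translate_prototype(const char *src, char *out, size_t cap) {
    char ret[MAX_NAME], name[MAX_NAME];
    int n;
    const char *p = read_ident(src, ret, sizeof ret);
    if (p) p = read_ident(p, name, sizeof name);
    if (p) p = skip_char(p, '(');
    if (p) p = skip_char(p, ')');
    if (p) p = skip_char(p, ';');
    if (!p) return -1;

    if (strcmp(ret, "void") == 0)
        n = snprintf(out, cap, "fn C.%s()", name);           /* no return type in V */
    else
        n = snprintf(out, cap, "fn C.%s() C.%s", name, ret); /* assume a library type */
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

Calling `translate_prototype("libraryType funcName();", buf, sizeof buf)` fills `buf` with `fn C.funcName() C.libraryType`. Anything with parameters, pointers, or qualifiers is rejected with -1 here; that is the part where a real parser (mpc-based or otherwise) would have to take over.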