Extract identifiers
physikerwelt opened this issue
Moritz Schubotz commented
@alexeygrigorev Can we use rseq to extract identifiers and operators from the LaTeX parse tree?
Alexey Grigorev commented
Nope, it's for sequences, not for trees. But if we can flatten this tree, then maybe.
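To make the "flatten this tree" idea concrete, here is a minimal sketch of a pre-order traversal that turns a nested tree into a flat sequence. The tree shape (a node is a `(label, children)` tuple, a leaf is a plain string) and the name `flatten` are assumptions for illustration only; they are not the actual texvcjs parse tree format.

```python
from typing import Iterator


def flatten(node) -> Iterator[str]:
    """Yield labels and leaves in pre-order (depth-first, left to right).

    Hypothetical tree shape: a leaf is a str, an inner node is a
    (label, children) tuple. This is not the real texvcjs structure.
    """
    if isinstance(node, str):
        yield node
        return
    label, children = node
    yield label
    for child in children:
        yield from flatten(child)


# Toy tree loosely modelled on \hat{U}
tree = ("FUN1", ["\\hat", ("CURLY", ["U"])])
sequence = list(flatten(tree))
print(sequence)
```

A flat list like this is the kind of input a sequence matcher could consume, at the cost of losing the nesting information.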
Moritz Schubotz commented
Normally (i.e. without -t or -j), it prints out the token sequence, e.g.:
texvcTokens \hat{U}(t,t_0)=\exp{\left(-\frac{i}\hbar \int_{t_0}^t H(t')dt'\right)}
[
  ["FUN1", "\\hat"],
  ["CURLY", ""],
  ["LITERAL", ""],
  ["TEX_ONLY", "U"],
  ["LITERAL", ""],
  ["TEX_ONLY", "("],
  ["LITERAL", ""],
  ["TEX_ONLY", "t"],
  ["LITERAL", ""],
  ["TEX_ONLY", ","],
  ["DQ", ""],
  ["LITERAL", ""],
  ["TEX_ONLY", "t"],
  ["LITERAL", ""],
  ["TEX_ONLY", "0"],
  ["LITERAL", ""],
  ["TEX_ONLY", ")"],
  ["LITERAL", ""],
  ["TEX_ONLY", "="],
  ["LITERAL", ""],
  ["TEX_ONLY", "\\exp "],
  ["CURLY", ""],
  ["LR", ""],
  ["TEX_ONLY", "("],
  ["TEX_ONLY", ")"],
  ["LITERAL", ""],
  ["TEX_ONLY", "-"],
  ["FUN2", "\\frac"],
  ["CURLY", ""],
  ["LITERAL", ""],
  ["TEX_ONLY", "i"],
  ["LITERAL", ""],
  ["TEX_ONLY", "\\hbar "],
  ["FQ", ""],
  ["LITERAL", ""],
  ["TEX_ONLY", "\\int "],
  ["CURLY", ""],
  ["DQ", ""],
  ["LITERAL", ""],
  ["TEX_ONLY", "t"],
  ["LITERAL", ""],
  ["TEX_ONLY", "0"],
  ["LITERAL", ""],
  ["TEX_ONLY", "t"],
  ["LITERAL", ""],
  ["TEX_ONLY", "H"],
  ["LITERAL", ""],
  ["TEX_ONLY", "("],
  ["LITERAL", ""],
  ["TEX_ONLY", "t"],
  ["LITERAL", ""],
  ["TEX_ONLY", "'"],
  ["LITERAL", ""],
  ["TEX_ONLY", ")"],
  ["LITERAL", ""],
  ["TEX_ONLY", "d"],
  ["LITERAL", ""],
  ["TEX_ONLY", "t"],
  ["LITERAL", ""],
  ["TEX_ONLY", "'"]
]
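As a rough illustration of how identifiers might be pulled out of a token sequence like the one above, the sketch below loads a short excerpt of the JSON and keeps only identifier candidates. The rule used (a `TEX_ONLY` token whose value is a single ASCII letter, or a command listed in `IDENTIFIER_COMMANDS`) is an assumption made for this example, as are the names `extract_identifiers` and `IDENTIFIER_COMMANDS`; it is not how texvcjs or rseq decide what an identifier is.

```python
import json

# Assumption: commands we choose to treat as identifiers. Illustrative only.
IDENTIFIER_COMMANDS = {"\\hbar"}


def extract_identifiers(tokens):
    """Return distinct identifier candidates in order of first appearance."""
    seen = []
    for kind, value in tokens:
        value = value.strip()  # some values carry a trailing space, e.g. "\\exp "
        is_letter = len(value) == 1 and value.isascii() and value.isalpha()
        if kind == "TEX_ONLY" and (is_letter or value in IDENTIFIER_COMMANDS):
            if value not in seen:
                seen.append(value)
    return seen


# Short excerpt of the token dump, kept as raw JSON text.
raw = r'''[["FUN1", "\\hat"], ["TEX_ONLY", "U"], ["TEX_ONLY", "("],
           ["TEX_ONLY", "t"], ["TEX_ONLY", "\\exp "], ["TEX_ONLY", "\\hbar "],
           ["TEX_ONLY", "t"]]'''
identifiers = extract_identifiers(json.loads(raw))
print(identifiers)
```

This naive filter has obvious limits: on the full sequence it would also report the differential `d`, and it does not join a letter with its subscript (the `t`, `DQ`, `0` run for `t_0`).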
Moritz Schubotz commented
Anyhow, maybe it's simpler to write a custom AST visitor that outputs identifiers only.
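A custom visitor of that kind could look roughly like the sketch below. The `Node` class, the `ROOT` kind, and the `visit_<kind>` dispatch convention are all hypothetical and chosen for illustration; the node kinds merely reuse names from the token dump, and the real texvcjs AST may be structured differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical AST node; not the actual texvcjs representation."""
    kind: str
    value: str = ""
    children: List["Node"] = field(default_factory=list)


class IdentifierVisitor:
    """Walks the tree and collects single-letter TEX_ONLY values."""

    def __init__(self):
        self.identifiers = []

    def visit(self, node):
        # Dispatch to visit_<kind> if defined, otherwise just descend.
        method = getattr(self, "visit_" + node.kind, self.generic_visit)
        method(node)

    def generic_visit(self, node):
        for child in node.children:
            self.visit(child)

    def visit_TEX_ONLY(self, node):
        text = node.value.strip()
        if len(text) == 1 and text.isalpha() and text not in self.identifiers:
            self.identifiers.append(text)
        self.generic_visit(node)


# Toy tree for \hat{U}(t
ast = Node("ROOT", children=[
    Node("FUN1", "\\hat", [Node("CURLY", "", [Node("TEX_ONLY", "U")])]),
    Node("TEX_ONLY", "("),
    Node("TEX_ONLY", "t"),
])
visitor = IdentifierVisitor()
visitor.visit(ast)
print(visitor.identifiers)
```

Unlike a flat filter, a visitor keeps access to the nesting, so handling of subscripts or function arguments could be added per node kind.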
Alexey Grigorev commented
This one looks very good for rseq. I'll try to make it available on Maven Central asap. But will it work for JavaScript?
Moritz Schubotz commented
No... it's not available in Java, but one could pass a JSON object.