physikerwelt / texvcinfo

Extract identifiers

physikerwelt opened this issue · comments

@alexeygrigorev can we use rseq to extract identifiers and operators from the LaTeX parse tree?

Nope, it's for sequences, not for trees. But if we can flatten this tree, then maybe.
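As a rough illustration of what "flattening" could mean here, the sketch below walks a tree depth-first and collects its leaves into a sequence. The nested-array node shape and the name `flattenTree` are assumptions made for the example; the real texvcinfo parse tree may look different.

```javascript
// Hypothetical sketch: depth-first flattening of a tree given as nested arrays.
// The node shape is an assumption, not the actual texvcinfo parse tree format.
function flattenTree(node, out = []) {
  if (Array.isArray(node)) {
    // Inner node: visit the children from left to right.
    for (const child of node) {
      flattenTree(child, out);
    }
  } else {
    // Leaf: append it to the output sequence.
    out.push(node);
  }
  return out;
}

const tree = ["\\frac", ["i"], ["\\hbar"]];
console.log(flattenTree(tree));
```

The result is a plain sequence in source order, which is the kind of input a sequence matcher expects.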

Normally (i.e. without -t or -j), it prints out the token sequence, e.g.:

texvcTokens \hat{U}(t,t_0)=\exp{\left(-\frac{i}\hbar \int_{t_0}^t H(t')dt'\right)}
[
  [
    "FUN1",
    "\\hat"
  ],
  [
    "CURLY",
    ""
  ],
  [
    "LITERAL",
    ""
  ],
  [
    "TEX_ONLY",
    "U"
  ],
  [
    "LITERAL",
    ""
  ],
  [
    "TEX_ONLY",
    "("
  ],
  [
    "LITERAL",
    ""
  ],
  [
    "TEX_ONLY",
    "t"
  ],
  [
    "LITERAL",
    ""
  ],
  [
    "TEX_ONLY",
    ","
  ],
  [
    "DQ",
    ""
  ],
  [
    "LITERAL",
    ""
  ],
  [
    "TEX_ONLY",
    "t"
  ],
  [
    "LITERAL",
    ""
  ],
  [
    "TEX_ONLY",
    "0"
  ],
  [
    "LITERAL",
    ""
  ],
  [
    "TEX_ONLY",
    ")"
  ],
  [
    "LITERAL",
    ""
  ],
  [
    "TEX_ONLY",
    "="
  ],
  [
    "LITERAL",
    ""
  ],
  [
    "TEX_ONLY",
    "\\exp "
  ],
  [
    "CURLY",
    ""
  ],
  [
    "LR",
    ""
  ],
  [
    "TEX_ONLY",
    "("
  ],
  [
    "TEX_ONLY",
    ")"
  ],
  [
    "LITERAL",
    ""
  ],
  [
    "TEX_ONLY",
    "-"
  ],
  [
    "FUN2",
    "\\frac"
  ],
  [
    "CURLY",
    ""
  ],
  [
    "LITERAL",
    ""
  ],
  [
    "TEX_ONLY",
    "i"
  ],
  [
    "LITERAL",
    ""
  ],
  [
    "TEX_ONLY",
    "\\hbar "
  ],
  [
    "FQ",
    ""
  ],
  [
    "LITERAL",
    ""
  ],
  [
    "TEX_ONLY",
    "\\int "
  ],
  [
    "CURLY",
    ""
  ],
  [
    "DQ",
    ""
  ],
  [
    "LITERAL",
    ""
  ],
  [
    "TEX_ONLY",
    "t"
  ],
  [
    "LITERAL",
    ""
  ],
  [
    "TEX_ONLY",
    "0"
  ],
  [
    "LITERAL",
    ""
  ],
  [
    "TEX_ONLY",
    "t"
  ],
  [
    "LITERAL",
    ""
  ],
  [
    "TEX_ONLY",
    "H"
  ],
  [
    "LITERAL",
    ""
  ],
  [
    "TEX_ONLY",
    "("
  ],
  [
    "LITERAL",
    ""
  ],
  [
    "TEX_ONLY",
    "t"
  ],
  [
    "LITERAL",
    ""
  ],
  [
    "TEX_ONLY",
    "'"
  ],
  [
    "LITERAL",
    ""
  ],
  [
    "TEX_ONLY",
    ")"
  ],
  [
    "LITERAL",
    ""
  ],
  [
    "TEX_ONLY",
    "d"
  ],
  [
    "LITERAL",
    ""
  ],
  [
    "TEX_ONLY",
    "t"
  ],
  [
    "LITERAL",
    ""
  ],
  [
    "TEX_ONLY",
    "'"
  ]
]

Anyhow, maybe it's simpler to write a custom AST visitor that outputs identifiers only.
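Short of a full AST visitor, a filter over the flat token sequence printed above might already get close. The sketch below is only a heuristic under a stated assumption: an identifier is any token whose value is a single Latin letter. The function name `extractIdentifiers` is hypothetical and not part of texvcinfo. Note that this rule ignores commands such as `\hbar` and would also report the `d` of `dt'` as an identifier.

```javascript
// Hypothetical helper, not part of texvcinfo: picks identifier-like values
// out of a flat list of [type, value] pairs like the one printed above.
function extractIdentifiers(tokens) {
  const seen = new Set();
  for (const [, value] of tokens) {
    const text = value.trim();
    // Assumption: an identifier is a single Latin letter.
    if (/^[A-Za-z]$/.test(text)) {
      seen.add(text); // a Set keeps each identifier once, in first-seen order
    }
  }
  return [...seen];
}

// First few tokens of the sequence for \hat{U}(t,t_0)=...
const sample = [
  ["FUN1", "\\hat"],
  ["CURLY", ""],
  ["LITERAL", ""],
  ["TEX_ONLY", "U"],
  ["LITERAL", ""],
  ["TEX_ONLY", "("],
  ["LITERAL", ""],
  ["TEX_ONLY", "t"],
];
console.log(extractIdentifiers(sample));
```

A visitor over the real parse tree could apply the same test per leaf while also using the surrounding node type, for example to keep subscripted names such as `t_0` together.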

This one looks very good for rseq. I'll try to make it available on Maven Central asap. But will it work for JavaScript?

No... it's not available in Java, but one could pass a JSON object.