gabriel-weaver / xutools

eXtended UNIX text-processing tools



Structured Text Processing

gabriel-weaver opened this issue

TXR: a Pattern Matching Language (Not Just) for Convenient Text Extraction

Suggested by Kaz Kylheku on Slashdot.

Lightweight Structure in Text

Pattern matching is heavily used for searching, filtering, and transforming text, but existing pattern languages offer few opportunities for reuse. Lightweight structure is a new approach that solves the reuse problem. Lightweight structure has three parts: a model of text structure as contiguous segments of text, or regions; an extensible library of structure abstractions (e.g., HTML elements, Java expressions, or English sentences) that can be implemented by any kind of pattern or parser; and a region algebra for composing and reusing structure abstractions. Lightweight structure does for text pattern matching what procedure abstraction does for programming, enabling construction of a reusable library.
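The three parts (regions, structure abstractions, and a region algebra) can be illustrated with a minimal Python sketch. This is not the LAPIS API or the exact algebra defined in the thesis. `Region`, `regex`, `inside`, and `containing` are hypothetical names chosen for illustration, and plain regular expressions stand in for the richer parsers the thesis allows.

```python
import re
from typing import Callable, List, NamedTuple


class Region(NamedTuple):
    """A contiguous segment of text: character offsets [start, end)."""
    start: int
    end: int


# A structure abstraction (hypothetical representation): a function from a
# text to the list of regions it matches.
Pattern = Callable[[str], List[Region]]


def regex(expr: str) -> Pattern:
    """Hypothetical helper: wrap a regular expression as a structure abstraction."""
    compiled = re.compile(expr)
    return lambda text: [Region(m.start(), m.end()) for m in compiled.finditer(text)]


def inside(a: Pattern, b: Pattern) -> Pattern:
    """Region-algebra operator (assumed form): regions of `a` lying within some region of `b`."""
    def match(text: str) -> List[Region]:
        outer = b(text)
        return [r for r in a(text)
                if any(o.start <= r.start and r.end <= o.end for o in outer)]
    return match


def containing(a: Pattern, b: Pattern) -> Pattern:
    """Region-algebra operator (assumed form): regions of `a` that contain some region of `b`."""
    def match(text: str) -> List[Region]:
        inner = b(text)
        return [r for r in a(text)
                if any(r.start <= i.start and i.end <= r.end for i in inner)]
    return match


def extract(text: str, regions: List[Region]) -> List[str]:
    """Return the substrings covered by each region."""
    return [text[r.start:r.end] for r in regions]


if __name__ == "__main__":
    text = "Call 555 now. No digits here. Room 42 is free."
    # Reusable abstractions, each defined once (crude stand-ins for real parsers).
    sentence = regex(r"[A-Z][^.]*\.")
    number = regex(r"\d+")
    # Compose them: sentences that contain a number.
    print(extract(text, containing(sentence, number)(text)))
    # Compose further: numbers inside sentences that mention "Room".
    print(extract(text, inside(number, containing(sentence, regex("Room")))(text)))
```

Because every operator returns another `Pattern`, a composed pattern such as `containing(sentence, number)` can itself be named and reused in later expressions, which is the procedure-abstraction analogy the abstract draws.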

Lightweight structure has been implemented in LAPIS, a web browser/text editor that demonstrates several novel techniques:

http://www.cs.cmu.edu/~rcm/papers/thesis/

Coccinelle

Coccinelle: A program matching and transformation tool for systems code, 2011. Retrieved November 11, 2011 from http://coccinelle.lip6.fr/.