eaubin / StructExtract

Extraction structured data from semi-structured files

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Datamaran

Datamaran is a toolkit for extracting structure from semi-structured log datasets without human supervision.

Build

cd StructExtract
make datamaran
mv datamaran ../

Usage

./datamaran sample.txt output

This usually takes a few minutes. When it finishes, it will generate a series of output files with name: output_1.tsv,output_2.tsv,... respectively. Each of these files would contain one type of extracted record, and the number of such files depend on the number of record types recognized in the input file.

About

Extraction structured data from semi-structured files


Languages

Language:C++ 97.2%Language:Lex 1.5%Language:Makefile 0.9%Language:Python 0.4%