SCAPAR: A Scalable Text Parser for Data Sciences

In data sciences industry, they are large amounts of data to process. Here SCAPAR is developed to handle these data. Data formats are designed and proper reading routines are implemented. The library is designed for Linux, Unix and Mac systems. The code is written by C.

Build

The simplest way to install is to run commands:

 ./configure
 make
 sudo make install

The default destination is /usr/local/scapar/. However, the destination can be changed by using command:

 ./configure --prefix=DES_PATH

By this, the library will be installed to DES_PATH.

Performance

large.c in example/ parses a large keyword file, which contains 75 million lines of data. SCAPAR can parse the file in around 25 seconds. Its performance is around 3 million lines of data per second.

gen_perf.c generates perf.c. Command:

./gen_perf n

will generate perf.c that contains n integers, n floating-point, n vectors of integers, and n vectors of floating-point numbers. The perf.c will register all these parameters and parse these parameters from perf.dat (also generated by gen_perf).

About

A Scalable Text Parser

GNU General Public License v3.0

Languages

Language:C 95.8%Language:Shell 3.7%Language:Makefile 0.5%