JefersonFG / pdp-compression

Big data file compression - comparison

This program, the final project for the Parallel and Distributed Programming class, compares the compression algorithms available for the Avro, Parquet and ORC file formats.

Dependencies

The program depends on the following Python modules:

  • pyarrow
  • pyorc
  • pandavro
  • sklearn (installed from PyPI as scikit-learn)
  • pandas

Running the program

Simply execute the main script with a Python 3 interpreter:

python main.py

The program was developed using Python 3.8.
