pyspark-util


A collection of utility functions for PySpark.

Example:

from pyspark.sql import SparkSession

import pyspark_util as psu

spark = SparkSession.builder.getOrCreate()

data = [(1, 2, 3)]
columns = ['a', 'b', 'c']
df = spark.createDataFrame(data, columns)
prefixed = psu.prefix_columns(df, 'x')
prefixed.show()

Output:

+---+---+---+
|x_a|x_b|x_c|
+---+---+---+
|  1|  2|  3|
+---+---+---+
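The renaming above can be sketched as follows. This is a hypothetical implementation, not pyspark_util's actual source; it assumes only the standard PySpark `DataFrame.columns` attribute and `DataFrame.toDF` method, and the `sep` parameter is an illustrative assumption:

```python
# Hypothetical sketch of prefix_columns -- not the library's actual code.
# Relies only on DataFrame.columns and DataFrame.toDF from the PySpark API.
def prefix_columns(df, prefix, sep="_"):
    """Return a new DataFrame with every column renamed to '<prefix><sep><name>'."""
    return df.toDF(*[f"{prefix}{sep}{name}" for name in df.columns])
```

Because it only touches column names, this works on any object exposing `columns` and `toDF`, which also makes it easy to unit-test without a running Spark session.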

Installation

pip install pyspark-util

Development

Setup

docker-compose build
docker-compose up -d

Lint

docker exec psu-cnt ./tools/lint.sh

Test

docker exec psu-cnt ./tools/test.sh
