Snowflake-Labs / snowpark-extensions

Useful Extensions to be used when modernizing projects from Spark to Snowpark

Snowpark Extensions

Overview

The Snowpark Extensions project aims to simplify the migration process from Apache Spark to Snowpark for Scala developers. It provides a set of helper methods and utilities built as extensions on top of the existing Snowpark Scala APIs.

The core goal is to minimize the amount of manual code change required when migrating from Spark to Snowpark. This is achieved by using Scala's implicit classes to effectively "overload" existing Snowpark classes such as Column, DataFrame, and Session with functionality that is not available out of the box in the Snowpark Scala APIs.
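
As a sketch of that pattern (simplified, with a hypothetical helper method rather than the library's actual internals), an implicit class can graft a new method onto Snowpark's Column type:

import com.snowflake.snowpark.Column
import com.snowflake.snowpark.functions.{coalesce, lit, upper}

object ColumnSketch {
  // Any Column in scope is implicitly wrapped, gaining the extra method.
  implicit class ExtendedColumn(val col: Column) extends AnyVal {
    // Hypothetical Spark-style convenience: null-safe upper-casing.
    def upperOrEmpty: Column = upper(coalesce(col, lit("")))
  }
}

With import ColumnSketch._ in scope, someColumn.upperOrEmpty compiles as if the method were declared on Column itself, which is how the extensions can add Spark-like methods without modifying Snowpark.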

Features

The Snowpark Extensions project offers the following features:

  • Implicit Column Extensions - Additional helper methods on Column to simplify common data transformation tasks
  • Implicit DataFrame Extensions - Extra functionality on DataFrame to streamline migrations, including improved join APIs
  • Implicit Session Extensions - Helper utilities on Session to simplify setup and configuration

By leveraging implicits, these extensions provide overlaid APIs without requiring changes to existing Snowpark imports or references.

In some situations, certain functions are easier to implement by registering SQL or JavaScript UDFs. You can find the code for some of them in the scripts folder.
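
As a sketch only (the function body below is a simplified stand-in, not the repo's actual script), such a UDF can be registered through a Snowpark session:

import com.snowflake.snowpark.Session

// Assumes connection settings in a profile.properties file.
val session = Session.builder.configFile("profile.properties").create

// Simplified isnan stand-in; Snowflake treats the FLOAT value 'NaN'
// as equal to itself, so this comparison works as a NaN test.
session.sql(
  """CREATE OR REPLACE FUNCTION isnan(x FLOAT)
    |RETURNS BOOLEAN
    |AS $$ x = 'NaN'::FLOAT $$""".stripMargin).collect()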

Usage

To use the Snowpark Extensions project, simply import the extension classes:

import com.snowflake.snowpark_extensions.Extensions._

This will bring all extended Column, DataFrame, and Session functionalities into scope. You can then utilize the additional methods as if they were available directly on the base classes.
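For example (a minimal sketch, assuming a configured session; the commented-out helper is illustrative, not a guaranteed part of the API):

import com.snowflake.snowpark.Session
import com.snowflake.snowpark_extensions.Extensions._

val session = Session.builder.configFile("profile.properties").create
val df = session.sql("SELECT 'hello world' AS greeting")

// With the implicits in scope, Spark-style helpers appear directly on
// Snowpark objects, e.g. a hypothetical Column method:
// df.select(df("greeting").startsWith("hello")).show()
df.show()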

Building

The project uses Maven for building:

mvn clean compile package

This will compile the code and package it into a JAR file for distribution and dependency management.

The output JAR can then be included in any Scala application to leverage the Snowpark Extensions helpers.

SQL Extensions

The scripts folder provides SQL scripts for the following UDFs:

UDF                Description
array_zip          Returns a merged array of arrays
conv               Converts num from from_base to to_base
format_string      Returns a formatted string from printf-style format strings
isnan              Returns true if expr is NaN, false otherwise
nanvl              Returns expr1 if it is not NaN, expr2 otherwise
substring_index    Returns the substring from str before count occurrences of the delimiter
regexp_split       Splits a string into an array based on a regexp
regexp_extract     Extracts the group specified by a regexp
regexp_replaceall  Replaces all matches of a regexp with another string
regexp_like        Returns true/false based on whether a regexp matches
instr              Returns the position of the first occurrence of substr in the given string
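
Once those scripts have been run in the target account, the UDFs can be called like built-in functions, including from Snowpark (sketch; session is a configured com.snowflake.snowpark.Session as in the earlier examples):

// Expected to return 5, the 1-based position of "park" in "Snowpark".
session.sql("SELECT instr('Snowpark', 'park') AS pos").show()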

Links

See the full API documentation here: https://snowflake-labs.github.io/snowpark-extensions/

License

Apache License 2.0


Languages

Scala 82.1%, PLpgSQL 9.4%, Java 8.5%