The Snowpark Extensions project aims to simplify the migration process from Apache Spark to Snowpark for Scala developers. It provides a set of helper methods and utilities built as extensions on top of the existing Snowpark Scala APIs.
The core goal is to minimize the manual code changes required when migrating from Spark to Snowpark. This is achieved by using Scala's implicit classes to add methods to existing Snowpark classes such as Column, DataFrame, and Session, providing functionality that the Snowpark Scala APIs do not offer out of the box.
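The implicit-class mechanism described above can be sketched in a few lines. This is a minimal, self-contained illustration, not the project's actual code: `Column` here is a hypothetical stand-in case class (not the Snowpark `Column`), and `aliased` is a made-up helper name chosen only for the example.

```scala
// Hypothetical stand-in for a library class we cannot modify
// (this is NOT the Snowpark Column; it only carries an expression string).
final case class Column(expr: String)

object Extensions {
  // The implicit class wraps Column. Once Extensions._ is imported,
  // its methods can be called as if they were defined on Column itself.
  implicit class ColumnExtensions(val c: Column) {
    // Hypothetical helper added "on top of" the stand-in class.
    def aliased(name: String): Column = Column(s"${c.expr} AS $name")
  }
}

object Demo {
  import Extensions._ // brings the extension methods into scope

  def main(args: Array[String]): Unit = {
    val renamed = Column("price").aliased("unit_price")
    println(renamed.expr) // prints: price AS unit_price
  }
}
```

Without the `import Extensions._` line, the call to `aliased` would not compile, which is why a single import is enough to switch the extra methods on or off.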
The Snowpark Extensions project offers the following features:
- Implicit Column Extensions - Additional helper methods for Column to simplify common data transformation tasks
- Implicit DataFrame Extensions - Extra functionality for DataFrame to streamline migrations including things like improved join APIs
- Implicit Session Extensions - Helper utilities for Session to simplify setup and configuration
Because the extensions are implemented as implicits, they layer additional APIs on top of Snowpark without requiring changes to existing Snowpark imports or references.
In some cases, a function is easier to implement by registering a SQL or JavaScript UDF. The code for some of these UDFs is in the scripts folder.
To use the Snowpark Extensions project, simply import the extension classes:
import com.snowflake.snowpark_extensions.Extensions._
This will bring all extended Column, DataFrame, and Session functionalities into scope. You can then utilize the additional methods as if they were available directly on the base classes.
The project uses Maven for building:
mvn clean compile package
This will compile the code and package it into a JAR file for distribution and dependency management.
The output JAR can then be included in any Scala application to leverage the Snowpark Extensions helpers.
The scripts folder includes SQL scripts for the following UDFs:
UDF | Description |
---|---|
array_zip | Returns a merged array of arrays |
conv | Converts num from base from_base to base to_base |
format_string | Returns a formatted string from printf-style format strings |
isnan | Returns true if expr is NaN, or false otherwise |
nanvl | Returns expr1 if it's not NaN, or expr2 otherwise |
substring_index | Returns the substring from str before count occurrences of the delimiter |
regexp_split | Splits into an array based on regexp |
regexp_extract | Extracts the group specified based on regexp |
regexp_replaceall | Replaces all matches to a regexp with another string |
regexp_like | Returns True/False based on whether a regexp matches |
instr | Returns the position of the first occurrence of substr column in the given string |
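To make two of the descriptions above concrete, here is a rough plain-Scala sketch of the behavior described for `conv` and `substring_index`. It is not the project's UDF code (which lives in SQL scripts); the object and method names are hypothetical, the uppercase output of `conv` is an assumption modeled on Spark's behavior, and `substringIndex` only covers positive counts and non-empty delimiters.

```scala
object UdfSemantics {
  // conv: read the digits of `num` in base `fromBase`, then render the
  // same value in base `toBase` (uppercase digits are an assumption).
  def conv(num: String, fromBase: Int, toBase: Int): String =
    BigInt(num, fromBase).toString(toBase).toUpperCase

  // substring_index (positive count only): everything before the
  // count-th occurrence of `delim`; the whole string if there are fewer.
  def substringIndex(str: String, delim: String, count: Int): String = {
    require(count > 0 && delim.nonEmpty, "sketch handles count > 0 and non-empty delimiters")

    @annotation.tailrec
    def loop(from: Int, remaining: Int): String = {
      val idx = str.indexOf(delim, from)
      if (idx < 0) str                               // ran out of delimiters
      else if (remaining == 1) str.substring(0, idx) // reached the count-th one
      else loop(idx + delim.length, remaining - 1)
    }

    loop(0, count)
  }
}
```

For example, `UdfSemantics.conv("255", 10, 16)` yields `"FF"`, and `UdfSemantics.substringIndex("www.snowflake.com", ".", 2)` yields `"www.snowflake"`.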
See the full API documentation here: https://snowflake-labs.github.io/snowpark-extensions/