snowflakedb / snowflake-cli

Snowflake CLI is an open-source command-line tool explicitly designed for developer-centric workloads in addition to SQL operations.

Home Page: https://docs.snowflake.com/developer-guide/snowflake-cli-v2/index

SNOW-1137351: deploy snow cli python vectorized UDF

peddadap opened this issue

Description

Does the CLI support creating Python vectorized UDFs?

For example: https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-batch

If one wanted to use snow-cli constructs to create a Python vectorized UDF, is that something snow-cli currently supports?

Context

We use SnowCLI for the CI/CD deployments of our Python code; the snow snowpark commands let us deploy objects remotely.

It would be nice to support the full range of objects.

It's possible that nothing special is needed to make it work, and decorating your Python code as described in the docs you linked will be enough.

We'll prepare a test case confirming that vectorized UDFs work, but it will take us some time. In the meantime, I encourage you to just try it and share your observations if you run into any trouble.

All of the above applies to SnowCLI v2.0.0+.

Thank you for the response. I tested vectorized UDFs using SnowCLI v2.0.0+. There are two variants.

The first variant, Python vectorized UDFs: https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-batch

For these, the method inputs and outputs bound in the snow cli "snowflake.yml" were primitive types, and deployment worked fine.
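A minimal sketch of what the vectorized_sample.vctr_add_inputs handler referenced in the snowflake.yml further down might contain for this first variant. It assumes x arrives as column 0 and y as column 1 of the batch DataFrame (vectorized UDFs expose arguments by position) and that the function is declared with returns: float; the function body itself is illustrative. The fallback definition of vectorized is an assumption for testing outside Snowflake only; inside Snowflake the real decorator from _snowflake is used.

```python
import pandas

try:
    # Only importable inside Snowflake's Python runtime.
    from _snowflake import vectorized
except ImportError:
    # Local stand-in (assumption, for testing only): returns the function unchanged.
    def vectorized(**_kwargs):
        def wrap(func):
            return func
        return wrap


@vectorized(input=pandas.DataFrame)
def vctr_add_inputs(df: pandas.DataFrame) -> pandas.Series:
    # Arguments are positional columns: df[0] holds x (integer), df[1] holds y (float).
    # Returning one value per input row matches a scalar "returns: float" declaration.
    return df[0] + df[1]


if __name__ == "__main__":
    # Simulate one batch the way Snowflake would hand it over: one column per argument.
    batch = pandas.DataFrame({0: [1, 2, 3], 1: [0.5, 1.5, 2.5]})
    print(vctr_add_inputs(batch).tolist())
```

Because the stand-in decorator is a no-op, the same module can be unit-tested in CI before snow snowpark deploys it.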

The second variant: vectorized UDTFs, whose return type is a table, i.e. the DataFrame itself.

https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-tabular-vectorized

In a 'yml' specification for snow-cli deployment of UDTFs, what would be the equivalent return type when it is a pandas DataFrame?

For example, in the YAML below, what would the return value be if I were sending the same DataFrame back?

All inputs are automatically converted into pandas Series, and a single pandas DataFrame is passed to the method.

definition_version: 1
snowpark:
  project_name: project_xyz
  stage_name: python
  src: '/python/udf'
  functions:
    - name: vctr_add_inputs
      handler: vectorized_sample.vctr_add_inputs
      signature:
        - type: integer
          name: x
        - type: float
          name: y
      returns: ??

What would be the equivalent type for returning a table? Can returns have a map structure?

The link below gives hints about the Python-to-SQL type mapping, but I'm not clear on what exactly the return type should be in the 'yml' for a UDTF:

https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-batch#type-support

SQL CREATE statement; I need the equivalent 'yml' for the return value in a snow-cli deployment:

create or replace function summary_stats(id varchar, col1 float, col2 float, col3 float, col4 float, col5 float)
returns table (column_name varchar, count int, mean float, std float, min float, q1 float, median float, q3 float, max float)
language python
runtime_version=3.8
packages=('pandas')
handler='handler'
as $$
from _snowflake import vectorized
import pandas

class handler:
    @vectorized(input=pandas.DataFrame)
    def end_partition(self, df):
        # use describe() to get the summary statistics
        result = df.describe().transpose()
        # add a column at the beginning for the column ids
        result.insert(loc=0, column='column_name', value=['col1', 'col2', 'col3', 'col4', 'col5'])
        return result
$$;
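To check this handler locally before deploying, a sketch like the following runs end_partition on a hypothetical sample partition and compares the output width with the nine columns declared in returns table (...). It assumes, as the docs example implies, that Snowflake maps the returned DataFrame columns to the declared columns by position, since pandas names the quartile columns 25%, 50% and 75% rather than q1, median and q3. The fallback vectorized decorator is a local-only assumption.

```python
import pandas

try:
    # Only importable inside Snowflake's Python runtime.
    from _snowflake import vectorized
except ImportError:
    # Local stand-in (assumption): leaves the decorated method unchanged.
    def vectorized(**_kwargs):
        def wrap(func):
            return func
        return wrap


class handler:
    @vectorized(input=pandas.DataFrame)
    def end_partition(self, df):
        # describe() skips non-numeric columns such as id by default
        result = df.describe().transpose()
        # prepend the column ids, one per summarized input column
        result.insert(loc=0, column="column_name", value=["col1", "col2", "col3", "col4", "col5"])
        return result


# Column order declared in the CREATE FUNCTION statement's returns table (...).
DECLARED_COLUMNS = ["column_name", "count", "mean", "std", "min", "q1", "median", "q3", "max"]

if __name__ == "__main__":
    # Hypothetical partition: one varchar id plus five float columns.
    sample = pandas.DataFrame(
        {"id": ["a"] * 4, **{f"col{i}": [1.0, 2.0, 3.0, 4.0] for i in range(1, 6)}}
    )
    out = handler().end_partition(sample)
    print(list(out.columns))
    assert len(out.columns) == len(DECLARED_COLUMNS), "output width must match returns table (...)"
```

A width mismatch here would most likely surface as a deployment-time or runtime error in Snowflake, so catching it locally keeps the CI/CD loop short.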

I think this will work:
returns: "table(column_name varchar, count int, mean float, std float, min float, q1 float, median float, q3 float, max float)"
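If the returns value in snowflake.yml needs to stay in sync with the handler's output, a small hypothetical helper (not part of Snowflake CLI) can build the table(...) string from (name, SQL type) pairs. The column list below is copied from the CREATE statement in this thread.

```python
def table_return_type(columns):
    """Build a 'table(...)' return-type string from (name, sql_type) pairs."""
    body = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return f"table({body})"


# Copied from the returns table (...) clause of summary_stats.
SUMMARY_STATS_COLUMNS = [
    ("column_name", "varchar"),
    ("count", "int"),
    ("mean", "float"),
    ("std", "float"),
    ("min", "float"),
    ("q1", "float"),
    ("median", "float"),
    ("q3", "float"),
    ("max", "float"),
]

if __name__ == "__main__":
    # Prints the line to use as the function-level returns: entry in snowflake.yml
    print(f'returns: "{table_return_type(SUMMARY_STATS_COLUMNS)}"')
```

The same column list could also be reused when naming the DataFrame columns in end_partition, so the YAML and the handler cannot drift apart.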

Closing due to no response and no explicit bug. Please feel free to reopen if you see that something is missing.