SNOW-1137351: deploy snow cli python vectorized UDF
peddadap opened this issue
Description
Does the CLI support creating Python vectorized UDFs?
For example: https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-batch
If one wanted to use snow-cli constructs to create a Python vectorized UDF, is that something snow-cli currently supports?
Context
We use SnowCLI for our CI/CD deployments of code written in Python. The `snow snowpark` commands let us deploy objects remotely.
It would be nice if the CLI supported the full range of objects.
It's possible that nothing special is needed to make it work, and that decorating your Python code as described in the docs you linked will be enough.
We'll prepare a test case confirming that vectorized UDFs work, but it will take us some time. In the meantime, I encourage you to just try it and share your observations if you run into any trouble.
All of the above applies to SnowCLI v2.0.0+.
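Decorating a handler as the linked docs describe might look roughly like the following. This is a minimal sketch, not a confirmed SnowCLI recipe: the module and function names (`vectorized_sample.vctr_add_inputs`) are borrowed from the `snowflake.yml` later in this thread, and the `try/except` fallback is a hypothetical shim added only so the file can be imported and unit-tested locally, where `_snowflake` does not exist.

```python
import pandas

try:
    # Available only inside the Snowflake Python runtime.
    from _snowflake import vectorized
except ImportError:
    # Hypothetical local shim (not a Snowflake API): a no-op decorator so the
    # handler can be imported and exercised outside Snowflake.
    def vectorized(**_kwargs):
        def wrap(func):
            return func
        return wrap


@vectorized(input=pandas.DataFrame)
def vctr_add_inputs(df):
    # Snowflake passes a batch of rows as one DataFrame whose columns are the
    # UDF arguments in positional order (column 0 = x, column 1 = y).
    return df[0] + df[1]


# Local smoke test with a two-row batch.
print(list(vctr_add_inputs(pandas.DataFrame({0: [1, 2], 1: [0.5, 1.5]}))))
```

Because the SQL signature stays scalar (one value per argument), the `signature` entries in `snowflake.yml` can remain primitive types; only the handler body works on batches.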
Thank you for the response. I tested vectorized functions using SnowCLI v2.0.0+. Of the two variants:
1. Python vectorized UDFs: https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-batch
The inputs and outputs declared in the `snowflake.yml` bindings were primitive types, and this worked fine.
2. Vectorized Python UDTFs, whose return type is a table, i.e. the DataFrame itself:
https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-tabular-vectorized
In a `snowflake.yml` specification for deploying UDTFs with snow-cli, what is the equivalent return type when the handler returns a pandas DataFrame?
For example, in the YAML below, what would the return value be if I were sending the same DataFrame back?
All inputs are automatically converted into pandas Series, and a final pandas DataFrame is passed to the method.
```yaml
definition_version: 1
snowpark:
  project_name: project_xyz
  stage_name: python
  src: '/python/udf'
  functions:
    - name: vctr_add_inputs
      handler: vectorized_sample.vctr_add_inputs
      signature:
        - type: integer
          name: x
        - type: float
          name: y
      returns: ??
        - type: integer
```
What would be the equivalent type for returning a table? Can `returns` have a map structure?
The link below hints at the likely Python-to-SQL type mapping, but I'm not clear on exactly what the return type should be in the YAML for a UDTF:
https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-batch#type-support
SQL CREATE statement (I need the equivalent YAML for the return value when deploying with snow-cli):
```sql
create or replace function summary_stats(id varchar, col1 float, col2 float, col3 float, col4 float, col5 float)
returns table (column_name varchar, count int, mean float, std float, min float, q1 float, median float, q3 float, max float)
language python
runtime_version=3.8
packages=('pandas')
handler='handler'
as $$
from _snowflake import vectorized
import pandas

class handler:
    @vectorized(input=pandas.DataFrame)
    def end_partition(self, df):
        # using describe function to get the summary statistics
        result = df.describe().transpose()
        # add a column at the beginning for column ids
        result.insert(loc=0, column='column_name', value=['col1', 'col2', 'col3', 'col4', 'col5'])
        return result
$$;
```
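One way to confirm that the handler's output lines up with the `returns table (...)` declaration is to run the same pandas logic locally on sample data. This is a sketch under stated assumptions: the `@vectorized` decorator is dropped because `_snowflake` only exists inside Snowflake, and the sample values and column labels are made up for illustration.

```python
import pandas


class handler:
    # Same logic as the UDTF handler above, minus @vectorized(input=pandas.DataFrame).
    def end_partition(self, df):
        # describe() summarises only the numeric columns, so the varchar `id`
        # column is skipped; transpose() yields one row per input column.
        result = df.describe().transpose()
        result.insert(loc=0, column='column_name',
                      value=['col1', 'col2', 'col3', 'col4', 'col5'])
        return result


# Made-up partition: one id value and five float columns.
sample = pandas.DataFrame({
    'id': ['a', 'a', 'a'],
    'col1': [1.0, 2.0, 3.0],
    'col2': [4.0, 5.0, 6.0],
    'col3': [7.0, 8.0, 9.0],
    'col4': [10.0, 11.0, 12.0],
    'col5': [13.0, 14.0, 15.0],
})

out = handler().end_partition(sample)
print(list(out.columns))
# ['column_name', 'count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
```

The nine output columns appear in the same order as the nine columns declared in `returns table (...)`, with pandas' `25%`, `50%` and `75%` corresponding to `q1`, `median` and `q3`.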
I think this will work:
```yaml
returns: "table(column_name varchar, count int, mean float, std float, min float, q1 float, median float, q3 float, max float)"
```
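If the suggested `returns` string works (the reply above is hedged with "I think"), it has to be kept in sync by hand with the columns the handler produces. A small generator can reduce drift between the YAML and the handler. This is a minimal sketch; `table_return_type` and `SUMMARY_COLUMNS` are hypothetical names, not part of SnowCLI.

```python
# Hypothetical helper: render the snowflake.yml `returns` value for a table
# function from a single list of (column name, SQL type) pairs.
def table_return_type(columns):
    """Render [(name, sql_type), ...] as a SQL `table(...)` return type."""
    body = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return f"table({body})"


# Columns of the summary_stats UDTF, in the order the handler emits them.
SUMMARY_COLUMNS = [
    ("column_name", "varchar"),
    ("count", "int"),
    ("mean", "float"),
    ("std", "float"),
    ("min", "float"),
    ("q1", "float"),
    ("median", "float"),
    ("q3", "float"),
    ("max", "float"),
]

print(f'returns: "{table_return_type(SUMMARY_COLUMNS)}"')
```

Printed, this reproduces the `returns:` line suggested above, so the same column list could also drive a local check of the handler's output.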
Closing due to no response and no explicit bug. Please feel free to reopen if you see that something is missing.