googleapis / python-bigquery-dataframes

BigQuery DataFrames

Home Page: https://cloud.google.com/python/docs/reference/bigframes/latest

Error when applying remote function with multiple parameters

NiloFreitas opened this issue

Environment details

Python: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
bigframes==1.4.0
google-cloud-bigquery==3.20.1
ibis==8.0.0
pandas==2.2.2
pyarrow==15.0.2
sqlglot==20.11.0

Steps to reproduce

1. Deploy a remote function with multiple parameters using the bigframes remote_function decorator

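import bigframes.pandas as bpd

# CONNECTION_ID must be set to the name of a BigQuery connection.
@bpd.remote_function(
    [str, str],
    str,
    bigquery_connection=CONNECTION_ID
)
def extract_json_attribute_rf(pred: str, json_attribute: str) -> str:
    # Imported inside the body so it is available where the function runs.
    import json
    return json.loads(pred)[json_attribute]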

2. Get a bigframes DataFrame like this one:

uri                                                 pred
gs://dataproc-metastore-public-binaries/ads_ba...  {"interpretation": " The primary message of th...
gs://dataproc-metastore-public-binaries/ads_ba...  {"interpretation": " The primary message of th...
gs://dataproc-metastore-public-binaries/ads_ba...  {"interpretation": " The primary message of th...

3. Apply the remote function to the DataFrame, passing the additional parameter

Like this:
res_df = df.assign(interpretation=df["pred"].apply(extract_json_attribute_rf, args=("interpretation",)))
Or like this:
res_df = df.assign(interpretation=df["pred"].apply(extract_json_attribute_rf, json_attribute="interpretation"))

This is how the call would be written according to the pandas.Series.apply documentation:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.apply.html
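
For reference, the pandas behavior that both calls rely on can be sketched in plain Python. This is only a stand-in, not pandas or bigframes code: `apply_elementwise` is a hypothetical helper that mimics how pandas.Series.apply passes `args` and extra keyword arguments to the function after each value, and the sample rows are made up.

```python
import json


def extract_json_attribute(pred: str, json_attribute: str) -> str:
    """Local (non-remote) version of the function from step 1."""
    return json.loads(pred)[json_attribute]


def apply_elementwise(values, func, args=(), **kwargs):
    """Hypothetical stand-in for pandas.Series.apply on a plain list.

    Each value is passed as the first argument; `args` and `kwargs`
    are forwarded unchanged after it.
    """
    return [func(value, *args, **kwargs) for value in values]


# Made-up sample rows shaped like the "pred" column.
preds = [
    '{"interpretation": "first message"}',
    '{"interpretation": "second message"}',
]

# The two call styles from step 3 should be equivalent.
by_position = apply_elementwise(
    preds, extract_json_attribute, args=("interpretation",)
)
by_keyword = apply_elementwise(
    preds, extract_json_attribute, json_attribute="interpretation"
)
```

Under pandas semantics both call styles produce the same column, which is the behavior the report expects from bigframes.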

Stack trace

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 res_df = df.assign(interpretation=df["pred"].apply(extract_json_attribute_rf, args=("interpretation",)))

File /opt/conda/lib/python3.10/site-packages/bigframes/core/log_adapter.py:44, in method_logger.<locals>.wrapper(*args, **kwargs)
     42 if api_method_name.startswith("__") or not api_method_name.startswith("_"):
     43     add_api_method(full_method_name)
---> 44 return method(*args, **kwargs)

TypeError: Series.apply() got an unexpected keyword argument 'args'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 res_df = df.assign(interpretation=df["pred"].apply(extract_json_attribute_rf, json_attribute="interpretation"))

File /opt/conda/lib/python3.10/site-packages/bigframes/core/log_adapter.py:44, in method_logger.<locals>.wrapper(*args, **kwargs)
     42 if api_method_name.startswith("__") or not api_method_name.startswith("_"):
     43     add_api_method(full_method_name)
---> 44 return method(*args, **kwargs)

TypeError: Series.apply() got an unexpected keyword argument 'json_attribute'
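
Both traces fail in the same way: the logging wrapper forwards every argument, and the wrapped method rejects the ones it does not declare. A minimal sketch of that mechanism follows, assuming a stand-in class whose `apply` accepts only the function. The names `FakeSeries` and `logging_wrapper` are illustrative and are not the bigframes implementation.

```python
import functools


def logging_wrapper(method):
    """Illustrative wrapper that forwards all arguments unchanged."""

    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        return method(*args, **kwargs)

    return wrapper


class FakeSeries:
    """Stand-in for a Series whose apply() takes only `func`."""

    def __init__(self, values):
        self.values = list(values)

    @logging_wrapper
    def apply(self, func):
        return FakeSeries(func(v) for v in self.values)


series = FakeSeries(["a", "b"])

# A call with only the function works.
upper_values = series.apply(str.upper).values

try:
    series.apply(str.upper, args=("unused",))
    error_message = None
except TypeError as exc:
    # The wrapper accepted `args=`, but apply(self, func) did not.
    error_message = str(exc)
```

The error surfaces at the wrapper's `return method(*args, **kwargs)` line, as in the traces, even though the actual restriction is the signature of the wrapped method.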

Workaround

As a workaround, I have to concatenate both arguments into a single string, pass that as the input, and split it again inside the function, which is not ideal:

@bpd.remote_function(
    [str],
    str,
    bigquery_connection=CONNECTION_ID
)
def extract_json_attribute_rf(input_content: str) -> str:
    import json

    # Split on the last "|||" so a delimiter inside the JSON text cannot break unpacking.
    pred, json_attribute = input_content.rsplit("|||", 1)
    return json.loads(pred)[json_attribute]


# Pack both arguments into one string column, then apply the one-parameter function.
input_remote_function_int = df['pred'] + '|||interpretation'
res_df = df.assign(interpretation=input_remote_function_int.apply(extract_json_attribute_rf))
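
The packing and unpacking logic of this workaround can be checked locally before deploying anything. The sketch below assumes plain Python strings in place of the bigframes Series. `pack` and `unpack_and_extract` are hypothetical names, and the sample value is made up. It splits on the last delimiter, which is one way to keep a "|||" inside the JSON text from breaking the unpacking.

```python
import json

DELIMITER = "|||"


def pack(pred: str, json_attribute: str) -> str:
    """Concatenate both arguments into one string, as the workaround does."""
    return pred + DELIMITER + json_attribute


def unpack_and_extract(input_content: str) -> str:
    """Recover both arguments and return the requested JSON attribute."""
    # rsplit with maxsplit=1 cuts at the last delimiter only, so the
    # attribute name is always the final piece.
    pred, json_attribute = input_content.rsplit(DELIMITER, 1)
    return json.loads(pred)[json_attribute]


# Made-up value that happens to contain the delimiter inside the JSON.
sample = '{"interpretation": "uses ||| as a separator"}'
packed = pack(sample, "interpretation")
result = unpack_and_extract(packed)

# Splitting from the left would cut the JSON text in two instead.
left_piece = packed.split(DELIMITER)[0]
```

This still assumes the attribute name itself never contains the delimiter.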

Hi @NiloFreitas, remote functions do not currently support multiple inputs, but we are working on a solution that will be available soon. You can watch #629 and our release notes at https://cloud.google.com/python/docs/reference/bigframes/latest/changelog for updates.