Error when applying remote function with multiple parameters
NiloFreitas opened this issue
Environment details
Python: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
bigframes==1.4.0
google-cloud-bigquery==3.20.1
ibis==8.0.0
pandas==2.2.2
pyarrow==15.0.2
sqlglot==20.11.0
Steps to reproduce
1. Deploy a remote function with multiple parameters using the bigframes remote_function decorator
import bigframes.pandas as bpd

@bpd.remote_function(
    [str, str],
    str,
    bigquery_connection=CONNECTION_ID,
)
def extract_json_attribute_rf(pred: str, json_attribute: str) -> str:
    import json
    return json.loads(pred)[json_attribute]
2. Get a bigframes dataframe like this one
| uri | pred |
|---|---|
| gs://dataproc-metastore-public-binaries/ads_ba... | {"interpretation": " The primary message of th... |
| gs://dataproc-metastore-public-binaries/ads_ba... | {"interpretation": " The primary message of th... |
| gs://dataproc-metastore-public-binaries/ads_ba... | {"interpretation": " The primary message of th... |
3. Apply this remote function to a dataframe, passing the additional parameter
Like this:
res_df = df.assign(interpretation=df["pred"].apply(extract_json_attribute_rf, args=("interpretation",)))
Or like this:
res_df = df.assign(interpretation=df["pred"].apply(extract_json_attribute_rf, json_attribute="interpretation"))
This is how the function would be called according to the pandas.Series.apply documentation:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.apply.html
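For reference, the convention relied on here is that pandas forwards `args` and any extra keyword arguments to the function for each element. Below is a plain-Python stand-in for that behaviour. The helper `apply_elementwise` and the sample rows are made up for illustration and only mimic the semantics; this is neither pandas nor bigframes code.

```python
import json


def extract_json_attribute(pred: str, json_attribute: str) -> str:
    """Local, undecorated version of the function from step 1."""
    return json.loads(pred)[json_attribute]


def apply_elementwise(values, func, args=(), **kwargs):
    """Hypothetical stand-in: call func(value, *args, **kwargs) per element."""
    return [func(value, *args, **kwargs) for value in values]


# Made-up sample rows, shaped like the "pred" column above.
preds = ['{"interpretation": "first"}', '{"interpretation": "second"}']

# Extra argument passed positionally, as with args=("interpretation",).
positional = apply_elementwise(preds, extract_json_attribute, args=("interpretation",))

# Extra argument passed by keyword, as with json_attribute="interpretation".
keyword = apply_elementwise(preds, extract_json_attribute, json_attribute="interpretation")
```

Both call styles are expected to give the same result, which is what the two `apply` calls in step 3 assume.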
Stack trace
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
----> 1 res_df = df.assign(interpretation=df["pred"].apply(extract_json_attribute_rf, args=("interpretation",)))
File /opt/conda/lib/python3.10/site-packages/bigframes/core/log_adapter.py:44, in method_logger.<locals>.wrapper(*args, **kwargs)
42 if api_method_name.startswith("__") or not api_method_name.startswith("_"):
43 add_api_method(full_method_name)
---> 44 return method(*args, **kwargs)
TypeError: Series.apply() got an unexpected keyword argument 'args'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
----> 1 res_df = df.assign(interpretation=df["pred"].apply(extract_json_attribute_rf, json_attribute="interpretation"))
File /opt/conda/lib/python3.10/site-packages/bigframes/core/log_adapter.py:44, in method_logger.<locals>.wrapper(*args, **kwargs)
42 if api_method_name.startswith("__") or not api_method_name.startswith("_"):
43 add_api_method(full_method_name)
---> 44 return method(*args, **kwargs)
TypeError: Series.apply() got an unexpected keyword argument 'json_attribute'
Workaround
As a workaround, I have to concatenate both values into a single input string and split it inside the function, which is not ideal:
@bpd.remote_function(
    [str],
    str,
    bigquery_connection=CONNECTION_ID,
)
def extract_json_attribute_rf(input_content: str) -> str:
    import json
    # input_content has the form "<pred JSON>|||<attribute name>"
    parts = input_content.split("|||")
    pred, json_attribute = parts[0], parts[1]
    return json.loads(pred)[json_attribute]

input_remote_function_int = df["pred"] + "|||interpretation"
res_df = df.assign(interpretation=input_remote_function_int.apply(extract_json_attribute_rf))
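One weakness of the `|||` separator is that it breaks if the separator ever appears in the data. A possible variant, sketched here as plain Python without the `remote_function` decorator, packs both values into a single JSON array string instead. The helper names `pack_args` and `extract_json_attribute_packed` are made up for illustration, and the sample value is invented.

```python
import json


def pack_args(pred: str, json_attribute: str) -> str:
    """Encode both arguments as one JSON array string."""
    return json.dumps([pred, json_attribute])


def extract_json_attribute_packed(input_content: str) -> str:
    """Decode the packed string, then look up the attribute in pred."""
    pred, json_attribute = json.loads(input_content)
    return json.loads(pred)[json_attribute]


# Made-up sample value that happens to contain the "|||" separator.
sample_pred = '{"interpretation": "a ||| b"}'
result = extract_json_attribute_packed(pack_args(sample_pred, "interpretation"))
```

This only covers the unpacking side. Building the packed string column inside BigQuery DataFrames is not shown and would need its own approach, so plain concatenation may still be the simpler choice when the separator cannot occur in the data.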
Hi @NiloFreitas, remote functions do not currently support multiple inputs, but we are working on a solution that will be available soon. You can follow #629 and our release notes at https://cloud.google.com/python/docs/reference/bigframes/latest/changelog for updates.