googleapis / python-bigquery-dataframes

BigQuery DataFrames

Home Page:https://cloud.google.com/python/docs/reference/bigframes/latest

`groupby(as_index=False).agg()` causes an error in the Google Colab environment

m-wakatsuru opened this issue

When I called groupby with as_index=False together with agg() in Google Colab, I got an error.
I have already confirmed that each one works on its own: groupby with the as_index parameter but without agg(), and agg() without the as_index parameter.

Is this the intended behavior of bigframes, or is it a bug?

Environment details

  • OS type and version: Linux(Google Colab)
  • Python version: 3.10.12
  • pip version: 23.1.2
  • bigframes version: 0.14.1

Steps to reproduce

  1. Copy and paste the code example below into a Google Colab notebook.
  2. Replace the {PROJECT_ID}, {DATASET_ID}, and {TABLE_NAME} placeholders with your own values.

Code example

import bigframes.pandas as bpd

bpd.options.bigquery.project = "{PROJECT_ID}"
df = bpd.read_gbq("{PROJECT_ID}.{DATASET_ID}.{TABLE_NAME}")
df_g = df.groupby("column_x", as_index=False).agg({
    "column_y":["min","median","max"]
})
df_g.head()

Stack trace

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-a11b52d5f92f> in <cell line: 1>()
----> 1 df_g = df.groupby("column_x", as_index=False).agg({
      2     "column_y":["min","median","max"]
      3 })
      4 df_g

/usr/local/lib/python3.10/dist-packages/bigframes/core/log_adapter.py in wrapper(*args, **kwargs)
     39         if api_method_name.startswith("__") or not api_method_name.startswith("_"):
     40             add_api_method(api_method_name)
---> 41         return method(*args, **kwargs)
     42 
     43     return wrapper

/usr/local/lib/python3.10/dist-packages/bigframes/core/groupby/__init__.py in agg(self, func, **kwargs)
    243                 return self._agg_string(func)
    244             elif utils.is_dict_like(func):
--> 245                 return self._agg_dict(func)
    246             elif utils.is_list_like(func):
    247                 return self._agg_list(func)

/usr/local/lib/python3.10/dist-packages/bigframes/core/log_adapter.py in wrapper(*args, **kwargs)
     39         if api_method_name.startswith("__") or not api_method_name.startswith("_"):
     40             add_api_method(api_method_name)
---> 41         return method(*args, **kwargs)
     42 
     43     return wrapper

/usr/local/lib/python3.10/dist-packages/bigframes/core/groupby/__init__.py in _agg_dict(self, func)
    287         )
    288         if want_aggfunc_level:
--> 289             agg_block = agg_block.with_column_labels(
    290                 utils.combine_indices(
    291                     pd.Index(column_labels),

/usr/local/lib/python3.10/dist-packages/bigframes/core/blocks.py in with_column_labels(self, value)
    618         label_list = value.copy() if isinstance(value, pd.Index) else pd.Index(value)
    619         if len(label_list) != len(self.value_columns):
--> 620             raise ValueError(
    621                 f"The column labels size `{len(label_list)} ` should equal to the value"
    622                 + f"columns size: {len(self.value_columns)}."

ValueError: The column labels size `3 ` should equal to the valuecolumns size: 4.
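
The final frame of the trace is a plain length check: the number of new column labels must match the number of value columns. The sketch below reproduces that mismatch in pure Python. It is a toy stand-in, not the bigframes implementation: the function body and the `value_columns` naming are hypothetical, and the idea that the grouping key `column_x` is counted as a fourth value column when `as_index=False` is an assumption that the trace itself does not confirm.

```python
def with_column_labels(value_columns, labels):
    """Toy stand-in for the size check seen in the traceback (hypothetical)."""
    if len(labels) != len(value_columns):
        raise ValueError(
            f"The column labels size `{len(labels)}` should equal the value "
            f"columns size: {len(value_columns)}."
        )
    return dict(zip(labels, value_columns))


aggregations = {"column_y": ["min", "median", "max"]}

# One (column, aggregation) label per requested aggregate: 3 labels.
labels = [(col, fn) for col, fns in aggregations.items() for fn in fns]

# Assumption: with as_index=False the grouping key stays among the value
# columns, so there are 4 value columns for only 3 labels.
value_columns = ["column_x"] + [f"{col}_{fn}" for col, fn in labels]

try:
    with_column_labels(value_columns, labels)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If that assumption holds, the count of 4 in the error message would be the three aggregates plus the grouping key.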

Thank you for your report. This is indeed a bug when as_index=False is combined with agg(). I have started work on a fix: #273.
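
For comparison, here is a minimal sketch of what plain pandas returns for the same call, along with a possible workaround until the fix lands. It uses pandas rather than bigframes so that it runs without a BigQuery project, and the sample data is made up. Whether the `reset_index()` workaround behaves the same way in bigframes 0.14.1 is an assumption, based only on the report that `agg()` works without `as_index`.

```python
import pandas as pd

# Made-up sample data standing in for the BigQuery table.
df = pd.DataFrame(
    {
        "column_x": ["a", "a", "b"],
        "column_y": [1, 2, 3],
    }
)

aggregations = {"column_y": ["min", "median", "max"]}

# The call from the report: the grouping key is kept as a regular column.
result = df.groupby("column_x", as_index=False).agg(aggregations)

# Possible workaround: aggregate with the default as_index=True,
# then move the grouping key back into the columns.
workaround = df.groupby("column_x").agg(aggregations).reset_index()

print(result.columns.tolist())
print(workaround.columns.tolist())
```

In pandas both results have four columns: the grouping key followed by the three aggregates of `column_y`.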