Gsheet - Request too large error
FlorentLvr opened this issue
Florent commented
Build a function that handles large requests when sending data to Gsheet, so that big dataframes no longer trigger the "Request too large" error.
Florent commented
New function added to utils/data and updated in notebooks:
import pandas as pd

# NOTE: `gsheet` is assumed to be the Google Sheets connector already imported in utils/data.


def send_data_to_gsheet(
    df: pd.DataFrame,
    df_init: pd.DataFrame,
    spreadsheet_url: str,
    sheet_name: str,
    chunck_size: int = 100000
):
    """
    Compare two dataframes and, if they differ, send the main dataframe to a Google Sheet.
    Dataframes larger than the chunk size are sent in several requests.
    :param df: The main dataframe to be sent to Google Sheet.
    :param df_init: The initial dataframe to be compared with the main dataframe.
    :param spreadsheet_url: The URL of the Google Sheet to send data to.
    :param sheet_name: The name of the sheet in the Google Sheet to send data to.
    :param chunck_size: The maximum number of cells (rows x columns) sent per request. Default is 100000.
    """
    # Compare dataframes: only rows that are not duplicated in the concatenation remain
    df_check = pd.concat([df.astype(str), df_init.astype(str)]).drop_duplicates(keep=False)
    # Update or do nothing
    if len(df_check) > 0:
        df_size = len(df) * len(df.columns)
        if df_size < chunck_size:
            # Small enough to send in a single request
            gsheet.connect(spreadsheet_url).send(sheet_name=sheet_name, data=df, append=False)
            print("✅ DataFrame successfully sent to Google Sheets!")
        else:
            # Rows per request (at least 1, so the loop below always advances)
            max_rows = max(1, int(chunck_size / len(df.columns)))
            start = 0
            limit = start + max_rows
            # The first chunk overwrites the sheet
            gsheet.connect(spreadsheet_url).send(sheet_name=sheet_name, data=df[start:limit], append=False)
            print(f"✅ Rows {start} to {limit} successfully added to Google Sheets!")
            start += max_rows
            # The remaining chunks are appended
            while start < len(df):
                limit = min(start + max_rows, len(df))
                gsheet.connect(spreadsheet_url).send(sheet_name=sheet_name, data=df[start:limit], append=True)
                print(f"✅ Rows {start} to {limit} successfully added to Google Sheets!")
                start += max_rows
    else:
        print("Nothing to update in Google Sheets!")
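To check the chunking logic without calling Google Sheets, the row boundaries can be computed on their own. This is only a minimal sketch: `iter_chunk_bounds` is a hypothetical helper that is not part of utils/data, and it assumes the chunk size is a maximum number of cells (rows x columns) per request, as in the function above.

```python
from typing import Iterator, Tuple


def iter_chunk_bounds(
    n_rows: int, n_cols: int, chunk_size: int = 100000
) -> Iterator[Tuple[int, int, bool]]:
    """Yield (start, limit, append) for each request (hypothetical helper)."""
    # Rows per request so that rows * columns stays within chunk_size (at least 1 row)
    max_rows = max(1, chunk_size // max(1, n_cols))
    for start in range(0, n_rows, max_rows):
        limit = min(start + max_rows, n_rows)
        # Only the first request overwrites the sheet; later ones append
        yield start, limit, start > 0


# 250 rows x 10 columns with at most 1000 cells per request gives 100 rows per chunk
bounds = list(iter_chunk_bounds(n_rows=250, n_cols=10, chunk_size=1000))
print(bounds)
```

Each tuple maps to one `send` call on `df[start:limit]` with the matching `append` flag. Unlike the function above, this sketch yields nothing for an empty dataframe, so that case would need separate handling.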