jupyter-naas / abi

The AI system for your everyday business. WIP. Star the repository to stay updated.

Home Page: https://naas.ai

Gsheet - Request too large error

FlorentLvr opened this issue

Build a function that splits large requests when sending data to Gsheet, to avoid the "request too large" error.

New function added to utils/data and updated in notebooks:

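import pandas as pd
from naas_drivers import gsheet  # assumed source of `gsheet`; the import is not shown in the original snippet


def send_data_to_gsheet(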
    df: pd.DataFrame,
    df_init: pd.DataFrame, 
    spreadsheet_url: str, 
    sheet_name: str, 
    chunck_size: int = 100000
):
    """
    This function compares two dataframes and, if they differ, sends the main dataframe (df) to a Google Sheet, splitting it into several requests when it is too large.

    :param df: The main dataframe to be sent to Google Sheet.
    :param df_init: The initial dataframe to be compared with the main dataframe.
    :param spreadsheet_url: The URL of the Google Sheet to send data to.
    :param sheet_name: The name of the sheet in the Google Sheet to send data to.
    :param chunck_size: The maximum number of cells (rows x columns) to send per request. Default is 100000.
    """
    # Compare dataframes: only rows that occur exactly once across both frames remain in df_check
    df_check = pd.concat([df.astype(str), df_init.astype(str)]).drop_duplicates(keep=False)
    
    # Update or Do nothing
    if len(df_check) > 0:
        df_size = len(df) * len(df.columns)
        if df_size < chunck_size:
            gsheet.connect(spreadsheet_url).send(sheet_name=sheet_name, data=df, append=False)
            print("✅ DataFrame successfully sent to Google Sheets!")
        else:
            # Keep at least one row per request so the loop below always advances
            max_rows = max(1, chunck_size // len(df.columns))
            start = 0
            limit = start + max_rows
            gsheet.connect(spreadsheet_url).send(sheet_name=sheet_name, data=df[start:limit], append=False)
            print(f"✅ Rows {start} to {limit} successfully added to Google Sheets!")
            start += max_rows
            while start < len(df):
                limit = start + max_rows
                if limit > len(df):
                    limit = len(df)
                gsheet.connect(spreadsheet_url).send(sheet_name=sheet_name, data=df[start:limit], append=True)
                print(f"✅ Rows {start} to {limit} successfully added to Google Sheets!")
                start += max_rows
    else:
        print("Nothing to update in Google Sheets!")
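
The chunk arithmetic in the function above could also be pulled out into a small helper so it can be checked without touching Google Sheets. This is only a sketch: the name `iter_chunk_bounds` and its parameters are hypothetical and not part of utils/data. It yields the same `start`/`limit` pairs as the loop above, plus the `append` flag to pass to `send`.

```python
from typing import Iterator, Tuple


def iter_chunk_bounds(
    n_rows: int, n_cols: int, chunk_size: int = 100000
) -> Iterator[Tuple[int, int, bool]]:
    """Yield (start, limit, append) for slices of at most chunk_size cells.

    Hypothetical helper, not part of the original code base.
    """
    # Rows per request; at least one, even if a single row exceeds chunk_size cells.
    max_rows = max(1, chunk_size // max(1, n_cols))
    for start in range(0, n_rows, max_rows):
        # The last slice is clipped to the end of the data.
        limit = min(start + max_rows, n_rows)
        # The first slice overwrites the sheet, later slices are appended.
        yield start, limit, start > 0
```

Inside `send_data_to_gsheet`, each tuple would then drive one call of the form `gsheet.connect(spreadsheet_url).send(sheet_name=sheet_name, data=df[start:limit], append=append)`. One difference from the original: an empty dataframe produces no slices here, so nothing is sent at all.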