burnash / gspread

Google Sheets Python API

Home Page: https://docs.gspread.org

Cannot use `session` with `Client()` in v6.0.0

liam-clifford opened this issue · comments

Hey there, I am using this in a function and I think the recent changes here might be causing things to break (though I'm not 100% sure). @lavigne958 do you mind taking a look at #1159?

Specifically: `def authorize(credentials: Credentials, client_factory: Type[Client] = Client):`

Here's a snippet of the error that I'm getting around this:

---> 47 client = Client(credentials,session)
     48 # auth
     50 if len(x) == 44:

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-477f2a82-afcb-4603-8095-3ff1ba3c7ab7/lib/python3.9/site-packages/gspread/client.py:41, in Client.__init__(self, auth, http_client)
     38 def __init__(
     39     self, auth: Credentials, http_client: HTTPClientType = HTTPClient
     40 ) -> None:
---> 41     self.http_client = http_client(auth)

TypeError: 'AssertionSession' object is not callable
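If I'm reading the traceback right, the second positional argument to `Client()` is now a class that gets instantiated as `http_client(auth)`, so passing a session instance there is what triggers the error. A rough sketch of the difference (assuming the old 5.x signature was roughly `Client(auth, session=None)`):

# what my function does (worked on gspread 5.x, where the 2nd argument was a requests Session)
client = Client(credentials, session)

# what v6.0.0 expects: the 2nd argument is a class, called internally as http_client(auth)
from gspread.http_client import HTTPClient
client = Client(credentials, HTTPClient)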

The original function is being used in a Spark environment.

Here is the code:
import json
import gspread
import pandas as pd
import re
import time
from gspread import Client
from authlib.integrations.requests_client import AssertionSession
from oauth2client.service_account import ServiceAccountCredentials

def read_google_sheet(x, y, *args, **kwargs):
    # x --> Spreadsheet name OR Spreadsheet key
    # y --> Sheet name
    # args --> Name of Temp Table
    # kwargs --> cell_range / make_columns_unique / append_to_col_header / clean_column_names (optional)

    scope = ['https://spreadsheets.google.com/feeds',
             'https://www.googleapis.com/auth/drive']

    credentials = ServiceAccountCredentials.from_json_keyfile_dict(json_object)

    def create_assertion_session(scopes, subject=None):
        token_url = json_object['token_uri']
        issuer = json_object['client_email']
        key = json_object['private_key']
        key_id = json_object['private_key_id']
        header = {'alg': 'RS256'}

        if key_id:
            header['kid'] = key_id
        # Google puts scope in payload
        claims = {'scope': ' '.join(scopes)}

        return AssertionSession(
            grant_type=AssertionSession.JWT_BEARER_GRANT_TYPE,
            token_endpoint=token_url,
            issuer=issuer,
            audience=token_url,
            claims=claims,
            subject=subject,
            key=key,
            header=header,
        )

    session = create_assertion_session(scope)
    client = Client(credentials, session)

    if len(x) == 44:
        try:
            sht = client.open_by_key(x)
        except:
            sht = client.open(x)
    else:
        sht = client.open(x)

    # Added lines below 3.17.22
    ss = client.open_by_key(sht.id)

    if len(re.sub("[^0-9]", "", str(y))) == len(str(y)):
        ws = sht.get_worksheet_by_id(int(y))
    else:
        ws = sht.worksheet(y)

    # Added lines below 3.17.22
    if kwargs is not None and 'cell_range' in kwargs and len(re.sub("[^0-9]", "", str(y))) == len(str(y)):
        range_data = f'{ws.title}{kwargs["cell_range"]}'
    elif kwargs is not None and 'cell_range' in kwargs:
        range_data = str(kwargs['cell_range'])
    else:
        range_data = ws.title

    values = ss.values_get(range_data)
    values = values['values']
    pd_df = pd.DataFrame(values)
    pd_df.fillna("", inplace=True)  # convert None --> ''
    # clean up dataset

    headers = pd_df.iloc[0]
    headers = [x.lower() for x in headers]  # convert series to list
    # get column headers from 0 index (ie. first) row

    cols = []

    if kwargs is not None and 'make_columns_unique' in kwargs:
        for x in range(0, len(headers)):
            if headers[:x].count(headers[x]) >= 1:
                cols.append(f'{headers[x]}{headers[:x].count(headers[x])}')
            else:
                cols.append(str(headers[x]))
    else:
        cols = headers

    pd_df = pd.DataFrame(pd_df.values[1:], columns=cols)
    # make columns unique --> if duplicate columns, then append the column index to end of respective column name

    if kwargs is not None and 'append_to_col_header' in kwargs:
        pd_df.columns = [f'{kwargs["append_to_col_header"]}{col_name}' for col_name in pd_df.columns]
        cols = [x.lower() for x in pd_df.columns.values]
        # append kwargs['append_to_col_header'] to beginning of column name

    try:
        spark_df = spark.createDataFrame(pd_df)
    except:
        cleaned_cols = []
        for x in range(0, len(cols)):
            if cols[:x].count(cols[x]) >= 1:
                cleaned_cols.append(f'{cols[x]}{cols[:x].count(cols[x])}')
            else:
                cleaned_cols.append(str(cols[x]))

        pd_df = pd.DataFrame(pd_df.values[1:], columns=cleaned_cols)
        spark_df = spark.createDataFrame(pd_df)
        # if there are still duplicates, then map the column index to the end of the column name by default

    if kwargs is not None and 'clean_column_names' in kwargs:
        pandas_before = spark_df.toPandas()

        pandas_before.columns = [re.sub("[^a-zA-Z0-9]+", "_", x) for x in pandas_before.columns]
        headers = [re.sub('__', '_', re.sub('___', '_', x)) for x in pandas_before.columns.values]
        headers = [x[:len(x) - 3] if x.endswith('___', 0, len(x)) else x.lower() for x in headers]
        headers = [x[:len(x) - 2] if x.endswith('__', 0, len(x)) else x.lower() for x in headers]
        headers = [x[:len(x) - 1] if x.endswith('_', 0, len(x)) else x.lower() for x in headers]
        pandas_before.columns = headers

        cols = []
        headers = [x.lower() for x in pandas_before.columns.values]
        for x in range(0, len(headers)):
            if headers[:x].count(headers[x]) >= 1:
                cols.append(f'{headers[x]}{headers[:x].count(headers[x])}')
            else:
                cols.append(str(headers[x]))

        pandas_before.columns = cols
        pd_df = pandas_before

    if args:
        spark_df = spark.createDataFrame(pd_df)
        spark_df.createOrReplaceTempView(args[0])

    return spark_df, pd_df, sht, ws, ss

Hi, I'm sorry to read that things break with the latest release.

We made a new major release, which allowed us to introduce breaking changes.

As such:

  • the object gspread.Client is now a class that only handles spreadsheet actions
  • we created a new class that handles pure HTTP calls: gspread.http_client.HTTPClient
  • when we initialize a new gspread.Client it takes 2 arguments (see the sketch below):
    • auth: the credentials
    • http_client: the actual HTTP client class type to instantiate
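A minimal sketch of what that construction looks like in v6.0.0 (the service-account file path and scopes are placeholders):

import gspread
from gspread.http_client import HTTPClient
from google.oauth2.service_account import Credentials

scopes = ["https://www.googleapis.com/auth/spreadsheets",
          "https://www.googleapis.com/auth/drive"]
creds = Credentials.from_service_account_file("service_account.json", scopes=scopes)  # placeholder path

# http_client is a class, not an instance: gspread instantiates it as http_client(auth)
client = gspread.Client(auth=creds, http_client=HTTPClient)
# note: there is no argument here for passing your own Session object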

This makes me realize you can't pass your own Session object anymore!

This is a bug we need to address. It will be fixed shortly; we simply need to add an extra optional argument (a Session object) to the functions/methods that init a new client, so you can pass your own session object.

We'll fix this ASAP.

Wow, thanks for the quick reply back + for the added context. A lot of this goes over my head, but I'm glad to hear you were able to discover a bug. If there's anything I can do to help (re: providing any additional info), please do let me know. Again, really appreciate the help + keeping this project going!

My pleasure. We introduced an obvious regression; thank you for catching it and letting us know.

Yes, I updated the code and I will do a new release later today. In order to use your own session object, please use the following API from gspread:

import gspread

client = gspread.authorize(None, gspread.HTTPClient, Mysessionobject)
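For the snippet above, that would presumably look something like this once the release is out (the exact parameter names may differ; `credentials` and `session` are the objects built in the original function):

import gspread

# replaces `client = Client(credentials, session)` from the original function;
# following the example above, the custom session is passed as the third argument
client = gspread.authorize(credentials, gspread.HTTPClient, session)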

No thank you, I have everything I need already, thanks to the above details + code example + investigation.

Until a fix for v6.0.0 is released, you can use v5.12.4 with

pip install gspread==5.12.4