Issue writing Panda Dataframe back to Impala
waldenrt opened this issue · comments
I have the following code:
from impala.dbapi import connect
from impala.util import as_pandas
import sys
import sqlalchemy
conn = connect(host='impalaprd.tenethealth.net',
port=21050,
use_ssl=True,
auth_mechanism='GSSAPI',
kerberos_service_name='impala')
cursor = conn.cursor()
engine = sqlalchemy.create_engine('impala://', creator=conn)
sql = 'SELECT DBID from ACEDTA.DBINFO LIMIT 1000'
cursor.execute(sql)
mypanda = as_pandas(cursor)
print(mypanda)
mypanda.to_sql('default.dbinfo_dataframe_bulkload',engine)
When I call panda.to_sql, I get the following error:
'HiveServer2Connection' object is not callable
Am I missing a package installed in my environment or something with the code?
Thanks,
Richard
from impala.dbapi import connect
from impala.util import as_pandas
import sqlalchemy
conn = connect(host='impalaprd.tenethealth.net',
port=21050,
use_ssl=True,
auth_mechanism='GSSAPI',
kerberos_service_name='impala')
cursor = conn.cursor()
engine = sqlalchemy.create_engine('impala://', creator=conn)
sql = 'SELECT DBID from ACEDTA.DBINFO LIMIT 1000'
cursor.execute(sql)
mypanda = as_pandas(cursor)
print(mypanda)
mypanda.to_sql('default.dbinfo_dataframe_bulkload', engine)
It' seems you issue is is about connection. In which line the error happens?
I am having the same problem
- python 3.6.10
- sqlalchemy 1.3.18
- impyla 0.16.2
seems like the function expects a connection factory, instead of a single Hive connection.
/home/cdsw/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py in __connect(self, first_connect_check)
654 try:
655 self.starttime = time.time()
--> 656 connection = pool._invoke_creator(self)
657 pool.logger.debug("Created new connection %r", connection)
658 self.connection = connection
/home/cdsw/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py in (crec)
247 argspec = util.get_callable_argspec(self._creator, no_self=True)
248 except TypeError:
--> 249 return lambda crec: creator()
250
251 defaulted = argspec[3] is not None and len(argspec[3]) or 0
TypeError: 'HiveServer2Connection' object is not callable
@rafaelreuber can you tell me what is your sqlalchemy version ?
I managed to make it work, but using the connection url :
engine = sqlalchemy.create_engine('impala://host:port/database?use_ssl=True&auth_mechanism=GSSAPI')
And then assigning String type to a TEXT field using dtype.
from sqlalchemy.types import String
mypanda.to_sql('poc_dashboard_test', engine, if_exists='append', index=False, dtype={'name':String()})