pylivy sessions with hive
iliakliuchnikov opened this issue
IlyaK commented
Hi!
I'm trying to start Livy sessions with Hive:
```python
from livy import LivySession
import datetime

LIVY_URL = "http://mylivy:80"

with LivySession.create(
    LIVY_URL,
    jars=[
        "gs://mybacket/hotfix/jars/iceberg-spark3-runtime-0.9.0.jar",
        "gs://mybacket/hotfix/jars/spark_etl-1.0-SNAPSHOT.jar",
        "gs://mybacket/hotfix/jars/spark-bigquery-with-dependencies_2.12-0.17.3.jar",
    ],
    py_files=["gs://mybacket/hotfix/dags/package.zip"],
    num_executors=1,
    name=f"add-attribution-window-hours-{datetime.datetime.now()}",
    spark_conf={
        "spark.kubernetes.container.image.pullPolicy": "Always",
        "spark.kubernetes.driverEnv.ETL_ENV": "prod",
        "spark.executorEnv.ETL_ENV": "prod",
        "spark.kubernetes.driverEnv.HIVE_CONF_DIR": "/opt/spark/conf/hive-site",
        "spark.sql.warehouse.dir": "gs://mybacket/hive/",
        "spark.sql.catalogImplementation": "hive",
        "spark.kubernetes.driver.secrets.hive-site": "/opt/spark/conf/hive-site",
        "spark.executor.memory": "16g",
        "spark.executor.cores": "6",
        "spark.eventLog.enabled": "true",
        "spark.kubernetes.namespace": "default",
    },
) as session:
    # Run some code on the remote cluster
    session.run("spark.sql('show databases;').show(20, False)")
    # Retrieve the result
    # local_df = session.download("df")
    # local_df.show()
```
but `show databases` in Hive always returns an empty result (as if it were using a local, empty Hive metastore). In the log I see it trying to start Hive:
```
21/12/01 12:22:24 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('gs://mybucket/hive/').
21/12/01 12:22:24 INFO SharedState: Warehouse path is 'gs://mybucket/hive/'.
21/12/01 12:22:26 INFO CodeGenerator: Code generated in 267.63079 ms
21/12/01 12:22:26 INFO CodeGenerator: Code generated in 11.710265 ms
21/12/01 12:22:26 INFO CodeGenerator: Code generated in 17.16884 ms
```
When I start a Livy batch with the same spark_conf, it always works fine and I have access to all tables; the log looks like this (unlike the session log above, it reports finding hive-site.xml and connecting to the metastore):
```
21/12/01 09:29:03 INFO HiveConf: Found configuration file file:/opt/spark/conf/hive-site/hive-site.xml
21/12/01 09:29:03 INFO HiveUtils: Initializing HiveMetastoreConnection version 2.3.7 using Spark classes.
21/12/01 09:29:03 INFO HiveConf: Found configuration file file:/opt/spark/conf/hive-site/hive-site.xml
21/12/01 09:29:03 INFO SessionState: Created HDFS directory: /tmp/hive/root
21/12/01 09:29:03 INFO SessionState: Created local directory: /tmp/root
21/12/01 09:29:03 INFO SessionState: Created HDFS directory: /tmp/hive/root/c42cf693-a56b-44d4-8b4f-5b67ed85c721
21/12/01 09:29:03 INFO SessionState: Created local directory: /tmp/root/c42cf693-a56b-44d4-8b4f-5b67ed85c721
21/12/01 09:29:03 INFO SessionState: Created HDFS directory: /tmp/hive/root/c42cf693-a56b-44d4-8b4f-5b67ed85c721/_tmp_space.db
21/12/01 09:29:03 INFO HiveClientImpl: Warehouse location for Hive client (version 2.3.7) is gs://mybucket/hive/
21/12/01 09:29:04 INFO metastore: Trying to connect to metastore with URI thrift://myhive-metastore.us-north1-a.c.myproject.internal:9083
21/12/01 09:29:04 INFO metastore: Opened a connection to metastore, current connections: 1
21/12/01 09:29:04 INFO metastore: Connected to metastore.
```
How do I correctly set the Hive config for Livy sessions?
IlyaK commented
Answer: in the Livy config, just set

```
livy.repl.enable-hive-context = true
```

and Hive is enabled in sessions.
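For reference, this is a server-side setting, not something passed through `spark_conf` when creating the session: it belongs in the Livy server's `livy.conf` and takes effect after restarting Livy. A minimal sketch (the `/etc/livy/conf` path is an assumption; the actual location depends on your installation):

```
# /etc/livy/conf/livy.conf  (path is installation-dependent; restart Livy after editing)

# Make interactive (repl) sessions start a Hive-enabled SparkSession,
# so hive-site.xml is picked up the same way it is for batches.
livy.repl.enable-hive-context = true
```

This explains the difference in behavior: batches run a full spark-submit and load hive-site.xml, while interactive sessions only do so when this flag is enabled.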