Does Koalas support reading Hive tables by default?
amznero opened this issue
Hi,
I'm trying to use Koalas to load a Hive table on a remote cluster. According to https://koalas.readthedocs.io/en/latest/reference/io.html#spark-metastore-table, I can use the ks.read_table API to read a Spark metastore table, but the call fails when I try to read the table.
import pandas as pd
import numpy as np
import databricks.koalas as ks
from pyspark.sql import SparkSession
koalas_df = ks.read_table("xxx.yyy")
Error log:
AnalysisException: "Table or view not found: `xxx`.`yyy`;;\n'UnresolvedRelation `xxx`.`yyy`\n"
However, I can load it successfully by directly using pyspark+pandas+pyarrow.
Some snippets:
from pyspark.sql import SparkSession
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
spark_df = spark.read.table("xxx")
pandas_df = spark_df.toPandas()
...
And I checked some of the source code:
koalas/databricks/koalas/namespace.py
Line 556 in e971d6f
It uses default_session() (without any extra option configuration) to load the table, but default_session does not set the enableHiveSupport option:
koalas/databricks/koalas/utils.py
Lines 433 to 456 in e971d6f
So I'm a little confused about ks.read_table: where does it load tables from? Does it only see the default Spark warehouse?