databricks / koalas

Koalas: pandas API on Apache Spark


Does Koalas support reading Hive tables by default?

amznero opened this issue · comments

Hi,

I'm trying to use Koalas to load a Hive table on a remote cluster. According to https://koalas.readthedocs.io/en/latest/reference/io.html#spark-metastore-table, the ks.read_table API can read a Spark metastore table, but it fails when I use ks.read_table to read my table.

import pandas as pd
import numpy as np
import databricks.koalas as ks
from pyspark.sql import SparkSession

koalas_df = ks.read_table("xxx.yyy")

Error log:

AnalysisException: "Table or view not found: `xxx`.`yyy`;;\n'UnresolvedRelation `xxx`.`yyy`\n"
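For reference, a small diagnostic I would run (just a sketch; it assumes the session Koalas uses is simply whatever SparkSession.builder.getOrCreate() returns, and that the static conf can be read back this way):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# "hive" means the Hive metastore is wired up; "in-memory" means only the
# local Spark catalog (spark-warehouse) is visible, so a remote Hive table
# would not be found.
print(spark.conf.get("spark.sql.catalogImplementation"))
print([db.name for db in spark.catalog.listDatabases()])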

However, I can load the table successfully by using PySpark + pandas + PyArrow directly.

Some snippets:

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

spark_df = spark.read.table("xxx")
pandas_df = spark_df.toPandas()
...
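Based on that, a possible workaround (just a sketch, not verified against Koalas internals; it assumes that SparkSession.builder.getOrCreate() inside Koalas returns the Hive-enabled session created beforehand) would be to create that session before the first Koalas call:

import databricks.koalas as ks
from pyspark.sql import SparkSession

# Create the Hive-enabled session first, before any Koalas operation, so that
# Koalas' internal getOrCreate() reuses it instead of building a plain session
# without Hive support.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

koalas_df = ks.read_table("xxx.yyy")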

And I checked the source code of read_table:

def read_table(name: str, index_col: Optional[Union[str, List[str]]] = None) -> DataFrame:

It uses default_session (without passing any extra options) to load the table, but default_session does not set the enableHiveSupport option:

def default_session(conf=None):
    if conf is None:
        conf = dict()
    should_use_legacy_ipc = False
    if LooseVersion(pyarrow.__version__) >= LooseVersion("0.15") and LooseVersion(
        pyspark.__version__
    ) < LooseVersion("3.0"):
        conf["spark.executorEnv.ARROW_PRE_0_15_IPC_FORMAT"] = "1"
        conf["spark.yarn.appMasterEnv.ARROW_PRE_0_15_IPC_FORMAT"] = "1"
        conf["spark.mesos.driverEnv.ARROW_PRE_0_15_IPC_FORMAT"] = "1"
        conf["spark.kubernetes.driverEnv.ARROW_PRE_0_15_IPC_FORMAT"] = "1"
        should_use_legacy_ipc = True
    builder = spark.SparkSession.builder.appName("Koalas")
    for key, value in conf.items():
        builder = builder.config(key, value)
    # Currently, Koalas is dependent on such join due to 'compute.ops_on_diff_frames'
    # configuration. This is needed with Spark 3.0+.
    builder.config("spark.sql.analyzer.failAmbiguousSelfJoin", False)
    if LooseVersion(pyspark.__version__) >= LooseVersion("3.0.1") and is_testing():
        builder.config("spark.executor.allowSparkContext", False)
    session = builder.getOrCreate()
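If I read this correctly, enableHiveSupport() is essentially a shortcut for setting spark.sql.catalogImplementation to "hive", so another possible workaround (a sketch; it assumes default_session from databricks.koalas.utils can be called directly, and that no SparkSession has been created yet, since this is a static conf) would be to pass that conf explicitly:

from databricks.koalas.utils import default_session

# Assumption: no SparkSession exists yet. spark.sql.catalogImplementation is a
# static conf, so it is only honored when the session is first created.
session = default_session({"spark.sql.catalogImplementation": "hive"})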

So I'm a little confused about ks.read_table: where does it load tables from?
Maybe it only points to the local spark-warehouse?