JuliaDatabases / JDBC.jl

Julia interface to Java database drivers

Repository from GitHub: https://github.com/JuliaDatabases/JDBC.jl

JDBC.load() hangs on load from Netezza data source

metanoid opened this issue

I'm unable to load the results of a SQL query on a Netezza database: the call simply hangs until the Julia process is ended.

It's not clear what the problem is because there are no warnings or errors.

To reproduce:

using DataFrames
using JavaCall
JavaCall.addClassPath("C:/JDBC/nzjdbc.jar")
using JDBC
JDBC.init()
classforname("org.netezza.Driver")
hostname = "netezza_system"
port = "5480"
database = "SYSTEM"
username = ENV["netezza_user"];
password = ENV["netezza_pwd"];
connectionstring = "jdbc:netezza://$(hostname):$(port)/$(database);user=$(username);password=$(password)";
myquery = "SELECT *   FROM EXAMPLE_TABLE  LIMIT 1000"

cnxn = JDBC.Connection(connectionstring)
csr = cursor(cnxn)
execute!(csr, myquery)
src = JDBC.Source(csr)
df = JDBC.load(DataFrame, src)

The final JDBC.load call above never returns a result.

Is there something I can do to diagnose the issue?
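One generic way to narrow down a hang like this is to consume a bounded number of rows and check whether the row iterator ever reports that it is finished. The sketch below is plain Julia with no package dependencies; `probe_rows` is a hypothetical helper, not part of JDBC.jl or Tables.jl.

```julia
# Hypothetical diagnostic helper (not part of JDBC.jl or Tables.jl):
# consume at most `limit` elements from any iterator and report how many
# were seen and whether the iterator signalled the end (iterate -> nothing).
function probe_rows(itr; limit::Integer = 10_000)
    n = 0
    nxt = iterate(itr)
    while nxt !== nothing && n < limit
        n += 1
        _, state = nxt
        nxt = iterate(itr, state)   # ask for the next row
    end
    return (rows_seen = n, exhausted = nxt === nothing)
end
```

For example, `probe_rows(Tables.rows(src); limit = 2_000)` (after `using Tables`, and assuming `src` supports `Tables.rows`) should report at most 1,000 rows and `exhausted = true` for the `LIMIT 1000` query if row iteration terminates correctly. `exhausted = false` at the limit would instead suggest an iterator that never signals the end. Probing consumes rows, so re-run `execute!(csr, myquery)` before trying to load again.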

julia> versioninfo()
Julia Version 1.0.5
Commit 3af96bcefc (2019-09-09 19:06 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 4

Package versions:

  [a93c6f00] DataFrames v0.18.4
  [6042db11] JDBC v0.5.0
  [494afd89] JavaCall v0.7.2

I think the issue is this:

JDBC.load(DataFrame, src) calls Tables.materializer(DataFrame)(src) (in JDBC :: tables.jl)
which calls columntable(src) (from Tables :: namedtuples.jl)
which calls columns(src) (from Tables :: fallbacks.jl)

It is this last call, columns(src), that hangs.
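To make the suspected failure mode concrete: a row-to-column fallback like `columns(src)` cannot return until it has drained the whole row iterator, since it only knows every column's contents once `iterate` returns `nothing`. The simplified sketch below is plain Julia; `rows_to_columns` is hypothetical and is not Tables.jl's actual implementation. It shows the pattern: if the underlying iterator never signals exhaustion, or blocks while fetching a row, the outer loop never finishes.

```julia
# Simplified illustration of a row-to-column fallback (NOT Tables.jl's code).
# It must consume every row before it can return a column table.
function rows_to_columns(rows, names::Tuple{Vararg{Symbol}}, types::Tuple)
    length(names) == length(types) ||
        throw(ArgumentError("names and types must have the same length"))
    # one empty, typed vector per column
    cols = ntuple(i -> Vector{types[i]}(), length(names))
    # this loop ends only when iterate(rows, state) returns `nothing`
    for row in rows
        for i in eachindex(cols)
            push!(cols[i], row[i])   # append field i of this row to column i
        end
    end
    # a NamedTuple of vectors is a Tables.jl "column table"
    return NamedTuple{names}(cols)
end
```

For example, `rows_to_columns([(1, "a"), (2, "b")], (:id, :name), (Int, String))` returns a NamedTuple whose `id` field is `[1, 2]` and whose `name` field is `["a", "b"]`. Because a NamedTuple of vectors is a valid column table, it can in principle be passed to `DataFrame` in a single call instead of pushing rows one at a time (not verified against DataFrames v0.18.4).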

By contrast, iterating over the rows directly prints all of the data:

for i in rows(csr)
    println(i)
end

Yeah, I was about to say: can you just use the low-level functions?

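# conn is a low-level JDBC connection (a JConnection), e.g. from
# conn = DriverManager.getConnection(connectionstring)
stmt = createStatement(conn)
rs = executeQuery(stmt, myquery)
for r in rs
    # read columns by 1-based index or by name; FIELDNAME is a placeholder
    println(getInt(r, 1), " ", getString(r, "FIELDNAME"))
end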

Yes, that works. Now I just need to figure out how to get the output into a DataFrame.

The following seems to work for my needs:

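using Tables  # Tables.schema needs the Tables module in scope
df_schema = Tables.schema(src)
# empty DataFrame with one typed column per result-set column
# (types/names constructor as available in DataFrames v0.18)
df = DataFrame(collect(df_schema.types), collect(df_schema.names))
for i in rows(csr)  # append each result row in column order
    push!(df, i)
end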