ufukomer / node-impala

Node Client for Impala using Apache Thrift

Home Page: https://www.npmjs.com/package/node-impala

query returns one row only, my code style is es5

jhsea3do opened this issue · comments

Hi ufukomer,

Here is my ES5 code; it always returns only the first row of the results.
I think it may be caused by the 'pending' state, but how can I get all 50 rows once the pending state has finished?


> var sql = "select system_name from itm.system group by system_name";
> var client = require('node-impala').createClient({"host": "hadoop3"});
> client.query(sql, function(err, data){ console.log('err', err, 'data', data) })
{ state: 'pending' }
> err null data [ [ 'CASCECUP01:KUX' ],
[ { name: 'system_name', type: 'string', comment: '' } ] ]


> client.resultType = 'map'
'map'
> client.query(sql, function(err, data){ console.log('err', err, 'data', data) })
{ state: 'pending' }
> err null data Map { 'system_name' => [ 'CASCECUP01:KUX' ] }


> var client = require('node-impala').createClient({"host": "hadoop3", "resultType": 'map'})
undefined
> client.query(sql, function(err, data){ console.log('err', err, 'data', data) })
{ state: 'pending' }
> err null data Map { 'system_name' => [ 'CASCECUP01:KUX' ] }

Please compare with the same query under impala-shell:

[hadoop3:21000] > select system_name from itm.system group by system_name;
Query: select system_name from itm.system group by system_name
+----------------+
| system_name |
+----------------+
| CASCECUP01:KUX |
| ASCECUP14:KUX |
| ASCECSP02:KUX |
| ASCECUP15:KUX |
| ASCECMP01:KUX |
| ASCECUP10:KUX |
| ASCECUP09:KUX |
| ESCECUP03:KUX |
| DBCECUP02:KUX |
| ASCECUP11:KUX |
| ASCECUP07:KUX |
| DBCECMP02:KUX |
| ASCECUP08:KUX |
| ASCECSP01:KUX |
| FSCECUP01:KUX |
| CASCECMP01:KUX |
| ASCECUP13:KUX |
| ASCECMP02:KUX |
| ASCECUP05:KUX |
| ASCECUP20:KUX |
| ASCECUP02:KUX |
| ASCECUP03:KUX |
| ASCECUP04:KUX |
| ESCECUP01:KUX |
| ESCECUP06:KUX |
| MQCECUP01:KUX |
| MQCECUP06:KUX |
| CESCECUP02:KUX |
| ASCECUP18:KUX |
| ASCECUP06:KUX |
| ASCECUP01:KUX |
| CASCECUP02:KUX |
| MQCECUP03:KUX |
| MQCECUP04:KUX |
| MQCECUP05:KUX |
| CESCECUP01:KUX |
| CASCECUP04:KUX |
| ESCECUP02:KUX |
| ESCECUP05:KUX |
| ASCECUP16:KUX |
| CASCECUP03:KUX |
| DBCECUP01:KUX |
| ASCECUP12:KUX |
| DBCECMP01:KUX |
| ASCECUP17:KUX |
| CFSCECUP01:KUX |
| FSCECUP02:KUX |
| ESCECUP04:KUX |
| MQCECUP02:KUX |
| ASCECUP19:KUX |
+----------------+
Fetched 50 row(s) in 0.78s

@jhsea3do As I understand it, this is a major problem with the Beeswax service. Each INSERT into HDFS creates a new data file. Unfortunately, Beeswax often reads only one of them and only sometimes all of them. I inserted two sample records into sample_08, which means two separate data files:

[Two images, both captioned "sample_08"]

But Beeswax reads only one of them:

// node-impala: output of query (SELECT * FROM sample_08)
[ { code: '10-0000',
    description: 'Yow',
    total_emp: '1112',
    salary: '2000' } ]

So the issue never occurs as long as all the data is kept in a single data file. Of course, that is not a real solution. It seems to me that the only suitable fix is to use HiveServer2 rather than Beeswax. That is not straightforward, though, since I would need to implement a SASL transport similar to the ones in its Java and Python versions.

I would be glad to hear alternative solutions if you have any ideas.

Hi! Do you have other suggestions for using Impala with Node.js?

@tiejian Create a command-line app and use impala-shell through it. I have never tried this in Node.js, but there are many libraries for it on GitHub, e.g. commander.js and cli. I'm not sure whether these tools meet your needs, so do your own search.

If your Impala host is remote, you would probably need a socket (e.g. using Socket.IO) to connect the command-line app running on your local machine to a command-line app running on the remote host. That way you could probably use impala-shell.

If my explanation wasn't clear, please don't hesitate to ask for details.
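
A minimal sketch of the impala-shell approach suggested above, assuming impala-shell is installed on the machine running Node.js and can reach the Impala daemon directly (no socket relay). The helper names parseDelimited and queryViaShell are hypothetical and not part of node-impala; the flags used (-i, -B, --print_header, -q) are standard impala-shell options, and -B output is tab-delimited by default.

```javascript
// Sketch only: shells out to impala-shell instead of going through Beeswax.
var execFile = require('child_process').execFile;

// Parse tab-delimited impala-shell output (first line = column names)
// into an array of plain objects, one per row.
function parseDelimited(output) {
  var lines = output.split(/\r?\n/).filter(function (line) {
    return line.length > 0;
  });
  if (lines.length === 0) return [];
  var columns = lines[0].split('\t');
  return lines.slice(1).map(function (line) {
    var values = line.split('\t');
    var row = {};
    columns.forEach(function (name, i) {
      row[name] = values[i];
    });
    return row;
  });
}

// Run a single query through impala-shell.
// `host` is whatever impalad address you would pass to impala-shell -i.
function queryViaShell(sql, host, callback) {
  var args = ['-i', host, '-B', '--print_header', '-q', sql];
  execFile('impala-shell', args, function (err, stdout) {
    if (err) return callback(err);
    callback(null, parseDelimited(stdout));
  });
}
```

It could be called as queryViaShell(sql, 'hadoop3', function (err, rows) { ... }). All values come back as strings, and a value that itself contains a tab would break this naive parser.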

Hi,

I'm hitting this issue as well. This is a major problem and makes the entire library unusable...

What can we do to help fix it?

Regards

@kwent We should make this library use HiveServer2, similar to the way its Python client does, as I mentioned in the comment above.

Hello everybody, I ran into this problem recently, and I may have found a way to work around it! I am not sure, though, so I need you to verify it. Let's also talk about the cause.
Similar to your case, when I run SELECT * FROM atable limit 10, I get 10 rows:

[
{"a":"a","b":2},
{"a":"a","b":2},
{"a":"a","b":2},
{"a":"a","b":2},
{"a":"a","b":2},
{"a":"a","b":2},
{"a":"b","b":2},
{"a":"b","b":2},
{"a":"b","b":2},
{"a":"b","b":2},
]

But when I run SELECT a,count(*) FROM atable group BY a, I get only 1 row! This is where the problem lies!

[
  {
    "a": "a",
    "count(*)": "6"
  }
]

After some research I found a way to get the expected result!
I run SELECT a,count(*) FROM atable group BY a order by a. The ORDER BY is the key.

[
  {
    "a": "a",
    "count(*)": "6"
  },
  {
    "a": "b",
    "count(*)": "4"
  }
]

Try it! I'm looking forward to your feedback!
@ufukomer
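
For anyone who wants to try the ORDER BY workaround from the last comment, here is a small sketch that appends the clause to a query before it is sent. The helper name withOrderBy is hypothetical and not part of node-impala, and the workaround itself is only what one commenter reported; nobody else in this thread has confirmed it.

```javascript
// Append "ORDER BY <column>" unless the statement already contains an
// ORDER BY clause. Naive on purpose: an ORDER BY inside a subquery also
// counts as "already present".
function withOrderBy(sql, column) {
  var trimmed = sql.trim().replace(/;\s*$/, ''); // drop a trailing semicolon
  if (/\border\s+by\b/i.test(trimmed)) return trimmed;
  return trimmed + ' ORDER BY ' + column;
}

var sql = withOrderBy(
  'select system_name from itm.system group by system_name',
  'system_name'
);

// Then pass it to the client exactly as in the original post:
// client.query(sql, function (err, data) { console.log('err', err, 'data', data); });
```

If the full result set comes back with the sorted query but not with the unsorted one, that would support the commenter's observation.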