adaltas / node-hbase

Asynchronous HBase client for NodeJs using REST

Home Page: https://hbase.js.org

Scanner returning incorrect result on re-runs

pushkarnagpal opened this issue

Hi, I was working with the scanner.
When I run this code for the first time on the server, it returns the correct number of records. But on every re-run after that, the number of records read changes.

Also, I would appreciate an example of using scanner.get for the same task instead.

var latest = '';
var records = []; // accumulates the cells returned by every scan
var callback = function (err) {
    if (err)
        console.error(err);
    var scanner = client
        .table(tblName)
        .scan({
            batch: 1000,
            maxVersions: 1,
            startRow: latest,
            filter: {
                "type": "FirstKeyOnlyFilter"
            }
        },
            function (err, cells) {
                if (err)
                    console.error(err);
                else{
                    if (cells.length > 1) {
                        cells.splice(0, 1);
                        records = records.concat(cells);
                        latest = cells[cells.length - 1]['key']
                        console.log(cells.length);
                        scanner.delete(callback);
                    }
                    else {
                        console.log(records.length, "length");
                        scanner.delete(function (err) {
                            if (err)
                                console.error(err);
                        })
                    }
                }
            });
}

var scanner = client
    .table(tblName)
    .scan({
        batch: 1000,
        filter: {
            "type": "FirstKeyOnlyFilter"
        }
    }, function (err, cells) {
        if (err)
            console.error(err);
        else {
            if (cells.length > 1) {
                records = records.concat(cells);
                latest = cells[cells.length - 1]['key'];
                console.log(cells.length);
                scanner.delete(callback);
            }
        }
    });

Please simplify your code so that it is easy to read and illustrates your problem in an obvious manner. If you have one problem, there should be only one example reproducing it, without any additional feature such as calling scanner.delete. If you have two problems, create two issues.

You will find plenty of examples in the test/scanner.coffee test. As you can see, there are two APIs: passing a callback function as the second argument (simple but not scalable, because the whole dataset must fit in memory) or the stream API (more complex).
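
To make the trade-off between the two styles concrete without a running HBase, here is a minimal sketch that uses a plain Node.js object-mode Readable as a stand-in for the scanner. The names fakeScanner, collectAll, and countRows are hypothetical and are not part of node-hbase; only the 'readable', 'error', and 'end' events and read() are standard Node.js stream API.

```javascript
const { Readable } = require('stream');

// Hypothetical stand-in for a scanner: an object-mode stream emitting `n` rows.
function fakeScanner(n) {
  let i = 0;
  return new Readable({
    objectMode: true,
    read() {
      // Push one row per call; push(null) signals the end of the scan.
      this.push(i < n ? { key: 'row-' + String(i++).padStart(6, '0') } : null);
    }
  });
}

// Callback style: buffer every row, then hand the whole array over at once.
// Memory use grows with the size of the table.
function collectAll(scanner, done) {
  const rows = [];
  scanner.on('readable', function () {
    let row;
    while ((row = scanner.read()) !== null) rows.push(row);
  });
  scanner.on('error', done);
  scanner.on('end', function () { done(null, rows); });
}

// Stream style: handle each row as it arrives and keep only a running count.
// Memory use stays roughly constant however large the table is.
function countRows(scanner, done) {
  let count = 0;
  scanner.on('readable', function () {
    while (scanner.read() !== null) count++;
  });
  scanner.on('error', done);
  scanner.on('end', function () { done(null, count); });
}
```

With a real scanner the consuming code would look the same, assuming the scanner behaves as a standard Readable in object mode.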

Hi, I've simplified the code.
To explain: the first scanner returns the first set of records. Its callback then creates a new scanner starting at the last key read, and this repeats recursively until the end of the HBase table is reached.
I had a look at the test cases, but they didn't solve my issue.
The code follows:

var count = 0;
var latest = '';

// First scan: starts at the beginning of the table.
var scanner = client.table(tblName).scan({}, function (err, cells) {
    if (err) return console.error(err);
    if (cells.length > 1) {
        count += cells.length - 1;
        latest = cells[cells.length - 1]['key'];
        scanner.delete(callback);
    }
});

// Follow-up scans: restart at the last key seen until nothing new comes back.
var callback = function (err) {
    if (err) return console.error(err);
    var scanner = client.table(tblName).scan({startRow: latest}, function (err, cells) {
        if (err) return console.error(err);
        if (cells.length > 1) {
            count += cells.length - 1;
            latest = cells[cells.length - 1]['key'];
            console.log(cells.length);
            scanner.delete(callback);
        } else {
            // End of table scan.
            console.log(count);
            scanner.delete(function (err) { if (err) console.error(err); });
        }
    });
};

com.sun.jersey.spi.container.ContainerResponse mapMappableContainerException
SEVERE: The RuntimeException could not be mapped to a response, re-throwing to the HTTP container
java.lang.NullPointerException

This is the error I found in hbase-rest logs.
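
For reference, the paging scheme described above (restart each scan at the last key seen and drop the repeated first row) can be sketched in isolation. This is an illustration only: table, scanPage, and countAllRows are hypothetical names, the table is an in-memory array rather than HBase, and the sketch assumes each returned item corresponds to one row and that the start row is inclusive.

```javascript
// Hypothetical in-memory table: rows kept in sorted key order, as HBase stores them.
const table = [];
for (let i = 0; i < 2500; i++) {
  table.push({ key: 'row-' + String(i).padStart(6, '0') });
}

// Hypothetical stand-in for one scan: up to `limit` rows with key >= startRow.
// The start row is inclusive, which is why later pages repeat one row.
function scanPage(startRow, limit) {
  return table
    .filter(function (row) { return row.key >= startRow; })
    .slice(0, limit);
}

// Page through the table, restarting each scan at the last key seen.
// `limit` should be at least 2, otherwise a page can never advance.
function countAllRows(limit) {
  let count = 0;
  let latest = '';
  let first = true;
  for (;;) {
    let page = scanPage(latest, limit);
    // Every page after the first begins with the row that ended the previous page.
    if (!first) page = page.slice(1);
    if (page.length === 0) return count;
    count += page.length;
    latest = page[page.length - 1].key;
    first = false;
  }
}

console.log(countAllRows(1000)); // prints 2500
```

The loop stops when a page holds nothing beyond the repeated row, so the count does not depend on the page size.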

I believe you should use the stream API and let it run the whole scan instead of trying to throttle the query yourself.

How do I implement the stream API to get the next records scanned?

var rows = [];
scanner.on('readable', function () {
    var chunk;
    while (chunk = scanner.read()) {
        rows.push(chunk);
    }
});
scanner.on('error', function (err) {
    console.error(err);
});
scanner.on('end', function () {
    console.log(rows.length);
});

This seems correct. What's your problem?

This stops after 1000 rows. How do I get the records after that?

This open issue had the last comment showing the same problem

Could you please try again and let me know? I believe scan was broken, and I made a few changes in the latest version.

Closing since there was no feedback and it seems to have been solved.