Scanner returning incorrect result on re-runs

Question

pushkarnagpal opened this issue 6 years ago · comments

Hi, I was working with the scanner.
When I run this code for the first time on the server, it gives me the correct number of records. But on every re-run after that, the number of records read changes.

Also, I would appreciate an example of how to use scanner.get for the same task instead.

var records = [];
var latest = '';
var callback = function (err) {
    if (err)
        console.error(err);
    var scanner = client
        .table(tblName)
        .scan({
            batch: '1000',
            maxVersions: 1,
            startRow: latest,
            filter: {
                "type": "FirstKeyOnlyFilter"
            }
        },
            function (err, cells) {
                if (err)
                    console.error(err);
                else{
                    if (cells.length > 1) {
                        cells.splice(0, 1);
                        records = records.concat(cells);
                        latest = cells[cells.length - 1]['key']
                        console.log(cells.length);
                        scanner.delete(callback);
                    }
                    else {
                        console.log(records.length, "length");
                        scanner.delete(function (err) {
                            if (err)
                                console.error(err);
                        })
                    }
                }
            });
}

var scanner = client
    .table(tblName)
    .scan({
        batch: 1000,
        filter: {
            "type": "FirstKeyOnlyFilter"
        }
    }, function (err, cells) {
        if (err)
            console.error(err);
        else {
            if (cells.length > 1) {
                records = records.concat(cells);
                latest = cells[cells.length - 1]['key'];
                console.log(cells.length);
                scanner.delete(callback);
            }
        }
    });

Worms David · Answer 1 · Mon Apr 09 2018 16:17:51 GMT+0800 (China Standard Time)

Please simplify your code so that it is easy to read and illustrates your problem in an obvious manner. If you have one problem, then there should be only one example reproducing the issue, without any additional features such as calling scanner.delete. If you have two problems, then create two issues.

You will find plenty of examples in the test/scanner.coffee test. As you can see, there are two APIs: passing a callback function as the second argument (simple but not scalable, because the whole dataset must fit in memory) or the stream API (more complex).
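To illustrate the memory point, here is a minimal sketch of stream-style consumption that counts cells as they arrive instead of collecting them in an array. It does not talk to HBase: makeFakeScanner is a hypothetical stand-in built on Node's stream.Readable in object mode, and countCells is an illustrative helper name, not part of the hbase library. It assumes the real scanner behaves like a standard object-mode readable stream.

```javascript
const { Readable } = require('stream');

// Hypothetical stand-in for a scanner: an object-mode Readable that emits
// `total` fake cells. This is NOT the real hbase client.
function makeFakeScanner(total) {
  let i = 0;
  return new Readable({
    objectMode: true,
    read() {
      // Push one fake cell per call; push(null) signals the end of the scan.
      this.push(i < total ? { key: 'row-' + i++, column: 'cf:q', $: 'v' } : null);
    }
  });
}

// Count cells as they arrive rather than accumulating them in memory.
function countCells(scanner, done) {
  let count = 0;
  scanner.on('readable', function () {
    let chunk;
    // Drain everything currently buffered; read() returns null when empty.
    while ((chunk = scanner.read()) !== null) {
      count++;
    }
  });
  scanner.on('error', done);
  scanner.on('end', function () {
    done(null, count);
  });
}

// Usage: report the count once the stream has ended.
countCells(makeFakeScanner(2500), function (err, count) {
  if (err) return console.error(err);
  console.log(count);
});
```

With the callback API the same result would require holding every cell in memory first; here only a running counter is kept.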

pushkarnagpal · Answer 2 · Mon Apr 09 2018 17:29:10 GMT+0800 (China Standard Time)

Hi, I've simplified the code.
To explain: the first scanner gives me the first set of records and then invokes the callback. The callback creates a new scanner starting at the last key read, and this repeats recursively until we reach the end of the HBase table.
I had a look at the test cases. Didn't solve my issue.
Following is the code:-

var count = 0;
var latest = '';

var scanner = client.table(tblName).scan({}, function (err, cells) {
    if (cells.length > 1) {
        count += cells.length - 1;
        latest = cells[cells.length - 1]['key'];
        scanner.delete(callback);
    }
});

var callback = function (err) {
    // Start a new scan at the last key seen by the previous one.
    var scanner = client.table(tblName).scan({startRow: latest}, function (err, cells) {
        if (cells.length > 1) {
            count += cells.length - 1;
            latest = cells[cells.length - 1]['key'];
            console.log(cells.length);
            scanner.delete(callback);
        } else {
            // End of table scan.
            console.log(count);
            scanner.delete(function (err) { if (err) console.error(err); });
        }
    });
};
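
For reference, the paging idea itself (restart at the last key, then drop the overlapping row) can be sketched in isolation. This is only a sketch under stated assumptions: the in-memory table and the fetchPage helper are hypothetical stand-ins for a bounded scan, not hbase library calls, and the start row is assumed to be inclusive, as it is in HBase scans.

```javascript
// Hypothetical in-memory "table": sorted row keys standing in for HBase rows.
const table = [];
for (let i = 0; i < 2500; i++) {
  table.push({ key: 'row-' + String(i).padStart(5, '0') });
}

// Hypothetical stand-in for one bounded scan: returns up to `limit` rows
// whose key is >= startRow (the start row is inclusive).
function fetchPage(startRow, limit) {
  return table
    .filter(function (row) { return row.key >= startRow; })
    .slice(0, limit);
}

// Page through the table. Every page after the first begins with the row
// already seen as `latest`, so that row is dropped before merging.
// `limit` must be at least 2, otherwise no page makes progress.
function scanAll(limit) {
  let records = [];
  let latest = '';
  let first = true;
  for (;;) {
    let page = fetchPage(latest, limit);
    if (!first) page = page.slice(1); // skip the row that repeats `latest`
    if (page.length === 0) break;     // nothing new: end of table
    records = records.concat(page);
    latest = page[page.length - 1].key;
    first = false;
  }
  return records;
}

console.log(scanAll(1000).length);
```

Note that only pages after the first carry a duplicated boundary row, so the first page is merged without dropping anything.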

pushkarnagpal · Answer 3 · Mon Apr 09 2018 20:15:31 GMT+0800 (China Standard Time)

com.sun.jersey.spi.container.ContainerResponse mapMappableContainerException
SEVERE: The RuntimeException could not be mapped to a response, re-throwing to the HTTP container
java.lang.NullPointerException

This is the error I found in hbase-rest logs.

Worms David · Answer 4 · Mon Apr 09 2018 23:37:45 GMT+0800 (China Standard Time)

I believe you should use the stream API and let it run the whole scan instead of trying to throttle the query yourself.

pushkarnagpal · Answer 5 · Wed Apr 11 2018 03:04:40 GMT+0800 (China Standard Time)

How do I implement the stream API to get the next records scanned?

var rows = [];
scanner.on('readable', function () {
    var chunk;
    while (chunk = scanner.read()) {
        rows.push(chunk);
    }
});
scanner.on('error', function (err) {
    console.error(err);
});
scanner.on('end', function () {
    console.log(rows.length);
});

Worms David · Answer 6 · Wed Apr 11 2018 03:07:42 GMT+0800 (China Standard Time)

This seems correct. What is the problem?

pushkarnagpal · Answer 7 · Wed Apr 11 2018 03:28:37 GMT+0800 (China Standard Time)

This stops after 1000 rows. How do I get the records after that?

The last comment on this open issue shows the same problem.

Worms David · Answer 8 · Fri Apr 13 2018 01:50:31 GMT+0800 (China Standard Time)

Could you please try again and let me know? I believe scan was broken, and I made a few changes in the latest version.

Worms David · Answer 9 · Wed Aug 29 2018 06:55:21 GMT+0800 (China Standard Time)

Closing since there was no feedback and it seems to have been solved.