tjake / Solandra

Solandra = Solr + Cassandra

Query finding matching documents but not returning any results

pcoleman opened this issue

We are using Solandra on a 16 core server, currently with 18 shards. When we run a query the response has 0 results, but the number of documents found is > 0. If we run a query using isShard=true we get results back for a single shard.
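The symptom described above can be checked mechanically. This is a minimal sketch, assuming the query is sent with `wt=json` (the thread itself uses `wt=javabin`) and that the parsed body has Solr's usual `response.numFound` and `response.docs` fields. The function name and the sample values are made up for illustration.

```python
def found_but_empty(response):
    """Return True when the response reports matches but carries no documents."""
    body = response.get("response", {})
    num_found = body.get("numFound", 0)
    docs = body.get("docs", [])
    return num_found > 0 and len(docs) == 0


# Hypothetical parsed response; the numbers are illustrative only.
sample = {"response": {"numFound": 42, "start": 0, "docs": []}}
print(found_but_empty(sample))
```

A request sent with `rows=0` would also trigger this check, so it is only meaningful for queries that ask for at least one row.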

These are the settings from solandra.properties

solandra.maximum.docs.per.shard = 524288

solandra.index.id.reserve.size = 32768

solandra.shards.at.once = 8

Do you know what might be causing this? Are we missing an obvious parameter?

Thanks

Anything in the logs?

No errors in the logs, here are the last two lines:

INFO 14:12:23,702 [datafiniti] webapp=/solandra path=/select params={fl=name,key&q=city:houston&ids=[us/de/houston/2419williamsvillerd],[us/de/houston/4074williamsvillerd],[us/de/houston/1155pinest],[us/de/houston/77millst],[us/de/houston/1098messobovrd],[us/de/houston/432schoolst],[us/de/houston/234blairspondrd],[us/de/houston/567thistlewoodrd],[us/de/houston/3880williamsvillerd],[us/de/houston/4075gunandrodclubrd]&isShard=true&wt=javabin&version=2} status=0 QTime=1

INFO 14:12:23,703 [datafiniti] webapp=/solandra path=/select params={fl=name&wt=javabin&q=city:houston&version=2} status=0 QTime=1781

An update on the issue:

I have downloaded and installed the most recent Solandra build on a test server.

Server specs:
8 cores
32 GB of memory (though we are only allocating 16GB for Solandra)

Using the default settings in solandra.properties, we added 2,000,000 documents to ensure there were two shards on the same machine. When we query this data set we see the same behavior: Solandra finds all the matching documents but still does not return any results. I have copied the logs with debug logging turned on.

For a query that has not been cached:

DEBUG 10:28:55,004 core: df
DEBUG 10:28:55,005 Adding shard(df): 10.1.10.200:8983/solandra/df0
DEBUG 10:28:55,005 Adding shard(df): 10.1.10.200:8983/solandra/df1
DEBUG 10:28:55,014 Fetching 0 Docs
INFO 10:28:55,015 [df] webapp=/solandra path=/select params={fl=key,score&start=0&q=province:az&isShard=true&wt=javabin&fsv=true&rows=10&version=2} hits=0 status=0 QTime=3
INFO 10:28:55,821 GC for ParNew: 258 ms, 586012000 reclaimed leaving 2122387984 used; max is 16955473920
DEBUG 10:28:58,034 Fetching 10 Docs
DEBUG 10:28:58,035 Going to bulk load 10 documents
DEBUG 10:28:58,099 Document read took: 63ms
INFO 10:28:58,099 [df] webapp=/solandra path=/select params={fl=key,score&start=0&q=province:az&isShard=true&wt=javabin&fsv=true&rows=10&version=2} hits=99470 status=0 QTime=3087
DEBUG 10:28:58,101 Document read took: 1ms
DEBUG 10:28:58,102 Document read took: 1ms
DEBUG 10:28:58,104 Document read took: 1ms
DEBUG 10:28:58,105 Document read took: 1ms
DEBUG 10:28:58,107 Document read took: 2ms
DEBUG 10:28:58,108 Document read took: 1ms
DEBUG 10:28:58,109 Document read took: 1ms
DEBUG 10:28:58,110 Document read took: 1ms
DEBUG 10:28:58,112 Document read took: 1ms
DEBUG 10:28:58,113 Document read took: 1ms
DEBUG 10:28:58,118 Fetching 0 Docs
INFO 10:28:58,118 [df] webapp=/solandra path=/select params={isShard=true&wt=javabin&q=province:az&ids=[us/az/yuma/1152s4thave],[us/az/tempe/208sriverdr],[us/az/mundspark/475pinewoodblvd],[us/az/phoenix/2338wstellaln],[us/az/tucson/3341wwildwooddr],[us/az/surprise/15128wbellrd],[us/az/phoenix/3222egeorgiaave],[us/az/lakehavasucity/2250catamarandr],[us/az/huachucacity/264shuachucablvd],[us/az/tucson/6161sparkave]&version=2} status=0 QTime=1
INFO 10:28:58,119 [df] webapp=/solandra path=/select params={wt=javabin&q=province:az&version=2} status=0 QTime=3115

For a query that has been cached:

DEBUG 10:27:36,350 core: df
INFO 10:27:36,351 ShardInfo for df has expired
INFO 10:27:36,353 Found reserved shard1(106758077800188110322537822484278066430):178410 TO 180224
DEBUG 10:27:36,353 Adding shard(df): 10.1.10.200:8983/solandra/df0
DEBUG 10:27:36,353 Adding shard(df): 10.1.10.200:8983/solandra/df1
DEBUG 10:27:36,359 Fetching 0 Docs
INFO 10:27:36,360 [df] webapp=/solandra path=/select params={fl=key,score&start=0&q=province:ak&isShard=true&wt=javabin&fsv=true&rows=10&version=2} hits=0 status=0 QTime=2
DEBUG 10:27:36,362 Fetching 10 Docs
DEBUG 10:27:36,363 Found doc in cache
INFO 10:27:36,363 [df] webapp=/solandra path=/select params={fl=key,score&start=0&q=province:ak&isShard=true&wt=javabin&fsv=true&rows=10&version=2} hits=14707 status=0 QTime=5
DEBUG 10:27:36,363 Found doc in cache
DEBUG 10:27:36,363 Found doc in cache
DEBUG 10:27:36,363 Found doc in cache
DEBUG 10:27:36,364 Found doc in cache
DEBUG 10:27:36,364 Found doc in cache
DEBUG 10:27:36,364 Found doc in cache
DEBUG 10:27:36,364 Found doc in cache
DEBUG 10:27:36,364 Found doc in cache
DEBUG 10:27:36,365 Found doc in cache
DEBUG 10:27:36,365 Found doc in cache
DEBUG 10:27:36,369 Fetching 0 Docs
INFO 10:27:36,369 [df] webapp=/solandra path=/select params={isShard=true&wt=javabin&q=province:ak&ids=[us/ak/fairbanks/1483ballainerd],[us/ak/anchorage/4451etudorrd],[us/ak/anchorage/600cordovast],[us/ak/anchorage/6048e6thave],[us/ak/anchorage/940tyonekdr],[us/ak/fairbanks/3800universityaves],[us/ak/kenai/47189sherwoodcir],[us/ak/anchorage/12801oldsewardhwy],[us/ak/anchorage/8400raintreecir],[us/ak/juneau/9150skywoodln]&version=2} status=0 QTime=1
INFO 10:27:36,370 [df] webapp=/solandra path=/select params={wt=javabin&q=province:ak&version=2} status=0 QTime=20
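To compare the per-shard requests with the top-level request in logs like the ones above, the request lines can be parsed into fields. This is a rough sketch that assumes the line layout seen in this thread (`path=... params={...} [hits=N] status=N QTime=N`); the regular expression and the function name are illustrative and are not part of Solr or Solandra.

```python
import re

# Matches the request lines shown in the logs; "hits=" is optional because
# some of the lines above do not include it.
REQUEST_RE = re.compile(
    r"path=(?P<path>\S+) params=\{(?P<params>.*)\}"
    r"(?: hits=(?P<hits>\d+))? status=(?P<status>\d+) QTime=(?P<qtime>\d+)"
)


def parse_request_line(line):
    """Return a dict of request fields, or None if the line is not a request."""
    match = REQUEST_RE.search(line)
    if match is None:
        return None
    # Naive split on "&"; a query value containing "&" would be mis-split.
    params = dict(
        pair.split("=", 1)
        for pair in match.group("params").split("&")
        if "=" in pair
    )
    hits = match.group("hits")
    return {
        "path": match.group("path"),
        "params": params,
        "is_shard": params.get("isShard") == "true",
        "hits": int(hits) if hits is not None else None,
        "status": int(match.group("status")),
        "qtime_ms": int(match.group("qtime")),
    }


# Two lines copied from the uncached query log above.
lines = [
    "INFO 10:28:58,099 [df] webapp=/solandra path=/select params={fl=key,score&start=0&q=province:az&isShard=true&wt=javabin&fsv=true&rows=10&version=2} hits=99470 status=0 QTime=3087",
    "INFO 10:28:58,119 [df] webapp=/solandra path=/select params={wt=javabin&q=province:az&version=2} status=0 QTime=3115",
]
for line in lines:
    parsed = parse_request_line(line)
    print(parsed["is_shard"], parsed["hits"], parsed["qtime_ms"])
```

Grouping the parsed lines by `is_shard` makes it easier to see that the shard-level request reports hits while the top-level line carries no `hits=` value.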

In case it was due to the way we were adding the documents, here is the log for a single write/update:

INFO 10:33:14,666 {add=[us/ak/anchorage/429industrialway]} 0 8
INFO 10:33:14,666 [df] webapp=/solandra path=/update params={wt=javabin&version=2} status=0 QTime=8
DEBUG 10:33:14,683 update for document 606298
DEBUG 10:33:14,683 Adding 606298 to df0
DEBUG 10:33:14,685 Deleted all terms for: 606298
DEBUG 10:33:14,685 df0 - firstTerm: 142390502986727170797762286641249888713���key
DEBUG 10:33:14,685 df0 - firstTerm: 142390502986727170797762286641249888713���category
DEBUG 10:33:14,686 df0 - firstTerm: 142390502986727170797762286641249888713���category
DEBUG 10:33:14,686 df0 - firstTerm: 142390502986727170797762286641249888713���category
DEBUG 10:33:14,686 df0 - firstTerm: 142390502986727170797762286641249888713���category
DEBUG 10:33:14,687 df0 - firstTerm: 142390502986727170797762286641249888713���category
DEBUG 10:33:14,687 df0 - firstTerm: 142390502986727170797762286641249888713���source
DEBUG 10:33:14,687 df0 - firstTerm: 142390502986727170797762286641249888713���dateAdded
DEBUG 10:33:14,687 df0 - firstTerm: 142390502986727170797762286641249888713���dateUpdated
DEBUG 10:33:14,687 df0 - firstTerm: 142390502986727170797762286641249888713���type
DEBUG 10:33:14,687 df0 - firstTerm: 142390502986727170797762286641249888713���encoding
DEBUG 10:33:14,688 df0 - firstTerm: 142390502986727170797762286641249888713���postalcode
DEBUG 10:33:14,688 df0 - firstTerm: 142390502986727170797762286641249888713���sic
DEBUG 10:33:14,688 df0 - firstTerm: 142390502986727170797762286641249888713���phone
DEBUG 10:33:14,688 df0 - firstTerm: 142390502986727170797762286641249888713���long
DEBUG 10:33:14,688 df0 - firstTerm: 142390502986727170797762286641249888713���city
DEBUG 10:33:14,688 df0 - firstTerm: 142390502986727170797762286641249888713���country
DEBUG 10:33:14,689 df0 - firstTerm: 142390502986727170797762286641249888713���address
DEBUG 10:33:14,689 df0 - firstTerm: 142390502986727170797762286641249888713���name
DEBUG 10:33:14,689 df0 - firstTerm: 142390502986727170797762286641249888713���province
DEBUG 10:33:14,689 df0 - firstTerm: 142390502986727170797762286641249888713���lat

Other than this (frustrating) bug, we really love Solandra. Hopefully you can give us some guidance on where to go from here.

Thanks

When we search for an individual id (one of the ones listed in the logs) we get zero results as well. The odd part is that the same setup with a single shard returns results without any issue, and I have checked that we are storing the id/key field.

Here is the log for a query against one shard, when only 400 documents are stored.

DEBUG 12:52:33,422 core: datafiniti
DEBUG 12:52:33,425 Flushed cache: datafiniti~0
DEBUG 12:52:33,458 Document read took: 2ms
DEBUG 12:52:33,459 Document read took: 1ms
DEBUG 12:52:33,460 Document read took: 1ms
DEBUG 12:52:33,461 Document read took: 1ms
DEBUG 12:52:33,462 Document read took: 1ms
DEBUG 12:52:33,463 Document read took: 1ms
DEBUG 12:52:33,464 Document read took: 1ms
DEBUG 12:52:33,465 Document read took: 1ms
DEBUG 12:52:33,466 Document read took: 1ms
DEBUG 12:52:33,466 Document read took: 0ms
DEBUG 12:52:33,467 Fetching 10 Docs
DEBUG 12:52:33,468 Found doc in cache
INFO 12:52:33,468 [datafiniti] webapp=/solandra path=/select params={wt=javabin&q=province:ak&version=2} hits=388 status=0 QTime=91
DEBUG 12:52:33,469 Found doc in cache
DEBUG 12:52:33,470 Found doc in cache
DEBUG 12:52:33,472 Found doc in cache
DEBUG 12:52:33,473 Found doc in cache
DEBUG 12:52:33,473 Found doc in cache
DEBUG 12:52:33,473 Found doc in cache
DEBUG 12:52:33,473 Found doc in cache
DEBUG 12:52:33,474 Found doc in cache
DEBUG 12:52:33,474 Found doc in cache
DEBUG 12:52:33,474 Found doc in cache

This is very odd. Do you get the same problem with a small number of docs?

Can you reproduce with a clustered version of the reuters demo?

We tried this with only 400 documents, a shard size of 256, and a reserved index size of 16. The same issue was present. I am running the Reuters demo right now and will update once it finishes.
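As a quick sanity check on why 400 documents with a shard size of 256 should exercise the multi-shard path, the minimum shard count is a ceiling division. This is only the arithmetic lower bound; how Solandra actually assigns documents to shards is not described in this thread, and the function name is hypothetical.

```python
def min_shards(doc_count, max_docs_per_shard):
    """Smallest number of shards that can hold doc_count documents."""
    if max_docs_per_shard <= 0:
        raise ValueError("max_docs_per_shard must be positive")
    # Ceiling division without floating point.
    return -(-doc_count // max_docs_per_shard)


print(min_shards(400, 256))
```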

Could you supply your schema as well?

Here is my schema:

 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
 <schema name="df" version="1.0">
        <types>
                <fieldType class="solr.TrieIntField" name="tint" omitNorms="true" positionIncrementGap="0" precisionStep="8"/>
                <fieldType class="solr.SortableDoubleField" name="sdouble" omitNorms="true"/>
                <fieldType class="solr.TextField" name="text">
                        <analyzer>
                                <tokenizer class="solr.StandardTokenizerFactory"/>
                        </analyzer>
                </fieldType>
                <fieldType class="solr.TextField" name="icu_text">
                        <analyzer>
                                <tokenizer class="solr.ICUTokenizerFactory"/>
                        </analyzer>
                </fieldType>
                <fieldType class="solr.TextField" name="name">
                        <analyzer>
                                <tokenizer class="solr.StandardTokenizerFactory"/>
                                <filter class="solr.SynonymFilterFactory" expand="false" ignoreCase="true" synonyms="nicknames.txt"/>
                        </analyzer>
                </fieldType>

                <fieldType class="solr.StrField" name="string"/>
                <fieldType class="solr.SortableIntField" name="sint" omitNorms="true"/>
                <fieldType class="solr.IntField" name="int" omitNorms="true"/>
                <fieldType class="solr.DoubleField" name="double" omitNorms="true"/>
        </types>
        <fields>
                <field indexed="true" name="key" omitTermFreqAndPositions="true" stored="true" type="string"/>
                <field indexed="true" name="long" omitTermFreqAndPositions="true" stored="true" type="sdouble"/>
                <field indexed="true" name="lat" omitTermFreqAndPositions="true" stored="true" type="sdouble"/>
                <field indexed="true" name="city" omitTermFreqAndPositions="true" stored="true" type="string"/>
                <field indexed="true" name="province" omitTermFreqAndPositions="true" stored="true" type="string"/>
                <field indexed="true" name="country" omitTermFreqAndPositions="true" stored="true" type="string"/>
                <field indexed="true" name="secondary" omitTermFreqAndPositions="true" stored="true" type="string"/>
                <field indexed="true" name="address" omitTermFreqAndPositions="true" stored="true" type="icu_text"/>
                <field indexed="true" name="name" omitTermFreqAndPositions="true" stored="true" type="icu_text"/>
                <field indexed="true" name="organization" stored="true" type="icu_text"/>
                <field indexed="true" name="encoding" multiValued="false" omittermfreqandpositions="true" stored="true" type="string"/>
                <field indexed="true" name="type" omitTermFreqAndPositions="true" stored="true" type="string"/>
                <field indexed="true" name="subtype" omitTermFreqAndPositions="true" stored="true" type="string"/>
                <field indexed="true" name="website" omitTermFreqAndPositions="true" stored="true" type="string"/>
                <field indexed="true" name="phone" omitTermFreqAndPositions="true" stored="true" type="string"/>
                <field indexed="true" name="postalcode" omitTermFreqAndPositions="true" stored="true" type="sint"/>
                <field indexed="true" name="source" omitTermFreqAndPositions="true" stored="true" type="string"/>
                <field indexed="true" name="dateAdded" omitTermFreqAndPositions="true" stored="true" type="string"/>
                <field indexed="true" name="dateUpdated" omitTermFreqAndPositions="true" stored="true" type="string"/>
                <field indexed="true" multiValued="true" name="category" omitTermFreqAndPositions="true" stored="true" type="icu_text"/>
                <field indexed="true" multiValued="true" name="sic" omitTermFreqAndPositions="true" stored="true" type="string"/>
        </fields>
        <uniqueKey>key</uniqueKey>
        <defaultSearchField>category</defaultSearchField>
 </schema>

The Reuters demo worked, so it is most likely something to do with the schema. I have tried removing omitTermFreqAndPositions, but it didn't affect the results. Could it be the ICUTokenizer? I know it's relatively new to Solr.
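One thing that can be checked mechanically when comparing schemas is attribute spelling, since XML attribute names are case-sensitive. The sketch below flags `<field>` attributes that differ only by case from the spellings used elsewhere in the schema above. Whether this has any bearing on the missing results is an assumption and not something the thread confirms. The `EXPECTED` set is taken from the attributes in this schema and is not a complete list of what Solr accepts.

```python
import xml.etree.ElementTree as ET

# Attribute spellings used in the schema above (illustrative, not exhaustive).
EXPECTED = {"name", "type", "indexed", "stored", "multiValued",
            "omitNorms", "omitTermFreqAndPositions"}
BY_LOWER = {name.lower(): name for name in EXPECTED}


def miscased_attributes(schema_xml):
    """Return (field name, attribute as written, expected spelling) tuples."""
    root = ET.fromstring(schema_xml)
    findings = []
    for field in root.iter("field"):
        for attr in field.attrib:
            expected = BY_LOWER.get(attr.lower())
            if expected is not None and attr != expected:
                findings.append((field.get("name"), attr, expected))
    return findings


# Two field definitions copied from the schema above, wrapped in a minimal root.
snippet = """<schema name="df" version="1.0">
  <fields>
    <field indexed="true" name="key" omitTermFreqAndPositions="true" stored="true" type="string"/>
    <field indexed="true" name="encoding" multiValued="false" omittermfreqandpositions="true" stored="true" type="string"/>
  </fields>
</schema>"""
print(miscased_attributes(snippet))
```

Running the same function over both the old schema and the Reuters schema would show whether they differ in this respect.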

Thanks for the help!

I was able to get this working using the Reuters schema as a template. I am going to close the issue, but I will keep looking over the old schema to try to find what exactly caused the problem. If I find it, I will update the ticket.