sul-dlss / vt-arclight

An Arclight-based discovery application for materials from the Virtual Tribunals project

Is it expected that Goering, Goring, and Göring return different numbers of results in search?

marlo-longley opened this issue

Most likely. The SynonymFilterFactory only operates on fields of the text_en type (*_te*). String fields (*_s*) have no synonyms applied. We are querying on these fields: https://github.com/sul-dlss/vt-arclight/blob/main/solr/conf/solrconfig.xml#L88-L99 You can run a search in Solr with the debug parameter set (for example, debugQuery=true), and the response will show how each match was made.
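To compare the three spellings, something like the following could build the debug requests. This is only a sketch: the host, port, and core name in `SOLR_SELECT_URL` are assumptions and need to match your local Solr, and `build_debug_url` is a hypothetical helper, not part of vt-arclight.

```python
from urllib.parse import urlencode

# Assumed values: adjust the host, port, and core name for your own Solr instance.
SOLR_SELECT_URL = "http://localhost:8983/solr/blacklight-core/select"


def build_debug_url(term: str, base_url: str = SOLR_SELECT_URL) -> str:
    """Return a select URL that asks Solr to include debug output for `term`."""
    params = {
        "q": term,
        "debugQuery": "true",  # adds a "debug" section (parsed query, explain) to the response
        "rows": 5,  # a few documents are enough to inspect the explain output
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    for term in ("Goering", "Goring", "Göring"):
        print(build_debug_url(term))
```

Opening each URL in a browser or with curl and comparing the parsed query in the debug section should show which fields each spelling actually hits.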

Note that all of these fields are copied into the text field:

<copyField source="normalized_title_ssm" dest="text" />
<copyField source="places_ssim" dest="text" />
<copyField source="names_ssim" dest="text" />
<copyField source="access_subjects_ssim" dest="text" />
<!-- grab the searchable notes -->
<copyField source="abstract_tesim" dest="text" />
<copyField source="accessrestrict_tesim" dest="text" />
<copyField source="accruals_tesim" dest="text" />
<copyField source="acqinfo_tesim" dest="text" />
<copyField source="altformavail_tesim" dest="text" />
<copyField source="appraisal_tesim" dest="text" />
<copyField source="arrangement_tesim" dest="text" />
<copyField source="bibliography_tesim" dest="text" />
<copyField source="bioghist_tesim" dest="text" />
<copyField source="custodhist_tesim" dest="text" />
<copyField source="did_note_tesim" dest="text" />
<copyField source="fileplan_tesim" dest="text" />
<copyField source="materialspec_tesim" dest="text" />
<copyField source="note_tesim" dest="text" />
<copyField source="odd_tesim" dest="text" />
<copyField source="originalsloc_tesim" dest="text" />
<copyField source="physdesc_tesim" dest="text" />
<copyField source="physloc_tesim" dest="text" />
<copyField source="phystech_tesim" dest="text" />
<copyField source="processinfo_tesim" dest="text" />
<copyField source="relatedmaterial_tesim" dest="text" />
<copyField source="scopecontent_tesim" dest="text" />
<copyField source="separatedmaterial_tesim" dest="text" />
<copyField source="userestrict_tesim" dest="text" />
<!-- grab structured data that's important -->
<copyField source="unitid_ssm" dest="text" />
The text field is itself of the text field type:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory" />
    <filter class="solr.KeywordRepeatFilterFactory" />
    <filter class="solr.ICUFoldingFilterFactory" />
    <filter class="solr.PorterStemFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>
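A rough illustration of why this chain alone would treat the spellings differently: ICU folding removes the diacritic from "Göring", so it lines up with "Goring", but "Goering" keeps its extra "e" and would need a synonym rule to match. The `fold` function below is a hypothetical stand-in that only approximates ICUFoldingFilterFactory with Python's standard library. It ignores the tokenizer and the Porter stemmer, and the real filter applies a broader set of folding rules.

```python
import unicodedata


def fold(term: str) -> str:
    """Approximate ICU folding: case-fold, decompose, and drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", term.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


for term in ("Goering", "Goring", "Göring"):
    print(term, "->", fold(term))
```

Under this approximation, "Goring" and "Göring" fold to the same token while "Goering" does not, which is consistent with the three searches returning different counts.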