getzola / zola

A fast static site generator in a single binary with everything built-in. https://www.getzola.org

The integrated elasticlunr search can't work with language zh

t3link opened this issue

Bug Report

Hello, I'm a Chinese user and new to Zola. Following the documentation, I built a Chinese-enabled Zola (Zola-zh) from source (tag v0.13.0) locally so that Chinese content gets indexed. I'm using the DeepThought theme, which already supports search. However, the JavaScript code throws an error:

Uncaught Error: Cannot load un-registered function: trimmer-zh
    at elasticlunr.min.js:10
    at Array.forEach (<anonymous>)
    at Function.t.Pipeline.load (elasticlunr.min.js:10)
    at Function.t.Index.load (elasticlunr.min.js:10)
    at search (site.js:140)
    at HTMLInputElement.<anonymous> (site.js:212)
    at HTMLInputElement.dispatch (jquery-3.5.1.min.js:2)
    at HTMLInputElement.v.handle (jquery-3.5.1.min.js:2)

There are some problems with elasticlunr:

  1. It doesn't support Chinese and is no longer maintained.
  2. The offline search results seem somewhat inaccurate.
  3. Building the index is slow. I tested 10,000 short Chinese articles locally and the build even hung.

Is there any plan to support other search services such as Algolia? Zola could simply generate structured JSON data containing the page title, content, tags, and so on. Then, on the user side, we would import the JSON data into a suitable search service, either with a deploy script or manually.

Thanks :d

Chinese/Japanese search has been disabled by default in https://github.com/getzola/zola/blob/master/components/search/Cargo.toml#L8 because it inflated the binary size a lot (I guess shipping a dictionary or something? The binary went up to something like 90MB when they were included).

Elasticlunr is definitely not the best solution when you have a lot of data. Does Algolia have some defined data format they need?

@Keats https://www.algolia.com/doc/guides/sending-and-managing-data/prepare-your-data/
Here is an example. =v=

[
    {
        "objectID":"a unique string id",
        "title":"${page.title}",
        "description":"${page.description}",
        "content":"${page.content}",
        "created":"${page.date}",
        "updated":"${page.updated}",
        "categories":"${page.taxonomies.categories}",
        "tags":"${page.taxonomies.tags}",
        "permalink":"${page.permalink}"
    }
]
  • objectID: used to create or update a record. If it is null, the Algolia server generates one automatically. If a record with that ID already exists, Algolia updates it.
  • title, description, content: used for searching.
  • created, updated, categories, tags: used for filtering or customizing ranking.
  • permalink: used for display.
  • Date attributes should be formatted as Unix timestamps.
  • The JSON array allows bulk requests.
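To make the mapping concrete, here is a minimal sketch of how a deploy script might turn page data into records of the shape shown above. The function names (`toUnixSeconds`, `toAlgoliaRecord`), the input object layout, and the choice of the permalink as `objectID` are assumptions for illustration; nothing here is produced by Zola or required by Algolia in exactly this form.

```javascript
// Hypothetical helper: convert an ISO date string to a Unix timestamp in seconds.
// Returns null when the date is missing or cannot be parsed.
function toUnixSeconds(dateString) {
  if (!dateString) return null;
  const ms = Date.parse(dateString);
  return Number.isNaN(ms) ? null : Math.floor(ms / 1000);
}

// Hypothetical helper: build one record from one page object.
// Assumption: the permalink is unique per page, so it is reused as objectID.
function toAlgoliaRecord(page) {
  const taxonomies = page.taxonomies || {};
  return {
    objectID: page.permalink,
    title: page.title || "",
    description: page.description || "",
    content: page.content || "",
    created: toUnixSeconds(page.date),
    updated: toUnixSeconds(page.updated),
    categories: taxonomies.categories || [],
    tags: taxonomies.tags || [],
    permalink: page.permalink,
  };
}

// Example input, made up for this sketch.
const pages = [
  {
    title: "Hello",
    description: "First post",
    content: "Some text",
    date: "2020-12-01T00:00:00Z",
    taxonomies: { tags: ["zola"] },
    permalink: "https://example.com/hello/",
  },
];

// The resulting JSON array is what would be sent as a bulk request.
const records = pages.map(toAlgoliaRecord);
console.log(JSON.stringify(records, null, 2));
```

Categories and tags are kept as arrays rather than strings here, which is a design choice of this sketch so that each value can be filtered on individually.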

I have the same issue with the Italian language:

Uncaught Error: Cannot load un-registered function: trimmer-it
    load http://127.0.0.1:1111/elasticlunr.min.js:10
    load http://127.0.0.1:1111/elasticlunr.min.js:10
    load http://127.0.0.1:1111/elasticlunr.min.js:10
    initSearch http://127.0.0.1:1111/assets/js/search.js:145

I really don't get why this is happening. I tried with es and it gives me the same error for the equivalent missing function: trimmer-es. I tried with a tiny site and a large one, with truncated content, with full content, and without content. It just complains about this unregistered function.

Any ideas?
Thanks a lot for your work

Oh, I got it: it's all explained in the official elasticlunr documentation. Language support requires two more JS files, as explained here.

Anyone can add the files this way:

<script src="{{ get_url(path='assets/js/lunr.stemmer.support.js', trailing_slash=false) | safe }}"></script>
<script src="{{ get_url(path='assets/js/lunr.$LANG.js', trailing_slash=false) | safe }}"></script>

Better still, defer them:

<script defer src="{{ get_url(path='assets/js/lunr.stemmer.support.js', trailing_slash=false) | safe }}"></script>
<script defer src="{{ get_url(path='assets/js/lunr.$LANG.js', trailing_slash=false) | safe }}"></script>
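For anyone wondering why the error appears at all, here is a simplified model of the mechanism as I understand it: the serialized index stores the names of its pipeline functions, and loading fails if a name has not been registered yet. This is a sketch, not elasticlunr's actual source; `registered`, `registerFunction`, and `loadPipeline` are hypothetical names, and the labels other than `trimmer-it` are assumed for illustration.

```javascript
// Simplified model of a pipeline-function registry (NOT elasticlunr's real code).
const registered = {};

// A language file would call something like this for each of its functions.
function registerFunction(fn, label) {
  registered[label] = fn;
}

// Loading a serialized pipeline looks up every stored label by name.
function loadPipeline(serialisedLabels) {
  return serialisedLabels.map((label) => {
    const fn = registered[label];
    if (!fn) {
      throw new Error("Cannot load un-registered function: " + label);
    }
    return fn;
  });
}

// Labels stored in an index built for Italian (assumed list for illustration).
const indexPipeline = ["trimmer-it", "stopWordFilter-it", "stemmer-it"];

// Without the language file, loading fails on the first unknown label.
try {
  loadPipeline(indexPipeline);
} catch (e) {
  console.log(e.message);
}

// Once the language file has registered its functions, loading succeeds.
registerFunction((token) => token, "trimmer-it");
registerFunction((token) => token, "stopWordFilter-it");
registerFunction((token) => token, "stemmer-it");
console.log(loadPipeline(indexPipeline).length);
```

This is why the extra scripts have to run before the index is loaded: the registration must already have happened when the stored labels are looked up.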

We could add this to zola docs, maybe.

Hmm I'm not using those in the docs? https://github.com/getzola/zola/blob/master/docs/templates/index.html#L105-L107

Ah it looks required for languages other than English?

Yes, those files are mandatory for other languages. I extended my theme like this:

{% if config.build_search_index %}
<script src="{{ get_url(path='assets/js/search.js', trailing_slash=false) | safe }}"></script>
<script defer src="{{ get_url(path='elasticlunr.min.js', trailing_slash=false) | safe }}"></script>
{%- if config.default_language and config.default_language != "en" -%}
{%- set search_index_file = "search_index." ~ config.default_language ~ ".js" %}
{%- set lunr_lang_file = "assets/js/lunr-languages/lunr." ~ config.default_language ~ ".min.js" -%}
<script defer src="{{ get_url(path='assets/js/lunr-languages/lunr.stemmer.support.min.js', trailing_slash=false) | safe }}"></script>
<script defer src="{{ get_url(path=lunr_lang_file, trailing_slash=false) | safe }}"></script>
<script defer src="{{ get_url(path=search_index_file, trailing_slash=false) | safe }}"></script>
{%- else -%}
<script defer src="{{ get_url(path='search_index.en.js', trailing_slash=false) | safe }}"></script>
{%- endif -%}
{% endif %}
{% endmacro script %}
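The branching in that template can also be written as a plain function, which makes it easier to check which files end up on the page for a given language. This is only a sketch: `searchScripts` is a hypothetical name, the paths are the ones from my theme above, and it assumes the intent is to load the extra files only when the default language is set and is not English.

```javascript
// Hypothetical helper mirroring the template logic: returns the script paths
// for a given default language, in the order the template emits them.
function searchScripts(defaultLanguage) {
  const scripts = ["assets/js/search.js", "elasticlunr.min.js"];
  if (defaultLanguage && defaultLanguage !== "en") {
    // Non-English: stemmer support, the language file, then the language index.
    scripts.push(
      "assets/js/lunr-languages/lunr.stemmer.support.min.js",
      "assets/js/lunr-languages/lunr." + defaultLanguage + ".min.js",
      "search_index." + defaultLanguage + ".js"
    );
  } else {
    // English needs no extra language files.
    scripts.push("search_index.en.js");
  }
  return scripts;
}

console.log(searchScripts("it"));
console.log(searchScripts("en"));
```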

Here is the commit with the new assets for my theme. I can make a pull request with all the useful changes.

@mr-chrome can you do a PR for the docs?

I had the same issue and fixed it on my blog.

Use this one: https://blog.gaxxx.me/js/lunr.zh.js, modified from MihaiValentin's version.

Remember to add lunr.stemmer.support.js as well, something like this:

<script src="https://blog.gaxxx.me/js/elasticlunr.min.js"></script>
<script src="https://blog.gaxxx.me/search_index.zh.js"></script>
<script src="https://blog.gaxxx.me/js/lunr.stemmer.support.js"></script>
<script src="https://blog.gaxxx.me/js/lunr.zh.js"></script>
<script src="https://blog.gaxxx.me/js/search.js"></script>

Hope this could help you out.

I spent an hour just deciding whether to use Zola or Docusaurus. The only reason not to choose Zola is the lack of DocSearch support.

I saw that some projects had a similar issue: codewars/docs#248

Will Zola support Algolia DocSearch?

I would take a PR to emit the search index data in the algolia format instead of elasticlunr

Can someone interested open either a new issue or a PR for the Algolia support? I'll close that one once the docs are updated to fix the original issue.

#1745