facebook / docusaurus

Easy to maintain open source documentation websites.

Home Page: https://docusaurus.io

Output HTML contains NULL characters in at least CJK languages

tats-u opened this issue

Have you read the Contributing Guidelines on issues?

Prerequisites

  • I'm using the latest version of Docusaurus.
  • I have tried the npm run clear or yarn clear command.
  • I have tried rm -rf node_modules yarn.lock package-lock.json and re-installing packages.
  • I have tried creating a repro with https://new.docusaurus.io.
  • I have read the console error message carefully (if applicable).

Description

Docusaurus sometimes contaminates output HTML files with NULL characters.
NULL characters confuse the HTML parsers used in some documentation scrapers, such as https://github.com/meilisearch/docs-scraper (which uses lxml, written in Python).
They also prevent Windows' copy-and-paste feature from copying the complete source code.

Reproducible demo

No response

Steps to reproduce

curl -LsSf https://docusaurus.io/zh-CN/docs/markdown-features/toc | rg '\x00' -a -r '[[NULL]]' --color=always | perl -C -pe 'use utf8; s/^.+?(.{50})(?=\[\[NULL)/...\1/'
curl -LsSf https://docusaurus-i18n-staging.netlify.app/ja/docs/markdown-features/toc | rg '\x00' -a -r '[[NULL]]' --color=always | perl -C -pe 'use utf8; s/^.+?(.{50})(?=\[\[NULL)/...\1/'
curl -LsSf https://docusaurus.io/ko/docs/markdown-features/toc | rg '\x00' -a -r '[[NULL]]' --color=always | perl -C -pe 'use utf8; s/^.+?(.{50})(?=\[\[NULL)/...\1/'

Note

  • rg is ripgrep.
  • Perl is used to trim the results.
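
For readers who do not have ripgrep or Perl at hand, the pipeline above can be approximated in Python. This is only a rough sketch of the same idea, not the commands used in this report: the names `mark_nulls` and `MARKER` are made up for illustration, and the input is assumed to be already decoded text. It keeps the lines that contain a NUL, makes each NUL visible as `[[NULL]]`, and trims everything except the 50 characters before the first marker.

```python
import re

# Hypothetical marker name; the text itself matches the rg replacement above.
MARKER = "[[NULL]]"


def mark_nulls(text: str, context: int = 50) -> list[str]:
    """Return the lines of `text` that contain NUL, with NULs made visible.

    Mirrors the rg + perl pipeline: every NUL becomes [[NULL]], and the part
    of the line more than `context` characters before the first marker is
    replaced by "...".
    """
    results = []
    for line in text.split("\n"):
        if "\x00" not in line:
            continue  # rg only prints matching lines
        visible = line.replace("\x00", MARKER)
        # Same pattern as the Perl substitution, with a configurable width.
        pattern = r"^.+?(.{%d})(?=\[\[NULL)" % context
        results.append(re.sub(pattern, r"...\1", visible, count=1))
    return results
```

As with the Perl version, a line whose first marker has fewer than `context + 1` characters in front of it is printed untrimmed, and the text after the marker is kept in full.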

For your own documents

Write your documents in a CJK language (or possibly another non-Latin language), then run:

npm run build
rg '\x00' -a -r '[[NULL]]' --color=always -t html build | perl -C -pe 'use utf8; s/^.+?(.{50})(?=\[\[NULL)/...\1/'

Note

Built JS files do not seem to be affected (no NULs are found there).
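
The same check on a local build can be sketched in Python, for example in CI where ripgrep is not installed. This is a minimal sketch under assumptions: the function name `find_nul_bytes` is hypothetical, and the output directory is assumed to be `build`, as in the command above. It reads each file as raw bytes, so no decoding is involved.

```python
from pathlib import Path


def find_nul_bytes(build_dir: str, suffix: str = ".html") -> dict[str, list[int]]:
    """Map each file under `build_dir` ending in `suffix` to its NUL byte offsets.

    Files without any NUL byte are left out of the result, so an empty dict
    corresponds to the expected "no output" case.
    """
    root = Path(build_dir)
    hits: dict[str, list[int]] = {}
    for path in sorted(root.rglob(f"*{suffix}")):
        if not path.is_file():
            continue
        data = path.read_bytes()
        offsets = [i for i, byte in enumerate(data) if byte == 0]
        if offsets:
            hits[path.relative_to(root).as_posix()] = offsets
    return hits
```

Calling `find_nul_bytes("build")` after `npm run build` lists the affected HTML files. Passing `suffix=".js"` would repeat the check for the built JS files.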

Expected behavior

No output (no NULL characters are found).

Actual behavior

🇨🇳

...res"><span >Markdown 特[[NULL]][[NULL]]性</span></a><meta  content="2"></li><li itemscope=""  itemtype="https://schema.org/ListItem" class="breadcrumbs__item breadcrumbs__item--active"><span class="breadcrumbs__link" >标题和目录</span><meta  content="3"></li></ul></nav><span class="theme-doc-version-badge badge badge--secondary">版本:3.1.1</span><div class="tocCollapsible_BEWm theme-doc-toc-mobile tocMobile_NSfz"><button type="button" class="clean-btn tocCollapsibleButton_IbtT">本页总览</button></div><div class="theme-doc-markdown markdown"><h1>标题和目录</h1>
...ia-label="链接到 示例小节 1 a III" title="链接[[NULL]][[NULL]]到 示例小节 1 a III">​</a></h4>

🇯🇵

...ocusaurus</b></a><nav aria-label="ドキュ[[NULL]]メントのサイドバー" class="menu thin-scrollbar menu_rWGR menuWithAnnouncementBar_Pf08"><ul class="theme-doc-sidebar-menu menu__list"><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-1 menu__list-item"><a class="menu__link" href="/ja/docs">はじめに</a></li><li class="theme-doc-sidebar-item-category theme-doc-sidebar-item-category-level-1 menu__list-item"><div class="menu__list-item-collapsible"><a class="menu__link menu__link--sublist" href="/ja/docs/category/getting-started">入門編</a><button aria-label="Collapse sidebar category &#x27;入門編&#x27;" aria-expanded="true" type="button" class="clean-btn menu__caret"></button></div><ul style="display:block;overflow:visible;height:auto" class="menu__list"><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/installation">インストール</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/configuration">設定</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/playground">プレイグラウンド</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/typescript-support">TypeScript サポート</a></li></ul></li><li class="theme-doc-sidebar-item-category theme-doc-sidebar-item-category-level-1 menu__list-item"><div class="menu__list-item-collapsible"><a class="menu__link menu__link--sublist menu__link--active" href="/ja/docs/category/guides">ガイド</a><button aria-label="Collapse sidebar category &#x27;ガイド&#x27;" aria-expanded="true" type="button" class="clean-btn menu__caret"></button></div><ul style="display:block;overflow:visible;height:auto" class="menu__list"><li class="theme-doc-sidebar-item-link 
theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/creating-pages">Pages</a></li><li class="theme-doc-sidebar-item-category theme-doc-sidebar-item-category-level-2 menu__list-item menu__list-item--collapsed"><div class="menu__list-item-collapsible"><a class="menu__link menu__link--sublist" tabindex="0" href="/ja/docs/docs-introduction">ドキュメント</a><button aria-label="Expand sidebar category &#x27;ドキュメント&#x27;" aria-expanded="false" type="button" class="clean-btn menu__caret"></button></div></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/blog">ブログ</a></li><li class="theme-doc-sidebar-item-category theme-doc-sidebar-item-category-level-2 menu__list-item"><div class="menu__list-item-collapsible"><a class="menu__link menu__link--sublist menu__link--active" tabindex="0" href="/ja/docs/markdown-features">マークダウンの機能</a><button aria-label="Collapse sidebar category &#x27;マークダウンの機能&#x27;" aria-expanded="true" type="button" class="clean-btn menu__caret"></button></div><ul style="display:block;overflow:visible;height:auto" class="menu__list"><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/react">MDX and React</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/tabs">Tabs</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/code-blocks"> コードブロック</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/admonitions">注意書き</a></li><li class="theme-doc-sidebar-item-link 
theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link menu__link--active" aria-current="page" tabindex="0" href="/ja/docs/markdown-features/toc">見出しと目次</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/assets">Assets</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/links">Markdown links</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/plugins">MDX Plugins</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/math-equations">数式</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/diagrams">図</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/head-metadata">Head metadata</a></li></ul></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/styling-layout">Styling and Layout</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/swizzling">スウィズリング(Swizzling)</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/static-assets">静的アセット</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" 
href="/ja/docs/search">検索</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/browser-support">ブラウザ対応</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/seo">SEO</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/using-plugins">プラグインの利用</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/deployment">デプロイ</a></li><li class="theme-doc-sidebar-item-category theme-doc-sidebar-item-category-level-2 menu__list-item menu__list-item--collapsed"><div class="menu__list-item-collapsible"><a class="menu__link menu__link--sublist" tabindex="0" href="/ja/docs/i18n/introduction">国際化 (i18n)</a><button aria-label="Expand sidebar category &#x27;国際化 (i18n)&#x27;" aria-expanded="false" type="button" class="clean-btn menu__caret"></button></div></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/guides/whats-next">What&#x27;s next?</a></li></ul></li><li class="theme-doc-sidebar-item-category theme-doc-sidebar-item-category-level-1 menu__list-item menu__list-item--collapsed"><div class="menu__list-item-collapsible"><a class="menu__link menu__link--sublist" href="/ja/docs/advanced">上級者向けガイド</a><button aria-label="Expand sidebar category &#x27;上級者向けガイド&#x27;" aria-expanded="false" type="button" class="clean-btn menu__caret"></button></div></li><li class="theme-doc-sidebar-item-category theme-doc-sidebar-item-category-level-1 menu__list-item menu__list-item--collapsed"><div class="menu__list-item-collapsible"><a class="menu__link menu__link--sublist" href="/ja/docs/migration">Upgrading</a><button 
aria-label="Expand sidebar category &#x27;Upgrading&#x27;" aria-expanded="false" type="button" class="clean-btn menu__caret"></button></div></li></ul></nav><button type="button" title="サ イドバーを隠す" aria-label="サイドバーを隠す" class="button button--secondary button--outline collapseSidebarButton_PUyN"><svg width="20" height="20" aria-hidden="true" class="collapseSidebarButtonIcon_DI0B"><g fill="#7a7a7a"><path d="M9.992 10.023c0 .2-.062.399-.172.547l-4.996 7.492a.982.982 0 01-.828.454H1c-.55 0-1-.453-1-1 0-.2.059-.403.168-.551l4.629-6.942L.168 3.078A.939.939 0 010 2.528c0-.548.45-.997 1-.997h2.996c.352 0 .649.18.828.45L9.82 9.472c.11.148.172.347.172.55zm0 0"></path><path d="M19.98 10.023c0 .2-.058.399-.168.547l-4.996 7.492a.987.987 0 01-.828.454h-3c-.547 0-.996-.453-.996-1 0-.2.059-.403.168-.551l4.625-6.942-4.625-6.945a.939.939 0 01-.168-.55 1 1 0 01.996-.997h3c.348 0 .649.18.828.45l4.996 7.492c.11.148.168.347.168.55zm0 0"></path></g></svg></button></div></div></aside><main class="docMainContainer_EfwR"><div class="container padding-top--md padding-bottom--lg"><div class="row"><div class="col docItemCol_n6xZ"><div class="docItemContainer_RhpI"><article><nav class="theme-doc-breadcrumbs breadcrumbsContainer_Wvrh" aria-label="パンくずリスト"><ul class="breadcrumbs" itemscope="" itemtype="https://schema.org/BreadcrumbList"><li class="breadcrumbs__item"><a aria-label="ホーム画面" class="breadcrumbs__link" href="/ja/"><svg viewBox="0 0 24 24" class="breadcrumbHomeIcon_uaSn"><path d="M10 19v-5h4v5c0 .55.45 1 1 1h3c.55 0 1-.45 1-1v-7h1.7c.46 0 .68-.57.33-.87L12.67 3.6c-.38-.34-.96-.34-1.34 0l-8.36 7.53c-.34.3-.13.87.33.87H5v7c0 .55.45 1 1 1h3c.55 0 1-.45 1-1z" fill="currentColor"></path></svg></a></li><li itemscope=""  itemtype="https://schema.org/ListItem" class="breadcrumbs__item"><a class="breadcrumbs__link"  href="/ja/docs/category/guides"><span >ガイド</span></a><meta  content="1"></li><li itemscope=""  itemtype="https://schema.org/ListItem" class="breadcrumbs__item"><a 
class="breadcrumbs__link"  href="/ja/docs/markdown-features"><span >マークダウンの機能</span></a><meta  content="2"></li><li itemscope=""  itemtype="https://schema.org/ListItem" class="breadcrumbs__item breadcrumbs__item--active"><span class="breadcrumbs__link" >見出しと目次</span><meta  content="3"></li></ul></nav><div class="tocCollapsible_BEWm theme-doc-toc-mobile tocMobile_NSfz"><button type="button" class="clean-btn tocCollapsibleButton_IbtT">このページ</button></div><div class="theme-doc-markdown markdown"><h1>見出しと目次</h1>
...itle="Example subsubsection 3 b I への直[[NULL]]リンク">​</a></h4>

🇰🇷

...를 사용하는 경우에는 각 ID가 각 페이지에서 정확하게 한 번만 표[[NULL]]시되는지 확인하세요. 그렇지 않으면 같은 ID를 가진 두 개의 DOM 요소가 존재하게 됩니다. 이는 잘못된 HTML이며 제목과 적절하게 연결할 수 없게 됩니다.</p></div></div>
...iv class="admonitionContent_Knsx"><p>[[NULL]][[NULL]]아래는 현재 페이지에서 더 많은 목차 항목을 사용할 수 있는 더미 콘텐츠입니다.</p></div></div>

Note

  • Other pages are likely to be affected.
  • The same pages in Latin-script languages are not affected.

Your environment

First found on a private documentation site written in Japanese:

  • Public source code: N/A
  • Public site URL: N/A
  • Docusaurus version used: 3.1.1
  • Environment name and version (e.g. Chrome 89, Node.js 16.4): Node 20 (latest LTS)
  • Operating system and version (e.g. Ubuntu 20.04.2 LTS): Ubuntu (GitHub Actions)

The above commands were run in Ubuntu 22.04 on WSL on Windows 11.

Self-service

  • I'd be willing to fix this bug myself.

Have you checked whether it's an MDX issue? It's hard to believe Docusaurus has anything to do with this. I can also test later.

I will check other CJK sites built with other software (e.g. Astro & Nextra).

When I'm debugging this, I usually isolate an MDX compiler with the same setup as Docusaurus, and invoke it programmatically.

Neither the Astro nor the Nextra sites seem to be affected.

Rspress, which also uses MDX (though it may use mdxjs-rs or markdown-rs instead), is not affected.

However, the documentation of Ant Design is affected. (They do not use Docusaurus or MDX, only remark.)

Also, the demo of @easyops-cn/docusaurus-search-local is affected only when the UI language is Chinese, even though the page content is the same Chinese text either way. This is strange and interesting.

Hey

To be honest I'm not super familiar with any of those concepts and won't have the bandwidth to investigate much 😅

I was just wondering, couldn't this be a Crowdin translation issue?

I'm not super skilled in rg and perl, can you tell me if you see anything weird in these input MD files?

zh-CN.zip

can you tell me if you see anything weird in these input MD files?

No NULL characters are found in html, md, mdx, json, or css files in your ZIP archive.
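
A check like this one can also be run on an archive without unpacking it. The following is only a sketch, not the method actually used for the answer above: the function name `nul_counts_in_zip` is hypothetical, and the suffix list simply mirrors the file types named in the reply.

```python
import zipfile
from pathlib import PurePosixPath

# File types named in the reply above.
TEXT_SUFFIXES = {".html", ".md", ".mdx", ".json", ".css"}


def nul_counts_in_zip(archive, suffixes=TEXT_SUFFIXES) -> dict[str, int]:
    """Count NUL bytes per archive member whose suffix is in `suffixes`.

    `archive` may be a path or a binary file-like object. Members without
    NUL bytes are omitted, so an empty dict means nothing was found.
    """
    counts: dict[str, int] = {}
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            if PurePosixPath(info.filename).suffix.lower() not in suffixes:
                continue
            nul_count = zf.read(info).count(b"\x00")
            if nul_count:
                counts[info.filename] = nul_count
    return counts
```

An empty result for the input files, combined with NULs in the built HTML, would point at the build step rather than at the translated sources.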

I was just wondering, couldn't this be a Crowdin translation issue?

I found this issue on my (our) site, where i18n is not used, so I am convinced that Crowdin is not involved.

Thanks for investigating.

It's also worth trying this env variable when building your site: process.env.SKIP_HTML_MINIFICATION === 'true'

Neither $env:SKIP_HTML_MINIFICATION = "true" (I am using PowerShell) nor --no-minify helped.
The MDX part was still minified.
Changing the locale from "ja" to "en" did not help either.

https://typescriptbook.jp/ (https://github.com/yytypescript/book)

This site uses Docusaurus 2.4.1, and NULL chars are not found there.

I will check this afternoon. There's a chance that there's something environment specific.

I found that both the Docusaurus and Ant Design websites have a div whose class includes markdown.
However, none of Nextra, Rspress, or Astro has one.
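
To repeat this comparison on other sites, a small check using only the Python standard library might look like the sketch below. The names `MarkdownDivFinder` and `has_markdown_div` are hypothetical, and the page is assumed to be already downloaded and decoded. It reports whether any div has markdown as one of its class names, as in `<div class="theme-doc-markdown markdown">` from the output above.

```python
from html.parser import HTMLParser


class MarkdownDivFinder(HTMLParser):
    """Records whether any <div> has `markdown` among its class names."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        # The class attribute is a space-separated list; match whole names only.
        classes = (dict(attrs).get("class") or "").split()
        if "markdown" in classes:
            self.found = True


def has_markdown_div(html: str) -> bool:
    finder = MarkdownDivFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```

Matching whole class names avoids false positives from names such as theme-doc-markdown, which merely contain the word.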

And it looks like https://ant.design/docs/blog/line-ellipsis-cn doesn't contain NULL now.

I found that the top page of the Docusaurus site has NULL characters in some languages: