Ruby rb tags ignored by extractHtml.js

Question

MichaelPetre opened this issue 3 years ago

When converting a Japanese webpage that contains ruby annotations, the parser drops the rb tags together with their contents.

<ruby><rb>私</rb><rp>（</rp><rt>わたくし</rt><rp>）</rp></ruby>
gets saved as
<ruby class="MG357"><rt class="WF360">わたくし</rt></ruby>
As a result, the epub file contains the furigana reading but is missing the kanji it annotates.
Expected output:
私（わたくし）
Actual output:
わたくし

This is caused by line 15 of extractHtml.js, where the tag list does not include 'rb':
'dfn', 'em', 'i', 'img', 'kbd', 'mark', 'q', 'rp', 'rt', 'rtc', 'ruby', 's', 'samp', 'small', 'span',

Adding 'rb' to the list solves the issue:
'dfn', 'em', 'i', 'img', 'kbd', 'mark', 'q', 'rb', 'rp', 'rt', 'rtc', 'ruby', 's', 'samp', 'small', 'span',
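
To illustrate why the missing entry makes the kanji disappear, here is a minimal sketch of an allow-list filter. It is not the actual extractHtml.js code: the function name dropUnlistedElements, the regex approach, and the assumption that unlisted elements are removed together with their text are all hypothetical, and the tag arrays only reproduce the excerpt of line 15 quoted above. The sketch checks simple leaf elements only (no nested markup inside them).

```javascript
// Hypothetical helper, NOT the real extractHtml.js implementation.
// Removes simple leaf elements whose tag name is not in the allow-list.
function dropUnlistedElements(html, allowedTags) {
  const allowed = new Set(allowedTags);
  // Matches an element with text-only content, e.g. <rb>私</rb>.
  const leafElement = /<([a-z][a-z0-9]*)\b[^>]*>([^<]*)<\/\1>/gi;
  return html.replace(leafElement, (match, tag) =>
    allowed.has(tag.toLowerCase()) ? match : ''
  );
}

// Excerpt of the tag list as quoted in the issue (before the fix).
const tagsBefore = ['dfn', 'em', 'i', 'img', 'kbd', 'mark', 'q',
  'rp', 'rt', 'rtc', 'ruby', 's', 'samp', 'small', 'span'];
// The same excerpt with 'rb' added (after the fix).
const tagsAfter = [...tagsBefore, 'rb'];

const sample = '<ruby><rb>私</rb><rp>（</rp><rt>わたくし</rt><rp>）</rp></ruby>';

// Without 'rb' the kanji is dropped; with 'rb' the markup is unchanged.
console.log(dropUnlistedElements(sample, tagsBefore));
console.log(dropUnlistedElements(sample, tagsAfter));
```

In this sketch the first call loses the <rb>私</rb> element while the second returns the input untouched, which mirrors the behaviour described in the report.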

MichaelPetre · Answer 1 · Wed Nov 17 2021 15:01:28 GMT+0800 (China Standard Time)

Fixed in pull request #56