Sicos1977 / ChromiumHtmlToPdf

Convert HTML to PDF with a Chromium based browser

Support for Chinese / Special characters

Harisanthosh opened this issue

I am sure many people already use the library to generate PDFs in Chinese.

Here is my ConverterService:

    private PageSettings GetPageSettings(string htmlHeader = "", string htmlFooter = "") => new PageSettings(PaperFormat.A4)
    {
        PrintBackground = true,
        PreferCSSPageSize = true,
        HeaderTemplate = htmlHeader == "" ? _defaultHeaderFooterString : htmlHeader,
        FooterTemplate = htmlFooter == "" ? _defaultHeaderFooterString : htmlFooter,
        // Show header/footer only when at least one template was supplied
        DisplayHeaderFooter = htmlHeader != "" || htmlFooter != ""
    };

    private Converter GetConverter()
    {
        var converter = new Converter();
        //converter.CaptureSnapshot = true;
        converter.AddChromiumArgument("--no-sandbox");
        return converter;
    }

    public MemoryStream Convert(string htmlBody, string htmlHeader = "", string htmlFooter = "", OutputFileFormat outputFileFormat = OutputFileFormat.Pdf)
    {
        var stream = new MemoryStream();
        // Converter starts a Chromium process; dispose it so the process is shut down
        using var converter = GetConverter();
        var pageSettings = GetPageSettings(htmlHeader: htmlHeader, htmlFooter: htmlFooter);
        if (outputFileFormat == OutputFileFormat.Pdf)
        {
            converter.ConvertToPdf(htmlBody, stream, pageSettings);
        }
        else if (outputFileFormat == OutputFileFormat.Png)
        {
            converter.ConvertToImage(htmlBody, stream, pageSettings);
        }
        else
        {
            throw new NotImplementedException($"{outputFileFormat} missing.");
        }

        // Rewind so callers can read the result from the beginning
        stream.Position = 0;
        return stream;
    }

I tried setting a custom font-family in the HTML template before starting the conversion; however, the Chinese characters aren't rendered correctly in the output. Any suggestions would be much appreciated. Thanks in advance!
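
Setting a font in the template usually means declaring the document encoding and a CSS font stack, roughly as in the minimal sketch below. It is not part of ChromiumHtmlToPdf: `HtmlTemplate.WrapWithCjkFont` is a hypothetical helper, and the font names (`WenQuanYi Zen Hei`, `Noto Sans CJK SC`) are assumptions that only take effect if those fonts are actually installed where Chromium runs.

```csharp
public static class HtmlTemplate
{
    // CSS font stack: CJK-capable fonts first, generic sans-serif as the last resort.
    // These family names are assumptions; use whatever CJK font your machine/image installs.
    private const string FontStack = "\"WenQuanYi Zen Hei\", \"Noto Sans CJK SC\", sans-serif";

    // Hypothetical helper: wraps an HTML body fragment in a complete document that
    // declares UTF-8 explicitly and applies the font stack to the whole body.
    public static string WrapWithCjkFont(string bodyHtml)
    {
        return "<!DOCTYPE html>\n" +
               "<html>\n" +
               "<head>\n" +
               "  <meta charset=\"utf-8\">\n" +
               $"  <style>body {{ font-family: {FontStack}; }}</style>\n" +
               "</head>\n" +
               $"<body>{bodyHtml}</body>\n" +
               "</html>";
    }
}
```

The wrapped string would then be passed to `converter.ConvertToPdf` as before. Declaring the charset avoids relying on the browser's guess, but a `font-family` rule can only select fonts that exist on the system; if none of the listed fonts is installed, Chromium falls back to whatever is available, which may contain no Chinese glyphs at all.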

commented

As far as I know, HTML is always just plain ASCII, and any special characters need to be encoded before they are put inside HTML. So you could try using the HttpUtility.HtmlEncode("Your Chinese text") method that is built into .NET before feeding the HTML string into the converter.

Do not use it on the complete HTML string but just on the Chinese parts.
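
If you want to try the encoding route, note that the built-in .NET HTML encoders are aimed mainly at markup-significant characters such as `<`, `&` and quotes, so they may leave CJK text unchanged. A small helper that turns every non-ASCII character into a numeric character reference makes the intent explicit. This is a hedged sketch; `HtmlEscaping.ToNumericReferences` is a hypothetical name, not a .NET API.

```csharp
using System.Text;

public static class HtmlEscaping
{
    // Hypothetical helper: escapes markup characters and replaces every non-ASCII
    // character with a hexadecimal numeric character reference (e.g. '中' -> "&#x4E2D;").
    // Characters outside the BMP arrive as surrogate pairs and are combined first.
    public static string ToNumericReferences(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            switch (c)
            {
                case '&': sb.Append("&amp;"); break;
                case '<': sb.Append("&lt;"); break;
                case '>': sb.Append("&gt;"); break;
                case '"': sb.Append("&quot;"); break;
                default:
                    if (c < 128)
                    {
                        sb.Append(c); // plain ASCII passes through unchanged
                    }
                    else
                    {
                        int codePoint = c;
                        if (char.IsHighSurrogate(c) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
                        {
                            codePoint = char.ConvertToUtf32(c, text[i + 1]);
                            i++; // skip the low surrogate we just consumed
                        }
                        sb.Append("&#x").Append(codePoint.ToString("X")).Append(';');
                    }
                    break;
            }
        }
        return sb.ToString();
    }
}
```

For example, `HtmlEscaping.ToNumericReferences("中文 <b>")` yields `&#x4E2D;&#x6587; &lt;b&gt;`. Escaping only changes how the characters travel through the markup; Chromium still needs an installed font that contains the glyphs in order to draw them.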

Or try Html Agility Pack (https://html-agility-pack.net/download) to see if it can do this for you, and then feed its output into the converter. I don't know whether Html Agility Pack has any logic that does this, but you can always try it. It is a very good HTML processing library.

commented

This is probably all you need to do.

using HtmlAgilityPack;

var htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(html);       // html: your HTML string
htmlDocument.Save(outputPath);     // outputPath: where you want the result written

Please let us know if this solves the issue, so that other people with a similar issue in the future know how to solve it.

Thanks for the quick reply, @Sicos1977. Unfortunately, loading the document with HtmlAgilityPack didn't do the trick for me. I also tried decoding the HTML while loading it into the HtmlDocument and then passed the resulting text to the Converter:

htmlDocument.LoadHtml(HttpUtility.HtmlDecode(html_page_source));

I'll try some other workarounds and keep you posted.

As suspected, I hadn't added the required fonts to my Docker image. After adding the fonts along with chromium and chromium-chromedriver, I was able to render the Chinese text in the generated PDF. Thanks @Sicos1977 for the support!

RUN apk add chromium chromium-chromedriver chromium-lang

RUN apk add wqy-zenhei --update-cache --repository https://nl.alpinelinux.org/alpine/edge/testing
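
To check that a CJK font is actually visible inside the container, you can ask fontconfig which installed fonts cover Chinese. The sketch below shells out to `fc-list :lang=zh family`; it assumes the `fc-list` tool from fontconfig is present in the image (some base images need an extra package for it), and `FontCheck.ListFontFamilies` is a hypothetical helper name. Separately, `edge` in the repository URL above refers to Alpine's rolling development branch, not to a browser.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Diagnostics;

public static class FontCheck
{
    // Hypothetical helper: returns the font families fontconfig reports as supporting
    // the given language (e.g. "zh"). Returns an empty list if fc-list is missing.
    public static List<string> ListFontFamilies(string lang)
    {
        var families = new List<string>();
        var psi = new ProcessStartInfo("fc-list", $":lang={lang} family")
        {
            RedirectStandardOutput = true,
            UseShellExecute = false
        };
        try
        {
            using var process = Process.Start(psi);
            if (process == null) return families;
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            foreach (var line in output.Split('\n', StringSplitOptions.RemoveEmptyEntries))
            {
                var family = line.Trim();
                if (family.Length > 0) families.Add(family);
            }
        }
        catch (Win32Exception)
        {
            // fc-list is not installed on this machine/container.
        }
        return families;
    }
}
```

A service could call `FontCheck.ListFontFamilies("zh")` at startup and log a warning when the list is empty, since in that case Chinese text is likely to render as blank boxes.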
commented

Good to hear that you fixed it and that Edge is working for you in a Linux container. As far as I know, Edge is still in beta on Linux?