Support for Chinese / Special characters
Harisanthosh opened this issue · comments
I am sure there are many who already use the lib to generate PDFs in Chinese.
Here is my ConverterService:
private PageSettings GetPageSettings(string htmlHeader = "", string htmlFooter = "") => new PageSettings(PaperFormat.A4)
{
    PrintBackground = true,
    PreferCSSPageSize = true,
    HeaderTemplate = htmlHeader.Equals("") ? _defaultHeaderFooterString : htmlHeader,
    FooterTemplate = htmlFooter.Equals("") ? _defaultHeaderFooterString : htmlFooter,
    // Show the header/footer only when at least one of them was supplied.
    DisplayHeaderFooter = !(htmlHeader.Equals("") && htmlFooter.Equals(""))
};

private Converter GetConverter()
{
    var converter = new Converter();
    //converter.CaptureSnapshot = true;
    converter.AddChromiumArgument("--no-sandbox");
    return converter;
}

public MemoryStream Convert(string htmlBody, string htmlHeader = "", string htmlFooter = "", OutputFileFormat outputFileFormat = OutputFileFormat.Pdf)
{
    var stream = new MemoryStream();
    var converter = GetConverter();
    var pageSettings = GetPageSettings(htmlHeader: htmlHeader, htmlFooter: htmlFooter);

    if (outputFileFormat == OutputFileFormat.Pdf)
    {
        converter.ConvertToPdf(htmlBody, stream, pageSettings);
    }
    else if (outputFileFormat == OutputFileFormat.Png)
    {
        converter.ConvertToImage(htmlBody, stream, pageSettings);
    }
    else
    {
        throw new NotImplementedException($"{outputFileFormat} missing.");
    }

    return stream;
}
I tried setting a custom font-family in the HTML template before starting the conversion, but the Chinese characters are not rendered in the generated output. Any suggestions would be much appreciated. Thanks in advance.
As far as I know, HTML is always just plain ASCII, and any special characters need to be encoded before they are put inside the HTML. So you could try the HttpUtility.HtmlEncode("Your Chinese text") method that ships with .NET before feeding the HTML string into the converter.
Do not use it on the complete HTML string but just on the Chinese parts.
Or try https://html-agility-pack.net/download to see if that can do this for you, and then feed what comes out of it into the converter. I don't know if there is any logic inside HTML Agility Pack that does this, but you can always try it. It is a very good HTML processing library.
This is probably all you need to do.
var htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(<your html string>);
htmlDocument.Save(<your wanted output>);
Please let us know if this solves the issue, so that other people with a similar issue in the future know how to solve it.
Thanks for the quick reply @Sicos1977. Unfortunately, loading the document with HTML Agility Pack didn't do the trick for me. I also tried decoding the HTML while loading the HtmlDocument and then passed the loaded text to the Converter:
htmlDocument.LoadHtml(HttpUtility.HtmlDecode(html_page_source));
Let me try some other workarounds and keep you posted.
As suspected, I had not added the required fonts to my Docker image. After adding the fonts along with chromium and chromium-chromedriver, I was able to render the Chinese text in the generated PDF. Thanks @Sicos1977 for the support.
RUN apk add chromium chromium-chromedriver chromium-lang
RUN apk add wqy-zenhei --update-cache --repository https://nl.alpinelinux.org/alpine/edge/testing
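For anyone hitting the same thing, one way to check whether the container actually has a font that covers Chinese is to ask fontconfig. This is only a minimal sketch: it assumes fc-list is available in the image (on Alpine that would come from the fontconfig package, which is not part of the commands above), and has_zh_font is just a hypothetical helper name.

```shell
#!/bin/sh
# Returns 0 if fontconfig reports at least one font that covers Chinese,
# and 1 if none is found or if fc-list is not installed (assumption:
# fontconfig is what Chromium uses to look up fonts in the container).
has_zh_font() {
    command -v fc-list >/dev/null 2>&1 || return 1
    # List the family names of all fonts whose language coverage includes "zh".
    [ -n "$(fc-list :lang=zh family 2>/dev/null)" ]
}

if has_zh_font; then
    echo "Chinese-capable font found"
else
    echo "No Chinese-capable font found (or fc-list is missing)"
fi
```

Running this inside the container before and after installing wqy-zenhei should show whether the font package made a difference.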
Good to hear that you fixed it and that Edge is working for you in a Linux container. As far as I know, Edge is still in beta on Linux?