danfickle / openhtmltopdf

An HTML to PDF library for the JVM. Based on Flying Saucer and Apache PDF-BOX 2. With SVG image support. Now also with accessible PDF support (WCAG, Section 508, PDF/UA)!

Home Page: https://danfickle.github.io/pdf-templates/index.html

RTL alignment issue & overlapping of text when generating Arabic using justified text alignment

lzhy1101 opened this issue

Hi,

Please help with two issues I encountered when generating Arabic documents with justified text alignment.

  1. The last word before the first punctuation mark always overlaps the first word after that punctuation mark. See the circle labelled "1" in the image "arabic generation error.jpg".
  2. Although I am using the RTL direction and the words appear in the correct order, lines that do not fill the full width are aligned incorrectly. See the underlined words labelled "2" in the image "arabic generation error.jpg".
    [Image: arabic generation error]

My preferred outcome is shown in the image "correct arabic text generation.jpg".
[Image: correct arabic text generation]

My code for PDF generation is:
    FileOutputStream fos = new FileOutputStream(file);

    PdfRendererBuilder builder = new PdfRendererBuilder();
    builder.useUnicodeBidiSplitter(new ICUBidiSplitter.ICUBidiSplitterFactory());
    builder.useUnicodeBidiReorderer(new ICUBidiReorderer());
    builder.useSVGDrawer(new BatikSVGDrawer());
    if (messageLocale != null && messageLocale.isRtl()) {
        builder.defaultTextDirection(TextDirection.RTL);
    } else {
        builder.defaultTextDirection(TextDirection.LTR);
    }

    Document htmlDoc = new W3CDom().fromJsoup(
            Jsoup.parse(
                    new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8)),
                    StandardCharsets.UTF_8.name(),
                    "/"));
    PdfBoxRenderer pdfBoxRenderer = builder.withW3cDocument(htmlDoc, null).buildPdfRenderer();
    pdfBoxRenderer.layout();
    pdfBoxRenderer.createPDF(fos, false);
    PDDocument doc = pdfBoxRenderer.getPdfDocument();
    doc.save(fos);
    doc.close();
    pdfBoxRenderer.close();
    fos.close();
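
The code above picks the default text direction from `messageLocale.isRtl()`. As a rough cross-check, the base direction can also be probed from the content itself. The following is a minimal sketch using the JDK's `java.text.Bidi`, not the ICU classes that openhtmltopdf is configured with; the class and method names (`TextDirectionProbe`, `hasRtlBase`, `isMixed`) are hypothetical and are not part of the library.

```java
import java.text.Bidi;

// Hypothetical helper for illustration only; not part of openhtmltopdf.
class TextDirectionProbe {

    // True when the first strong directional character makes the paragraph RTL.
    // Falls back to LTR when the text has no strong directional character.
    static boolean hasRtlBase(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return !bidi.baseIsLeftToRight();
    }

    // True when the paragraph contains both LTR and RTL runs,
    // for example Arabic text with an embedded Latin phrase.
    static boolean isMixed(String text) {
        return new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT).isMixed();
    }

    public static void main(String[] args) {
        // First word of the Arabic test paragraph, written with Unicode escapes
        // so the source file does not depend on the compiler's encoding.
        String arabic = "\u0639\u0646\u062F\u0645\u0627";
        String withLatin = arabic + " (Unicode Conference)";

        System.out.println("RTL base: " + hasRtlBase(arabic));
        System.out.println("Mixed:    " + isMixed(withLatin));
    }
}
```

A check like this could be used to log a warning when the locale says LTR but the content starts with an RTL script, or the other way round.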

My test content is:

    <title></title>
    <style>
    a,b,body,caption,center,code,dd,div,dl,dt,form,h1,h2,h3,h4,h5,h6,
    html,i,img,label,legend,li,ol,p,pre,s,small,span,strike,strong,sup,
    table,tbody,td,tfoot,th,thead,tr,ul{
        margin:0;
        padding:0;
        border:0;
        font:inherit;
        vertical-align:baseline;
    }
    p,table,ol,ul{
        margin-bottom:20px
    }
    @font-face{
        font-family: 'Arial Unicode MS';
        src: url(file:///c:/chrysalis/fonts/Arial-Unicode-Regular.ttf);
        font-weight: normal;
        font-style: normal;
    }
    @font-face{
        font-family: 'Arial Unicode MS';
        src: url(file:///c:/chrysalis/fonts/Arial-Unicode-Bold.ttf);
        font-weight: bold;
        font-style: normal;
    }
    @font-face{
        font-family: 'Arial Unicode MS';
        src: url(file:///c:/chrysalis/fonts/Arial-Unicode-Bold-Italic.ttf);
        font-weight: bold;
        font-style: italic;
    }
    @font-face{
        font-family: 'Arial Unicode MS';
        src: url(file:///c:/chrysalis/fonts/Arial-Unicode-Italic.ttf);
        font-weight: normal;
        font-style: italic;
    }
    body{
        line-height:22px;
        font-size:16px;
        font-family: Arial Unicode MS;
    }
    .rtl{
        width: 100%;
        text-align: justify;
        direction: rtl
    }
    </style>

عندما يريد العالم أن ‪يتكلّم ‬ ، فهو يتحدّث بلغة يونيكود. تسجّل الآن لحضور المؤتمر الدولي العاشر ليونيكود (Unicode Conference)، الذي سيعقد في 10-12 آذار 1997 بمدينة مَايِنْتْس، ألمانيا. و سيجمع المؤتمر بين خبراء من كافة قطاعات الصناعة على الشبكة العالمية انترنيت ويونيكود، حيث ستتم، على الصعيدين الدولي والمحلي على حد سواء مناقشة سبل استخدام يونكود في النظم القائمة وفيما يخص التطبيقات الحاسوبية، الخطوط، تصميم النصوص والحوسبة متعددة اللغات.
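
When debugging bidi layout, it can help to check whether the input contains explicit directional formatting characters (such as U+202A or U+202C), since they are invisible in most editors. The sketch below is only a diagnostic aid built on the JDK; it does not claim that such characters cause the overlap described above. The names `BidiControlScanner` and `findBidiControls` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper for illustration only; not part of openhtmltopdf.
class BidiControlScanner {

    // Returns the char indexes of explicit bidi formatting characters in the text.
    static List<Integer> findBidiControls(String text) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Embeddings and overrides: LRE, RLE, PDF, LRO, RLO.
            boolean embeddingOrOverride = c >= '\u202A' && c <= '\u202E';
            // Isolates: LRI, RLI, FSI, PDI.
            boolean isolate = c >= '\u2066' && c <= '\u2069';
            // Implicit marks: LRM, RLM.
            boolean mark = c == '\u200E' || c == '\u200F';
            if (embeddingOrOverride || isolate || mark) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // A short sample with an LRE before and a PDF after the middle letter.
        String sample = "a\u202Ab\u202Cc";
        for (int index : findBidiControls(sample)) {
            System.out.printf("U+%04X at index %d%n", (int) sample.charAt(index), index);
        }
    }
}
```

Running the scanner over the HTML string before passing it to the renderer shows whether any such characters are present and where.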

Please help. Let me know if I have missed anything.

Please try calling builder.useFastMode().

Hi, I have tried, but I am still getting the same result.

Thanks @lzhy1101, I am aiming for a release on Sunday. If you are feeling motivated, you could download and build the project to confirm that the fix works on your system.

@danfickle, pulled and tested. It works now. Thank you very much!