brechtm / rinohtype

The Python document processor

Home Page: http://www.mos6581.org/rinohtype

The first word on a line is never hyphenated

jwhitham opened this issue

Is there an existing issue for this?

  • I have searched the existing issues

PDF produced by rinohtype

target.pdf

On page 5 there are two long lines. The first line consists of a single long word ("/Example/Demo/Example...") which overflows the right-hand side of the page.

The second line consists of a short word and the same long word (i.e. "word" then "/Example/Demo/Example..."). In this case, the second word is split onto two lines with a hyphen.

Expected behavior: both of these long words ought to be split onto multiple lines with hyphens

Actual behavior: if the long word is the first word on a line, it is not hyphenated.

I think that this is the same issue reported in #188. Adding zero-width spaces to the text will avoid the problem (though it introduces another problem, see #415). However, as the long word can be hyphenated, it would be better if it were simply hyphenated, regardless of whether it is the first word, second word, or any other word on the line.
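For reference, the zero-width-space workaround can be applied as a preprocessing step on the source text. The following is only a minimal sketch, not part of rinohtype: the function name `add_break_opportunities` and the `min_length` threshold are hypothetical, and it assumes that a break after each slash is acceptable.

```python
import re

ZWSP = "\u200b"  # ZERO WIDTH SPACE: invisible, but allows a line break


def add_break_opportunities(text, min_length=20):
    """Insert a zero-width space after each '/' inside long words.

    Hypothetical helper: words shorter than min_length are left untouched.
    """
    def patch(match):
        word = match.group(0)
        if len(word) < min_length:
            return word
        # Add a break opportunity after every slash that is followed
        # by at least one more character (a trailing slash is skipped).
        return re.sub(r"/(?=.)", "/" + ZWSP, word)

    # Treat every run of non-whitespace characters as one "word".
    return re.sub(r"\S+", patch, text)


line = "word /Example/Demo/Example/Demo"
patched = add_break_opportunities(line)
```

Removing the zero-width spaces from `patched` gives back the original line, so the visible text is unchanged.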

The problem also occurs if all or part of a long word becomes the first word in a line as a result of earlier overflows. The last line on page 5 has this problem: notice that the word spills into the right margin. It ought to be hyphenated again, splitting over three lines, but it is not.

The problem can happen to short words too. When a very long word is in the second column of a table, the first column may be "squeezed", becoming so narrow that even a relatively short word needs to be hyphenated. However, if that word is the first word on the line, it can't be hyphenated, so it overflows into the second column.

I think

is possibly the place which introduces different behavior for the first word in a line. Is there any way that such a word could be hyphenated?

Source files

no-hyphenation-for-first-word.zip

The bug can be reproduced by running "demo.bat".

Versions

c:\doctools\.venv\lib\site-packages\rinoh\resource.py:44: UserWarning: The stylesheet 'sphinx' is also provided by:
* rinohtype
Using the one from 'rinohtype'
  warn("The {} '{}' is also provided by:\n".format(cls.resource_type,
rinohtype 0.5.4 (2022-06-17)
Sphinx 7.0.1
Python 3.8.2 (tags/v3.8.2:7b3ab59, Feb 25 2020, 23:03:10) [MSC v.1916 64 bit (AMD64)]
Windows-10-10.0.19041-SP0

Thanks for the detailed bug reports, @jwhitham.

Unfortunately, nowadays I'm unlikely to spend much free time on rinohtype. Since you are using rinohtype in a commercial setting, and assuming it is providing value to your company, there are some options for getting these issues fixed in a timely manner:

  • you provide a pull request which fixes an issue, which I will happily review and merge to the main branch.
  • your employer can hire me to fix these issues, so I can justify spending time on this (at the cost of time spent on my current ongoing consultancy project). I can provide you with a fixed-price quote for each bug fix or feature implementation.

Both options would help to keep the project sustainable.

Thanks. I'm grateful for your support. I have created a simple pull request for issue 415.

For this issue, I did write a possible fix, but I'm not happy with the code quality, and I think for now I'd rather deal with this problem using the workaround of inserting zero-width spaces, which seems to work fairly well for the documents I have done so far.

I will ask about the possibility of sponsoring your project within the company.

The current master branch will now automatically split "words" at slashes and it also fixes hyphenation of the first word on a line. See for example hyphenation.pdf.

There would be benefit in handling splitting separately for paths, URLs and regular text, but that requires semantic information.

  • URLs are detected by docutils, so that would be fairly easy to implement (and e.g. add splitting on dots).
  • For paths, reStructuredText doesn't seem to provide a custom role, nor is it practical to identify them. For example: dir/Makefile versus left/right. These should be split as dir _ /Makefile and left/ _ right, respectively. Sphinx does offer the :file: role, however.
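The difference between the two splitting conventions can be illustrated with a short sketch. This is not rinohtype's implementation; `split_at_slashes` and its `break_before` flag are hypothetical names used only to show where the break opportunities would fall in each case.

```python
def split_at_slashes(word, break_before=False):
    """Split a word into pieces at each '/'.

    Hypothetical helper: with break_before=True the slash starts the
    next piece (path style); otherwise it ends the current piece.
    """
    pieces = []
    current = ""
    for char in word:
        if char == "/" and break_before and current:
            # Path style: close the piece before the slash.
            pieces.append(current)
            current = ""
        current += char
        if char == "/" and not break_before:
            # Regular text: close the piece right after the slash.
            pieces.append(current)
            current = ""
    if current:
        pieces.append(current)
    return pieces


path_pieces = split_at_slashes("dir/Makefile", break_before=True)
text_pieces = split_at_slashes("left/right")
```

Here `path_pieces` is `['dir', '/Makefile']` and `text_pieces` is `['left/', 'right']`. Choosing between the two modes is the part that needs semantic information, such as a role that marks the text as a path.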