mgeisler / textwrap

An efficient and powerful Rust library for word wrapping text.

Allow URLs to not be split

c3potheds opened this issue · comments

While #384 is marked as resolved, that issue explicitly mentioned not splitting URLs as a motivation for the request and that use case is still not satisfied.

In this example, even with break_words(false) set on the Options, the URL is still split across lines. If printed to a terminal, the newline character interrupts any system URL recognition and makes it impossible to shift-click to open the link.

    assert_eq!(
        textwrap::fill(
            "http://example.com/long/url/that/should/not/be/split",
            textwrap::Options::new(40).break_words(false)
        ),
        "http://example.com/long/url/that/should/not/be/split"
    );

This test fails with this output:

assertion `left == right` failed
  left: "http://example.com/long/url/that/should/\nnot/be/split"
 right: "http://example.com/long/url/that/should/not/be/split"

I would like a way to force textwrap to not insert the newline character in a URL even if it would wrap to the next line. In addition, if the URL could fit fully on the next line, it should not be split up:

    assert_eq!(
        textwrap::fill(
            "I wonder what's for dinner http://example.com/long/url/that/should/not/be/split",
            textwrap::Options::new(60).initial_indent("").break_words(false)
        ),
        "I wonder what's for dinner\nhttp://example.com/long/url/that/should/not/be/split"
    );

This test also fails:

assertion `left == right` failed
  left: "I wonder what's for dinner http://example.com/long/url/that/\nshould/not/be/split"
 right: "I wonder what's for dinner\nhttp://example.com/long/url/that/should/not/be/split"

It's not clear whether this happens because / is treated as a character at which a line break is allowed.

The NoHyphenation WordSplitter option doesn't change anything:

    assert_eq!(
        textwrap::fill(
            "I wonder what's for dinner http://example.com/long/url/that/should/not/be/split",
            textwrap::Options::new(60)
                .initial_indent("")
                .word_splitter(textwrap::WordSplitter::NoHyphenation)
                .break_words(false)
        ),
        "I wonder what's for dinner\nhttp://example.com/long/url/that/should/not/be/split"
    );

The output is the same as before:

assertion `left == right` failed
  left: "I wonder what's for dinner http://example.com/long/url/that/\nshould/not/be/split"
 right: "I wonder what's for dinner\nhttp://example.com/long/url/that/should/not/be/split"

I tried a couple of tricks with WordSplitter and WordSeparator but so far to no avail.

Hi @c3potheds! Thanks for writing, I very much agree with you that this is confusing and could be made better...

In short, the option you're looking for is WordSeparator::AsciiSpace. See this playground:

    assert_eq!(
        textwrap::fill(
            "I wonder what's for dinner http://example.com/long/url/that/should/not/be/split",
            textwrap::Options::new(60)
                .break_words(false)
                .word_separator(textwrap::WordSeparator::AsciiSpace)
        ),
        "I wonder what's for dinner\nhttp://example.com/long/url/that/should/not/be/split"
    );

The problem is that the default UnicodeBreakProperties variant is too smart in this case: it sees / as a valid place to break a line. This is great when dealing with text like "你好" or "😂😍", but in this case it's not useful.
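To illustrate why breaking only at ASCII spaces keeps the URL intact, here is a minimal standard-library sketch of a greedy wrapper. It is not textwrap's implementation: the function name `wrap_on_spaces` is hypothetical, and the code only mimics the combined effect of splitting on spaces and never breaking inside a word, under the assumption that each character is one column wide.

```rust
/// Hypothetical helper, not part of textwrap: greedy wrapping that treats
/// ASCII spaces as the only break opportunities and never splits a word.
/// Assumes every character occupies one column.
fn wrap_on_spaces(text: &str, width: usize) -> String {
    let mut lines: Vec<String> = Vec::new();
    let mut current = String::new();

    // A "word" is any run of non-space characters, so a URL containing
    // slashes is a single unbreakable unit.
    for word in text.split(' ').filter(|w| !w.is_empty()) {
        let word_len = word.chars().count();
        let current_len = current.chars().count();

        // Start a new line if adding this word (plus a space) would overflow.
        if !current.is_empty() && current_len + 1 + word_len > width {
            lines.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        // An overlong word is placed on its own line and left unbroken.
        current.push_str(word);
    }

    if !current.is_empty() {
        lines.push(current);
    }
    lines.join("\n")
}
```

With a width of 60, the sentence from the issue wraps before the URL, and with a width of 40 the bare URL is returned unchanged because it contains no spaces. A separator that also treats / as a break opportunity would instead yield several short "words" and allow the wrap in the middle of the URL.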

A first step to improve this could be to update the documentation to call this out. A second step could perhaps be to suppress this break point with a new option, though I'm not 100% sure what to do here.