xolox / python-humanfriendly

Human friendly input/output for text interfaces using Python

Home Page:https://humanfriendly.readthedocs.org

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

sizes incorrectly parsed as powers of two by default, ignoring IEEE 1541

anarcat opened this issue · comments

i believe this is incorrect:

    >>> parse_size('42')
    42
    >>> parse_size('1 KB')
    1024
    >>> parse_size('5 kilobyte')
    5120

It should rather be:

    >>> parse_size('1 KiB')
    1024
    >>> parse_size('5 kibibyte')
    5120
    >>> parse_size('1.5 GiB')
    1610612736
    >>> parse_size('1 KB')
    1000
    >>> parse_size('5 kilobyte')
    5000
    >>> parse_size('1.5 GB')
    1000000000

i know it will complicate parsing because you can't assume that the first letter is enough to tell them apart, but maybe check the two first letters? :)

see https://en.wikipedia.org/wiki/Kibibyte for more details.

Hi @anarcat and thanks for the feedback.

You are technically 100% correct and to be honest I'm not worried about complicating the parsing, however changing this now would break backwards compatibility and I'm quite religious about that :-). There's also the question of doing what is technically correct versus the DWIM aspect.

The least I can do is clearly document the caveat you've pointed out, but I won't change the implementation as suggested. What I can do is provide a second implementation that provides the parsing you suggest (it could be a keyword argument that has to be given to the parse_size() function to switch it away from backwards compatible mode).

maybe having a parse_disk_size and parse_memory_size? or parse_size_si and parse_size_eic? with parse_size being an alias to parse_size_eic...

I have to say I was discussing with another developer about using this module for our application but we were astonished to find it that it is defaulting to base 2 yet using the SI base 10 prefix, making it unusable. Your coding standards and documentation seems quite meticulous but this is quite a glaring issue regarding unit prefix standards and continuing the confusion for end users.

@xolox: I'm totally agreeing on providing some sort of backwards compatibility. Let's keep in mind, that even Windows does it wrong. Providing a second function to get true binary sizes would be great.

On 2016-05-04 16:53:24, Manuel Leonhardt wrote:

@xolox: I'm totally agreeing on providing some sort of backwards compatibility. Let's keep in mind, that even Windows does it wrong. Providing a second function to get true binary sizes would be great.

"Windows does it wrong" should never be an argument against doing the
right thing.

Otherwise everything would always be wrong and we would see no
progress.

Being cynical is the only way to deal with modern civilization — you
can't just swallow it whole.
- Frank Zappa

@anarcat: I'm totally agreeing. If it's wrong, it's wrong, period. Maybe I should have mentioned that I'm also in favour of a correct SI-formatting. I was just replying on @xolox point on DWIM.

Any chance of this happening? It should be possible to maintain backwards compatibility of the interface with something like this, shouldn't it?

>>> parse_size('1 KB')
1024
>>> parse_size('1 KB', correctly=True)
1000

I created a pull request making it clearer in the documentation which units are used: #9

Hi everyone, sorry for taking so long to respond but thanks to everyone who chimed in! Reading through #4, #8 and #9 have convinced me to change format_size() and parse_size() to default to powers of ten instead of powers of two.

The logic in format_size() is almost as simple as it was before, but I've enhanced parse_size() to recognize symbols and names like KB, KiB, kilobyte and kibibyte. Both functions can revert to the old behavior by passing the keyword argument binary=True.

Because these changes are backwards incompatible I bumped the major version number to 2.0. On the other hand I'm not sure what would actually break or how fatal that breakage would be, and the discussion here definitely convinced me that the default needed to change :-).

I hope the new implementation satisfies everyone's concerns! Feedback is welcome.

thank you so much for doing the right thing! i haven't reviewed the actual implementation in details, but the logic is what i believe is the correct way.

thanks again!