eevee / yelp_bytes

Utilities for dealing with byte strings, invented and maintained by Yelp.

yelp_bytes

yelp_bytes contains several utility functions that help ensure the data you are working with is always either Unicode or a byte string. They handle the edge cases so that you don't have to worry about them. Ambiguous byte strings are decoded with our "internet" encoding. This lets you write functions that need Unicode text but can still accept arbitrary values without crashing.
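The exact rules of the "internet" encoding are defined in the source. As a rough illustration only, here is a minimal Python 3 sketch of the general idea. It assumes a fallback order of UTF-8 first, then Windows-1252 (cp1252), then Latin-1 as a last resort. The function name `decode_internet_sketch` is hypothetical. Unlike a real codec, it picks one encoding for the whole input rather than mixing encodings within a string.

```python
def decode_internet_sketch(data: bytes) -> str:
    """Hypothetical decoder: try UTF-8, then cp1252, then latin-1.

    Latin-1 maps every byte value 0-255 to a code point, so the final
    step cannot raise UnicodeDecodeError.
    """
    for encoding in ("utf-8", "cp1252"):
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            # Not valid in this encoding; move on to a more permissive one.
            continue
    return data.decode("latin-1")


euro = "\u20ac"  # the euro sign
# ascii() keeps the printed output safe on non-UTF-8 terminals.
print(ascii(decode_internet_sketch(euro.encode("utf-8"))))   # valid UTF-8
print(ascii(decode_internet_sketch(euro.encode("cp1252"))))  # b'\x80' is invalid UTF-8
print(ascii(decode_internet_sketch(b"\x81")))                # undefined in cp1252
```

The ordering matters. UTF-8 is strict enough that text which decodes cleanly as UTF-8 is very likely to be UTF-8. cp1252 covers common Windows-encoded text such as the euro sign at byte 0x80. Latin-1 guarantees that decoding always produces some result.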

Installation

For a primer on pip and virtualenv, see the Python Packaging User Guide.

TL;DR: pip install yelp_bytes

Usage

The from_bytes function is the most interesting one. It takes any object and returns its Unicode representation. It never fails, except in extremely rare edge cases that we have not encountered ourselves. from_utf8 is similar, but decodes with 'UTF-8' rather than the 'internet' encoding, so it fails when given bytes that are not valid UTF-8. to_bytes and to_utf8 both take an object and return its UTF-8 byte string representation. The interactive sessions below use Python 2 syntax.

```python
>>> import yelp_bytes

>>> euro = u'€'

>>> print yelp_bytes.from_bytes(euro.encode('UTF-8'))
€
>>> print yelp_bytes.from_bytes(euro.encode('cp1252'))
€
>>> print yelp_bytes.from_bytes(euro)
€
```
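To make the division of labor between these functions concrete, here is a hedged Python 3 sketch of how such an API could behave. It is not the yelp_bytes implementation. The names `from_bytes_sketch`, `from_utf8_sketch`, `to_utf8_sketch` and `_decode` are hypothetical, and the "internet" branch is an assumed stand-in (UTF-8, then cp1252, then Latin-1) for the real codec.

```python
def _decode(data: bytes, encoding: str) -> str:
    """Decode bytes; "internet" here is a hypothetical stand-in, not the real codec."""
    if encoding != "internet":
        # Strict decoding: invalid bytes raise UnicodeDecodeError.
        return data.decode(encoding)
    # Assumed fallback order: UTF-8, then cp1252, then latin-1 (never fails).
    for candidate in ("utf-8", "cp1252"):
        try:
            return data.decode(candidate)
        except UnicodeDecodeError:
            continue
    return data.decode("latin-1")


def from_bytes_sketch(obj, encoding: str = "internet") -> str:
    """Hypothetical analogue of from_bytes: any object -> text."""
    if isinstance(obj, str):
        return obj  # already text, return unchanged
    if isinstance(obj, (bytes, bytearray)):
        return _decode(bytes(obj), encoding)
    return str(obj)  # other objects: use their text representation


def from_utf8_sketch(obj) -> str:
    """Like from_bytes_sketch, but strict UTF-8: badly encoded bytes raise."""
    return from_bytes_sketch(obj, encoding="utf-8")


def to_utf8_sketch(obj) -> bytes:
    """Hypothetical analogue of to_utf8: any object -> UTF-8 bytes."""
    return from_bytes_sketch(obj).encode("utf-8")


euro = "\u20ac"
print(ascii(from_bytes_sketch(euro.encode("cp1252"))))
print(to_utf8_sketch(AssertionError(euro)))
try:
    from_utf8_sketch(euro.encode("cp1252"))
except UnicodeDecodeError as exc:
    print("strict UTF-8 decoding failed:", exc.reason)
```

Because `to_utf8_sketch` goes through `from_bytes_sketch`, byte input is decoded and re-encoded. As a result, cp1252 bytes come out as UTF-8 instead of being passed through unchanged. Check the source to confirm how the real `to_utf8` treats byte input.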

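We also handle objects with certain common classes of encoding problems, along with the other edge cases we have encountered. In the session below, printing the exception directly raises UnicodeEncodeError, while yelp_bytes converts it cleanly.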

```python
>>> error = AssertionError(euro)
>>> print error
Traceback (most recent call last):
    ...
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)

>>> print yelp_bytes.from_utf8(error)
€
>>> yelp_bytes.to_utf8(error) == euro.encode('UTF-8')
True
```
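In Python 3, `str(AssertionError('\u20ac'))` no longer raises, so the specific failure above is a Python 2 problem. However, an object's `__str__` can still fail. The following is a minimal sketch of one defensive strategy and is not how yelp_bytes works internally. It tries `str()` first, then rebuilds text from an exception's `args`, then falls back to `repr()`. The names `safe_text_sketch`, `_to_text` and `EncodingBugError` are hypothetical.

```python
def _to_text(value) -> str:
    """Convert one value to str, decoding bytes leniently."""
    if isinstance(value, bytes):
        # errors="replace" substitutes U+FFFD for undecodable bytes.
        return value.decode("utf-8", errors="replace")
    return str(value)


def safe_text_sketch(obj) -> str:
    """Hypothetical helper: best-effort text for objects whose __str__ may fail."""
    try:
        return _to_text(obj)
    except Exception:
        pass  # __str__ raised; try other sources of text
    # Exceptions store their constructor arguments in .args.
    args = getattr(obj, "args", None)
    if isinstance(args, tuple) and args:
        try:
            return " ".join(_to_text(arg) for arg in args)
        except Exception:
            pass
    try:
        return repr(obj)
    except Exception:
        # Default <Type object at 0x...> form, bypassing any overridden __repr__.
        return object.__repr__(obj)


class EncodingBugError(Exception):
    """Toy exception whose __str__ fails, mimicking the Python 2 error above."""

    def __str__(self):
        raise UnicodeEncodeError("ascii", "\u20ac", 0, 1, "ordinal not in range(128)")


print(ascii(safe_text_sketch(EncodingBugError("\u20ac"))))
print(ascii(safe_text_sketch(EncodingBugError(b"\xe2\x82\xac"))))
```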

Check out the source to learn more about the input parameters and return values.

About

License: The Unlicense

