4tm4j33tk4ur / base100

base💯 - Encode your data into emoji

Base💯

Encode things into Emoji.

Base💯 can represent any byte with a unique emoji symbol, so it can represent binary data with zero printable overhead: one symbol per byte (see Caveats for more info).
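
As a rough illustration of the one-symbol-per-byte idea, here is a minimal sketch in Rust. The offset `FIRST_EMOJI` (U+1F3F7) and the function name `encode` are assumptions made for this example, not a description of base💯's internals; any block of 256 consecutive code points would demonstrate the same idea.

```rust
/// Assumed for illustration: byte `b` maps to the code point U+1F3F7 + b.
const FIRST_EMOJI: u32 = 0x1F3F7;

/// Encode arbitrary bytes as exactly one symbol per input byte.
/// `encode` is a hypothetical helper, not part of the base100 crate's API.
fn encode(data: &[u8]) -> String {
    data.iter()
        .map(|&b| {
            // Every code point in U+1F3F7..=U+1F4F6 is a valid Unicode
            // scalar value, so this conversion succeeds for all 256 bytes.
            char::from_u32(FIRST_EMOJI + u32::from(b)).expect("valid code point")
        })
        .collect()
}
```

Under this assumed mapping, every byte value (including NUL and other unprintable bytes) gets its own symbol, so `encode(data).chars().count()` always equals `data.len()`.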

Usage

$ echo "the quick brown fox jumped over the lazy dog" | base100
👫👟👜🐗👨👬👠👚👢🐗👙👩👦👮👥🐗👝👦👯🐗👡👬👤👧👜👛🐗👦👭👜👩🐗👫👟👜🐗👣👘👱👰🐗👛👦👞🐁

Base💯 reads from stdin unless a file is specified, writes UTF-8 to stdout, and has a command-line interface similar to GNU's base64. Data is encoded by default unless --decode is specified; the --encode flag does nothing and exists solely to accommodate lazy people who don't want to read the docs (like me).

USAGE:
    base100 [FLAGS] [input]

FLAGS:
    -d, --decode     Tells base💯 to decode this data
    -e, --encode     Tells base💯 to encode this data
    -F, --fast       Go twice as fast, but crash on imperfect input (decode only)
    -h, --help       Prints help information
    -V, --version    Prints version information

ARGS:
    <input>    The input file to use

Caveats

Base💯 is very space inefficient. It adds roughly 3x the original size on top of your data (each byte becomes one emoji, which takes four bytes in UTF-8), and should only be used if you have to display encoded binary data in as few printable characters as possible. It is, however, very suitable for human interaction. Encoded hashes and checksums become very easy to verify at a glance, and take up much less space on a terminal.
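
To make the trade-off concrete, the sketch below compares bytes on disk with symbols on screen for a few encodings. The function names are hypothetical helpers written for this example, and the arithmetic assumes one four-byte UTF-8 emoji per input byte, padded base64, and two hex digits per byte.

```rust
/// Bytes of UTF-8 output when every input byte becomes one 4-byte emoji.
fn base100_bytes(input_len: usize) -> usize {
    input_len * 4
}

/// Printable symbols shown for emoji output: one per input byte.
fn base100_symbols(input_len: usize) -> usize {
    input_len
}

/// Characters of padded base64 output: 4 characters per 3-byte group,
/// with the last group rounded up.
fn base64_chars(input_len: usize) -> usize {
    (input_len + 2) / 3 * 4
}

/// Characters of hexadecimal output: 2 digits per byte.
fn hex_chars(input_len: usize) -> usize {
    input_len * 2
}
```

For a 32-byte digest such as SHA-256, this gives 128 bytes of emoji output but only 32 symbols to read, against 44 characters for base64 and 64 for hex.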

Performance

$ base100 --version
base💯 0.2.0

$ base64 --version
base64 (GNU coreutils) 8.28

$ cat /dev/urandom | base100 | pv > /dev/null
 [ 502MiB/s]

$ cat /dev/urandom | base64 | pv > /dev/null
 [ 232MiB/s]

$ cat /dev/urandom | base100 | base100 -dF | pv > /dev/null
 [ 223MiB/s]

$ cat /dev/urandom | base64 | base64 -d | pv > /dev/null
 [ 176MiB/s]

In both scenarios, base💯 compares favorably to GNU base64. Note that base💯 in fast mode sacrifices all sanity checks and makes zero guarantees about gracefully handling malformed input. base💯 also bloats the output more than base64 does, and pv counts output bytes, so the encoding readings are somewhat exaggerated.
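
For contrast with fast mode, here is a sketch of what a sanity-checked decoder might look like. It is not base💯's actual decoder: the offset `FIRST_EMOJI`, the `InvalidSymbol` type, and `decode_checked` are all hypothetical names, and the byte-to-code-point mapping is an assumption made for illustration. The point is only that a checked decoder rejects any symbol outside the 256-symbol alphabet instead of producing garbage or crashing.

```rust
use std::convert::TryFrom;

/// Assumed for illustration: byte `b` maps to the code point U+1F3F7 + b.
const FIRST_EMOJI: u32 = 0x1F3F7;

/// Hypothetical error type: the offending symbol and its position.
#[derive(Debug, PartialEq)]
struct InvalidSymbol {
    index: usize,
    symbol: char,
}

/// Decode one byte per symbol, failing on the first symbol that is
/// outside the assumed alphabet U+1F3F7..=U+1F4F6.
fn decode_checked(text: &str) -> Result<Vec<u8>, InvalidSymbol> {
    text.chars()
        .enumerate()
        .map(|(index, symbol)| {
            (symbol as u32)
                // Below the alphabet: the subtraction would underflow.
                .checked_sub(FIRST_EMOJI)
                // Above the alphabet: the offset does not fit in a byte.
                .and_then(|offset| u8::try_from(offset).ok())
                .ok_or(InvalidSymbol { index, symbol })
        })
        .collect()
}
```

A fast path would skip both range checks, which is where the speed comes from and also why malformed input is not handled gracefully.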

Future plans

  • Allow data to be encoded with the full 1024-element emoji set
  • Add further optimizations and ensure we're abusing SIMD as much as possible
  • Make fast mode less fragile while maintaining fastness
