bottom-software-foundation / spec

A spec for the bottom encoding format.

Official Bottom specification

v0.2.0

Bottom is a lightweight encoding format used by Discord and Tumblr users from all around the world. This document aims to detail the Bottom specification officially, so that implementing it correctly is as easy as possible.

Character table

Each character in Bottom holds a purpose of some sort. These are detailed here for your convenience, and will be referred to in depth below.

Value characters

Unicode escape(s)    Character    Value
U+1FAC2              🫂           Integer 200
U+1F496              💖           Integer 50
U+2728               ✨           Integer 10
U+1F97A              🥺           Integer 5
U+002C               ,            Integer 1
U+2764, U+FE0F       ❤️           Integer 0

Special characters

Unicode escape(s)    Character    Purpose
U+1F449, U+1F448     👉👈         Byte terminator

Notes on encoding

  • The input stream must be valid UTF-8 encoded text. Encoding invalid UTF-8 is illegal.
  • The output stream will be a sequence of groups of value characters (see table above), with each group terminated by the byte terminator character, e.g.
    💖✨✨✨👉👈💖💖🥺,,,👉👈💖💖,👉👈💖✨✨✨✨🥺,,👉👈💖💖✨🥺👉👈💖💖,👉👈💖✨,,,👉👈
    
  • The total numerical value of each group must equal the decimal value of the corresponding input byte.
    • For example, the numerical value of 💖💖,,,, according to the character table above is 50 + 50 + 1 + 1 + 1 + 1 = 104. This sequence thus represents U+0068 (h), which has a decimal value of 104.
    • Note the ordering of characters within groups. Groups of value characters must be in descending order. Although the order of characters within a group does not change the decoded value, arbitrary ordering can slow decoding significantly and is considered both illegal and bad form.
  • The encoding can be represented succinctly in EBNF:
    bottom -> values (BYTE_TERMINATOR values)* BYTE_TERMINATOR
    values -> value_character+ | null_value
    value_character -> 🫂 | 💖 | ✨ | 🥺 | ,
    null_value -> ❤️
    BYTE_TERMINATOR -> 👉👈
    
    Note that EBNF fails to capture any notion of semantic validity, such as character ordering. It's technically possible to encode character ordering rules into the grammar, but that is not shown here for the sake of brevity and simplicity.
  • Byte terminators that do not follow a group of value characters are illegal, e.g. 💖💖,,,,👉👈👉👈 or 👉👈💖💖,,,,👉👈. As such, 👉👈 alone is illegal.
  • Groups of value characters must be followed by a byte terminator. 💖💖,,,, alone is illegal, but 💖💖,,,,👉👈 is valid.
  • The null value must be followed by a byte terminator. 💖💖,,,,👉👈❤️👉👈💖💖,,,,👉👈 and 💖💖,,,,👉👈❤️👉👈 are valid, but 💖💖,,,,👉👈❤️ alone is illegal.
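
The rules above can be checked mechanically. The following is a minimal Python sketch, not part of the specification; the names is_valid_group and is_valid_bottom are hypothetical. It assumes that a group totalling more than 255 is invalid because it cannot correspond to a byte, and that empty input is invalid because the grammar requires at least one group.

```python
# Values from the character table; all names here are illustrative only.
CHARACTER_VALUES = {
    "\U0001FAC2": 200,  # 🫂
    "\U0001F496": 50,   # 💖
    "\u2728": 10,       # ✨
    "\U0001F97A": 5,    # 🥺
    ",": 1,
}
NULL_VALUE = "\u2764\uFE0F"               # ❤️
BYTE_TERMINATOR = "\U0001F449\U0001F448"  # 👉👈


def is_valid_group(group: str) -> bool:
    """Check one group (without its terminator) against the encoding notes."""
    if group == NULL_VALUE:
        return True
    # An empty group means a terminator did not follow any value characters.
    if not group or any(ch not in CHARACTER_VALUES for ch in group):
        return False
    weights = [CHARACTER_VALUES[ch] for ch in group]
    # Descending order is required; the 255 limit is an assumption (one byte).
    return weights == sorted(weights, reverse=True) and sum(weights) <= 255


def is_valid_bottom(encoded: str) -> bool:
    """Check that every group is valid and followed by a byte terminator."""
    *groups, trailing = encoded.split(BYTE_TERMINATOR)
    # Anything after the last terminator is an unterminated group.
    return trailing == "" and bool(groups) and all(is_valid_group(g) for g in groups)
```

With this sketch, is_valid_bottom accepts 💖💖,,,,👉👈 and rejects the same group without its terminator, a lone 👉👈, and an out-of-order group such as ,💖👉👈.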

Notes on decoding

  • Decoding is quite simple and there aren't many special considerations to be made. If you find it difficult, consider reading the source of one of the existing Bottom decoders.
    • If speed is a priority, you may want to generate a hashmap (or similar) mapping each possible encoded byte to its decoded form. This drastically improves the decode speed of correctly encoded text.
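
As an illustration of the lookup-table idea, here is a hedged Python sketch of a decoder. It is not taken from any existing Bottom implementation, and the names LOOKUP, decode_group and decode are hypothetical. The table maps the greedy encoding of each byte value to that value, and groups not found in the table fall back to summing character values. The fallback path does not enforce descending order, and treating empty input as an empty string is an assumption of this sketch.

```python
# Value characters in descending order, from the character table.
VALUE_CHARACTERS = [
    (200, "\U0001FAC2"),  # 🫂
    (50, "\U0001F496"),   # 💖
    (10, "\u2728"),       # ✨
    (5, "\U0001F97A"),    # 🥺
    (1, ","),
]
NULL_VALUE = "\u2764\uFE0F"               # ❤️
BYTE_TERMINATOR = "\U0001F449\U0001F448"  # 👉👈
CHARACTER_VALUES = {character: weight for weight, character in VALUE_CHARACTERS}


def _greedy_group(value: int) -> str:
    """Build the group a greedy encoder would emit for one byte value."""
    if value == 0:
        return NULL_VALUE
    parts = []
    for weight, character in VALUE_CHARACTERS:
        count, value = divmod(value, weight)
        parts.append(character * count)
    return "".join(parts)


# Fast path: one dictionary lookup per group for greedily encoded text.
LOOKUP = {_greedy_group(value): value for value in range(256)}


def decode_group(group: str) -> int:
    value = LOOKUP.get(group)
    if value is not None:  # 0 is a valid result, so compare against None
        return value
    # Slow path: sum the values of a group that is not in the table.
    if not group or any(ch not in CHARACTER_VALUES for ch in group):
        raise ValueError(f"invalid group: {group!r}")
    total = sum(CHARACTER_VALUES[ch] for ch in group)
    if total > 255:
        raise ValueError(f"group does not fit in a byte: {group!r}")
    return total


def decode(encoded: str) -> str:
    """Decode Bottom text back into the UTF-8 string it represents."""
    *groups, trailing = encoded.split(BYTE_TERMINATOR)
    if trailing:
        raise ValueError("input must end with a byte terminator")
    return bytes(decode_group(group) for group in groups).decode("utf-8")
```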

Example encoding implementation

Let o be a buffer of Unicode scalar values. For each byte b of the input stream:

  • Let v be the decimal value of b.
  • If v is zero, push the Unicode scalar values of ❤️ (U+2764, U+FE0F) to o.
  • If v is non-zero, repeat the below until v is zero:
    • Find the largest value character (see table above) whose value does not exceed v. Let character_value be that value.
    • Push the Unicode scalar value of that character to o.
    • Subtract character_value from v.
  • Push the Unicode scalar values representing the byte terminator to o.

An implementation can thus be expressed as the following pseudo-code:

let o = new string
for b in input_stream:
    let v = b as number

    if v is 0:
        o.append("❤️")
    else:
        loop:
            if v >= 200:
                o.append("🫂")
                v = v - 200
            else if v >= 50:
                o.append("💖")
                v = v - 50
            else if v >= 10:
                o.append("✨")
                v = v - 10
            else if v >= 5:
                o.append("🥺")
                v = v - 5
            else if v >= 1:
                o.append(",")
                v = v - 1
            else:
                break

    o.append("👉👈")

return o
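
For readers who prefer something runnable, the pseudo-code above might be translated to Python roughly as follows. This is a sketch rather than a reference implementation; the names encode_byte and encode are hypothetical, and divmod is used in place of the repeated subtraction, which yields the same groups.

```python
# Value characters in descending order, from the character table.
VALUE_CHARACTERS = [
    (200, "\U0001FAC2"),  # 🫂
    (50, "\U0001F496"),   # 💖
    (10, "\u2728"),       # ✨
    (5, "\U0001F97A"),    # 🥺
    (1, ","),
]
NULL_VALUE = "\u2764\uFE0F"               # ❤️
BYTE_TERMINATOR = "\U0001F449\U0001F448"  # 👉👈


def encode_byte(value: int) -> str:
    """Encode one byte (0-255) as a terminated group of value characters."""
    if value == 0:
        return NULL_VALUE + BYTE_TERMINATOR
    group = []
    for weight, character in VALUE_CHARACTERS:
        # Take as many of the current character as fit, then keep the rest.
        count, value = divmod(value, weight)
        group.append(character * count)
    return "".join(group) + BYTE_TERMINATOR


def encode(text: str) -> str:
    """Encode text by converting it to UTF-8 and encoding each byte in turn."""
    return "".join(encode_byte(byte) for byte in text.encode("utf-8"))
```

Because the input is a Python str that is converted with str.encode("utf-8"), the requirement that the input stream be valid UTF-8 is met by construction in this sketch.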

License: MIT License