jbaber / usv

Unicode Separated Values (USV) data markup for units, records, groups, files, streaming, and more.

Unicode Separated Values (USV)

Unicode Separated Values (USV) is a data format that uses Unicode characters to mark parts.

USV builds on ASCII separated values (ASV) and contrasts with comma separated values (CSV).

USV offers pragmatic ways to edit data in text editors by using visual symbols and line breaks.

USV has capabilities for spreadsheet folios and sheets, database schemas and tables, and more.

FAQ • RFC • Code • Comparisons • Criticisms • TODO • XKCD

USV characters

Separators:

  • Unit Separator (US) is U+001F or U+241F ␟

  • Record Separator (RS) is U+001E or U+241E ␞

  • Group Separator (GS) is U+001D or U+241D ␝

  • File Separator (FS) is U+001C or U+241C ␜

Modifiers:

  • Escape (ESC) is U+001B or U+241B ␛

  • End of Transmission (EOT) is U+0004 or U+2404 ␄

Liners:

  • Carriage Return (CR) is U+000D

  • Line Feed (LF) is U+000A
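The two forms listed for each character are related by a fixed offset: the visible symbols live in the Unicode Control Pictures block, at U+2400 plus the control code. As a minimal sketch of that relationship (the constant names and the `to_symbols` helper are hypothetical, not part of any USV library), the characters can be declared and converted like this:

```rust
// Visible USV symbols from the Unicode "Control Pictures" block.
// These names are illustrative assumptions, not an official API.
pub const US: char = '\u{241F}'; // ␟ Unit Separator
pub const RS: char = '\u{241E}'; // ␞ Record Separator
pub const GS: char = '\u{241D}'; // ␝ Group Separator
pub const FS: char = '\u{241C}'; // ␜ File Separator
pub const ESC: char = '\u{241B}'; // ␛ Escape
pub const EOT: char = '\u{2404}'; // ␄ End of Transmission

/// Replace each control-character form (such as U+001F) with its
/// visible symbol form (such as U+241F). Other characters, including
/// the CR and LF liners, pass through unchanged.
pub fn to_symbols(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            // ESC, FS, GS, RS, US are contiguous (U+001B..=U+001F); EOT is U+0004.
            '\u{1B}'..='\u{1F}' | '\u{04}' => {
                // A control picture sits at U+2400 plus the control code.
                char::from_u32(0x2400 + c as u32).unwrap_or(c)
            }
            other => other,
        })
        .collect()
}
```

Converting to the visible form first means later steps only need to look for one code point per separator.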

Hello World

The USV unit "hello" and USV unit "world":

hello␟world␟

Liners can prettify the visual display layout:

hello␟
world␟

Parsing can use libraries such as the USV Rust crate:

use usv::*;
let input = "hello␟world␟";
// Collect the unit iterator into a vector; `collect` needs a target type.
let units: Vec<_> = input.units().collect();
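For readers who want to see the idea without a dependency, here is a rough sketch that uses only the Rust standard library. The `split_units` function is a hypothetical helper, and it makes two simplifying assumptions: every unit is terminated by ␟, and any CR or LF directly after a separator is a liner rather than content. It does not handle ESC or EOT.

```rust
/// Split a USV string into its units (std-only sketch, not the usv crate).
fn split_units(input: &str) -> Vec<&str> {
    // Each unit ends with the unit separator, so whatever follows the
    // final separator (nothing, or trailing liners) is not a unit.
    let mut chunks: Vec<&str> = input.split('\u{241F}').collect();
    chunks.pop();
    chunks
        .into_iter()
        // Assumption: leading CR/LF after a separator is layout only.
        .map(|chunk| chunk.trim_start_matches(|c: char| c == '\r' || c == '\n'))
        .collect()
}
```

Under these assumptions, both the one-line and the prettified "hello world" layouts yield the same two units, and an empty unit between two separators is preserved.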

Comparisons to text data formats

Capability USV ASV CSV
Units / Cells / Fields
Records / Lines / Rows
Groups / Sheets / Tables
Files / Folios / Schemas
Visible separators
Separator line spacing 🟡
IETF.org standards-track 🟡
Unicode UTF-8 default

Comparisons to spreadsheets and databases

USV semantics are units, records, groups, files.

Spreadsheet semantics are cells, lines, sheets, folios.

Database semantics are fields, rows, tables, schemas.

Documentation

Commentary:

Specification:

Character details:

How to:

Context:

Editor notes:

Example files:

Examples

USV with 2 units by 2 records by 2 groups by 2 files:

a␟b␟␞c␟d␟␞␝e␟f␟␞g␟h␟␞␝␜i␟j␟␞k␟l␟␞␝m␟n␟␞o␟p␟␞␝␜

Liners can prettify the visual display layout:

a␟b␟␞
c␟d␟␞
␝
e␟f␟␞
g␟h␟␞
␝␜
i␟j␟␞
k␟l␟␞
␝
m␟n␟␞
o␟p␟␞
␝␜

Parsing example with the USV Rust crate and its iterators:

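use usv::*;
let input = "a␟b␟␞c␟d␟␞␝e␟f␟␞g␟h␟␞␝␜i␟j␟␞k␟l␟␞␝m␟n␟␞o␟p␟␞␝␜";
// Walk each level in turn: files, then groups, then records, then units.
for file in input.files() {
    for group in file.groups() {
        for record in group.records() {
            for unit in record.units() {
                println!("{}", unit);
            }
        }
    }
}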
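The nesting can also be sketched with the standard library alone. The version below is illustrative, not the crate's implementation: the type aliases and function names are hypothetical, it assumes every level is explicitly terminated as in the example above, it treats CR or LF directly after a separator as a liner, and it ignores ESC and EOT.

```rust
type Unit = String;
type Record = Vec<Unit>;
type Group = Vec<Record>;
type File = Vec<Group>;

/// Split on a terminator and drop whatever follows the last one
/// (nothing, or trailing liners).
fn terminated(input: &str, sep: char) -> Vec<&str> {
    let mut parts: Vec<&str> = input.split(sep).collect();
    parts.pop();
    parts
}

/// Remove liners (CR, LF) that directly follow a separator.
fn strip_liners(s: &str) -> &str {
    s.trim_start_matches(|c: char| c == '\r' || c == '\n')
}

fn parse_record(input: &str) -> Record {
    terminated(input, '␟')
        .into_iter()
        .map(|unit| strip_liners(unit).to_string())
        .collect()
}

fn parse_group(input: &str) -> Group {
    terminated(input, '␞').into_iter().map(parse_record).collect()
}

fn parse_file(input: &str) -> File {
    terminated(input, '␝').into_iter().map(parse_group).collect()
}

/// Parse a whole USV string into files of groups of records of units.
fn parse_files(input: &str) -> Vec<File> {
    terminated(input, '␜').into_iter().map(parse_file).collect()
}
```

With this sketch, the one-line form and the prettified form of the 2 by 2 by 2 by 2 example parse to the same nested vectors, so `files[1][1][1]` holds the units "o" and "p" either way.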

Why use USV?

USV can handle data that contains commas, semicolons, quotes, tabs, newlines, and other special characters, all without escaping.
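As a small illustration of that point, the sketch below writes one record whose units contain a comma, double quotes, and a tab, with no quoting or escaping step. The `encode_record` function is a hypothetical helper using only the standard library. It assumes the content does not itself contain USV characters (which would need the ESC modifier) and does not begin with a line break.

```rust
/// Encode units as one USV record: each unit is followed by ␟,
/// and the record is closed with ␞. Content is written as-is.
fn encode_record(units: &[&str]) -> String {
    let mut out = String::new();
    for unit in units {
        out.push_str(unit);
        out.push('␟');
    }
    out.push('␞');
    out
}
```

For example, `encode_record(&["Doe, Jane", "said \"hi\""])` produces the text `Doe, Jane␟said "hi"␟␞`, where the comma and quotes are ordinary content.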

USV can format units/columns/cells and records/rows/lines and groups/tables/grids and files/schemas/folios.

USV aims to be an international standard, and has an official IETF Internet-Draft submitted in RFCXML format.

USV uses Unicode characters that are semantically meaningful.

USV works well with any typical modern editor, font, terminal, shell, search, and language.

USV uses visible letter-width characters, and these are easy to view, select, copy, paste, search.

USV is easy and friendly

USV is intended to be easy to use and friendly to try.

USV works with many kinds of data, and many kinds of editors. Any editor that can render the USV characters will work. We use vim, emacs, helix, Zed, VS Code, JetBrains IDEs, Nova, TextMate, Sublime, Notepad++, etc.

USV works with many kinds of tools. Any tool that can parse the USV characters will work. We use awk, sed, grep, rg, miller, etc.

USV works with many kinds of languages. Any language that can handle UTF-8 character encoding and rendering should work. We use C, C++, C#, Elixir, Erlang, Go, Java, JavaScript, Julia, Kotlin, Perl, PHP, Python, R, Ruby, Rust, Swift, TypeScript, etc.

Legal protection for standardization

The USV project aims to become a free open source IETF standard and IANA standard, much like the standards for CSV and TDF.

Until the standardization happens, the terms "USV" and "Unicode Separated Values" are trademarks of this project, and this repository is copyright 2022-2024. When IETF and IANA approve the submissions as a standard, then the trademarks and copyrights will go to a free libre open source software advocacy foundation.

Conclusion

USV is helping us with data projects. We hope USV may help you too.

We welcome constructive feedback about USV, as well as git issues, pull requests, and standardization help.
