Number parsing builtins

LightAndLight opened this issue a year ago

I often find myself converting strings to numbers.

# Parses strings that match: `-?(0b)?[01]+`
int.parseBin : String -> (| Some : Int, None : () |)

# Parses strings that match: `-?(0o)?[0-7]+`
int.parseOct : String -> (| Some : Int, None : () |)

# Parses strings that match: `-?[0-9]+`
int.parseDec : String -> (| Some : Int, None : () |)

# Parses strings that match: `-?(0x)?[0-9a-fA-F]+`
int.parseHex : String -> (| Some : Int, None : () |)
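
For comparison, here is a minimal Python sketch of the same contract: the input must match the regex from the comment in full, otherwise the result is None (standing in for the None variant). The names `parse_int` and `_PATTERNS` are hypothetical and are not part of ipso; Python's `int` is used only after the regex has accepted the string, because on its own it would also accept whitespace and underscores.

```python
import re
from typing import Optional

# Hypothetical table: kind -> (regex copied from the issue, numeric base).
_PATTERNS = {
    "bin": (re.compile(r"-?(0b)?[01]+"), 2),
    "oct": (re.compile(r"-?(0o)?[0-7]+"), 8),
    "dec": (re.compile(r"-?[0-9]+"), 10),
    "hex": (re.compile(r"-?(0x)?[0-9a-fA-F]+"), 16),
}


def parse_int(kind: str, text: str) -> Optional[int]:
    """Return the parsed integer, or None if text does not match in full."""
    pattern, base = _PATTERNS[kind]
    if pattern.fullmatch(text) is None:
        return None
    # int() accepts an optional sign followed by a prefix matching the base.
    return int(text, base)
```

Under these assumptions, `parse_int("hex", "0xfF")` gives 255, while `parse_int("dec", "0x10")` gives None because the decimal pattern has no prefix.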

Out of scope (see #413) (kept here for history)

And from #393 (comment):

# int.printBin 5 == "101"
# int.printBin -5 == "-101"
int.printBin : Int -> String

# int.printOct 10 == "12"
# int.printOct -10 == "-12"
int.printOct : Int -> String

# int.printDec 15 == "15"
# int.printDec -15 == "-15"
int.printDec : Int -> String

# int.printHex 30 == "1e"
# int.printHex -30 == "-1e"
int.printHex : Int -> String

And some base-generic functions:

int.binary : Array Char
int.octal : Array Char
int.decimal : Array Char
int.hexLower : Array Char
int.hexUpper : Array Char

int.printBase : Array Char -> Int -> String
int.printBase base = int.printBasePrefixed base ""

int.printBasePrefixed : Array Char -> String -> Int -> String
int.printBasePrefixed base prefix n =
  case int.toBase base n of
    { negative, value } -> "${if negative then "-" else ""}$prefix$value"

int.toBase : Array Char -> Int -> { negative : Bool, value : String }
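
The body of int.toBase is not given in the issue. One way it could work, sketched in Python under the assumption that the digit array's length is the radix and that the sign is reported separately from the digits (all Python names here are hypothetical):

```python
from typing import NamedTuple, Sequence

# Digit alphabets mirroring int.binary, int.octal, and so on.
BINARY = "01"
OCTAL = "01234567"
DECIMAL = "0123456789"
HEX_LOWER = "0123456789abcdef"
HEX_UPPER = "0123456789ABCDEF"


class Based(NamedTuple):
    negative: bool
    value: str


def to_base(digits: Sequence[str], n: int) -> Based:
    """Split n into a sign flag and its digits in radix len(digits)."""
    radix = len(digits)
    if radix < 2:
        raise ValueError("need at least two digit characters")
    magnitude = abs(n)
    out = []
    while True:
        # Peel off the least significant digit on each pass.
        magnitude, remainder = divmod(magnitude, radix)
        out.append(digits[remainder])
        if magnitude == 0:
            break
    return Based(n < 0, "".join(reversed(out)))


def print_base_prefixed(digits: Sequence[str], prefix: str, n: int) -> str:
    negative, value = to_base(digits, n)
    return f"{'-' if negative else ''}{prefix}{value}"


def print_base(digits: Sequence[str], n: int) -> str:
    return print_base_prefixed(digits, "", n)
```

Keeping the sign out of `value` is what lets the prefix sit between the minus sign and the digits, as in `print_base_prefixed(HEX_UPPER, "0x", -4095)`, which yields "-0xFFF".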

Isaac Elliott · Answer 1 · Tue May 23 2023 08:27:43 GMT+0800 (China Standard Time)

Should all the parsing functions allow an optional leading - for negative numbers?

Isaac Elliott · Answer 2 · Tue May 23 2023 08:31:47 GMT+0800 (China Standard Time)

> Should all the parsing functions allow an optional leading - for negative numbers?

I think having corresponding printing functions would ground this question. If printHex : Int -> String can print negative numbers (e.g. -0xFFF), then parseHex should successfully parse negative numbers.

Isaac Elliott · Answer 3 · Tue May 23 2023 08:41:34 GMT+0800 (China Standard Time)

I'm a bit skeptical about the parsers accepting an optional leading prefix (0b, 0o, 0x). These aren't actually a feature of the counting systems; they're syntactic conventions. Would it be too much hassle for the user to strip these prefixes before calling the parsing function?

Isaac Elliott · Answer 4 · Tue May 23 2023 09:03:41 GMT+0800 (China Standard Time)

What do other languages do?

Ruby

Parsing

String#to_i

Defaults to base 10. The user can provide a base between 2 and 36 (inclusive). Base 0 means "infer the base from a prefix" such as 0x, 0o, or 0b. Otherwise, a base prefix is allowed when it matches the specified base (e.g. "0b101".to_i 2 succeeds). Parses negative numbers.

Printing

Integer#to_s

Defaults to base 10. User can provide a base between 2 and 36 (inclusive). Does not include a base prefix.

Python

Parsing

int

Defaults to base 10. The user can provide a base between 2 and 36 (inclusive). Base 0 means "infer the base from a prefix" such as 0x, 0o, or 0b. Otherwise, a base prefix is allowed when it matches the specified base (e.g. int("0b101", 2) succeeds). Parses negative numbers.

Printing

bin - Print an integer as a binary number with a leading 0b.
oct - Print an integer as an octal number with a leading 0o.
str - Print an integer as a decimal number.
hex - Print an integer as a hexadecimal number with a leading 0x.
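
The Python behaviour described above can be checked directly. This short script uses only built-ins; the helper name `rejects` is hypothetical.

```python
def rejects(text: str, base: int) -> bool:
    """Return True if int(text, base) raises ValueError."""
    try:
        int(text, base)
    except ValueError:
        return True
    return False


# Parsing: a matching prefix is accepted, and base 0 infers the base.
assert int("0b101", 2) == 5
assert int("0x1f", 0) == 31
assert int("-0o17", 0) == -15
assert rejects("0x1f", 2)  # prefix does not match the requested base

# Printing: bin, oct, and hex add the prefix; str does not.
assert bin(5) == "0b101"
assert oct(10) == "0o12"
assert str(15) == "15"
assert hex(-30) == "-0x1e"

# format() prints without the prefix.
assert format(30, "x") == "1e"
```

Note that the minus sign comes before the prefix in `hex(-30)`, which matches the -0xFFF shape discussed earlier in the thread.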

Isaac Elliott · Answer 5 · Tue May 23 2023 09:26:58 GMT+0800 (China Standard Time)

I don't know whether or not it would be more convenient to have the printX functions add the base prefix. I'll leave it off, and let users add it with int.printBasePrefixed base basePrefix. Later on, if I find that most use cases require the base prefix, then I could flip it around: printX adds the base prefix and int.printBase base prints without it.

Isaac Elliott · Answer 6 · Tue May 23 2023 09:29:49 GMT+0800 (China Standard Time)

> I'm a bit skeptical about the parsers accepting an optional leading prefix (0b, 0o, 0x). These aren't actually a feature of the counting systems; they're syntactic conventions. Would it be too much hassle for the user to strip these prefixes before calling the parsing function?

I'll make the parsing functions accept these prefixes for "least surprise" reasons. Being a little more liberal with the string inputs (where it doesn't sacrifice correctness) seems like a good thing.

Isaac Elliott · Answer 7 · Thu May 25 2023 09:33:08 GMT+0800 (China Standard Time)

In the interest of keeping this issue small, I've factored out the printing functions into #413. Number parsing is much more useful to me right now.

Isaac Elliott · Answer 8 · Wed May 31 2023 10:11:00 GMT+0800 (China Standard Time)

I've also left out the panicking versions:

int.parseHex! : String -> Int
int.parseDec! : String -> Int
int.parseOct! : String -> Int
int.parseBin! : String -> Int
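
If the panicking versions are added later, they can presumably be derived from the option-returning ones. A rough Python sketch of that relationship, assuming a panic behaves like a raised exception (`parse_dec`, `panicking`, and `parse_dec_bang` are hypothetical names):

```python
import re
from typing import Callable, Optional


def parse_dec(text: str) -> Optional[int]:
    """Option-returning parser: None when text is not -?[0-9]+."""
    if re.fullmatch(r"-?[0-9]+", text) is None:
        return None
    return int(text, 10)


def panicking(parse: Callable[[str], Optional[int]]) -> Callable[[str], int]:
    """Wrap an option-returning parser so that failure raises instead."""

    def wrapper(text: str) -> int:
        result = parse(text)
        if result is None:
            raise ValueError(f"cannot parse {text!r} as an integer")
        return result

    return wrapper


# Stand-in for int.parseDec!
parse_dec_bang = panicking(parse_dec)
```

The same wrapper would apply unchanged to the binary, octal, and hexadecimal parsers.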